Project Eval

Created by team chAI on December 24, 2022

Eval aims to address the problem of evaluating subjective test answers. Traditionally, human graders have done this manually, which is time-consuming and prone to bias. Eval automates the process with Cohere-powered APIs, whose natural language processing capabilities let the system understand and analyze the content of test answers. A custom model built on these APIs then scores the answers on suitable metrics, which can be tailored to the specific requirements of the test or assessment.

One potential application is in education, where Eval could grade assignments or exams more efficiently and with less bias. It could also be used in professional settings to evaluate job applications or support performance reviews. Beyond increasing efficiency and reducing bias, automated evaluation has the potential to provide more consistent and reliable scoring, which helps ensure that test-takers receive fair and accurate assessments of their knowledge and skills.

The model evaluates answers on four major metrics:

- Semantic Search: this is the primary scoring strategy of Eval. It evaluates an answer on its content rather than on surface textual similarity. Cohere Embed generates embeddings for 5 suggested answers to the question and for the answer to be checked. We then find the nearest neighbor of the answer among the 5 suggestions, and the distance to that neighbor is used to grade the answer.
- Duplication Check: partially correct answers that repeated text tended to get higher similarity scores than those without repetition. To stop students from exploiting this to gain extra marks, we implemented a duplication checker based on Jaccard similarity between the sentences within an answer.
- Grammar Check: this strategy checks the grammar of the answer and assigns a score based on the number of grammatical errors. We use the Cohere Generate endpoint to produce a grammatically correct version of the answer, then compute the cosine similarity between the generated version and the original to judge whether the original was grammatically correct.
- Toxicity Check: this detects toxic content in the answer and penalizes the answer if it is toxic. We trained a custom classification model on Cohere using the Social Media Toxicity Dataset by SurgeAI, which reached 98% precision on the test split.

We also implemented a Custom Checks feature that lets users assign a different weight to each metric according to how important it is for evaluating the answer. This allows for a more personalized evaluation.

We built our custom model into a Flask-based REST API server deployed on Replit to streamline usage and give people access to the full functionality of the model. We also built an interactive UI that lets users easily call the API, evaluate their answers, and submit questions.
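The nearest-neighbor grading described under Semantic Search can be sketched as follows. This is a minimal illustration, not the team's implementation: it assumes the embeddings have already been obtained (for example from Cohere Embed) as plain lists of floats, it assumes cosine similarity as the closeness measure (the description above does not name the distance used), and the function names `cosine_similarity` and `semantic_score` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_score(answer_vec, reference_vecs):
    """Score an answer by its closest suggested answer.

    Assumption: the grade is the similarity to the nearest reference,
    clamped to the range [0, 1].
    """
    if not reference_vecs:
        raise ValueError("at least one reference embedding is required")
    best = max(cosine_similarity(answer_vec, ref) for ref in reference_vecs)
    return max(0.0, min(1.0, best))


# Toy 2-dimensional vectors standing in for real embeddings.
references = [[0.0, 1.0], [1.0, 0.0]]
print(semantic_score([1.0, 0.0], references))
```

Taking the maximum over the suggested answers means an answer only needs to match one accepted phrasing well, which fits the idea of supplying several reference answers per question.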
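The Duplication Check can be illustrated in the same spirit. The sketch below is an assumption-laden example rather than the project's code: the sentence splitting on punctuation, the word-level token sets, the 0.8 threshold, and the names `jaccard` and `duplication_score` are all hypothetical choices made for illustration.

```python
import re
from itertools import combinations


def sentence_tokens(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_sentences(text):
    """Naive sentence split on '.', '!' and '?' (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def duplication_score(answer, threshold=0.8):
    """Fraction of sentence pairs that are near-duplicates.

    A pair counts as duplicated when its Jaccard similarity is at
    least `threshold` (hypothetical default).
    """
    token_sets = [sentence_tokens(s) for s in split_sentences(answer)]
    token_sets = [t for t in token_sets if t]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0
    duplicated = sum(1 for a, b in pairs if jaccard(a, b) >= threshold)
    return duplicated / len(pairs)


print(duplication_score("Cells divide by mitosis. Cells divide by mitosis."))
```

A grader could subtract a penalty proportional to this fraction from the semantic score, so that repeating a correct sentence no longer raises the mark.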
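User-supplied weights such as those in Custom Checks are commonly applied as a weighted average. The sketch below shows one way this could work; it assumes each metric yields a score in [0, 1], and the metric names and the `combine_scores` function are hypothetical, not taken from the project.

```python
def combine_scores(scores, weights):
    """Weighted average of per-metric scores.

    `scores` maps metric name -> score in [0, 1].
    `weights` maps metric name -> non-negative weight; metrics that
    are missing from `weights` get weight 0 (an assumption).
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight <= 0:
        raise ValueError("at least one metric needs a positive weight")
    weighted_sum = sum(
        score * weights.get(name, 0.0) for name, score in scores.items()
    )
    return weighted_sum / total_weight


# Hypothetical metric names and values, for illustration only.
scores = {"semantic": 0.8, "grammar": 0.6}
weights = {"semantic": 3, "grammar": 1}
print(combine_scores(scores, weights))
```

Dividing by the total weight keeps the final grade in the same range as the individual scores, whatever scale the user picks for the weights.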
