Eval addresses the problem of evaluating subjective test answers. Traditionally, this task has been carried out manually by human graders, which is time-consuming and prone to bias. Eval automates the evaluation process using Cohere-powered APIs, whose natural language processing capabilities let the system understand and analyze the content of test answers rather than just their surface form. A custom model built on top of these APIs then scores the answers against metrics that can be tailored to the specific requirements of the test or assessment.

One potential application of this technology is in education, where it could grade assignments or exams more efficiently and with less bias. It could also be used in professional settings, for example to evaluate job applications or performance reviews. Beyond efficiency and reduced bias, automated evaluation can provide more consistent and reliable scoring, helping ensure that test-takers receive fair and accurate assessments of their knowledge and skills.

The model scores answers using 4 major metrics (sketched below):

- Semantic Search: the primary scoring strategy of Eval. It is used to understand the answer semantically and evaluate it on content rather than simple textual similarity. Cohere Embed generates embeddings for 5 suggested answers to the question and for the answer being checked; the distance between the answer and its nearest neighbor among the 5 suggestions is then used to grade the answer.
- Duplication Check: partially correct answers that duplicated text tended to receive higher similarity scores than those without duplication. To stop students from exploiting this to gain extra marks, we implemented a duplication checker based on Jaccard similarity between sentences within the answer.
- Grammar Check: this strategy checks the grammar of the answer and assigns a score based on the number of grammatical errors. We use the Cohere Generate endpoint to produce a grammatically corrected version of the answer, then compute the cosine similarity between the generated and original versions; high similarity indicates the original was already grammatically correct.
- Toxicity Check: this detects toxic content in the answer and penalizes the answer if it is toxic. We trained a custom classification model on Cohere using the Social Media Toxicity Dataset by SurgeAI, which achieved 98% precision on the test split.

We also implemented Custom Checks, which let users assign different weights to the scoring metrics based on how important each is for evaluating the answer, allowing for a more personalized evaluation.

We built our custom model into a Flask-based REST API server deployed on Replit to streamline usage and give people access to the model's full functionality. We also built a highly interactive UI that lets users easily interact with the API, evaluate their answers, and submit questions.
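To make the pipeline concrete, the sketches below show how each strategy could be implemented. They are illustrations under stated assumptions, not the project's exact code. First, the semantic search step, assuming the cohere Python SDK (where `co.embed(texts=...)` returns vectors under `.embeddings`) and numpy; the `semantic_score` name and the mapping from nearest-neighbor similarity to a grade are our own choices:

```python
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")  # placeholder API key

def semantic_score(answer: str, suggestions: list[str]) -> float:
    """Embed the 5 suggested answers and the answer being checked, then
    grade by the distance to the nearest suggestion."""
    response = co.embed(texts=suggestions + [answer])
    vectors = np.array(response.embeddings)
    suggestion_vecs, answer_vec = vectors[:-1], vectors[-1]

    # Cosine similarity between the answer and each suggested answer.
    norms = np.linalg.norm(suggestion_vecs, axis=1) * np.linalg.norm(answer_vec)
    cosine_sim = suggestion_vecs @ answer_vec / norms

    # The nearest neighbor has the smallest cosine distance, i.e. the
    # largest similarity; using it directly as the grade is an
    # illustrative choice.
    return float(cosine_sim.max())
```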
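The duplication check can be sketched in pure Python. The regex sentence splitter, the 0.8 near-duplicate threshold, and the penalty definition (fraction of near-duplicate sentence pairs) are assumptions made for illustration:

```python
import itertools
import re

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def duplication_penalty(answer: str, threshold: float = 0.8) -> float:
    """Return the fraction of sentence pairs within the answer that are
    near-duplicates of each other."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", answer) if s.strip()]
    token_sets = [set(s.lower().split()) for s in sentences]
    pairs = list(itertools.combinations(token_sets, 2))
    if not pairs:
        return 0.0
    duplicated = sum(1 for a, b in pairs if jaccard(a, b) >= threshold)
    return duplicated / len(pairs)
```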
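For the grammar check, a sketch of the generate-then-compare step described above, reusing `co` and `np` from the first sketch; the prompt wording and `max_tokens` value are illustrative:

```python
def grammar_score(answer: str) -> float:
    """Generate a grammatically corrected version of the answer, then
    compare it to the original: high cosine similarity suggests the
    original was already grammatically correct."""
    prompt = (
        "Rewrite the following text with correct grammar:\n"
        f"{answer}\nCorrected:"
    )
    corrected = co.generate(prompt=prompt, max_tokens=200).generations[0].text

    # Cosine similarity between embeddings of the original and corrected text.
    emb = np.array(co.embed(texts=[answer, corrected]).embeddings)
    sim = emb[0] @ emb[1] / (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1]))
    return float(sim)
```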
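The toxicity check calls the custom classifier trained on the SurgeAI dataset. A fine-tuned Cohere classification model is referenced by its model ID; the placeholder ID, the "toxic" label name, and the hard 1.0 penalty are assumptions (response attribute names may also differ between SDK versions):

```python
def toxicity_penalty(answer: str) -> float:
    """Classify the answer with the fine-tuned toxicity model; return 1.0
    if it is predicted toxic, else 0.0."""
    response = co.classify(
        inputs=[answer],
        model="YOUR-TOXICITY-MODEL-ID",  # placeholder fine-tuned model ID
    )
    return 1.0 if response.classifications[0].prediction == "toxic" else 0.0
```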
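Finally, a hypothetical Flask route showing how the Custom Checks weights might combine the individual metrics into a single score; the route, JSON field names, and default weights are invented for illustration, and only the Flask-on-Replit setup comes from the writeup:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/evaluate", methods=["POST"])
def evaluate():
    """Score an answer as a weighted combination of the individual metrics."""
    data = request.get_json()
    answer = data["answer"]
    suggestions = data["suggestions"]  # the 5 suggested answers
    weights = data.get("weights", {    # Custom Checks weights (illustrative)
        "semantic": 0.7, "grammar": 0.2, "duplication": 0.1,
    })

    score = (
        weights["semantic"] * semantic_score(answer, suggestions)
        + weights["grammar"] * grammar_score(answer)
        - weights["duplication"] * duplication_penalty(answer)
    )
    # Penalize toxic answers outright.
    score -= toxicity_penalty(answer)

    return jsonify({"score": max(0.0, min(1.0, score))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # Replit-style host/port
```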