MIND INTERFACES team on Fine-Tuning 24-hours Challenge Hackathon

Team Idea

🤖 ELIZA based EVOL Instruct (Weizenbaum Transformer) 🤖
BrightCoder
ABDELILAH AKHMIM

Zartashia
Zartashia Afzal

mindinterfaces56
MIND INTERFACES

Life Science Research (AI)

evolvedcivilian622
Don Duval

Hacker // Hustler

juuce360
Juushy Juush

solo dev

devonte_rogers629
Devonte Rogers

Developer

Submission

ELIZA EVOL INSTRUCT - Fine-Tuning

We attempted to instill the deterministic, rule-based reasoning found in ELIZA into a more advanced, probabilistic model: an LLM. This serves a dual purpose: first, to introduce a controlled variable, in the form of ELIZA's deterministic logic, into the "fuzzier" neural network-based systems; second, to create a synthetic dataset that can be used for various Natural Language Processing (NLP) tasks beyond fine-tuning the LLM.

[ https://huggingface.co/datasets/MIND-INTERFACES/ELIZA-EVOL-INSTRUCT ]
[ https://www.kaggle.com/code/wjburns/pippa-filter/ ]

ELIZA Implementation: We implemented the ELIZA script, carefully retaining its original transformation rules (decomposition and reassembly) and keyword matching techniques.

Synthetic Data Generation: ELIZA then generated dialogues based on a seed dataset. These dialogues simulated both sides of a conversation and were structured to include the reasoning steps ELIZA took to arrive at its responses.

Fine-tuning: This synthetic dataset was then used to fine-tune the LLM. The LLM learned not just the structure of human-like responses but also the deterministic logic that went into crafting those responses.

Validation: We subjected the fine-tuned LLM to a series of tests to check that it had integrated ELIZA's deterministic logic while retaining its ability to generate human-like text.

Challenges

Dataset Imbalance: During the process, we encountered data imbalance. Certain ELIZA responses occurred far more frequently than others in the synthetic dataset, which risked biasing the fine-tuned model toward them. We managed this through rigorous data preprocessing.

Complexity Management: Handling two very different types of language models, one rule-based and one neural network-based, posed its own set of challenges.

Significance

This project offers insights into how the strengths of classic models like ELIZA can be combined with modern neural network-based systems to produce a model that is both logically rigorous and contextually aware.
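
The ELIZA Implementation and Synthetic Data Generation steps described above can be illustrated with a short sketch. This is not the code used in the submission: the three rules, the fallback line, the record field names (`instruction`, `reasoning`, `output`) and the helper `cap_repeats` are all hypothetical, chosen only to show how keyword matching, pronoun reflection and a recorded reasoning trace could fit together. Capping repeated responses is likewise just one possible way to reduce the imbalance mentioned under Challenges, assumed here for illustration.

```python
import re
from collections import Counter

# Hypothetical, heavily reduced rule set (the real ELIZA script is much larger).
# Each rule: (keyword, decomposition pattern, reassembly template).
RULES = [
    ("my mother", re.compile(r".*\bmy mother\b(.*)", re.I), "Tell me more about your family."),
    ("i am", re.compile(r".*\bi am\b(.*)", re.I), "How long have you been{0}?"),
    ("i feel", re.compile(r".*\bi feel\b(.*)", re.I), "Why do you feel{0}?"),
]
FALLBACK = "Please go on."

# Pronoun reflection applied to the captured fragment.
REFLECT = {"i": "you", "my": "your", "me": "you", "am": "are", "you": "I", "your": "my"}


def reflect(fragment):
    """Swap first- and second-person words in a captured fragment."""
    return " ".join(REFLECT.get(word.lower(), word) for word in fragment.split())


def eliza_respond(text):
    """Return (response, steps), where steps records how the response was derived."""
    cleaned = text.strip().rstrip(".!?")
    for keyword, pattern, template in RULES:
        match = pattern.match(cleaned)
        if match:
            captured = match.group(1).strip()
            reflected = reflect(captured)
            # Re-attach the leading space only when there is something to insert.
            response = template.format(" " + reflected if reflected else "")
            steps = [
                f"keyword matched: {keyword}",
                f"captured fragment: {captured!r}",
                f"reflected fragment: {reflected!r}",
                f"reassembly template: {template!r}",
            ]
            return response, steps
    return FALLBACK, ["no keyword matched", "fallback response used"]


def make_record(text):
    """Build one synthetic training record (field names are assumptions)."""
    response, steps = eliza_respond(text)
    return {"instruction": text, "reasoning": steps, "output": response}


def cap_repeats(records, max_per_response):
    """Keep at most max_per_response records for each distinct output."""
    seen = Counter()
    kept = []
    for record in records:
        if seen[record["output"]] < max_per_response:
            seen[record["output"]] += 1
            kept.append(record)
    return kept
```

For example, `eliza_respond("I am sad about my job.")` returns the response `"How long have you been sad about your job?"` together with the four steps that produced it. Running `make_record` over a list of seed utterances and passing the result through `cap_repeats` yields a dataset in which no single response, such as the fallback, appears more than the chosen number of times.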