December 2024

Cognitive Memory Recognition Task

About the Project

This project benchmarks advanced language models against human cognitive performance on memory retention and recall. The task suite measures how closely state-of-the-art models match human memory across recall and recognition conditions.

Technologies used

Python served as the core language, paired with the Transformers machine learning library and prompt engineering techniques. Integration with models such as Google's FLAN-T5 Large enabled rigorous testing across five cognitive memory tasks.
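
For illustration, here is a minimal sketch of how FLAN-T5 Large can be prompted for a cued-recall trial through the Transformers library. The word pairs and prompt wording below are hypothetical placeholders, not the project's actual datasets or prompts.

```python
# Minimal sketch: prompting FLAN-T5 Large for a cued-recall trial.
# The study pairs and prompt phrasing are illustrative assumptions only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

# Study phase: word pairs the model must hold in its prompt context.
study_pairs = [("ocean", "lamp"), ("violin", "garden"), ("pencil", "river")]
study_text = " ".join(f"{cue} - {target}." for cue, target in study_pairs)

# Test phase: present one cue and ask for its paired word.
cue = "violin"
prompt = (
    f"Memorize these word pairs: {study_text}\n"
    f"Which word was paired with '{cue}'? Answer with a single word."
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```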

My Contribution

I devised task-specific datasets and refined contextual prompts, optimizing the FLAN-T5 Large model's performance to 96% accuracy on free recall and 70% accuracy on cued recall, a 2.2x improvement over the human benchmark for cued recall. I also conducted a detailed analysis of the performance gaps between human cognition and LLMs across all five memory tasks, identifying where models underperform (such as single-item and associative recognition) and where they excel. These contributions advanced our understanding of LLM memory capabilities and set the stage for future work on larger language models, fine-tuning strategies, and enhanced prompt engineering.
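
As a rough illustration of how per-task accuracies like those above could be scored, here is a hedged sketch. The record format, field names, and lenient string normalization are assumptions for this example, not the project's actual evaluation pipeline.

```python
# Illustrative scoring sketch: per-task accuracy over model responses.
# The data layout and matching rules are assumptions, not the project's code.
from collections import defaultdict

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/extra whitespace for lenient matching."""
    return "".join(ch for ch in text.lower().strip() if ch.isalnum() or ch == " ").strip()

def accuracy_by_task(results):
    """results: iterable of dicts with 'task', 'prediction', and 'target' keys."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in results:
        total[r["task"]] += 1
        if normalize(r["prediction"]) == normalize(r["target"]):
            correct[r["task"]] += 1
    return {task: correct[task] / total[task] for task in total}

# Example usage with toy records.
toy_results = [
    {"task": "free_recall", "prediction": "Lamp", "target": "lamp"},
    {"task": "cued_recall", "prediction": "garden", "target": "garden"},
    {"task": "cued_recall", "prediction": "river", "target": "ocean"},
]
print(accuracy_by_task(toy_results))  # {'free_recall': 1.0, 'cued_recall': 0.5}
```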