Optimizing Artificial Intelligence Chatbots: A Study on the Overfitting Pitfalls in Fine-Tuning Large Language Models for Specialized Tasks
Keywords:
Large Language Models, Artificial Intelligence, Chatbots, Fine-Tuning, Overfitting
Abstract
This paper presents an empirical investigation into hyperparameter optimization of the Meta LLaMA 3.2 3B Instruct model, conducted during the POLYCC LLM League 2025 competition. The study used a parameter-efficient fine-tuning approach via AWS SageMaker JumpStart to adapt the model for specialized conversational tasks. The results demonstrate a critical disconnect between traditional optimization metrics, specifically training and evaluation loss, and the actual conversational success rate, measured as Win Rate (WR). Experimental data show that driving training loss to near zero on small datasets (under 500 examples) induced catastrophic overfitting, degrading the chatbot's real-world performance to as low as 10%. The best configuration was found at a moderate dataset scale of 1,000 examples trained for 20 epochs, which achieved a peak Win Rate of 36%.

