Optimizing Contextual Chatbots via Synthetic Data Fine-Tuning and Selective Grid Search: Explorations from an LLM Competition
Keywords:
Large Language Model, Fine-tuning, Synthetic Data, Selective Grid Search, AI Competition
Abstract
This paper explores the optimization of contextual chatbots through a strategic combination of synthetic data fine-tuning and selective hyperparameter tuning. Developed for the POLYCC LLM League 2025 competition, the work addresses the challenge of enhancing lower-tier Large Language Models (LLMs) under stringent architectural and computational constraints. The proposed methodology integrates a three-layer pipeline: (1) multi-model synthetic data generation, (2) Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning, and (3) a multi-stage competition evaluation. Rather than relying on exhaustive search, a selective grid search strategy was implemented to identify the best balance between performance gains and training overhead. Using AWS SageMaker, the model was evaluated through an automated qualification phase followed by a multi-dimensional final assessment involving AI metrics, expert validation, and audience sentiment. Our findings reveal that data quality and targeted selection of the LoRA parameters r (rank) and α (scaling factor) yield greater performance gains than simply increasing dataset volume. The resulting model demonstrated substantially improved contextual grounding and generalization, ultimately securing the highest overall ranking (1st place) among the competition finalists. These results provide a strategic roadmap for deploying high-performance LLMs in resource-constrained, domain-specific application environments.
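The trade-off between the LoRA parameters r and α and the training overhead can be made concrete with a small sketch. It assumes standard LoRA, where each adapted weight matrix of shape d_out × d_in gains two trainable factors, B (d_out × r) and A (r × d_in), and the low-rank update is scaled by α/r. The layer dimensions, number of adapted matrices, candidate grid, parameter budget, and the pruning rule (keeping only pairs with 1 ≤ α/r ≤ 4) are hypothetical placeholders, not the settings used in the competition; in practice each surviving configuration would then be fine-tuned and scored.

```python
from itertools import product


def lora_trainable_params(d_in: int, d_out: int, r: int, n_matrices: int) -> int:
    """Trainable parameters standard LoRA adds: A is r x d_in and B is d_out x r,
    for each of the n_matrices adapted weight matrices."""
    return n_matrices * r * (d_in + d_out)


def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / r."""
    return alpha / r


def selective_grid(ranks, alphas, d_in, d_out, n_matrices, param_budget):
    """Enumerate (r, alpha) pairs, keep only those that fit the (hypothetical)
    parameter budget and pass a (hypothetical) scaling filter, then order them
    from cheapest to most expensive so the fewest training runs are launched."""
    candidates = []
    for r, alpha in product(ranks, alphas):
        params = lora_trainable_params(d_in, d_out, r, n_matrices)
        scale = lora_scaling(r, alpha)
        # Hypothetical pruning rule: skip very weak or very strong update scaling.
        if params <= param_budget and 1.0 <= scale <= 4.0:
            candidates.append({"r": r, "alpha": alpha, "scale": scale, "params": params})
    return sorted(candidates, key=lambda c: (c["params"], c["alpha"]))


if __name__ == "__main__":
    # Hypothetical setup: hidden size 4096, 64 adapted matrices, 10M-parameter budget.
    grid = selective_grid([4, 8, 16, 32, 64], [8, 16, 32, 64], 4096, 4096, 64, 10_000_000)
    for c in grid:
        print(c["r"], c["alpha"], c["scale"], c["params"])
```

With these placeholder values, the 20-point full grid shrinks to 8 candidates, because every configuration with r ≥ 32 exceeds the parameter budget and several remaining pairs fall outside the scaling window. Pruning before training, rather than after, is what keeps the search cheaper than an exhaustive one.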

