A Contextual Chatbot for Polytechnic Education Using Synthetic Data Generation and LoRA Fine-Tuning
Keywords:
Contextual Chatbot, Synthetic Data, LoRA, Large Language Model, TVET Education
Abstract
This study presents the development of a contextual chatbot designed to support Technical and Vocational Education and Training (TVET) in polytechnic environments. The proposed system combines synthetic dataset generation with parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA) to improve domain-specific response accuracy. A total of 10,000 question–answer pairs were generated through prompt engineering based on TECC 4.0 and Maker Market themes and then validated to ensure data quality. The chatbot architecture adopts a Retrieval-Augmented Generation (RAG) approach, which lets the system provide context-aware responses by combining semantic search with a fine-tuned Large Language Model (LLM). The model was evaluated on a separate benchmark dataset, with performance measured using accuracy and F1 score. Experimental results show that the LoRA fine-tuned model achieved an accuracy of 85%, compared to 60% for the baseline model, an improvement of 25 percentage points. These findings indicate that combining synthetic data with LoRA fine-tuning substantially improves chatbot performance while reducing the computational cost of fine-tuning. This study highlights the potential of scalable and cost-efficient AI solutions to support digital learning in TVET institutions. Future work may focus on integrating real student interaction data and multilingual capabilities to further enhance contextual understanding and usability.
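
To illustrate why LoRA is described above as parameter-efficient, the following is a minimal sketch of the general LoRA idea, not the configuration used in this study. The pretrained weight matrix W stays frozen, and only two small low-rank factors A and B are trained. The layer size, rank, and scaling factor shown here are hypothetical values chosen for illustration; the study does not report them.

```python
# Minimal sketch of the LoRA update: y = W x + (alpha / rank) * B (A x).
# W is frozen; only A (rank x d_in) and B (d_out x rank) are trained.
# All sizes and values below are hypothetical, chosen for illustration.

def lora_trainable_params(d_out, d_in, rank):
    """Number of trainable parameters in the factors B and A."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Number of parameters updated by full fine-tuning of one weight matrix."""
    return d_out * d_in


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Apply the frozen weight plus the scaled low-rank update to input x."""
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # low-rank path: B (A x)
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical 4096 x 4096 projection with rank 8.
d_out, d_in, rank = 4096, 4096, 8
fraction = lora_trainable_params(d_out, d_in, rank) / full_params(d_out, d_in)
print(f"trainable fraction: {fraction:.4%}")

# Tiny forward pass: B starts at zero, so the adapted layer initially
# reproduces the frozen layer exactly.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B_zero = [[0.0], [0.0]]
print(lora_forward(W, A, B_zero, [3.0, 4.0], alpha=2.0, rank=1))
```

Under these assumed sizes, the two factors hold well under 1% of the parameters of the full matrix, which is the source of the reduced fine-tuning cost. Initializing B to zero is the usual LoRA convention, so that training starts from the behavior of the baseline model.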

