QA model based on DistilBERT to answer questions about content extracted from academic assignments of the Computer Science course at UNL

Bibliographic details
Main author:	Jiménez Merino, Edy Francisco (author)
Format:	Bachelor's thesis
Language:	Spanish (spa)
Published:	2024
Subjects:	QA MODEL; DISTILBERT; SQUAD1.0 DATASET; CRISP-ML(Q); ROUGE
Online access:	https://dspace.unl.edu.ec/jspui/handle/123456789/30620

_version_	1857832993902559232
author	Jiménez Merino, Edy Francisco
author_facet	Jiménez Merino, Edy Francisco
author_role	author
collection	Repositorio Universidad Nacional de Loja
dc.contributor.none.fl_str_mv	Cumbicus Pineda, Oscar Miguel
dc.creator.none.fl_str_mv	Jiménez Merino, Edy Francisco
dc.date.none.fl_str_mv	2024-09-21T01:17:20Z 2024-09-21T01:17:20Z 2024-09-20
dc.format.none.fl_str_mv	103 P. application/pdf
dc.identifier.none.fl_str_mv	https://dspace.unl.edu.ec/jspui/handle/123456789/30620
dc.language.none.fl_str_mv	spa
dc.publisher.none.fl_str_mv	Universidad Nacional de Loja
dc.rights.none.fl_str_mv	http://creativecommons.org/licenses/by-nc-sa/3.0/ec/ info:eu-repo/semantics/openAccess
dc.source.none.fl_str_mv	reponame:Repositorio Universidad Nacional de Loja instname:Universidad Nacional de Loja instacron:UNL
dc.subject.none.fl_str_mv	MODELO QA DISTILBERT DATASET SQUAD1.0 CRISP-ML(Q) ROUGE
dc.title.none.fl_str_mv	Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL. QA model based on DistilBERT to answer questions about content extracted from academic assignments of the Computer Science course at UNL
dc.type.none.fl_str_mv	info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/bachelorThesis
description	Adapting pre-trained question-answering (QA) models is essential for deploying them in different scenarios. The objective of this research is to obtain the ROUGE metric score of the DistilBERT model after fine-tuning it to answer questions about content extracted from academic assignments of the Computer Science program at the National University of Loja. The first four phases of the CRISP-ML(Q) methodology served as the reference framework. In them, 30 academic assignments were collected from 6 different subjects, and 80 questions about their content were generated through crowdsourcing. These questions were the basis for a SQuAD1.0-format dataset of 1,410 entries: 800 were generated through paraphrasing and a few-shot learning approach, and the remaining 610 were contributed directly by the author. The dataset was split into 90% for training (train) and 10% for evaluation (test), and the train set was further subdivided into 75% train and 25% validation. With the data prepared, DistilBERT's hyperparameters were tuned to train four different models using TensorFlow on Google Colab with a T4 GPU runtime, and the best model was selected based on its answer-extraction performance and F1-score. The chosen QA model was then evaluated with the ROUGE metric, including an A/B test. The QA model, deployed on Hugging Face, reached an accuracy of 86.93% during training with 51 epochs, a learning rate of 1e-5, and a batch size of 32, and achieved a maximum ROUGE-L F-measure of 60.96 in evaluation. These results show the importance of fine-tuning when developing QA models for specific contexts. Keywords: QA model, DistilBERT, SQuAD 1.0 dataset, CRISP-ML(Q), ROUGE
eu_rights_str_mv	openAccess
format	bachelorThesis
id	UNL_e7284cb9ccfe50d218becafd1059c3ec
instacron_str	UNL
institution	UNL
instname_str	Universidad Nacional de Loja
language	spa
network_acronym_str	UNL
network_name_str	Repositorio Universidad Nacional de Loja
oai_identifier_str	oai:dspace.unl.edu.ec:123456789/30620
publishDate	2024
publisher.none.fl_str_mv	Universidad Nacional de Loja
reponame_str	Repositorio Universidad Nacional de Loja
repository.mail.fl_str_mv	*
repository.name.fl_str_mv	Repositorio Universidad Nacional de Loja - Universidad Nacional de Loja
repository_id_str	0
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/3.0/ec/
status_str	publishedVersion
title	Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL.
topic	MODELO QA DISTILBERT DATASET SQUAD1.0 CRISP-ML(Q) ROUGE
url	https://dspace.unl.edu.ec/jspui/handle/123456789/30620
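The description mentions a SQuAD1.0-format dataset of 1,410 entries split 90/10 into train/test, with the train portion further split 75/25 into train/validation. The Python sketch below illustrates that procedure under stated assumptions: the record layout mirrors the flattened fields of the Hugging Face `squad` dataset (`id`, `title`, `context`, `question`, `answers`), the helpers `make_example` and `split_dataset` are hypothetical, and the shuffling seed and rounding rule are illustrative choices, not taken from the thesis.

```python
from __future__ import annotations

import random
from typing import Any


def make_example(qid: str, title: str, context: str, question: str, answer: str) -> dict[str, Any]:
    """Build one SQuAD1.0-style entry (hypothetical helper); the answer must occur verbatim in the context."""
    start = context.find(answer)
    if start < 0:
        raise ValueError("the answer must be a span of the context")
    return {
        "id": qid,
        "title": title,
        "context": context,
        "question": question,
        # SQuAD1.0 stores the answer text plus its character offset in the context.
        "answers": {"text": [answer], "answer_start": [start]},
    }


def split_dataset(
    examples: list[dict[str, Any]], seed: int = 42
) -> tuple[list[dict[str, Any]], list[dict[str, Any]], list[dict[str, Any]]]:
    """Shuffle, hold out 10% as test, then split the remaining 90% into 75% train / 25% validation."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # assumed: a seeded random shuffle before splitting
    n_test = round(len(items) * 0.10)   # assumed rounding rule
    test, rest = items[:n_test], items[n_test:]
    n_val = round(len(rest) * 0.25)
    validation, train = rest[:n_val], rest[n_val:]
    return train, validation, test


if __name__ == "__main__":
    context = "DistilBERT is a distilled version of BERT."
    data = [
        make_example(f"q{i}", "demo", context, "What is DistilBERT?", "a distilled version of BERT")
        for i in range(1410)
    ]
    train, validation, test = split_dataset(data)
    print(len(train), len(validation), len(test))  # 952 317 141
```

Under this rounding rule the split yields 952 train, 317 validation and 141 test entries, roughly 67.5%, 22.5% and 10% of the data overall; the thesis's exact counts may differ depending on how it rounded and shuffled.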
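The headline evaluation figure is a ROUGE-L F-measure. ROUGE-L compares a predicted answer with a reference answer through the length of their longest common subsequence (LCS) of tokens: precision is the LCS length divided by the candidate's length, recall is the LCS length divided by the reference's length, and the F-measure combines the two. The pure-Python sketch below computes the balanced (beta = 1) F-measure. Its lowercase whitespace tokenization is a simplifying assumption, and the record does not say which tooling the thesis used, so library implementations (for example Google's `rouge-score` package, which normalizes punctuation and can apply stemming) may give somewhat different values. Scores here are on a 0 to 1 scale; the reported 60.96 appears to be on a 0 to 100 scale.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming, one row at a time)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def rouge_l_f(candidate: str, reference: str) -> float:
    """Balanced ROUGE-L F-measure between a predicted answer and a reference answer."""
    cand = candidate.lower().split()  # assumed tokenization: lowercase + whitespace split
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:  # also covers empty inputs
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge_l_f("model deployed on Hugging Face", "The model was deployed on Hugging Face")
    print(round(score * 100, 2))  # 83.33
```

In the usage example all five candidate tokens appear in order in the seven-token reference, so precision is 1, recall is 5/7, and the F-measure is 10/12.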