Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL.

The adaptation of pre-trained question-answering (QA) models is essential if they are to be deployed in different scenarios. The objective of this research is to obtain the value of the ROUGE metric by applying the Fine-Tuning technique to the DistilBERT model to answer questions about...


Bibliographic details
Main author: Jiménez Merino, Edy Francisco (author)
Format: bachelorThesis
Language: Spanish (spa)
Published: 2024
Subjects: MODELO QA, DISTILBERT, DATASET SQUAD1.0, CRISP-ML(Q), ROUGE
Online access: https://dspace.unl.edu.ec/jspui/handle/123456789/30620
_version_ 1857832993902559232
author Jiménez Merino, Edy Francisco
author_facet Jiménez Merino, Edy Francisco
author_role author
collection Repositorio Universidad Nacional de Loja
dc.contributor.none.fl_str_mv Cumbicus Pineda, Oscar Miguel
dc.creator.none.fl_str_mv Jiménez Merino, Edy Francisco
dc.date.none.fl_str_mv 2024-09-21T01:17:20Z
2024-09-21T01:17:20Z
2024-09-20
dc.format.none.fl_str_mv 103 P.
application/pdf
dc.identifier.none.fl_str_mv https://dspace.unl.edu.ec/jspui/handle/123456789/30620
dc.language.none.fl_str_mv spa
dc.publisher.none.fl_str_mv Universidad Nacional de Loja
dc.rights.none.fl_str_mv http://creativecommons.org/licenses/by-nc-sa/3.0/ec/
info:eu-repo/semantics/openAccess
dc.source.none.fl_str_mv reponame:Repositorio Universidad Nacional de Loja
instname:Universidad Nacional de Loja
instacron:UNL
dc.subject.none.fl_str_mv MODELO QA
DISTILBERT
DATASET SQUAD1.0
CRISP-ML(Q)
ROUGE
dc.title.none.fl_str_mv Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL.
QA model based on DistilBERT to answer questions about content extracted from academic assignments of the Computer Science program at UNL
dc.type.none.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/bachelorThesis
description The adaptation of pre-trained question-answering (QA) models is essential if they are to be deployed in different scenarios. The objective of this research is to obtain the value of the ROUGE metric by applying the Fine-Tuning technique to the DistilBERT model so that it answers questions about the content extracted from academic assignments of the Computer Science program at the National University of Loja. The CRISP-ML(Q) methodology was used as the reference framework, specifically its first four phases, in which the following was done: 30 academic assignments were collected from 6 different subjects, and 80 questions about their content were generated through crowdsourcing. These questions served as the basis for a dataset in SQuAD1.0 format with 1,410 examples, of which 800 were generated through paraphrasing and a Few-shot learning approach, and the remaining 610 were contributed directly by the author. The dataset was split into 90% for training (train) and 10% for evaluation (test), and the train set was further subdivided into 75% for training and 25% for validation. With the data prepared, DistilBERT hyperparameters were tuned to train four different models using TensorFlow on the Google Colab platform with a T4 GPU runtime, and the best model was selected based on its answer-extraction performance and F1-score. The chosen QA model was then evaluated using the ROUGE metric, including an A/B test. The QA model was deployed on Hugging Face; it achieved an accuracy of 86.93% during training with 51 epochs, a learning rate of 1e-5, and a batch size of 32, and in the evaluation it reached a maximum ROUGE-L F-measure of 60.96. These values demonstrate the importance of applying Fine-Tuning when developing QA models for specific contexts. Keywords: QA model, DistilBERT, SQuAD 1.0 dataset, CRISP-ML(Q), ROUGE
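The abstract reports its final result as a ROUGE-L F-measure. The sketch below shows how a sentence-level ROUGE-L score can be computed for one predicted answer against one reference answer, assuming lowercase whitespace tokenization and β = 1 (equal weight on precision and recall). The abstract does not state which ROUGE implementation or tokenizer was used, so library implementations such as Google's `rouge_score` package, which apply their own tokenization, may give somewhat different numbers. The function names `lcs_length` and `rouge_l` are hypothetical and not taken from the thesis.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    # Dynamic programming, keeping one row at a time:
    # prev[j] is the LCS length of the tokens seen so far in `a` and b[:j].
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def rouge_l(reference: str, candidate: str) -> dict[str, float]:
    """Sentence-level ROUGE-L precision, recall and F-measure.

    Assumption: lowercase + whitespace tokenization and beta = 1,
    so the F-measure is the harmonic mean of precision and recall.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return {"precision": 0.0, "recall": 0.0, "fmeasure": 0.0}

    lcs = lcs_length(ref_tokens, cand_tokens)
    precision = lcs / len(cand_tokens)  # share of predicted tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    if lcs == 0:
        fmeasure = 0.0
    else:
        fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}
```

For example, `rouge_l("the model was trained on google colab", "trained on google colab")` gives precision 1.0, recall 4/7 and an F-measure of 8/11 (about 0.727). Multiplying by 100 gives the 0 to 100 scale on which the abstract's 60.96 appears to be reported. Because ROUGE-L rewards the longest in-order token overlap rather than exact string match, a short extracted span that sits inside a longer reference answer still earns partial credit.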
eu_rights_str_mv openAccess
format bachelorThesis
id UNL_e7284cb9ccfe50d218becafd1059c3ec
instacron_str UNL
institution UNL
instname_str Universidad Nacional de Loja
language spa
network_acronym_str UNL
network_name_str Repositorio Universidad Nacional de Loja
oai_identifier_str oai:dspace.unl.edu.ec:123456789/30620
publishDate 2024
publisher.none.fl_str_mv Universidad Nacional de Loja
reponame_str Repositorio Universidad Nacional de Loja
repository.mail.fl_str_mv *
repository.name.fl_str_mv Repositorio Universidad Nacional de Loja - Universidad Nacional de Loja
repository_id_str 0
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-sa/3.0/ec/
status_str publishedVersion
title Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL.
topic MODELO QA
DISTILBERT
DATASET SQUAD1.0
CRISP-ML(Q)
ROUGE
url https://dspace.unl.edu.ec/jspui/handle/123456789/30620