Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL.
Adapting pre-trained question-answering (QA) models is essential for deploying them in different scenarios. The objective of this research is to obtain the value of the ROUGE metric after applying fine-tuning to the DistilBERT model so that it answers questions about...
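
The ROUGE-L F-measure that this record reports can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercasing, and equal weighting of precision and recall; none of these choices is confirmed by the record, and the thesis may have used a library with different settings. The function names and example sentences are hypothetical. Scores here are on a 0 to 1 scale, whereas the record reports values on a 0 to 100 scale.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Return ROUGE-L precision, recall and F1 for two strings.

    Assumptions (not taken from the record): whitespace tokens,
    lowercasing, and F1 with precision and recall weighted equally.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    lcs = lcs_length(ref, cand)
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f1 = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical reference answer and model answer.
    scores = rouge_l("the cat sat on the mat", "the cat is on the mat")
    print(scores)
```

A model answer that shares a long in-order word sequence with the reference answer scores high even when a few words differ, which is why ROUGE-L suits extractive QA output.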
| Main author: | Jiménez Merino, Edy Francisco |
|---|---|
| Format: | bachelorThesis |
| Language: | spa |
| Published: | 2024 |
| Subjects: | MODELO QA; DISTILBERT; DATASET SQUAD1.0; CRISP-ML(Q); ROUGE |
| Online access: | https://dspace.unl.edu.ec/jspui/handle/123456789/30620 |
| _version_ | 1857832993902559232 |
|---|---|
| author | Jiménez Merino, Edy Francisco |
| author_facet | Jiménez Merino, Edy Francisco |
| author_role | author |
| collection | Repositorio Universidad Nacional de Loja |
| dc.contributor.none.fl_str_mv | Cumbicus Pineda, Oscar Miguel |
| dc.creator.none.fl_str_mv | Jiménez Merino, Edy Francisco |
| dc.date.none.fl_str_mv | 2024-09-21T01:17:20Z; 2024-09-21T01:17:20Z; 2024-09-20 |
| dc.format.none.fl_str_mv | 103 P.; application/pdf |
| dc.identifier.none.fl_str_mv | https://dspace.unl.edu.ec/jspui/handle/123456789/30620 |
| dc.language.none.fl_str_mv | spa |
| dc.publisher.none.fl_str_mv | Universidad Nacional de Loja |
| dc.rights.none.fl_str_mv | http://creativecommons.org/licenses/by-nc-sa/3.0/ec/; info:eu-repo/semantics/openAccess |
| dc.source.none.fl_str_mv | reponame:Repositorio Universidad Nacional de Loja; instname:Universidad Nacional de Loja; instacron:UNL |
| dc.subject.none.fl_str_mv | MODELO QA; DISTILBERT; DATASET SQUAD1.0; CRISP-ML(Q); ROUGE |
| dc.title.none.fl_str_mv | Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL.; QA model based on DistilBERT to answer questions about content extracted from academic assignments of the Computer Science course at UNL |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/bachelorThesis |
| description | Adapting pre-trained question-answering (QA) models is essential for deploying them in different scenarios. The objective of this research is to obtain the value of the ROUGE metric after applying fine-tuning to the DistilBERT model so that it answers questions about content extracted from academic assignments of the Computer Science program at the Universidad Nacional de Loja. The CRISP-ML(Q) methodology served as the reference framework, using its first four phases. In these phases, 30 academic assignments were collected from 6 different subjects, and 80 questions about their content were generated through crowdsourcing. These questions served as the basis for a dataset in SQuAD1.0 format with 1,410 entries, of which 800 were generated through paraphrasing and a few-shot learning approach and the remaining 610 were contributed directly by the author. The dataset was split into 90% for training (train) and 10% for evaluation (test), and the train set was further split into 75% train and 25% validation. With the data prepared, DistilBERT hyperparameters were adjusted to train four different models using TensorFlow on Google Colab with the T4 GPU runtime, and the best model was selected based on its level of answer extraction and its F1-score. The chosen QA model was then evaluated with the ROUGE metric, including an A/B test. The QA model was deployed on Hugging Face and achieved an accuracy of 86.93% during training with 51 epochs, a learning rate of 1e-5, and a batch size of 32; in evaluation it reached a maximum ROUGE-L F-measure of 60.96. These values demonstrate the importance of fine-tuning when developing QA models for specific contexts. Keywords: QA model, DistilBERT, SQuAD1.0 dataset, CRISP-ML(Q), ROUGE |
| eu_rights_str_mv | openAccess |
| format | bachelorThesis |
| id | UNL_e7284cb9ccfe50d218becafd1059c3ec |
| instacron_str | UNL |
| institution | UNL |
| instname_str | Universidad Nacional de Loja |
| language | spa |
| network_acronym_str | UNL |
| network_name_str | Repositorio Universidad Nacional de Loja |
| oai_identifier_str | oai:dspace.unl.edu.ec:123456789/30620 |
| publishDate | 2024 |
| publisher.none.fl_str_mv | Universidad Nacional de Loja |
| reponame_str | Repositorio Universidad Nacional de Loja |
| repository.mail.fl_str_mv | * |
| repository.name.fl_str_mv | Repositorio Universidad Nacional de Loja - Universidad Nacional de Loja |
| repository_id_str | 0 |
| rights_invalid_str_mv | http://creativecommons.org/licenses/by-nc-sa/3.0/ec/ |
| spellingShingle | Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL.; Jiménez Merino, Edy Francisco; MODELO QA; DISTILBERT; DATASET SQUAD1.0; CRISP-ML(Q); ROUGE |
| status_str | publishedVersion |
| title | Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL. |
| title_full | Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL. |
| title_fullStr | Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL. |
| title_full_unstemmed | Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL. |
| title_short | Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL. |
| title_sort | Modelo QA basado en DistilBERT para responder a preguntas sobre el contenido extraído de tareas académicas de la carrera de Computación de la UNL. |
| topic | MODELO QA; DISTILBERT; DATASET SQUAD1.0; CRISP-ML(Q); ROUGE |
| url | https://dspace.unl.edu.ec/jspui/handle/123456789/30620 |
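
The dataset preparation described in the abstract (a SQuAD1.0-format dataset of 1,410 entries, split 90%/10% into train and test, with the train portion split again 75%/25% into train and validation) can be sketched as follows. This is an illustration under stated assumptions: the record gives only percentages, so the floor rounding used here and the resulting per-split counts are assumptions, and the helper names and the sample question are hypothetical. The nested `data` / `paragraphs` / `qas` / `answers` layout with `text` and `answer_start` follows the public SQuAD JSON format.

```python
def make_squad_entry(title, context, question, answer_text, qa_id):
    """Build one SQuAD-style article holding a single question.

    answer_start is the character offset of the answer inside the context.
    """
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("answer text must appear verbatim in the context")
    return {
        "title": title,
        "paragraphs": [{
            "context": context,
            "qas": [{
                "id": qa_id,
                "question": question,
                "answers": [{"text": answer_text, "answer_start": start}],
            }],
        }],
    }


def split_sizes(total, test_pct=10, val_pct=25):
    """Return (train, validation, test) sizes.

    Assumption: each percentage is floored with integer arithmetic;
    the record does not state how fractional counts were rounded.
    """
    test = total * test_pct // 100
    train_full = total - test
    validation = train_full * val_pct // 100
    train = train_full - validation
    return train, validation, test


if __name__ == "__main__":
    # Hypothetical assignment text and question, for illustration only.
    entry = make_squad_entry(
        title="sample-assignment",
        context="A stack is a data structure that follows the LIFO principle.",
        question="Which principle does a stack follow?",
        answer_text="LIFO",
        qa_id="q-0001",
    )
    print(entry["paragraphs"][0]["qas"][0]["answers"][0])
    print(split_sizes(1410))
```

Computing `answer_start` from the context rather than typing it by hand avoids offset errors, which matter because extractive QA training reads the answer span directly from those character positions.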