The Yggdrasil Project: organic reaction prediction with a custom-developed feedforward neural network and a tailored database
This thesis presents the Yggdrasil project, which aims to apply machine learning to predict organic chemical reactions by developing and validating a neural network model. The project is structured into two main phases: Cognitio and Vaticinor. In the Cognitio phase, a subset of the USPTO_MIT data...
Main author: | Castro Angamarca, Jonnathan Ariel |
---|---|
Format: | bachelorThesis |
Language: | eng |
Published: | 2024 |
Subjects: | Neural network; Machine learning; Cognitio; Vaticinor |
Online access: | http://repositorio.yachaytech.edu.ec/handle/123456789/736 |
_version_ | 1840070392233852928 |
---|---|
author | Castro Angamarca, Jonnathan Ariel |
author_facet | Castro Angamarca, Jonnathan Ariel |
author_role | author |
collection | Repositorio Universidad Yachay Tech |
dc.contributor.none.fl_str_mv | Terencio, Thibault |
dc.creator.none.fl_str_mv | Castro Angamarca, Jonnathan Ariel |
dc.date.none.fl_str_mv | 2024-04-02T17:39:33Z 2024-04-02T17:39:33Z 2024-04 |
dc.format.none.fl_str_mv | application/pdf |
dc.identifier.none.fl_str_mv | http://repositorio.yachaytech.edu.ec/handle/123456789/736 |
dc.language.none.fl_str_mv | eng |
dc.publisher.none.fl_str_mv | Universidad de Investigación de Tecnología Experimental Yachay |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.source.none.fl_str_mv | reponame:Repositorio Universidad Yachay Tech instname:Universidad Yachay Tech instacron:Yachay |
dc.subject.none.fl_str_mv | Neural network; Machine learning; Cognitio; Vaticinor |
dc.title.none.fl_str_mv | The Yggdrasil Project: organic reaction prediction with a custom-developed feedforward neural network and a tailored database |
dc.type.none.fl_str_mv | info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/bachelorThesis |
description | This thesis presents the Yggdrasil project, which aims to apply machine learning to predict organic chemical reactions by developing and validating a neural network model. The project is structured into two main phases: Cognitio and Vaticinor. In the Cognitio phase, a subset of the USPTO_MIT database was processed and tailored for machine learning, then augmented with Density Functional Theory calculations, to form a suitable dataset for model training. The Vaticinor phase covered the design, development, and validation of the neural network model, focusing on its ability to predict organic reactions accurately. Using only 0.375% of the USPTO_MIT database, the project achieved a test accuracy of 32.33% and a cross-validation accuracy of 29.39%. The analysis identified the 'Strong Correlation' feature set as yielding the best performance, which underlines the importance of strategic feature selection for the model's predictive accuracy and generalization capability. The results illustrate the feasibility and potential of using machine learning for organic reaction prediction. Future directions for the Yggdrasil project include: (1) expanding the database to improve model robustness; (2) integrating all of the Cognitio scripts for process optimization; (3) adding stereochemical information to the dataset; (4) refining the model to cover a broader range of organic reactions. This thesis highlights the importance of data preparation, feature selection, and model validation in machine learning and computational chemistry. The code that makes up the Yggdrasil Project was written in Python 3.10 and Bash and is available in the GitHub repository: https://github.com/jcastro7732/Yggdrasil-Project |
eu_rights_str_mv | openAccess |
format | bachelorThesis |
id | Yachay_6f089bf5d4179cd5cc379a990df3b4a9 |
instacron_str | Yachay |
institution | Yachay |
instname_str | Universidad Yachay Tech |
language | eng |
network_acronym_str | Yachay |
network_name_str | Repositorio Universidad Yachay Tech |
oai_identifier_str | oai:repositorio.yachaytech.edu.ec:123456789/736 |
publishDate | 2024 |
publisher.none.fl_str_mv | Universidad de Investigación de Tecnología Experimental Yachay |
reponame_str | Repositorio Universidad Yachay Tech |
repository.mail.fl_str_mv | . |
repository.name.fl_str_mv | Repositorio Universidad Yachay Tech - Universidad Yachay Tech |
repository_id_str | 10284 |
status_str | publishedVersion |
title | The Yggdrasil Project: organic reaction prediction with a custom-developed feedforward neural network and a tailored database |
topic | Neural network; Machine learning; Cognitio; Vaticinor |
url | http://repositorio.yachaytech.edu.ec/handle/123456789/736 |
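
The description above mentions a 'Strong Correlation' feature set and a cross-validation accuracy, but this record does not define either procedure. The following is a minimal sketch, not the thesis's actual method: it assumes that "strong correlation" means a high absolute Pearson correlation between a feature and the target, and that cross-validation uses contiguous k folds. The function names, the `threshold` value of 0.7, and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_strongly_correlated(columns, target, threshold=0.7):
    """Return the names of feature columns with |r| >= threshold against the target.

    The threshold is a hypothetical cut-off, not a value taken from the thesis.
    """
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Toy data (illustrative only): "a" and "c" track the target closely, "b" does not.
toy_columns = {"a": [1, 2, 3, 4], "b": [1, -1, 1, -1], "c": [4, 3, 2, 1]}
toy_target = [2, 4, 6, 8]
selected = select_strongly_correlated(toy_columns, toy_target)
folds = list(k_fold_indices(5, 2))
```

In a workflow of this kind, the selected columns would be used to train the model on each `train` split and score it on the matching `test` split, with the per-fold accuracies averaged into a single cross-validation figure.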