The Yggdrasil Project: organic reaction prediction with a custom-developed feedforward neural network and a tailored database


Bibliographic details
Main author: Castro Angamarca, Jonnathan Ariel (author)
Format: bachelorThesis
Language: English
Published: 2024
Subjects:
Online access: http://repositorio.yachaytech.edu.ec/handle/123456789/736
Description
Summary: This thesis presents the Yggdrasil project, which aims to apply machine learning to the prediction of organic chemical reactions by developing and validating a neural network model. The project is structured into two main phases: Cognitio and Vaticinor. In the Cognitio phase, a subset of the USPTO_MIT database was processed and tailored for machine learning, then augmented with Density Functional Theory calculations, to form a dataset suitable for model training. The Vaticinor phase involved the design, development, and validation of the neural network model, focusing on the model's ability to predict organic reactions accurately.

Using only 0.375% of the USPTO_MIT database, the project achieved a test accuracy of 32.33% and a cross-validation accuracy of 29.39%. The analysis identified the 'Strong Correlation' feature set as yielding the best performance, emphasizing the importance of strategic feature selection in enhancing the model's predictive accuracy and generalization capability. The results illustrate the feasibility and potential of using machine learning for organic reaction prediction.

Future directions for the Yggdrasil project include:
• Expanding the database to improve model robustness.
• Integrating all of Cognitio's scripts for process optimization.
• Adding stereochemical information to the dataset.
• Refining the model to cover a broader range of organic reactions.

This thesis highlights the importance of data preparation, feature selection, and model validation in machine learning and computational chemistry. The code that makes up the Yggdrasil Project was written in Python 3.10 and Bash and is available in the GitHub repository: https://github.com/jcastro7732/Yggdrasil-Project
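
As a rough illustration of two ideas the summary stresses, correlation-based feature selection and cross-validation, the following Python sketch may help. It is not taken from the Yggdrasil repository. The function names, the 0.5 cutoff, the toy data, and the reading of 'Strong Correlation' as a Pearson-correlation threshold are all assumptions made for this example, and the thesis may define its feature sets differently.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep columns whose |r| with the target meets a (hypothetical) cutoff."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


# Toy data, invented for illustration only (not from USPTO_MIT).
columns = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # tracks the target closely
    "feature_b": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # weakly related to the target
}
target = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

selected = select_features(columns, target)
splits = list(k_fold_indices(len(target), k=3))
```

In a workflow of this kind, a model would be trained on each `train` split using only the `selected` columns and scored on the matching `validation` split; averaging those scores gives a cross-validation accuracy that can be compared with the accuracy on a held-out test set.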