The Yggdrasil Project: organic reaction prediction with a custom-developed feedforward neural network and a tailored database

Bibliographic details
Main author: Castro Angamarca, Jonnathan Ariel (author)
Format: bachelorThesis
Language: eng
Published: 2024
Online access: http://repositorio.yachaytech.edu.ec/handle/123456789/736
Description
Summary: This thesis presents the Yggdrasil project, which applies machine learning to the prediction of organic chemical reactions by developing and validating a neural network model. The project is structured into two main phases: Cognitio and Vaticinor. In the Cognitio phase, a subset of the USPTO_MIT database was processed and tailored for machine learning, then augmented with Density Functional Theory calculations, to form a suitable dataset for model training. The Vaticinor phase involved the design, development, and validation of the neural network model, focusing on the model's ability to predict organic reactions accurately. Using only 0.375% of the USPTO_MIT database, the project achieved a test accuracy of 32.33% and a cross-validation accuracy of 29.39%. The analysis identified the 'Strong Correlation' feature set as yielding the best performance, which underlines the importance of strategic feature selection for the model's predictive accuracy and generalization capability. The results illustrate the feasibility and potential of using machine learning for organic reaction prediction.

Future directions for the Yggdrasil project include:
• Expanding the database to improve model robustness.
• Integrating all of Cognitio's scripts for process optimization.
• Adding stereochemical information to the dataset.
• Refining the model to cover a broader range of organic reactions.

This thesis highlights the importance of data preparation, feature selection, and model validation in machine learning and computational chemistry. The code that makes up the Yggdrasil Project was written in Python 3.10 and Bash and is available in the GitHub repository: https://github.com/jcastro7732/Yggdrasil-Project