Deep learning neural network development for the classification of bacteriocin sequences produced by lactic acid bacteria

Bibliographic Details
Main Author: González Bohórquez, Lady Laura (author)
Format: bachelorThesis
Language: eng
Published: 2024
Online Access: http://repositorio.yachaytech.edu.ec/handle/123456789/734
Description
Summary: The rise of antibiotic-resistant bacteria creates a pressing need for new natural compounds with novel mechanisms of action to replace existing antibiotics. Bacteriocins are promising alternatives for therapeutic and preventive strategies in livestock, aquaculture, and human health. In particular, those produced by lactic acid bacteria (LAB) hold Generally Recognized as Safe (GRAS) and Qualified Presumption of Safety (QPS) status. This study used a deep learning neural network, a type of model that can learn complex patterns and representations from data, for binary classification of bacteriocin amino acid sequences, distinguishing those produced by LAB. Features were extracted using the k-mer method and embedding vectors (EV). Ten feature groups combining embedding vectors and k-mers were tested: 'EV', 'EV+3-mers', 'EV+5-mers', 'EV+7-mers', 'EV+15-mers', 'EV+20-mers', 'EV+3-mers+5-mers', 'EV+3-mers+7-mers', 'EV+5-mers+7-mers', and 'EV+15-mers+20-mers'. As a result, five sets of 100 characteristic k-mers unique to bacteriocins produced by LAB were obtained, one for each value of k = 3, 5, 7, 15, and 20. A significant difference was observed between the 'EV' group and the 'EV+5-mers+7-mers' group, with the latter showing better accuracy and loss. Using k-fold cross-validation with k = 30, the average loss, accuracy, precision, recall, and F1 score were 9.900%, 90.143%, 90.300%, 90.100%, and 90.100%, respectively. Fold 22 stood out with 8.500% loss, 91.471% accuracy, and 91.000% precision, recall, and F1 score. This performance agrees with the existing literature.
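
The k-mer step mentioned in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record does not say how the 100 characteristic k-mers were selected. Here "unique" is assumed to mean "never seen in the non-LAB set", and the ranking by number of LAB sequences containing each k-mer is also an assumption. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def kmers(sequence, k):
    """Return all overlapping substrings of length k (empty if k > len)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def characteristic_kmers(positive, negative, k, top_n=100):
    """k-mers present in the positive set and absent from the negative set.

    Assumption: candidates are ranked by how many positive sequences
    contain them, with ties broken alphabetically for reproducibility.
    """
    negative_kmers = {km for seq in negative for km in kmers(seq, k)}
    counts = Counter(
        km
        for seq in positive
        for km in set(kmers(seq, k))  # count each k-mer once per sequence
        if km not in negative_kmers
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [km for km, _ in ranked[:top_n]]


def kmer_features(sequence, vocabulary, k):
    """Binary presence/absence vector of the vocabulary k-mers in a sequence."""
    present = set(kmers(sequence, k))
    return [1 if km in present else 0 for km in vocabulary]


# Hypothetical toy sequences, not real bacteriocins.
lab_sequences = ["MKKIEK", "MKKLEK"]
other_sequences = ["MKTIEA"]

vocabulary = characteristic_kmers(lab_sequences, other_sequences, k=3, top_n=3)
features = kmer_features("MKKIEK", vocabulary, k=3)
```

A vector like `features` could then be concatenated with an embedding vector to form a combined input such as 'EV+3-mers'. How the thesis actually merges the two representations is not stated in this record.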