An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets

Ensuring food security requires the publication of data in a timely manner, but often this information is not properly documented and evaluated. Therefore, the combination of databases from multiple sources is a common practice to curate the data and corroborate the results; however, this also resul...

Bibliographic Details
Main Author: Santos, Fabián (author)
Other Authors: Acosta, Nicole (author)
Format: article
Language: eng
Published: 2023
Online Access: https://www.mdpi.com/2077-0472/13/5/1015
https://hdl.handle.net/20.500.14809/5356
_version_ 1858415124646199296
author Santos, Fabián
author2 Acosta, Nicole
author2_role author
author_facet Santos, Fabián
Acosta, Nicole
author_role author
collection Repositorio Universidad Tecnológica Indoamérica
dc.creator.none.fl_str_mv Santos, Fabián
Acosta, Nicole
dc.date.none.fl_str_mv 2023-06-12T14:49:24Z
2023-06-12T14:49:24Z
2023
dc.identifier.none.fl_str_mv https://www.mdpi.com/2077-0472/13/5/1015
https://hdl.handle.net/20.500.14809/5356
dc.language.none.fl_str_mv eng
dc.publisher.none.fl_str_mv Agriculture (Switzerland). Volume 13, Issue 5
dc.rights.none.fl_str_mv https://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
dc.source.none.fl_str_mv reponame:Repositorio Universidad Tecnológica Indoamérica
instname:Universidad Tecnológica Indoamérica
instacron:UTI
dc.title.none.fl_str_mv An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
dc.type.none.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
description Ensuring food security requires the publication of data in a timely manner, but often this information is not properly documented and evaluated. Therefore, the combination of databases from multiple sources is a common practice to curate the data and corroborate the results; however, this also results in incomplete cases. These tasks are often labor-intensive since they require a case-wise review to obtain the requested and completed information. To address these problems, an approach based on Selenium web-scraping software and the multiple imputation denoising autoencoders (MIDAS) algorithm is presented for a case study in Ecuador. The objective was to produce a multidimensional database, free of data gaps, with 72 species of food crops based on the data from 3 different open data web databases. This methodology resulted in an analysis-ready dataset with 43 parameters describing plant traits, nutritional composition, and planted areas of food crops, whose imputed data obtained an R-square of 0.84 for a control numerical parameter selected for validation. This enriched dataset was later clustered with K-means to report unprecedented insights into food crops cultivated in Ecuador. The methodology is useful for users who need to collect and curate data from different sources in a semi-automatic fashion.
eu_rights_str_mv openAccess
format article
id UTI_ec4d64bc8e6c3a03fb04b55cee69b2fb
instacron_str UTI
institution UTI
instname_str Universidad Tecnológica Indoamérica
language eng
network_acronym_str UTI
network_name_str Repositorio Universidad Tecnológica Indoamérica
oai_identifier_str oai:repositorio.uti.edu.ec:20.500.14809/5356
publishDate 2023
publisher.none.fl_str_mv Agriculture (Switzerland). Volume 13, Issue 5
reponame_str Repositorio Universidad Tecnológica Indoamérica
repository.mail.fl_str_mv .
repository.name.fl_str_mv Repositorio Universidad Tecnológica Indoamérica - Universidad Tecnológica Indoamérica
repository_id_str 0
rights_invalid_str_mv https://creativecommons.org/licenses/by/4.0/
status_str publishedVersion
title An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
title_full An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
title_fullStr An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
title_full_unstemmed An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
title_short An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
title_sort An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
url https://www.mdpi.com/2077-0472/13/5/1015
https://hdl.handle.net/20.500.14809/5356
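
The description field in this record mentions two steps that a small example may make more concrete: combining several sources leaves incomplete cases, and imputed values are checked with an R-square on a control numerical parameter. The following is a minimal sketch of that idea only, not the authors' pipeline. All crop names, field names, and numbers are invented for illustration, the helper functions are hypothetical, and a plain least-squares line stands in for the MIDAS autoencoder.

```python
# Illustrative only: species, fields and values are made up, and the
# "imputer" is a simple least-squares line, NOT the MIDAS algorithm.

def outer_merge(*sources):
    """Combine per-species records from several sources; absent fields become None."""
    fields = sorted({f for src in sources for rec in src.values() for f in rec})
    species = sorted({s for src in sources for s in src})
    merged = {}
    for s in species:
        row = dict.fromkeys(fields)      # every field starts as a gap (None)
        for src in sources:
            row.update(src.get(s, {}))   # fill in whatever this source knows
        merged[s] = row
    return merged


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def r_squared(truth, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, predicted))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1 - ss_res / ss_tot


# Three hypothetical sources keyed by species name.
traits = {"crop_a": {"height_m": 1.0}, "crop_b": {"height_m": 2.0},
          "crop_c": {"height_m": 3.0}, "crop_d": {"height_m": 4.0}}
nutrition = {"crop_a": {"protein_g": 2.1}, "crop_b": {"protein_g": 3.9},
             "crop_c": {"protein_g": 6.2}}
areas = {"crop_a": {"area_ha": 120.0}, "crop_d": {"area_ha": 45.0}}

table = outer_merge(traits, nutrition, areas)
incomplete = [s for s, row in table.items() if None in row.values()]
print("incomplete cases:", incomplete)

# Validation on a control parameter: hide some known values, impute them
# from a predictor, then compare the imputed values with the hidden truth.
known = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0), (5.0, 9.8), (6.0, 12.1)]
train, held_out = known[::2], known[1::2]
a, b = fit_line([x for x, _ in train], [y for _, y in train])
imputed = [a + b * x for x, _ in held_out]
r2 = r_squared([y for _, y in held_out], imputed)
print("R-square on held-out control values:", round(r2, 3))
```

In this toy table only crop_a is complete after the merge, which shows how joining sources creates gaps. Holding out known values of one parameter gives a way to score any imputer, whatever model produces the imputed values.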