An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets
Ensuring food security requires the publication of data in a timely manner, but often this information is not properly documented and evaluated. Therefore, the combination of databases from multiple sources is a common practice to curate the data and corroborate the results; however, this also resul...
Saved in:
| Main Author: | Santos, Fabián |
|---|---|
| Other Authors: | Acosta, Nicole |
| Format: | article |
| Language: | eng |
| Published: | 2023 |
| Online Access: | https://www.mdpi.com/2077-0472/13/5/1015 https://hdl.handle.net/20.500.14809/5356 |
| _version_ | 1858415124646199296 |
|---|---|
| author | Santos, Fabián |
| author2 | Acosta, Nicole |
| author2_role | author |
| author_facet | Santos, Fabián Acosta, Nicole |
| author_role | author |
| collection | Repositorio Universidad Tecnológica Indoamérica |
| dc.creator.none.fl_str_mv | Santos, Fabián Acosta, Nicole |
| dc.date.none.fl_str_mv | 2023-06-12T14:49:24Z 2023-06-12T14:49:24Z 2023 |
| dc.identifier.none.fl_str_mv | https://www.mdpi.com/2077-0472/13/5/1015 https://hdl.handle.net/20.500.14809/5356 |
| dc.language.none.fl_str_mv | eng |
| dc.publisher.none.fl_str_mv | Agriculture (Switzerland). Volume 13, Issue 5 |
| dc.rights.none.fl_str_mv | https://creativecommons.org/licenses/by/4.0/ info:eu-repo/semantics/openAccess |
| dc.source.none.fl_str_mv | reponame:Repositorio Universidad Tecnológica Indoamérica instname:Universidad Tecnológica Indoamérica instacron:UTI |
| dc.title.none.fl_str_mv | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article |
| description | Ensuring food security requires the publication of data in a timely manner, but often this information is not properly documented and evaluated. Therefore, the combination of databases from multiple sources is a common practice to curate the data and corroborate the results; however, this also results in incomplete cases. These tasks are often labor-intensive since they require a case-wise review to obtain the requested and completed information. To address these problems, an approach based on Selenium web-scraping software and the multiple imputation denoising autoencoders (MIDAS) algorithm is presented for a case study in Ecuador. The objective was to produce a multidimensional database, free of data gaps, with 72 species of food crops based on the data from 3 different open data web databases. This methodology resulted in an analysis-ready dataset with 43 parameters describing plant traits, nutritional composition, and planted areas of food crops, whose imputed data obtained an R-square of 0.84 for a control numerical parameter selected for validation. This enriched dataset was later clustered with K-means to report unprecedented insights into food crops cultivated in Ecuador. The methodology is useful for users who need to collect and curate data from different sources in a semi-automatic fashion. |
| eu_rights_str_mv | openAccess |
| format | article |
| id | UTI_ec4d64bc8e6c3a03fb04b55cee69b2fb |
| instacron_str | UTI |
| institution | UTI |
| instname_str | Universidad Tecnológica Indoamérica |
| language | eng |
| network_acronym_str | UTI |
| network_name_str | Repositorio Universidad Tecnológica Indoamérica |
| oai_identifier_str | oai:repositorio.uti.edu.ec:20.500.14809/5356 |
| publishDate | 2023 |
| publisher.none.fl_str_mv | Agriculture (Switzerland). Volume 13, Issue 5 |
| reponame_str | Repositorio Universidad Tecnológica Indoamérica |
| repository.mail.fl_str_mv | . |
| repository.name.fl_str_mv | Repositorio Universidad Tecnológica Indoamérica - Universidad Tecnológica Indoamérica |
| repository_id_str | 0 |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by/4.0/ |
| spelling | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets Santos, Fabián Acosta, Nicole Ensuring food security requires the publication of data in a timely manner, but often this information is not properly documented and evaluated. Therefore, the combination of databases from multiple sources is a common practice to curate the data and corroborate the results; however, this also results in incomplete cases. These tasks are often labor-intensive since they require a case-wise review to obtain the requested and completed information. To address these problems, an approach based on Selenium web-scraping software and the multiple imputation denoising autoencoders (MIDAS) algorithm is presented for a case study in Ecuador. The objective was to produce a multidimensional database, free of data gaps, with 72 species of food crops based on the data from 3 different open data web databases. This methodology resulted in an analysis-ready dataset with 43 parameters describing plant traits, nutritional composition, and planted areas of food crops, whose imputed data obtained an R-square of 0.84 for a control numerical parameter selected for validation. This enriched dataset was later clustered with K-means to report unprecedented insights into food crops cultivated in Ecuador. The methodology is useful for users who need to collect and curate data from different sources in a semi-automatic fashion. Agriculture (Switzerland). Volume 13, Issue 5 2023-06-12T14:49:24Z 2023-06-12T14:49:24Z 2023 info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article https://www.mdpi.com/2077-0472/13/5/1015 https://hdl.handle.net/20.500.14809/5356 eng https://creativecommons.org/licenses/by/4.0/ info:eu-repo/semantics/openAccess reponame:Repositorio Universidad Tecnológica Indoamérica instname:Universidad Tecnológica Indoamérica instacron:UTI 2023-06-12T18:56:56Z oai:repositorio.uti.edu.ec:20.500.14809/5356 Institucional https://repositorio.uti.edu.ec/ Institución privada https://indoamerica.edu.ec/ https://repositorio.uti.edu.ec/oai . Ecuador ... opendoar:0 2023-06-12T18:56:56 Repositorio Universidad Tecnológica Indoamérica - Universidad Tecnológica Indoamérica false |
| spellingShingle | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets Santos, Fabián |
| status_str | publishedVersion |
| title | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets |
| title_full | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets |
| title_fullStr | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets |
| title_full_unstemmed | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets |
| title_short | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets |
| title_sort | An Approach Based on Web Scraping and Denoising Encoders to Curate Food Security Datasets |
| url | https://www.mdpi.com/2077-0472/13/5/1015 https://hdl.handle.net/20.500.14809/5356 |
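
The description reports an R-square of 0.84 for a control numerical parameter used to validate the imputed data. The following is a minimal sketch of how such a score can be computed, assuming known values of the control parameter are held out and then compared with their imputed counterparts. The function name `r_squared` and the sample numbers are hypothetical and do not come from the article; the imputation step itself (MIDAS) is not reproduced here.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    obs_mean = mean(observed)
    # Residual sum of squares: squared error between held-out and imputed values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the held-out values around their mean.
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-square is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical held-out values of a control parameter and the values
# an imputation model returned for the same cases (illustrative only).
held_out = [2.0, 4.0, 6.0, 8.0]
imputed = [2.5, 3.5, 6.5, 7.5]
score = r_squared(held_out, imputed)
print(f"R-square on held-out control parameter: {score:.2f}")
```

A score close to 1 indicates that the imputed values track the held-out ones closely; a score near or below 0 means the imputation does no better than predicting the mean.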