“Algorithmic procedure based on natural language processing techniques for the analysis of the corpus of scientific articles on the EcuCiencia platform.”
| Main author: | |
|---|---|
| Other authors: | |
| Format: | bachelorThesis |
| Language: | spa |
| Published: | 2020 |
| Subjects: | |
| Online access: | http://repositorio.utc.edu.ec/handle/27000/8612 |
| Summary: | Analyzing the large volume of electronic documents available on the web is a complicated and tiring task for any person. On the ECUCIENCIA scientific platform, the analysis of a scientific article is based only on its title, summary, and keywords, even though the PDF documents hold much more information in their body, where data can be examined with greater accuracy; this matters in an era in which technology and the Internet make it possible to generate and collect large volumes of information. The objective of the project was to establish an algorithmic procedure, based on natural language processing techniques, for analyzing the corpus of scientific articles by the research professors of the Technical University of Cotopaxi stored on the ECUCIENCIA platform. The project was developed in two phases. The first phase used the KDD (Knowledge Discovery in Databases) methodology, a knowledge extraction process that seeks a valid, useful, and understandable model describing patterns in the extracted information. The second phase used the Scrum methodology, which allowed direct communication between the client and the development team and thus a higher-quality final product. The project grew from iteration to iteration without problems, and the logic obtained in the first phase was incorporated into the development of a module in which Python libraries were applied to analyze the corpus of scientific articles in PDF format. From these articles the module obtains lexical richness, word frequency, stop words, and text similarity and distances, which are presented as graphics so that users can view the results of the data analysis without difficulty. |
|---|
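
The measures named in the summary (lexical richness, word frequency, stop words, similarity, and distance) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not say which Python libraries or formulas were used, so the sketch relies on the standard library only and assumes type-token ratio for lexical richness, cosine similarity over word counts, and Jaccard distance over vocabularies. The stop word list and all function names are hypothetical, and PDF text extraction is left out.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny Spanish stop word list (an assumption,
# not taken from the thesis); a real analysis would use a fuller list.
STOP_WORDS = {"el", "la", "los", "las", "de", "del", "y", "en",
              "un", "una", "que", "por", "para", "con", "se", "es"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lexical_richness(tokens):
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def word_frequency(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two word-count vectors."""
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_distance(tokens_a, tokens_b):
    """One minus the share of vocabulary the two texts have in common."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


# Two short illustrative texts standing in for extracted article bodies.
doc_a = "El análisis del corpus de artículos científicos"
doc_b = "El análisis de la frecuencia de palabras del corpus"

tokens_a = remove_stop_words(tokenize(doc_a))
tokens_b = remove_stop_words(tokenize(doc_b))

print("Lexical richness:", lexical_richness(tokens_a))
print("Most frequent words:", word_frequency(tokens_b).most_common(3))
print("Cosine similarity:",
      cosine_similarity(word_frequency(tokens_a), word_frequency(tokens_b)))
print("Jaccard distance:", jaccard_distance(tokens_a, tokens_b))
```

In a setting like the one described, the input strings would come from text extracted from the PDF articles, and the resulting numbers would feed the charts shown to users.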