Algoritmo para la clasificación de aspectos de lenguaje natural basados en web semántica (Algorithm for the classification of natural language aspects based on the semantic web).

Bibliographic details
Main author: Álvarez Lasso, Francisco Bolívar (author)
Other authors: Mayo Pazuña, Lenyn Santiago (author)
Format: bachelorThesis
Language: spa
Published: 2019
Online access: http://repositorio.utc.edu.ec/handle/27000/5339
Description
Summary: This research presents the design of an algorithm to classify aspects of natural language based on the semantic web. To this end, a literature review of search algorithms was carried out; the review showed the need to propose new search alternatives that improve their results. It was also observed that few current proposals solve this problem efficiently using artificial intelligence tools. Therefore, this work proposes using the Random Forest and k-Nearest Neighbors (k-NN) algorithms in web searches over natural-language data. Python was used as the programming language for creating and prototyping the proposed classification algorithm, together with the Spyder tool of the Anaconda suite and the Pandas and Sklearn libraries; from Sklearn, the RandomForestClassifier and KNeighborsClassifier classes provide Random Forest and k-NN respectively. Random Forest builds a forest from a set of randomly chosen classification trees, each constructed with N data points drawn from the sample with replacement. k-NN is based simply on "remembering" all the examples seen in the training stage: when a new data point is presented to the learning system, it is classified according to the behavior of its closest data points. The main difficulty of this method is choosing the value of k, because if k is large the classification risks simply following the majority. The experimental process used four datasets extracted from the web: GBvideos, which contains comments on YouTube music; vg1, which corresponds to video game sales; zomato, which contains comments on restaurants; and AppStore, which contains comments on mobile applications. A total of 57956 instances were analyzed.
The analysis yielded a classification prediction rate of 0.7 (70%) for Random Forest and 0.6 (60%) for k-NN. The proposed algorithm was evaluated with AUC-ROC, which gave a value of 0.7. From this analysis it is concluded that an algorithm based on Random Forest is the more reliable and accurate of the two for classifying natural language. In addition, this algorithm could serve as support for students and as a basis for future projects.
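
The comparison described in the summary can be illustrated with a minimal sketch. It is not the thesis code: the toy corpus, the TF-IDF features, the 70/30 split, and the helper names make_toy_comments, evaluate, and compare_classifiers are all assumptions made for illustration, and the record does not say how the original text was turned into features. Only RandomForestClassifier, KNeighborsClassifier, and the AUC-ROC metric come from the summary.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline


def make_toy_comments(n_per_class=100, seed=0):
    """Build a hypothetical two-class corpus of short comments (not thesis data)."""
    rng = random.Random(seed)
    vocab_by_label = {
        1: ["great", "love", "excellent", "tasty", "fun", "amazing"],
        0: ["terrible", "hate", "awful", "bland", "boring", "broken"],
    }
    topics = ["app", "video", "game", "restaurant", "song", "menu"]
    texts, labels = [], []
    for label, vocab in vocab_by_label.items():
        for _ in range(n_per_class):
            # Two opinion words plus two topic words, in random order.
            words = rng.sample(vocab, 2) + rng.sample(topics, 2)
            rng.shuffle(words)
            texts.append(" ".join(words))
            labels.append(label)
    return texts, labels


def evaluate(classifier, texts, labels, seed=0):
    """Fit TF-IDF + classifier on a 70/30 split; return accuracy and AUC-ROC."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    # Assumption: TF-IDF is used here only to obtain numeric features.
    model = make_pipeline(TfidfVectorizer(), classifier)
    model.fit(X_train, y_train)
    # Probability of the positive class (label 1) for each test comment.
    positive_scores = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": model.score(X_test, y_test),
        "auc_roc": roc_auc_score(y_test, positive_scores),
    }


def compare_classifiers(texts, labels, n_neighbors=5, seed=0):
    """Evaluate Random Forest and k-NN on the same data and split."""
    candidates = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=n_neighbors),
    }
    return {
        name: evaluate(clf, texts, labels, seed) for name, clf in candidates.items()
    }


if __name__ == "__main__":
    texts, labels = make_toy_comments()
    for name, metrics in compare_classifiers(texts, labels).items():
        print(name, metrics)
```

Varying n_neighbors in compare_classifiers is one way to explore the sensitivity to k mentioned in the summary. Scores on this toy corpus say nothing about the figures reported for the thesis datasets.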