Sentiment analysis on Twitter to detect xenophobic content toward Venezuelan immigrants in Ecuador.
| Main author: | |
|---|---|
| Format: | Bachelor's thesis |
| Language: | Spanish |
| Published: | 2021 |
| Subjects: | |
| Online access: | https://dspace.unl.edu.ec/jspui/handle/123456789/23796 |
| Summary: | Ecuador hosts a large number of Venezuelan immigrants. The International Organization for Migration confirms that it is the country with the third-largest Venezuelan immigrant population, and that it is among the countries with the worst social indicators, which manifest as ethnic and racial discrimination. Because these indicators underlie xenophobic sentiment in the countries of the region, it became necessary to determine whether such sentiment exists in the population, in order to help prevent hate crimes. The purpose of this final degree project (TT) was to determine whether xenophobic content exists in a set of tweets collected about Venezuelan immigrants in Ecuador. This was done by following the phases of the Knowledge Discovery in Text (KDT) methodology, implemented in Python, chiefly with the NLTK, Imbalanced-Learn and Scikit-Learn libraries, together with machine translation of the tweets. Because xenophobia is a complex sentiment to identify through natural language processing, a second set of tweets was used that had already been classified by crowdsourcing, that is, labeled collaboratively by humans to detect hate speech and offensive language. This yielded a trained model that was then improved by fine-tuning and served as the basis for training the algorithms used in this TT. During fine-tuning, the Synthetic Minority Oversampling Technique (SMOTE) was chosen to create synthetic data for the minority classes. The technique also made it possible to balance the classes in the set of tweets of interest, for which a new three-way classification was obtained: xenophobic, offensive and other. To obtain predictions, three supervised classification algorithms were run: Support Vector Machine (SVM), Naïve Bayes and Logistic Regression. SVM performed best, with an F1 score of 94%. Finally, 5.76% of the collected tweets were found to contain xenophobic sentiment, 31.23% offensive sentiment, and the remaining 63% other sentiments directed toward Venezuelan immigrants in Ecuador. Keywords: Natural Language Processing, Sentiment Analysis, Python, Xenophobia, Venezuelan Immigrants in Ecuador. |
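
The summary names SMOTE as the technique used to balance the minority classes. As a rough illustration of the idea only, the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified, standard-library variant and not the Imbalanced-Learn implementation the thesis used; the function name `smote`, the parameters `k` and `seed`, and the toy two-dimensional feature vectors are all assumptions made for this example.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified sketch: each synthetic point starts from a randomly chosen
    minority sample and moves a random fraction of the way toward one of
    its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k minority samples closest to the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical feature vectors standing in for vectorized tweets.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.2)]
minority = [(2.0, 2.0), (2.2, 2.1), (2.1, 2.4)]

# Generate enough synthetic points to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation scores such as F1 are computed on real data.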