Illicit tweet detection using Transformers

Bibliographic details
Main author:	Román Niemes, Stadyn Josué (author)
Format:	bachelorThesis
Language:	eng
Published / Created:	2023
Subjects:	Redes neuronales artificiales; Twitter; Procesamiento del lenguaje natural; Artificial neural networks; Natural language processing
Online access:	http://repositorio.yachaytech.edu.ec/handle/123456789/674

Description
Summary:	Twitter is a widely used social network that lets people communicate and express their ideas through short, quick posts. Unfortunately, the platform is not free of illicit activity. A growing problem in social networks in general is their use to promote and spread illegal services, such as human trafficking, prostitution, and illegal drugs, thanks to the reach of those platforms. It is therefore important to identify such messages in order to detect illegal activities and act upon them. This work presents and develops a framework for that detection using four Transformer models, currently the most powerful architecture for natural language processing. To train the models, a dataset of Spanish tweets was curated and labeled according to whether each tweet's text contained illicit offerings or content. Two non-Transformer models were also used for comparison. The experiments showed that Transformer models adapt very well to the particularities of the Spanish language and the structure of tweets, with BERTweet and DistilBERT obtaining the highest results. The Transformer models also handled a moderately imbalanced dataset (in this work, a class ratio of roughly 2:1), and their results were not affected by the use of data augmentation.
