Artificial Intelligence based detection of manipulated audio for political forensics

Bibliographic Details
Main Author: Mendoza Núñez, Patricio Joshue (author)
Format: bachelorThesis
Language: eng
Published: 2024
Online Access: http://repositorio.yachaytech.edu.ec/handle/123456789/849
Description
Summary: The proliferation of deepfake audio poses significant challenges in political forensics, as it can be used to spread misinformation and manipulate public opinion. This thesis addresses these challenges by developing and evaluating AI-based models to detect manipulated audio. A systematic review of the literature on advanced techniques for detecting manipulated multimedia content was conducted, highlighting the difficulties posed by synthesis and editing techniques. Based on this analysis, a dataset of real and artificially fabricated political speeches was compiled, utilizing natural language processing (NLP) methods to extract feature vectors. Two neural network architectures were evaluated: Convolutional Neural Networks (CNN) and Transformers. The CNN model consists of a 7-layer network to process audio waveforms, while the Transformer model employs 12 or 24 Transformer blocks to capture global dependencies and contextual information. The study also analyzes acoustic features that distinguish real from fake audio, including spectrograms, decibel levels, and feature representations such as MFCC and Mel-Spectrogram. The results indicate that fake audio tends to be louder and less variable than real audio, and the feature representations confirm the synthetic nature of fake audio. The conclusions highlight the effectiveness of Transformer models in detecting manipulated audio, outperforming CNNs in accuracy and generalization capability, suggesting a promising path for future research in this area.
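The summary's observation that fake audio tends to be louder and less variable than real audio can be illustrated with a frame-wise level measurement. The following is only a minimal sketch, not the thesis's method: the function names, the 400-sample frame size, the 16 kHz sample rate, and the two synthetic test tones are all assumptions made for illustration, and no result from the thesis is reproduced here.

```python
import math
import statistics


def frame_levels_db(samples, frame_size=400, eps=1e-10):
    """RMS level of each non-overlapping frame, in dB relative to full scale (1.0).

    frame_size and eps are illustrative choices, not values from the thesis.
    """
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Clamp to eps so that silent frames do not cause log10(0).
        levels.append(20.0 * math.log10(max(rms, eps)))
    return levels


def loudness_summary(samples, frame_size=400):
    """Return (mean level in dB, standard deviation of the level in dB)."""
    levels = frame_levels_db(samples, frame_size)
    return statistics.mean(levels), statistics.pstdev(levels)


# Synthetic stand-ins, NOT real or fake speech: one second of a steady tone
# and one second of the same tone with a slowly varying amplitude envelope.
SAMPLE_RATE = 16000  # assumed sample rate
steady = [
    0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
varying = [
    0.5
    * (0.2 + 0.8 * abs(math.sin(2 * math.pi * 2 * n / SAMPLE_RATE)))
    * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]

steady_mean, steady_std = loudness_summary(steady)
varying_mean, varying_std = loudness_summary(varying)
print(f"steady:  mean {steady_mean:.2f} dB, std {steady_std:.2f} dB")
print(f"varying: mean {varying_mean:.2f} dB, std {varying_std:.2f} dB")
```

On real recordings the same two numbers (mean level and its spread across frames) could be compared between genuine and synthesized clips, though a level difference alone would not be a reliable detector.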