Computer-assisted mispronunciation detection system for L2 Kichwa speech



Bibliographic details
Main author: Velasco Silva, Ricardo Isaías (author)
Format: bachelorThesis
Language: eng
Published: 2024
Online access: http://repositorio.yachaytech.edu.ec/handle/123456789/727
Other bibliographic details
Summary: A mispronunciation detection system for the Kichwa language was developed and given an initial, experimental evaluation. The study used pretrained convolutional neural network (CNN) architectures to classify spectrograms of correctly and incorrectly pronounced words. The first model, known as the CNN feature-based model, extracts features from the fully connected layers. It then applies a feature selection technique to separate discriminative features from non-discriminative ones. Finally, the selected features are classified using a KNN classifier. The second model, which is based on transfer learning with CNNs, reuses the knowledge learned by the pretrained convolutional layers and adapts the classifier layer for binary classification, distinguishing well-pronounced from mispronounced audio recordings. Two datasets were constructed and used in this study: one with Kichwa words and synthetic words, and the same dataset but with the synthetic words used for training. The CNN transfer learning-based method outperforms the CNN feature-based method on both datasets. Concretely, AlexNet with hyperparameter tuning achieves a balanced predictive value of 0.90 and 0.92 on the two datasets, respectively.
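
The feature-based pipeline described in the summary (feature extraction, then feature selection, then KNN classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the record does not name the feature selection technique, so a Fisher-style score is swapped in; the short vectors are toy stand-ins for activations taken from a CNN's fully connected layers; and the function names and the values of k are hypothetical, not the thesis's settings.

```python
from collections import Counter
from math import dist
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature: squared gap between class means over summed class variances."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pvariance(pos) + pvariance(neg)
        scores.append((mean(pos) - mean(neg)) ** 2 / (spread + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy stand-ins for CNN feature vectors (assumption, not thesis data):
# columns 0 and 2 separate the classes, column 1 carries no class information.
X_train = [
    [0.9, 0.3, 0.8], [0.8, 0.7, 0.9], [1.0, 0.5, 0.7],  # label 1: well pronounced
    [0.1, 0.4, 0.2], [0.2, 0.6, 0.1], [0.0, 0.5, 0.3],  # label 0: mispronounced
]
y_train = [1, 1, 1, 0, 0, 0]

cols = select_top_k(X_train, y_train, k=2)       # discriminative feature indices
X_sel = project(X_train, cols)
query = project([[0.85, 0.1, 0.75]], cols)[0]    # unseen sample, same columns
prediction = knn_predict(X_sel, y_train, query, k=3)
```

In this toy setting the selection step discards the uninformative column before the distance computation, which is the role the summary attributes to feature selection; a real system would apply the same steps to much longer feature vectors extracted from spectrograms.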