Computer-assisted mispronunciation detection system for L2 Kichwa speech
Format: bachelorThesis
Language: eng
Published: 2024
Online Access: http://repositorio.yachaytech.edu.ec/handle/123456789/727
Abstract: An initial, experimental evaluation of a mispronunciation detection system was developed for the Kichwa language. The study used pretrained convolutional neural network (CNN) architectures to classify spectrograms of accurately and inaccurately pronounced words. The first model, the CNN feature-based model, extracts features from the fully connected layers, applies a feature selection technique to separate discriminative from non-discriminative features, and classifies the selected features with a KNN classifier. The second model, based on transfer learning with CNNs, reuses the knowledge in the convolutional layers and adapts the classifier layer for binary classification, distinguishing well-pronounced from mispronounced audio. Two datasets were constructed and used in this study: one containing Kichwa words and synthetic words, and a second with the same composition but with the synthetic words used for training. The CNN transfer learning-based method outperforms the CNN feature-based method on both datasets; concretely, AlexNet with hyperparameter tuning reaches 0.90 and 0.92 in the balanced predictive value metric on the two datasets, respectively.
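The transfer-learning model described above can be illustrated with a minimal sketch: a pretrained AlexNet whose convolutional layers are reused and whose final classifier layer is replaced by a two-class head for spectrogram classification. The torchvision weights argument, the choice to freeze the feature extractor, and the 224x224 three-channel input handling are illustrative assumptions, not the thesis' exact configuration.

```python
# Minimal sketch, assuming PyTorch + torchvision: AlexNet adapted for
# binary classification of word spectrograms
# (well-pronounced vs. mispronounced).
import torch
import torch.nn as nn
from torchvision import models

def build_binary_alexnet():
    # Load AlexNet with ImageNet weights; the convolutional layers keep
    # their pretrained knowledge.
    model = models.alexnet(weights="DEFAULT")
    # Assumption: freeze the convolutional feature extractor and train
    # only the classifier head.
    for param in model.features.parameters():
        param.requires_grad = False
    # AlexNet's final fully connected layer maps 4096 -> 1000 ImageNet
    # classes; swap it for a 2-class head.
    model.classifier[6] = nn.Linear(4096, 2)
    return model

model = build_binary_alexnet()
# Spectrograms are assumed to be resized to 3-channel 224x224 tensors.
dummy_batch = torch.randn(8, 3, 224, 224)
logits = model(dummy_batch)  # shape: (8, 2)
```

A similar adaptation would apply to the feature-based variant, where activations from the fully connected layers are exported and passed through feature selection and a KNN classifier instead of the replaced head.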