Automatic Assessment of Prosody in Second Language Learning
This thesis studies methods for automatically assessing the prosody of non-native speakers for the purpose of computer-assisted pronunciation training. We study the detection of word accent errors and the general assessment of the appropriateness of a speaker’s rhythm. We propose a generic approach that (a) is very successful on these tasks, (b) is competitive with other state-of-the-art results, and (c) is flexible and easily adapted to new tasks.
For word accent error detection, we derive a measure of the probability of acceptable pronunciation, which provides a well-grounded basis for deciding whether or not to give error feedback to the learner. Our best system achieves a true positive rate (TPR) of 71.5 % at a false positive rate (FPR) of 5 %, a result that is very competitive with the state of the art and not far from human performance (TPR 61.9 % at 3.2 % FPR).
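To illustrate how a TPR at a fixed FPR can be read off such a probability measure, the following is a minimal sketch, not the evaluation code used in the thesis. It assumes that a word is flagged as erroneous when its probability of acceptable pronunciation falls below a threshold; the function name `tpr_at_fpr`, its parameters, and the toy data are hypothetical.

```python
def tpr_at_fpr(p_acceptable, is_error, max_fpr=0.05):
    """Return (threshold, tpr, fpr) with the highest TPR whose FPR <= max_fpr.

    Assumption: a word is flagged as an error when p_acceptable < threshold.
    'Positive' here means 'flagged as a word accent error'.
    """
    if len(p_acceptable) != len(is_error):
        raise ValueError("inputs must have the same length")
    n_pos = sum(1 for e in is_error if e)
    n_neg = len(is_error) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one erroneous and one correct word")

    best = (0.0, 0.0, 0.0)  # threshold 0.0 flags nothing: TPR = FPR = 0
    # Every distinct score is a candidate threshold; inf flags every word.
    candidates = sorted(set(p_acceptable)) + [float("inf")]
    for t in candidates:
        tp = sum(1 for p, e in zip(p_acceptable, is_error) if e and p < t)
        fp = sum(1 for p, e in zip(p_acceptable, is_error) if not e and p < t)
        tpr, fpr = tp / n_pos, fp / n_neg
        if fpr <= max_fpr and tpr > best[1]:
            best = (t, tpr, fpr)
    return best


# Toy data (invented for illustration): system probabilities and reference labels.
toy_p = [0.1, 0.2, 0.3, 0.9, 0.8, 0.7, 0.95, 0.6]
toy_err = [True, True, False, False, False, False, False, True]
threshold, tpr, fpr = tpr_at_fpr(toy_p, toy_err, max_fpr=0.2)
print(f"threshold={threshold}, TPR={tpr:.3f}, FPR={fpr:.3f}")
```

Fixing the FPR first reflects the idea that unjustified error feedback is costly for a learner, so the operating point is chosen to keep false alarms rare.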
For scoring general prosody, we obtain a Spearman correlation of ρ = 0.773 to the human reference scores on the C-AuDiT database (sentences read by non-native speakers); this is slightly better than the average labeller on that data (comparable quality measure for machine performance: r = 0.71 vs. 0.66 for human performance). At the speaker level, performance is more stable, with ρ = 0.854. On AUWL (non-native speakers practising dialogues), the task is much harder for both human and machine. Our best system achieves a correlation of ρ = 0.619 to the reference scores; here, humans are better than the system (quality measure for humans: r = 0.58 vs. 0.51 for machine performance). At the speaker level, the correlation rises to ρ = 0.821. On both databases, the obtained results are competitive with the state of the art.
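The difference between utterance-level and speaker-level correlation can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's evaluation procedure: it assumes that speaker-level scores are obtained by averaging the utterance scores of each speaker, and all function names and data are hypothetical. Spearman's ρ is computed as the Pearson correlation of average ranks, which handles ties.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if an input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def speaker_level(speakers, scores):
    """Assumed aggregation: mean utterance score per speaker."""
    grouped = defaultdict(list)
    for speaker, score in zip(speakers, scores):
        grouped[speaker].append(score)
    return {s: sum(v) / len(v) for s, v in grouped.items()}


# Toy data (invented): one entry per utterance.
speakers = ["a", "a", "b", "b", "c", "c"]
human = [1.0, 2.0, 3.0, 2.5, 4.0, 5.0]
machine = [1.5, 1.0, 2.0, 3.5, 4.5, 4.0]

rho_utt = spearman(human, machine)
h_spk, m_spk = speaker_level(speakers, human), speaker_level(speakers, machine)
ids = sorted(h_spk)
rho_spk = spearman([h_spk[s] for s in ids], [m_spk[s] for s in ids])
print(f"utterance level: {rho_utt:.3f}, speaker level: {rho_spk:.3f}")
```

Averaging over a speaker's utterances smooths out per-utterance noise in both the human and the machine scores, which is one plausible reason why speaker-level correlations are higher than utterance-level ones.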