Speech of Children with Cleft Lip and Palate: Automatic Assessment
This work investigates automatic speech processing techniques for the assessment of children's speech disorders. The target group was children with cleft lip and palate (CLP). The techniques are applied to evaluate the children's speech intelligibility and articulation. A further goal of this work is to visualize the kind and degree of the pathology in the children's speech. The system can also track a child's progress during therapy.
Cleft lip and palate is the most common orofacial malformation. Even after adequate surgery, speech and hearing are often still affected. The children's articulation disorders comprise typical misarticulations such as backing of consonants and increased nasal air emission.
State-of-the-art evaluation of speech disorders is performed perceptually by human listeners. This method, however, is hampered by inter- and intra-rater variability. An automatic, objective evaluation is therefore desirable.
We developed PEAKS, the Program for the Evaluation of All Kinds of Speech disorders. With PEAKS, speech data can be recorded and evaluated via the Internet. It runs in any standard web browser and features security measures such as encrypted transmission and user-level access control.
The agreement of PEAKS with different human experts is measured with correlation coefficients as well as kappa and alpha statistics. The evaluation procedures for intelligibility employ support vector classification and regression. Furthermore, dimensionality reduction techniques such as LDA, PCA, and Sammon mapping are used for visualization and feature reduction. As input for these algorithms, standard speech processing features such as MFCCs are employed, together with specialized feature sets for prosody, pronunciation, and hypernasality. A further approach of this work is to use a children's speech recognizer to model a naïve listener: if the recording conditions are kept constant, the speaker is the only varying factor, and the recognizer's word accuracy should therefore reflect the speaker's intelligibility.
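The naïve-listener idea can be sketched as follows: the word accuracy of the recognizer on a speaker's utterances serves as an intelligibility proxy, and its agreement with human ratings is then quantified, for instance with a Pearson correlation. The function names and the toy data below are illustrative assumptions, not taken from PEAKS.

```python
def word_accuracy(reference, hypothesis):
    """WA = 1 - (word-level edit distance / number of reference words).

    Note: WA can become negative if the recognizer inserts many words.
    """
    n, m = len(reference), len(hypothesis)
    # Standard Levenshtein dynamic program over word sequences.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    return 1.0 - d[n][m] / n

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx ** 0.5 * vy ** 0.5)

# Toy example: one substitution out of five reference words.
ref = "der Junge hat eine Katze".split()
hyp = "der Junge hat eine Tasse".split()
print(word_accuracy(ref, hyp))  # → 0.8

# Hypothetical per-speaker word accuracies vs. human intelligibility
# scores (here: a higher score means worse intelligibility, so a
# strongly negative correlation indicates good agreement).
wa = [0.92, 0.75, 0.60, 0.41]
rating = [1.0, 2.0, 2.5, 4.0]
print(pearson(wa, rating))
```

In practice, word accuracies are averaged per speaker over many utterances before correlating them with the perceptual ratings.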
Patient speech data were collected in Erlangen from 2002 to 2008; 312 children with CLP were recorded. Control groups were gathered in four major German cities to cover several dialect regions, yielding 726 control data sets.
The experimental results showed that the automatic system agrees highly and significantly with the human raters, both for global parameters such as intelligibility and for individual articulation disorders; its agreement with the raters is in the same range as the raters' agreement with each other. The intelligibility assessment was shown to be independent of the dialect region. The visualization of the speech data also agreed well with perceptually rated criteria, and artifacts caused by the use of multiple microphones were removed.