Application of Automatic Speech Recognition to Quantitative Assessment of Tracheoesophageal Speech with Different Signal Quality

Haderlein, Tino
Riedhammer, Korbinian
Nöth, Elmar
Toy, Hikmet
Schuster, Maria
Eysholdt, Ulrich
Hornegger, Joachim
Rosanowski, Frank

Objective: Tracheoesophageal voice is state-of-the-art in voice rehabilitation after laryngectomy. Intelligibility on a telephone is an important evaluation criterion because telephone communication is a crucial part of social life. An objective measure of intelligibility when talking on a telephone is desirable in the field of postlaryngectomy speech therapy and its evaluation. Patients and Methods: Based upon successful earlier studies with broadband speech, an automatic speech recognition (ASR) system was applied to 41 recordings of postlaryngectomy patients. Recordings were available in different signal qualities; quality was the crucial criterion for this study. Results: Compared to the intelligibility rating of 5 human experts, the ASR system had a correlation coefficient of r = –0.87 and Krippendorff’s α of 0.65 when broadband speech was processed. The rater group alone achieved α = 0.66. With the test recordings in telephone quality, the system reached r = –0.79 and α = 0.67. Conclusion: For medical purposes, a comprehensive diagnostic approach to (substitute) voice has to cover both subjective and objective tests. An automatic recognition system such as the one proposed in this study can be used for objective intelligibility rating with results comparable to those of human experts. This holds for broadband speech as well as for automatic evaluation via telephone.
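
As an illustration of how a correlation coefficient like the r values above can be obtained, the following is a minimal sketch, not the authors' implementation. It assumes that the ASR system yields one score per speaker (called `word_accuracy` here, a hypothetical name) and that the expert ratings are averaged per speaker on a scale where lower values mean better intelligibility, which would explain a negative r. All numbers are invented for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root of the summed squared deviations for each variable.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


# Hypothetical per-speaker values (NOT from the study):
# ASR score in percent, and mean rating of the human experts,
# assuming a scale where a lower rating means better intelligibility.
word_accuracy = [72.0, 55.5, 40.0, 63.0, 30.5]
mean_expert_rating = [1.6, 2.8, 3.6, 2.2, 4.4]

r = pearson_r(word_accuracy, mean_expert_rating)
print(f"r = {r:.2f}")
```

With such a rating scale, a strongly negative r means that speakers whom the experts judged more intelligible were also recognized more accurately by the system.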

Journal Title
Folia Phoniatrica et Logopaedica 2009; 61: 12-17. © 2008 S. Karger AG, Basel