Stuttering detection using deep learning

Speaker: Shakeel Ahmad Sheikh

Date and place: January 13, 2022, at 10:30 – Videoconference

Abstract: Stuttering, also known as stammering, is a neuro-developmental speech disorder in which the flow of speech is interrupted by core behaviors such as involuntary blocks, prolongations, and repetitions. The conventional assessment of stuttering is to count the occurrences of each stuttering type manually and report them as a proportion of the total number of words in a speech passage. The main drawback of this manual counting is that it is time-consuming and subjective, which makes it inconsistent and error-prone across speech therapists. Approximately 70 million people worldwide suffer from stuttering, about 1% of the world's population. Among them, males are disproportionately affected, accounting for approximately four-fifths of cases. Stuttering identification is a complex interdisciplinary problem that involves speech processing, signal processing, neuroscience, psychology, pathology, and machine learning. Recent advances in machine learning and deep learning (DL) have significantly transformed the speech domain. Yet, despite plenty of potential applications, stuttering detection has received little attention, especially from a signal processing and machine learning perspective. The core behaviors of stuttering affect the acoustic properties of speech, which can help discriminate stuttered from fluent speech. Studies show that formant characteristics such as formant transitions and formant fluctuations are affected by stuttering. Existing methods for stuttering detection therefore employ spectral features such as Mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs), or their variants, which capture this formant-related information. An alternative strategy is to apply automatic speech recognition (ASR) to the audio signal to obtain a transcript and then to use language models on the text.
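To make the spectral front end mentioned above concrete, here is a minimal MFCC pipeline sketched in plain NumPy (windowed power spectrum, triangular mel filterbank, log, DCT-II). The parameter choices (16 kHz sampling, 512-point FFT, 26 mel filters, 13 coefficients) are common illustrative defaults, not the settings of any particular stuttering-detection system.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Toy MFCC extractor: framing -> power spectrum -> mel filterbank
    -> log -> DCT-II, keeping the first n_ceps coefficients."""
    # 1. Frame the signal and apply a Hamming window
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft) for s in starts])
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # 4. DCT-II decorrelates the log filterbank energies
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T

# One second of a 440 Hz tone at 16 kHz as a toy input
t = np.arange(16000) / 16000.0
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (97, 13): 97 frames, 13 cepstral coefficients
```

In practice one would use a tested library implementation rather than this sketch; the point is that each row of the output is a compact, formant-sensitive description of one short speech frame.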
Even though this text-based approach has achieved encouraging results and has been proven effective, its reliance on ASR makes it computationally expensive and prone to error. In this work, we explore deep neural networks (DNNs) for stuttering detection directly from speech. We recently proposed StutterNet, a time-delay neural network (TDNN) based stuttering detection method that relies solely on the acoustic input and achieved very promising results.
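For intuition about the time-delay architecture family that StutterNet belongs to, the NumPy sketch below stacks two dilated TDNN layers over a sequence of acoustic frames, pools statistics over time, and scores a set of classes. All sizes, dilations, and the five-class output (e.g. block, prolongation, repetition, interjection, fluent) are hypothetical illustrations, not the published StutterNet configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def tdnn_layer(x, w, b, dilation=1):
    """One time-delay (1-D dilated convolution) layer with ReLU.
    Each output frame sees `k` input frames spaced `dilation` steps
    apart, so stacking layers widens the temporal context cheaply."""
    T, d_in = x.shape
    k, _, d_out = w.shape
    span = (k - 1) * dilation
    out = np.empty((T - span, d_out))
    for t in range(T - span):
        ctx = x[t:t + span + 1:dilation].reshape(-1)  # frames t, t+d, t+2d
        out[t] = np.maximum(ctx @ w.reshape(k * d_in, d_out) + b, 0.0)
    return out

# Toy input: 100 frames of 13-dim features (dimensions are illustrative)
x = rng.standard_normal((100, 13))
h1 = tdnn_layer(x, rng.standard_normal((3, 13, 32)) * 0.1, np.zeros(32), dilation=1)
h2 = tdnn_layer(h1, rng.standard_normal((3, 32, 32)) * 0.1, np.zeros(32), dilation=2)
# Statistics pooling collapses the time axis into a fixed-size vector,
# then a linear layer + softmax scores 5 hypothetical classes
stats = np.concatenate([h2.mean(axis=0), h2.std(axis=0)])
logits = stats @ (rng.standard_normal((64, 5)) * 0.1)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (5,): one probability per stuttering class
```

The key design idea is that dilated context windows let later layers cover the longer time spans over which blocks, prolongations, and repetitions unfold, while statistics pooling makes the classifier independent of utterance length.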