Speech signals result from the movements of the articulators. Accurate knowledge of articulator positions for each sound is essential to improving, on the one hand, articulatory speech synthesis and, on the other, the relevance of the diagnosis and associated feedback in computer-assisted language learning. Since production and perception processes are interrelated, a better understanding of how humans perceive speech should lead to more relevant diagnoses in language learning and should highlight the parameters critical for expressive speech synthesis.
An important objective is to reduce the mismatch between natural speech and synthetic speech generated with an articulatory model approximating the vocal tract. This involves designing more precise articulatory models, developing new methods to acquire three-dimensional Magnetic Resonance Imaging (MRI) data of the entire vocal tract together with denoised speech signals, and evaluating several approaches to acoustic simulation.
Expressive audiovisual synthesis
Speech is a bimodal means of communication involving acoustic and visual components. An important goal of audiovisual text-to-speech is to synthesize bimodal signals that are intelligible both acoustically and visually. We therefore continue working on the visual component through a tongue model and a lip model. A further challenging research goal is to add expressivity.
Categorization of sounds and prosody for native and non-native speech
For foreign language learning, the aim is to provide language learners with automatic feedback on prosody and on the pronunciation of sounds. Concerning the mother tongue, we are interested in monitoring the long-term process of sound categorization (mainly at primary school) and its relation to the learning of reading and writing skills, especially for children with language deficiencies.