Global modeling of speech production for articulatory synthesis

Speaker: Benjamin Elie (post-doctoral fellow)

Date: January 21, 2016

Abstract:

Articulatory synthesis consists of numerically simulating the articulatory, mechanical, and acoustic phenomena involved in speech production. Unlike the concatenative approach, it makes it possible to investigate these phenomena, to design the speech signal by virtually modifying the speaker's physiological parameters, and to relate the acoustic cues of natural speech to their articulatory origin. The global approach presented here relies on detailed models of speech production at several levels: an articulatory model of the time-varying deformation of the vocal tract shape, a mechanical and geometric model of the glottis, and a numerical model of acoustic propagation inside the vocal tract.

The articulatory model is based on the first deformation modes of the articulators (tongue, lips, jaw, velum, and larynx), computed from contours extracted from midsagittal slices of the vocal tract obtained by static MRI. A new approach that uses reconstructed articulatory films with high spatiotemporal resolution, obtained by cine-MRI, is also presented. The midsagittal shape of the vocal tract is then described by a small number of parameters.
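
As an illustration of how a few deformation modes can describe a contour, the sketch below applies principal component analysis to a set of flattened contours. It is a minimal sketch, not the speaker's implementation: it assumes the modes are obtained by PCA (the abstract does not name the method), the contour data are synthetic, and the names `deformation_modes` and `reconstruct` are hypothetical.

```python
import numpy as np


def deformation_modes(contours, n_modes):
    """Extract the first deformation modes of a set of contours.

    contours: array of shape (n_shapes, n_coords), one flattened contour per row.
    Returns the mean shape, the modes, the per-shape parameters, and the
    fraction of variance explained by the retained modes.
    """
    mean_shape = contours.mean(axis=0)
    centered = contours - mean_shape
    # SVD of the centered data: the rows of vt are orthonormal deformation modes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]
    # Project each centered contour onto the retained modes.
    params = centered @ modes.T
    explained = (s[:n_modes] ** 2).sum() / (s ** 2).sum()
    return mean_shape, modes, params, explained


def reconstruct(mean_shape, modes, params):
    """Rebuild contours from a small number of articulatory parameters."""
    return mean_shape + params @ modes


# Synthetic stand-in for MRI contours (assumption): 50 shapes, 40 coordinates,
# generated from 3 underlying deformation patterns.
rng = np.random.default_rng(0)
true_modes = rng.normal(size=(3, 40))
weights = rng.normal(size=(50, 3))
contours = 5.0 + weights @ true_modes

mean_shape, modes, params, explained = deformation_modes(contours, n_modes=3)
approx = reconstruct(mean_shape, modes, params)
```

Because the synthetic contours are built from three patterns, three modes reconstruct them exactly; with real contours the retained modes would only approximate each shape.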

Next, a glottis model is designed to reproduce the self-sustained nature of vocal fold oscillations: the movements of the folds are driven by the aeroacoustic conditions in the vicinity of the glottis. The model allows a parallel glottal chink to be included in order to accurately simulate voiced fricatives and breathy voices. Acoustic propagation inside the vocal tract is simulated by solving the acoustic equations at each time step, using Maeda's electric analogy adapted to a waveguide network.
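
To give a sense of what an electric analogy involves, the sketch below maps each short section of the vocal tract to a lumped acoustic inductance and capacitance. It uses the standard lossless transmission-line analogy only; it is not Maeda's full formulation, which the abstract does not detail. The air constants, the area function, and the name `tubelet_elements` are assumptions made for illustration.

```python
import numpy as np

RHO = 1.2   # air density in kg/m^3 (assumed value)
C = 350.0   # speed of sound in m/s (assumed value)


def tubelet_elements(areas, lengths):
    """Lumped elements of each tube section in the lossless electric analogy.

    areas: cross-sectional areas in m^2, lengths: section lengths in m.
    Returns the acoustic inductance (air mass) and capacitance (air
    compressibility) of every section.
    """
    areas = np.asarray(areas, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    inductance = RHO * lengths / areas
    capacitance = areas * lengths / (RHO * C ** 2)
    return inductance, capacitance


# Hypothetical area function: 4 sections of 1 cm each, areas in m^2.
areas = np.array([1.0e-4, 2.5e-4, 4.0e-4, 1.5e-4])
lengths = np.full(4, 0.01)
inductance, capacitance = tubelet_elements(areas, lengths)

# Characteristic impedance and propagation delay implied by the elements.
impedance = np.sqrt(inductance / capacitance)
delay = np.sqrt(inductance * capacitance)
```

In this analogy the characteristic impedance of a section is RHO * C / area and its propagation delay is length / C, which gives a quick consistency check on the element values.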

Finally, a few examples of copy synthesis are presented. They aim to reproduce the natural speech of a speaker, either from simultaneously acquired audio signals and images of the vocal tract or by recovering the geometry of the vocal tract with inverse techniques.