Speaker: Amal Houidhek
Date: November 30, 2017
The first part of the presentation investigates statistical parametric speech synthesis (SPSS) of Modern Standard Arabic (MSA). A hidden Markov model (HMM)-based speech synthesis system relies on a description of speech segments corresponding to phonemes, with a large set of features that represent phonetic, phonological, linguistic and contextual aspects. When such a system is applied to MSA, two specific phenomena have to be taken into account: vowel quantity and consonant gemination. This work thoroughly studies the modeling of these phenomena through various approaches, for example the use of different units for short vs. long vowels and of different units for simple vs. geminated consonants. These approaches are compared to another one that merges the short and long variants of a vowel into a single unit, and the simple and geminated variants of a consonant into a single unit, with these characteristics handled through the features associated with the sound. The second part focuses on deep learning in speech synthesis. In a standard SPSS system, acoustic parameters are predicted from linguistic features using a decision tree, which does not always model the complex dependencies between linguistic and acoustic features efficiently. A possible solution is to replace the decision tree with a deep neural network (DNN); experiments show that this improves speech quality.
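The two kinds of unit inventory compared in the first part can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the speaker's implementation: the `Segment` class, the `separate_units` and `merged_units` helpers, the `:` and doubled-letter notation, and the example segment sequence are all hypothetical.

```python
from dataclasses import dataclass

# The three MSA vowel qualities; the symbols themselves are illustrative.
VOWELS = {"a", "i", "u"}


@dataclass(frozen=True)
class Segment:
    base: str      # base phoneme symbol, e.g. "a" or "t"
    doubled: bool  # True for a long vowel or a geminated consonant


def separate_units(seg):
    """Approach 1 (hypothetical notation): long vowels and geminated
    consonants get their own unit label; no extra feature is needed."""
    if not seg.doubled:
        return seg.base, {}
    if seg.base in VOWELS:
        return seg.base + ":", {}     # long vowel, e.g. "a:"
    return seg.base + seg.base, {}    # geminated consonant, e.g. "tt"


def merged_units(seg):
    """Approach 2: both variants share one unit label; quantity or
    gemination is carried by a feature attached to the sound."""
    key = "long" if seg.base in VOWELS else "geminated"
    return seg.base, {key: seg.doubled}


# Hypothetical segment sequence: k, long a, geminated t, short a.
segments = [
    Segment("k", False),
    Segment("a", True),
    Segment("t", True),
    Segment("a", False),
]

for seg in segments:
    print(separate_units(seg), merged_units(seg))
```

With separate units the inventory grows, since each long or geminated variant needs its own model, whereas the merged inventory stays smaller and leaves the distinction to the contextual features.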