Expressive speech synthesis using deep learning

Speaker: Ajinkya Kulkarni

Date and place: September 10, 2020 at 10:30 (videoconference)

Abstract:

At present, the speaking style of synthesized speech is typically neutral, a consequence of the type of speech data used to train text-to-speech (TTS) systems. Multi-speaker expressive speech synthesis remains an open problem due to the limited availability of expressive speech corpora and the time involved in collecting and annotating such corpora for a new speaker. The goal is to investigate the development of an expressive speech synthesis system in a desired speaker's voice without having to acquire expressive speech data from that speaker. Expressive text-to-speech synthesis using parametric approaches is constrained by the style of the speech corpus used. The focus of this work is on investigating deep learning transfer mechanisms that allow expressive speech synthesis models to be trained on an existing multi-speaker expressive speech corpus. These models will then be adapted using the neutral speech data of the TTS speaker's voice, in order to "transfer" the expressive speech characteristics onto the TTS voice and yield expressive speech synthesis.
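To make the transfer idea concrete, here is a minimal, purely illustrative PyTorch sketch (not the speaker's actual system; the toy model, dimensions, and two-stage training loop are all assumptions). It pre-trains a model with separate speaker and style embeddings on an expressive multi-speaker corpus, then adapts only the speaker-dependent parameters on the target speaker's neutral data while freezing the style embeddings, so that a learned expressive style can later be paired with the new voice:

```python
# Illustrative sketch of expressive-style transfer for TTS adaptation.
# Assumptions: a toy linear "TTS" model, random tensors standing in for
# text features and mel-spectrogram targets, and 4 discrete style tokens.
import torch
import torch.nn as nn

class ToyExpressiveTTS(nn.Module):
    def __init__(self, n_speakers, n_styles, text_dim=64, mel_dim=80):
        super().__init__()
        self.speaker_emb = nn.Embedding(n_speakers, 32)
        self.style_emb = nn.Embedding(n_styles, 32)       # e.g. neutral/happy/sad/angry
        self.encoder = nn.Linear(text_dim, 128)
        self.decoder = nn.Linear(128 + 32 + 32, mel_dim)  # predicts mel frames

    def forward(self, text_feats, speaker_id, style_id):
        h = torch.tanh(self.encoder(text_feats))
        s = self.speaker_emb(speaker_id).expand(h.size(0), -1)
        e = self.style_emb(style_id).expand(h.size(0), -1)
        return self.decoder(torch.cat([h, s, e], dim=-1))

# Stage 1: pre-train on the expressive multi-speaker corpus (dummy data).
model = ToyExpressiveTTS(n_speakers=10, n_styles=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    text = torch.randn(50, 64)                 # stand-in for text features
    target = torch.randn(50, 80)               # stand-in for a mel-spectrogram
    spk = torch.randint(0, 10, (1,))
    sty = torch.randint(0, 4, (1,))
    loss = nn.functional.mse_loss(model(text, spk, sty), target)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: adapt to the new TTS speaker's *neutral* data only.
# Freeze the style embeddings so the learned expressive styles survive,
# grow the speaker table by one row, and fine-tune the remaining parameters.
model.style_emb.weight.requires_grad = False
new_table = nn.Embedding(11, 32)
new_table.weight.data[:10] = model.speaker_emb.weight.data
model.speaker_emb = new_table
opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-4)
neutral = torch.tensor([0])                    # the only style seen for the new speaker
for _ in range(50):
    text, target = torch.randn(50, 64), torch.randn(50, 80)
    loss = nn.functional.mse_loss(model(text, torch.tensor([10]), neutral), target)
    opt.zero_grad(); loss.backward(); opt.step()

# At synthesis time, pair the new speaker with a frozen expressive style token.
mel = model(torch.randn(50, 64), torch.tensor([10]), torch.tensor([2]))
```

The design choice this sketch highlights is the factorization of speaker identity and speaking style into separate embeddings: because the style embeddings are never updated on the neutral adaptation data, they retain the expressivity learned from the multi-speaker corpus and can be recombined with the new voice at synthesis time.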