COMPRISE Voice Transformer

An open source tool for voice de-identification.

Overview

The COMPRISE Voice Transformer is part of the software package developed by COMPRISE.
This tool increases voice privacy: it takes speech audio as input and outputs the same spoken content rendered in another person’s voice.
This conversion, called anonymization, uses x-vectors and neural waveform models to turn the original speaker’s voice into an artificial voice while retaining some of the pitch variation of the original utterances.

Features

  • The transformation can be run locally on a single audio file using a script; more importantly, it can also be deployed as Dockerized RESTful services.
  • The software ensures that information extracted from the transformed voice is hard to trace back to the original speaker, as validated through state-of-the-art biometric protocols.
  • The software preserves the utility of the transformed data for training Speech-to-Text models.
  • The software leverages cutting-edge deep learning and speech processing technology.
  • The data output by this software can be passed on to a Voice Builder, which further discards sensitive words and expressions.

From the Voice Privacy Challenge 2020 repository:

The baseline system uses several independent models:

  1. ASR acoustic model to extract BN features (1_asr_am) – trained on LibriSpeech-train-clean-100 and LibriSpeech-train-other-500
  2. X-vector extractor (2_xvect_extr) – trained on VoxCeleb 1 & 2.
  3. Speech synthesis (SS) acoustic model (3_ss_am) – trained on LibriTTS-train-clean-100.
  4. Neural source filter (NSF) model (4_nsf) – trained on LibriTTS-train-clean-100.

Please visit the challenge website for more information about the Challenge and this method.
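
To give a feel for the x-vector step alone, here is a minimal Python sketch of one way a pseudo-speaker embedding can be chosen: rank a pool of x-vectors by distance from the source speaker, keep the farthest ones, and average a random subset of them. This is an illustration under stated assumptions, not the COMPRISE or challenge implementation. Cosine distance, the tiny 3-dimensional vectors, the function names, and the parameters `n_farthest` and `n_average` are all hypothetical choices made for the example.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def pseudo_speaker_xvector(source, pool, n_farthest=4, n_average=2, seed=0):
    """Average a random subset of the pool vectors farthest from `source`.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Rank pool vectors from farthest to nearest relative to the source speaker.
    ranked = sorted(pool, key=lambda v: cosine_distance(source, v), reverse=True)
    candidates = ranked[:n_farthest]
    # Pick a random subset of the far-away candidates and average them.
    chosen = rng.sample(candidates, min(n_average, len(candidates)))
    dim = len(source)
    return [sum(v[i] for v in chosen) / len(chosen) for i in range(dim)]


# Toy pool of made-up 3-dimensional "x-vectors".
toy_pool = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
]
print(pseudo_speaker_xvector([1.0, 0.0, 0.0], toy_pool, n_farthest=2, n_average=2))
# prints [-0.5, 0.5, 0.0]
```

In a real system the resulting embedding would be handed, together with the other extracted features, to the synthesis models listed above. That part is not sketched here.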


Requirements

System requirements are subject to change without notice.

Audio file requirements

  • Sampling rate: 16 kHz
  • File format: FLAC
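
Before submitting a file, it can be useful to confirm that it meets both requirements. The following Python sketch reads the sample rate from the STREAMINFO block at the start of a FLAC stream using only the standard library. It is not part of the COMPRISE tool; the function names are made up for this example, and it assumes the file starts directly with the `fLaC` marker (no prepended tags).

```python
def read_flac_sample_rate(data: bytes) -> int:
    """Return the sample rate in Hz stored in a FLAC STREAMINFO block."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream: missing 'fLaC' marker")
    if len(data) < 21:
        raise ValueError("header too short to contain the sample rate")
    # Byte 4 is the metadata block header: high bit = last-block flag,
    # low 7 bits = block type (0 means STREAMINFO).
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # The STREAMINFO body starts at byte 8. After 10 bytes of block-size and
    # frame-size fields comes a 20-bit sample rate, at absolute bytes 18 to 20.
    return (data[18] << 12) | (data[19] << 4) | (data[20] >> 4)


def meets_audio_requirements(path: str, required_rate: int = 16000) -> bool:
    """Check that `path` is a FLAC file sampled at `required_rate` Hz."""
    with open(path, "rb") as handle:
        header = handle.read(42)  # marker + block header + STREAMINFO body
    try:
        return read_flac_sample_rate(header) == required_rate
    except ValueError:
        return False
```

Calling `meets_audio_requirements("utterance.flac")` (a hypothetical file name) returns `True` only for a FLAC file sampled at 16 kHz. Files in other formats or at other rates would need to be converted with an audio tool first.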

System requirements

A Linux operating system running Docker 18.09 or newer, with a CUDA-capable NVIDIA graphics card.

  • For CUDA-capable NVIDIA graphics cards, refer here.
  • For Linux distributions compatible with the NVIDIA Container Toolkit, refer here.
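
The Docker version requirement can be checked with a short script. The Python sketch below compares a version string against the 18.09 minimum and, optionally, asks the local Docker installation for its server version with `docker version --format '{{.Server.Version}}'`. The helper names are hypothetical and the script is not shipped with the tool.

```python
import re
import subprocess

MIN_DOCKER = (18, 9)  # Docker 18.09, the minimum stated above


def parse_docker_version(text):
    """Extract (major, minor) from strings such as '20.10.7' or '18.09.1-ce'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised Docker version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def docker_is_recent_enough(version_text, minimum=MIN_DOCKER):
    """Return True if the version is at least `minimum` (tuple comparison)."""
    return parse_docker_version(version_text) >= minimum


def installed_docker_version():
    """Ask the local Docker daemon for its version; requires Docker on PATH."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

On a machine with Docker installed, `docker_is_recent_enough(installed_docker_version())` reports whether the version requirement is met. GPU and NVIDIA Container Toolkit compatibility still need to be checked separately.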

Documentation & Download

  • Installation & Usage guide (video)
  • Functionality (video)
  • Code and documentation (GitLab)

For scientific details and experimental results, refer to the following paper: B. M. L. Srivastava, N. Tomashenko, X. Wang, E. Vincent, J. Yamagishi, M. Maouche, A. Bellet, M. Tommasi, “Design choices for x-vector based speaker anonymization”, in Interspeech, 2020.

Support

comprise-vt@inria.fr