COMPRISE Weakly Supervised Speech-to-Text (STT) and COMPRISE Weakly Supervised Natural Language Understanding (NLU) have been released!

COMPRISE Weakly Supervised STT and COMPRISE Weakly Supervised NLU are now publicly available. For many relevant tasks, classical supervised learning methods require collecting large amounts of speech or text data and transcribing or labelling them by hand, which is both time-consuming and expensive. Weakly supervised learning addresses this problem by reducing the amount of manual work needed.

Main features

COMPRISE Weakly Supervised STT and COMPRISE Weakly Supervised NLU are innovative software tools developed by Inria and Saarland University. They automatically transcribe or label data and use it to train Speech-to-Text (STT) and Natural Language Understanding (NLU) models.

COMPRISE Weakly Supervised STT can be used as a standalone tool or as part of the COMPRISE Platform. It consists of two modules:

• an Automated Transcription module that processes untranscribed speech utterances and outputs one or more text transcriptions for each utterance; it can exploit specific information about the dialogue domain;
• a Machine Learning module that takes the automatically transcribed sentences (and possibly additional manually transcribed sentences) as input, quantifies their reliability, and outputs trained acoustic and language models to be used by a Speech-to-Text system.
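
As a rough illustration of what "quantifying reliability" can mean in practice, the sketch below weights each training sentence by a confidence score. It is not the COMPRISE implementation: the `Transcript` type, the `training_weights` function, the `min_confidence` threshold and the idea of a single confidence value per utterance are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    utterance_id: str
    text: str
    confidence: float      # hypothetical reliability score in [0, 1]
    manual: bool = False   # True if a human produced the transcription


def training_weights(transcripts, min_confidence=0.5):
    """Map utterance ids to training weights.

    Manual transcriptions are fully trusted (weight 1.0). Automatic ones
    are weighted by their confidence and dropped below the threshold.
    """
    weights = {}
    for t in transcripts:
        if t.manual:
            weights[t.utterance_id] = 1.0
        elif t.confidence >= min_confidence:
            weights[t.utterance_id] = t.confidence
    return weights
```

A training procedure could then scale each sentence's contribution to the loss by its weight, so that unreliable automatic transcriptions influence the models less than manual ones.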

COMPRISE Weakly Supervised NLU can also be used as a standalone tool or as part of the COMPRISE Platform. It consists of two modules:

• an Automated Sequence Labelling module that allows for automatic or semi-automatic labelling by means of Natural Language Processing technologies;
• a Machine Learning module that combines manually and automatically annotated labels, so that the model gradually learns how the two differ and better predicts the true labels for a production system.
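
One common way to describe how automatic labels differ from manual ones is a confusion (noise) matrix estimated on items that carry both kinds of label. The sketch below shows that general idea only; it is an assumption made for illustration, not a description of how the COMPRISE tool works, and the function name `estimate_noise_matrix` is hypothetical.

```python
from collections import Counter, defaultdict


def estimate_noise_matrix(pairs):
    """Estimate P(automatic label | manual label) from labelled pairs.

    `pairs` is an iterable of (manual_label, automatic_label) tuples for
    items that were annotated both by hand and automatically.
    """
    counts = defaultdict(Counter)
    for manual, automatic in pairs:
        counts[manual][automatic] += 1

    matrix = {}
    for manual, row in counts.items():
        total = sum(row.values())
        # Normalise each row so its probabilities sum to 1.
        matrix[manual] = {auto: n / total for auto, n in row.items()}
    return matrix
```

A model trained on automatically labelled data can use such a matrix to account for systematic labelling errors, for example an automatic labeller that often misses person names.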

Business impact

COMPRISE Weakly Supervised STT and COMPRISE Weakly Supervised NLU are meant for companies in the field of speech and language technologies. Their main value is to reduce the need for manual transcription, data labelling or post-editing.
They are especially useful for small and medium-sized companies, as they allow for a significant reduction in both the labelling cost and the time to market.
These tools are available to everyone as open-source software and can also be licensed to any stakeholder in a proprietary form.

You can access the code and documentation at the following link:
https://www.compriseh2020.eu/software/

Additionally, you can reach us via our social media accounts:

LinkedIn: https://www.linkedin.com/company/comprise-h2020
Twitter: https://twitter.com/compriseh2020