Verbal Multi Word Expressions identification on spoken transcription

Speaker: Nicolas Zampieri

Date and Room: January 16, 2020 at 10:30 – C005


Recent initiatives such as the PARSEME shared task have allowed the rapid development of Multi Word Expression (MWE) identification systems.
Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input.
We study four related questions on verbal MWE (VMWE) identification for Basque, French and Polish:
(a) the use of lemmas and/or surface forms as input features, and the use of word-based or character-based embeddings to represent them;
(b) the use of syntactic information;
(c) the use of a convolutional sub-model to improve the identification of unseen VMWEs;
(d) the results of our system on corpora of transcribed spoken French.