Phylogenetic multi-lingual dependency parsing. We used a phylogenetic tree to guide the learning of multi-lingual dependency parsers, leveraging structural similarities between languages and drawing inspiration from multi-task learning. Experiments on data from the Universal Dependencies project show that phylogenetic training is beneficial to low-resource languages and to well-populated language families. As a by-product of phylogenetic training, our model is able to perform zero-shot parsing of previously unseen languages.
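The overall scheme can be sketched as follows. This is a minimal illustration only: the two-language "Romance" family, the scalar parameter, and the toy `refine` step are all stand-ins for the actual parser and training procedure.

```python
def leaves(tree):
    """Leaf language names of a nested ('name', [children]) tree."""
    name, children = tree
    return [name] if not children else [l for c in children for l in leaves(c)]

def refine(params, examples, lr=0.1):
    """Toy stand-in for parser training: nudge a scalar parameter
    toward each training example (examples are just numbers here)."""
    for x in examples:
        params = params + lr * (x - params)
    return params

def phylo_train(tree, parent_params, data, steps=5):
    """Train top-down along the phylogenetic tree: each node starts
    from its parent's parameters, refines them on the pooled data of
    all languages below it, then recurses into its children."""
    name, children = tree
    pooled = [x for lang in leaves(tree) for x in data.get(lang, [])]
    params = parent_params
    for _ in range(steps):
        params = refine(params, pooled)
    models = {name: params}
    for child in children:
        models.update(phylo_train(child, params, data, steps))
    return models

# Tiny hypothetical family: Romance with two leaf languages.
tree = ("romance", [("spanish", []), ("italian", [])])
data = {"spanish": [1.0, 1.2], "italian": [0.8, 1.0]}
models = phylo_train(tree, 0.0, data)
```

Zero-shot parsing of an unseen language then falls out naturally: a new Romance language can be parsed with the inner-node model `models["romance"]`, which was trained on the whole family.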
Probabilistic end-to-end graph-based semi-supervised learning. We have also worked on graph-based semi-supervised learning in settings where a graph describing the relationships between data points is not available. We propose a method to jointly learn the graph and the parameters of a semi-supervised model within a probabilistic framework. We empirically show that our proposal achieves competitive results on a variety of datasets.
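The idea can be illustrated in its simplest form: parameterize the graph (here a Gaussian kernel with a learnable bandwidth), and learn that parameter jointly with a semi-supervised predictor. The sketch below replaces our probabilistic model with plain label propagation and tunes the bandwidth by numerical gradient descent on a held-out labeled loss; the data, loss, and optimizer are illustrative assumptions, not the method of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(3, 0.3, (10, 2))])
y = np.array([0.0] * 10 + [1.0] * 10)
idx_l = [0, 10]    # the only labels the propagation may use
idx_h = [5, 15]    # held-out labels that supervise the graph parameter

def graph(log_s):
    """Gaussian-kernel graph with a learnable (log) bandwidth."""
    D = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    W = np.exp(-D / np.exp(log_s)) + 1e-12
    np.fill_diagonal(W, 0.0)
    return W

def propagate(W, iters=50):
    """Plain iterative label propagation, clamping the labeled points."""
    P = W / W.sum(1, keepdims=True)
    F = np.zeros(len(X))
    F[idx_l] = y[idx_l]
    for _ in range(iters):
        F = P @ F
        F[idx_l] = y[idx_l]
    return F

def held_out_loss(log_s):
    F = propagate(graph(log_s))
    return ((F[idx_h] - y[idx_h]) ** 2).mean()

# Joint learning, reduced to its simplest form: fit the graph parameter
# by (numerical) gradient descent on the loss of the propagated labels.
log_s, eps, lr = 2.0, 1e-3, 1.0
for _ in range(30):
    g = (held_out_loss(log_s + eps) - held_out_loss(log_s - eps)) / (2 * eps)
    log_s -= lr * g

F = propagate(graph(log_s))
acc = float(((F > 0.5) == (y > 0.5)).mean())
```

The point of the sketch is the coupling: the quality of the propagated labels feeds back into the graph construction, rather than fixing the graph a priori.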
Learning word embeddings from multi-modal inputs. In our second work on this topic, led by our jointly supervised postdoc M. Ailem, we developed a new model which jointly learns word embeddings from text and extracts, from pre-computed visual features, latent visual information that can supplement the linguistic embeddings in modeling the co-occurrence of words and their contexts in a corpus. More specifically, we proposed a novel probabilistic model that formalizes how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple a skip-gram model for co-occurrences in linguistic data with a generative latent variable model for visual data. One advantage of our joint modeling is that it allows two-way interaction. On the one hand, the linguistic information can guide the extraction of latent visual factors. On the other hand, the extracted visual factors can improve the modeling of word-context co-occurrences in text data. Another appealing property of our model is its natural ability to propagate perceptual information, during learning, to the embeddings of words lacking visual features (e.g., abstract words). Extensive experimental studies validated the usefulness of the proposed model on the tasks of assessing pairwise word similarity and image/caption retrieval.
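The coupling can be pictured with a minimal scoring sketch: the probability of a context given a word combines the usual skip-gram embedding term with a contribution from the word's latent visual factor, mapped into the textual space by a shared matrix. All dimensions, matrices, and the softmax parameterization below are illustrative simplifications, not the exact generative model of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, V = 8, 4, 5                    # embedding dim, visual-factor dim, vocab size
U = rng.normal(size=(V, d))          # word ("input") embeddings
C = rng.normal(size=(V, d))          # context ("output") embeddings
A = rng.normal(size=(k, d))          # shared map from visual factors to text space
Z = rng.normal(size=(V, k))          # latent visual factor of each word

def context_probs(w):
    """Skip-gram-style softmax over contexts, visually grounded:
    the score adds a visual term Z[w] @ A to the textual embedding."""
    scores = C @ (U[w] + Z[w] @ A)
    e = np.exp(scores - scores.max())
    return e / e.sum()

p = context_probs(0)
```

Because the map `A` is shared across the vocabulary, a word with no associated images still receives a visual factor consistent with its textual co-occurrences, which is the mechanism behind the propagation to abstract words mentioned above.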
Visits between the two groups
- Melissa Ailem: visit to Magnet – 2 weeks (June 2018) – brainstorming research ideas with A. Bellet, P. Denis and M. Tommasi on privacy and fairness in NLP.
- Melissa Ailem: visit to Magnet – 1 week (November 2018) – working on learning word embeddings under fairness constraints with A. Bellet, P. Denis and M. Tommasi.
During this second year of the project, we have focused most of our effort on the first two research directions above, namely deriving methods to tailor/adapt embeddings to a given NLP task and using richer contexts for learning the embeddings. We have also started working on the new direction outlined in last year's report: the use of multi-modal inputs to learn word embeddings. This progress is summarized below.
Transfer learning of word embeddings. We continued our main long-term joint effort towards understanding and improving transfer learning of representations in natural language processing tasks, in particular during Mathieu Dehouck's 1-month visit to USC. We worked with pairs of 8 main and auxiliary NLP tasks (extended from the tasks mentioned in the previous year's report). More specifically, we looked at transfer learning from low-level tasks (such as part-of-speech tagging, named entity recognition, chunking, word polarity classification) to high-level tasks (e.g., semantic relatedness, textual entailment, sentiment analysis). In contrast to a common belief in the NLP community that transfer learning between these tasks should be possible, we discovered that the widely-used technique in which word representations act as a medium of transfer only leads to limited improvements. These results were presented by Fei Sha at the Inria@SiliconValley workshop (BIS'2017), and a first version was submitted to a major NLP conference, where it was considered promising (despite containing mainly negative results) but too preliminary. This work will be continued in the third year of the project, aiming for a new submission.
Online learning of task-specific word representations. We have finalized our work on learning the word embeddings and the classifier jointly in an online fashion. The problem is formalized as solving a bi-convex constrained optimization problem at each iteration with a Passive-Aggressive online learning algorithm. We provided a theoretical analysis of the algorithm and evaluated it on NLP classification problems (text classification, sentiment analysis). The empirical results show that our approach can greatly improve upon pre-trained word embeddings.
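The alternating structure of the bi-convex problem can be sketched as two Passive-Aggressive half-steps per round: one on the classifier with the embeddings fixed, one on the embeddings of the example's words with the classifier fixed. The document representation (mean of word embeddings), the PA-I step sizes, and the toy data below are illustrative assumptions; the actual constrained formulation and its analysis are in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, V = 5, 20
E = rng.normal(scale=0.1, size=(V, d))   # word embeddings
w = np.zeros(d)                           # linear classifier

def pa_round(word_ids, y, C=1.0):
    """One online round on a document (bag of word ids) with label y in {-1,+1}."""
    global w
    x = E[word_ids].mean(0)               # document = mean of its word embeddings
    # 1) classifier half-step (PA-I), embeddings fixed
    loss = max(0.0, 1.0 - y * (w @ x))
    tau = min(C, loss / (x @ x + 1e-12))
    w = w + tau * y * x
    # 2) embedding half-step, classifier fixed: shifting every word of
    # the document by tau*y*w shifts the document mean by tau*y*w
    loss = max(0.0, 1.0 - y * (w @ x))
    tau = min(C, loss / (w @ w + 1e-12))
    E[word_ids] += tau * y * w

# Toy task with disjoint positive/negative vocabularies.
pos_docs = [[0, 1, 2], [1, 2, 3], [0, 3, 4]]
neg_docs = [[5, 6, 7], [6, 7, 8], [5, 8, 9]]
for _ in range(10):
    for ids in pos_docs:
        pa_round(ids, +1.0)
    for ids in neg_docs:
        pa_round(ids, -1.0)
```

Each half-step is a convex Passive-Aggressive update, which is what makes the overall round tractable despite the joint problem being only bi-convex.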
Word embeddings for cross-lingual parsing. We have finalized our work on cross-lingual dependency parsing, jointly learning language-universal word embeddings from morpho-syntactic information; these embeddings can be trained on any set of languages and are learned from structural, dependency-based contexts. This work is thus in line with our first and second research objectives. Combined with standard language-specific features, such embeddings can achieve significant improvements over monolingual baselines.
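Dependency-based contexts can be sketched as follows, in the spirit of the standard construction: each dependency arc yields one (word, context) pair in each direction, replacing the linear bag-of-words window of plain skip-gram. The exact context features used by our model differ; this is only the general recipe.

```python
def dep_contexts(arcs):
    """One (word, context) pair per direction of each (head, label,
    dependent) dependency arc."""
    pairs = []
    for head, label, dep in arcs:
        pairs.append((head, label + "_" + dep))    # head sees its dependent
        pairs.append((dep, label + "I_" + head))   # dependent sees its head (inverse)
    return pairs

arcs = [("discovers", "nsubj", "scientist"), ("discovers", "dobj", "star")]
pairs = dep_contexts(arcs)
```

Such contexts are more syntactic and less topical than window-based ones, which is what makes them suitable for a parsing task.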
Learning word embeddings from multi-modal inputs. As planned, we have started investigating this new direction by combining word embeddings with visual information (images). Although the main goal of this work is to improve zero-shot learning of visual recognition models, it does so by projecting the class semantic representation (typically the word representation of the class name) and the visual examples into a common space, where the transformed word representation can predict well the visual exemplars of the class.
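The common-space idea can be sketched with the simplest possible instantiation: learn a linear map from class word embeddings to mean visual features of seen classes (here by least squares), then recognize an unseen class by matching its mapped embedding to visual exemplars. The synthetic data and the least-squares map are illustrative assumptions, not our actual projection model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_w, d_v = 6, 4                      # word-embedding and visual-feature dims

# Synthetic seen classes: visual prototypes are a noisy linear image
# of the class-name embeddings (the assumption the sketch exploits).
M_true = rng.normal(size=(d_w, d_v))
seen_emb = rng.normal(size=(8, d_w))
seen_vis = seen_emb @ M_true + 0.01 * rng.normal(size=(8, d_v))

# Learn the word-to-visual map on seen classes (least squares).
M, *_ = np.linalg.lstsq(seen_emb, seen_vis, rcond=None)

# Zero-shot step: map unseen class names, match to their visual exemplars.
unseen_emb = rng.normal(size=(2, d_w))
unseen_vis = unseen_emb @ M_true          # exemplars of the unseen classes
pred = unseen_emb @ M
dists = ((unseen_vis[:, None] - pred[None]) ** 2).sum(-1)
labels = dists.argmin(1)                  # nearest mapped embedding per exemplar
```

The classifier for an unseen class thus needs only its class name's word embedding, which is the sense in which word representations drive the zero-shot recognition.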
Release of the Mangoes toolbox. We have released the first version of Mangoes (referred to as Magneto in last year’s report), an open-source Python toolbox for constructing and evaluating word embeddings. While this toolbox was developed by Magnet in the context of an ADT (Action de Développement Technologique), we expect it to serve as a joint development platform for Magnet and Fei Sha’s team in the future.
Visits between the two groups
- Mathieu Dehouck: visit to USC – 1 month (September 2017) – working on the transfer learning of word embeddings project, reviewing recent literature and discussing follow-up directions.
- Aurélien Bellet, Pascal Denis: visit to USC – 1 week (December 2017) – finalizing the work on transfer learning, and working with the new joint postdoc Melissa Ailem.
During this first year of the project, we have focused most of our effort on our first research direction above, namely to derive methods to tailor word embeddings to a given NLP task, either by adapting existing embeddings or by learning task-specific embeddings from scratch.
The main long term joint effort between Magnet and F. Sha’s team has been towards understanding how natural language processing tasks are related to each other in the context of word representations. Building on the significant experience of F. Sha’s team in transfer learning and of Magnet in linguistic representations, this ongoing work investigates two questions. First, starting from word embeddings that are learned in an unsupervised manner from large text corpora, how well does an adaptation of such embeddings to one task perform in another task? Second, potentially leveraging the knowledge about task relatedness, how can we learn high-quality task-agnostic word embeddings that are meaningful for various tasks at hand and/or easily transferable to a new task? During M. Dehouck’s 1-month visit to UCLA and S. Changpinyo’s 10-day visit to Magnet, several existing techniques for adapting word embeddings to multiple NLP tasks were reviewed and empirically compared, and a novel adaptation technique was proposed. Focusing on a set of semantic and syntactic NLP tasks (part-of-speech tagging, sentiment analysis, named entity recognition and dependency parsing), we started to conduct experiments to assess how initial and adapted word embeddings perform on those tasks and analyze task relatedness. This work will be continued in the second year of the project, aiming for a submission at a major NLP conference.
We have also worked on several more targeted problems. First, we have proposed a semi-supervised approach to learn word embeddings for the task of implicit discourse relation identification. The resulting representations outperform off-the-shelf word embeddings and achieve state-of-the-art performance on this problem. Second, we tackled the problem of cross-lingual dependency parsing by jointly learning language-universal word embeddings from morpho-syntactic information which can be trained on any set of languages. It is worth noting that these embeddings are learned based on structural, dependency-based contexts, in line with our second research objective. Combined with standard language-specific features, such embeddings can achieve significant improvements over monolingual baselines. Lastly, we have proposed approaches to adapt word embeddings for text classification tasks by learning an earth mover's distance and by joint online learning of the embeddings and the classifier.
Finally, during the Master internship of T. Liétard, we have started working towards our third research objective by learning a weighted similarity graph between entities for the problem of coreference resolution, incorporating both task-specific features and off-the-shelf word embeddings.
We conclude by mentioning that in 2016, Magnet started to develop, through an ADT (Action de Développement Technologique), a software package called Magneto dedicated to learning and evaluating word embeddings. We expect Magneto to serve as a joint development platform for Magnet and F. Sha's team towards the third year of LEGO.
Visits between the two groups
- A. Bellet, P. Denis: visit to UCLA – 1 week (July 2016) – defining research priorities in preparation of the student visits, half-day workshop on vision+language.
- M. Dehouck: visit to UCLA – 1 month (September 2016) – multi-task word embeddings, talk on cross-lingual dependency parsing.
- S. Changpinyo: visit to Magnet – 10 days (October 2016) – multi-task word embeddings, talk on zero-shot learning.