Claire Gardent: “Text and Knowledge Graphs: Similarity and Generation”

For a new seminar of the DAML pole of LIX, we are pleased to welcome Claire Gardent, senior researcher at CNRS, who will speak on “Text and Knowledge Graphs: Similarity and Generation”.

The presentation will be held on Tuesday, April 29 at 9 AM. You can attend either in person, in room Henri Poincaré (ground floor, Alan Turing building), or online on Webex at the link

Abstract:

Models that can estimate the similarity between a text and a knowledge graph (KG) are useful both to evaluate and to guide KG-to-Text generation models.
In this talk, I will introduce three different ways of learning such models.

I will start by presenting an embedding model for (KG, English Text) pairs, showing (i) that although this model is trained on silver data, it learns aligned representations that are suitable for retrieval and (ii) that fine-tuning it on gold data yields a similarity metric that outperforms or matches state-of-the-art metrics in terms of correlation with human judgments even though, unlike them, it does not require a reference text to compare against.

The second part of the talk will focus on showing how a similar model, trained on multilingual gold data, can be used to guide KG-to-Text generation using Direct Preference Optimisation (DPO). We will show that DPO-guided models trained on the preference data created using our KG-Text similarity model generalise better than models trained with standard fine-tuning.

Finally, I will present a multilingual evaluation framework that is reference-less (no need for test data) and that is more fine-grained than the other two models in that it permits estimating how much a KG-to-Text model under- or over-generates with respect to the input knowledge graph. Focusing on two high-resource (English, Russian) and five low-resource (Breton, Irish, Maltese, Welsh, Xhosa) languages, I will show that our metric has fair to moderate correlation with reference-based metrics, positioning it as a consistent alternative when no references are available. I will also show that our metric outperforms prior reference-less metrics in terms of correlation with existing human judgments and that additional human evaluation shows moderate to strong correlation with human annotators in assessing precision and recall.

Presentation slides can be found here.

Short bio:

Claire Gardent is a senior research scientist at the French National Center for Scientific Research (CNRS), based at the LORIA Computer Science research unit in Nancy, France.
She has worked on syntactic, semantic and discourse parsing, on question answering, on human-machine dialog and on computer-assisted language learning, and, more recently, on natural language generation. In 2017, she launched the WebNLG challenge, a shared task where the goal is to generate text from DBpedia data. She has proposed neural models for simplification and summarisation; for long-form question answering over multi-document input; for multilingual generation from Abstract Meaning Representations; and for response generation for dialog.
In 2022, she was awarded the CNRS Silver Medal and was selected by the Association for Computational Linguistics (ACL) as an ACL Fellow.
