Internal Team Seminar: Kun Zhang

We continue our series of team seminars with our own Kun Zhang, who will present his PhD thesis titled “Contributions to Evaluating and Improving the Faithfulness for Text Generation” as a rehearsal for his PhD defense.
The seminar will take place on Monday, the 14th of April, 3-4pm, in the Thomas Flowers room on the 1st floor (close to our offices). The presentation will last approximately 45 minutes, followed by questions and feedback for Kun.

Abstract:

Ensuring factual faithfulness in text generation is a critical challenge, especially in knowledge-grounded tasks where the generated text must accurately reflect the content of structured or unstructured input data. While modern text generation models achieve strong fluency and coherence, they frequently produce hallucinated outputs, for instance by ignoring crucial details, misinterpreting relationships, or introducing contradictions. To address these issues, this thesis presents new methods for both evaluating and improving the factual faithfulness of text generation.

The first contribution of the thesis is FactSpotter, a novel metric designed to assess faithfulness in graph-to-text generation. Unlike approaches based on n-grams or language-model embeddings, FactSpotter evaluates whether each key fact in the input knowledge graph is correctly verbalized in the generated text. It leverages a self-supervised classifier to distinguish between faithful and unfaithful data representations. Beyond triple-level evaluation for graph-to-text generation, FactSpotter can also be integrated into the beam search process as a soft constraint, encouraging more faithful text generation without compromising fluency.
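To give a concrete flavour of how such a soft constraint can operate during decoding, here is a minimal sketch that re-ranks beam hypotheses by combining the model's log-probability with a faithfulness bonus. The faithfulness_score callable and all other names below are illustrative placeholders, not the FactSpotter implementation itself.

import math
from typing import Callable

Triple = tuple[str, str, str]

def rescore_beam(hypotheses: list[tuple[str, float]],
                 facts: list[Triple],
                 faithfulness_score: Callable[[str, list[Triple]], float],
                 weight: float = 1.0) -> list[tuple[str, float]]:
    """Re-rank (text, log_prob) hypotheses by adding a weighted faithfulness bonus."""
    rescored = []
    for text, log_prob in hypotheses:
        # Hypothetical classifier score in [0, 1]; clipped to avoid log(0).
        bonus = weight * math.log(max(faithfulness_score(text, facts), 1e-9))
        rescored.append((text, log_prob + bonus))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy usage: a stand-in scorer that checks whether each triple's object appears in the text.
facts = [("Alan_Turing", "birthPlace", "London")]
scorer = lambda text, fs: sum(obj.replace("_", " ") in text for _, _, obj in fs) / len(fs)
beams = [("Alan Turing was a mathematician.", -3.8), ("Alan Turing was born in London.", -4.2)]
print(rescore_beam(beams, facts, scorer))  # the hypothesis mentioning London is ranked first

A real soft constraint would apply the faithfulness signal at each decoding step rather than only once at the end; this sketch collapses that into a single re-ranking pass for brevity.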

Second, this thesis explores cross-text factual consistency verification, assessing whether multiple texts convey the same factual content. A structured discourse representation format is introduced to address the limitations of traditional triple-based representations. This format captures not only richer atomic details, such as direct and indirect objects, adverbials, and complements, but also discourse-level relations, including temporality, comparison, and contingency. A new dataset, DiscInfer, is annotated to train entailment-based models to detect factual inconsistencies at both the atomic and discourse levels.
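As a rough illustration of the entailment-based verification idea, the sketch below checks atomic units taken from one text against another text and flags any unit that is not entailed. It assumes the units have already been extracted; the entails scorer and the word-overlap stand-in are placeholders for illustration, not the model trained on DiscInfer.

from typing import Callable

def find_unsupported_units(source: str,
                           target_units: list[str],
                           entails: Callable[[str, str], float],
                           threshold: float = 0.5) -> list[str]:
    """Return the target units whose content is not entailed by the source text."""
    return [unit for unit in target_units if entails(source, unit) < threshold]

# Toy usage with a word-overlap stand-in for a trained entailment model.
def overlap(premise: str, hypothesis: str) -> float:
    p = set(premise.lower().rstrip(".").split())
    h = set(hypothesis.lower().rstrip(".").split())
    return len(p & h) / len(h)

source = "The museum reopened in 1995 after a two-year renovation."
units = ["The museum reopened in 1995.", "The renovation lasted five years."]
print(find_unsupported_units(source, units, overlap))  # flags the unsupported duration claim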

Empirical results show that the methods proposed in the thesis improve both factual consistency evaluation and the faithfulness of natural language generation. FactSpotter-guided decoding effectively mitigates hallucinations in graph-to-text generation, while the structured discourse representation strengthens factual consistency verification between texts. Together, these contributions help build more trustworthy text generation systems and offer insights into improving fact-grounded generation across various applications.