PhD Defense: Shrey Mishra

Multimodal Extraction of Proofs and Theorems from the Scientific Literature

Thursday, July 4th, 2024 at 13:45 in Amphi Jaurès, 29 rue d’Ulm, Paris

The thesis will be defended before a committee composed of:

  • Elena Cabrio (Université Côte d’Azur), Reviewer
  • Mohammed Hasanuzzaman (Queen’s University Belfast), Examiner
  • Jean Ponce (ENS-PSL & NYU), Examiner
  • Pierre Senellart (ENS-PSL), PhD advisor
  • Fabian Suchanek (Télécom Paris, IP Paris), Reviewer
  • Eric Villemonte de la Clergerie (Inria Paris), Examiner

This thesis examines the extraction of mathematical statements and proofs from scholarly PDF articles, approaching it as a multimodal classification problem. It is part of the broader TheoremKB project, which seeks to convert the scientific literature into a comprehensive, open-access knowledge base of mathematical statements and their proofs. The research draws on a range of techniques, from traditional machine learning to deep learning architectures, including LSTMs, CNNs, object detectors, CRFs, and transformers.

The study uses a novel combination of text, font characteristics, and bitmap images of PDF pages as separate input modalities. It proposes a modular, sequential, multimodal machine learning strategy that incorporates a cross-modal attention mechanism to produce multimodal paragraph embeddings. These embeddings are processed by a novel multimodal sliding-window transformer architecture that captures sequential information across paragraphs. The approach does not rely on Optical Character Recognition (OCR) preprocessing, on LaTeX sources during inference, or on custom pre-training with specialized losses, which makes it well suited to multi-page documents and page breaks, both typical of complex scientific and mathematical texts.
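As a rough illustration of the sliding-window idea only, the sketch below splits a document-level sequence of paragraphs into overlapping windows that ignore page boundaries. It is not the architecture from the thesis: the function name `sliding_windows`, the `size` and `stride` parameters, and the choice to align the last window with the end of the document are all assumptions made for this example, and the strings stand in for paragraph embeddings.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(items: Sequence[T], size: int, stride: int = 1) -> List[List[T]]:
    """Split a sequence into overlapping windows of at most `size` items.

    Hypothetical helper: `items` would hold one entry per paragraph, in
    reading order across the whole document, regardless of page breaks.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not items:
        return []
    if len(items) <= size:
        # Short documents fit in a single window.
        return [list(items)]

    windows: List[List[T]] = []
    start = 0
    while start + size < len(items):
        windows.append(list(items[start:start + size]))
        start += stride
    # Assumption: align the final window with the end of the document
    # so that trailing paragraphs are always covered.
    windows.append(list(items[len(items) - size:]))
    return windows


# Placeholder paragraphs spanning two pages; the page break is invisible here.
paragraphs = ["p1-a", "p1-b", "p1-c", "p2-a", "p2-b", "p2-c", "p2-d"]
windows = sliding_windows(paragraphs, size=3, stride=2)
```

In a setup like this, each window would be passed to a sequence model so that a paragraph's label can depend on its neighbours, including neighbours on the previous or next page.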

The findings indicate a marked performance improvement when moving from unimodal to multimodal processing and integrating sequential paragraph modelling, underscoring the effectiveness of the proposed method for handling intricate scholarly documents.
