Software and Resources

ALMAnaCH tools and resources installation tool

alpi (ALMAnaCH Linguistic Processing Installer)

alpi is a Perl script for installing some of ALMAnaCH’s tools and resources locally. It can also be used to install ALMAnaCH’s complete processing pipeline for French, based on SxPipe and FRMG (see below). The alpi script automatically detects and installs dependencies.

Software and tools

alpc (ALMAnaCH Linguistic Processing Chain)

The ALMAnaCH team develops and maintains a complete linguistic processing chain for French (see our online demo). It relies on DyALog, FRMG, Lefff and SxPipe (see below).

DyALog

DyALog is an environment for compiling and using logic programs and tabular parsers for natural languages, which supports a variety of formal grammars (DCG, TAG, TIG, RCG).
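To make "tabular parsing" concrete: a tabular parser stores the results for each sub-span of the input in a table, so that each partial analysis is computed only once. The sketch below is a CKY recognizer for a context-free grammar in Chomsky normal form. It is a generic textbook illustration of the tabular idea, not DyALog's algorithm or API, and the toy grammar and lexicon (`BINARY`, `LEXICON`) are hypothetical.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
# Binary rules: (B, C) -> set of parents A such that A -> B C.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules: word -> set of preterminal categories.
LEXICON = {
    "le": {"Det"},
    "chat": {"N"},
    "voit": {"V"},
    "chien": {"N"},
}


def cky_recognize(words, start="S"):
    """Return True if `words` can be derived from `start`."""
    n = len(words)
    # table[i][j] holds the nonterminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(w, ()))
    # Fill longer spans from shorter ones; each cell is computed once.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]
```

For example, `cky_recognize("le chat voit le chien".split())` returns `True`, while a scrambled order such as `"chat le voit"` is rejected.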

SYNTAX

SYNTAX is a set of tools for the automatic creation of parsers based on grammar descriptions. Grammar formalisms handled by SYNTAX range from CFGs (both deterministic and ambiguous) to TAGs, LFGs, RCGs, and others.

SxPipe

SxPipe is a modular and customizable processing chain that applies a cascade of surface processing steps to raw corpora (tokenisation, wordform detection, non-deterministic spelling correction…). It is used as a preliminary step before ALMAnaCH’s parsers (e.g., FRMG) and for surface processing tasks (named entity recognition, text normalization, unknown word extraction and processing…).
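The idea of a modular cascade can be sketched as a list of interchangeable steps, each consuming the output of the previous one. This is a minimal illustration under assumed interfaces: the functions `tokenize`, `normalize_spelling`, `tag_numbers` and `run_cascade`, the `_NUM` placeholder and the spelling table are all hypothetical and do not reflect SxPipe's actual modules, formats or command-line usage. The spelling step here is deterministic, unlike SxPipe's non-deterministic correction.

```python
import re
from typing import Callable, List

# Hypothetical interface: every step maps a token list to a new token list.
Step = Callable[[List[str]], List[str]]

# Hypothetical normalisation table for noisy spellings.
SPELLING = {"tjrs": "toujours", "bcp": "beaucoup"}


def tokenize(text: str) -> List[str]:
    # Rough tokenizer: elided forms like "l'", words, then single punctuation marks.
    return re.findall(r"\w+'|\w+|[^\w\s]", text)


def normalize_spelling(tokens: List[str]) -> List[str]:
    return [SPELLING.get(t, t) for t in tokens]


def tag_numbers(tokens: List[str]) -> List[str]:
    # Replace purely numeric tokens with a placeholder (hypothetical convention).
    return ["_NUM" if t.isdigit() else t for t in tokens]


def run_cascade(text: str, steps: List[Step]) -> List[str]:
    tokens = tokenize(text)
    for step in steps:  # steps can be added, removed or reordered freely
        tokens = step(tokens)
    return tokens
```

Usage: `run_cascade("Il a 3 chats, tjrs", [normalize_spelling, tag_numbers])` yields `["Il", "a", "_NUM", "chats", ",", "toujours"]`. Keeping each step behind the same signature is what makes such a chain customizable.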

MElt

MElt is a freely available (LGPL) statistical labeller designed in particular to train morphosyntactic analysers (part-of-speech taggers) on annotated corpora and external lexicons. MElt is distributed with a state-of-the-art POS tagging model for French, as well as with models for other languages (English, Italian, Spanish, German). MElt also includes a normalisation wrapper targeted towards the analysis of noisy texts (e.g. web-based User-Generated Content; only available for French and English).

MetaGrammar Toolkit

The MetaGrammar Toolkit provides several tools for developing and compiling TAG metagrammars. It also includes a large-scale metagrammar for French, on which the FRMG parser is based.

Lexical resources

Alexina

Atelier pour les LEXiques INformatiques et leur Acquisition (a workbench for computational lexicons and their acquisition): development of morphological and syntactic lexicons for NLP. It includes various tools as well as the lexicons Lefff (French), Leffe (Spanish), Pollex (Polish), Sklex (Slovak), DeLex (German), PerLex (Persian), KurLex (Kurmanji Kurdish) and SoraLex (Sorani Kurdish). The morphological lexicons for Dutch and Italian, distributed respectively within the Alpino project and as the Morph-it! lexicon, have also been imported into the Alexina architecture.

WOLF

WOLF (Wordnet Libre du Français) is a freely available semantic lexicon (Wordnet) for French.

UDLexicons

UDLexicons is a collection of morphological lexicons in the CoNLL-UL format, an extension of the CoNLL-U format proposed in (More et al. 2018) in the context of the Universal Dependencies initiative.
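As a reminder of the base format that CoNLL-UL extends, the sketch below reads one CoNLL-U token line: ten tab-separated columns, with `_` for empty values and morphological features written as `Name=Value` pairs joined by `|`. It covers only these base CoNLL-U columns; the lattice-specific conventions of CoNLL-UL are not modelled, and the function names are hypothetical.

```python
from typing import Dict

# The ten CoNLL-U columns, in order.
COLUMNS = ["id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc"]


def parse_feats(s: str) -> Dict[str, str]:
    """Turn 'Gender=Masc|Number=Plur' into a dict; '_' means no features."""
    if s == "_":
        return {}
    return dict(item.split("=", 1) for item in s.split("|"))


def parse_conllu_line(line: str) -> Dict[str, object]:
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    row: Dict[str, object] = dict(zip(COLUMNS, cols))
    row["feats"] = parse_feats(cols[5])
    return row
```

For a lexicon-style entry such as `"1\tchats\tchat\tNOUN\t_\tGender=Masc|Number=Plur\t_\t_\t_\t_"`, the parsed `feats` value is `{"Gender": "Masc", "Number": "Plur"}`.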

EtymDB

EtymDB is an etymological database extracted from Wiktionary.

Corpora

French Social Media Bank

Treebank made up of data extracted from social networks (Facebook, Twitter) and discussion forums (Doctissimo, JeuxVideos.com). The main value of this corpus is that it provides annotated data for texts ranging from average quality to very noisy.

Sequoia Treebank

Treebank of 3,200 sentences taken from Europarl, the L’Est Républicain corpus, the French Wikipedia, and the European Medicines Agency (documents extracted from the EMEA corpus).
Each sentence was manually annotated for part-of-speech categories and phrase structure, following the French TreeBank annotation guidelines.
The constituency trees were then automatically converted into surface dependency trees.
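A common way to convert constituency trees into dependency trees is head percolation: each phrase selects a head child according to head rules, and the lexical heads of the other children become dependents of the phrase's lexical head. The sketch below illustrates that generic technique only; it is not the converter used for Sequoia, and the tree encoding and the simplified `HEAD_RULES` table are hypothetical. Dependency labels are omitted.

```python
from typing import Any, Dict, List

# Hypothetical tree encoding: a leaf is (pos, word, index),
# an internal node is (label, [children]).
HEAD_RULES: Dict[str, List[str]] = {
    "SENT": ["VN", "V"],
    "VN": ["V"],
    "NP": ["NC", "NPP"],
    "PP": ["P"],
}


def find_head(tree: Any) -> Any:
    """Return the lexical head (a leaf) of a subtree."""
    if len(tree) == 3:  # leaf
        return tree
    label, children = tree
    for preferred in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == preferred:
                return find_head(child)
    return find_head(children[0])  # fallback: leftmost child


def to_dependencies(tree: Any, deps: Dict[int, int]) -> None:
    """Record head index for every non-head child's lexical head."""
    if len(tree) == 3:
        return
    head = find_head(tree)
    for child in tree[1]:
        child_head = find_head(child)
        if child_head != head:
            deps[child_head[2]] = head[2]
        to_dependencies(child, deps)


def convert(tree: Any) -> Dict[int, int]:
    """Map each word index to its governor's index (0 = root)."""
    deps: Dict[int, int] = {}
    to_dependencies(tree, deps)
    deps[find_head(tree)[2]] = 0
    return deps
```

For `("SENT", [("NP", [("DET", "Le", 1), ("NC", "chat", 2)]), ("VN", [("V", "dort", 3)])])`, `convert` gives `{1: 2, 2: 3, 3: 0}`: "Le" depends on "chat", which depends on the root verb "dort".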

OSCAR / goclassy

Huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus.
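Language classification and filtering of a web crawl can be sketched as a line-by-line pass that discards short lines, classifies the rest, and keeps only confident predictions, grouped per language. This is a hedged illustration rather than goclassy's code: the classifier is passed in as any callable returning a language code and a confidence (in practice a trained language identifier; the toy one in the usage note is hypothetical), and the `min_chars` and `min_conf` thresholds are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Any callable returning (language_code, confidence); an assumption here.
Classifier = Callable[[str], Tuple[str, float]]


def split_by_language(lines: Iterable[str], classify: Classifier,
                      min_chars: int = 100,
                      min_conf: float = 0.8) -> Dict[str, List[str]]:
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if len(line) < min_chars:
            continue  # drop short lines such as menus (threshold assumed)
        lang, conf = classify(line)
        if conf >= min_conf:  # keep only confident predictions
            buckets[lang].append(line)
    return dict(buckets)
```

Usage with a hypothetical toy classifier: `split_by_language(["le chat dort sur le tapis", "the cat", "ok"], lambda s: ("fr", 0.95) if " le " in s else ("en", 0.5), min_chars=5)` keeps only the French line, returning `{"fr": ["le chat dort sur le tapis"]}`.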
