OSCAR / goclassy

OSCAR is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture. The goclassy architecture is freely distributed here, and the corpus can be downloaded here.

If you use OSCAR or goclassy, please cite this paper.
