Title

The OPUS corpus - parallel and free: http://logos.uio.no/opus

Author(s)

Jörg Tiedemann (1), Lars Nygaard (2)

(1) Department of Linguistics and Philology, Uppsala University, Box 635, S-751 26 Uppsala, Sweden, joerg@stp.ling.uu.se; (2) Tekstlaboratoriet HF, University of Oslo, Postboks 1102 Blindern, 0317 Oslo, lars.nygaard@ilf.uio.no

Session

P12-W

Abstract

The OPUS corpus is a growing collection of translated documents collected from the internet. The current version contains about 30 million words in 60 languages. The entire corpus is sentence aligned, and it also contains linguistic markup for some of the languages.

Keyword(s)

Parallel corpora, web corpus, open source

Language(s)

60 languages

Full Paper

320.pdf