Proceedings of the 12th Web as Corpus Workshop
LREC 2020 Workshop
11 May 2020 - 16 May 2020, 8 papers
1
Current Challenges in Web Corpus Building
Miloš Jakubíček, Vojtěch Kovář, Pavel Rychlý, Vít Suchomel
pp. 1-4 DOI: 10.63317/5hpusmwfudpu
2
Out-of-the-Box and into the Ditch? Multilingual Evaluation of Generic Text Extraction Tools
Adrien Barbaresi, Gaël Lejeune
pp. 5-13 DOI: 10.63317/3vjqvd9vwtch
3
From Web Crawl to Clean Register-Annotated Corpora
Veronika Laippala, Samuel Rönnqvist, Saara Hellström, Juhani Luotolahti, Liina Repo, Anna Salmela, Valtteri Skantsi, Sampo Pyysalo
pp. 14-22 DOI: 10.63317/3ewgn53qox7h
4
Building Web Corpora for Minority Languages
Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén
pp. 23-32 DOI: 10.63317/29i28sv6ykra
5
The ELTE.DH Pilot Corpus – Creating a Handcrafted Gigaword Web Corpus with Metadata
Balázs Indig, Árpád Knap, Zsófia Sárközi-Lindner, Mária Timári, Gábor Palkó
pp. 33-41 DOI: 10.63317/3zmqt8f6rop7
6
Hypernym-LIBre: A Free Web-based Corpus for Hypernym Detection
Shaurya Rawat, Mariano Rico, Oscar Corcho
pp. 42-49 DOI: 10.63317/5ipbwy62x4z7
7
A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging
Shabnam Behzad, Amir Zeldes
pp. 50-56 DOI: 10.63317/2uksnupeninw
8
Streaming Language-Specific Twitter Data with Optimal Keywords
Tim Kreutz, Walter Daelemans
pp. 57-64 DOI: 10.63317/2cxom8fhdgb5