Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024

LREC-COLING 2024 Workshop

20 May 2024 - 25 May 2024, 15 papers
01

On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Alignment of Embeddings

Guillem Ramírez, Rumen Dangovski, Preslav Nakov, Marin Soljacic

pp. 1-11 DOI: 10.63317/2bj7yqvykabj
02

Modeling Diachronic Change in English Scientific Writing over 300+ Years with Transformer-based Language Model Surprisal

Julius Steuer, Marie-Pauline Krielke, Stefan Fischer, Stefania Degaetano-Ortlieb, Marius Mosbach, Dietrich Klakow

pp. 12-23 DOI: 10.63317/3ubzf7o8hspf
03

PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese

Tomás Freitas Osório, Bernardo Leite, Henrique Lopes Cardoso, Luís Gomes, João Rodrigues, Rodrigo Santos, António Branco

pp. 24-34 DOI: 10.63317/58dx569uus43
04

Invited Talk: The Way Towards Massively Multilingual Language Models

François Yvon

DOI: 10.63317/4ba4ywtdjize
05

Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets

Zi Long, ZhenHao Tang, Xianghua Fu, Jian Chen, Shilong Hou, Jinze Lyu

pp. 36-50 DOI: 10.63317/38c634uuaze7
06

Exploring the Potential of Large Language Models in Adaptive Machine Translation for Generic Text and Subtitles

Abdelhadi Soudi, Mohamed Hannani, Kristof Van Laerhoven, Eleftherios Avramidis

pp. 51-58 DOI: 10.63317/4yv94dgocrq2
07

INCLURE: a Dataset and Toolkit for Inclusive French Translation

Paul Lerner, Cyril Grouin

pp. 59-68 DOI: 10.63317/3bsugstpj7gs
08

BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation

Sourav Saha, Zeshan Ahmed Nobin, Mufassir Ahmad Chowdhury, Md. Shakirul Hasan Khan Mobin, Mohammad Ruhul Amin, Sudipta Kar

pp. 69-84 DOI: 10.63317/52ohvnsga2w8
09

Creating Clustered Comparable Corpora from Wikipedia with Different Fuzziness Levels and Language Representativity

Anna Laskina, Eric Gaussier, Gaelle Calvary

pp. 85-93 DOI: 10.63317/2wbzsfwrk7mr
10

EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research

Marc Kupietz, Piotr Banski, Nils Diewald, Beata Trawinski, Andreas Witt

pp. 94-103 DOI: 10.63317/48r4f5pbusfm
11

Building Annotated Parallel Corpora Using the ATIS Dataset: Two UD-style treebanks in English and Turkish

Neslihan Cesur, Aslı Kuzgun, Mehmet Kose, Olcay Taner Yıldız

pp. 104-110 DOI: 10.63317/3s2ejhffugci
12

Bootstrapping the Annotation of UD Learner Treebanks

Arianna Masciolini

pp. 111-117 DOI: 10.63317/4skthmvb3xzu
13

SweDiagnostics: A Diagnostics Natural Language Inference Dataset for Swedish

Felix Morger

pp. 118-124 DOI: 10.63317/4obkf9gx7gnm
14

Multiple Discourse Relations in English TED Talks and Their Translation into Lithuanian, Portuguese and Turkish

Deniz Zeyrek, Giedrė Valūnaitė Oleškevičienė, Amalia Mendes

pp. 125-134 DOI: 10.63317/5hpcf7w3vjoz
15

mini-CIEP+ : A Shareable Parallel Corpus of Prose

Annemarie Verkerk, Luigi Talamo

pp. 135-143 DOI: 10.63317/2dzt2k9ufksi