
Coordinate Structure Extraction for Patent Claims Using Multilingual LLMs

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/36wbpiacwyxf

Abstract

This study proposes a simple, one-stage approach to coordinate structure extraction using multilingual Large Language Models (LLMs) with Translation between Augmented Natural Languages (TANL), with the goal of developing an error detection system for coordinate structure translation. Unlike conventional multi-component methods such as CoRec, our method employs an end-to-end Transformer decoder (LLM) trained via Continual Pre-Training (CPT) and/or Supervised Fine-Tuning (SFT) on English and Japanese datasets obtained from parsed treebanks that include coordinate structures. We evaluated the proposed models on 100 English and Japanese patent claims manually annotated with coordinate structure tags. On the English task, the proposed method using open-weight models such as Llama-3.2-8B or gemma-3-4b-it significantly outperformed GPT-5 and CoRec by approximately 0.02-0.03 in F1 score. On the Japanese task, the proposed method using open-weight models such as llama-3-youko-8b and Llama-3-swallow-8B-0.1v significantly outperformed GPT-5 by approximately 0.02-0.05 in F1 score. In addition, models trained on both English and Japanese data significantly outperformed those trained on monolingual data only.

Details

Paper ID
lrec2026-main-387
Pages
pp. 4931-4941
BibKey
ishimaru-etal-2026-coordinate
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Tsukasa Ishimaru

  • Takehito Utsuro

  • Masaaki Nagata

Links