
AnandaSky: A Vision–Language Model for Line-Level Transcription of Historical Sinographic Documents

Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @ LREC 2026

DOI:10.63317/3pk7cv8hxzod

Abstract

We present AnandaSky, a vision–language model for line-level transcription of historical sinographic documents. The model combines a compact high-resolution visual encoder with global attention, 10 px patches, an uncompressed visual prefix, and a Qwen3-0.6B autoregressive decoder. It is trained at scale on 4M annotated lines from documents produced in China and Korea between the 8th and 20th centuries. Across in-domain and held-out public benchmarks, AnandaSky achieves a character error rate (CER) below 1% on five of eight datasets, sets a new state of the art on MTHv2 with 0.92% CER, and shows strong transfer to unseen collections. For EvaHan 2026, full fine-tuning on the organizers’ data to match task-specific annotation conventions reduces CER relative to the official baseline by 5.2% on prints and 12.1% on manuscripts, despite using one-tenth as many parameters.
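
The results above are reported as character error rate (CER). As a rough illustration only, the sketch below computes CER under the common definition: Levenshtein edit distance between the predicted and reference transcription, divided by the number of reference characters. The abstract does not state how the authors normalize text or aggregate over lines, so the function names `edit_distance` and `cer`, and the choice to pool edits and characters over all lines (micro-averaging), are assumptions made here for illustration.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a character of ref
            insertion = curr[j - 1] + 1  # add a character of hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Micro-averaged CER: total edits divided by total reference characters.

    Assumption: no text normalization is applied, and lines are pooled
    rather than averaged per line.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # One substituted character (黃 vs. 黄) in a four-character line.
    print(cer(["天地玄黃"], ["天地玄黄"]))
```

Under this definition, a reported CER of 0.92% corresponds to roughly one character edit per 109 reference characters.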

Details

Paper ID
lrec2026-ws-lt4hala-32
Pages
pp. 311-321
BibKey
brisson-etal-2026-anandasky
Editors
Rachele Sprugnoli, Marco Passarotti
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Colin Brisson
  • Ayoub Kahfy
  • Frédéric Constant
  • Marc Bui
