Paper Information

lrec2026-ws-cmlc-14

Managing Growth in a National Corpus: The Hungarian National Corpus 3.0 (MNSZ3)

Abstract

The third generation of the Hungarian National Corpus (MNSZ3) aims to provide a large-scale, curated, and well-described corpus resource needed for the sustainable digital presence of Hungarian. Building on the domain structure and proportions of MNSZ2 (v2.0.5; 1.04 billion running words), the project targets a substantial increase in scale while also strengthening the coverage and metadata description of Hungarian language use outside Hungary. MNSZ3 retains the six traditional domains of the earlier corpus—press, fiction, scientific, official, personal, and transcribed spoken language—and is planned to reach approximately 10 billion tokens. This paper presents the motivation and design principles of the project, outlines the practical decisions and procedures used in data collection and cleaning, and discusses the annotation strategy developed for large-scale processing. In planning the linguistic analysis, we build on the complementary strengths of HuSpaCy and e-magyar: HuSpaCy provides the unified and efficient UD-oriented processing backbone, while e-magyar (emMorph) is preserved as an explicit additional layer for morphology and lemmatisation.

