Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

Click the edit button next to a field to report a correction.
Fill in the suggested correction value for each field you want to correct.
Provide your name and email so we can contact you if needed.

View all submitted correction requests

Paper Information

lrec2026-ws-resourceful-10

Struct2Unstruct: Creating Tender NER Datasets from Structured Procurement Records using Large Language Models

View lrec2026-ws-resourceful-10.pdf

Paper Fields

Click the edit button next to a field to report a correction.

Title

Struct2Unstruct: Creating Tender NER Datasets from Structured Procurement Records using Large Language Models

Abstract

Named Entity Recognition (NER) in the tender and procurement domain is critical for tasks such as contract monitoring, supplier analysis, and compliance tracking. However, unlike general-purpose NER, no open-source datasets exist for Tender NER, largely due to data sensitivity and confidentiality restrictions. This scarcity limits the development of automated entity extraction models. To address this gap, we propose struct2unstruct, a data preparation pipeline that generates and annotates tender-specific datasets using large language models (LLMs). Starting from structured procurement data published by the Singapore government (2015–2021) available in English language, we employ Llama-3 to generate synthetic tender narratives in multiple writing styles, ensuring each contains at least one tender-related entity. Post-processing steps correct inconsistencies in dates, symbols, and entity formats. Entities are then annotated using a BIO tagging scheme through deterministic alignment with structured fields, followed by expert validation to ensure accuracy. This study focuses on data preparation and evaluation, not model training. The resulting dataset provides a scalable resource for future Tender NER research in low-resource environments. By releasing both the dataset and pipeline as open-source resources, we establish a foundation for advancing domain-adapted information extraction and automated tender entity recognition.

Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.

PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.

Drag & drop a PDF here, or click to select

Your Information

Name

Comment

Author Declaration *

I declare that I have notified all co-authors of the proposed corrections and obtained their consent, and that all modifications adhere to research ethics standards and the LREC correction policy.

Select at least one field to correct using the edit buttons above.