Request Correction
Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.
Correction Guidelines
- Click the edit button next to a field to report a correction.
- Fill in the suggested correction value for each field you want to correct.
- Provide your name and email so we can contact you if needed.
Paper Information
SumeCzech: Large Czech News-Based Summarization Dataset
Paper Fields
Click the edit button next to a field to report a correction.
SumeCzech: Large Czech News-Based Summarization Dataset
Document summarization is a well-studied NLP task. With the emergence of artificial neural network models, the summarization performance is increasing, as are the requirements on training data. However, only a few datasets are available for Czech, none of them particularly large. Additionally, summarization has been evaluated predominantly on English, with the commonly used ROUGE metric being English-specific. In this paper, we try to address both issues. We present SumeCzech, a Czech news-based summarization dataset. It contains more than a million documents, each consisting of a headline, a several sentences long abstract and a full text. The dataset can be downloaded using the provided scripts available at http://hdl.handle.net/11234/1-2615. We evaluate several summarization baselines on the dataset, including a strong abstractive approach based on Transformer neural network architecture. The evaluation is performed using a language-agnostic variant of ROUGE.
Authors
Expand an author to correct their information. Use the remove button to request author removal, or add a new author.
PDF Attachment
You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.
Your Information
Author Declaration *
Select at least one field to correct using the edit buttons above.