
Automatic Approach for Building Dataset of Citation Functions for COVID-19 Academic Papers

Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

DOI:10.63317/2tz5kmc28egf

Abstract

This paper develops a new dataset of citation functions for COVID-19-related academic papers. Because preparing new citation function labels and building a new dataset require much human effort and are time-consuming, this paper reuses our previous citation function scheme, built for the Computer Science (CS) domain, which consists of five coarse-grained labels and 21 fine-grained labels. This paper uses the COVID-19 Open Research Dataset (CORD-19) and extracts 99.6k random citing sentences from 10.1k papers. These citing sentences are categorized using the classification models built for the CS domain. A manual check of 475 random samples yielded accuracies of 76.6% and 70.2% on coarse-grained and fine-grained labels, respectively. The evaluation reveals three findings. First, two fine-grained labels experienced a meaning shift while retaining the same idea. Second, the COVID-19 domain is dominated by statements highlighting the importance, cruciality, usefulness, benefit, consideration, etc. of certain topics for making sensible argumentation. Third, discussing the state of the art (SOTA) in terms of outperforming previous work is less popular in the COVID-19 domain than in the CS domain. Our results will be used for further dataset development by classifying citing sentences in all papers from CORD-19.

Details

Paper ID
lrec2022-ws-law-01
Pages
pp. 1-7
BibKey
basuki-tsuchiya-2022-automatic
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
Date
20 June 2022 to 25 June 2022

Authors

  • Setio Basuki

  • Masatoshi Tsuchiya
