Back to Main Conference 2016
LREC 2016main

The OpenCourseWare Metadiscourse (OCWMD) Corpus

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/28ygarja7uir

Abstract

This study describes a new corpus of over 60,000 hand-annotated metadiscourse acts from 106 OpenCourseWare lectures, from two different disciplines: Physics and Economics. Metadiscourse is a set of linguistic expressions that signal different functions in the discourse. This type of language is hypothesised to be helpful in finding a structure in unstructured text, such as lectures discourse. A brief summary is provided about the annotation scheme and labelling procedures, inter-annotator reliability statistics, overall distributional statistics, a description of auxiliary data that will be distributed with the corpus, and information relating to how to obtain the data. The results provide a deeper understanding of lecture structure and confirm the reliable coding of metadiscursive acts in academic lectures across different disciplines. The next stage of our research will be to build a classification model to automate the tagging process, instead of manual annotation, which take time and efforts. This is in addition to the use of these tags as indicators of the higher level structure of lecture discourse.

Details

Paper ID
lrec2016-main-279
Pages
pp. 1770-1776
BibKey
alharbi-hain-2016-opencourseware
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • GA

    Ghada Alharbi

  • TH

    Thomas Hain

Links