LREC 2014, main conference

Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/3o9mbrpihxh2

Abstract

We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SCITEX), which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based feature extraction with data mining techniques.

Details

Paper ID
lrec2014-main-264
Pages
pp. 1327-1334
BibKey
degaetano-ortlieb-etal-2014-data
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26-31 May 2014

Authors

  • Stefania Degaetano-Ortlieb
  • Peter Fankhauser
  • Hannah Kermes
  • Ekaterina Lapshinova-Koltunski
  • Noam Ordan
  • Elke Teich
