Beyond PARSEVAL
Towards Improved Evaluation Measures for Parsing Systems


Overview

The PARSEVAL metrics for evaluating the accuracy of parsing systems have underpinned recent advances in stochastic parsing with grammars learned from treebanks (most prominently the Penn Treebank of English). However, a new generation of parsing systems is emerging based on different underlying frameworks and covering other languages. PARSEVAL is not appropriate for many of these approaches: the NLP community therefore needs to come together and agree on a new set of parser evaluation standards.

Motivation and Aims

In line with increasing interest in fine-grained syntactic and semantic representations, stochastic parsing is currently being applied to several high-level syntactic frameworks, such as unification-based grammars, tree-adjoining grammars and combinatory categorial grammars. A variety of different types of training data are being used, including dependency annotations, phrase structure trees, and unlabelled text. Other researchers are building parsing systems using shallower frameworks, based for example on finite-state transducers. Many of these novel parsing approaches use alternative evaluation measures -- based on dependencies, valencies, or exact or selective category match -- since the PARSEVAL measures (of bracketing match with respect to atomic-labelled phrase structure trees) cannot be applied, or are uninformative.
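To make the contrast concrete, the sketch below (not part of the call itself; the sentences, labels and relation names are invented for illustration) shows how a PARSEVAL-style labelled bracketing score and a dependency-based score might each be computed as precision, recall and F1 over sets of gold and predicted items.

    # Minimal illustrative sketch: PARSEVAL-style bracketing match vs.
    # dependency-relation match. All data below is invented for illustration.

    def precision_recall_f1(gold, predicted):
        """Precision, recall and F1 of predicted items against gold items."""
        correct = len(gold & predicted)
        precision = correct / len(predicted) if predicted else 0.0
        recall = correct / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # PARSEVAL-style evaluation: labelled constituent spans (label, start, end).
    gold_brackets = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
    test_brackets = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
    print("bracketing P/R/F1:", precision_recall_f1(gold_brackets, test_brackets))

    # Dependency-style evaluation: (head, relation, dependent) triples.
    gold_deps = {("saw", "subj", "Kim"), ("saw", "obj", "duck"), ("duck", "det", "the")}
    test_deps = {("saw", "subj", "Kim"), ("saw", "obj", "duck"), ("saw", "mod", "the")}
    print("dependency P/R/F1:", precision_recall_f1(gold_deps, test_deps))

The point of the contrast is that the two measures score entirely different objects (labelled spans versus head-dependent relations), which is why results obtained under one cannot be compared directly with results obtained under the other.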

The field is therefore confronted with a lack of common evaluation metrics, and of appropriate gold-standard evaluation corpora in languages other than English. We need a new and uniform scheme for parser evaluation that covers both shallow and deep grammars, and allows for comparison and benchmarking across different syntactic frameworks and different language types.

A previous LREC-hosted workshop on parser evaluation in 1998 (see http://ceres.ugr.es/~rubio/elra/parsing.htm) brought together a number of researchers advocating parser evaluation based on dependencies or grammatical relations as a viable alternative to the PARSEVAL measures.

The aim of this workshop is to start an initiative by bringing together four relevant parties:

The workshop will provide a forum for discussion with the aim of defining a new parser evaluation metric; we also intend the workshop to kick off a sustained collaborative effort to build or derive sufficiently large evaluation corpora, and possibly training corpora, appropriate to the new metric. To maintain the momentum of this initiative we will work towards setting up a parsing competition based on the new standard evaluation corpora and evaluation metric.

Topics of Interest

The workshop organisers invite papers focussing on:

Papers on the following topics will be particularly welcome:

Workshop Agenda

The one-day workshop will consist of (30-minute) paper presentations, a panel session, and an extended open session at which important results of the workshop will be summarised and discussed.

As a follow-up, we hope to arrange a half-day meeting outside the workshop format to discuss concrete action plans, create working groups, and plan future collaboration.

Workshop Organisers

John Carroll University of Sussex (UK) John.Carroll@cogs.susx.ac.uk
Anette Frank DFKI GmbH, Saarbruecken (Germany)
Dekang Lin University of Alberta (Canada)
Detlef Prescher DFKI GmbH, Saarbruecken (Germany)
Hans Uszkoreit DFKI GmbH and Saarland University, Saarbruecken (Germany)

Programme Committee

Salah Ait-Mokhtar XRCE Grenoble
Thorsten Brants Xerox PARC
Gosse Bouma Rijksuniversiteit Groningen
Ted Briscoe University of Cambridge
John Carroll University of Sussex
Jean-Pierre Chanod XRCE Grenoble
Michael Collins AT&T Labs-Research
Anette Frank DFKI Saarbruecken
Josef van Genabith Dublin City University
Gregory Grefenstette Clairvoyance, Pittsburgh
Julia Hockenmaier University of Edinburgh
Dekang Lin University of Alberta
Chris Manning Stanford University
Detlef Prescher DFKI Saarbruecken
Khalil Sima'an University of Amsterdam
Hans Uszkoreit DFKI Saarbruecken and Saarland University

Submissions

Abstracts for workshop contributions should not exceed two A4 pages (excluding references). An additional title page should state: the title; author(s); affiliation(s); and contact author's e-mail address, as well as postal address, telephone and fax numbers.

Submissions should be sent by email, preferably in PostScript or PDF format, to John Carroll before 1st February 2002. Abstracts will be reviewed by at least three members of the programme committee.

Formatting instructions for the final full version of papers will be sent to authors after notification of acceptance.

Important Dates

Deadline for receipt of abstracts: 1st February 2002
Notification of acceptance: 22nd February 2002
Camera-ready final version for workshop proceedings: 12th April 2002
Workshop: 2nd June 2002

Workshop Registration Fees

The registration fees for the workshop are:

All attendees will receive a copy of the workshop proceedings.