Parsing models for identifying multiword expressions

Abstract

Multiword expressions lie at the syntax/semantics interface and have motivated alternative theories of syntax like Construction Grammar. Until now, however, syntactic analysis and multiword expression identification have been modeled separately in natural language processing. We develop two structured prediction models for joint parsing and multiword expression identification. The first is based on context-free grammars and the second uses tree substitution grammars, a formalism that can store larger syntactic fragments. Our experiments show that both models can identify multiword expressions with much higher accuracy than a state-of-the-art system based on word co-occurrence statistics. We experiment with Arabic and French, which both have pervasive multiword expressions. Relative to English, they also have richer morphology, which induces lexical sparsity in finite corpora. To combat this sparsity, we develop a simple factored lexical representation for the context-free parsing model. Morphological analyses are automatically transformed into rich feature tags that are scored jointly with lexical items. This technique, which we call a factored lexicon, improves both standard parsing and multiword expression identification accuracy. © 2013 Association for Computational Linguistics.
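
As a rough illustration of what joint parsing and multiword expression identification can look like in practice, the sketch below reads multiword expressions directly off a parse tree. It assumes, purely for illustration, that the parser marks each multiword expression with a dedicated nonterminal whose label starts with "MW" (for example, a hypothetical "MWN" for a multiword noun). The bracketed tree format, the label prefix, and the function names are assumptions of this sketch, not details stated in the abstract.

```python
def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())  # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words covered by a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


def find_mwes(node, prefix="MW"):
    """Collect (label, words) for every node whose label starts with `prefix`.

    The "MW" prefix is an assumed labeling convention for this sketch.
    """
    if isinstance(node, str):
        return []
    label, children = node
    found = []
    if label.startswith(prefix):
        found.append((label, leaves(node)))
    for child in children:
        found.extend(find_mwes(child, prefix))
    return found


# Hypothetical French example: "pomme de terre" (potato) as a multiword noun.
tree = parse_tree(
    "(S (NP (D la) (MWN (N pomme) (P de) (N terre))) (VP (V cuit)))"
)
print(find_mwes(tree))
```

Under this assumed convention, identification reduces to a tree traversal once parsing is done, which is one way to see why a single model can serve both tasks.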

Green, S., de Marneffe, M. C., & Manning, C. D. (2013). Parsing models for identifying multiword expressions. Computational Linguistics, 39(1), 195–227. https://doi.org/10.1162/COLI_a_00139
