Most texts exhibit an internal coherence structure, which can be analyzed as a tree of relations that hold between short text segments. We present a machine-learning approach to this analysis within the framework of Rhetorical Structure Theory. Our rhetorical analyzer draws on a variety of textual properties, such as cue phrases, part-of-speech information, rhetorical context, and lexical chaining. A two-stage parsing algorithm uses local and global optimization to find an analysis. Parsing decisions are driven by an ensemble of support vector classifiers, a training method that allows for non-linear separation of samples with many relevant features. We define a chain of annotation tools that benefits from a new underspecified representation of rhetorical structure. Classifiers are trained on a newly introduced German-language corpus as well as on a large English one. We present evaluation data for the recognition of rhetorical relations.
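To make the idea of a tree of relations over text segments concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the analyzer described above: the names `Node`, `CUES`, `toy_scorer`, and `greedy_parse` are hypothetical, the cue-phrase table stands in for the trained support vector classifiers, and the greedy bottom-up merge stands in for the two-stage parsing algorithm, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A span in a rhetorical tree: a leaf segment, or a relation over two child spans."""
    text: str
    relation: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def bracketed(self) -> str:
        """Render the tree in a bracketed notation, with leaves in square brackets."""
        if self.relation is None:
            return f"[{self.text}]"
        return f"({self.relation} {self.left.bracketed()} {self.right.bracketed()})"


# Hypothetical cue-phrase table (assumption): a trained classifier would replace it.
CUES = {"because": "Cause", "but": "Contrast", "although": "Concession"}


def toy_scorer(left: Node, right: Node) -> Tuple[float, str]:
    """Score joining two adjacent spans, looking only at the first word of the right span."""
    first = right.text.split()[0].lower().strip(",.")
    if first in CUES:
        return 1.0, CUES[first]
    return 0.1, "Elaboration"  # low-confidence default label


Scorer = Callable[[Node, Node], Tuple[float, str]]


def greedy_parse(segments: List[str], scorer: Scorer = toy_scorer) -> Node:
    """Repeatedly merge the best-scoring adjacent pair until one tree remains."""
    if not segments:
        raise ValueError("need at least one segment")
    nodes = [Node(s) for s in segments]
    while len(nodes) > 1:
        scored = [(scorer(nodes[i], nodes[i + 1]), i) for i in range(len(nodes) - 1)]
        # On ties, max() keeps the leftmost pair.
        (_, relation), i = max(scored, key=lambda item: item[0][0])
        merged = Node(
            text=f"{nodes[i].text} {nodes[i + 1].text}",
            relation=relation,
            left=nodes[i],
            right=nodes[i + 1],
        )
        nodes[i:i + 2] = [merged]
    return nodes[0]


tree = greedy_parse(
    ["The match was cancelled", "because it rained", "but the fans stayed"]
)
print(tree.bracketed())
```

The scorer is passed in as a parameter so that a learned model could be substituted for the cue-phrase lookup without changing the merge loop. This sketch makes only local decisions and has no counterpart to the global optimization mentioned in the abstract.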
Reitter, D. (2003). Simple Signals for Complex Rhetorics: On Rhetorical Analysis with Rich-Feature Support Vector Models. Journal for Language Technology and Computational Linguistics, 18(1), 38–52. https://doi.org/10.21248/jlcl.18.2003.26