Czech dataset for semantic textual similarity

Abstract

Semantic textual similarity is the core shared task at the International Workshop on Semantic Evaluation (SemEval). It focuses on comparing the meaning of sentences. So far, most of the research has been devoted to English. In this paper we present the first Czech dataset for semantic textual similarity. The dataset contains 1425 manually annotated pairs. Czech is a highly inflected language and is considered challenging for many natural language processing tasks. The dataset is publicly available to the research community. In 2016 we participated in the SemEval competition, where our UWB system was ranked second among 113 submitted systems in the monolingual subtask and first among 26 systems in the cross-lingual subtask. We adapt the UWB system, originally built for English, to Czech and experiment with the new Czech dataset. Our system achieves very promising results and can serve as a strong baseline for future research.

Cite

Svoboda, L., & Brychcín, T. (2018). Czech dataset for semantic textual similarity. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11107 LNAI, pp. 213–221). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_23
