Integrating Approximate String Matching with Phonetic String Similarity

Junior Ferri; Hegler Tissot; Marcos Didonet Del Fabro

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11019 LNCS 173-181

DOI: 10.1007/978-3-319-98398-1_12


Abstract

Well-defined dictionaries of tagged entities are used in many tasks to identify entities when the scope is limited and machine learning is not needed. A common solution is to encode the input dictionary into a Trie and use it to find matches in an input text. However, both the size of the dictionary and the presence of spelling errors in the input tokens degrade the effectiveness of such solutions. We present an approach that transforms the dictionary and each input token into a compact, well-known phonetic representation. The resulting dictionary is encoded in a Trie that is about 72% smaller than a non-phonetic Trie. We perform inexact matching over this representation to retrieve an initial set of candidate results. Lastly, we apply a second similarity measure to select the best candidate for annotating a given entity. The experiments showed that the approach achieved good F1 scores. The solution was developed as an entity recognition plug-in for GATE, a well-known information extraction framework.
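The pipeline described in the abstract (phonetic encoding, a Trie over the encoded dictionary, inexact matching to collect candidates, then a second similarity measure to pick the winner) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the phonetic algorithm, so American Soundex is used here as a stand-in; the second measure is assumed to be normalized Levenshtein similarity on the original strings; and the names `PhoneticTrie`, `best_match`, `max_dist`, and `min_sim` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Stand-in phonetic code: standard American Soundex digit classes (an assumption;
# the abstract only says "a compact well-known phonetic representation").
_CODES = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"
}.items() for c in letters}


def soundex(word: str) -> str:
    """First letter plus up to three digits, zero-padded to length 4."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    digits: List[str] = []
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":  # h/w do not separate equal codes; vowels reset prev
            prev = code
    return (word[0].upper() + "".join(digits) + "000")[:4]


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entries: List[str] = []  # original dictionary entries ending here


class PhoneticTrie:
    """Hypothetical Trie keyed on the phonetic encoding of each entry."""

    def __init__(self) -> None:
        self.root = TrieNode()

    @staticmethod
    def encode(text: str) -> str:
        return " ".join(soundex(tok) for tok in text.split())

    def insert(self, entry: str) -> None:
        node = self.root
        for ch in self.encode(entry):
            node = node.children.setdefault(ch, TrieNode())
        node.entries.append(entry)

    def search(self, query: str, max_dist: int) -> List[Tuple[str, int]]:
        """Inexact match: entries whose encoded key is within max_dist edits."""
        key = self.encode(query)
        results: List[Tuple[str, int]] = []

        def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
            # One Levenshtein DP row per Trie edge, shared by all entries below it.
            row = [prev_row[0] + 1]
            for i in range(1, len(key) + 1):
                cost = 0 if key[i - 1] == ch else 1
                row.append(min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost))
            if row[-1] <= max_dist:
                results.extend((e, row[-1]) for e in node.entries)
            if min(row) <= max_dist:  # prune subtrees that can no longer match
                for next_ch, child in node.children.items():
                    walk(child, next_ch, row)

        for ch, child in self.root.children.items():
            walk(child, ch, list(range(len(key) + 1)))
        return results


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def best_match(trie: PhoneticTrie, query: str,
               max_dist: int = 1, min_sim: float = 0.6) -> Optional[str]:
    """Second filter (assumed): normalized Levenshtein on the original strings."""
    scored = []
    for entry, _ in trie.search(query, max_dist):
        a, b = entry.lower(), query.lower()
        sim = 1 - levenshtein(a, b) / max(len(a), len(b), 1)
        if sim >= min_sim:
            scored.append((sim, entry))
    return max(scored)[1] if scored else None


if __name__ == "__main__":
    trie = PhoneticTrie()
    for e in ["Curitiba", "Paraná", "Robert Smith"]:
        trie.insert(e)
    print(best_match(trie, "Curitva"))      # Curitiba
    print(best_match(trie, "Robrt Smyth"))  # Robert Smith
    print(best_match(trie, "London"))       # None
```

Because several spellings collapse to the same phonetic key, the Trie stores fewer distinct paths than a character-level Trie would, which is the intuition behind the size reduction reported in the abstract; the exact savings depend on the dictionary and the encoding chosen.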


Citation (APA)

Ferri, J., Tissot, H., & Del Fabro, M. D. (2018). Integrating Approximate String Matching with Phonetic String Similarity. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11019 LNCS, pp. 173–181). Springer Verlag. https://doi.org/10.1007/978-3-319-98398-1_12
