A Cross-Lingual Sentence Similarity Calculation Method With Multifeature Fusion

Lingxin Wang; Shengquan Liu; Longye Qiao; Weiwei Sun; Qi Sun; Huaqing Cheng

Journal Article (Open Access)

IEEE Access (2022), vol. 10, pp. 30666–30675

DOI: 10.1109/ACCESS.2022.3159692

Abstract

Cross-lingual sentence similarity computation is a major research focus in natural language processing (NLP). Some researchers have introduced fine-grained word and character features to help models understand sentence meaning, but they do not consider coarse-grained prior knowledge at the sentence level. Moreover, even when the two sentences in a cross-lingual pair have the same meaning, the sentence representations extracted by the baseline approach may carry language-specific biases. To address these problems, we construct a Chinese-Uyghur cross-lingual sentence similarity dataset and propose a method that computes cross-lingual sentence similarity by fusing multiple features. The method is based on the cross-lingual pretrained model XLM-RoBERTa and assists it in similarity calculation by introducing two coarse-grained prior knowledge features: sentence sentiment and sentence length. In addition, to eliminate possible language-specific biases in the vectors, we whiten the sentence vectors of the different languages so that they are all represented in a standard orthonormal basis. Because different combinations of vectors affect the final performance of the model differently, we run comparison experiments that add different vector features on top of the basic feature concatenation method. The results show that the absolute value of the difference between the two sentence vectors reflects the similarity of the two sentences well. The final F1 score of our method reaches 98.97%, which is 19.81% higher than that of the baseline.
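The abstract does not specify which whitening transform is used. The following is a minimal pure-Python sketch, assuming one transform is fitted per language: each language's sentence vectors are centered and multiplied by L^{-1}, where L is the Cholesky factor of that language's covariance matrix (Cholesky whitening). The output then has zero mean and identity covariance on the fitting data. SVD-based whitening, as used in BERT-whitening, is a common alternative and differs from this one only by an orthogonal rotation when no dimensions are dropped. The names `Whitener` and `toy_vectors`, the toy 3-d data, and the per-language setup are illustrative assumptions, not the authors' code.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(xs: List[Vector]) -> Vector:
    n, d = len(xs), len(xs[0])
    return [sum(x[j] for x in xs) / n for j in range(d)]


def covariance(xs: List[Vector], mu: Vector) -> Matrix:
    # Population covariance (divides by n, not n - 1).
    n, d = len(xs), len(mu)
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / n
             for j in range(d)] for i in range(d)]


def cholesky(a: Matrix) -> Matrix:
    # Lower-triangular L with a = L @ L^T; a must be symmetric positive definite.
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def solve_lower(L: Matrix, b: Vector) -> Vector:
    # Forward substitution: returns y with L @ y = b, i.e. y = L^{-1} b.
    y: Vector = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


class Whitener:
    """Cholesky whitening fitted on the sentence vectors of one language."""

    def fit(self, xs: List[Vector]) -> "Whitener":
        self.mu = mean_vector(xs)
        self.L = cholesky(covariance(xs, self.mu))
        return self

    def transform(self, x: Vector) -> Vector:
        # y = L^{-1} (x - mu): zero mean, identity covariance on the fitting data.
        return solve_lower(self.L, [xi - mi for xi, mi in zip(x, self.mu)])


def toy_vectors(rng: random.Random, n: int, shift: float) -> List[Vector]:
    # Hypothetical stand-in for XLM-RoBERTa sentence vectors: correlated 3-d
    # points whose offset mimics a language-specific bias.
    out = []
    for _ in range(n):
        a, b, c = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        out.append([2.0 * a + shift, 0.5 * a + b, 0.3 * b + c - shift])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    corpora = {"zh": toy_vectors(rng, 500, 1.0), "ug": toy_vectors(rng, 500, -2.0)}
    # Assumption: one whitener is fitted per language.
    whiteners: Dict[str, Whitener] = {
        lang: Whitener().fit(xs) for lang, xs in corpora.items()
    }
    for lang, xs in corpora.items():
        ys = [whiteners[lang].transform(x) for x in xs]
        cov = covariance(ys, mean_vector(ys))
        print(lang, [round(cov[i][i], 6) for i in range(3)])
```

With real 768-dimensional XLM-RoBERTa vectors, the covariance is only positive definite if each language contributes more non-degenerate sentences than there are embedding dimensions; otherwise `cholesky` raises `ValueError`.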
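The feature comparison described in the abstract can be pictured as building one input vector per sentence pair and varying which pieces go into it. The following is a minimal sketch, assuming the pair vector is fed to a classifier: concatenate the two sentence vectors u and v, optionally append |u - v| (the feature the abstract reports as most informative) and the element-wise product, then append the two coarse-grained priors. This resembles the (u, v, |u - v|) input used by Sentence-BERT's classification objective. The abstract does not say how sentiment and length are encoded, so the sentiment gap, the length ratio, and the function name `pair_features` are hypothetical.

```python
from typing import List, Sequence


def pair_features(u: Sequence[float], v: Sequence[float],
                  sentiment_u: float, sentiment_v: float,
                  length_u: int, length_v: int,
                  use_abs_diff: bool = True,
                  use_product: bool = False) -> List[float]:
    """Build one classifier input for a (sentence u, sentence v) pair.

    u, v: (whitened) sentence vectors of equal dimension.
    sentiment_*: hypothetical scalar sentiment scores, e.g. P(positive).
    length_*: hypothetical token counts; must be positive.
    """
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same dimension")
    if length_u <= 0 or length_v <= 0:
        raise ValueError("sentence lengths must be positive")
    features = list(u) + list(v)                          # basic concatenation [u; v]
    if use_abs_diff:
        features += [abs(a - b) for a, b in zip(u, v)]    # |u - v|
    if use_product:
        features += [a * b for a, b in zip(u, v)]         # element-wise u * v
    # Coarse-grained sentence-level priors (this encoding is an assumption):
    features.append(abs(sentiment_u - sentiment_v))       # sentiment gap
    features.append(min(length_u, length_v) / max(length_u, length_v))  # length ratio in (0, 1]
    return features


if __name__ == "__main__":
    x = pair_features([0.2, -1.0, 0.5], [0.1, -0.8, 0.9], 0.92, 0.88, 12, 10)
    print(len(x))  # 3 + 3 + 3 + 2 = 11 features
```

Toggling `use_abs_diff` and `use_product` reproduces the kind of ablation the abstract describes; the absolute difference makes the input invariant to swapping u and v, which plain concatenation is not.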

Citation (APA)

Wang, L., Liu, S., Qiao, L., Sun, W., Sun, Q., & Cheng, H. (2022). A Cross-Lingual Sentence Similarity Calculation Method With Multifeature Fusion. IEEE Access, 10, 30666–30675. https://doi.org/10.1109/ACCESS.2022.3159692
