A Cross-Lingual Sentence Similarity Calculation Method With Multifeature Fusion

Abstract

Cross-lingual sentence similarity computation is an active research topic in natural language processing (NLP). Some researchers have introduced fine-grained word and character features to help models understand sentence meaning, but they do not consider coarse-grained prior knowledge at the sentence level. In addition, even when the two sentences of a cross-lingual pair have the same meaning, the sentence representations extracted by the baseline approach may carry language-specific biases. To address these problems, we construct a Chinese-Uyghur cross-lingual sentence similarity dataset and propose a method that computes cross-lingual sentence similarity by fusing multiple features. The method is based on the cross-lingual pretrained model XLM-RoBERTa and assists the similarity calculation with two coarse-grained prior knowledge features: sentence sentiment and sentence length. To eliminate possible language-specific biases in the vectors, we whiten the sentence vectors of the different languages so that they are all represented under a standard orthonormal basis. Because different combinations of vectors affect the final performance of the model differently, we start from basic feature concatenation and add different vector features in comparison experiments. The results show that the absolute value of the difference between the two sentence vectors reflects the similarity of the two sentences well. The final F1 value of our method reaches 98.97%, which is 19.81% higher than that of the baseline.
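
The whitening and feature-fusion steps described in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption, not the authors' implementation: whitening is fitted separately per language from the covariance SVD, random arrays stand in for XLM-RoBERTa sentence vectors, the function names `fit_whitening`, `whiten`, and `fuse` are hypothetical, and the optional `extra` argument is only a placeholder for the sentiment and length features.

```python
import numpy as np

def fit_whitening(vectors, eps=1e-8):
    """Estimate mean and whitening matrix W so that (x - mu) @ W has identity covariance."""
    mu = vectors.mean(axis=0, keepdims=True)
    cov = np.cov(vectors - mu, rowvar=False)
    u, s, _ = np.linalg.svd(cov)
    w = u @ np.diag(1.0 / np.sqrt(s + eps))  # eps guards against tiny singular values
    return mu, w

def whiten(vectors, mu, w):
    """Apply a fitted whitening transform."""
    return (vectors - mu) @ w

def fuse(u, v, extra=None):
    """Concatenate [u, v, |u - v|] and, optionally, extra sentence-level features."""
    parts = [u, v, np.abs(u - v)]
    if extra is not None:
        parts.append(extra)
    return np.concatenate(parts, axis=-1)

# Stand-in data (assumption): two "languages" with different means and covariances.
rng = np.random.default_rng(0)
zh = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
ug = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8)) + 3.0

zh_w = whiten(zh, *fit_whitening(zh))
ug_w = whiten(ug, *fit_whitening(ug))
features = fuse(zh_w, ug_w)  # one fused row per sentence pair, input to a classifier
```

After whitening, each language's vectors have zero mean and approximately identity covariance, which is what makes the element-wise difference `|u - v|` comparable across languages in this sketch.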

Cite

Wang, L., Liu, S., Qiao, L., Sun, W., Sun, Q., & Cheng, H. (2022). A Cross-Lingual Sentence Similarity Calculation Method With Multifeature Fusion. IEEE Access, 10, 30666–30675. https://doi.org/10.1109/ACCESS.2022.3159692
