Constructing Cross-Lingual Consumer Health Vocabulary with Word-Embedding from Comparable User Generated Content

Abstract

Online health communities (OHCs) are the primary channel through which laypeople share health information. A critical challenge in analyzing health consumer-generated content (HCGC) from OHCs is identifying the colloquial medical expressions that laypeople use. The open-access and collaborative consumer health vocabulary (OAC CHV) is the controlled vocabulary for addressing this challenge. However, the OAC CHV is available only in English, which limits its applicability to other languages. This research proposes a cross-lingual automatic term recognition framework that extends the English CHV into a cross-lingual one. The framework takes two inputs: an English HCGC corpus and a non-English HCGC corpus (Chinese in this study). A monolingual word vector space is learned for each corpus with the skip-gram algorithm, so that each space encodes the common word associations of laypeople within one language. Under the isometry assumption, that is, that the two monolingual spaces have approximately the same geometric structure, the framework aligns them into a single bilingual word vector space, in which cosine similarity serves as the metric for identifying semantically similar words across languages. Experimental results show that the framework outperforms the two large language models it was compared against in identifying CHV terms across languages. The framework requires only raw HCGC corpora and a small set of medical translations, which reduces the human effort needed to compile a cross-lingual CHV.
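The alignment and retrieval steps described in the abstract can be illustrated with a minimal sketch. It assumes the alignment is solved as an orthogonal Procrustes problem, which is one common way to realize an isometric mapping; the abstract does not say which solver the authors use. The function names, the term pairs, and the random vectors standing in for skip-gram embeddings are all hypothetical and are not taken from the paper.

```python
import numpy as np


def procrustes_align(source, target):
    """Return the orthogonal matrix W that minimizes ||source @ W - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def unit_rows(matrix):
    """Scale each row to unit length so dot products equal cosine similarities."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)


def most_similar(query_vecs, candidate_vecs, candidate_words):
    """For each query vector, return the candidate word with the highest cosine similarity."""
    sims = unit_rows(query_vecs) @ unit_rows(candidate_vecs).T
    return [candidate_words[i] for i in sims.argmax(axis=1)]


# Made-up illustrative term pairs (assumption: not taken from the paper's data).
en_words = ["heart attack", "high blood pressure", "flu", "fever", "headache", "tummy ache"]
zh_words = ["心脏病发作", "高血压", "流感", "发烧", "头痛", "肚子痛"]

# Random stand-ins for skip-gram embeddings. The Chinese space is built as an
# exact rotation of the English one, so the isometry assumption holds by construction.
rng = np.random.default_rng(0)
en_vecs = rng.normal(size=(6, 3))
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
zh_vecs = en_vecs @ rotation

# Learn the mapping from a small seed dictionary (the first four pairs).
seed = [0, 1, 2, 3]
W = procrustes_align(zh_vecs[seed], en_vecs[seed])

# Map the held-out Chinese terms into the English space and retrieve by cosine similarity.
held_out = [4, 5]
matches = most_similar(zh_vecs[held_out] @ W, en_vecs, en_words)
for zh, en in zip([zh_words[i] for i in held_out], matches):
    print(zh, "->", en)
```

Because the toy spaces are exactly isometric, the mapping learned from the seed pairs also carries the held-out terms onto their counterparts. Real embeddings trained on comparable corpora are only approximately isometric, so retrieval in practice would be noisier than this sketch suggests.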

Citation (APA)

Chang, C. H., Wang, L., & Yang, C. C. (2024). Constructing Cross-Lingual Consumer Health Vocabulary with Word-Embedding from Comparable User Generated Content. In Proceedings - 2024 IEEE 12th International Conference on Healthcare Informatics, ICHI 2024 (pp. 275–284). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICHI61247.2024.00043
