Enhancing textual data quality in data mining: Case study and experiences

Yi Feng; Chunhua Ju

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 7867 LNAI 392-403

DOI: 10.1007/978-3-642-40319-4_34

Abstract

Dirty data is recognized as a top challenge for data mining. Textual data is one type of data whose quality deserves closer study, so that the knowledge discovered from it is reliable. In this paper, we focus on textual data quality (TDQ) in data mining. Drawing on our years of data mining experience, we highlight three typical TDQ dimensions and their related problems: representation granularity, representation consistency, and completeness. Then, to provide a real-world example of how to enhance TDQ in data mining, we present a detailed case study set in data mining for traditional Chinese medicine, covering three typical TDQ problems and their corresponding solutions. The case study is expected to help data analysts and miners attach more importance to TDQ issues and enhance TDQ for more reliable data mining. © Springer-Verlag 2013.

Cite

APA

Feng, Y., & Ju, C. (2013). Enhancing textual data quality in data mining: Case study and experiences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7867 LNAI, pp. 392–403). https://doi.org/10.1007/978-3-642-40319-4_34
