Hybrid Phishing URL Detection Using Segmented Word Embedding

Eint Sandi Aung; Hayato Yamana

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13635 LNCS 507-518

DOI: 10.1007/978-3-031-21047-1_46

Abstract

Phishing is a type of cybercrime in which attackers steal sensitive information. This paper focuses on URL-based phishing detection, i.e., detecting phishing webpages by analyzing their URLs. Previously proposed methods have tackled this problem; however, insufficient word tokenization of URLs gives rise to unknown words, which degrades detection accuracy. To solve the unknown-word problem, we propose a new tokenization algorithm, called URL-Tokenizer, which integrates the BERT and WordSegment tokenizers, in addition to utilizing 24 NLP features. We then apply URL-Tokenizer to a DNN-CNN hybrid model to improve detection accuracy. Our experiment using the Ebbu2017 dataset confirmed that our word-DNN-CNN achieves an AUC of 99.89%, compared with 98.78% for the state-of-the-art DNN-BiLSTM.
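
The unknown-word problem mentioned in the abstract can be illustrated with a small sketch. This is not the authors' URL-Tokenizer, which combines the BERT and WordSegment tokenizers; it is a toy stand-in that assumes a tiny hand-written vocabulary (`VOCAB`) and hypothetical helper names (`split_url`, `segment`, `tokenize_url`). It shows only the general idea: splitting a URL on delimiters leaves concatenated strings that are out of vocabulary, and a segmentation step can recover known words from them.

```python
import re

# Hypothetical toy vocabulary for illustration only; a real system would
# rely on a large corpus-derived vocabulary.
VOCAB = {"http", "https", "www", "com", "pay", "pal", "paypal",
         "secure", "login", "verify", "account"}


def split_url(url):
    """Split a URL into lowercase raw tokens on non-alphanumeric delimiters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def segment(token, vocab=VOCAB):
    """Segment a token into the fewest vocabulary words (dynamic programming).

    Returns None when the token cannot be fully covered by vocabulary words.
    """
    n = len(token)
    best = [None] * (n + 1)  # best[i] = shortest word list covering token[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


def tokenize_url(url, vocab=VOCAB):
    """Delimiter split first, then segment any out-of-vocabulary token."""
    words = []
    for tok in split_url(url):
        if tok in vocab:
            words.append(tok)
        else:
            parts = segment(tok, vocab)
            # Tokens that cannot be segmented remain unknown.
            words.extend(parts if parts else ["[UNK]"])
    return words


if __name__ == "__main__":
    print(split_url("http://paypalsecurelogin.com/verify-account"))
    print(tokenize_url("http://paypalsecurelogin.com/verify-account"))
```

With delimiter splitting alone, `paypalsecurelogin` stays a single out-of-vocabulary token; after segmentation it becomes `paypal`, `secure`, `login`, each of which a word-embedding layer could look up.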

Citation (APA)

Aung, E. S., & Yamana, H. (2022). Hybrid Phishing URL Detection Using Segmented Word Embedding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13635 LNCS, pp. 507–518). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21047-1_46
