Hybrid Phishing URL Detection Using Segmented Word Embedding

Abstract

Phishing is a type of cybercrime in which attackers steal sensitive information. This paper focuses on URL-based phishing detection, i.e., detecting phishing webpages by analyzing the URL. Previously proposed methods have tackled this problem; however, insufficient word tokenization of URLs gives rise to unknown words, which degrades detection accuracy. To solve the unknown-word problem, we propose a new tokenization algorithm, called URL-Tokenizer, which integrates the BERT and WordSegment tokenizers and additionally utilizes 24 NLP features. We then apply URL-Tokenizer to a DNN-CNN hybrid model to improve detection accuracy. Our experiment using the Ebbu2017 dataset confirmed that our word-DNN-CNN achieves an AUC of 99.89%, compared to the state-of-the-art DNN-BiLSTM with an AUC of 98.78%.
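
The unknown-word problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' URL-Tokenizer: it assumes a tiny hand-written vocabulary (TOY_VOCAB) and a simple fewest-words dynamic-programming segmenter as a stand-in for the WordSegment and BERT tokenizers, and all function names are hypothetical. It only shows why splitting on delimiters alone leaves concatenated strings as single unknown tokens, and how segmentation may recover known words from them.

```python
import re

# Toy vocabulary: an assumption for illustration only. A real system would
# rely on large corpus-derived vocabularies rather than a hand-written set.
TOY_VOCAB = {"www", "com", "pay", "pal", "paypal", "secure",
             "login", "verify", "account", "update"}


def split_on_delimiters(url):
    """Split a URL into lowercase raw tokens on non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def segment(token, vocab):
    """Split token into the fewest vocabulary words (dynamic programming).

    Returns [token] unchanged when no full segmentation exists.
    """
    n = len(token)
    best = [None] * (n + 1)  # best[i]: fewest-word split of token[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [token]


def tokenize_url(url, vocab=TOY_VOCAB):
    """Delimiter split followed by word segmentation of each raw token."""
    words = []
    for raw in split_on_delimiters(url):
        words.extend(segment(raw, vocab))
    return words


if __name__ == "__main__":
    url = "http://www.paypalsecurelogin.com/verify-account"
    print(split_on_delimiters(url))  # 'paypalsecurelogin' stays one token
    print(tokenize_url(url))         # segmented into known words
```

With delimiter splitting alone, the string "paypalsecurelogin" would reach the embedding layer as a single out-of-vocabulary token; after segmentation it becomes three in-vocabulary words. Tokens with no valid segmentation, such as "http" under this toy vocabulary, are passed through unchanged.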

Cite

Aung, E. S., & Yamana, H. (2022). Hybrid Phishing URL Detection Using Segmented Word Embedding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13635 LNCS, pp. 507–518). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21047-1_46
