Hybrid Approach with a focus on Preprocessing Techniques for Detecting Phishing Websites

Authors

  • Abeer Yahya Salawi, Independent Research Author

DOI:

https://doi.org/10.70715/jitcai.2025.v2.i2.023

Keywords:

Cybersecurity, phishing detection, hybrid approach, URL analysis, deep learning

Abstract

Phishing is a significant cybersecurity problem in the digital era: attackers build fraudulent websites to deceive users and steal sensitive information, including passwords and financial data. Growing dependence on the internet has led to a marked increase in the frequency of these attacks, causing considerable financial losses for individuals and businesses alike. This underscores the pressing need for effective countermeasures.

This research develops a hybrid model for identifying phishing websites through URL analysis. The proposed model combines Convolutional Neural Networks (CNN) with Long Short-Term Memory (LSTM) networks and an attention mechanism to improve prediction accuracy and uncover hidden patterns in the data. The model was trained on the "Url_Detection_Dataset" from the Kaggle platform, and its performance was assessed using precision, recall, and F1-score. The results showed that the hybrid model outperforms traditional methods at distinguishing legitimate from malicious URLs, making it a useful tool in cybersecurity. The findings provide a framework for subsequent research and support the development of more resilient, flexible, and effective solutions.
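As an illustration of the evaluation measures named above, the following is a minimal sketch of how precision, recall, and F1-score can be computed for a binary URL classifier. It is not the author's code: the function name `precision_recall_f1`, the label convention (1 = phishing, 0 = legitimate), and the toy labels are assumptions made for this example only.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels.

    Assumed convention for this sketch: 1 = phishing, 0 = legitimate.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels for illustration only; these are not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Treating phishing as the positive class means recall measures the share of phishing URLs that are caught, while precision measures how many flagged URLs are actually phishing; F1 is their harmonic mean.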

Published

11/07/2025

Data Availability Statement

The dataset used in this study is publicly available on Kaggle (Url_Detection_Dataset, https://www.kaggle.com/datasets/vishvapatel09/url-detection-dataset). The processed data and model code are available from the author upon request.

How to Cite

Salawi, A. (2025). Hybrid Approach with a focus on Preprocessing Techniques for Detecting Phishing Websites. Journal of Information Technology, Cybersecurity, and Artificial Intelligence, 2(2), 145-155. https://doi.org/10.70715/jitcai.2025.v2.i2.023