URL-BASED PHISHING DETECTION USING HYBRID MACHINE LEARNING

M.Dattatreya Goud

Authors

M.Dattatreya Goud, Department of Computer Science, J.S University, Shikohabad, U.P

Keywords:

Decision Tree, Logistic Regression, Random Forest, Naive Bayes, Gradient Boosting Classifier, k-Nearest Neighbor, Support Vector Classifier, Hybrid Model.

Abstract

Phishing is currently among the most common and dangerous cybercrimes: attackers create deceptive emails or fake websites to extract sensitive user information. Despite extensive research into phishing detection and prevention, no widely adaptable solution has yet been developed. This paper therefore presents an automated detection framework based on machine learning that discriminates phishing websites from legitimate ones with high accuracy and few false positives. The study uses a combined dataset of more than 11,000 URLs from legitimate and phishing sources, preprocessed for static analysis on Windows OS. The following machine learning algorithms are tested and analyzed: Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), Gradient Boosting (GBM), k-Nearest Neighbor (KNN), and Support Vector Classifier (SVC). In addition, a hybrid ensemble model, LSD, is presented. It combines the outputs of LR, SVC, and DT using soft and hard voting, together with canopy-based feature selection, k-fold cross-validation, and grid search for hyperparameter tuning. Performance is measured in terms of accuracy, precision, recall, F1 score, and specificity. The results indicate that the LSD model outperforms the individual models in accuracy and detection efficiency. However, the static nature of the model prevents it from adapting dynamically to new phishing techniques. Continuous retraining on updated data, together with the inclusion of dynamic analysis, is a prospective way to address this limitation. Overall, the proposed framework is a strong and efficient solution for phishing detection that could be further improved by hybrid methods and periodic updates.
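The abstract does not give implementation details, so the following is only a minimal sketch of how hard and soft voting over three base classifiers can differ, and of how the five reported metrics follow from a confusion matrix. The function names (`hard_vote`, `soft_vote`, `binary_metrics`), the 0.5 threshold, and the probability values are illustrative assumptions, not the paper's code or data.

```python
# Illustrative sketch only: the probabilities below are made up, not results
# from the paper. Label 1 = phishing, label 0 = legitimate.

def hard_vote(probas, threshold=0.5):
    """Majority vote: each model casts one label after thresholding."""
    votes = [1 if p >= threshold else 0 for p in probas]
    return 1 if 2 * sum(votes) > len(votes) else 0


def soft_vote(probas, threshold=0.5):
    """Average the models' phishing probabilities, then threshold the mean."""
    return 1 if sum(probas) / len(probas) >= threshold else 0


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and specificity from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical phishing probabilities per URL, in the order (LR, SVC, DT).
model_probas = [
    (0.90, 0.80, 1.0),
    (0.45, 0.40, 1.0),  # two weak "legitimate" votes vs. one confident "phishing"
    (0.30, 0.20, 0.0),
    (0.10, 0.20, 0.0),
    (0.55, 0.60, 0.0),  # two weak "phishing" votes vs. one confident "legitimate"
    (0.20, 0.10, 0.0),
]
y_true = [1, 1, 1, 0, 0, 0]

hard_preds = [hard_vote(p) for p in model_probas]
soft_preds = [soft_vote(p) for p in model_probas]
hard_metrics = binary_metrics(y_true, hard_preds)
soft_metrics = binary_metrics(y_true, soft_preds)

for name, metrics in (("hard", hard_metrics), ("soft", soft_metrics)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The two schemes disagree only on the second and fifth URLs, where one confident model is outvoted by two marginal ones. Soft voting happens to do better on this toy data, but that is a property of the invented numbers, not evidence about the LSD model.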
