Study on the Effect of Preprocessing Methods for Spam Email Detection

Fariska Zakhralativa Ruskanda

doi:10.21108/INDOJC.2019.4.1.284

Authors

Fariska Zakhralativa Ruskanda, Widyatama University

DOI:

https://doi.org/10.21108/INDOJC.2019.4.1.284

Abstract

Email is an increasingly widespread communication technology, and as its use has grown, spam has become a serious nuisance for email users. The resulting negative impacts make effective spam email detection techniques indispensable. A spam email detection algorithm, or spam classifier, works effectively when it is supported by proper preprocessing steps (noise removal, stop-word removal, stemming, lemmatization, term frequency). This research studies the effect of these preprocessing steps on the performance of supervised spam classifier algorithms. Experiments were conducted on two widely used supervised spam classifiers: Naïve Bayes and Support Vector Machine. The evaluation was performed on the Ling-Spam corpus, with accuracy as the evaluation metric. The experimental results show that the effect of each preprocessing step differs from one classifier to another.
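As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of noise removal, stop-word removal, stemming, and term-frequency counting feeding a multinomial Naïve Bayes classifier. It is not the implementation used in the study: the stop-word list, the suffix-stripping stemmer, the function names (`preprocess`, `train_naive_bayes`, `predict`), and the four toy messages are all assumptions made for the example, and lemmatization and the Support Vector Machine are left out to keep it short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "you", "your"}


def strip_suffix(token):
    """Toy stemmer: drop a few common English suffixes (this is not Porter stemming)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stop_words=True, stem=True):
    """Apply the selected preprocessing steps and return term frequencies."""
    # Noise removal: lowercase and keep only runs of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [strip_suffix(t) for t in tokens]
    return Counter(tokens)


def train_naive_bayes(docs, labels, **opts):
    """Collect per-class term counts and document counts."""
    term_counts = {}
    doc_counts = Counter(labels)
    for text, label in zip(docs, labels):
        term_counts.setdefault(label, Counter()).update(preprocess(text, **opts))
    vocab = set()
    for counts in term_counts.values():
        vocab.update(counts)
    return term_counts, doc_counts, vocab, opts


def predict(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    term_counts, doc_counts, vocab, opts = model
    features = preprocess(text, **opts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in term_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for term, freq in features.items():
            if term in vocab:  # terms unseen in training are ignored
                score += freq * math.log((counts[term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the Ling-Spam corpus.
docs = [
    "Win free money now!!!",
    "Claim your free prize today",
    "Meeting agenda for the linguistics workshop",
    "Call for papers: syntax conference",
]
labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(docs, labels, remove_stop_words=True, stem=True)
print(predict(model, "free prize money"))
print(predict(model, "workshop papers on syntax"))
```

Passing `remove_stop_words=False` or `stem=False` to `train_naive_bayes` switches individual steps off, which is one simple way to compare how each preprocessing step changes a classifier's accuracy on a held-out set.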


Author Biography

Fariska Zakhralativa Ruskanda, Widyatama University

Department of Informatics

