Effects of Feature Extraction and Classification Methods on Cyberbully Detection
Cyberbullying is defined as an aggressive, intentional act carried out against a defenseless person through the Internet or other electronic means. Researchers have found that many bullying cases have tragically ended in suicide; hence, automatic detection of cyberbullying has become important. In this study, we show how the choice of feature extraction, feature selection, and classification methods affects the performance of automatic cyberbullying detection. The experiments use the FormSpring.me dataset and investigate the effects of preprocessing methods; of several classifiers, namely C4.5, Naïve Bayes, kNN, and SVM; and of the information gain and chi-square feature selection methods. Experimental results indicate that the best classification results are obtained when alphabetic tokenization is applied without stemming and without stopword removal. Feature selection also improves cyberbullying detection performance. Among the classifiers compared, C4.5 performs best on this dataset.
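To make the preprocessing and feature selection steps concrete, the following is a minimal Python sketch, not the implementation used in the study. It assumes alphabetic tokenization means keeping only runs of letters (with no stemming and no stopword removal), and it scores each term against a binary class using the standard 2x2 contingency-table forms of chi-square and information gain. The function names, the label strings, and the four toy posts are hypothetical and are not drawn from the FormSpring.me dataset.

```python
import math
import re


def tokenize(text):
    """Alphabetic tokenization: lowercase, keep only runs of letters.

    No stemming and no stopword removal are applied.
    """
    return re.findall(r"[a-z]+", text.lower())


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: positive-class documents containing the term
    n10: other documents containing the term
    n01: positive-class documents without the term
    n00: other documents without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0  # term or class is constant, so it carries no signal
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def entropy(counts):
    """Shannon entropy (in bits) of a list of counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(n11, n10, n01, n00):
    """Reduction in class entropy from knowing whether the term occurs."""
    n = n11 + n10 + n01 + n00
    if n == 0:
        return 0.0
    h_class = entropy([n11 + n01, n10 + n00])
    h_given_term = ((n11 + n10) / n) * entropy([n11, n10]) \
        + ((n01 + n00) / n) * entropy([n01, n00])
    return h_class - h_given_term


def score_terms(docs, labels, positive):
    """Return {term: (chi_square, information_gain)} over all terms."""
    token_sets = [set(tokenize(d)) for d in docs]
    vocab = set().union(*token_sets)
    scores = {}
    for term in vocab:
        n11 = n10 = n01 = n00 = 0
        for tokens, label in zip(token_sets, labels):
            has_term = term in tokens
            is_pos = label == positive
            if has_term and is_pos:
                n11 += 1
            elif has_term:
                n10 += 1
            elif is_pos:
                n01 += 1
            else:
                n00 += 1
        scores[term] = (chi_square(n11, n10, n01, n00),
                        information_gain(n11, n10, n01, n00))
    return scores


# Hypothetical toy posts, invented for illustration only.
docs = ["you are so stupid", "nobody likes you loser",
        "what is your favorite movie", "do you like pizza"]
labels = ["bully", "bully", "none", "none"]

scores = score_terms(docs, labels, positive="bully")
top_terms = sorted(scores, key=lambda t: scores[t][0], reverse=True)[:5]
```

In a sketch like this, only the top-ranked terms would be kept as features before a classifier such as C4.5, Naïve Bayes, kNN, or SVM is trained; how many terms to keep is a tuning choice that the abstract does not specify.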
This work is licensed under a Creative Commons Attribution 4.0 License.