Supervised Learning Model for the Automatic Identification of Spam in Mobile Messaging Environments
DOI:
https://doi.org/10.61799/2216-0388.1797
Keywords:
classification, machine learning, mobile messaging, Random Forest, supervised model, spam, text processing
Abstract
This article presents a supervised learning model for the automatic detection of unwanted messages (spam) in mobile messaging environments, using data obtained from the Hugging Face repository. The methodology comprised text cleaning and normalization, translation into Spanish, exploratory analysis, and message representation with the bag-of-words technique (CountVectorizer). The dataset was then split into training and testing sets, and a Random Forest classifier was trained with manually tuned hyperparameters. The model achieved an overall accuracy of 96% and a precision of 1.00 for the spam class, and the F1-score for that class improved significantly, reaching 0.82. In addition, five-fold cross-validation yielded a mean accuracy of 94.4%, indicating that the model is stable. Overall, the proposed approach is an effective alternative for spam detection in mobile messaging applications.
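The pipeline summarized in the abstract (bag-of-words features, a train/test split, a Random Forest classifier, and five-fold cross-validation) can be sketched with scikit-learn roughly as follows. This is only an illustrative sketch, not the authors' code: the toy messages, the 75/25 split, the `n_estimators` value, and the random seed are all assumptions, since the abstract does not state them, and the cleaning and translation steps are omitted.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the Hugging Face dataset.
spam = [
    "win a free prize now",
    "claim your free cash reward today",
    "urgent call now to win money",
    "free entry in a weekly prize draw",
    "you have won a free holiday claim now",
    "cheap loans approved call now",
    "congratulations you won cash text to claim",
    "free ringtone reply now to claim",
    "exclusive offer win a free phone today",
    "urgent your account won a cash prize",
]
ham = [
    "are we still meeting for lunch tomorrow",
    "please send me the report when you can",
    "i will be home late tonight",
    "thanks for the help with the project",
    "can you pick up some milk on the way",
    "the meeting moved to three in the afternoon",
    "happy birthday hope you have a great day",
    "let me know when you arrive at the station",
    "see you at the gym after work",
    "did you finish the homework for class",
]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)  # 1 = spam, 0 = ham

# Stratified split so both classes appear in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# CountVectorizer builds the bag-of-words matrix; the forest classifies it.
# n_estimators=200 is an assumed value, not the article's tuned setting.
model = make_pipeline(
    CountVectorizer(lowercase=True),
    RandomForestClassifier(n_estimators=200, random_state=42),
)
model.fit(X_train, y_train)

# Per-class precision, recall, and F1 on the held-out messages.
print(classification_report(
    y_test, model.predict(X_test),
    labels=[0, 1], target_names=["ham", "spam"], zero_division=0,
))

# Five-fold cross-validation over the full corpus; the pipeline is refit
# per fold, so the vocabulary is learned only from each training fold.
scores = cross_val_score(model, texts, labels, cv=5, scoring="accuracy")
print(f"mean 5-fold accuracy: {scores.mean():.3f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary from leaking across folds during cross-validation. The figures printed for this toy corpus are not comparable to the results reported in the abstract.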
License
Copyright (c) 2026 Mundo FESC Journal

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

