Supervised Learning Model for the Automatic Identification of Spam in Mobile Messaging Environments

DOI:

https://doi.org/10.61799/2216-0388.1797

Keywords:

machine learning, classification, mobile messaging, supervised model, text processing, Random Forest, spam, text vectorization

Abstract

This article presents a supervised learning model for the automatic detection of unwanted messages (spam) in mobile messaging environments, using data from the Hugging Face repository. The methodology comprised text cleaning and normalization, translation into Spanish, exploratory analysis, and representation of the messages with the bag-of-words technique (CountVectorizer). The dataset was then split into training and test partitions, and a Random Forest classifier was trained with manually tuned hyperparameters. The results show an overall accuracy of 96%, a precision of 1.00 for the spam class, and a significant improvement in the F1-score (0.82). In addition, five-fold cross-validation yielded a mean accuracy of 94.4%, indicating that the model is stable. Taken together, these findings confirm that the proposed model is an effective alternative for spam detection in mobile messaging applications.
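The abstract reports precision and F1-score for the spam class but not recall. As a rough check, recall can be recovered by inverting the F1 definition, F1 = 2PR / (P + R). The sketch below is only an illustration: the function names are hypothetical, and because the reported figures are rounded to two decimals, the result is approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for recall R, given P and F1."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 is not achievable with this precision")
    return f1 * precision / denominator


# Spam-class values as reported in the abstract (rounded figures).
reported_precision = 1.00
reported_f1 = 0.82

implied_recall = recall_from_f1(reported_precision, reported_f1)
print(f"Implied spam recall: {implied_recall:.2f}")
```

Under these assumptions the implied recall is about 0.69. That would mean every message flagged as spam was indeed spam, while roughly three in ten spam messages were missed.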

Published

2025-05-01

Section

Original Articles
