Sentiment Analysis on Social Media Using Fasttext Feature Expansion and Recurrent Neural Network (RNN) with Genetic Algorithm Optimization

Authors

  • Inggit Restu Illahi, Telkom University
  • Erwin Budi Setiawan, Telkom University, Indonesia

DOI:

https://doi.org/10.21108/ijoict.v10i1.905

Keywords:

FastText, Genetic Algorithm, RNN, Sentiment analysis, TF-IDF

Abstract

Social media is a place to express opinions or feelings, both positive and negative, including opinions about topics currently under discussion. When many opinions are posted about a topic, it can be difficult to assess whether the overall sentiment leans positive or negative. Sentiment analysis is therefore essential for examining the viewpoints expressed on a topic. In this study, 37,391 Twitter user comments on the 2024 Indonesian presidential election were analyzed. The research employs a Recurrent Neural Network (RNN) classifier, TF-IDF feature extraction, FastText feature expansion built from an IndoNews corpus of 142,545 entries, and Genetic Algorithm optimization. The highest accuracy, 82.72%, was obtained by combining TF-IDF feature extraction with a maximum of 7,000 features, FastText feature expansion on the top 5 features, and Genetic Algorithm optimization, an increase of 3.4% over the baseline.
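
To make the feature expansion step concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes that expansion fills a zero-valued TF-IDF feature when one of that feature's top-k most similar words occurs in the tweet, and that the feature borrows the similar word's weight; the abstract does not confirm either detail. The function names, the tiny English corpus, and the `similar_words` table (standing in for neighbours retrieved from a FastText model) are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, max_features):
    """Return (vocabulary, vectors) using a plain TF-IDF weighting."""
    tokenized = [doc.lower().split() for doc in docs]
    # Keep the max_features most frequent terms (ties broken alphabetically).
    totals = Counter(tok for toks in tokenized for tok in toks)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [term for term, _ in ranked[:max_features]]
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([
            (counts[t] / len(toks)) * (math.log(n_docs / doc_freq[t]) + 1.0)
            if toks else 0.0
            for t in vocab
        ])
    return vocab, vectors


def expand_features(vocab, vector, similar_words, top_k):
    """Fill zero-valued features whose top_k similar words occur in the document."""
    index = {term: i for i, term in enumerate(vocab)}
    expanded = list(vector)
    for i, term in enumerate(vocab):
        if vector[i] != 0.0:
            continue  # the feature is already present in this document
        for neighbour in similar_words.get(term, [])[:top_k]:
            j = index.get(neighbour)
            if j is not None and vector[j] != 0.0:
                # Assumption: the empty feature borrows the neighbour's weight.
                expanded[i] = vector[j]
                break
    return expanded


# Hypothetical data: three short "tweets" and a hand-written similarity table
# standing in for the top-ranked neighbours a FastText model would return.
docs = ["election was great", "debate was bad", "good election"]
similar_words = {"good": ["great", "nice"], "bad": ["awful"]}

vocab, vectors = tfidf_vectors(docs, max_features=7000)
expanded = expand_features(vocab, vectors[0], similar_words, top_k=5)
```

In this toy run the first document does not contain "good", but it does contain its neighbour "great", so the "good" feature is filled in; "bad" stays at zero because its only listed neighbour never appears in the vocabulary.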

Published

2024-06-29

How to Cite

Inggit Restu Illahi, & Setiawan, E. B. (2024). Sentiment Analysis on Social Media Using Fasttext Feature Expansion and Recurrent Neural Network (RNN) with Genetic Algorithm Optimization. International Journal on Information and Communication Technology (IJoICT), 10(1), 78–89. https://doi.org/10.21108/ijoict.v10i1.905