Influence of Imbalanced Data on Text Classification Using Recurrent Neural Network
DOI: https://doi.org/10.47065/bulletincsr.v6i4.996
Keywords: Deep Learning; Imbalanced Data; Recurrent Neural Networks; Gated Recurrent Unit; Resampling
Abstract
Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are designed for sequential data. However, their effectiveness in emotion detection is often hindered by class imbalance, which biases predictions toward majority classes and undermines the reliability of standard metrics such as accuracy. This study systematically compares LSTM and GRU architectures for classifying emotional states in a dataset of 4,386 Indonesian tweets spanning five classes (Anger, Happy, Sadness, Love, and Fear), which exhibits a mild imbalance of approximately 1.7:1. Both architectures are evaluated in combination with several resampling techniques: Random Oversampling, SMOTE, and Near-Miss. The results show that LSTM consistently outperforms GRU in capturing complex emotional patterns. The LSTM model combined with Random Oversampling is the most robust configuration, achieving a Macro-F1 score of 71% and an accuracy of 73%. Random Oversampling improved minority-class recognition without overfitting, whereas SMOTE and Near-Miss introduced significant performance trade-offs. These results offer practical guidance for selecting architectures and resampling strategies that mitigate imbalance-related bias in sequential classification tasks.
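The abstract contrasts accuracy with Macro-F1 and relies on resampling to rebalance the emotion classes. The following minimal sketch, in dependency-free Python, illustrates three of these ideas: an imbalance ratio (assumed here to mean largest class count divided by smallest class count), Random Oversampling by duplicating randomly chosen minority-class samples, and Macro-F1 as the unweighted mean of per-class F1 scores. All function names are hypothetical and this is not the authors' implementation. The abstract does not state whether resampling was applied to raw tweets, token-index sequences, or embeddings, so samples are treated as opaque items. SMOTE and Near-Miss are not shown; the imbalanced-learn library provides `RandomOverSampler`, `SMOTE`, and `NearMiss` for these techniques.

```python
# Minimal sketch with hypothetical helper names; not the paper's code.
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: largest class count / smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Random Oversampling: duplicate randomly chosen samples (with
    replacement) in each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data, not from the paper: 8 "Happy" vs 2 "Fear" tweets.
    texts = [f"tweet_{i}" for i in range(10)]
    y_true = ["Happy"] * 8 + ["Fear"] * 2
    y_pred = ["Happy"] * 10  # a model that always predicts the majority class

    print(imbalance_ratio(y_true))             # 4.0
    print(accuracy(y_true, y_pred))            # 0.8
    print(round(macro_f1(y_true, y_pred), 3))  # 0.444

    x_bal, y_bal = random_oversample(texts, y_true)
    print(imbalance_ratio(y_bal))              # 1.0
```

The toy run shows why Macro-F1 is the more informative metric under imbalance: a classifier that only ever predicts the majority class reaches 80% accuracy but a Macro-F1 of about 0.44, because the ignored minority class contributes an F1 of zero. Resampling of this kind is normally applied to the training split only; duplicating samples before the train/test split would let copies of the same tweet appear on both sides and inflate the scores.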
Copyright (c) 2026 Rina Septiriana, Tursina Tursina

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).