Adaptation of Contrastive Learning and Augmentation for Indonesian Product Review Classification on Unbalanced Data Using Deep Learning and NLP

Danang Bagus Reknadi; M. Ghofar Rohman; Mustain; Aphila Fraga Listyo Utomo

doi:10.29407/gj.v9i2.25783

Authors

Danang Bagus Reknadi, Universitas Islam Lamongan
M. Ghofar Rohman, Universitas Islam Lamongan
Mustain, Universitas Islam Lamongan
Aphila Fraga Listyo Utomo, Guizhou Light Industry Technical College

Keywords:

Text Classification, Contrastive Learning, Long Short-Term Memory, Natural Language Processing, Data Augmentation

Abstract

In the digital era, product reviews are an important source of information for consumers and businesses because they influence purchasing decisions and marketing strategies. However, the distribution of sentiment in product reviews is often imbalanced, with positive reviews dominating and negative reviews scarce. This imbalance makes it harder to develop text classification models, especially for Indonesian, which has a complex morphological structure and rich vocabulary variation. This study adapts the Contrastive Learning method to the classification of imbalanced Indonesian-language product reviews and tests how effectively text augmentation techniques improve representation, especially for minority classes with limited data. The data were obtained by web scraping Indonesian e-commerce platforms and total around 10,000 reviews: 52% positive, 30% negative, and 18% neutral. The data were preprocessed and expanded using augmentation techniques to significantly increase the variety and amount of training data. An LSTM model was trained on the original data and on the augmented data; validation accuracy rose from around 73% to almost 100% by the 30th epoch, with a final accuracy of 92% and an F1-score of 90%. These results confirm that incorporating data augmentation is crucial for addressing class imbalance, thereby improving the robustness and reliability of the model in product review sentiment classification.
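The imbalance described in the abstract can be made concrete with a short sketch. The class counts below are approximations derived from the stated figures (about 10,000 reviews split 52/30/18). The abstract does not name the augmentation techniques used, so the `random_swap` function, the `augmentation_targets` helper, and the sample review are hypothetical illustrations rather than the authors' method.

```python
import random


def augmentation_targets(class_counts):
    """Return how many synthetic samples each class needs to match the largest class."""
    largest = max(class_counts.values())
    return {label: largest - n for label, n in class_counts.items()}


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps randomly chosen position pairs exchanged.

    Assumption: a simple token-level augmentation chosen for illustration only.
    """
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


# Approximate counts implied by the abstract: ~10,000 reviews, 52% / 30% / 18%.
counts = {"positive": 5200, "negative": 3000, "neutral": 1800}
targets = augmentation_targets(counts)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
review = "barang bagus tapi pengiriman lambat".split()  # hypothetical review text
augmented = random_swap(review, n_swaps=1, rng=rng)
```

Under these assumed counts, balancing against the positive class would require roughly 2,200 additional negative and 3,400 additional neutral samples. Swapping word order preserves the vocabulary of a review while varying its surface form, which is one inexpensive way to enlarge a minority class.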

