Word Stemming of Lampung Dialect Nyo using N-Gram Stemming

Parjito; Zaenal Abidin; Akmal Junaidi; Wamiliana; Favorisen R. Lumbanraja; Farida Ariyani

doi:10.29407/intensif.v10i1.25364

Authors

Parjito, Universitas Teknokrat Indonesia, https://orcid.org/0009-0006-9790-6459
Zaenal Abidin, Universitas Teknokrat Indonesia, https://orcid.org/0000-0003-4237-7167
Akmal Junaidi, Universitas Lampung, https://orcid.org/0000-0003-1030-6954
Wamiliana, Universitas Lampung, https://orcid.org/0000-0002-3740-7950
Favorisen R. Lumbanraja, Universitas Lampung, https://orcid.org/0000-0002-1790-831X
Farida Ariyani, Universitas Lampung, https://orcid.org/0000-0003-0937-0043

Keywords:

Stemming, Dialect of nyo, N-Gram Stemming, Threshold, Translation

Abstract

Background: Previous systems for translating the Lampung dialect of nyo into Indonesian achieved bilingual evaluation understudy (BLEU) scores below 40%, primarily because of difficulties in processing affixed words.

Objective: This research aims to stem affixed words in the Lampung dialect of nyo in order to improve the performance of the translation system.

Methods: We developed an n-gram stemming approach that reduces affixed words to their base forms by measuring the similarity between n-grams with the Dice coefficient. When the similarity exceeds a specified threshold, the system identifies the corresponding base word.

Results: Using a dataset of 700 words from the Lampung dialect of nyo, we constructed a comprehensive stemmer covering all affix variations. The optimal threshold was found to be 0.5, at which the bigram accuracy was 93.86% and the trigram accuracy was 89.14%. These accuracy levels demonstrate the method's effectiveness in identifying base word forms, which directly affects translation quality.

Conclusion: N-gram stemming with a threshold of 0.5 effectively handles the morphology of the Lampung dialect of nyo and shows potential for improving translation accuracy. This work represents the first comprehensive stemming system designed specifically for the Lampung dialect of nyo, and it contributes to the development of natural language processing tools for underrepresented regional languages in Indonesia.
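The n-gram stemming idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each word is split into its set of distinct character n-grams without padding, that the Dice coefficient is computed as 2|A ∩ B| / (|A| + |B|) over those sets, and that base words come from a known list. The function names (`ngrams`, `dice`, `stem`) and the example words, which are Indonesian rather than Lampung nyo, are hypothetical and chosen only for illustration.

```python
def ngrams(word, n=2):
    """Return the set of distinct character n-grams of `word` (no padding; an assumption)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient 2|A & B| / (|A| + |B|) over the n-gram sets of two words."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0  # a word shorter than n has no n-grams
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def stem(word, base_words, n=2, threshold=0.5):
    """Return the most similar base word if its similarity exceeds `threshold`.

    `base_words` is a hypothetical list of known base forms; when no
    candidate exceeds the threshold, the word is returned unchanged.
    """
    best = max(base_words, key=lambda b: dice(word, b, n), default=None)
    if best is not None and dice(word, best, n) > threshold:
        return best
    return word


if __name__ == "__main__":
    lexicon = ["makan", "minum"]          # illustrative Indonesian base words
    print(dice("dimakan", "makan"))       # bigram similarity
    print(dice("dimakan", "makan", n=3))  # trigram similarity
    print(stem("dimakan", lexicon))       # best match above the 0.5 threshold
```

In this toy case the affixed word "dimakan" shares four of its six bigrams with "makan", giving a Dice coefficient of 0.8, which exceeds the 0.5 threshold, so "makan" is returned. Switching `n` to 3 gives the trigram variant that the abstract also evaluates.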
