Translation of the Lampung Language Text Dialect of Nyo into the Indonesian Language with DMT and SMT Approach

Research on translating Lampung language text in the Nyo dialect into Indonesian was carried out with two approaches: Direct Machine Translation (DMT) and Statistical Machine Translation (SMT). The experiment is a preliminary effort to help immigrant students in Lampung Province translate the Nyo dialect of the Lampung language by means of a prototype and a model built for this purpose. The DMT approach uses a dictionary as its primary resource. In contrast, the SMT approach uses a parallel corpus of Lampung Nyo and Indonesian to build the language model and the translation model with the Moses Decoder. Translation accuracy is 39.32% for the DMT approach and 59.85% for the SMT approach. Both approaches are evaluated with the Bilingual Evaluation Understudy (BLEU) score.


I. INTRODUCTION
Lampung Province is located at the gateway to the island of Sumatra. The province has a rich culture that includes the Lampung language and the Lampung script. In general, there are two main dialects in the province, the Api dialect and the Nyo dialect. The Lampung provincial government pays close attention to the Lampung language and continues to make various efforts to preserve and maintain it. Through Governor Regulation Number 39 of 2014 concerning Lampung Language and Script Subjects, the government stipulates that the Lampung language is mandatory local content from the primary to the senior secondary education level, supported by textbooks for elementary, junior high, and high school along with a Lampung language dictionary. Both the Api dialect and the Nyo dialect are used by the people of Lampung for daily communication, in the family environment as well as at formal events. The Lampung language belongs to the Malayo-Polynesian branch of the Austronesian language family. The two main dialects, dialect A (Api) and dialect O (Nyo), are named with reference to the word 'apa' [1].
For immigrants who send their children to school in Lampung Province, the Lampung language is one of the subjects taught at the SD, SMP, and SMK/SMA levels. Immigrant parents find it challenging to help their children learn the Lampung language because it is not their own regional language. In response, academics at the Technocrat University of Indonesia and the University of Lampung are trying to find a solution. This research is expected to provide an initial solution by building a prototype in the Python programming language and a Lampung language translator model, specifically for the Nyo dialect. Two approaches are used to build this solution: Direct Machine Translation (DMT) and Statistical Machine Translation (SMT). This study is limited to the Nyo dialect of the Lampung language.
Lampung language text in the Nyo dialect can be translated by hand with a dictionary. This is tiring for both parents and students because they must repeatedly look up each word in the Lampung language dictionary. Dictionary-based research on translating Lampung language text in the Nyo dialect has never been carried out.
Machine translation can be built with three approaches: (1) a direct approach, or DMT, which uses a dictionary; (2) a rule-based approach, or Rule-based Machine Translation (RBMT), which uses a series of rules of the language; and (3) a data-driven approach, which uses a parallel corpus [2]. In DMT research for the Lampung language Nyo dialect, the main component needed is a bilingual Lampung-Indonesian dictionary. Building a translation machine with a rule-based approach requires rules for analyzing sentences in the source language, rules for transforming the representation of the source language analysis results, and rules for generating sentences in the destination language. Building a translation machine with a data-driven, parallel corpus-based approach requires sentence pairs in the source language and the destination language [2]. Research on translating the Api dialect of the Lampung language has been carried out with a parallel corpus of 3000 Lampung sentences paired with their Indonesian translations, using the Neural Machine Translation (NMT) method without the attention mechanism [3] and with the attention mechanism [4]. Statistical Machine Translation (SMT) research using a parallel corpus of 3000 sentence pairs in the Api dialect and their Indonesian translations has also been carried out [5]. In this study, the dictionary is used as a database to build the DMT, and a parallel corpus of the Lampung language Nyo dialect and its Indonesian translation is used to build the SMT model. Meanwhile, research on the speech aspect of the Lampung language was carried out for the first time in [6].
Related DMT studies include dictionary-based translation from Kannada to Telugu in India [7] and from Pali to Sinhala in Sri Lanka [8]. In Indonesia, dictionary-based translation from Indonesian to Balinese was implemented as an application that can be installed on an Android smartphone [9], and dictionary-based translation from Indonesian to Javanese was implemented on mobile devices [10].
As for SMT, machine translation research in Indonesia includes Javanese-Indonesian translation with phrase-based SMT [11]; Sundanese-to-Indonesian translation using phrase-based SMT with part-of-speech (PoS) tags [12]; Indonesian-Dayak Taman translation with root word and affix marking, carried out at the University of Tanjungpura [13]; an investigation of the role of the language model in Indonesian-Dayak Kanayatn SMT [14]; a study of the effect of corpus quantity on SMT from Bugis Wajo to Indonesian [15]; and a study of various models for translating Indonesian into Japanese [16]. The output of machine translators has been measured with the Bilingual Evaluation Understudy (BLEU) score [17], and the morphological aspects of the Lampung language have been observed by Lampung language researchers [1].
A few SMT references were taken specifically from SINTA 2 accredited journals: an effort to improve the accuracy of a Javanese-to-Indonesian statistical machine translator by improving the lexical model probabilities [18], experiments with pivot-language SMT from English to Malay Sambas [19], a study of the effect of the dictionary lookup method for corpus cleaning on the accuracy of Indonesian-Malay Pontianak SMT [20], a study of increasing the accuracy of Indonesian-Minang SMT using the EWSB algorithm [21], and a comparison of the accuracy of smoothing algorithms in Indonesian-Malay Sambas SMT with the IRSTLM language model toolkit [22].
A prototype application for translating the Lampung language Nyo dialect, backed by the Lampung language dictionary, together with an SMT model for the Nyo dialect, is expected to be a first step toward helping immigrant students translate the Nyo dialect. The dictionary acts as the database of the prototype. The prototype was written in Python, a programming language that is reliable for processing text data and is open source. Meanwhile, the parallel corpus of the Lampung language Nyo dialect and its Indonesian translation serves as the raw material for the translation model and the language model in SMT. This study aims to build a DMT prototype and an SMT model that can translate paragraph texts in the Nyo dialect of the Lampung language, and to analyze the translation results with Bilingual Evaluation Understudy (BLEU).

A. The Subject / Material Studied
The subjects/materials in this study were the Lampung language dictionary for the Nyo dialect and the parallel corpus from the Lampung language Nyo dialect to Indonesian. The dictionary used is the
The SMT pre-processing phase in the Moses Decoder consists of sentence alignment, tokenization, cleaning, lowercasing, and truecasing. Sentence alignment aligns the parallel corpus of the Nyo dialect with its Indonesian translation. Tokenization inserts spaces between words, including between words and adjacent punctuation marks, while lowercasing makes the letter case uniform. In truecasing, the beginning of each sentence is converted to its most likely casing. Cleaning limits sentence length and also removes misaligned sentences. The next phase is the training phase, in which the language model and the translation model are built.
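The tokenization, lowercasing, and cleaning steps described above can be illustrated with a minimal Python 3 sketch. It is not the Moses implementation: the function names, the regular expression, and the thresholds `max_len` and `max_ratio` are assumptions chosen for illustration, and the sample strings are placeholders rather than corpus sentences.

```python
import re


def tokenize(sentence):
    """Insert spaces between words and punctuation, then normalize spacing."""
    # Assumption: any character that is not a word character, whitespace,
    # apostrophe, or hyphen is treated as a separate punctuation token.
    spaced = re.sub(r"([^\w\s'-])", r" \1 ", sentence)
    return " ".join(spaced.split())


def lowercase(sentence):
    """Make the letter case uniform."""
    return sentence.lower()


def clean(pairs, max_len=80, max_ratio=9.0):
    """Drop empty, over-long, or strongly length-mismatched sentence pairs.

    A large length ratio is used here as a rough signal of misalignment.
    Both thresholds are illustrative assumptions.
    """
    kept = []
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue
        if n_src > max_len or n_tgt > max_len:
            continue
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        kept.append((src, tgt))
    return kept


# Placeholder strings, not real corpus sentences.
print(lowercase(tokenize("Aaa bbb, ccc.")))
```

Applying `clean` after tokenization keeps the length checks consistent, because punctuation marks then count as separate tokens on both sides of each pair.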
The language model is built with software; this study used KenLM, which has been integrated into
The algorithm studied translates the text of a word, sentence, or paragraph from the Lampung language Nyo dialect into Indonesian. The test data for DMT and SMT were collected by randomly selecting sentences in the Lampung language Nyo dialect that had been translated by native speakers of the Nyo dialect. Details of the test sentences are provided at https://bit.ly/37YBNDe.

E. Analysis of the Translation Results
The translation results are evaluated by comparing the translated sentences with the reference sentences using the Bilingual Evaluation Understudy (BLEU) tool available in the Moses Decoder. BLEU is an algorithm for evaluating the quality of text that has been translated by a machine from a source language to a destination language.
BLEU measures the modified n-gram precision between the machine translation output and the reference translation, scaled by a constant called the brevity penalty (BP) [17].
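To make the two ingredients of the score concrete, the following Python 3 sketch computes a corpus-level BLEU from clipped (modified) n-gram precisions and the brevity penalty. It assumes one reference per sentence, uniform weights over 1-grams to 4-grams, and whitespace tokenization; the function names are hypothetical, and this is not the scoring script shipped with Moses.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """BLEU = BP * exp(mean of log modified precisions), one reference each."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Modified precision: each hypothesis n-gram is credited at most
            # as many times as it occurs in the reference (clipping).
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

With this definition, a hypothesis that is identical to its reference scores 1.0, and a hypothesis that shares no 4-gram with its reference scores 0.0. Reported percentages such as those in this study correspond to the score multiplied by 100.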

III. RESULT AND DISCUSSION
The accuracy obtained from translating the Lampung language Nyo dialect into Indonesian with the DMT and SMT approaches is given in Table 1.

A. DMT Testing Result
Words, sentences, or paragraphs in the Lampung language Nyo dialect can be translated through a console prototype written in Python 2.7.
The prototype Lampung-Indonesian translator application was tested with more than one sentence in the Lampung language Nyo dialect. The list of twenty-five test sentences is given at https://bit.ly/37YBNDe.
The translation results from the prototype application, as shown in Figure 3, show that the application can translate the words stored in its database. If a word is not in the database, the application returns the original word unchanged. Symbols other than words are likewise displayed exactly as they were entered. The accuracy obtained with the DMT approach on the 25 test sentences is 39.32%, which reflects the fact that the DMT application can only translate words from the Lampung language Nyo dialect that are already in its database. The dictionary alone is considered insufficient because of the limited number of words available to the prototype. This shows that the dictionary used in this study still lacks words that should be present.
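The lookup behavior described above, in which known words are replaced and everything else passes through unchanged, can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not the prototype's source code: the function name `translate_direct` and the dictionary contents are hypothetical. The entry `nyo` to `apa` follows the introduction's remark on the dialect names, and `aaa` is a pure placeholder.

```python
import re

# Hypothetical miniature dictionary. Only "nyo" -> "apa" is motivated by the
# text; "aaa" -> "xxx" is a placeholder, not a real dictionary entry.
DICTIONARY = {"nyo": "apa", "aaa": "xxx"}


def translate_direct(text, dictionary):
    """Replace each word found in the dictionary; leave the rest as typed."""
    def lookup(match):
        word = match.group(0)
        # Case-insensitive lookup; unknown words are returned unchanged.
        return dictionary.get(word.lower(), word)

    # Only runs of word characters are looked up, so punctuation and other
    # symbols are copied to the output exactly as they were entered.
    return re.sub(r"\w+", lookup, text)


print(translate_direct("Nyo bbb aaa?", DICTIONARY))
```

Because each word is translated in isolation, a sketch like this cannot reorder words or choose between senses, which is consistent with the limitation noted for the DMT approach.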

B. SMT Testing Result
The implementation of SMT on the Moses Decoder for the translation experiment from the Lampung language Nyo dialect to Indonesian can be seen in detail at https://bit.ly/37YBNDe. Several steps were taken to produce a language model and a translation model that can be used to translate the Lampung language Nyo dialect. The training data used in this study were a parallel corpus of 4057 sentence pairs from the Lampung language Nyo dialect to Indonesian and an Indonesian monolingual corpus of 13759 entries.
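As a rough illustration of what a language model trained on monolingual text contributes, the sketch below builds a toy bigram model in Python 3 and uses it to score word sequences. It is a deliberately simplified stand-in: it uses add-one smoothing rather than the estimation performed by KenLM, the class name `BigramLM` is hypothetical, and the Indonesian training sentences are illustrative examples, not taken from the study's corpus.

```python
import math
from collections import Counter


class BigramLM:
    """Toy bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, sentences):
        self.contexts = Counter()
        self.bigrams = Counter()
        vocab = {"</s>"}
        for sentence in sentences:
            words = sentence.split()
            vocab.update(words)
            tokens = ["<s>"] + words + ["</s>"]
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # One extra slot is reserved for words never seen in training.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, sentence):
        """Natural-log probability of a sentence under the model."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            count = self.bigrams[(prev, word)] + 1
            total += math.log(count / (self.contexts[prev] + self.vocab_size))
        return total


# Illustrative Indonesian sentences, not from the study's corpus.
lm = BigramLM(["saya makan nasi", "saya minum air"])
print(lm.log_prob("saya makan nasi") > lm.log_prob("nasi makan saya"))
```

In an SMT system, a score of this kind is combined with translation model scores so that, among candidate outputs, word orders that resemble the monolingual training text are preferred.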
After training, the model was tested with twenty-five test sentences in the Lampung language Nyo dialect written by speakers of the dialect. The twenty-five test sentences are given in full at https://bit.ly/37YBNDe.
Testing the Nyo dialect sentences with SMT gives an accuracy of 59.85%, as shown in Table 1. Among the test sentences, there were sentences whose SMT translation was identical to the given reference sentence.

IV. CONCLUSION
This experiment shows that the Lampung language Nyo dialect can be translated into Indonesian with both the DMT and the SMT approaches. The accuracy, measured by the BLEU value, is 39.32% for the DMT approach and 59.85% for the SMT approach. The DMT approach is useful for translating words that are already in the database but cannot capture the meaning of a given test sentence as a whole. The SMT approach learns from the training data provided and can accommodate the meaning of the test sentence, so the SMT approach gives better results than the DMT approach.