Disease Detection of Dragon Fruit Stem Based on The Combined Features of Color and Texture

Dragon fruit is one of the leading commodities in Banyuwangi Regency's agriculture. In 2019, it had the fourth-largest harvest area among fruit commodities in Banyuwangi and was exported to China. However, disease attacks often appear in several dragon fruit plantations in Banyuwangi, and disease identification is still conventional. Many farmers do not know the types of disease or how to handle them, which reduces the quality and quantity of their crops. Therefore, this study implemented a combination of two feature extraction methods: color feature extraction using color moments and texture feature extraction using gray level co-occurrence matrices (GLCM). The methods were used to develop a system, based on digital image processing, that recognizes or detects three types of dragon fruit stem disease, using Support Vector Machine as the classification method and k-Nearest Neighbors as a comparison method. The results indicate that the combination of the two proposed feature extraction methods can distinguish between stem rot, smallpox, and insect stings, with an optimal accuracy score of 87.5% obtained using Support Vector Machine as the classification method.
Keywords: digital image processing, pitaya, support vector machine, k-nearest neighbors
INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi, Vol.5 No.2 August 2021, ISSN: 2580-409X (Print) / 2549-6824 (Online), DOI: https://doi.org/10.29407/intensif.v5i2.15287


I. INTRODUCTION
Dragon fruit (pitaya) is the fruit of cacti in the genera Hylocereus and Selenicereus. Although the fruit originates from Mexico, Central America, and South America, it is now widely cultivated in Asian countries such as Taiwan, China, Malaysia, and Indonesia. In Indonesia, especially in Banyuwangi Regency, farmers cultivate dragon fruit widely in several areas, such as Pesanggaran, Siliragung, Tegaldlimo, Purwoharjo, Sempu, Cluring, and Gambiran. Most of the vacant land in some of these areas is planted with dragon fruit. According to data on the official website of the Banyuwangi Regency Government, in 2019 dragon fruit was the most widely cultivated fruit commodity after Siamese oranges, bananas, and mangoes [1]. In addition, the harvested area of dragon fruit increased continuously and significantly from 2013 to 2019. As a result, Banyuwangi became one of the largest suppliers of dragon fruit in East Java, and in 2019 its dragon fruit harvest was exported to China [2].
Increasing crop yields do not always mean that farmers face fewer difficulties. Some dragon fruit farmers have experienced various problems, one of which is disease attacks. An online portal reported that several dragon fruit plantations in Banyuwangi were stricken with smallpox, which reduced the quality of their harvests for some time [3]. In addition to smallpox, several other diseases attack dragon fruit plants, such as brown spot, anthracnose, scabies, stem rot, mosaic, root rot, red spot, black rot, and insect infestation [4]. Some farmers and members of the public who grow dragon fruit do not know the types of these diseases or how to handle them, especially the diseases that most often attack dragon fruit plantations in Banyuwangi: smallpox, stem rot, and insect infestation. Until now, identification has relied on conventional methods and direct guessing.
In this regard, a previous study developed a dragon fruit detection system based on an expert system [5]. An expert system is a computer program that imitates an expert by making decisions based on rules. Such a system is less practical and less accurate because it is vulnerable to manipulation of the entered answers and to differences in assumptions. Therefore, a more accurate, data-driven system based on digital image processing is needed. Several previous researchers have applied digital image processing techniques to disease detection in other plants, such as citrus, sugarcane, and rice [6]-[8]. Sharif et al. [6] implemented color, texture, and geometry feature extraction methods to detect anthracnose, black spot, canker, scab, greening, and melanose diseases. In addition to the three feature extraction methods, that study applied a feature selection method using Principal Component Analysis, entropy, and skewness vector values based on covariance values. The proposed method detected the type of disease in citrus with an optimal accuracy of 97%. Another study applied a segmentation approach based on the Gray Level Co-occurrence Matrix (GLCM) and LAB color moments to classify stain disease in sugarcane leaf images; the proposed algorithm detected sugarcane leaf disease with an optimum accuracy of 93% [7]. Rakhmawati et al. implemented the Support Vector Machine (SVM) method for potato leaf disease classification based on texture and color features [9]. Their algorithm classified the disease with an optimum accuracy of 80%.
Based on these problems, this study implemented two feature extraction methods: color feature extraction using color moments and texture feature extraction using GLCM. Both methods were used to develop a system that identifies or detects dragon fruit diseases based on digital image processing. Research on other plants has shown that these two feature extraction methods are effective for detecting or classifying plant diseases. In addition, the Support Vector Machine (SVM) algorithm was used to classify the three proposed diseases (smallpox, stem rot, and insect infestation), with the k-Nearest Neighbors (kNN) method as a comparison method. We hypothesize that the proposed method can detect the type of disease in dragon fruit stem images well.

II. RESEARCH METHOD
The overall methodology proposed in this study comprises image acquisition, pre-processing, image segmentation, feature extraction, and classification of three types of disease on dragon fruit stems based on digital image processing.

A. Image Acquisition
Image acquisition is the process of taking, collecting, and preparing the research dataset. The images were taken with a smartphone camera with a resolution of 4160x3120 pixels (13 MP). The data were collected at the Dragon Fruit Plantation Center in Singojuruh and Genteng Districts, Banyuwangi, during the day in a place without direct sunlight exposure. Three types of disease were proposed for training the system: stem rot, smallpox, and insect infestation. The dataset consists of 23 images of stem rot, 28 images of smallpox, and 30 images of insect stings, for a total of 81 images. In addition to collecting the dataset, the image acquisition stage also includes preparing the supporting devices and programming tools. After this preparation, the dataset was read as input from a local directory. The system was developed using the Python programming language with pip, the OpenCV library, pandas, Scikit-image, and matplotlib.

B. Pre-processing
This stage follows the image acquisition process. In the pre-processing stage, images in the RGB (Red, Green, Blue) color space were transformed into the L*a*b* color space. Before the transformation to L*a*b*, the RGB values were first converted into the XYZ color space [10]. In this transformation, R, G, B are the RGB color intensities; X, Y, Z are the coordinates in the XYZ color space, computed from R, G, B; L* is the intensity on the L* channel; a* is the intensity on the a* channel; b* is the intensity on the b* channel; and the reference white constants are $X_n = 0.950456$ and $Z_n = 1.088754$.

C. Image Segmentation
Image segmentation is the process of separating the diseased objects from the background. After the transformation from RGB to L*a*b*, segmentation was carried out by determining the lowest and highest values of the a* component. The resulting a* range was then used as a mask on the original image, and pixels outside the mask were set to 255 (white). The masked result was converted into a grayscale image, after which feature extraction was carried out.
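As an illustration of the pre-processing and segmentation steps, the following minimal sketch converts a pixel from RGB to XYZ and then to L*a*b*, and masks an image by a range on the a* channel. It assumes linear RGB values scaled to [0, 1], the D65 reference white, and the widely used RGB-to-XYZ matrix whose constants also appear in the OpenCV documentation. The threshold values `a_low` and `a_high`, the demo pixels, and all function names are hypothetical, since the thresholds used in the study are not stated.

```python
# Reference white (D65) used to normalise X and Z; Y_n is 1 (assumed).
X_N, Z_N = 0.950456, 1.088754


def _f(t):
    """Piecewise cube-root function used in the L*a*b* definition."""
    return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0


def rgb_to_lab(r, g, b):
    """Convert one linear RGB pixel (channels in [0, 1]) to (L*, a*, b*)."""
    # Stage 1: RGB -> XYZ
    x = 0.412453 * r + 0.357580 * g + 0.180423 * b
    y = 0.212671 * r + 0.715160 * g + 0.072169 * b
    z = 0.019334 * r + 0.119193 * g + 0.950227 * b
    # Stage 2: XYZ -> L*a*b*
    x, z = x / X_N, z / Z_N
    lightness = 116.0 * y ** (1.0 / 3.0) - 16.0 if y > 0.008856 else 903.3 * y
    a_star = 500.0 * (_f(x) - _f(y))
    b_star = 200.0 * (_f(y) - _f(z))
    return lightness, a_star, b_star


def segment_by_a_channel(image, a_low, a_high):
    """Keep pixels whose a* lies in [a_low, a_high]; set the rest to white.

    `image` is a list of rows of (r, g, b) tuples with 8-bit channel values.
    """
    white = (255, 255, 255)
    segmented = []
    for row in image:
        new_row = []
        for (r, g, b) in row:
            _, a_star, _ = rgb_to_lab(r / 255.0, g / 255.0, b / 255.0)
            new_row.append((r, g, b) if a_low <= a_star <= a_high else white)
        segmented.append(new_row)
    return segmented


# Hypothetical 1x2 image: a reddish pixel (a* > 0) and a green pixel (a* < 0).
demo = [[(200, 40, 40), (40, 200, 40)]]
result = segment_by_a_channel(demo, a_low=0.0, a_high=128.0)
```

In practice a library routine such as OpenCV's `cv2.cvtColor` with `cv2.COLOR_BGR2LAB` would normally replace the hand-written conversion. For 8-bit images that routine rescales L* and offsets a* and b* by 128, so thresholds chosen on its output would differ from the values used here.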

D. Feature Extraction
The feature extraction process is one of the essential steps in pattern recognition. To detect the various patterns, the system must determine the unique characteristics, or features, of each disease on dragon fruit stems. In this study, two feature extraction methods were proposed: color moments and GLCM. These two feature extraction methods have also been applied to disease detection in corn plants [8]. For the color moments method, the color mean and standard deviation (STD) features were used. The color mean and standard deviation were obtained with the following formulas:

$$\mu_c = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} P^{c}_{ij}$$

$$\sigma_c = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(P^{c}_{ij} - \mu_c\right)^2}$$

where $\mu_c$ is the mean value for color component c, $\sigma_c$ is the standard deviation, c is the color component, $P^{c}_{ij}$ is the value of pixel (i, j) in color component c, M is the image height, and N is the image width.
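The color mean and standard deviation defined here can be computed per channel as in the following minimal sketch. It assumes an image stored as rows of (r, g, b) tuples and averages over every pixel; restricting the computation to segmented pixels would require an additional mask. The function name and the demo image are hypothetical.

```python
import math


def color_moments(image):
    """Return (means, stds) per channel for a list of rows of (r, g, b) tuples."""
    pixels = [px for row in image for px in row]
    n = len(pixels)  # equals M * N
    means, stds = [], []
    for c in range(3):
        values = [px[c] for px in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n  # population variance
        means.append(mean)
        stds.append(math.sqrt(variance))
    return means, stds


# Hypothetical 2x2 image used only to exercise the function.
demo = [[(10, 0, 50), (30, 0, 50)],
        [(10, 0, 50), (30, 0, 50)]]
means, stds = color_moments(demo)

# Six color features: R mean, G mean, B mean, R STD, G STD, B STD
features = means + stds
```

The sketch divides by M x N rather than M x N - 1, which matches the population form of the standard deviation.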
This study used two GLCM features: contrast and dissimilarity. Contrast measures the variation in gray level between a pixel and its adjacent pixels throughout the image [11]. The contrast value was obtained with the following formula:

$$\text{Contrast} = \sum_{k} k^2 \left[ \sum_{i} \sum_{j} p(i, j) \right], \quad |i - j| = k$$

where p(i, j) is the entry of the normalized GLCM for gray levels i and j. Dissimilarity measures the variation in the intensity level of a pair of pixels in the image [11]. The dissimilarity value was obtained with the following formula:

$$\text{Dissimilarity} = \sum_{i} \sum_{j} |i - j| \, p(i, j)$$

The maximum probability of the GLCM is given by:

$$\text{Maximum Probability} = \max_{i,j} \, p(i, j) \qquad (7)$$
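To make the two GLCM features concrete, the sketch below builds a normalized co-occurrence matrix for a single pixel offset and computes contrast and dissimilarity from it. The offset (one pixel to the right), the non-symmetric counting, the tiny three-level image, and the function names are all assumptions made for illustration; the study's own settings are not fully specified.

```python
def glcm(gray, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for pixel pairs offset by (dx, dy).

    Assumes the image is large enough to contain at least one pixel pair.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(gray), len(gray[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[gray[r][c]][gray[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def contrast(p):
    # Sum of (i - j)^2 * p(i, j); equivalent to grouping terms by k = |i - j|.
    return sum((i - j) ** 2 * p[i][j] for i in range(len(p)) for j in range(len(p)))


def dissimilarity(p):
    # Sum of |i - j| * p(i, j).
    return sum(abs(i - j) * p[i][j] for i in range(len(p)) for j in range(len(p)))


# Hypothetical 2x3 grayscale image with gray levels 0, 1, and 2.
demo = [[0, 0, 1],
        [1, 2, 2]]
p = glcm(demo, levels=3)
texture_features = [contrast(p), dissimilarity(p)]
```

With scikit-image, the same quantities are usually obtained from `graycomatrix` and `graycoprops` (named `greycomatrix` and `greycoprops` before version 0.19).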

E. Classification
The classification stage matches test data against training data. The training data are images used to train the system, through the sequence of proposed algorithms, to recognize specific patterns. The test data are new images that are detected and matched against the training data by the classification algorithm. This study used two classification methods: the Support Vector Machine as the proposed method and k-Nearest Neighbors (kNN) as a comparison method.
Support Vector Machine (SVM) is a classification method whose main working principle is to determine the optimal separating plane, or hyperplane, between the classes of data. The optimal hyperplane is obtained by maximizing the margin (d), which gives the following optimization problem, solved through the Lagrange multipliers of its dual:

$$\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0$$

where C is the penalty parameter and $\xi_i$ are the slack variables. The vector w that determines the optimal hyperplane can be written as a linear combination of the training data vectors. SVM also applies the kernel trick. Several kernels are available for SVM; this study used the Radial Basis Function (RBF) kernel, whose formula is as follows:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$

where $\gamma > 0$ controls the width of the kernel.

k-Nearest Neighbors (kNN) is a supervised classification method. The kNN algorithm assumes that all samples correspond to points in an n-dimensional space. kNN stores all training data and assigns a new object the class of the training data closest to it. The nearest neighbors of a sample are defined by a standard distance formula. This study used the Euclidean distance, the default distance of the kNN algorithm. Researchers frequently use it because it performs better than other distance formulas [13]. The Euclidean distance is obtained with the following formula:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (11)$$

Another factor that determines the performance of the kNN algorithm is the size of the k parameter. Some researchers recommend a rule for choosing k. Therefore, this study tested the kNN classification process for values up to k = 15.
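Both classifiers in this section rest on the Euclidean distance between feature vectors, which the following minimal sketch makes explicit: an RBF kernel value for a pair of vectors and a kNN majority vote. The `gamma` parameterization of the kernel, the two-dimensional toy features, and the function names are assumptions for illustration; this is not the study's implementation.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * euclidean(x, y) ** 2)


def knn_predict(train_x, train_y, query, k):
    """Label of `query` by majority vote among its k nearest training samples."""
    # Sort training samples by distance to the query and keep the k nearest.
    nearest = sorted(zip(train_x, train_y), key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two disease classes.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["stem rot"] * 3 + ["smallpox"] * 3

label_a = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
label_b = knn_predict(train_x, train_y, (5.5, 5.5), k=3)
similarity = rbf_kernel((0.0, 0.0), (1.0, 0.0), gamma=1.0)
```

Because the features in the study have different scales (color values versus GLCM statistics), a real pipeline would likely standardize them before computing distances; that step is omitted here.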

F. Performance Evaluation
Performance evaluation measures the performance of the system developed through the series of proposed algorithms. The performance of a classification system describes how well the system identifies data. Accuracy is one of the measures used for this purpose, because a system is considered good if it achieves a high level of accuracy. To evaluate the proposed system, this study used an accuracy score obtained from the following formula:

$$\text{Accuracy} = \frac{\text{number of correctly classified data}}{\text{total number of data}} \times 100\% \qquad (12)$$
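As a worked example of the accuracy score, the sketch below counts matching labels and converts the ratio to a percentage. The label lists are hypothetical: the size of the test set is not reported in the text, and 14 correct predictions out of 16 is used only because it happens to give 87.5%.

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical test set of 16 images with two misclassifications.
true_labels = ["stem rot"] * 5 + ["smallpox"] * 5 + ["insect sting"] * 6
predicted = list(true_labels)
predicted[0] = "smallpox"       # a stem rot image predicted as smallpox
predicted[7] = "insect sting"   # a smallpox image predicted as insect sting

score = accuracy(true_labels, predicted)  # 14 / 16 * 100
```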

III. RESULT AND DISCUSSION
In this study, a system was developed to detect three types of disease on dragon fruit stems through the series of proposed algorithms. The whole system was developed in Python 3 with several supporting libraries, such as pandas, Scikit-image, OpenCV, and matplotlib.

A. Pre-Processing Results
In the pre-processing stage, the original RGB image was transformed into an image in the L*a*b* color space. This transformation, described in the method section, produces the output image shown in Figure 3. The approach follows earlier research on a disease with similar characteristics in a different plant, namely stain disease in sugarcane leaf images [7]. In that research, the approach was very effective and helped achieve the highest accuracy of 93%. This study applied almost the same approach, and the results indicate that it is also more effective than the first trial at this pre-processing stage: diseased image objects are seen more clearly than non-diseased objects or the background. The conversion from the RGB color space to L*a*b* went through two stages, namely from RGB to XYZ and then from XYZ to L*a*b*.

B. Segmentation Results
The a* component was used because the color of the diseased regions in this layer looked more distinct than that of the non-diseased objects. This applied to all images in the dataset. In a previous publication, the approach used from the pre-processing step up to segmentation succeeded in segmenting diseased objects with an accuracy score of 92.63% [14]. This segmentation result is comparable to the optimal accuracy of the previous study that applied the approach to sugarcane disease images [7].

C. Feature Extraction Results
This study used two feature extraction methods: color feature extraction using color moments and texture feature extraction using GLCM. Color features were used because each disease has a different color, although several diseased images have almost the same color. Texture features were added to provide additional distinguishing features when the color features of one disease are almost the same as those of another.
Kamilah et al. [7] also used these two feature extraction methods. Sari et al. [8] likewise used color and texture feature extraction: the color features were the mean, standard deviation, and skewness, while the texture features were entropy, energy, contrast, homogeneity, and correlation. These features identified diseases in corn plants with an optimum accuracy of 89.375%. Another study addressed disease detection in potato leaves [9]. Its texture features were energy, contrast, correlation, homogeneity, entropy, dissimilarity, and maximum probability, and its color features were the mean, standard deviation, and skewness. That study recognized diseases in potato leaves with an optimum accuracy of 87%.
For the color moments method, this study used the mean and standard deviation of the RGB (Red, Green, Blue) values. This yields six features: the red mean (R mean), green mean (G mean), blue mean (B mean), red standard deviation (R STD), green standard deviation (G STD), and blue standard deviation (B STD). The image used in the feature extraction stage was the segmented image with RGB color values. Both feature extraction methods were implemented with libraries available in the Python programming language; color feature extraction used the OpenCV library. The color mean and color standard deviation were computed for each color channel over all segmented pixels of each image. For texture feature extraction with GLCM, the segmented image must first be converted to grayscale before the contrast and dissimilarity values are calculated.
The features to be used were then defined in the pro List variable, which contains the GLCM features. Next, the GLCM matrix was computed; it is later used to determine the value of each GLCM feature. The matrix was computed with the greycomatrix function of the Scikit-image library (named graycomatrix in scikit-image 0.19 and later), together with the angle to be used. After the matrix was obtained, the value of each GLCM feature was calculated.

D. Classification Results
In this study, the classification process was carried out using two methods: Support Vector Machine (SVM) and k-Nearest Neighbors (kNN). Several experiments were carried out to test different feature sets and find the optimal classification method. In the first experiment, only the three color mean features were used, namely the red mean (R mean), green mean (G mean), and blue mean (B mean). With these features, the optimal accuracy was 56.25% using the SVM method. In the second experiment, classification used only the color standard deviation features (R STD, G STD, and B STD) and obtained an optimal accuracy score of 62.5%. The third experiment obtained an optimal accuracy score of 68.75% using six features: the three color mean features and the three color standard deviation features. The last experiment combined the two feature extraction methods, color moments and GLCM, using the eight features described in the feature extraction section. It obtained an optimal accuracy score of 87.5% using the SVM method.
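The four experiments differ only in which feature columns are passed to the classifier. A minimal sketch of that selection is shown below; the experiment labels, the dictionary layout, and the sample values are hypothetical and serve only to show how the three, three, six, and eight feature subsets relate to one another.

```python
# The eight features named in the text, in a fixed order.
FEATURE_NAMES = [
    "R mean", "G mean", "B mean",
    "R STD", "G STD", "B STD",
    "contrast", "dissimilarity",
]

# Feature subset used in each experiment.
EXPERIMENTS = {
    "1: color mean": FEATURE_NAMES[0:3],
    "2: color STD": FEATURE_NAMES[3:6],
    "3: color mean + STD": FEATURE_NAMES[0:6],
    "4: color moments + GLCM": FEATURE_NAMES,
}


def select_features(sample, names):
    """Build the feature vector for one image from the chosen feature names."""
    return [sample[name] for name in names]


# Hypothetical feature values for a single segmented image.
sample = {
    "R mean": 141.2, "G mean": 120.5, "B mean": 98.7,
    "R STD": 35.1, "G STD": 40.3, "B STD": 42.8,
    "contrast": 12.4, "dissimilarity": 2.1,
}

vectors = {label: select_features(sample, names)
           for label, names in EXPERIMENTS.items()}
```

Keeping the subsets in one table makes it straightforward to train the same classifier on each subset and compare the resulting accuracy scores.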
In all experiments, the optimum accuracy was obtained using the SVM algorithm with the Radial Basis Function (RBF) kernel. The cost parameter (C) of the RBF-kernel SVM was also tuned, and the highest accuracy was obtained at C = 50. The Support Vector Machine method has also been applied to SpO2 signals [15]. This finding indicates that combined features are recommended for classification when developing precise object detection or identification systems. As a comparison, classification was also carried out using the kNN method. This experiment covered k = 1 to 15 with the combined color and texture features; with k = 11, it yielded an optimum accuracy score of 73.33%. Figure 8 summarizes the optimal accuracy scores from the experiments.
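The search over k described here can be sketched as a simple loop that scores each candidate value on held-out data and keeps the best one. The toy two-feature data, the range of k, and the function names below are assumptions for illustration; the study's data split and its tuning of the SVM cost parameter are not reproduced.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training samples nearest to `query`."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def sweep_k(train, test, k_values):
    """Return (best_k, scores), where scores maps each k to accuracy in percent."""
    scores = {}
    for k in k_values:
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores[k] = 100.0 * correct / len(test)
    best_k = max(scores, key=scores.get)  # first k that reaches the top score
    return best_k, scores


# Hypothetical (feature vector, label) pairs.
train = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
         ((5.0, 5.0), "B"), ((5.0, 6.0), "B")]
test = [((0.5, 0.5), "A"), ((5.5, 5.5), "B")]

best_k, scores = sweep_k(train, test, range(1, 6))
```

On this toy data the score drops once k covers the whole training set, because the larger class then wins every vote. The same kind of loop could be run over candidate values of the SVM cost parameter C.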
The classification results from the experiments were then plotted using the matplotlib library. The detection output displays only the diseased objects; everything else is thresholded to black (a value of 0). The label of the detected disease type is also displayed. Sample detection results are presented in Table 2, where the first row is a smallpox detection, the second row is an insect sting detection, and the third row is a stem rot detection.

IV. CONCLUSION
This study implemented two feature extraction methods, color feature extraction using color moments and texture feature extraction using GLCM, to develop a system that detects three types of disease on dragon fruit stems based on digital image processing. The three diseases are stem rot, smallpox, and insect stings. The experiments show that the combined feature extraction method, color moments plus GLCM, distinguishes the three diseases well, with an optimal accuracy score of 87.5% obtained using the SVM classification method. The optimum accuracy using the kNN algorithm was 73.33%. The main remaining challenge is that the dataset is still small, because the data collection process has not yet been optimal. Therefore, further research will add more images to the training process in order to find the best algorithm for identifying the type of disease on dragon fruit stems. Once more data are obtained, the proposed algorithm will be re-evaluated, and it may be developed further using a deep learning algorithm.