Framework for Analyzing Netizen Opinions on BPJS Using Sentiment Analysis and Social Network Analysis (SNA)

Abstract: The Social Security Administrative Body (BPJS) is a legal entity established to administer social security programs. News about BPJS policies is often found in online media and on social media, where it receives responses from netizens as a form of public opinion on the policy, including opinions posted on Twitter. These opinions can be positive, neutral, or negative. They are processed using the Support Vector Machine (SVM) method; however, some SVM studies still obtain unsatisfactory results, with accuracy below 60%. Therefore, feature selection or a combination with other methods is needed to obtain higher accuracy. To identify the actors who influence netizens' opinions on the topic of BPJS, the Social Network Analysis (SNA) method is used. Based on the test results, the best accuracy is obtained by combining SVM with AdaBoost, at 92%, compared with 91% for pure SVM, 87% for the combination of SVM and Particle Swarm Optimization (PSO), and 86% for SVM with Genetic Algorithm (GA) feature selection.


I. INTRODUCTION
The Social Security Administrative Body, Badan Penyelenggara Jaminan Sosial (BPJS), is the agency that administers social security for participants of both BPJS Kesehatan (Health Security Administrative Body) and BPJS Ketenagakerjaan (Labor Security Administrative Body). BPJS Kesehatan was developed from PT Askes (Persero) in 2011. Through it, the state provides the National Health Insurance-Indonesian Health Card program (Jaminan Kesehatan Nasional-Kartu Indonesia Sehat, abbreviated as JKN-KIS). This program, organized by BPJS Kesehatan, aims to ensure that all Indonesian citizens are protected by comprehensive, fair, and equitable health insurance [1]. Meanwhile, BPJS Ketenagakerjaan was developed from PT Jamsostek (Persero) in 2011. The BPJS Ketenagakerjaan program provides benefits to workers and employers and makes an essential contribution to the nation's economic growth and the welfare of the Indonesian people [2]. News related to BPJS is widely available in online media such as detik.com, kompas.com, and liputan6.com. This news has received mixed responses from netizens, posted as comments on social media platforms such as Twitter, Facebook, and Instagram [3]. The statements netizens make about the policies issued by BPJS are positive, negative, or neutral.
The role of netizens in commenting on online media is a form of public opinion on policy. A comment is a form of participation by netizens in news or issues that develop both online and offline. Participation carried out online is commonly called e-participation, which several countries use in making policies [4]. Social media is one of the places where e-participation takes shape, ranging from expressing support for a policy to criticizing it [5]. Previous studies related to BPJS have applied sentiment analysis and social network analysis (SNA): studies [6][7] performed sentiment analysis on the increase in BPJS contribution fees, and study [8] analyzed the same increase using SNA with Drone Emprit. Apart from the contribution fee increase, other researchers have also conducted sentiment analysis on BPJS services [9][10]. However, these studies discussed only the contribution fee increase or BPJS services, and each applied a single method.
This study uses tweets collected from Twitter with Drone Emprit Academy. The tweet data were processed in two stages. The first stage is sentiment analysis using support vector machines (SVM). SVM is used because it produces fairly good accuracy in several studies [11][12][13]. Sentiment analysis research using SVM [14] conducted several experiments, resulting in accuracy values above 83%, and another study [15] achieved an accuracy of 93.65%. However, several studies produce an accuracy of less than 80%, including 53.88% [11], 79.67% [16], and 67.83% [17]. Previous studies have therefore combined SVM with other methods, including [...] Optimization) [20], the combination of XGBSVM (SVM and XGBoost) [21], the combination of RF + SVM (Random Forest and SVM) [22][23], and the combination of AdaBoost + SVM [24]. This study combines SVM with PSO, combines AdaBoost with SVM, and applies GA feature selection to SVM, in order to increase the accuracy produced by SVM.
The second stage looked for relationships between entity units with the help of graph theory [25][26][27]. The SNA method was chosen because this study required a technique that can provide an image or visualization of the network formed from the preprocessed data. The SNA method can also find the nodes, communities, and informal hierarchies that influence the network [26]. By conducting these two stages, the study produces a framework that presents the data completely and accurately, with better visualization.
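To make the idea of turning tweets into a graph concrete, the sketch below builds an undirected account network from tweet texts. The paper does not state how edges are defined, so this is an assumption: an edge links a tweet's author to every account mentioned in the text, with retweets written as "RT @user:" counting as mentions. The function name and the account names are hypothetical.

```python
import re
from collections import defaultdict

# Matches "@username" mentions; handles consist of letters, digits, and "_".
MENTION = re.compile(r"@(\w+)")


def build_interaction_graph(tweets):
    """Return an undirected adjacency map {account: set(of accounts)}.

    Assumption: each (author, text) pair adds an edge between the author and
    every account mentioned in the text, including "RT @user:" retweets.
    """
    adj = defaultdict(set)
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target != author:  # ignore self-mentions
                adj[author].add(target)
                adj[target].add(author)
    return dict(adj)


# Hypothetical tweets; account names are invented for illustration.
tweets = [
    ("user_a", "RT @user_b: iuran BPJS naik lagi"),
    ("user_c", "@user_b @user_a setuju, memberatkan"),
    ("user_d", "layanan BPJS lambat"),  # no mention, so no edge
]
graph = build_interaction_graph(tweets)
print(graph)
```

Accounts that never mention or get mentioned (user_d here) do not appear in the network, which is one reason interaction networks built from tweets are usually sparse and disconnected.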

II. RESEARCH METHOD
The stages of this research are shown in Figure 1. This study uses a quantitative approach because its data are detailed and measurable [28]. The accuracy obtained from the sentiment analysis is reported as a reference for other researchers, and SNA identifies the most influential nodes in the Twitter data. The methodology flow in Figure 1 is explained below.

1. Crawling Data
Data crawling was carried out on Twitter to collect the comments and criticisms that netizens direct at BPJS. The crawling in this study used Drone Emprit Academy with BPJS as the topic and produced 2,145 tweets.

2. Preprocessing Data
The next process was data preprocessing, which makes the data obtained from Twitter readable by the system. The preprocessing in this study consisted of the following stages:
a. The cleansing stage removed unnecessary characters and punctuation from the text. Cleansing works to reduce noise in the dataset [29].
b. The next step was stopword removal. A stopword is a common word that usually appears in large numbers and is considered to carry no meaning [11].
c. Tokenization is the process of splitting a sentence into individual words [29].
d. Case folding is the process of converting all text in the documents to lowercase [30].
e. Stemming is the stage that reduces affixed words to their root words according to Indonesian rules [29].
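The preprocessing stages above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the stopword list and suffix list are tiny hypothetical stand-ins, the stemmer only strips one suffix (a real Indonesian stemmer also removes prefixes and confixes), and case folding is applied before stopword removal because stopword lists are lowercase, which is an ordering assumption.

```python
import re

# Hypothetical, tiny Indonesian stopword list for illustration only.
STOPWORDS = {"yang", "dan", "di", "ke", "ini", "itu", "untuk", "dengan", "juga"}

# Hypothetical suffix list: a naive stand-in for a proper Indonesian stemmer.
SUFFIXES = ("nya", "kan", "an", "lah")


def cleanse(text: str) -> str:
    """Cleansing: remove URLs, mentions, hashtags, digits, and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def naive_stem(word: str) -> str:
    """Strip one known suffix if a root of at least 4 letters remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    text = cleanse(text)
    text = text.lower()                                  # case folding
    tokens = text.split()                                # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    return [naive_stem(t) for t in tokens]               # stemming


print(preprocess("Iuran BPJS yang naik lagi!! Kenaikannya memberatkan @warga https://t.co/x"))
# ['iuran', 'bpjs', 'naik', 'lagi', 'kenaikan', 'memberat']
```

The output shows the limits of suffix-only stemming: "memberatkan" becomes "memberat" rather than the root "berat", because the prefix "mem-" is not handled.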
3. The TF-IDF method, which is widely used in information retrieval, calculates the weight of each word. This method is efficient, easy to apply, and gives accurate results. After the weight (W) of each document was known, a sorting process was carried out: the larger the W value, the more similar the document is to the keywords, and vice versa.
4. Sentiment analysis was performed on these data using SVM combined with AdaBoost, GA, and PSO. GA is used for feature selection to optimize the SVM parameters [31].
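The TF-IDF weighting and keyword-based sorting described in step 3 can be sketched as follows. The paper does not give its exact TF-IDF variant, so this sketch assumes raw term counts for tf and log(N/df) for idf, and scores a document by summing the weights of the keywords it contains; the documents are hypothetical tokenized tweets.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def rank_by_keywords(docs, keywords):
    """Score each document by the summed weight W of the keywords, then sort
    document indices by W in descending order (larger W = more similar)."""
    weights = tf_idf(docs)
    scores = [sum(w.get(k, 0.0) for k in keywords) for w in weights]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical preprocessed tweets.
docs = [
    ["iuran", "bpjs", "naik"],
    ["layanan", "bpjs", "lambat"],
    ["iuran", "iuran", "mahal"],
]
order, scores = rank_by_keywords(docs, ["iuran"])
print(order)  # [2, 0, 1]
```

A term that appears in every document gets idf = log(1) = 0, so very common words contribute nothing to W, which is the intended effect of the idf factor.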
In GA, candidate solutions to the problem are represented as chromosomes. Several important aspects must be considered when using GA [32]:
- the definition of the fitness function,
- the definition and implementation of the genetic representation, and
- the definition and implementation of the genetic operations.
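The three GA aspects listed above map directly onto code. The sketch below is a minimal, hypothetical GA for feature selection, not the study's configuration: the genetic representation is one 0/1 gene per feature, the genetic operations are tournament selection, one-point crossover, and bit-flip mutation with elitism, and the fitness function is a toy stand-in. In the study, fitness would instead be the accuracy of an SVM trained on the selected TF-IDF features; all parameter values here are assumptions.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=40,
                         crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return the 0/1 feature mask (1 = keep the feature) with the best fitness found."""
    rng = random.Random(seed)
    # Genetic representation: one binary gene per feature.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(scored):
        # Pick two scored chromosomes at random and keep the fitter one.
        (fa, a), (fb, b) = rng.sample(scored, 2)
        return a if fa >= fb else b

    for _ in range(generations):
        scored = [(fitness(c), c) for c in pop]
        # Elitism: carry this generation's best chromosome over unchanged.
        next_pop = [max(scored, key=lambda s: s[0])[1][:]]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:  # one-point crossover
                cut = rng.randrange(1, n_features)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop + [best], key=fitness)
    return best


# Hypothetical toy fitness: features 0, 2 and 5 are "informative" and
# every selected feature costs 0.1.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask = ga_feature_selection(8, toy_fitness)
print(best_mask, round(toy_fitness(best_mask), 2))
```

The size penalty in the toy fitness mimics the usual goal of feature selection: keep the features that help classification and drop the rest.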
PSO is used because it can optimize SVM performance [33]. PSO serves as a feature selection tool in which each particle represents a combination of features in the problem space [34]. AdaBoost is an ensemble learning method and a widely used boosting algorithm [35]. Boosting can be combined with other classifier algorithms to improve classification performance [36], and another study showed that combining SVM and AdaBoost performs well on imbalanced data [37].
[...] $g_{jk}(n_i)$ is the number of shortest paths from node $j$ to node $k$ passing through node $n_i$, and $g_{jk}$ is the number of shortest paths between those two nodes in the network.
c. Closeness centrality is based on the average distance between a node and all other nodes in the network; in other words, it measures how close a node is to the other nodes. In a network with $N$ nodes, the closeness centrality of node $n_i$ is:

$$C_C(n_i) = \frac{N-1}{\sum_{j=1,\, j \neq i}^{N} d(n_i, n_j)} \quad (5)$$

where $N$ is the number of nodes in the network and $d(n_i, n_j)$ is the length of the shortest path connecting nodes $n_i$ and $n_j$.
d. Eigenvector centrality is a measure that gives higher weight to nodes connected to other nodes with high centrality values. The centrality of the nodes is calculated with the following formula:

$$c(\alpha, \beta) = \alpha \, (I - \beta A)^{-1} A \mathbf{1} \quad (6)$$

where $\alpha$ is a normalization constant (vector scale), $\beta$ is the power parameter that determines how much a node's centrality depends on the centrality of the nodes it is connected to, $A$ is the adjacency matrix, $I$ is the identity matrix, and $\mathbf{1}$ is a column vector of ones. If $\beta$ is positive, a node scores highly when it is connected to central nodes. If $\beta$ is negative, a node scores highly when it is connected to nodes that are not central. If $\beta = 0$, the measure reduces to degree centrality.
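The centrality measures above can be checked on a small graph. The sketch below is a minimal, stdlib-only illustration on a hypothetical undirected graph with invented node names, not the tool used in this study. Betweenness sums $g_{jk}(n_i)/g_{jk}$ over all node pairs $j<k$ (the standard pair-sum form consistent with the definitions of $g_{jk}(n_i)$ and $g_{jk}$), closeness applies equation (5) to the nodes reachable from $n_i$ only (an assumption for disconnected networks), and the $\beta$-parameterized measure of equation (6), known as Bonacich power centrality, is computed by iterating $x \leftarrow A\mathbf{1} + \beta A x$, which converges only when $|\beta|$ is smaller than the reciprocal of the largest eigenvalue of $A$.

```python
from collections import deque


def bfs(adj, s):
    """Distances and numbers of shortest paths from source s (unweighted graph)."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def centralities(adj, alpha=1.0, beta=0.1, iters=200):
    nodes = sorted(adj)
    paths = {s: bfs(adj, s) for s in nodes}
    degree = {v: len(adj[v]) for v in nodes}

    # Closeness: (reachable nodes - 1) / sum of shortest-path distances to them.
    closeness = {}
    for v in nodes:
        dist = paths[v][0]
        total = sum(dist.values())
        closeness[v] = (len(dist) - 1) / total if total else 0.0

    # Betweenness: sum over pairs j < k of g_jk(n_i) / g_jk.
    betweenness = {v: 0.0 for v in nodes}
    for idx, j in enumerate(nodes):
        dist_j, sig_j = paths[j]
        for k in nodes[idx + 1:]:
            if k not in dist_j:
                continue  # j and k are not connected
            for i in nodes:
                if i in (j, k) or i not in dist_j:
                    continue
                dist_i, sig_i = paths[i]
                # i lies on a shortest j-k path iff d(j,i) + d(i,k) = d(j,k).
                if dist_j[i] + dist_i[k] == dist_j[k]:
                    betweenness[i] += sig_j[i] * sig_i[k] / sig_j[k]

    # Power centrality c = alpha (I - beta A)^{-1} A 1 via x <- A1 + beta A x.
    x = {v: 0.0 for v in nodes}
    for _ in range(iters):
        x = {v: degree[v] + beta * sum(x[w] for w in adj[v]) for v in nodes}
    power = {v: alpha * x[v] for v in nodes}
    return degree, closeness, betweenness, power


# Hypothetical interaction graph: "hub" is the account bridging the others.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
degree, closeness, betweenness, power = centralities(graph)
print(betweenness)  # {'a': 0.0, 'b': 0.0, 'c': 3.0, 'd': 0.0, 'hub': 5.0}
```

Setting `beta=0.0` returns the degree of each node as its power centrality, matching the $\beta = 0$ case described above. The pairwise betweenness loop is cubic in the number of nodes, so for a full tweet network a dedicated graph library is the practical choice.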
6. After obtaining the results of SNA and sentiment analysis, the next step was to draw conclusions from the findings of this study.

III. RESULT AND DISCUSSION
The following explains the research conducted, in which two different analyses were carried out: the first was sentiment analysis, and the second was social network analysis.

Sentiment Analysis
The preprocessing aims to turn netizens' opinions into data that can be processed for sentiment analysis. Figure [...] Selecting the proper parameters makes the genetic algorithm optimal [42]. However, some researchers use GA for feature selection [43][44][45], and this study also uses GA for feature selection. This use of GA builds on previous studies that combined SVM with a genetic algorithm. A previous study [46] classified Parkinson's disease using a genetic algorithm and an SVM classifier; the combination of the two methods achieved a higher accuracy than the previous study, at 91.18%.
Meanwhile, a previous study [47] obtained an accuracy of 80% using SVM and MFCC. Another previous study [48] conducted a sentiment analysis of television shows using SVM and SVM + GA, with no improvement in accuracy. Another study [49] conducted a sentiment analysis of Apple products using SVM + GA: SVM alone obtained an accuracy of 70.00%, and adding GA increased the accuracy significantly, to 85.76%. Figure 4 shows that in this study, SVM with GA feature selection reached an accuracy of 86%, which is lower than that of SVM without feature selection. GA therefore did not improve the accuracy in this study, which used more than 2,000 tweets and a 70:30 train-test split.
In addition to using GA feature selection, this study also combines SVM with Particle Swarm Optimization (PSO). Figure 5 shows that SVM with PSO reached an accuracy of 87%, better than GA but still lower than SVM without feature selection or combination. PSO is one of the simplest optimization methods for tuning several parameters [41]. PSO was used because it achieves relatively high accuracy when combined with SVM. A previous study [50] compared SVM and SVM-PSO for airline service reviews: SVM initially had an accuracy of 84.25%, which increased to 87.39% after adding PSO. Another study [41] analyzed sentiment on online transportation using SVM; the accuracy was 95.46% before adding PSO and rose to 96.04% after adding it. However, in this study, the combination of SVM and PSO did not increase the accuracy and instead tended to decrease it compared to SVM alone. Apart from GA and PSO, this study also employed AdaBoost.
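The PSO feature selection discussed above can be sketched as a binary PSO in which each particle is a 0/1 feature mask. The paper does not state its PSO variant or parameters, so this sketch assumes the common sigmoid-based binary PSO with typical inertia and acceleration coefficients, and uses a hypothetical toy fitness; in the study, the fitness would be the SVM accuracy obtained with the selected features.

```python
import math
import random


def binary_pso(n_features, fitness, n_particles=15, iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: each particle is a 0/1 feature mask (1 = keep the feature).
    Real-valued velocities are squashed by a sigmoid into bit probabilities."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best mask
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # swarm's best mask

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy fitness: features 0, 2 and 5 are "informative" and
# each selected feature costs 0.1.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(8, toy_fitness)
print(best_mask, round(best_fit, 2))
```

Clamping the velocity keeps every bit probability away from exactly 0 or 1, so the swarm keeps exploring instead of freezing on the first good mask.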

Figure 6. Accuracy of SVM + AdaBoost
The combination of SVM and AdaBoost is the most suitable one for this research.
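The reweighting idea behind AdaBoost can be sketched as follows. This is discrete AdaBoost with decision stumps as the weak learner, a deliberate swap for brevity: the study boosts SVM, which would replace `train_stump` with a weighted SVM fit. The feature vectors (counts of positive and negative words per tweet) and labels are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Weak learner: the single-feature threshold rule with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[f] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def adaboost(X, y, rounds=5):
    """Discrete AdaBoost with decision stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                          # start with uniform sample weights
    model = []
    for _ in range(rounds):
        err, f, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, f, thr, sign))
        # Misclassified samples (y * h(x) = -1) get heavier, correct ones lighter.
        w = [wi * math.exp(-alpha * yi * (sign if x[f] >= thr else -sign))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if x[f] >= t else -s) for a, f, t, s in model)
    return 1 if score >= 0 else -1


# Hypothetical features per tweet: [positive-word count, negative-word count];
# labels: +1 positive, -1 negative.
X = [[3, 0], [2, 1], [0, 2], [1, 3], [2, 0], [0, 1]]
y = [1, 1, -1, -1, 1, -1]
model = adaboost(X, y)
print([predict(model, x) for x in X])  # [1, 1, -1, -1, 1, -1]
```

On this separable toy data every round picks the same stump; on real tweet features, the weight update makes later rounds concentrate on the tweets that earlier learners misclassified.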
The addition of AdaBoost increased the accuracy to 92%. AdaBoost is a learning algorithm that can improve the accuracy of weak learning algorithms [51]. Another study using AdaBoost [52] [...]
Table 2 [...] Thus, it gets more specifications for the product in each community. The BPJS contribution fee increase network has a modularity value of 0.922; modularity values close to 1 indicate a strong community structure.
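For reference, the modularity of a network for a given community assignment can be computed with Newman's formula $Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$, where $m$ is the number of edges, $L_c$ is the number of edges inside community $c$, and $d_c$ is the total degree of its nodes. The sketch below assumes an undirected graph and a partition that is already known; finding the partition itself (for example with the Louvain method) is outside this sketch. The example network is hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ] of an undirected
    graph, for a given assignment {node: community label}."""
    m = len(edges)
    internal = {}    # L_c: number of edges with both ends in community c
    degree_sum = {}  # d_c: sum of the degrees of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical network: two triangles joined by a single edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in two_groups}
print(round(modularity(edges, two_groups), 4))  # 0.3571
print(modularity(edges, one_group))             # 0.0
```

Putting every node in one community always gives Q = 0, so a value such as 0.922 means the network splits into communities with far more internal links than a random network with the same degrees would have.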
The fourth network property is the diameter, the longest shortest-path distance between any two nodes in the network. The smaller the diameter, the more easily nodes interact, because the distances between them are short. In the BPJS contribution fee increase data, the diameter value was 16, indicating long interaction chains that involve many nodes. The fifth network property is the average degree, which shows the average number of relationships per actor in the social network. The greater the average degree, the better, since the actors in the network are more connected and information therefore spreads more widely. The data on the increase in BPJS contribution fees had an average degree of 1.677.
The sixth network property is the average path length, the average number of steps along the shortest paths between pairs of accounts. The shorter the average path length, the better, because the accounts are more closely related. The average path length of user interactions in the BPJS contribution fee increase data was 5.4334. The last network property is the clustering coefficient, which shows how strongly the actors connected to a given actor are also connected to one another. The actors in the BPJS contribution fee increase data were [...]. Thus, the information discussed was known in advance.
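The remaining network properties (diameter, average degree, average path length, and clustering coefficient) can be computed as in the minimal sketch below, which assumes an undirected graph given as {node: set of neighbors}. Path statistics use only pairs of nodes that can reach each other, and nodes with fewer than two neighbors get a clustering coefficient of 0; both are common conventions but assumptions here. For an undirected graph the average degree is 2E/N; for a directed graph, the average in-degree and out-degree are both E/N. The example graph is hypothetical.

```python
from collections import deque
from itertools import combinations


def distances_from(adj, s):
    """Breadth-first shortest-path distances from s (unweighted graph)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def network_properties(adj):
    """Diameter, average degree, average path length, and average clustering
    coefficient of an undirected graph given as {node: set(neighbors)}."""
    nodes = list(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    # Shortest-path lengths between distinct nodes that can reach each other.
    lengths = [d for s in nodes for t, d in distances_from(adj, s).items() if t != s]

    clustering = []
    for v in nodes:
        k = len(adj[v])
        if k < 2:
            clustering.append(0.0)  # convention: undefined, counted as 0
            continue
        # Number of links among v's neighbors, out of k(k-1)/2 possible.
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        clustering.append(links / (k * (k - 1) / 2))

    return {
        "diameter": max(lengths),
        "average_degree": 2 * n_edges / len(nodes),
        "average_path_length": sum(lengths) / len(lengths),
        "average_clustering": sum(clustering) / len(nodes),
    }


# Hypothetical graph: a triangle (a, b, c) with d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(network_properties(graph))
```

In this example the diameter is 2 (from d to a or b), which illustrates why a diameter of 16 points to much longer chains of interaction between the most distant accounts.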
c. The Centrality of BPJS Contribution Fee Increase Data
Table 3 presents the centrality values for the BPJS contribution fee increase data, comparing degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. It shows that the actor with the most influence on social network interaction is LailyFadillah, who has the highest degree centrality, betweenness centrality, and eigenvector centrality, while LokadataID has the highest closeness centrality. The LailyFadillah account is the most influential actor in terms of the number of interactions generated. This account also acts as a bridge for exchanges between other actors in the network and is well connected to other influential actors. The LokadataID account is close to the other actors around it, which enables it to convey information to other actors quickly. Other supporting actors also had a sizable influence on interactions on Twitter.

IV. CONCLUSION
This research produces a framework that combines two methods, namely SVM and SNA [...] Tree, etc. Therefore, other researchers could develop this research further by comparing accuracy, using either feature selection or a combination of methods. In the SNA, the most influential actor (account) in tweets about BPJS is @LailyFadillah.
SNA in this study uses only one tool. Further research should use other tools, such as Drone Emprit Academy, which has many features, so that the results produced by different tools can be compared.