Implementation of TF-IDF Algorithm to detect Human Eye Factors Affecting the Health Service System

An elderly person is someone aged around 60-74 years. At this age, health tends to decline, which reduces perception, cognition, and psychomotor ability. One result of cognitive decline is a decrease in memory. The Indonesian government has provided programs such as disseminating information, producing brochures, and posting announcements on health service websites. However, this counseling is not optimal because the elderly tend to be reluctant to read, partly because their eyesight has already begun to decline. To make better use of the health information provided by the health department, we build a model that summarizes an article so that it is easily understood by the elderly. To summarize articles, this study uses the term frequency-inverse document frequency (TF-IDF) algorithm, with the aim of making health articles easier for the elderly to read. The mean User Experience Questionnaire score after applying the text summarization software is higher than before (25.27 > 19.30).


I. INTRODUCTION
An elderly person is someone aged between 60 and 74 years [1]; at this age, most elderly people experience health problems such as hearing loss, reduced vision, and the onset of senility [1]. The elderly also show an age-related decline in understanding complex sentences, which is associated with a decrease in cognitive abilities [2], [3].
As their health declines, the elderly often undergo health checks at hospitals, health centers, and clinics. The Indonesian government also provides counseling to increase the elderly's knowledge about health, conveying information through brochures, notice boards, and health service websites [4]. Because the elderly's eyesight is reduced, these efforts are less effective: the elderly are reluctant to read. To overcome this problem, we examine an application that summarizes the information provided by the government [5]. With this tool, it is hoped that the elderly can more easily understand the information provided by the Indonesian government.
Text mining is the discovery of information from data sets [6]. The data collected can take the form of image, video, or text data. The principle of text summarization is to mark the passages that appear most often. A text summarization application shortens text by removing unnecessary words [7].
Some studies use text mining to compare original data with modified or similar data [8], [9]. Other text mining studies use the term frequency-inverse document frequency (TF-IDF) algorithm to analyze acupoint characteristics and identify unknown patterns in classical medical texts [10]. Like other data mining methods such as the C4.5 algorithm [11], TF-IDF can also be used to classify data, as in Herwijayanti's research on online news classification [12].
In this research, text mining is used to remove unnecessary words so that users can understand the text while retaining its essential information [13], [7]. The system works as follows: the elderly photograph announcements displayed at hospitals, puskesmas (community health centers), and clinics.
After the image is stored on the user's smartphone, it is sent to the Vision API, Google's technology for converting images into text.
The extracted text is then processed with the TF-IDF method. The TF-IDF (term frequency-inverse document frequency) algorithm can be used to analyze the relationship between a phrase or sentence and a collection of documents. In this study, TF-IDF is used to summarize the information obtained from the brochure. TF-IDF is a method that creates the highest [15]. By utilizing the TF-IDF method, the information that the elderly collect from a brochure can be summarized so that it is easily understood [16].

II. RESEARCH METHOD
Figure 1 shows the steps the system uses to summarize information; details are given in the flowchart. The picture is taken of a poster displayed at a health center, hospital, or clinic; posters are usually put up whenever the health department holds counseling. The pictures are taken with a smartphone camera and then converted into text.

Convert images into text
To convert images containing health information into text, Google's Cloud Vision API is used.
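A minimal sketch of this step, assuming the official google-cloud-vision Python client (version 2 or later) with credentials already configured. The function name ocr_image_to_txt and the choice of document_text_detection (which is tuned for dense text) are illustrative assumptions, not the study's documented setup. The result is also written to a .txt file, matching the text-processing stage.

```python
from pathlib import Path


def ocr_image_to_txt(image_path: str, txt_path: str) -> str:
    """Extract text from a poster photo with Google Cloud Vision and save it.

    Assumes google-cloud-vision >= 2 is installed and credentials are set,
    e.g. through the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    """
    # Imported inside the function so this module loads even without the package
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=Path(image_path).read_bytes())
    # document_text_detection targets dense text such as posters and brochures
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    text = response.full_text_annotation.text
    Path(txt_path).write_text(text, encoding="utf-8")  # the .txt file used downstream
    return text
```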

Text Processing
The text output is saved in files with the .txt extension. Each file contains the original text of the poster; given the amount of text on a poster, the elderly are likely to be reluctant to read it. Before processing with TF-IDF, the text must be preprocessed in the following stages.

a. Tokenization
Tokenization separates each word string in a sentence. In this study it also deletes duplicate words, numbers, punctuation marks, and characters other than letters of the alphabet and scientific symbols, and converts all capital letters to lowercase. More precisely, this step is a process of text normalization, so that all tokens have a uniform form.
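The tokenization and case-folding steps can be sketched with Python's standard re module, treating any run of letters as a word. Unlike the description above, this sketch keeps duplicate words and does not preserve scientific symbols, because the TF-IDF weighting needs raw term counts; the function name tokenize is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Case-fold the text and split it into purely alphabetic tokens.

    Digits, punctuation, and other non-letter characters are dropped.
    Duplicate tokens are kept so that term frequencies can be counted later.
    """
    # [^\W\d_] matches any Unicode letter: word characters minus digits and "_"
    return re.findall(r"[^\W\d_]+", text.lower())
```

For example, tokenize("Check blood pressure every 3 months!") returns ['check', 'blood', 'pressure', 'every', 'months'].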

b. Stopword
Stopwords are words that carry little meaning on their own. Removing them condenses the words on the poster; examples of words treated as stopwords in this study are "in order", "in", and "to".
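Stopword removal reduces to filtering tokens against a list. In the sketch below the stopword set is a small hypothetical English list for illustration only; the study's complete list is not given, and multi-word stopwords such as "in order" would need phrase matching, which is not shown.

```python
# Hypothetical, illustrative stopword list; not the list used in the study.
STOPWORDS = frozenset({"a", "and", "in", "is", "of", "the", "to"})


def remove_stopwords(tokens: list[str], stopwords: frozenset = STOPWORDS) -> list[str]:
    """Drop every token that appears in the stopword set, preserving order."""
    return [t for t in tokens if t not in stopwords]
```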

c. Index
This stage marks (indexes) the remaining words. The remaining words are then weighted so that a summary of the text can be formed.
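The paper does not specify the index structure. One plausible reading, assumed in the hypothetical build_index below, is an inverted index that maps each remaining term to the sentences containing it, treating each sentence as a document. The length of each entry is then the document frequency used by the IDF part of TF-IDF.

```python
from collections import defaultdict


def build_index(sentence_tokens: list[list[str]]) -> dict[str, list[int]]:
    """Map each remaining term to the sorted ids of the sentences containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for sentence_id, tokens in enumerate(sentence_tokens):
        for term in tokens:
            index[term].add(sentence_id)  # a set ignores repeats within one sentence
    return {term: sorted(ids) for term, ids in index.items()}
```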

TF-IDF
TF-IDF is a weighting process in which a weight is calculated for each index term generated in the text preprocessing stage [17]. TF-IDF gives each word a weight that reflects its relationship to a document, that is, it measures how important the word is in that document. The frequency with which a word appears is used to calculate its importance: a word that appears more frequently receives a larger weight and is therefore more likely to be retained, whereas words that occur rarely are very likely to be deleted because they are considered to have little influence. The inverse document frequency factor, in turn, lowers the weight of words that occur in many documents, since such words do little to distinguish one document from another. In the TF-IDF algorithm, the weight (W) of each document with respect to a keyword is calculated with the following formula [18]:
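A commonly used form of this weight, assumed here, is

$$ W_{d,t} = tf_{d,t} \times \log\left(\frac{N}{df_t}\right) $$

where tf_{d,t} is the number of times term t appears in document d, N is the number of documents, and df_t is the number of documents containing t. The sketch below applies this formula at the sentence level, treating each sentence as a document, and scores each sentence by the sum of its term weights. This sentence-scoring rule and the function names are assumptions about how the weights become a summary, not the study's documented procedure.

```python
import math
from collections import Counter


def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return W[d][t] = tf(d, t) * log(N / df(t)) for every term in every document."""
    n_docs = len(docs)
    # df(t): number of documents (here, sentences) containing term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return weights


def top_sentences(docs: list[list[str]], k: int) -> list[int]:
    """Score each sentence by the sum of its term weights; keep the k best in original order."""
    scores = [sum(w.values()) for w in tfidf_weights(docs)]
    best = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)
```

A term that occurs in every sentence gets weight log(1) = 0, so it cannot lift any sentence into the summary. The natural logarithm is used; changing the base rescales all weights equally and does not change the ranking.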

A. Sample
Based on a calculation using the Slovin formula with a margin of error e = 10% (0.1), the number of samples is 27 people. The calculation uses

$$ n = \frac{N}{1 + N e^{2}} \qquad (1) $$

where n is the sample size and N is the population size. By utilizing TF-IDF, texts that are initially long and difficult for the elderly to understand can be made easier to understand. The summaries were then re-tested with 5 elderly people; the trial showed that they still found the text difficult because of visual impairment, and 3 of them recommended that the summary text be converted into sound.
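Slovin's formula, n = N / (1 + N e^2), where N is the population size and e the margin of error, can be computed as below. The population size is not stated here, so the example values are hypothetical, and rounding up to a whole respondent is an assumed convention.

```python
import math


def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole respondent."""
    n = population / (1 + population * margin_of_error ** 2)
    # Round first to strip floating-point noise (e.g. 50.000000000001) before ceil
    return math.ceil(round(n, 9))
```

For a hypothetical population of 100 and e = 0.1, slovin_sample_size(100, 0.1) gives 50.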

B. Population
The respondents studied were members of an elderly posyandu (integrated health service post). Because the population was too large to survey in full, a sample of 27 participants was taken to represent it.

C. Data Normality Testing
The normality test determines whether the data used in this study are normally distributed. Normality testing is carried out with the one-sample Kolmogorov-Smirnov test.
The hypotheses tested are as follows:
H0: The data are normally distributed.
Ha: The data are not normally distributed.
The criterion is as follows: if the significance value is greater than 0.05, H0 is accepted and Ha is rejected; if the significance value is less than 0.05, H0 is rejected and Ha is accepted. Normality testing is performed on the User Experience Questionnaire data both before and after the application of the text summarization software. The normality test results are shown in Table 1. Table 1 shows that the significance values both before and after the application of the summarization software are all greater than 0.05, so H0 is accepted and Ha is rejected, and it can be concluded that the data are normally distributed. Therefore, to test the difference in User Experience Questionnaire data before and after the application of the text summarization software, a parametric statistical method is used, namely the paired sample t-test.
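The decision rule above can be sketched with the standard library only. The hypothetical ks_normality below compares the data with a normal distribution whose mean and standard deviation are estimated from the sample, and returns the statistic D with an asymptotic p-value (using Stephens' small-sample adjustment). This is an illustration, not a reproduction of SPSS: because the parameters are estimated from the same data, this uncorrected p-value is conservative, and SPSS's Explore procedure applies the Lilliefors correction instead, so its figures may differ.

```python
import math
import statistics


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def kolmogorov_sf(z: float) -> float:
    """Asymptotic P(K > z) for the Kolmogorov distribution."""
    if z <= 0:
        return 1.0
    if z < 1.18:
        # Series for the CDF that converges quickly for small z
        cdf = math.sqrt(2.0 * math.pi) / z * sum(
            math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8.0 * z * z)) for j in range(1, 6)
        )
        return 1.0 - cdf
    # Alternating series for the survival function, fast for large z
    return 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z) for j in range(1, 6))


def ks_normality(data: list[float]) -> tuple[float, float]:
    """One-sample K-S test against a normal fitted with the sample mean and SD.

    Returns (D, p). H0 (normality) is accepted when p > 0.05.
    """
    xs = sorted(data)
    n = len(xs)
    mu, sigma = statistics.mean(xs), statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # Largest gap between the fitted CDF and the empirical step function
        d = max(d, (i + 1) / n - f, f - i / n)
    # Stephens' small-sample adjustment to the Kolmogorov statistic
    z = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    return d, min(max(kolmogorov_sf(z), 0.0), 1.0)
```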

E. Hypothesis test
The purpose of this study is to determine whether the subjects' understanding after the application of the text summarization software is greater than before its application. To achieve the research objective and test the proposed hypotheses, a paired sample t-test is conducted on the User Experience Questionnaire data. The data were processed using SPSS 23.0 with the following hypotheses, where µ denotes the mean difference in User Experience Questionnaire scores (after minus before):
H0: µ ≤ 0: the User Experience Questionnaire score after the application of the text summarization software is equal to or smaller than before its application.
Ha: µ > 0: the User Experience Questionnaire score after the application of the text summarization software is greater than before its application.
If the paired sample t-test produces a significance value ≥ 0.05, Ha is rejected and H0 is accepted; if the significance value is < 0.05, Ha is accepted and H0 is rejected. Table 2 presents the results of the paired sample t-test on the User Experience Questionnaire data before and after the application of the text summarization software. Judging from the mean values, the User Experience Questionnaire score after the application of the text summarization software is greater than before (25.27 > 19.30). Based on the paired sample t-test, the summarization software is expected to help increase understanding of written material.
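The test statistic behind this comparison can be computed with the standard library. The hypothetical paired_t below returns the mean of the after-minus-before differences, the t statistic t = mean(d) / (s_d / √n), and n - 1 degrees of freedom; it uses toy data, not the study's scores. For the one-sided hypothesis Ha: µ > 0 with n = 27 (26 degrees of freedom), H0 is rejected at the 0.05 level when t exceeds about 1.706. A p-value can be obtained, for example, with scipy.stats.ttest_rel(after, before, alternative="greater") in SciPy 1.6 or later. SPSS reports a two-tailed significance, which is halved for this one-sided test when t is positive.

```python
import math
import statistics


def paired_t(before: list[float], after: list[float]) -> tuple[float, float, int]:
    """Return (mean difference, t statistic, degrees of freedom) for after - before."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample SD (n - 1)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_d, mean_d / se, n - 1
```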

IV. CONCLUSION
The TF-IDF algorithm is useful for summarizing health information text. The challenge in this research is to design a system flow that is easy for the elderly to use.
Of the twenty-seven sampling test results, three elderly participants suggested that the output be sound rather than text, because their vision has started to decline. Judging from the mean values, the User Experience Questionnaire score after the application of the text summarization software is greater than before (25.27 > 19.30).
Based on the results of the paired sample t-test, the application of the text summarization software is beneficial in increasing understanding of written material.