Analysis of Information Technology (IT) Goods Sales Patterns Using the FP-Growth Algorithm

Received Oct 24, 2021; Revised Dec 21, 2021; Accepted Jan 12, 2022

Determining sales patterns is important in marketing, because a sales pattern supports an effective analysis for improving marketing. Sales analysis aims to explore new knowledge that can help design effective strategies by utilizing sales transaction data. This study processes sales data for 12 transaction days containing 47 items using the FP-Growth algorithm. The results are association rules among items that satisfy a minimum support greater than 0.10 and a minimum confidence of 0.60. These results are compared with the output of RapidMiner on the same data to check whether they are valid, so that they can help in designing sales strategies.


INTRODUCTION
A computer shop, that is, a store that sells IT goods, offers many items such as laptops, computer accessories, PC components, repair services, and much more. Selling is the activity of offering products or services, in which the seller transfers ownership of the goods or services to the buyer at a certain price. Sales can be made through various methods, such as selling directly to consumers, opening stores in several places to expand sales reach, or working through sales agents.
Analyzing sales data with data mining is one way to explore new information that can assist sales. Data mining is the process of collecting important information from data, using techniques that identify and extract useful knowledge. Within data mining, association patterns are one of the most interesting functions, because they are widely used in everyday life, especially on transaction data.
In this study, the researchers took a sample of data from a computer shop, more precisely the x-com counter, which sells IT goods, and analyzed the sales patterns of IT goods with an algorithm that produces knowledge from the data. Data mining is used to analyze the sales transaction data by looking at the relationships between the data, and the algorithm chosen for this is the Frequent Pattern Growth (FP-Growth) algorithm. The FP-Growth algorithm is a development of the Apriori method. It is an alternative for determining the sets of items that appear most often in a dataset (frequent itemsets), and it does so by generating a tree data structure called the Frequent Pattern Tree (FP-Tree) [1].
The sales data are processed to find the frequent itemsets using the FP-Growth algorithm with its FP-Tree structure. Sales patterns are then derived with association rules, by calculating the support and confidence of related items. Finally, the results of the FP-Growth analysis are compared with the results of a test in the RapidMiner application, to find out whether the analysis is consistent with the application's output.
IT Jou Res and Dev, Vol.6, No.2, March 2022: 130-141 (Rizky, Analysis of Information Technology (IT) Goods Sales Patterns Using the FP-Growth Algorithm)

RESEARCH METHOD
This study aims to analyze sales patterns using the FP-Growth algorithm, so that the results can assist in designing effective strategies based on sales transaction data. The research uses a transaction dataset with the various items purchased by consumers. The dataset is processed to find the frequent itemsets using the FP-Growth algorithm with its FP-Tree structure, and sales patterns are then derived with association rules by calculating the support and confidence of related items.

Knowledge Discovery In Database (KDD)
Knowledge Discovery in Database (KDD) is the process of obtaining knowledge from a database. The terms data mining and KDD are often used interchangeably because data mining is the most important part of the KDD process [2]. The two terms nevertheless denote different but related concepts: data mining is one stage of the overall KDD process, and it is the core of that process [3]. The KDD process ultimately applies a data mining method to extract patterns from data. Each method has a different purpose, which determines the outcome of the KDD process as a whole [4].

Data Mining
Data mining is the process of sifting through enormous amounts of data kept in repositories to find new significant correlations, patterns, and trends, using pattern recognition tools as well as mathematical and statistical techniques. FP-Growth finds the frequency of itemsets with only a few passes over the original database, which makes it efficient. It also avoids the difficulties that arise when the number of potential itemsets is very large. To arrange the data, FP-Growth employs a special prefix tree, the FP-Tree [5].
Data mining is a technology that can assist firms in locating critical information in their data warehouses [6]. It is the study of strategies for extracting knowledge or finding hidden patterns in large amounts of data, and the outcomes of this processing can be used to support future decisions. Data mining is sometimes also referred to as pattern recognition. KDD, or Knowledge Discovery from Data, is a structured process with the following steps [7]:
1. Data cleaning is the process of removing noise and incorrect data.
2. Data integration is the process of bringing together data from several sources.
3. Data selection is the process of picking data from a database based on the goals of the analysis.
4. Data transformation is the process of converting data into a format that is suited for mining.
5. Data mining is the essential step that applies a specific method to extract patterns from data.
6. Pattern evaluation is the process of identifying which of the discovered patterns are genuinely interesting.
7. Knowledge presentation is the stage at which the extracted knowledge is presented in a form that the data owner can use.

Frequent Pattern-Growth (FP-Growth) Algorithm
The FP-Growth algorithm is an effective alternative for finding the most frequently occurring sets of items (frequent itemsets) in a large dataset [5]. Algorithms commonly used to find frequent itemsets include the Apriori algorithm and the Frequent Pattern-Growth (FP-Growth) algorithm. This study uses FP-Growth to determine the frequent itemsets of a dataset. Unlike Apriori, FP-Growth finds frequent itemsets without candidate generation. It reduces the original data by building a highly compressed structure, the Frequent Pattern Tree (FP-Tree). The algorithm scans the same database twice: the first scan yields the frequent 1-itemsets, and the second scan filters out the non-frequent items while the FP-Tree is built at the same time. Finally, the FP-Tree is used to obtain the association rules [8].
The mining process using FP-Growth does not require candidate generation. FP-Growth adopts a divide-and-conquer strategy. To produce frequent patterns, FP-Growth uses the Frequent Pattern Tree (FP-Tree) and needs only two database scans: the first scan locates the frequent items, and the second builds the FP-Tree. The steps to find the patterns are as follows [9]:
1. Scan the transaction data and collect the frequency of each item. Remove the items that do not meet the minimum support threshold.
2. Sort the items in each transaction from the highest frequency to the lowest.
3. Build the FP-Tree, starting from the root and reading all items in each transaction. When a transaction shares a prefix with a previous transaction, it follows the existing nodes and increases the support count of each of those nodes. Where there is no shared prefix, new nodes are created with a support count of one.
4. Search for patterns with the FP-Growth algorithm, based on the FP-Tree built in the previous step. This involves three steps: conditional pattern base generation, conditional FP-Tree generation, and frequent itemset search.
Frequent itemsets are mined with the FP-Growth algorithm by generating a tree data structure (the FP-Tree). The FP-Growth method can be divided into three main stages [10]:
1. Conditional pattern base generation stage. The conditional pattern base is a sub-database containing the prefix paths and the suffix pattern. It is generated from the previously constructed FP-Tree.
2. Conditional FP-Tree generation stage. The support counts of each item in each conditional pattern base are added together, and each item whose total support count is greater than or equal to the minimum support count is used to build the conditional FP-Tree.
3. Frequent itemset search stage. If the conditional FP-Tree has a single path, the frequent itemsets are obtained by combining the items of that conditional FP-Tree. If it does not have a single path, FP-Growth is applied recursively.
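The tree-building steps (scan, sort, insert with shared prefixes) can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the names `FPNode` and `build_fp_tree`, the tie-breaking by item id, and the toy transactions are assumptions made for the example.

```python
from collections import Counter


class FPNode:
    """One node of the FP-Tree: an item, its support count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_count):
    """Build an FP-Tree with two passes over the transactions."""
    # Scan 1: count in how many transactions each item occurs.
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in freq.items() if c >= min_count}

    root = FPNode(None)
    # Scan 2: insert each transaction, most frequent items first.
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),  # assumption: ties broken by item id
        )
        node = root
        for item in ordered:
            # Reuse the child if this prefix already exists, else create it.
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


# Hypothetical toy data, not the transaction table of the study.
toy = [[19, 40], [19, 40, 43], [19, 43], [42, 43], [42, 43, 4]]
tree, counts = build_fp_tree(toy, min_count=2)
```

In this toy run, transactions that start with the same most-frequent item share a single branch, which is where the compression of the FP-Tree comes from.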

Association Rules
Association rule mining is the data mining process of finding all associative rules that meet the minimum requirements for support (minsup) and confidence (minconf) in a database. A rule is considered interesting when its support and confidence reach these predetermined limits. The process determines correlations between objects in a dataset, and it starts by finding the frequent itemsets, that is, the combinations of items that occur often enough to meet minsup [11].
Association rules are one way of finding patterns that frequently emerge among demand transactions, where each request consists of several items. Identifying the patterns between the items in every request for goods aids the analysis of product demand [12].
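Support measures how often an itemset occurs among all transactions, and it determines whether an itemset qualifies for the confidence calculation. The support of item A is calculated as follows.

$$\mathrm{Support}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}} \tag{1}$$

Meanwhile, the support of two items, item A and item B, is calculated as follows.

$$\mathrm{Support}(A, B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{total number of transactions}} \tag{2}$$

Confidence is a measure of how strong the association is between two products that are requested together, relative to all requests that contain the first product. The confidence of product A and product B is calculated as follows.

$$\mathrm{Confidence}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A} \tag{3}$$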
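The two measures can be illustrated with a short sketch, using the usual definitions of support as a fraction of all transactions and confidence as joint support divided by antecedent support. The function names and the toy transactions are hypothetical and serve only as an example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, over support of antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


# Hypothetical toy data: 4 transactions over items "A", "B", "C".
toy = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"C"}]
sup_a = support(toy, {"A"})              # A occurs in 3 of 4 transactions
sup_ab = support(toy, {"A", "B"})        # A and B occur together in 2 of 4
conf_ab = confidence(toy, {"A"}, {"B"})  # 2 of the 3 transactions with A also have B
```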

FP-Tree
FP-Tree is a compressed data storage structure. It is built by mapping each transaction onto a path in the tree. Because different transactions may contain the same items, their paths may overlap. The more transactions share the same items, the more effective the compression in the FP-Tree becomes. The input is a set of transactions, each containing items drawn from a set of n items. The support of a pattern is the number of transactions in which the pattern occurs. A pattern is said to be frequent if its support is not less than a previously defined constant ξ (the minimum support threshold). Finding the frequent patterns for a given minimum support threshold ξ is the problem that FP-Growth solves with the help of the FP-Tree structure [13].
After the FP-Tree is built, the FP-Growth algorithm searches it for the frequent itemsets of the transaction data. The FP-Growth algorithm is divided into three main steps [14]-[16]:
1. Conditional pattern base generation stage. The conditional pattern base is a sub-database containing the prefix paths and the suffix pattern. It is generated from the previously built FP-Tree.
2. Conditional FP-Tree generation stage. The support counts of each item in the conditional pattern base are summed, and each item whose total is greater than or equal to the minimum support count ξ is used to build the conditional FP-Tree.
3. Frequent itemset search stage. If the conditional FP-Tree is a single path, the frequent itemsets are obtained by combining the items of that conditional FP-Tree. If it is not a single path, FP-Growth is applied recursively [17]-[18].
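The three stages can be sketched in compact form as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the study: the names `Node`, `build_tree`, and `fp_growth` are hypothetical, the toy data are invented, and the single-path shortcut is omitted, so the function always recurses, which gives the same frequent itemsets.

```python
from collections import Counter, defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_paths, min_count):
    """Build an FP-Tree from (items, count) pairs; return header table and counts."""
    freq = Counter()
    for items, n in weighted_paths:
        for item in set(items):
            freq[item] += n
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, n in weighted_paths:
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, freq


def fp_growth(weighted_paths, min_count, suffix=frozenset()):
    """Yield (itemset, support_count) for every frequent itemset."""
    header, freq = build_tree(weighted_paths, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Stage 1: conditional pattern base = prefix path above each node
        # of `item`, weighted by that node's count.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Stages 2 and 3: the recursive call builds the conditional FP-Tree
        # from the base and mines it with `itemset` as the new suffix.
        yield from fp_growth(base, min_count, itemset)


# Hypothetical toy data, not the transaction table of the study.
toy = [[19, 40], [19, 40, 43], [19, 43], [42, 43], [42, 43, 4]]
result = dict(fp_growth([(t, 1) for t in toy], min_count=2))
```

Each key of `result` is a frozenset of items and each value is the number of toy transactions containing all of them.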

RESULTS AND ANALYSIS
The transaction data were taken for 12 transaction days, from January 1 to January 12, 2019, and contain 70 transactions. The data went through the KDD stages of selection, cleaning, and transformation before being processed with the FP-Growth algorithm, as indicated in Table 1. Table 1 is the transaction data after the items were transformed into initials and arranged into a header frequent itemset table.

A. Data Processing Using the FP-Growth Algorithm
This process obtains new knowledge from the database using the FP-Growth algorithm. It consists of several steps: determining the minimum support, determining the frequent itemset header, building the FP-Tree, generating the conditional pattern bases from the FP-Tree, and determining the frequent itemsets.

a. Determine Minimum Support
The FP-Growth process starts by building an FP-Tree from which the frequent itemsets are determined. Before the FP-Tree is built, the minimum support must be set based on the sample data. Here the minimum support is greater than 10% of the 12 transaction days, and the minimum confidence is 60%. The dataset is then scanned for the first time to count the frequency of each item. Items that do not meet the minimum support are discarded because they have no significant effect.

b. Defining Frequent Itemset Headers
Table 2 shows the result of scanning the data in Table 1: the occurrences of each item across the transactions, sorted from the most frequent item. The items eliminated from Table 1 are removed from the transaction table, and the remaining items are arranged according to the frequent itemset header, from the most frequent item, as shown in Table 3. Table 3 is the result of adjusting the transactions to the frequent itemset header, with the items that do not meet the requirement eliminated.
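The first scan and the reordering of the transactions can be sketched as follows. The sketch assumes, as the text suggests, that support is measured against the 12 transaction days and that the threshold is strict (greater than 10%), so an item must occur on at least 2 days. The function names and the lettered toy data are hypothetical and do not reproduce Tables 1 to 3.

```python
from collections import Counter


def frequent_header(transactions, min_support):
    """Return [(item, count)] for items whose support exceeds min_support,
    sorted from the most frequent item to the least frequent."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    kept = [(item, c) for item, c in counts.items() if c / n > min_support]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


def prune_and_order(transactions, header):
    """Drop eliminated items and order each transaction by header rank."""
    rank = {item: pos for pos, (item, _) in enumerate(header)}
    return [sorted((i for i in set(t) if i in rank), key=rank.get)
            for t in transactions]


# Hypothetical toy data: 12 transaction records over items "a" to "f".
toy = [["a", "b"], ["a", "b", "c"], ["a"], ["b", "d"], ["a", "d"], ["e"],
       ["a"], ["b"], ["d", "a"], ["b", "a"], ["f", "a"], ["b"]]
header = frequent_header(toy, min_support=0.10)
ordered = prune_and_order(toy, header)
```

Items that occur only once fall below the threshold (1/12 is about 8%) and disappear from both the header and the ordered transactions.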

c. Formation of the FP-Tree
After the frequent itemset header has been compressed, the database is scanned a second time: each transaction is read, starting from TID 1, to build the FP-Tree. TID 1 contains the items (19, 40), so nodes 19 and 40 are created and form a path as in Figure 1, each with an initial support count of one.
The same procedure is carried out up to TID 12, and the result is shown in Figure 3. Figure 3 illustrates the process of reading TIDs 1 to 12: the nodes created from the 12 TIDs form the complete FP-Tree of the existing data.

d. Application of the FP-Growth Algorithm
After the FP-Tree for the 12 transaction days has been built, the FP-Growth process is carried out to find the frequent itemsets that meet the requirements. The first step is the generation of the conditional pattern bases, which is done by looking back at the previously built FP-Tree. The search starts from the branch of the tree whose path ends with the smallest support count. The conditional pattern base is generated by scanning the FP-Tree from bottom to top and collecting the prefix paths of each suffix item. See suffix 44 in Figure 4. After the conditional pattern base is found, the next step is to generate the conditional FP-Tree: item 44 is removed and the support counts are recalculated, as shown in Figure 5. Table 4 lists the frequent itemsets that have been generated, which are the input to the association rule search.

B. Association Rule
The association rule stage consists of the frequent itemset search followed by rule generation. An association rule is assessed with two parameters: support (the supporting value), with a minimum of more than 10%, and confidence (the certainty value), with a minimum of 60%. After the frequent itemsets are obtained, rules are created and the confidence of each rule is calculated. Only itemsets containing at least two items are used, and each yields rules of the form "if A then B". This gives the 48 subsets listed in Table 5.
The support and confidence values of each candidate association rule are calculated, and the results can be seen in Table 5. Table 5 lists the frequent itemsets to which the association rule formulas have been applied, giving the support of A, the support of A and B, and the confidence.
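Rule generation from frequent itemset counts can be sketched as follows. The sketch assumes that the counts of the frequent itemsets are already available as a dictionary, and it keeps a rule "if A then B" when its confidence reaches the minimum. The function name `generate_rules` and the lettered toy counts are hypothetical and are not the values of Table 5.

```python
from itertools import combinations


def generate_rules(itemset_counts, n_transactions, min_confidence):
    """Return (antecedent, consequent, support, confidence) for each rule
    whose confidence is at least min_confidence."""
    rules = []
    for itemset, joint in itemset_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs at least two items
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                if a not in itemset_counts:
                    continue
                conf = joint / itemset_counts[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, joint / n_transactions, conf))
    return rules


# Hypothetical counts over 12 transaction records.
toy_counts = {
    frozenset({"A"}): 3,
    frozenset({"B"}): 2,
    frozenset({"C"}): 5,
    frozenset({"A", "B"}): 2,
    frozenset({"A", "C"}): 2,
}
rules = generate_rules(toy_counts, n_transactions=12, min_confidence=0.60)
```

In this toy example the rule from C to A is dropped because its confidence is 2/5, which is below the 60% minimum.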
From the calculation above, 8 rules meet the minimum confidence of 60% with a minimum support greater than 10%; they can be seen in Table 6. These 8 rules are recommended as a reference to help design an effective sales strategy at the x-com counter. The rules that meet the minimum 60% confidence are explained below.
1. If the buyer buys (19), then the buyer will buy (40), with 67% confidence and supported by 17% of the overall data.
2. If the buyer buys (42), then the buyer will buy (43), with 67% confidence and supported by 17% of the overall data.
3. If the buyer buys (25), then the buyer will buy (44), with 67% confidence and supported by 17% of the overall data.
4. If the buyer buys (42), then the buyer will buy (4), with 67% confidence and supported by 17% of the overall data.
5. If the buyer buys (26), then the buyer will buy (43), with 100% confidence and supported by 17% of the overall data.
6. If the buyer buys (31), then the buyer will buy (43), with 100% confidence and supported by 17% of the overall data.
7. If the buyer buys (44), then the buyer will buy (25), with 100% confidence and supported by 17% of the overall data.
8. If the buyer buys (4), then the buyer will buy (42), with 100% confidence and supported by 17% of the overall data.
Figure 6 shows the result of testing the 12 days of data (70 transactions) in the RapidMiner application, to find out whether the previous rule search obtained the correct results. The test yields 8 association rules.
The test in the RapidMiner application used the same data as the previous process, 12 days with 70 transactions, and produced 8 rules. The rules from the manual rule search and the rules generated by RapidMiner are therefore 100% the same.
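The reported percentages can be checked by hand. The counts below are an assumption inferred from the percentages themselves, since the underlying tables are not reproduced here: a pair of items occurring together on 2 of the 12 transaction days, with the antecedent occurring on 3 days for the 67% rules and on 2 days for the 100% rules. Support is the joint count divided by the number of transaction days, and confidence is the joint count divided by the antecedent count.

```latex
\begin{align*}
\mathrm{Support}(19, 40) &= \frac{2}{12} \approx 0.17 \\
\mathrm{Confidence}(19 \Rightarrow 40) &= \frac{2}{3} \approx 0.67 \\
\mathrm{Confidence}(44 \Rightarrow 25) &= \frac{2}{2} = 1.00
\end{align*}
```

Under these assumed counts, every rule clears both thresholds, since 0.17 is greater than 0.10 and both 0.67 and 1.00 are at least 0.60.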

CONCLUSION
The analysis of IT product sales patterns using the FP-Growth algorithm yields new knowledge: with a minimum support greater than 10% and a minimum confidence of 60%, the processed data of 70 transactions and 47 items produce 8 association rules. Testing the FP-Growth algorithm in the RapidMiner application gave the same results as the manual data processing. The 8 rules obtained in this study can be used as a reference to help design an effective sales strategy at the x-com counter.