Eclat Algorithm

ABSTRACT:

Frequent Itemset Mining (FIM) is at the core of tasks such as association rule mining and sequential pattern mining. As data volumes grow, traditional FIM algorithms become inefficient because of excessive resource requirements or high communication costs. This paper takes the Eclat frequent itemset mining algorithm as its subject and proposes BPEclat (Balanced Parallel Eclat), a parallel optimization of Eclat based on Spark, to address the poor performance of serial Eclat on large-scale data. The algorithm is improved in several respects: it combines pre-pruning and post-pruning into a deep pruning strategy that avoids computing irrelevant itemsets and shrinks the candidate set; it partitions the data set by prefix item; and it uses range partitioning to balance the load across computing nodes, improving the algorithm's parallel computing capability. The experimental results show that BPEclat reduces the candidate set size by 25.3% and the time consumption by 32.5%. It can therefore process massive amounts of data more efficiently and reliably, and it has good scalability and generality.

EXISTING SYSTEM :

In the traditional horizontal data format, each transaction consists of a Transaction Identifier (Tid) and a set of items. The Tid uniquely identifies the transaction, and a transaction can contain one or more items. The Eclat algorithm inverts this layout and represents the data in a vertical format: each record consists of an item and the list of all transactions that contain it (its Tidset) [12]. The two formats are shown in Table I. The left side is the horizontal data structure used by Apriori, FP-Growth and other algorithms, and the right side is the vertical data structure used by the Eclat algorithm.
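The inversion from horizontal to vertical format can be illustrated with a minimal sketch. The toy transaction database and the function name to_vertical are hypothetical and are not taken from the paper; the sketch only shows the general idea that, once the data is vertical, the support of an itemset is the size of the intersection of its items' Tidsets.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Invert a horizontal database {tid: items} into {item: set of tids}."""
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


# Hypothetical toy database in horizontal format: Tid -> items.
horizontal = {
    1: ["a", "b", "c"],
    2: ["a", "c"],
    3: ["b", "d"],
    4: ["a", "b", "c", "d"],
}

vertical = to_vertical(horizontal)

# The support of {a, c} is the number of transactions containing both items,
# i.e. the size of the intersection of the two Tidsets.
support_ac = len(vertical["a"] & vertical["c"])
```

With the vertical layout, counting support needs no further pass over the transactions, which is why Eclat-style algorithms work on Tidsets directly.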

EXISTING SYSTEM DISADVANTAGES:

1. LESS ACCURACY

2. LOW EFFICIENCY

PROPOSED SYSTEM :

According to the two improvement strategies proposed in Section 3.1, the optimized Eclat algorithm is parallelized on the Hadoop+Spark parallel computing platform. The implementation of the algorithm is divided into three stages: a) the data is read from HDFS, the data structure is modified, and the data storage method is optimized by using BitSet to store the Tidset table; finally, the frequent 1-itemsets are obtained by filtering, and the Pair RDD is persisted in memory; b) the prefix items of the frequent 1-itemsets generated in the first stage are extracted, with the prefix item as the key and the itemset and transaction set as the value, forming the tuple Tuple2 < Prefix, Tuple2 < Item Set, BitSet >>; the data is then distributed to each computing node according to the load balancing strategy divided by the combination of the prefix item
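The first two stages can be sketched on a single machine, without Spark, to show the data shapes involved. This is an illustrative sketch under stated assumptions, not the paper's implementation: a Python integer stands in for a Java BitSet, the helper names (build_bitsets, frequent_one_itemsets, group_by_prefix) and the toy data are hypothetical, and the grouping step assumes that the records keyed by prefix are the frequent 2-itemsets sharing that prefix.

```python
from collections import defaultdict


def build_bitsets(transactions):
    """Map each item to an int whose bit i is set if transaction i contains it."""
    bitsets = defaultdict(int)
    for tid, items in enumerate(transactions):
        for item in items:
            bitsets[item] |= 1 << tid
    return dict(bitsets)


def support(bits):
    """Number of set bits, i.e. the number of transactions in the Tidset."""
    return bin(bits).count("1")


def frequent_one_itemsets(bitsets, min_support):
    """Stage (a), assumed form: keep only items that meet the support threshold."""
    return {item: bits for item, bits in bitsets.items()
            if support(bits) >= min_support}


def group_by_prefix(frequent, min_support):
    """Stage (b), assumed form: build (prefix, (itemset, bitset)) records."""
    items = sorted(frequent)
    groups = defaultdict(list)
    for i, prefix in enumerate(items):
        for other in items[i + 1:]:
            bits = frequent[prefix] & frequent[other]  # Tidset intersection
            if support(bits) >= min_support:
                groups[prefix].append(((prefix, other), bits))
    return dict(groups)


# Hypothetical toy data; transaction ids are the list positions 0..3.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "d"], ["a", "b", "c", "d"]]
bitsets = build_bitsets(transactions)
frequent = frequent_one_itemsets(bitsets, min_support=2)
groups = group_by_prefix(frequent, min_support=2)
```

Each key of groups corresponds to one prefix whose records could be mined independently, which is what makes the prefix a natural unit for distributing work across nodes.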

PROPOSED SYSTEM ADVANTAGES:

1. HIGH ACCURACY

2. HIGH EFFICIENCY

SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Programming Language : Python
• Front End Technologies : Tkinter/Web (HTML, CSS, JS)
• IDE : Jupyter/Spyder/VS Code
• Operating System : Windows 8/10

HARDWARE REQUIREMENTS:

• Processor : Core i3
• RAM Capacity : 2 GB
• Hard Disk : 250 GB
• Monitor : 15″ Color
• Mouse : 2 or 3 Button Mouse
• Keyboard : Windows 08/10
