High Scalability Document Clustering Algorithm Based On Top-K Weighted Closed Frequent Itemsets

Dublin Core

Title

High Scalability Document Clustering Algorithm Based On Top-K Weighted Closed Frequent Itemsets

Subject

documents clustering, frequent itemsets, weighted maximum capturing, top-k closed frequent itemsets

Description

Document clustering based on frequent itemsets can be regarded as a new method of document clustering that aims to overcome the curse of dimensionality caused by the items in the documents being clustered. Maximum Capturing (MC) is a frequent-itemset-based document clustering algorithm that produces better clustering quality than other similar algorithms. However, because MC relies on frequent itemsets, it still has several weaknesses: redundant items that can still cause the curse of dimensionality, difficulty in determining a suitable minimum support value for the set of documents to be clustered, and the absence of weighting on the items in the resulting frequent itemsets. To address these weaknesses, this research develops a document clustering algorithm based on weighted top-k closed frequent itemsets, called the Weighted Maximum Capturing (WMC) algorithm. The proposed algorithm uses the frequent pattern tree algorithm to mine closed frequent itemsets from a set of documents without requiring a minimum support value to be specified. Experimental results showed an improvement in clustering accuracy: the average F-measure was 0.713 and the average purity was 0.721, improvements of 1.4% and 2%, respectively. The scalability test showed a far more significant improvement: the WMC algorithm requires an average computing time of 623.77 minutes, which is 518.05 minutes faster than the average computing time required by the MC algorithm.
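To make the term "top-k closed frequent itemsets" concrete, the following is a minimal sketch on toy data. It is not the authors' implementation: it enumerates itemsets by brute force instead of using a frequent pattern tree, it omits the item weighting used by WMC, and the function names, the toy documents, and the ranking rule (by support, ties broken by itemset length and then alphabetically) are assumptions made only for illustration. An itemset is treated as closed when no proper superset has the same support.

```python
from itertools import combinations


def closed_itemsets(docs):
    """Return {itemset: support} for every closed itemset with support >= 1.

    Brute-force enumeration, suitable only for tiny vocabularies.
    """
    transactions = [frozenset(d) for d in docs]
    items = sorted(set().union(*transactions))

    # Count, for every non-empty itemset, how many documents contain it.
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count > 0:
                support[candidate] = count

    # Keep an itemset only if no proper superset has the same support.
    closed = {}
    for itemset, count in support.items():
        absorbed = any(
            itemset < other and count == other_count
            for other, other_count in support.items()
        )
        if not absorbed:
            closed[itemset] = count
    return closed


def top_k_closed(docs, k):
    """Return the k closed itemsets with the highest support (no minimum support needed)."""
    ranked = sorted(
        closed_itemsets(docs).items(),
        key=lambda kv: (-kv[1], -len(kv[0]), sorted(kv[0])),
    )
    return ranked[:k]


if __name__ == "__main__":
    toy_docs = [
        {"data", "mining", "cluster"},
        {"data", "mining"},
        {"data", "cluster"},
        {"text", "cluster"},
    ]
    for itemset, count in top_k_closed(toy_docs, 3):
        print(sorted(itemset), count)
```

In this toy example the single item "mining" is not closed, because the superset {"data", "mining"} occurs in exactly the same documents. Choosing k replaces the minimum support threshold, which is the property the abstract highlights.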

Creator

Gede Aditra Pradnyana, Arif Djunaidy

Source

https://jurnal.iaii.or.id/index.php/RESTI/issue/view/22

Publisher

Universitas Pendidikan Ganesha

Date

30 April 2021

Contributor

Fajar Bagus W

Format

PDF

Language

Indonesia

Type

Text

Files

2987-Article Text-8850-1-10-20210429.pdf

Collection

VOL 5 NO 2 (2021)

Citation

Gede Aditra Pradnyana, Arif Djunaidy, “High Scalability Document Clustering Algorithm Based On Top-K Weighted Closed Frequent Itemsets,” Repository Horizon University Indonesia, accessed April 25, 2026, https://repository.horizon.ac.id/items/show/8588.