Clustering is one of the most important techniques in data mining. A branch of multivariate statistical analysis, it groups similar data into the same clusters. As databases grow larger, researchers seek efficient and effective clustering methods that support fast and reliable decisions.
Thus, in this paper, we propose an improved ant system-based clustering algorithm (IASC) to produce clusters quickly and with high accuracy. The goal of clustering analysis is to group similar objects together. Many methods are applied in clustering analysis, such as hierarchical clustering, partition-based clustering, density-based clustering, and artificial intelligence-based clustering.
The ant colony system (ACS) is one of the newest metaheuristics for combinatorial optimization problems, and this study uses it to find clusters effectively.
The IASC algorithm comprises four sub-procedures: Divide, Agglomerate_obj, Agglomerate, and Remove. First, the parameters are initialized and all objects are grouped into a single cluster. Second, Divide splits this cluster into several sub-clusters, plus some objects that do not belong to any sub-cluster, based on the consistency of the pheromone and certain criteria. Third, Agglomerate_obj assigns these objects to suitable sub-clusters. Fourth, Agglomerate merges pairs of similar sub-clusters into a single cluster. Fifth, Agglomerate_obj is run again. Sixth, once similar objects have been agglomerated into suitable sub-clusters, Remove tries to remove dissimilar objects from each sub-cluster. Finally, the total within-cluster variance (TWCV) is calculated. If TWCV has not changed, the procedure stops; otherwise, the sequence Divide, Agglomerate_obj, Agglomerate, Agglomerate_obj, Remove is repeated until TWCV no longer changes.
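The stopping rule described above can be illustrated with a minimal sketch. It assumes that TWCV is the sum of squared Euclidean distances from each object to its cluster centroid (the usual definition), that unassigned objects carry the label -1, and that each sub-procedure is supplied as a function taking and returning a label array. The names `twcv`, `run_until_stable`, `tol`, and `max_iter` are hypothetical and introduced only for illustration; the sub-procedures themselves are not implemented here.

```python
import numpy as np

def twcv(points, labels):
    """Total within-cluster variance, assumed here to be the sum of squared
    Euclidean distances from each object to its cluster centroid.
    Objects labelled -1 are treated as unassigned and skipped (an assumption)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        if k == -1:
            continue
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total

def run_until_stable(points, labels, steps, tol=1e-9, max_iter=100):
    """Apply the given sub-procedures in order, repeating the whole sequence
    until TWCV no longer changes (within tol) or max_iter is reached.
    Each step is a hypothetical callable: step(points, labels) -> labels."""
    current = twcv(points, labels)
    for _ in range(max_iter):
        previous = current
        for step in steps:
            labels = step(points, labels)
        current = twcv(points, labels)
        if abs(current - previous) <= tol:
            break
    return labels, current
```

In this sketch, `steps` would hold the sequence Divide, Agglomerate_obj, Agglomerate, Agglomerate_obj, Remove; the tolerance `tol` stands in for "TWCV is not changed", since exact floating-point equality is rarely a safe test.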
Implementation results on Iran earthquake data show that the proposed method provides more accurate clusters faster and is able to identify outliers. The computational time is also reduced.