Center-based clustering of interval data

Denate, Ellen May B.

Center-based clustering of interval data / Ellen May B. Denate. - 2005 - 136 leaves

Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2005

This paper presents methods for clustering interval data based on center-based clustering algorithms. Two clustering approaches are proposed: the Redefined K-Median algorithm and the Interval K-Means algorithm. The Redefined K-Median algorithm uses the Hausdorff distance, and the Interval K-Means algorithm uses the squared Euclidean distance. The proposed algorithms were compared with existing Dynamic Clustering algorithms designed for interval data (Standard Dynamic Clustering using de Carvalho's distance and Dynamic Clustering using the Hausdorff distance) and with Standard Clustering algorithms (the Standard K-Median and Standard K-Means algorithms). The Corrected Rand (CR) index was used to compare the results of the algorithms. The algorithms were tested on two sets of randomly generated artificial data following two sets of parameters. The CR index between the parameter-defined classes and the result of each algorithm was computed to determine how well each algorithm recovers the parameter-defined classes. This comparison showed that both the Dynamic Clustering algorithms and the proposed algorithms achieve high recovery of the parameter-defined classes. However, the standard deviations for the Dynamic Clustering algorithms are higher than those for the proposed algorithms, implying that the results of the proposed algorithms are more stable. The CR index computed between the final clustering results of the different algorithms also showed that the proposed algorithms produce results more similar to those of the Dynamic Clustering algorithms than to those of the Standard Clustering algorithms. Depending on the size of the data set, its characteristics, and the number of clusters, the proposed approaches have features that make them a likely choice for clustering interval data.
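The abstract names the distances but does not give the formulas or the center-update rules. As a rough sketch only, assuming each object is a list of intervals (one per variable), that distances are summed over variables, and that the squared Euclidean distance is taken on the vector of lower and upper bounds, a generic center-based loop for the two kinds of approach might look like the following. All function names (`hausdorff`, `sq_euclid`, `mean_prototype`, `median_prototype`, `center_clustering`) are hypothetical, and the median-based update is one plausible prototype rule for the Hausdorff case, not necessarily the thesis's Redefined K-Median rule.

```python
# Hypothetical sketch; not the thesis's implementation.
from statistics import median
from typing import Callable, List, Tuple

# Assumed representation: one (lower, upper) pair per variable.
Interval = Tuple[float, float]
Obj = List[Interval]


def hausdorff(x: Obj, y: Obj) -> float:
    # Hausdorff distance max(|a - c|, |b - d|) between [a, b] and [c, d],
    # summed over variables (the summation is an assumption).
    return sum(max(abs(a - c), abs(b - d)) for (a, b), (c, d) in zip(x, y))


def sq_euclid(x: Obj, y: Obj) -> float:
    # Squared Euclidean distance on the vector of lower and upper bounds.
    return sum((a - c) ** 2 + (b - d) ** 2 for (a, b), (c, d) in zip(x, y))


def mean_prototype(members: List[Obj]) -> Obj:
    # Mean of the lower bounds and mean of the upper bounds, per variable;
    # this minimizes the summed squared Euclidean distance within a cluster.
    n, p = len(members), len(members[0])
    return [(sum(m[j][0] for m in members) / n,
             sum(m[j][1] for m in members) / n) for j in range(p)]


def median_prototype(members: List[Obj]) -> Obj:
    # Median of midpoints and median of half-lengths, per variable;
    # this minimizes the summed Hausdorff distance within a cluster.
    proto = []
    for j in range(len(members[0])):
        mid = median([(m[j][0] + m[j][1]) / 2 for m in members])
        half = median([(m[j][1] - m[j][0]) / 2 for m in members])
        proto.append((mid - half, mid + half))
    return proto


def center_clustering(data: List[Obj], k: int,
                      dist: Callable[[Obj, Obj], float],
                      prototype: Callable[[List[Obj]], Obj],
                      init: List[int], max_iter: int = 100):
    # Alternate assignment and center update until the labels stop changing.
    centers = [data[i] for i in init]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: dist(x, centers[c])) for x in data]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = prototype(members)
    return labels, centers


data = [[(1.0, 2.0)], [(1.5, 2.5)], [(10.0, 12.0)], [(11.0, 13.0)]]
labels, _ = center_clustering(data, 2, hausdorff, median_prototype, init=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

The median update rests on the identity max(|u - v|, |u + v|) = |u| + |v|: writing each interval as midpoint plus or minus half-length, the Hausdorff distance between [a, b] and [c, d] becomes the absolute difference of midpoints plus the absolute difference of half-lengths, so the two parts can be minimized separately by medians. Swapping in `sq_euclid` and `mean_prototype` gives the K-Means-style variant on the same loop.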
The proposed approaches also outperform the standard clustering algorithms on interval data, since the results show that the proposed algorithms recover more of the predefined classes (the realistic classification). The usual practice of reducing each interval to its mean (midpoint) and applying the K-Means or K-Median algorithm to these values is not sufficient to obtain the optimal clustering. The proposed approaches add to the available clustering methods for interval data.
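One way to see why reducing intervals to single values can lose information, assuming the reduction is to the midpoint (a + b) / 2, is that intervals of very different widths can collapse to the same point. The sketch below also computes the Corrected Rand (CR) index, taken here to be the Hubert and Arabie adjusted Rand index, which scores how well a clustering recovers a reference partition. The function names `to_midpoints` and `corrected_rand` are hypothetical; this is an illustration, not the thesis's code or experiment.

```python
# Hypothetical sketch; not the thesis's implementation.
from collections import Counter
from math import comb
from typing import Hashable, List, Sequence, Tuple


def to_midpoints(data: List[List[Tuple[float, float]]]) -> List[List[float]]:
    # The "usual practice": replace each interval [a, b] by its mean (a + b) / 2.
    return [[(a + b) / 2 for a, b in obj] for obj in data]


def corrected_rand(u: Sequence[Hashable], v: Sequence[Hashable]) -> float:
    # Corrected (adjusted) Rand index between two partitions of the same
    # objects, given as label sequences; assumes at least two objects.
    n = len(u)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    sum_a = sum(comb(c, 2) for c in Counter(u).values())
    sum_b = sum(comb(c, 2) for c in Counter(v).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are all-singletons or both are
        # a single cluster, i.e. they are identical.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


print(to_midpoints([[(0.0, 10.0)], [(4.0, 6.0)]]))  # [[5.0], [5.0]]
print(corrected_rand([0, 0, 1, 1], [1, 1, 0, 0]))    # 1.0
```

Here [0, 10] and [4, 6] both map to 5, so any algorithm run on the midpoints cannot separate them, although their Hausdorff distance is 4. A CR value of 1 means the two partitions agree up to relabeling of the clusters; values near 0 indicate agreement no better than chance, and negative values are possible.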


Data clustering.
Center-based clustering.
Interval data.


Undergraduate Thesis --AMAT200
 
University of the Philippines Mindanao