Daisog, Lyna Mie C.

Modified K-modes clustering algorithms for categorical data sets with missing values / Lyna Mie C. Daisog. - 2000 - 67 leaves.

Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2000

Clustering is the process of organizing objects in a database into groups such that objects within the same cluster have a high degree of similarity, while objects from different clusters have a high degree of dissimilarity. However, clustering algorithms, including those for data sets with categorical attributes, generally require the data set to be complete. Existing methods address this problem in two usual ways: casewise deletion, in which data points with missing values are discarded and only complete data points are clustered, and imputation during preprocessing. However, these methods may compromise the quality of the resulting clusters. This study modifies the K-modes algorithm to handle missing values. The first modified algorithm makes use of the available information, while the second performs imputation during the clustering stage. The performance of the modified algorithms was compared with that of existing methods, namely casewise deletion, mode imputation, and K-nearest neighbor (KNN) imputation. The modified algorithms produced higher-quality clusters than casewise deletion and mode imputation. Although KNN imputation proved to be the most stable method for handling missing values, the modified algorithm using the available-case approach produced clusters close to those of KNN. The methods were also used to cluster an actual incomplete data set to verify their performance, and similar behavior was observed.
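As an illustration only, the available-case idea can be sketched in Python. This is not the thesis's algorithm, whose details the abstract does not give. It assumes the simple matching dissimilarity commonly used with K-modes, assumes missing values are marked with None, and simply skips missing attributes when measuring dissimilarity and when computing modes. The names MISSING, available_case_distance, column_modes, and assign are hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing value (hypothetical choice)


def available_case_distance(obj, mode):
    """Count mismatches between obj and mode over the attributes observed in obj."""
    return sum(1 for a, m in zip(obj, mode) if a is not MISSING and a != m)


def column_modes(rows):
    """Most frequent observed value of each attribute; missing values are ignored."""
    modes = []
    for column in zip(*rows):
        observed = [v for v in column if v is not MISSING]
        # An attribute with no observed values keeps the missing marker.
        modes.append(Counter(observed).most_common(1)[0][0] if observed else MISSING)
    return modes


def assign(rows, modes):
    """Index of the nearest mode for each row, using only the observed attributes."""
    return [
        min(range(len(modes)), key=lambda k: available_case_distance(row, modes[k]))
        for row in rows
    ]
```

A full K-modes run would alternate assign with a per-cluster call to column_modes until the assignments stop changing; initialization and tie-breaking are left out of this sketch.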

Undergraduate Thesis -- AMAT200