Onggo, Raphael John Rule.

Nearest neighbor-based imputation in treating data sets with missing values and their effects in the clustering accuracy / Raphael John Rule Onggo. - 2008 - 73 leaves.

Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2008

The K-Nearest Neighbor (KNN) imputation method, along with the more commonly used mean and median imputation methods, was used to treat incomplete data sets. To obtain a clear comparison, three complete data sets were used with two types of missingness: missing completely at random (MCAR) and missing at random (MAR). Missing values were generated from these complete data sets at rates of 1%, 5%, 10%, and 20%. The treated incomplete data sets were then clustered using the k-means clustering algorithm. The incomplete data sets were also clustered using the modified k-means clustering algorithm with adaptive imputation. The clustering results for the imputed data sets obtained from the three imputation methods were compared with each other and with the results obtained after applying the modified k-means clustering algorithm with adaptive imputation to the incomplete data sets. Results revealed that the KNN, mean, and median imputation methods and the modified k-means clustering algorithm all attained high cluster recovery even at 20% missing values. Furthermore, under both MAR and MCAR missingness, the KNN-imputed data sets yielded the most accurate clustering results, compared with the mean-imputed and median-imputed data sets and with the modified k-means clustering algorithm with adaptive imputation applied to the incomplete data sets.
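As an illustration of the procedure the abstract describes, the following is a minimal sketch of MCAR missing-value generation, mean imputation, and a simple KNN imputation. It is not the thesis's implementation: the function names, the default k=3, and the distance (root mean squared difference over the features two rows both have observed) are assumptions made for this example, and the thesis may have used a different KNN variant.

```python
import numpy as np


def make_mcar(X, rate, rng):
    """Copy X and set roughly `rate` of its entries to NaN, uniformly at random (MCAR)."""
    X_miss = np.asarray(X, dtype=float).copy()
    X_miss[rng.random(X_miss.shape) < rate] = np.nan
    return X_miss


def mean_impute(X_miss):
    """Replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(X_miss, axis=0)
    return np.where(np.isnan(X_miss), col_means, X_miss)


def knn_impute(X_miss, k=3):
    """Replace each NaN with the mean of that feature over the k nearest rows.

    Assumed distance: root mean squared difference over features that are
    observed in both rows (an illustrative choice, not taken from the thesis).
    """
    X_filled = X_miss.copy()
    observed = ~np.isnan(X_miss)
    for i in np.where(~observed.all(axis=1))[0]:
        shared = observed & observed[i]          # features observed in both rows
        diff = np.where(shared, X_miss - X_miss[i], 0.0)
        n_shared = shared.sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            dist = np.sqrt((diff ** 2).sum(axis=1) / n_shared)
        dist[n_shared == 0] = np.inf             # no common features: not comparable
        dist[i] = np.inf                         # a row is not its own neighbor
        for j in np.where(~observed[i])[0]:
            candidates = np.where(observed[:, j] & np.isfinite(dist))[0]
            if candidates.size == 0:
                # No usable neighbor: fall back to the column mean.
                X_filled[i, j] = np.nanmean(X_miss[:, j])
                continue
            nearest = candidates[np.argsort(dist[candidates])[:k]]
            X_filled[i, j] = X_miss[nearest, j].mean()
    return X_filled
```

In a comparison of the kind described above, each imputed array would then be passed to a k-means routine and the resulting cluster labels compared with those obtained from the complete data set.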


Imputation methods.
KNN (K-Nearest Neighbor)
K-means clustering.
Clustering.
MCAR (Missing completely at random)
Data sets.


Undergraduate Thesis -- AMAT200