000 02509nam a22003133a 4500
001 UPMIN-00003241232
003 UPMIN
005 20230206173538.0
008 230206b |||||||| |||| 00| 0 eng d
040 _cUPMin
041 _aeng
090 0 _aLG993.5 2008
_bA64 O55
100 _aOnggo, Raphael John Rule.
_92140
245 _aNearest neighbor-based imputation in treating data sets with missing values and their effects in the clustering accuracy /
_cRaphael John Rule Onggo.
260 _c2008
300 _a73 leaves.
502 _aThesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2008
520 3 _aThe K-Nearest Neighbor (KNN) imputation method, along with the more commonly used mean and median imputation methods, was used to treat incomplete data sets. To obtain a clear comparison, three complete data sets were used with two types of missingness: missing completely at random (MCAR) and missing at random (MAR). Missing values were generated from these complete data sets at rates of 1%, 5%, 10%, and 20%. The imputed data sets were then clustered using the k-means clustering algorithm. The incomplete data sets were also clustered using the modified k-means clustering algorithm with adaptive imputation. The clustering results for the imputed data sets obtained from the three imputation methods were compared to each other and to the results obtained after applying the modified k-means clustering algorithm with adaptive imputation to the incomplete data sets. Results revealed that the k-nearest neighbor, mean, and median imputation methods and the modified k-means clustering algorithm attained high cluster recovery even at 20% missing values. Furthermore, for both MAR and MCAR types of missing values, the k-nearest neighbor imputed data sets yielded the most accurate clustering results, compared with the mean imputed data sets, the median imputed data sets, and the modified k-means clustering algorithm with adaptive imputation applied to the incomplete data sets.
650 1 7 _aImputation methods.
_92141
650 1 7 _aKNN (K-Nearest Neighbor)
_92142
650 1 7 _aK-means clustering.
_92093
650 1 7 _aClustering.
_9366
650 1 7 _aMCAR (Missing completely at random)
_92103
650 1 7 _aData sets.
_91992
658 _aUndergraduate Thesis
_cAMAT200,
_2BSAM
905 _aFi
905 _aUP
942 _2lcc
_cTHESIS
999 _c2248
_d2248