MARC View

000			02509nam a22003133a 4500
001			UPMIN-00003241232
003			UPMIN
005			20230206173538.0
008			230206b |||||||| |||| 00| 0 eng d
040			_cUPMin
041			_aeng
090		0	_aLG993.5 2008 _bA64 O55
100			_aOnggo, Raphael John Rule. _92140
245			_aNearest neighbor-based imputation in treating data sets with missing values and their effects in the clustering accuracy / _cRaphael John Rule Onggo.
260			_c2008
300			_a73 leaves.
502			_aThesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2008
520	3		_aThe K-Nearest Neighbor (KNN) imputation method, along with the more commonly used mean and median imputation methods, was used in treating incomplete data sets. In order to obtain a clear comparison, three complete data sets were used with two types of missingness: missing completely at random (MCAR) and missing at random (MAR). Missing values were generated from these complete data sets at rates of 1%, 5%, 10%, and 20%. The treated incomplete data sets were then clustered using the k-means clustering algorithm. The incomplete data sets were also clustered using the modified k-means clustering algorithm with adaptive imputation. The results of applying the k-means clustering algorithm to the imputed data sets obtained from the three imputation methods were compared to each other and to the results obtained after applying the modified k-means clustering algorithm with adaptive imputation to the incomplete data sets. Results revealed that the k-nearest neighbor, mean, and median imputation methods and the modified k-means clustering algorithm attained high cluster recovery even at 20% missing values. Furthermore, for both MAR and MCAR types of missing values, the clustering results obtained from the k-nearest neighbor imputed data sets were the most accurate, compared with those obtained from the mean imputed data sets, the median imputed data sets, and the modified k-means clustering algorithm with adaptive imputation applied to the incomplete data sets.
650	1	7	_aImputation methods. _92141
650	1	7	_aKNN (K-Nearest Neighbor) _92142
650	1	7	_aK-means clustering. _92093
650	1	7	_aClustering. _9366
650	1	7	_aMCAR (Missing completely at random) _92103
650	1	7	_aData sets. _91992
658			_aUndergraduate Thesis _cAMAT200, _2BSAM
905			_aFi
905			_aUP
942			_2lcc _cTHESIS
999			_c2248 _d2248