000 02214nam a2200241 4500
001 UPMIN-00000010897
003 UPMIN
005 20230201165228.0
008 230201b |||||||| |||| 00| 0 eng d
040 _aDLC
_cUPMin
_dupmin
041 _aeng
090 _aLG993.5 2000
_bA64 M34
100 1 _aMacabenta, Mel Zha Leah M.
_92020
245 0 0 _aClustering of data sets with missing values using statistical imputation methods /
_cMel Zha Leah M. Macabenta.
260 _c2000
300 _a65 leaves.
502 _aThesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2000
520 3 _aThe K-means clustering algorithm is the most widely used clustering algorithm in the field of data analysis. One major drawback of this algorithm is that it cannot accommodate data sets with missing values. In reality, however, the occurrence of missing values cannot be avoided. Imputation methods are used more extensively than deletion in treating missing values. Several imputation methods have been suggested, but each has advantages and disadvantages relative to the others, so the proper choice of imputation method is essential. Two statistical imputation methods, namely hot deck imputation and imputation using a prediction model, were used to treat the incomplete data sets. The treated data sets were then clustered using the K-means clustering algorithm. For a clear comparison, five data sets were used with two kinds of missingness, missing completely at random (MCAR) and missing at random (MAR), at five different levels of degradation ranging from 1% missing values. The resulting clusters were evaluated using the adjusted Rand index. The two methods were compared to the modified K-means algorithm, particularly the modified Euclidean distance. Results showed that hot deck imputation, the regression method, and the modified K-means clustering algorithm attained a high recovery of clusters, especially with large data sets, up to the 30% level of missing values. In small data sets, good recovery was attained only up to the 10% level of missing values.
658 _aUndergraduate Thesis
_cAMAT200
905 _aFi
905 _aUP
942 _2lcc
_cTHESIS
999 _c532
_d532