000 | 02155nam a2200241 4500 | ||
---|---|---|---|
001 | UPMIN-00000010893 | ||
003 | UPMIN | ||
005 | 20230105163930.0 | ||
008 | 230105b |||||||| |||| 00| 0 eng d | ||
040 | _aDLC _cUPMin _dupmin | ||
041 | _aeng | ||
090 | _aLG993.5 2000 _bA64 D35 | ||
100 | 1 | _aDaisog, Lyna Mie C. _91075 | |
245 | 0 | 0 | _aModified K-modes clustering algorithms for categorical data sets with missing values / _cLyna Mie C. Daisog. |
260 | _c2000 | ||
300 | _a67 leaves. | ||
502 | _aThesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2000 | ||
520 | 3 | _aClustering is the process of organizing objects in a database into groups such that objects within the same cluster have a high degree of similarity, while objects from different clusters have a high degree of dissimilarity. However, data sets, including those with categorical attributes, can only be clustered when they are complete. Existing methods usually address this problem in one of two ways: deleting records with missing values so that only complete data points are clustered, or imputing the missing values in a preprocessing step. However, these methods might jeopardize the quality of the resulting clusters. This study modifies the K-modes algorithm to handle missing values. The first modified algorithm makes use of the available information, while the second performs imputation during the clustering stage. The performance of the modified algorithms was compared with that of existing methods, namely casewise deletion, mode imputation, and K-nearest neighbor (KNN) imputation. The modified algorithms produced higher-quality clusters than casewise deletion and mode imputation. Although KNN imputation came out as the most stable method for handling missing values, the modified algorithm using the available-case approach produced clusters close to those of KNN imputation. The methods were also used to cluster an actual incomplete data set to verify their performance, and similar behavior was observed. | |
658 | _aUndergraduate Thesis _cAMAT200 | ||
905 | _aFi | ||
905 | _aUP | ||
942 | _2lcc _cTHESIS | ||
999 | _c529 _d529 | ||