A modified K-modes algorithm for clustering categorical data sets with missing values using Bhattacharyya distance function / Marie Lou Manalili Gabiana.

Material type: Text
Language: English
Publication details: 2006
Description: 61 leaves
Dissertation note: Thesis (BS Computer Science) -- University of the Philippines Mindanao, 2006
Abstract: Clustering can be defined as the process of organizing objects in a database into clusters or groups such that objects within the same cluster have a high degree of similarity, while objects belonging to different clusters have a high degree of dissimilarity. This study clustered data sets using the K-modes algorithm. However, this algorithm is designed only for complete data sets and not for data sets that contain missing values. This led to the modification of the K-modes algorithm, incorporating the Bhattacharyya distance. There were two modifications: the first was available case analysis, which uses the information that remains available in the data set, while the second was adaptive imputation, which imputes missing data during the clustering stage. The performance of these modifications was compared with that of existing methods, namely attribute deletion, mode imputation, KNN imputation, and K-modes clustering using Chi-square distance. The two modifications produced good-quality clustering results compared with K-modes after attribute deletion and K-modes after mode imputation. These modifications were also competitive with K-modes after KNN imputation. The first modification using Bhattacharyya distance produced higher-quality results than the first modification using Chi-square distance. The second modification using Bhattacharyya distance, on the other hand, produced poorer-quality results than the second modification using Chi-square distance. However, the differences between the results of the second modification under the two distance functions were small. The two modifications using Bhattacharyya distance were later used to cluster an actual incomplete data set to further verify their clustering performance.
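As a rough illustration of the ideas named in the abstract, the sketch below shows the standard Bhattacharyya distance between two discrete distributions and an available-case mismatch measure that skips missing attributes. It is not the thesis's implementation: the function names, the use of `None` as the missing-value marker, and the normalization by the number of observed attributes are all assumptions made for this example.

```python
import math
from collections import Counter

MISSING = None  # assumed marker for a missing categorical value


def category_distribution(values):
    """Relative frequencies of the observed (non-missing) categories."""
    observed = [v for v in values if v is not MISSING]
    counts = Counter(observed)
    n = len(observed)
    return {k: c / n for k, c in counts.items()} if n else {}


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance -ln(sum_k sqrt(p_k * q_k)) for dict distributions."""
    keys = set(p) | set(q)
    bc = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in keys)
    if bc == 0.0:
        return math.inf  # no overlapping categories
    return max(0.0, -math.log(bc))  # clamp tiny negative rounding errors


def available_case_mismatch(record, mode):
    """Fraction of mismatches, counted only over attributes observed in record."""
    pairs = [(a, b) for a, b in zip(record, mode) if a is not MISSING]
    if not pairs:
        return 0.0  # assumption: a fully missing record contributes no distance
    return sum(a != b for a, b in pairs) / len(pairs)
```

For example, `available_case_mismatch(("x", None, "z"), ("x", "y", "w"))` compares only the first and third attributes, since the second is missing in the record.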
Holdings
Item type: Thesis | Library and collection: University Library, General Reference | Call number: LG993.5 2006 C6 G33 | Status: Not For Loan | Barcode: 3UPML00012198
Item type: Thesis | Library and collection: University Library, Archives and Records, Preservation Copy | Call number: LG993.5 2006 C6 G33 | Status: Not For Loan | Barcode: 3UPML00032594
