A modified K-modes algorithm for clustering categorical data sets with missing values using Bhattacharyya distance function / Marie Lou Manalili Gabiana.

By:

Gabiana, Marie Lou Manalili

Material type: Text

Language: English
Publication details: 2006
Description: 61 leaves

Dissertation note: Thesis (BS Computer Science -- University of the Philippines Mindanao, 2006)

Abstract: Clustering can be defined as the process of organizing objects in a database into clusters (groups) such that objects within the same cluster have a high degree of similarity, while objects belonging to different clusters have a high degree of dissimilarity. This study clusters data sets using the K-modes algorithm. However, this algorithm is designed only for complete data sets, not for data sets that contain missing values. This led to a modification of the K-modes algorithm incorporating the Bhattacharyya distance. There were two modifications: the first was available case analysis, which uses the information still available in the data set, while the second was adaptive imputation, which imputes missing data during the clustering stage. The performances of these modifications were compared with those of existing methods, namely attribute deletion, mode imputation, KNN imputation, and K-modes clustering using the Chi-square distance. The two modifications produced good-quality clustering results compared with K-modes after attribute deletion and K-modes after mode imputation. They were also competitive with K-modes after KNN imputation. The first modification using the Bhattacharyya distance produced higher-quality results than the first modification using the Chi-square distance. The second modification using the Bhattacharyya distance, on the other hand, produced poorer-quality results than the second modification using the Chi-square distance. However, the differences between the results of the second modifications under the two distance functions were not large. The two modifications using the Bhattacharyya distance were later used to cluster an actual incomplete data set to further verify their clustering performance.
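The abstract does not say exactly how the Bhattacharyya distance is plugged into K-modes, so the following is only a minimal sketch under stated assumptions. It uses the standard Bhattacharyya distance between two discrete distributions, D_B = -ln(BC) with BC = Σ sqrt(p(c)·q(c)), and applies it attribute by attribute to the category frequencies of two groups of records (for example, two clusters). Missing values are represented as `None` (an assumption), and each attribute's distribution is built from the observed values only, in the spirit of available case analysis. The function names `category_distribution`, `bhattacharyya_distance`, and `available_case_distance` are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence

# Assumption: a record is a sequence of categorical values, with None marking a missing value.
Record = Sequence[Optional[Hashable]]


def category_distribution(values: Sequence[Optional[Hashable]]) -> Dict[Hashable, float]:
    """Relative frequency of each category, computed from the observed (non-None) values only."""
    observed = [v for v in values if v is not None]
    if not observed:
        return {}
    counts = Counter(observed)
    return {category: count / len(observed) for category, count in counts.items()}


def bhattacharyya_distance(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Bhattacharyya distance D_B = -ln(BC), where BC = sum over categories of sqrt(p(c) * q(c))."""
    # Only categories present in both distributions contribute to the coefficient.
    bc = sum(math.sqrt(p[c] * q[c]) for c in p.keys() & q.keys())
    if bc == 0.0:
        return math.inf  # the two distributions share no category
    # Clamp BC to 1 so floating-point rounding cannot yield a tiny negative distance.
    return max(0.0, -math.log(min(bc, 1.0)))


def available_case_distance(group_a: List[Record], group_b: List[Record]) -> float:
    """Hypothetical attribute-wise distance between two groups of records.

    For each attribute, each group's category distribution is built from its observed
    values only; an attribute is skipped if either group has no observed value for it.
    """
    n_attributes = len(group_a[0])
    total = 0.0
    for j in range(n_attributes):
        p = category_distribution([row[j] for row in group_a])
        q = category_distribution([row[j] for row in group_b])
        if p and q:  # available case: use attribute j only when both groups observe it
            total += bhattacharyya_distance(p, q)
    return total


# Usage: two small groups of records with some missing values.
a = [("red", "small"), ("red", None), ("blue", "small")]
b = [("red", "small"), (None, "large")]
print(available_case_distance(a, b))  # 0.5 * ln(3), about 0.549
```

With these records, attribute 0 compares {red: 2/3, blue: 1/3} with {red: 1} and attribute 1 compares {small: 1} with {small: 1/2, large: 1/2}; the missing entries simply shrink the sample each distribution is estimated from. Note that the distance becomes infinite when two groups share no category on some attribute, so a practical variant might add smoothing or a cap; whether the thesis does anything like this is not stated in the record.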



Holdings
Item type	Current library	Collection	Call number	Status	Barcode
Thesis	University Library General Reference		LG993.5 2006 C6 G33	Not For Loan	3UPML00012198
Thesis	University Library Archives and Records	Preservation Copy	LG993.5 2006 C6 G33	Not For Loan	3UPML00032594

