
A modified K-means algorithm for clustering data sets with missing values using adaptive imputation / Lovella V. Mamalias

Material type: Text
Language: English
Publication details: 2005
Description: 64 leaves
Dissertation note: Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2005
Abstract: Clustering is a technique for partitioning an entire data set into groups such that data points belonging to the same group are more similar to one another than to data points in other groups. However, missing data are common in data sets. Clustering a data set with missing values is usually done by deleting the data points with missing values and clustering only the remaining complete data points. Another approach fills in the missing values before the clustering stage, using information from the complete data points, so that the incomplete data set becomes a complete data set. However, these methods might jeopardize the quality of the clustering result. This study deals with clustering data sets with missing values using imputation during the clustering stage. The k-means clustering method was modified so that an incomplete data set can be partitioned into groups. The distance function was modified so that the membership of incomplete data points in the nearest cluster can be obtained. The computation of the new cluster center was also modified so that the center can be obtained from the data points (including the incomplete data points) belonging to the same cluster. The performance of the modified k-means algorithm was compared with that of two other clustering methods that deal with missing values, namely k-means after case deletion and k-means after mean imputation. Modified k-means, although less efficient, gives a better-quality clustering result in terms of cluster recovery than the other clustering methods. The modified k-means algorithm was applied to the Philippine eagle data, an incomplete data set with missing values. The clustering result of the proposed algorithm was compared with the clustering result of k-means after attribute deletion.
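The abstract does not give the modified distance function or the center update, so the following is only an illustrative sketch of the general idea, not the thesis's actual method. It assumes a partial-distance rule (squared Euclidean distance over the observed attributes, rescaled by the fraction observed) and a center update that averages each attribute over the cluster members that observe it. The function names `partial_distance`, `update_center`, and `kmeans_missing` are hypothetical, and missing values are represented as `None`.

```python
import math
import random


def partial_distance(point, center):
    """Squared Euclidean distance over observed attributes only (assumed rule),
    rescaled by total attributes / observed attributes."""
    observed = [(x, c) for x, c in zip(point, center) if x is not None]
    if not observed:
        return math.inf  # a point with no observed attributes carries no information
    total = sum((x - c) ** 2 for x, c in observed)
    return total * len(point) / len(observed)


def update_center(members, old_center):
    """Per-attribute mean over the members that observe that attribute."""
    center = []
    for j, old in enumerate(old_center):
        values = [p[j] for p in members if p[j] is not None]
        # Keep the previous coordinate if no member observes attribute j.
        center.append(sum(values) / len(values) if values else old)
    return center


def kmeans_missing(points, k, iterations=100, seed=0):
    """K-means on data with missing values (None); imputes during clustering.

    Assumes at least k complete points exist to seed the initial centers.
    """
    rng = random.Random(seed)
    complete = [p for p in points if None not in p]
    centers = [list(p) for p in rng.sample(complete, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest center under the partial distance.
        new_labels = [
            min(range(k), key=lambda i: partial_distance(p, centers[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: recompute each non-empty cluster's center.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = update_center(members, centers[i])
    # Fill each missing value with the matching coordinate of its cluster center.
    imputed = [
        [centers[lab][j] if v is None else v for j, v in enumerate(p)]
        for p, lab in zip(points, labels)
    ]
    return labels, centers, imputed


if __name__ == "__main__":
    data = [[1.0, 1.0], [1.2, None], [0.8, 1.1],
            [10.0, 10.0], [None, 9.8], [10.2, 10.1]]
    labels, centers, imputed = kmeans_missing(data, k=2)
    print(labels)
    print(imputed)
```

In this sketch the incomplete points take part in both the assignment and the update steps, which is what distinguishes it from case deletion (dropping incomplete points first) and from mean imputation (filling values before clustering).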
List(s) this item appears in: BS Applied Mathematics
Holdings
Cover image Item type Current library Collection Call number Status Date due Barcode
Thesis University Library Theses Room-Use Only LG993.5 2005 A64 M35 Not For Loan 3UPML00011332
Thesis University Library Archives and Records Preservation Copy LG993.5 2005 A64 M35 Not For Loan 3UPML00022035


University of the Philippines Mindanao
The University Library, UP Mindanao, Mintal, Tugbok District, Davao City, Philippines
Email: library.upmindanao@up.edu.ph
Contact: (082)295-7025