K-means clustering of data sets with missing values using modified Euclidean distance / Emmylou H. Pulvera

By: Emmylou H. Pulvera
Material type: Text
Language: English
Publication details: 2005
Description: 63 leaves
Dissertation note: Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2005
Abstract: K-means clustering is the most extensively used clustering algorithm in the field of data analysis. One major problem in data analysis is the occurrence of missing values. Mean imputation and case deletion can produce erroneous conclusions, the former by introducing possibly unreliable estimates and the latter by significantly reducing the data set. To avoid these problems, the Euclidean distance function used in the allocation step was modified to compute distances between two vectors with some unknown values. The representation step, defined by the center of the cluster, was also modified to compute the mean of each feature in a cluster even when one or more of the cases were incomplete. This modification is an extension of the K-means clustering algorithm for handling missing values. To evaluate the method, different data sets were simulated from the Iris Data Base to represent different types of missing values at different levels of degradation. The modified algorithm was compared with imputation and case deletion. Results showed that the modified algorithm has higher cluster recovery than the imputation method, while cluster recovery under case deletion was higher than that of the modified K-means. However, the latter was true only for the data points left after deletion. Thus, the modified K-means has the advantage of avoiding information loss.
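The abstract does not state the exact form of the modified distance, so the following is only a minimal sketch of one common way to realize the idea, not the thesis's own formulation. It assumes a "partial distance": the squared differences are summed over the features observed in both vectors and rescaled by the ratio of total to observed features. Cluster centers are assumed to be per-feature means over the observed entries only. Missing values are assumed to be encoded as NaN, and the names `partial_distance`, `update_centers`, and `kmeans_missing` are hypothetical.

```python
import numpy as np


def partial_distance(x, c):
    """Euclidean distance over features observed in both x and c,
    rescaled by (total features / observed features).
    Assumed formula; the thesis may define the modification differently."""
    observed = ~np.isnan(x) & ~np.isnan(c)
    n_obs = observed.sum()
    if n_obs == 0:
        return np.inf  # no shared observed feature to compare
    sq = np.sum((x[observed] - c[observed]) ** 2)
    return np.sqrt(sq * x.size / n_obs)


def update_centers(X, labels, centers):
    """Representation step: per-feature mean of each cluster,
    skipping missing entries."""
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = X[labels == k]
        for j in range(X.shape[1]):
            col = members[:, j]
            col = col[~np.isnan(col)]
            if col.size:  # keep the old value if nothing was observed
                new_centers[k, j] = col.mean()
    return new_centers


def kmeans_missing(X, init_centers, n_iter=100):
    """K-means loop that alternates allocation and representation
    until the assignment stops changing (or n_iter is reached)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    labels = np.full(X.shape[0], -1)
    for _ in range(n_iter):
        # Allocation step: assign each case to its nearest center.
        new_labels = np.array([
            np.argmin([partial_distance(x, c) for c in centers]) for x in X
        ])
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        centers = update_centers(X, labels, centers)
    return labels, centers


# Toy data with missing entries (illustrative values, not from the thesis).
X_demo = np.array([
    [0.0, 0.0], [0.2, np.nan], [np.nan, 0.1],
    [5.0, 5.0], [5.2, np.nan], [np.nan, 4.9],
])
labels_demo, centers_demo = kmeans_missing(X_demo, [[0.0, 0.0], [5.0, 5.0]])
```

In this sketch no case is deleted and no value is imputed: an incomplete case still contributes to the distance and to the mean through whichever features it does have, which is the property the abstract highlights.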
List(s) this item appears in: BS Applied Mathematics
Holdings
Item type | Current library | Collection | Call number | Status | Date due | Barcode
Thesis | University Library, Theses | Room-Use Only | LG993.5 2005 A64 P84 | Not For Loan | | 3UPML00011327
Thesis | University Library, Archives and Records | Preservation Copy | LG993.5 2005 A64 P84 | Not For Loan | | 3UPML00022129


University of the Philippines Mindanao
The University Library, UP Mindanao, Mintal, Tugbok District, Davao City, Philippines
Email: library.upmindanao@up.edu.ph
Contact: (082)295-7025
Copyright @ 2022 | All Rights Reserved