Clustering of data sets with missing values using statistical imputation methods / Mel Zha Leah M. Macabenta.

Material type: Text
Language: English
Publication details: 2000
Description: 65 leaves
Dissertation note: Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2000

Abstract: K-means is the most widely used clustering algorithm in data analysis. One major drawback is that it cannot accommodate data sets with missing values. In practice, however, missing values cannot be avoided. Imputation methods are used more extensively than deletion in treating missing values. Several imputation methods have been suggested, but each has advantages and disadvantages relative to the others, so choosing an appropriate method is essential. Two statistical imputation methods, hot deck imputation and imputation using a prediction model, were used to treat the incomplete data sets, which were then clustered using the K-means algorithm. For a clear comparison, five data sets were used with two kinds of missingness, missing completely at random (MCAR) and missing at random (MAR), at five levels of degradation starting from 1% missing values. The resulting clusters were evaluated using the adjusted Rand index. The two methods were compared with the modified K-means algorithm, specifically the variant based on the modified Euclidean distance. Results showed that hot deck imputation, the regression method, and the modified K-means algorithm attained high cluster recovery, especially with big data sets, up to the 30% level of missing values. In small data sets, good recovery was attained only up to the 10% level of missing values.
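The treatment and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record does not define the modified Euclidean distance or the donor rule for hot deck imputation, so the sketch assumes a partial distance rescaled by the share of observed features and a nearest-complete-record donor. All function names (`degrade_mcar`, `partial_distance`, `hot_deck_impute`, `adjusted_rand_index`) are hypothetical, and the regression-based imputation and the K-means step itself are left out.

```python
import math
import random
from collections import Counter


def degrade_mcar(rows, rate, seed=0):
    """Blank out a fixed share of cells chosen uniformly at random (MCAR)."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    degraded = [list(row) for row in rows]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        degraded[i][j] = None  # None marks a missing value
    return degraded


def partial_distance(x, y):
    """Assumed form of a 'modified' Euclidean distance: sum squared differences
    over features observed in both records, rescaled by p / (shared features)."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return math.inf  # no common feature, so the records cannot be compared
    squared = sum((a - b) ** 2 for a, b in shared)
    return math.sqrt(squared * len(x) / len(shared))


def hot_deck_impute(rows):
    """Fill each missing value from the nearest complete record (the donor).
    The nearest-donor rule is an assumption; other hot deck variants exist."""
    donors = [r for r in rows if None not in r]
    if not donors:
        raise ValueError("hot deck imputation needs at least one complete record")
    imputed = []
    for r in rows:
        if None in r:
            donor = min(donors, key=lambda d: partial_distance(r, d))
            r = [d if v is None else v for v, d in zip(r, donor)]
        imputed.append(list(r))
    return imputed


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same records."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pair counts from the contingency table and from its row/column totals.
    sum_cells = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / math.comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    rows = [[1.0, 2.0], [1.1, None], [5.0, 6.0], [None, 6.2]]
    print(hot_deck_impute(rows))
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

In a comparison like the one the abstract describes, the imputed rows would be passed to any K-means implementation and the resulting labels scored against the clustering of the complete data with `adjusted_rand_index`. A modified K-means would instead use something like `partial_distance` directly on the incomplete rows.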
List(s) this item appears in: BS Applied Mathematics
Holdings
Cover image Item type Current library Collection Call number Status Date due Barcode
University Library Theses Room-Use Only LG993.5 2006 A64 M34 Not For Loan 3UPML00011616
University Library Archives and Records Preservation Copy LG993.5 2000 A64 M34 Not For Loan 3UPML00021978


University of the Philippines Mindanao
The University Library, UP Mindanao, Mintal, Tugbok District, Davao City, Philippines
Email: library.upmindanao@up.edu.ph
Contact: (082)295-7025
Copyright © 2022 | All Rights Reserved