Clustering of datasets with missing values using principal feature analysis as a feature selection tool / Frank B. Pelpinosas.

Material type: Text
Language: English
Publication details: 2008
Description: 51 leaves
Dissertation note: Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2008
Abstract: One of the most prevalent problems in clustering is the presence of redundant and irrelevant features, which can distort and mislead the clustering results. Principal Feature Analysis (PFA) is used as a filter feature selection tool to reduce high-dimensional datasets to fewer dimensions while preserving the original structure of the data. The problem is worsened by the presence of missing values in the data. The study compares the clustering results of the complete (base) datasets with those of datasets imputed using K-NN and mean imputation across three levels of degradation. The features retained by PFA were used to cluster the samples, and the resulting clusters were assessed using the Adjusted Rand Index. Results showed that PFA did reduce the dimensions of the data. PFA also hardly dropped some features even when the level of degradation changed. Both feature retention and cluster recovery were negatively affected by the number of missing values in the data in all comparisons.
List(s) this item appears in: BS Applied Mathematics
Holdings
Item type | Current library | Collection | Call number | Status | Barcode
Thesis | University Library, Theses | Room-Use Only | LG993.5 2008 A64 P44 | Not For Loan | 3UPML00012278
Thesis | University Library, Archives and Records | Preservation Copy | LG993.5 2008 A64 P44 | Not For Loan | 3UPML00032898

University of the Philippines Mindanao
The University Library, UP Mindanao, Mintal, Tugbok District, Davao City, Philippines
Email: library.upmindanao@up.edu.ph
Contact: (082)295-7025
Copyright @ 2022 | All Rights Reserved