TY  - BOOK
AU  - Aguelo, Renz Marion Y.
TI  - Modified principal feature analysis (MPFA) as a feature selection algorithm for clustering large data sets within missing values
PY  - 2010///
KW  - Clustering
KW  - Feature selection
KW  - Missing values
KW  - Principal feature analysis
KW  - Undergraduate Thesis
KW  - AMAT200
KW  - BSAM
N1  - College of Science and Mathematics; Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2010
N2  - Clustering is a technique for placing objects into groups such that objects within the same group exhibit a high degree of similarity, while objects from different groups exhibit a high degree of dissimilarity. Unfortunately, high-dimensional datasets often contain unimportant features that can adversely affect the performance of clustering algorithms. Feature selection has emerged as a reduction technique that keeps only the important features of the data, and it is commonly applied in preparation for clustering. However, the use of clustering and feature selection is limited to complete datasets. This study modified Principal Feature Analysis to handle missing values. Modified Principal Feature Analysis (MPFA) makes use of all the available information in the data. MPFA was compared with case deletion, mean imputation, and KNN imputation, which are common methods of handling missing values. In general, MPFA reduced the datasets with a very low percentage of retention, and the clustering results on the reduced datasets were of low quality. In comparison with the existing approaches, MPFA also exhibited the least satisfactory performance. This is attributed to an inappropriate use of correlation and an erroneous choice of datasets. The competing approaches were further applied to actual incomplete data, and a similar ranking of performance was observed.
ER  - 