000 02133nam a22003373a 4500
001 UPMIN-00003211650
003 UPMIN
005 20230208143956.0
008 230208b |||||||| |||| 00| 0 eng d
040 _aDLC
_dupmin
_cUPMin
041 _aeng
090 0 _aLG993.5 2008
_bA64 P44
100 _aPelpinosas, Frank B.
_92176
245 _aClustering of datasets with missing values using principal feature analysis as a feature selection tool /
_cFrank B. Pelpinosas.
260 _c2008
300 _a51 leaves.
502 _aThesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2008
520 3 _aOne of the most prevalent problems in clustering is the presence of redundant and irrelevant features, which can distort and mislead the clustering results. Principal Feature Analysis (PFA) is used as a filter feature selection tool to reduce high-dimensional datasets to fewer dimensions while preserving the original structure of the data. The problem is worsened by the presence of missing values in the data. The study compares the clustering results of the complete (base) datasets with those of datasets imputed using K-NN and mean imputation across three levels of degradation. The features retained by PFA were used to cluster the samples, and the resulting clusterings were assessed using the Adjusted Rand Index. Results showed that PFA did reduce the dimensionality of the data. PFA also hardly drops some features, even when the level of degradation changes. Both feature retention and cluster recovery were negatively affected by the number of missing values in the data in all comparisons.
650 1 7 _aClustering.
_9366
650 1 7 _aFeature selections.
_92177
650 1 7 _aMissing values.
_9990
650 1 7 _aPFA Principal feature analysis.
_92178
650 1 7 _aDatasets.
_91958
650 1 7 _aAdjusted Rand Index.
_92100
650 1 7 _aMCAR (Missing completely at random)
_92103
658 _aUndergraduate Thesis
_cAMAT200,
_2BSAM
905 _aFi
905 _aUP
942 _2lcc
_cTHESIS
999 _c2244
_d2244