MARC details
000 -LEADER |
fixed length control field |
02133nam a22003373a 4500 |
001 - CONTROL NUMBER |
control field |
UPMIN-00003211650 |
003 - CONTROL NUMBER IDENTIFIER |
control field |
UPMIN |
005 - DATE AND TIME OF LATEST TRANSACTION |
control field |
20230208143956.0 |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
fixed length control field |
230208b |||||||| |||| 00| 0 eng d |
040 ## - CATALOGING SOURCE |
Original cataloging agency |
DLC |
Modifying agency |
upmin |
Transcribing agency |
UPMin |
041 ## - LANGUAGE CODE |
Language code of text/sound track or separate title |
eng |
090 #0 - LOCALLY ASSIGNED LC-TYPE CALL NUMBER (OCLC); LOCAL CALL NUMBER (RLIN) |
Classification number (OCLC) (R) ; Classification number, CALL (RLIN) (NR) |
LG993.5 2008 |
Local cutter number (OCLC) ; Book number/undivided call number, CALL (RLIN) |
A64 P44 |
100 ## - MAIN ENTRY--PERSONAL NAME |
Personal name |
Pelpinosas, Frank B. |
9 (RLIN) |
2176 |
245 ## - TITLE STATEMENT |
Title |
Clustering of datasets with missing values using principal feature analysis as a feature selection tool / |
Statement of responsibility, etc. |
Frank B. Pelpinosas. |
260 ## - PUBLICATION, DISTRIBUTION, ETC. |
Date of publication, distribution, etc. |
2008 |
300 ## - PHYSICAL DESCRIPTION |
Extent |
51 leaves. |
502 ## - DISSERTATION NOTE |
Dissertation note |
Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2008 |
520 3# - SUMMARY, ETC. |
Summary, etc. |
One of the most prevalent problems in clustering is the presence of redundant and irrelevant features, which can damage and mislead the clustering results. Principal Feature Analysis (PFA) is used as a filter feature selection tool to reduce high-dimensional datasets to fewer dimensions while preserving the original structure of the data. The problem is worsened by the presence of missing values in the data. The study compares the clustering results of the complete (base) datasets and of datasets imputed using K-NN and mean imputation across three levels of degradation. The features retained by PFA were used to cluster the samples, and the resulting clusters were assessed using the Adjusted Rand Index. Results showed that PFA did reduce the dimensions of the data. PFA also could hardly drop some features even when the level of degradation changed. Both feature retention and cluster recovery were negatively affected by the number of missing values in the data in all comparisons. |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Clustering. |
9 (RLIN) |
366 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Feature selections. |
9 (RLIN) |
2177 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Missing values. |
9 (RLIN) |
990 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
PFA (Principal feature analysis). |
9 (RLIN) |
2178 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Datasets. |
9 (RLIN) |
1958 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Adjusted Rand Index. |
9 (RLIN) |
2100 |
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
MCAR (Missing completely at random) |
9 (RLIN) |
2103 |
658 ## - INDEX TERM--CURRICULUM OBJECTIVE |
Main curriculum objective |
Undergraduate Thesis |
Curriculum code |
AMAT200, |
Source of term or code |
BSAM |
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN) |
a |
Fi |
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN) |
a |
UP |
942 ## - ADDED ENTRY ELEMENTS (KOHA) |
Source of classification or shelving scheme |
Library of Congress Classification |
Koha item type |
Thesis |