Clustering of data sets with missing values using statistical imputation methods / (Record no. 532)
000 -LEADER | |
---|---|
fixed length control field | 02214nam a2200241 4500 |
001 - CONTROL NUMBER | |
control field | UPMIN-00000010897 |
003 - CONTROL NUMBER IDENTIFIER | |
control field | UPMIN |
005 - DATE AND TIME OF LATEST TRANSACTION | |
control field | 20230201165228.0 |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
fixed length control field | 230201b |||||||| |||| 00| 0 eng d |
040 ## - CATALOGING SOURCE | |
Original cataloging agency | DLC |
Transcribing agency | UPMin |
Modifying agency | upmin |
041 ## - LANGUAGE CODE | |
Language code of text/sound track or separate title | eng |
090 ## - LOCALLY ASSIGNED LC-TYPE CALL NUMBER (OCLC); LOCAL CALL NUMBER (RLIN) | |
Classification number (OCLC) (R) ; Classification number, CALL (RLIN) (NR) | LG993.5 2000 |
Local cutter number (OCLC) ; Book number/undivided call number, CALL (RLIN) | A64 M34 |
100 1# - MAIN ENTRY--PERSONAL NAME | |
Personal name | Macabenta, Mel Zha Leah M. |
9 (RLIN) | 2020 |
245 00 - TITLE STATEMENT | |
Title | Clustering of data sets with missing values using statistical imputation methods / |
Statement of responsibility, etc. | Mel Zha Leah M. Macabenta. |
260 ## - PUBLICATION, DISTRIBUTION, ETC. | |
Date of publication, distribution, etc. | 2000 |
300 ## - PHYSICAL DESCRIPTION | |
Extent | 65 leaves. |
502 ## - DISSERTATION NOTE | |
Dissertation note | Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2000 |
520 3# - SUMMARY, ETC. | |
Summary, etc. | The K-means clustering algorithm is the most widely used clustering algorithm in the field of data analysis. One major drawback of this algorithm is that it cannot accommodate data sets with missing values. In practice, however, missing values cannot be avoided. Imputation methods are used more extensively than deletion in treating missing values. Several imputation methods have been suggested, but each has advantages and disadvantages relative to the others, so a proper choice of imputation method is necessary. Two statistical imputation methods, namely hot deck imputation and imputation using a prediction model, were used to treat the incomplete data sets. The treated data sets were then clustered using the K-means clustering algorithm. To allow a clear comparison, five data sets were used with two kinds of missingness, missing completely at random (MCAR) and missing at random (MAR), at five different levels of degradation ranging from 1% missing values. The resulting clusters were evaluated using the adjusted Rand index. The two methods were compared with the modified K-means algorithm, particularly the modified Euclidean distance. Results showed that hot deck imputation, the regression method, and the modified K-means clustering algorithm attained a high recovery of clusters, especially with big data sets, up to the 30% level of missing values. In small data sets, good recovery is attained only up to the 10% level of missing values. |
658 ## - INDEX TERM--CURRICULUM OBJECTIVE | |
Main curriculum objective | Undergraduate Thesis |
Curriculum code | AMAT200 |
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN) | |
a | Fi |
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN) | |
a | UP |
942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
Source of classification or shelving scheme | Library of Congress Classification |
Koha item type | Thesis |
Withdrawn status | Lost status | Source of classification or shelving scheme | Damaged status | Status | Collection | Home library | Current library | Shelving location | Date acquired | Source of acquisition | Accession Number | Total Checkouts | Full call number | Barcode | Date last seen | Price effective from |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Library of Congress Classification | | Not For Loan | Preservation Copy | University Library | University Library | Archives and Records | 2006-06-27 | donation | UAR-T-gd745 | | LG993.5 2000 A64 M34 | 3UPML00021978 | 2022-09-21 | 2022-09-21 |
| | | Library of Congress Classification | | Not For Loan | Room-Use Only | College of Science and Mathematics | University Library | Theses | 2006-06-27 | donation | CSM-T-gd1430 | | LG993.5 2006 A64 M34 | 3UPML00011616 | 2022-09-21 | 2022-09-21 |