A modified k-means clustering algorithm with Mahalanobis distance for clustering incomplete data sets / (Record no. 2269)

MARC details
000 -LEADER
fixed length control field 03200nam a22004453a 4500
001 - CONTROL NUMBER
control field UPMIN-00003300442
003 - CONTROL NUMBER IDENTIFIER
control field UPMIN
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20230202172209.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 230202b |||||||| |||| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency DLC
Transcribing agency UPMin
Modifying agency upmin
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title eng
090 #0 - LOCALLY ASSIGNED LC-TYPE CALL NUMBER (OCLC); LOCAL CALL NUMBER (RLIN)
Classification number (OCLC) (R) ; Classification number, CALL (RLIN) (NR) LG993.5 2009
Local cutter number (OCLC) ; Book number/undivided call number, CALL (RLIN) A64 M67
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Moreno, Iresh Granada.
9 (RLIN) 2090
245 #2 - TITLE STATEMENT
Title A modified k-means clustering algorithm with Mahalanobis distance for clustering incomplete data sets /
Statement of responsibility, etc. Iresh Granada Moreno.
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2009
300 ## - PHYSICAL DESCRIPTION
Extent 94 leaves.
502 ## - DISSERTATION NOTE
Dissertation note Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2009
520 3# - SUMMARY, ETC.
Summary, etc. Cluster analysis is the art of finding groups in data in such a way that objects in the same group are similar to each other, whereas objects in different groups are as dissimilar as possible. The most commonly used clustering algorithm is K-means with Euclidean distance. However, this distance function neglects the covariance among the variables when calculating distances. To account for this, the Mahalanobis distance is used. However, missing values are inevitable in practice, and a data set containing them cannot be clustered directly. Existing methods for treating missing values, such as case deletion and mean imputation, are prone to producing erroneous conclusions because they significantly reduce the data set or impute unreliable estimates. To avoid these problems, the two most essential elements of the K-means clustering algorithm, allocation and representation, were modified. Allocation, which was defined by the Mahalanobis distance, was modified to compute distances between two vectors and to compute variances when some values are unknown. Representation, which was defined by the arithmetic mean, was modified to estimate the mean when one or more values of a given attribute are unknown. The proposed algorithm was applied to the Iris and BUPA incomplete data sets simulated under MCAR and MAR assumptions with different levels of missing values. Under MAR, case deletion had the highest cluster recovery at 5% of the samples. However, it was outperformed by the proposed algorithm as the occurrence of missing values in the sample increased. In general, the modified K-means with Mahalanobis distance outperformed the other algorithms on both data sets.
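The summary names two modified steps, allocation and representation, without giving formulas. The following is only a minimal sketch of how such a scheme could look and is not the thesis's actual method. It assumes that missing values are marked as NaN, that every attribute has at least one observed value, that a single covariance matrix is estimated once from the observed entries, that distances are taken over the observed attributes of each object and rescaled by the share of attributes observed, and that the caller supplies the initial centres. All function names are hypothetical.

```python
import numpy as np


def available_case_cov(X):
    """Covariance estimate from observed entries only (NaN marks a missing value)."""
    obs = ~np.isnan(X)
    # Deviations from available-case column means; missing entries contribute zero.
    dev = np.where(obs, X - np.nanmean(X, axis=0), 0.0)
    # Scale each column by its own observed count so the diagonal holds the
    # available-case variances while the matrix stays positive semidefinite.
    scaled = dev / np.sqrt(np.maximum(obs.sum(axis=0) - 1, 1))
    return scaled.T @ scaled


def partial_mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance over the attributes observed in x."""
    obs = ~np.isnan(x)
    if not obs.any():
        return np.inf  # nothing observed, so no distance can be computed
    diff = x[obs] - center[obs]
    sub_cov = cov[np.ix_(obs, obs)]
    d2 = diff @ np.linalg.pinv(sub_cov) @ diff
    # Rescale so objects with fewer observed attributes remain comparable.
    return float(d2) * x.size / obs.sum()


def kmeans_incomplete(X, init_centers, n_iter=100):
    """K-means with partial Mahalanobis allocation and available-case means."""
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    cov = available_case_cov(X)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Allocation: assign each object to its nearest centre.
        dist = np.array(
            [[partial_mahalanobis_sq(x, c, cov) for c in centers] for x in X]
        )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Representation: per-attribute mean over the observed values only.
        for j in range(len(centers)):
            members = X[labels == j]
            counts = (~np.isnan(members)).sum(axis=0)
            sums = np.nansum(members, axis=0)
            centers[j] = np.where(
                counts > 0, sums / np.maximum(counts, 1), centers[j]
            )
    return labels, centers


# Toy data: two groups, each with two objects that have one missing attribute.
nan = np.nan
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, nan], [nan, 0.5],
              [10, 5], [11, 5], [10, 6], [11, 6], [10.5, nan], [nan, 5.5]])
labels, centers = kmeans_incomplete(X, init_centers=X[[0, 6]])
```

No object is deleted and no value is imputed into the data: incomplete objects take part in allocation through their observed attributes, and each centre coordinate is averaged over whatever values are available. Using one covariance matrix for all clusters is a simplification made here for brevity.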
610 ## - SUBJECT ADDED ENTRY--CORPORATE NAME
Corporate name or jurisdiction name as entry element Philippine Eagle Foundation.
9 (RLIN) 2091
610 ## - SUBJECT ADDED ENTRY--CORPORATE NAME
Corporate name or jurisdiction name as entry element Philippine Eagle Foundation
Geographic subdivision Davao City
-- Philippines.
9 (RLIN) 2092
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Clustering.
9 (RLIN) 366
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element K-means clustering.
9 (RLIN) 2093
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Mahalanobis distance.
9 (RLIN) 2094
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Clustering algorithm.
9 (RLIN) 1300
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Data sets.
9 (RLIN) 1992
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Modified algorithm.
9 (RLIN) 2095
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Incomplete data.
9 (RLIN) 2096
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Missing Values.
9 (RLIN) 990
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Iris data base.
9 (RLIN) 2097
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element BUPA data base.
9 (RLIN) 2098
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Cluster analysis.
9 (RLIN) 2099
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Adjusted Rand Index.
9 (RLIN) 2100
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Multivariate techniques.
9 (RLIN) 2101
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element MAR (Missing at random).
9 (RLIN) 2102
650 17 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element MCAR (Missing completely at random).
9 (RLIN) 2103
658 ## - INDEX TERM--CURRICULUM OBJECTIVE
Main curriculum objective Undergraduate Thesis
Curriculum code AMAT200
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN)
a Fi
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN)
a UP
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Library of Congress Classification
Koha item type Thesis
Holdings
Item 1
Source of classification or shelving scheme Library of Congress Classification
Status Not For Loan
Collection Preservation Copy
Home library University Library
Current library University Library
Shelving location Archives and Records
Date acquired 2009-07-28
Source of acquisition donation
Accession Number UAR-T-gd1230
Full call number LG993.5 2009 A64 M67
Barcode 3UPML00032503
Date last seen 2022-10-05
Price effective from 2022-10-05
Item 2
Source of classification or shelving scheme Library of Congress Classification
Status Not For Loan
Collection Room-Use Only
Home library College of Science and Mathematics
Current library University Library
Shelving location Theses
Date acquired 2009-07-22
Source of acquisition donation
Accession Number CSM-T-gd2104
Full call number LG993.5 2009 A64 M67
Barcode 3UPML00012377
Date last seen 2022-10-05
Price effective from 2022-10-05