TY  - BOOK
AU  - Labayan, Luchie Marie A.
TI  - Hierarchical clustering for mixed dataset based on variance and entropy
PY  - 2009///
KW  - Aggregation
KW  - Clustering
KW  - Dissimilarity coefficients
KW  - Distance hierarchy
KW  - Entropy
KW  - Variance
KW  - Mixed data
KW  - Hierarchical clustering
KW  - Dissimilarity measures
KW  - Datasets
KW  - Undergraduate Thesis
KW  - AMAT200
N1  - Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2009
N2  - Hsu's coefficient for mixed data was modified in terms of the aggregation technique and the entropy-based distance function to cluster mixed datasets efficiently. Six dissimilarity coefficients are proposed, which use variance for numerical attributes and entropy measures such as Shannon's weighted entropy, Havrda-Charvat's structural α-entropy, and Jensen-Shannon divergence for categorical attributes. The type of data, together with the aggregation function and the entropy measure used, has a significant effect on the clustering produced. For data whose categorical values have no level of similarity, the proposed dissimilarity coefficients that are closely related and generate similar dendrograms are those that share the same aggregation function. Based on performance, the proposed dissimilarity coefficients that used De Carvalho's dissimilarity measure as the aggregation function produced better clustering solutions. On the other hand, for data whose categorical values have different degrees of similarity, only the proposed dissimilarity coefficients that used Shannon's entropy weighted by the distance in the distance hierarchy deviated from the group. The proposed dissimilarity coefficients that used De Carvalho's extension of Ichino and Yaguchi's dissimilarity as the aggregation function worked well in clustering. The six proposed dissimilarity coefficients performed better on mixed data than the existing dissimilarity measures for mixed data.
ER  -