
An implementation of the K-means clustering algorithm using Silhouette plot as evaluation technique / Karen A. Abroguena

By: Karen A. Abroguena
Material type: Text
Language: English
Publication details: 2005
Description: 68 leaves
Dissertation note: Thesis (BS Computer Science) -- University of the Philippines Mindanao, 2005
Abstract: This project developed software that clusters data according to similarity. It uses the classical k-Means clustering algorithm, extended with a feature that matters to users: an evaluation method. The initial centroids are chosen at random, and the distance between a centroid and a data point is computed with the squared Euclidean distance. The smaller the distance, the more similar the two are. Data points are relocated using the Transferring pass-Global best improving technique. An evaluation technique, the Silhouette Plot, was added so that users can visualize how good or bad a clustering is. The software accepts only numerical input data with no missing values, in spreadsheet form; categorical or mixed (numerical and categorical) data are not accepted. Clustering takes longer as the size of the data set grows. With this software, users who analyze their data with the k-Means algorithm need not open another application to evaluate the resulting clusters. When the iris (flower) data were clustered with the newly implemented software, the results differed from those produced by established software, and some data points were misclassified. These misclassifications were resolved through the evaluation technique, which reports to the user the cluster to which each data point belongs. The implemented algorithm may not be on par with those used in established software, but the evaluation technique compensates for this weakness.
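The abstract names the parts of the method without showing them. The Python sketch below is a minimal illustration under stated assumptions, not the thesis's own code: the initial centroids are k distinct data points drawn at random, points are first assigned to the nearest centroid by squared Euclidean distance, and "Transferring pass-Global best improving" is read as repeatedly applying the single-point move that most reduces the within-cluster sum of squares (SSE), stopping when no move helps. Under that reading, moving a point x from cluster A (size n_A, centroid c_A) to cluster B changes the SSE by n_B/(n_B+1) * ||x - c_B||^2 - n_A/(n_A-1) * ||x - c_A||^2. The silhouette uses plain Euclidean distance and Rousseeuw's definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the point's own cluster and b(i) the smallest mean distance to any other cluster. The function names and the text-bar plot are hypothetical.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans_transfer(data, k, seed=0):
    """Hypothetical sketch: random initial centroids, nearest-centroid
    assignment, then 'global best' single-point transfers.

    Assumption: each step applies the one move (over all points and all
    target clusters) that lowers the within-cluster SSE the most, and the
    loop stops when no move lowers it."""
    if not 2 <= k <= len(data):
        raise ValueError("need 2 <= k <= number of points")
    rng = random.Random(seed)
    init = rng.sample(range(len(data)), k)  # k distinct random points
    cents = [list(data[i]) for i in init]
    labels = [min(range(k), key=lambda j: sq_dist(x, cents[j])) for x in data]
    for j, i in enumerate(init):
        labels[i] = j  # each seed point keeps its own cluster, so none is empty
    while True:
        groups = [[x for x, l in zip(data, labels) if l == j] for j in range(k)]
        cents = [centroid(g) for g in groups]
        sizes = [len(g) for g in groups]
        best_delta, best_i, best_to = 0.0, None, None
        for i, x in enumerate(data):
            a = labels[i]
            if sizes[a] == 1:
                continue  # moving the only member would empty the cluster
            # SSE saved by taking x out of its current cluster a
            saved = sizes[a] / (sizes[a] - 1) * sq_dist(x, cents[a])
            for b in range(k):
                if b == a:
                    continue
                # SSE added by putting x into cluster b, minus the saving
                delta = sizes[b] / (sizes[b] + 1) * sq_dist(x, cents[b]) - saved
                if delta < best_delta - 1e-12:
                    best_delta, best_i, best_to = delta, i, b
        if best_i is None:
            return labels, cents  # no single transfer improves the SSE
        labels[best_i] = best_to


def silhouette(data, labels, k):
    """Per-point (s(i), neighbor cluster) using Euclidean distance and
    s(i) = (b - a) / max(a, b); singleton clusters get s(i) = 0."""
    result = []
    for i, x in enumerate(data):
        mean_d = {}
        for j in range(k):
            ds = [math.dist(x, y) for t, (y, l) in enumerate(zip(data, labels))
                  if l == j and t != i]
            if ds:
                mean_d[j] = sum(ds) / len(ds)
        own = labels[i]
        others = {j: d for j, d in mean_d.items() if j != own}
        if own not in mean_d or not others:
            result.append((0.0, None))
            continue
        a = mean_d[own]
        nb = min(others, key=others.get)  # nearest other cluster on average
        b = others[nb]
        s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        result.append((s, nb))
    return result


def silhouette_plot(sil, labels, width=40):
    """Text silhouette plot: bars grouped by cluster, sorted longest first.
    Positive widths are drawn with '#', negative ones with '-'."""
    lines = []
    for j in sorted(set(labels)):
        widths = sorted((s for (s, _), l in zip(sil, labels) if l == j), reverse=True)
        for s in widths:
            bar = ("#" if s >= 0 else "-") * round(abs(s) * width)
            lines.append(f"cluster {j} {s:+.2f} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up 2-D points forming two obvious groups (not data from the thesis).
    pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    labels, _ = kmeans_transfer(pts, k=2, seed=1)
    sil = silhouette(pts, labels, k=2)
    print(silhouette_plot(sil, labels))
    print("average silhouette width:", round(sum(s for s, _ in sil) / len(sil), 3))
```

A point with a negative silhouette lies, on average, closer to its neighbor cluster than to its own, so returning that neighbor index is one plausible way to produce the kind of per-point cluster report the abstract describes.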
Holdings
Item type | Current library | Collection | Call number | Status | Barcode
Thesis | University Library, General Reference | Room-Use Only | LG993.5 2005 C6 A27 | Not For Loan | 3UPML00011342
Thesis | University Library, Archives and Records | Preservation Copy | LG993.5 2005 C6 A27 | Not For Loan | 3UPML00022040

University of the Philippines Mindanao
The University Library, UP Mindanao, Mintal, Tugbok District, Davao City, Philippines
Email: library.upmindanao@up.edu.ph
Contact: (082)295-7025
Copyright © 2022 | All Rights Reserved