Approaches in handling missing values in randomly amplified polymorphic DNA (RAPD) analysis / Mabele Palmes Malagamba.

Material type: Text
Language: English
Publication details: 2006
Description: 114 leaves
Dissertation note: Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2006
Abstract: Randomly amplified polymorphic DNA (RAPD) experiments produce large amounts of data. Hierarchical clustering is commonly used to construct a dendrogram from such data. However, RAPD data usually contain missing values caused by experimental errors, which even at a low rate can be a major drawback for computing similarity and for using clustering methods. Thus, missing values are typically treated with case deletion or data imputation. This study focuses on approaches to handling missing values in RAPD data and their effects on the construction of a phylogenetic tree. This was done by comparing existing techniques for handling missing values, such as zero replacement and K-nearest neighbor (KNN) imputation, and by developing an alternative approach for obtaining similarity indices that accommodates incomplete data sets. The results of the study present the modified similarity coefficients and comparative experiments on the methods for handling missing values. In general, KNN outperformed zero replacement and the modified similarity coefficients at almost all levels of degradation. However, at a low rate of missing values, the modified similarity coefficients outperformed both KNN and zero replacement. Among the clustering algorithms, single linkage seemed the most stable across levels of degradation, average linkage performed fairly, and complete linkage gave the worst results because of its low recovery at all levels of degradation. Furthermore, the Jaccard and Sorensen-Dice similarity coefficients performed similarly. Thus, the impact of missing values depends on the hierarchical clustering algorithm used. Also, the performance of an approach to handling missing values depends on the rate of missing values.
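As a rough illustration of the coefficients named in the abstract, the sketch below computes the standard Jaccard and Sorensen-Dice similarities on binary band data (1 = band present, 0 = band absent, None = missing). It contrasts zero replacement with a simple variant that skips loci where either sample is missing. That skipping variant is an assumption made here for illustration only; the record does not describe how the thesis's modified similarity coefficients are defined. All function names and the sample data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Band = Optional[int]  # 1 = band present, 0 = band absent, None = missing


def _counts(x: Sequence[Band], y: Sequence[Band]) -> Tuple[int, int, int]:
    """Return (a, b, c): bands shared, only in x, only in y.

    Loci where either sample is missing are skipped (illustrative
    assumption, not the thesis's modified coefficient).
    """
    a = b = c = 0
    for xi, yi in zip(x, y):
        if xi is None or yi is None:
            continue
        if xi == 1 and yi == 1:
            a += 1
        elif xi == 1:
            b += 1
        elif yi == 1:
            c += 1
    return a, b, c


def jaccard(x: Sequence[Band], y: Sequence[Band]) -> float:
    """Jaccard similarity: a / (a + b + c)."""
    a, b, c = _counts(x, y)
    denom = a + b + c
    return a / denom if denom else float("nan")


def dice(x: Sequence[Band], y: Sequence[Band]) -> float:
    """Sorensen-Dice similarity: 2a / (2a + b + c)."""
    a, b, c = _counts(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else float("nan")


def zero_replace(x: Sequence[Band]) -> List[int]:
    """Zero replacement: treat every missing value as band absent."""
    return [0 if v is None else v for v in x]


# Hypothetical band profiles for two samples across six loci.
s1 = [1, 1, 0, None, 1, 0]
s2 = [1, 0, 0, 1, 1, None]

skip_jaccard = jaccard(s1, s2)                              # 2/3
zero_jaccard = jaccard(zero_replace(s1), zero_replace(s2))  # 1/2
skip_dice = dice(s1, s2)                                    # 4/5
zero_dice = dice(zero_replace(s1), zero_replace(s2))        # 2/3
```

On this toy pair, zero replacement lowers both coefficients because each missing value is counted as a mismatch wherever the other sample shows a band.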
List(s) this item appears in: BS Applied Mathematics
Holdings
Item type | Current library | Collection | Call number | Status | Barcode
Thesis | University Library, Theses | Room-Use Only | LG993.5 2006 A64 M36 | Not For Loan | 3UPML00011647
Thesis | University Library, Archives and Records | Preservation Copy | LG993.5 2006 A64 M36 | Not For Loan | 3UPML00021976
