Approaches in handling missing values in randomly amplified polymorphic DNA (RAPD) analysis / Mabele Palmes Malagamba.

By:

Malagamba, Mabele Palmes

Material type: Text

Language: English

Publication details: 2006

Description: 114 leaves

Subject(s):

Undergraduate Thesis AMAT200

Dissertation note: Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2006

Abstract: Randomly amplified polymorphic DNA (RAPD) experiments produce large amounts of data, and hierarchical clustering is commonly used to construct a dendrogram from them. However, RAPD data usually contain missing values caused by experimental errors, which even at a low rate can be a major drawback for computing similarity and applying clustering methods. Thus, missing values were treated with case deletion and data imputation. This study focuses on approaches to handling missing values in RAPD data and their effects on the construction of a phylogenetic tree. It compares existing techniques for handling missing values, such as zero replacement and K-nearest neighbor (KNN) imputation, and develops an alternative approach for obtaining similarity indices that accommodates incomplete data sets. The results present the modified similarity coefficients and comparative experiments on the methods for handling missing values. In general, KNN outperformed zero replacement and the modified similarity coefficients at almost all levels of degradation. However, at a low rate of missing values, the modified similarity coefficients outperformed KNN and zero replacement. Among the clustering algorithms, single linkage seemed the most stable across levels of degradation, average linkage performed fairly, and complete linkage gave the worst results because of its low recovery at all levels of degradation. The Jaccard and Sorensen-Dice similarity coefficients performed similarly. Thus, the impact of missing values depends on the hierarchical clustering algorithm used, and the performance of an approach to handling missing values depends on the rate of missing values.

List(s) this item appears in: BS Applied Mathematics


Holdings
Item type	Current library	Collection	Call number	Status	Date due	Barcode
Thesis	University Library Theses	Room-Use Only	LG993.5 2006 A64 M36	Not For Loan		3UPML00011647
Thesis	University Library Archives and Records	Preservation Copy	LG993.5 2006 A64 M36	Not For Loan		3UPML00021976

