Exploratory analysis on DPCoA on small data sets with missing values using imputation methods

Mahinay, Kristine Karen C.

Exploratory analysis on DPCoA on small data sets with missing values using imputation methods / Kristine Karen C. Mahinay - 2010 - 64 leaves.

Thesis (BS Applied Mathematics) -- University of the Philippines Mindanao, 2010

A new ordination method, DPCoA, allows comparison among several communities containing species that differ in taxonomic features. However, missing values in a data set are inevitable, and DPCoA has no internal method for handling them. As an introductory and exploratory analysis, the study determined how commonly used imputation methods (mean imputation, k-nearest neighbor imputation, regression imputation, and expectation-maximization imputation), applied as a pre-processing step to DPCoA for incomplete abundance data sets, affect the quadratic entropy and the DPCoA plots of the sites studied. Different levels of degradation (1%, 5%, and 15%) were also investigated to assess how well these imputation approaches behave when large amounts of missing values are present in the data set. The Rao diversity coefficients (DIVCs) generated and the DPCoA plots obtained from the complete data and from the imputed data sets were compared using Spearman rank correlation and Procrustes analysis, respectively. Results showed that the imputation methods yield high Spearman correlation coefficients and high Procrustes rotation correlations when the proportion of missing values is relatively small, since their estimates are close to the true values that were lost. As the amount of missing values increases, however, the imputation methods perform worse, especially expectation-maximization imputation. Although expectation-maximization imputation is expected to yield good estimates, it produced lower Spearman coefficients and Procrustes rotation correlations. This could lead to a higher risk of misinterpreting the relationships among communities. Since the study was time-bound, it is recommended that it be repeated several times to further evaluate the performance of the imputation methods.
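The evaluation loop summarized in the abstract (delete values at a chosen degradation level, impute, recompute the diversity of each site, then compare against the complete data with Spearman rank correlation) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's own code: it uses only mean imputation, the simplest of the four methods; the site-by-species abundances and the species dissimilarity matrix are hypothetical toy values; all function names are hypothetical; and Rao's quadratic entropy is taken as Q = sum over i, j of d_ij * p_i * p_j, where p_i is the relative abundance of species i at a site and d_ij is an assumed dissimilarity between species i and j. The DPCoA ordination itself and the Procrustes comparison of plots are not shown.

```python
import math
import random

# Hypothetical helpers for illustration; not taken from the thesis.

def degrade(table, rate, rng):
    """Delete each abundance independently with probability `rate` (None = missing)."""
    return [[None if rng.random() < rate else v for v in row] for row in table]

def mean_impute(table):
    """Replace each missing entry with the mean of the observed values in its species column."""
    means = []
    for j in range(len(table[0])):
        observed = [row[j] for row in table if row[j] is not None]
        # Assumption: fall back to 0.0 if a whole column is missing.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in table]

def rao_quadratic_entropy(abundances, dist):
    """Q = sum_i sum_j d_ij * p_i * p_j, with p the relative abundances at one site."""
    total = sum(abundances)
    p = [a / total for a in abundances]
    n = len(p)
    return sum(dist[i][j] * p[i] * p[j] for i in range(n) for j in range(n))

def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

if __name__ == "__main__":
    # Hypothetical toy data: 5 sites (rows) x 4 species (columns).
    complete = [
        [12.0, 3.0, 0.0, 5.0],
        [4.0, 8.0, 6.0, 1.0],
        [0.0, 2.0, 9.0, 7.0],
        [6.0, 6.0, 6.0, 6.0],
        [15.0, 1.0, 2.0, 0.0],
    ]
    # Hypothetical species dissimilarities (symmetric, zero diagonal).
    dist = [
        [0.0, 0.4, 0.9, 1.0],
        [0.4, 0.0, 0.8, 0.9],
        [0.9, 0.8, 0.0, 0.3],
        [1.0, 0.9, 0.3, 0.0],
    ]
    rng = random.Random(0)
    q_true = [rao_quadratic_entropy(row, dist) for row in complete]
    for rate in (0.01, 0.05, 0.15):
        imputed = mean_impute(degrade(complete, rate, rng))
        q_imp = [rao_quadratic_entropy(row, dist) for row in imputed]
        print(f"{rate:.0%} missing: Spearman rho = {spearman(q_true, q_imp):.3f}")
```

Running the script prints one Spearman coefficient per degradation level. In a fuller version, k-nearest neighbor, regression, or expectation-maximization imputation would take the place of mean_impute, and the Procrustes step would compare DPCoA site coordinates rather than entropy values.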


Double principal coordinate analysis
Imputation
Missing values
Rao diversity coefficient


Undergraduate Thesis --AMAT200