Thesis Defense Announcement

The College of Arts and Sciences announces the Final Thesis Defense of

Lauren Sobral

for the Degree of Master of Science

November 6, 2018 at 2:00 PM in Dunn Hall, Room 203

Advisor: Dale Bowman

Comparing and Contrasting Clustering Analysis Methods: K-means and Vector in Partition

ABSTRACT: This paper examines the similarities and differences between two methods of exploratory cluster analysis, K-means and Vector in Partition (VIP). K-means, the traditional clustering approach, has limitations when clustering complex datasets, specifically datasets whose variables are multidimensional vectors. This is the gap the VIP algorithm aims to fill. A novel approach for clustering multidimensional datasets containing both continuous and categorical data, the VIP algorithm has preliminary results that support its ability to correctly cluster simulated datasets of the genetic factors gene expression (GE), CpG, and SNP. After explaining both algorithms, we apply each of them to an example of simulated genetic data whose variables are multidimensional vectors. The results are then summarized using accuracy, sensitivity, and specificity, and we highlight the benefits and limitations of each clustering method.