PhD Dissertation Defense - Mustafa Hajeer
Handling Big Data with a Data-Aware HDFS using Evolutionary Clustering Technique
Mustafa Hajeer, PhD Candidate
Friday, September 9, 2016, 9:00 am
Dunn Hall 311
Committee Members:
Prof. Dipankar Dasgupta, Advisor
Prof. Vasile Rus
Prof. Lan Wang
Prof. Zhuo Lu
Many tools and techniques have been developed to analyze large collections of data. The increasing use of cyber-enabled systems such as the Internet of Things (IoT) and sensors is generating a massive amount of data with different structures. Most new big data solutions are built on top of the Hadoop ecosystem, or at least use its distributed file system (HDFS). However, studies have shown that such systems are inefficient when dealing with modern data. Although some research has overcome these problems for specific types of graph data, modern data come in more than one type. Such inefficiencies lead to larger-scale problems, such as larger datacenter footprints and wasted resources like network bandwidth and power, which in turn lead to environmental problems such as higher carbon emissions. This dissertation proposes a data-aware packaging for the Hadoop ecosystem and its distributed file system. Such a framework allows Hadoop to manage the distribution and placement of data based on cluster analysis of the data itself. Unlike previous efforts, we are able to handle a broader range of data types and optimize a wider range of processes, as well as improve query time and resource usage.