PhD Dissertation Defense - Pulin Agrawal

Applications of Sparse Representations

Pulin Agrawal, PhD Candidate

Friday, Apr. 5, 2019, 3:00 pm
Dunn Hall 375 Conference Room

Committee Members:

Prof. Vasile Rus, Chair
Prof. Bernie J. Daigle, Jr.
Prof. Emeritus Stan Franklin
Prof. Deepak Venugopal


In this dissertation, I explore the properties and uses of sparse representations. Sparse representations encode information as high-dimensional binary vectors. Several of their properties make them useful for pattern recognition in highly noisy and complex environments. They have a very high capacity: a typical sparse representation can take on the order of 10^84 distinct values, more than the estimated number of atoms in the observable universe. They are highly robust to noise, tolerating up to 50% noise. A particularly powerful property is that the similarity between two things can be measured by directly comparing their representations. These properties give sparse representations applications in a variety of fields, such as Artificial Intelligence and Molecular Biology, that need to encode information that is complex and noisy in nature.

I show how sparse representations can be used to represent complex environments for an agent based on the Learning Intelligent Decision Agent (LIDA) model. Sparse representations allowed us to achieve a two-fold goal: producing information-rich representations of things in the environment, and proposing a method for generating grounded representations for the LIDA model. They also allowed us to ground the representations used by LIDA in the sensory apparatus of the agent while still allowing perfect-fidelity communication between the sensory memory of LIDA and the rest of the model.

I also show how sparse representations are useful in Molecular Biology for discovering patterns in heterogeneous and noisy gene expression data in a data-driven way. We used a sparse auto-encoder to learn sparse representations of transcriptomics experiments taken from a large publicly available dataset. These representations were then used to identify biological patterns in the form of gene sets. Each representation provided a unique signature for a set of samples originating from the same experimental condition. Applications of our method include the identification of previously undiscovered gene sets and the supervised classification of samples from different biological classes.

Overall, our results show that sparse representations are useful in a variety of fields that involve finding patterns in complex and noisy environments.
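
As a rough illustration of the capacity, noise-robustness, and similarity properties mentioned in the abstract, the following Python sketch may help. It is not taken from the dissertation. The vector length (2048) and number of active bits (40) are assumptions chosen because they give a capacity on the order of 10^84; all function names are hypothetical, and "noise" is modeled here, as one possible interpretation, by moving a fraction of the active bits to random inactive positions.

```python
import math
import random

N = 2048  # vector length (assumed; not stated in the abstract)
W = 40    # number of active bits per vector (assumed)


def capacity(n, w):
    """Number of distinct binary vectors of length n with exactly w active bits."""
    return math.comb(n, w)


def random_sdr(n, w, rng):
    """Random sparse vector, stored as the set of its active bit positions."""
    return frozenset(rng.sample(range(n), w))


def overlap(a, b):
    """Similarity score: the number of active bits the two vectors share."""
    return len(a & b)


def add_noise(sdr, n, fraction, rng):
    """Move a fraction of the active bits to randomly chosen inactive positions."""
    k = round(fraction * len(sdr))
    dropped = set(rng.sample(sorted(sdr), k))
    inactive = [i for i in range(n) if i not in sdr]
    added = set(rng.sample(inactive, k))
    return frozenset((sdr - dropped) | added)


rng = random.Random(0)
original = random_sdr(N, W, rng)
noisy = add_noise(original, N, 0.5, rng)   # 50% of the active bits are moved
unrelated = random_sdr(N, W, rng)

print("capacity: about 10^%d" % (len(str(capacity(N, W))) - 1))
print("overlap with noisy copy:", overlap(original, noisy))
print("overlap with unrelated vector:", overlap(original, unrelated))
```

Under these assumptions, a copy with half of its active bits moved still shares 20 active bits with the original, whereas two unrelated random vectors share fewer than one active bit on average (W*W/N, about 0.8). That gap is what lets a simple overlap count separate a noisy copy from an unrelated vector.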