Biostatistics Research

Variable Selection, Clustering, and Network Testing … What Can Biostatisticians Do?

(1) Clustering and joint clustering: Use probability models to group subjects and variables, separately or jointly, so that the variables within a group share a similar association with an external variable of interest, e.g., age or time. This research, supported by NIAID/NIH, aims to improve the homogeneity of clusters with respect to the association of allergic sensitization with time, as well as DNA methylation patterns at different loci (see the list of application areas below for a lay definition of DNA methylation).

(2) Variable selection with and without measurement error: Develop methods for selecting variables in linear regression that improve on Zellner’s g-prior. Data collected in studies are sometimes subject to measurement error or misclassification, e.g., gene expression data or self-reported smoking status. Incorporating these errors into the analytical model has the potential to improve the quality of the analysis. This study was supported by NHLBI/NIH to identify, via gene expression, genetic markers related to lung function.

(3) Variable selection in clustering: The goal of this study is to select dependent variables and, at the same time, cluster those variables that show similar patterns with respect to external variables of interest. This is part of an NIAID/NIH-funded project that aims to detect stable and dynamic DNA methylation.

(4) Graphical modeling: Develop methods to construct Bayesian networks and to compare directed and undirected networks from multiple populations. This project was motivated by the concerted action of genetic variants and epigenomic features on health outcomes, and by its connection to the identification of genetic and epigenetic markers.

Areas to which the above methods have been applied: (1) Allergic diseases, including eczema, asthma, and rhinitis; lung function; and their related risk factors. (2) Obesity and related risk factors. (3) DNA methylation patterns and their change over time (DNA methylation is, in essence, the addition of a methyl group, a chemical group, to a DNA molecule). (4) Single nucleotide polymorphisms, DNA methylation, and gene expression with respect to their joint effects.

None of these projects could have succeeded without the tremendous contributions of our graduate students.

Assessing Disease Risk

Yu (Joyce) Jiang, PhD

Understanding gene × environment interactions and identifying potential molecular markers are critical in the study of chronic diseases such as cancer and inflammatory disease. Most existing genetic and epigenetic association studies have focused on a single type of data, either genetic (single nucleotide polymorphisms, SNPs) or epigenetic. Association studies based on a single type of measurement cannot comprehensively detect the underlying biological processes; in particular, they fail to capture the joint action of genetic and epigenetic factors. This is likely to result in false or non-informative molecular markers. Our research interest is to develop hierarchical Bayesian integrative models that assess genetic and epigenetic effects on disease risk comprehensively and jointly.

Clustering fMRI Meta Data to Identify Significant Regions of Brain Activation

Meredith Ray, PhD

Dr. Ray and her team have developed a Bayesian clustering method for identifying significant regions of brain activation (foci). The primary interest was coordinate-based meta data originating from functional magnetic resonance imaging (fMRI), which can measure the intensity of blood flow and oxygen to a location within the brain that was activated by a given thought or emotion.