Statistical Genomics
PI: Chiara Sabatti
Sequences of entire genomes, genotypes of individual variations at thousands of polymorphic loci in hundreds of individuals, and gene expression measurements from cDNA chips on thousands of genes under a variety of conditions: these are some of the types of datasets now available to genetic researchers. They also exemplify the challenges that genetics poses to the information sciences. The statistical genetics laboratory uses tools from information theory, Bayesian statistics, and Markov chain Monte Carlo to extract scientifically valuable information from these massive datasets.
Bio-data Refining and Dimension Reduction
PI: Ker-Chau Li
The post-genome era has arrived with a torrent of high-throughput genomic and proteomic data, useful for dissecting the complex genetic circuitry within the cells of an organism. The goal of biodata refining is to process such data much as a refinery processes crude oil. With an array of analysis tools, many of them yet to be invented, we hope to distil information of various kinds to meet diverse needs such as pathway studies, disease-gene searching, and pharmacogenomic research. Our lab currently focuses on the analysis of microarray gene expression data. The aim is to build an integrated system for exploring multiple publicly accessible gene expression databases. This system is based on the newly introduced concept of liquid association (LA), and it also employs clustering and other statistical dimension reduction techniques to enhance the analysis. The system will integrate data on protein complexes, transcription factor binding, genetic markers, drug sensitivity profiles, and worldwide genomic knowledge bases to distil biological information from microarray data.
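Liquid association asks whether the co-expression of two genes X1 and X2 changes with the expression level of a third gene X3. In the usual formulation, each expression profile is normal-score transformed to mean 0 and variance 1, and LA(X1, X2 | X3) is then estimated by the sample average of the product X1·X2·X3; a positive value suggests X1 and X2 become more positively correlated as X3 rises. The sketch below is a minimal illustration under those assumptions, not the lab's system; the helper names `normal_scores` and `liquid_association` and the toy data are hypothetical, and ties in the rank transform are broken by order.

```python
import random
from statistics import NormalDist, fmean, pstdev


def normal_scores(values):
    """Rank-based normal-score transform, rescaled to mean 0 and variance 1.

    Hypothetical helper: ties are broken by input order, not averaged.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    nd = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Map rank r to the standard-normal quantile at r / (n + 1).
        scores[i] = nd.inv_cdf(rank / (n + 1))
    mu, sd = fmean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def liquid_association(x1, x2, x3):
    """Estimate LA(X1, X2 | X3) as the mean of the product of normal scores."""
    z1, z2, z3 = normal_scores(x1), normal_scores(x2), normal_scores(x3)
    return fmean([a * b * c for a, b, c in zip(z1, z2, z3)])


# Toy example (fabricated for illustration): X2 tracks X1 when X3 is high
# and anti-tracks X1 when X3 is low, so the X1/X2 correlation depends on X3.
rng = random.Random(0)
x3 = [i / 100 - 1 for i in range(201)]          # "regulator" level from -1 to 1
x1 = [rng.gauss(0, 1) for _ in x3]
x2 = [a * c + rng.gauss(0, 0.1) for a, c in zip(x1, x3)]
print(liquid_association(x1, x2, x3))           # expected to be clearly positive
```

Because the estimator is a plain average of a triple product, it is symmetric in its three arguments, which is one reason LA scores can be computed cheaply across many gene triplets.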
Qing Zhou’s Home Page
PI: Qing Zhou
- Computational biology: (1) Statistical models and computational algorithms for detecting transcription factor binding motifs and cis-regulatory modules, emphasizing the use of combinatorial control and evolutionary conservation information to enhance predictive power. (2) Statistical methodology for integrated analyses of gene expression data, transcription factor binding data (ChIP-chip/seq), and multiple genomic sequences, with applications to statistical inference of gene regulatory networks in important cellular processes.
- Monte Carlo methods: Energy-temperature design in population-based Markov chain Monte Carlo methods; exploration and characterization of statistical and topological properties of energy landscapes from Monte Carlo samples; applications in statistics and physics.
- Bayesian statistics: Missing data problems, Bayesian inference, Bayesian hierarchical models.
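A common starting point for detecting transcription factor binding motifs is to score every window of a sequence against a position weight matrix (PWM). The sketch below builds a log-odds PWM from a few aligned sites and scans a sequence with it. This is a generic textbook technique shown for orientation, not the lab's model (which, per the description above, also exploits combinatorial control and evolutionary conservation); the sites, the pseudocount of 0.5, the uniform background, and the function names are hypothetical toy choices.

```python
import math

BASES = "ACGT"


def build_log_odds_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM (one dict per motif position) from aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    pwm = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm


def scan(sequence, pwm):
    """Return (start position, log-odds score) for every window of motif width."""
    w = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(w)))
        for i in range(len(sequence) - w + 1)
    ]


# Hypothetical toy sites and sequence, for illustration only.
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
pwm = build_log_odds_pwm(sites)
hits = scan("CCCCTGACTCACCCC", pwm)
best = max(hits, key=lambda hit: hit[1])
print(best)  # the highest-scoring window starts at position 4
```

In practice the threshold for calling a site, and the background model, matter as much as the matrix itself; the uniform background here is only a placeholder.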
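To make "population-based Markov chain Monte Carlo" concrete, the sketch below runs parallel tempering, a standard population method: several Metropolis chains target exp(-U(x)/T) at different temperatures T, and adjacent chains periodically propose to swap states, accepted with probability min(1, exp((U(x_i) - U(x_j))(1/T_i - 1/T_j))). The choice of temperature ladder is the simplest instance of a temperature design. This is a generic illustration, not the lab's energy-temperature design; the double-well energy, barrier height h = 8, ladder [1, 2, 4, 8], and step size are hypothetical.

```python
import math
import random


def energy(x, h=8.0):
    """Toy double-well energy with minima at x = -1 and x = +1 and barrier height h."""
    return h * (x * x - 1.0) ** 2


def parallel_tempering(temps, n_iter=20000, step=0.5, seed=0):
    """Return the samples of the chain at temps[0] (assumed to be T = 1)."""
    rng = random.Random(seed)
    states = [1.0 for _ in temps]  # every chain starts in the right-hand well
    cold_samples = []
    for _ in range(n_iter):
        # 1) Random-walk Metropolis update within each tempered chain.
        for k, t in enumerate(temps):
            x_new = states[k] + rng.gauss(0.0, step)
            delta = energy(x_new) - energy(states[k])
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                states[k] = x_new
        # 2) Propose swapping the states of a random adjacent pair of temperatures.
        k = rng.randrange(len(temps) - 1)
        log_ratio = (energy(states[k]) - energy(states[k + 1])) * (
            1.0 / temps[k] - 1.0 / temps[k + 1]
        )
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            states[k], states[k + 1] = states[k + 1], states[k]
        cold_samples.append(states[0])
    return cold_samples


samples = parallel_tempering([1.0, 2.0, 4.0, 8.0])
frac_right = sum(1 for x in samples if x > 0) / len(samples)
print(frac_right)  # both wells have equal mass, so this should be near 0.5
```

With a barrier of height 8 at T = 1, a single cold chain would typically cross between wells only rarely; the hotter chains cross easily and pass well-hopping states down the ladder through swaps.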