Bio-data Refining and Dimension Reduction

PI: Ker-Chau Li

The post-genome era has arrived with a torrent of high-throughput genomic and proteomic data, useful for dissecting the complex genetic circuitry within the cells of an organism. The goal of bio-data refining is to process such data much as a refinery processes crude oil. With an array of analysis tools, many of them yet to be invented, we hope to distil information of various kinds to meet diverse needs such as pathway studies, disease gene searching and pharmacogenomic research. Our lab currently focuses on the analysis of microarray gene expression data. The aim is to build an integrated system for exploring multiple publicly accessible gene expression databases. This system is based on the newly introduced concept of liquid association (LA). It also employs clustering and other statistical dimension reduction techniques to enhance the analysis. The system will integrate data on protein complexes, transcription factor binding, genetic markers and drug sensitivity profiling, together with worldwide genomic knowledgebases, to distil biological information from microarray data.

Qing Zhou’s Group

PI: Qing Zhou

Our goal is to develop statistical methodology for the efficient analysis of large-scale, high-throughput genomic data. We employ likelihood-based methods, such as Bayesian modeling and regularization, to draw statistical inferences from these data. We are interested in a detailed understanding of gene regulation and aim to decode regulatory circuits by integrating gene expression, RNA-Seq, ChIP-Seq, and DNA sequence data. Our biological applications focus on mouse embryonic stem cells. In this model system we have found novel transcription factor binding motifs and constructed regulatory networks. We also pursue biological applications in alternative splicing and complex diseases through collaboration with experimental groups.