Professor, Chinese Academy of Sciences, 2010-
Associate Professor, University of Southern California, 2005-2010
Assistant Professor, University of Southern California, 2002-2005
Assistant Professor, Florida State University, 1998-2002
Fellow, IPAM, UCLA, 2000, fall
Visiting scientist, Walter and Eliza Hall Institute, 2000, summer
Statistical Consultant, University of California, Berkeley, 1997, summer
Research Associate, Institute of Systems Sciences, Chinese Academy of Sciences, 1991-1993
Type 2 diabetes (T2D) is a complex, polygenic disease whose developmental mechanisms are still not fully understood. To better understand these mechanisms, we examined gene expression profiles of multiple tissues from outbred mice fed a high-fat diet (HFD) or regular chow at weeks 1, 9, and 18. To analyze such complex data, we proposed a novel dual eigen-analysis, in which the sample- and gene-eigenvectors correspond to macro- and micro-biological information, respectively. The dual eigen-analysis identified the HFD eigenvectors as well as the endogenous eigenvectors for each tissue. The results imply that HFD acts as an exogenous factor on hepatic function or pancreatic development, whereas in adipose tissue HFD's impact roughly coincides with the endogenous eigenvector driven by aging. Enrichment analysis of the eigenvectors revealed diverse HFD effects on the three tissues over time, including inflammation, degradation of branched-chain amino acids (BCAA), and regulation of peroxisome proliferator-activated receptor gamma (PPARγ). In the pancreas, we report that a remarkable up-regulation of angiogenesis, downstream of the HIF signaling pathway, precedes hyperinsulinemia. The dual eigen-analysis and these discoveries provide new evaluations and guidance for T2D prevention and therapy, and will also promote new thinking in biology and medicine.
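The dual eigen-analysis can be illustrated with a singular value decomposition (SVD), assuming (as a simplification, not necessarily the authors' exact procedure) that the gene-eigenvectors and sample-eigenvectors are the left and right singular vectors of a gene-centered expression matrix. The function name `dual_eigen` and the toy chow/HFD data are hypothetical.

```python
import numpy as np


def dual_eigen(X):
    """Sketch of a dual eigen-decomposition of a genes x samples matrix.

    Columns of U are gene-eigenvectors (eigenvectors of Xc @ Xc.T);
    rows of Vt are sample-eigenvectors (eigenvectors of Xc.T @ Xc);
    both share the eigenvalues s**2.
    """
    Xc = X - X.mean(axis=1, keepdims=True)  # center each gene across samples
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U, s, Vt


# Toy data: 50 genes x 6 samples; samples 0-2 "chow", 3-5 "HFD" (hypothetical).
rng = np.random.default_rng(0)
diet = np.array([-1, -1, -1, 1, 1, 1], dtype=float)
X = 3 * np.outer(rng.normal(size=50), diet) + rng.normal(scale=0.1, size=(50, 6))
U, s, Vt = dual_eigen(X)
print(np.round(Vt[0], 2))  # first sample-eigenvector separates chow from HFD
```

The sample-eigenvector says *which* samples differ (the macro view), while the matching column of `U` says *which genes* drive that difference (the micro view).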
Base-calling accuracy is crucial for high-throughput DNA sequencing and for downstream analyses such as read mapping and genome assembly. Accordingly, we set out to reduce DNA sequencing errors in Illumina systems by correcting three kinds of crosstalk in the cluster intensity data. We discovered that signal crosstalk between adjacent clusters accounts for a large portion of sequencing errors in Illumina systems, even after correcting color crosstalk caused by the overlap of dye emission spectra and phasing/pre-phasing caused by out-of-step nucleotide synthesis. Interestingly and importantly, spatial crosstalk between adjacent clusters is cluster-specific and often asymmetric, so it cannot be corrected by existing deconvolution methods. We therefore introduce a novel mathematical method that estimates and removes spatial crosstalk, reducing base-calling errors from Illumina systems by 44–69% at a given mapping rate. Furthermore, the resolution gained from this work opens room for higher throughput in DNA sequencing and in general measurement systems based on fluorescence imaging. The resulting base-caller, 3Dec, is available to academic users at http://github.com/flishwnag/3dec. It reduces errors by 62.1% compared with the standard pipeline, and its implementation is fast enough for daily sequencing.
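A minimal sketch of the removal step, assuming a linear leakage model in which each cluster's observed intensity is its own signal plus cluster-specific, possibly asymmetric fractions of its neighbors' signals, y = (I + C)x. How 3Dec estimates C is not shown here; `remove_spatial_crosstalk` and the numbers below are hypothetical.

```python
import numpy as np


def remove_spatial_crosstalk(y, C):
    """Recover per-cluster signals x from observed intensities y = (I + C) @ x.

    y: (n_clusters,) or (n_clusters, n_channels) observed intensities.
    C: (n_clusters, n_clusters) leakage matrix; C[i, j] is the fraction of
       cluster j's signal seen at cluster i. It may be asymmetric and has a
       zero diagonal. C is assumed to have been estimated already.
    """
    A = np.eye(C.shape[0]) + C
    return np.linalg.solve(A, y)


# Three neighboring clusters, four dye channels (made-up numbers).
x_true = np.array([[1.0, 0.1, 0.0, 0.2],
                   [0.0, 0.9, 0.1, 0.0],
                   [0.3, 0.0, 1.2, 0.1]])
C = np.array([[0.00, 0.15, 0.00],
              [0.05, 0.00, 0.20],
              [0.00, 0.10, 0.00]])
y = (np.eye(3) + C) @ x_true
x_hat = remove_spatial_crosstalk(y, C)
print(np.allclose(x_hat, x_true))  # True
```

Because every entry C[i, j] is its own parameter, the model can be both cluster-specific and asymmetric (here C[0, 1] = 0.15 but C[1, 0] = 0.05).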
In this study, we used logistic regression models to compute quality scores from predictive features, which include different aspects of the sequencing signals as well as the local DNA content. Sparse models were further obtained by three methods: backward deletion with either AIC or BIC, and L1-regularized learning. The L1-regularized model was then compared with the Illumina scoring method.
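The L1-regularized variant can be sketched as follows, assuming the response is whether a base call is wrong and the quality score is the standard Phred transform Q = -10 log10 P(error). The solver (proximal gradient, ISTA), the simulated features, and all function names are illustrative assumptions, not the authors' code or the Illumina scoring method.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_l1_logistic(X, y, lam=0.02, lr=0.5, n_iter=3000):
    """L1-penalized logistic regression fitted by proximal gradient (ISTA).

    X: (n_bases, n_features) predictive features; y: 1 if the base call was
    wrong, else 0. Minimizes mean log-loss + lam * ||w||_1; the intercept b
    is not penalized.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        r = sigmoid(X @ w + b) - y        # d(log-loss)/d(logit) per base
        w = w - lr * (X.T @ r) / n         # gradient step on the weights
        b = b - lr * r.mean()              # gradient step on the intercept
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w, b


def phred_quality(X, w, b):
    """Phred score Q = -10 * log10(P(error)) from the fitted model."""
    p_err = np.clip(sigmoid(X @ w + b), 1e-10, 1.0)
    return -10.0 * np.log10(p_err)


# Simulated bases: feature 0 is informative, features 1-2 are pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
y = (rng.random(2000) < sigmoid(-2.0 - 3.0 * X[:, 0])).astype(float)
w, b = fit_l1_logistic(X, y)
print(np.round(w, 2), round(b, 2))
```

The soft-threshold step is what drives uninformative weights toward exactly zero, giving the sparse model; `lam` trades sparsity against fit.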
• We applied sub-sub normalization with a one-knot SPLINE to Agilent two-color microarrays.
• We performed gene set enrichment analysis and transcriptional inference.
• We proposed the BASE2.0 method to infer a TF's up- and down-regulation separately.
• [Gd@C82(OH)22]n induces apoptosis through ER stress and TP53 in MCF-7 cells.
• [Gd@C82(OH)22]n modifies a network around TP53 that includes HOXA5, PLZF, and FOXO3.
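The enrichment statistic used in this study is not specified; as one common, simple form of gene set enrichment analysis, a one-sided hypergeometric over-representation test can be sketched as follows (the function name and the numbers are hypothetical).

```python
from math import comb


def overrepresentation_p(n_total, n_set, n_selected, n_overlap):
    """One-sided hypergeometric P(overlap >= n_overlap).

    n_total: genes measured on the array; n_set: genes in the gene set;
    n_selected: differentially expressed (DE) genes; n_overlap: DE genes in the set.
    """
    hi = min(n_set, n_selected)
    tail = sum(comb(n_set, k) * comb(n_total - n_set, n_selected - k)
               for k in range(n_overlap, hi + 1))
    return tail / comb(n_total, n_selected)


# e.g. 12 of 40 set genes among 500 DE genes on a 20,000-gene array
print(overrepresentation_p(20000, 40, 500, 12))
```

Exact integer arithmetic via `math.comb` avoids underflow in the binomial coefficients; only the final ratio is a float.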
We propose a fast-mapping approach, referred to as "SEME," which has two core steps: First it scans a read sequentially in a specific order for a k-mer exact match seed; next it extends the alignment on both sides allowing, at most, one short INDEL each using a novel method called “auto-match function.”
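A simplified seed-and-extend sketch of the two steps, assuming a left-to-right scan order and an exact k-mer hash index over the reference. The extension here is ungapped and only counts mismatches; SEME's INDEL-tolerant auto-match function is not reproduced, and all names are hypothetical.

```python
def build_kmer_index(ref, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    return index


def seed_and_extend(read, ref, index, k, max_mismatches=2):
    """Scan the read for an exact k-mer seed, then extend it without gaps.

    Returns (ref_start, mismatches) for the first hit within max_mismatches,
    or None. Simplification: no INDELs are allowed during extension.
    """
    for offset in range(len(read) - k + 1):          # step 1: seed scan
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset                     # implied read start on ref
            if start < 0 or start + len(read) > len(ref):
                continue
            window = ref[start:start + len(read)]    # step 2: extend both sides
            mm = sum(a != b for a, b in zip(read, window))
            if mm <= max_mismatches:
                return start, mm
    return None


ref = "ACGTACGTTAGCCGATAGGCTTAACG"
idx = build_kmer_index(ref, 4)
print(seed_and_extend(ref[8:20], ref, idx, 4))  # (8, 0)
```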
Astrocytic GFAP expression increases during normal aging in many brain regions and in primary astrocyte cultures derived from aging rodent brains. Unexpectedly, we found that the age-related increase of GFAP expression was suppressed in mixed glia (astrocytes + microglia). However, the age-related increase of GFAP was observed when E18 neurons were co-cultured with mixed glia. Thus, the presence of microglia can suppress the age-related increase of GFAP in primary cultures of astrocytes. To characterize more broadly how aging and co-culture with neurons alter glial gene expression, we profiled gene expression in mixed glia from young (3 mo) and old (24 mo) male rat cerebral cortex by Affymetrix microarray (Rat230 2.0). The majority of age changes were independent of the presence of neurons. Overall, twice as many genes increased in expression with age as decreased. The minority of age changes that were either suppressed or revealed by the presence of neurons may be useful for analyzing glial-neuron interaction during aging. Some in vitro changes are shared with those of the aging rat hippocampus reported in studies from the Landfield group (Rowe et al., 2007; Kadish et al., 2009).
Massively parallel sequencing technologies have recently flourished and dramatically cut the cost of sequencing personal human genomes. Haplotype assembly from personal genomes sequenced with these technologies is becoming a cost-effective and promising tool for the study of human disease. Computational haplotype assembly has proven to be very accurate, but it still contains errors. Here we present a tool, HapEdit, to assess the accuracy of assembled haplotypes and edit them manually. Using this tool, a user can break erroneous haplotype segments into smaller segments, or concatenate haplotype segments if the concatenation is sufficiently supported. A user can also edit bases with low quality scores. HapEdit displays haplotype assemblies so that a user can easily navigate and pinpoint a region of interest. As input, HapEdit currently takes reads from the Polonator, Illumina, SOLiD, 454, and Sanger sequencing technologies.
In an attempt to elucidate the longevity-promoting mechanisms of mutants lacking SCH9, whose chronological life span is three times that of wild type, we measured their time-course gene expression profiles. We interpreted their expression differences over time by statistical inference based on prior biological knowledge and identified the following significant changes: (i) between 12 and 24 h, stress response genes were up-regulated by larger fold changes and ribosomal RNA (rRNA) processing genes were down-regulated more dramatically; (ii) mitochondrial ribosomal protein genes were not up-regulated between 12 and 60 h, unlike in wild type; (iii) electron transport, oxidative phosphorylation, and TCA genes were down-regulated early; (iv) the up-regulation of TCA and electron transport genes was accompanied by deep down-regulation of rRNA processing over time; and (v) rRNA processing genes were more volatile over time, and three associated cis-regulatory elements [rRNA processing element (rRPE), polymerase A and C (PAC), and glucose response element (GRE)] were identified. Deletion of AZF1, which encodes the transcription factor that binds to the GRE element, reversed the lifespan extension of sch9Δ. The significant alterations in these time-dependent expression profiles imply that the lack of SCH9 turns on a longevity programme that extends the lifespan through changes in metabolic pathways and protection mechanisms, particularly the regulation of aerobic respiration and rRNA processing.
Haplotype assembly is becoming an important tool in genome sequencing of humans and other organisms. Although haplotypes were previously inferred from genome assemblies, there has never been a comparative haplotype browser that depicts a global picture of whole-genome alignments among haplotypes of different organisms. We introduce a whole-genome HAPLotype brOWSER (HAPLOWSER), which provides evolutionary perspectives from multiple aligned haplotypes and functional annotations. Haplowser enables the comparison of haplotypes from metagenomes and associates conserved regions, or the bases in conserved regions, with functional annotations and custom tracks. The associations are quantified for further analysis and presented as pie charts. Functional annotations and custom tracks that are projected onto haplotypes are saved as multiple files in FASTA format. Haplowser provides a user-friendly interface and can display alignments of haplotypes with functional annotations at any resolution.
The effect of calorie restriction (CR) on life span extension, demonstrated in organisms ranging from yeast to mice, may involve the down-regulation of pathways including Tor, Akt, and Ras. Here, we present data suggesting that yeast Tor1 and Sch9 (a homolog of the mammalian kinases Akt and S6K) are central components of a network that controls a common set of genes implicated in a metabolic switch from the TCA cycle and respiration to glycolysis and glycerol biosynthesis. During chronological survival, mutants lacking SCH9 depleted extracellular ethanol and reduced stored lipids, but synthesized and released glycerol. Deletion of the glycerol biosynthesis genes GPD1, GPD2, or RHR2, among the most up-regulated in long-lived sch9Δ, tor1Δ, and ras2Δ mutants, was sufficient to reverse chronological life span extension in sch9Δ mutants, suggesting that glycerol production, in addition to the regulation of stress resistance systems, optimizes life span extension. Glycerol, unlike glucose or ethanol, did not adversely affect the life span extension induced by calorie restriction or starvation, suggesting that carbon source substitution may represent an alternative to calorie restriction as a strategy to delay aging.
In this paper, we propose a computational approach that integrates microarray expression data with transcription factor binding site information to systematically identify transcription factors associated with patient survival in a specific cancer type. This approach was applied to two gene expression data sets, one for breast cancer and one for acute myeloid leukemia. We found that two transcription factor families, the steroid nuclear receptor family and the ATF/CREB family, are significantly correlated with the survival of patients with breast cancer, and that a transcription factor named T-cell acute lymphocytic leukemia 1 is significantly correlated with acute myeloid leukemia patient survival.
Calorie restriction (CR), the only non-genetic intervention known to slow aging and extend life span in organisms ranging from yeast to mice, has been linked to the down-regulation of Tor, Akt, and Ras signaling. In this study, we demonstrate that the serine/threonine kinase Rim15 is required for yeast chronological life span extension caused by deficiencies in Ras2, Tor1, and Sch9, and by calorie restriction. Deletion of the stress resistance transcription factors Gis1 and Msn2/4, which are positively regulated by Rim15, also caused a major, although incomplete, reversal of the effect of calorie restriction on life span. The deletion of both RAS2 and the Akt and S6 kinase homolog SCH9 in combination with calorie restriction caused a remarkable 10-fold life span extension, which, surprisingly, was only partially reversed by the lack of Rim15. These results indicate that the Ras/cAMP/PKA/Rim15/Msn2/4 and the Tor/Sch9/Rim15/Gis1 pathways are major mediators of the calorie restriction-dependent stress resistance and life span extension, although additional mediators are involved. Notably, the anti-aging effect caused by the inactivation of both pathways is much more potent than that caused by CR.
We propose a method to infer the effective regulatory activities of miRNAs by integrating microarray expression data with miRNA target predictions. The method is based on the idea that regulatory activity changes of miRNAs could be reflected by the expression changes of their target transcripts measured by microarray. To validate this method, we apply it to the microarray data sets that measure gene expression changes in cell lines after transfection or inhibition of several specific miRNAs. The results indicate that our method can detect activity enhancement of the transfected miRNAs as well as activity reduction of the inhibited miRNAs with high sensitivity and specificity. Furthermore, we show that our inference is robust with respect to false positives of target prediction.
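One simple way to turn this idea into a score, assuming log fold changes and a set of predicted targets are given: compare the mean change of targets with that of non-targets and assess the difference by permuting target labels. This is an illustrative statistic, not necessarily the one the method uses; names and data are hypothetical.

```python
import random
from statistics import mean


def mirna_activity_score(log_fc, targets, n_perm=2000, seed=0):
    """Score a miRNA's activity change from target vs. non-target expression.

    log_fc: dict gene -> log2 fold change (treated vs. control).
    targets: set of predicted target genes.
    Returns (score, two-sided permutation p-value). score > 0 means targets
    went down relative to other genes, i.e. activity appears enhanced.
    """
    genes = list(log_fc)
    values = [log_fc[g] for g in genes]
    is_target = [g in targets for g in genes]

    def stat(labels):
        tgt = [v for v, lab in zip(values, labels) if lab]
        rest = [v for v, lab in zip(values, labels) if not lab]
        return mean(rest) - mean(tgt)

    observed = stat(is_target)
    rng = random.Random(seed)
    labels = list(is_target)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the gene-target association
        if abs(stat(labels)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

After a transfection the targets should shift down (positive score); after an inhibition they should shift up (negative score).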
We propose a Probe-Treatment-Reference (PTR) model to streamline normalization and summarization by allowing multiple references. We estimate parameters in the model by the Least Absolute Deviations (LAD) approach and implement the computation by median polishing. We show that the LAD estimator is robust in the sense that it has bounded influence in the three-factor PTR model. Fitting this model implicitly defines an "optimal reference" for each probe-set. We evaluate the effectiveness of the PTR method on two Affymetrix spike-in data sets. Our method reduces the variation of non-differentially expressed genes and thereby increases the power to detect differentially expressed genes.
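Median polish can be sketched for the simpler two-way case (probes × arrays); the PTR model adds a third, reference factor, which this sketch omits. The toy matrix with one outlying probe illustrates the bounded influence that motivates LAD fitting. The function name and data are hypothetical.

```python
from statistics import median


def median_polish(y, n_iter=10):
    """Two-way median polish: y[i][j] = overall + row[i] + col[j] + resid[i][j].

    Rows are probes and columns are arrays (log-scale intensities).
    Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [list(row) for row in y]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Row sweep: move each row's median into its row effect.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Column sweep: move each column's median into its column effect.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


a = [0.0, 0.5, 1.0, 1.5, 2.0]            # probe effects
b = [0.0, 1.0, 2.0]                      # array effects
y = [[5.0 + ai + bj for bj in b] for ai in a]
y[0][2] += 100.0                         # one outlying probe intensity
overall, rows, cols, resid = median_polish(y)
print(cols)  # [-1.0, 0.0, 1.0]
```

The array effects come out exactly as generated; the outlier is absorbed into a single residual instead of shifting the summary, which a mean-based fit would not do.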
We propose a novel method, referred to as BASE (binding association with sorted expression), to infer TF activity changes from microarray expression profiles with the help of binding affinity data. It searches for the maximum association between the binding affinity profile of a TF and the expression change profile along the genes sorted by differential expression. The method does not make a hard selection of target genes; instead, the significance of TF activity changes is evaluated at the end by permutation tests of binding association. To show the effectiveness of this method, we apply it to three typical examples using different kinds of binding affinity data, namely ChIP-chip data, motif discovery data, and position weight matrix scanning data. The implications obtained from all three examples are consistent with established biological results. Moreover, the inferences suggest new and biologically meaningful hypotheses for further investigation.
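A sketch of a binding-association statistic under stated assumptions: genes are sorted by expression change, a running sum of the TF's binding affinity is compared with the uniform expectation, and the largest gap is tested by permuting affinities across genes. This Kolmogorov-Smirnov-like statistic is an assumption about what "maximum association" looks like, not the published BASE formula; all names are hypothetical.

```python
import random


def running_association(expr_change, affinity):
    """Signed maximum deviation of an affinity-weighted running sum.

    Genes are sorted by expression change, most up-regulated first. At each
    cut, the cumulative share of the TF's total binding affinity is compared
    with the cumulative share of genes; the largest gap is returned.
    """
    n = len(expr_change)
    order = sorted(range(n), key=lambda g: expr_change[g], reverse=True)
    total = sum(affinity)
    cum, best = 0.0, 0.0
    for rank, g in enumerate(order, start=1):
        cum += affinity[g]
        gap = cum / total - rank / n
        if abs(gap) > abs(best):
            best = gap
    return best


def permutation_test(expr_change, affinity, n_perm=1000, seed=0):
    """Two-sided permutation p-value, shuffling affinities across genes."""
    observed = running_association(expr_change, affinity)
    rng = random.Random(seed)
    shuffled = list(affinity)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(running_association(expr_change, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


expr = [float(g) for g in range(200)]                  # gene 199 most up-regulated
aff = [1.0 if g >= 180 else 0.01 for g in range(200)]  # strong binding at the top
print(permutation_test(expr, aff))
```

No target list is ever thresholded: every gene contributes its affinity, which mirrors the "no hard target selection" idea.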
Address: No. 55, Zhongguancun East Road, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Phone: 010-8254158