Research Homepage

Professor, Chinese Academy of Sciences, 2010-

Associate Professor, University of Southern California, 2005-2010

Assistant Professor, University of Southern California, 2002-2005

Assistant Professor, Florida State University, 1998-2002

Fellow, IPAM, UCLA, fall 2000

Visiting Scientist, Walter and Eliza Hall Institute, summer 2000

Statistical Consultant, University of California, Berkeley, summer 1997

Research Associate, Institute of Systems Sciences, Chinese Academy of Sciences, 1991-1993

Research Interests

Computational Biology:

DNA sequencing; DNA assembly; Functional genomics; Preprocessing of microarrays: sub-sub normalization and Probe-Treatment-Reference summarization; Inference of cis- and trans-regulation (BASE2.0)

Biology and Medicine:

Aging and longevity mechanisms (yeast model); Diabetes; Cancer genomics

Statistics:

Blind inversion problem and principles: Blind inversion needs distribution (BIND); Parametric deconvolution; Algorithm of exact least trimmed squares

Computation and Information Theory:

Empirical version of Shannon's first compression theorem; Mutual information between past and future; Reflectrum identity

Publications

Li L M, Liu X, Wang L, et al. A Novel Dual Eigen-Analysis of Mouse Multi-Tissues’ Expression Profiles Unveils New Perspectives into Type 2 Diabetes[J]. Scientific Reports, 2017, 7(1): 5044.
Type 2 diabetes (T2D) is a complex and polygenic disease yet in need of a complete picture of its development mechanisms. To better understand the mechanisms, we examined gene expression profiles of multi-tissues from outbred mice fed with a high-fat diet (HFD) or regular chow at weeks 1, 9, and 18. To analyze such complex data, we proposed a novel dual eigen-analysis, in which the sample- and gene-eigenvectors correspond respectively to the macro- and micro-biology information. The dual eigen-analysis identified the HFD eigenvectors as well as the endogenous eigenvectors for each tissue. The results imply that HFD influences the hepatic function or the pancreatic development as an exogenous factor, while in adipose HFD’s impact roughly coincides with the endogenous eigenvector driven by aging. The enrichment analysis of the eigenvectors revealed diverse HFD impact on the three tissues over time. The diversity includes: inflammation, degradation of branched chain amino acids (BCAA), and regulation of peroxisome proliferator activated receptor gamma (PPARγ). We reported that in the pancreas remarkable up-regulation of angiogenesis as downstream of the HIF signaling pathway precedes hyperinsulinemia. The dual eigen-analysis and discoveries provide new evaluations/guidance in T2D prevention and therapy, and will also promote new thinking in biology and medicine.
Wang, Bo, Lin Wan, Anqi Wang, and Lei M. Li. "An adaptive decorrelation method removes Illumina DNA base-calling errors caused by crosstalk between adjacent clusters." Scientific Reports 7 (2017): 41348.
Base-calling accuracy is crucial for high-throughput DNA sequencing and downstream analysis such as read mapping and genome assembly. Accordingly, we set out to reduce DNA sequencing errors of Illumina systems by correcting three kinds of crosstalk in the cluster intensity data. We discovered that signal crosstalk between adjacent clusters accounts for a large portion of sequencing errors in Illumina systems, even after correcting color crosstalk caused by the overlap of dye emission spectra and phasing/pre-phasing caused by out-of-step nucleotide synthesis. Interestingly and importantly, spatial crosstalk between adjacent clusters is cluster-specific and often asymmetric, which cannot be corrected by existing deconvolution methods. Therefore, we introduce a novel mathematical method able to estimate and remove spatial crosstalk, thereby reducing base-calling errors by 44–69% at a given mapping rate from Illumina systems. Furthermore, the resolution gained from this work provides new room for higher throughput of DNA sequencing and of general measurement systems using fluorescence-based imaging technology. The resulting base-caller 3Dec is available for academic users at http://github.com/flishwnag/3dec. It not only reduces errors by 62.1% compared with the standard pipeline, but its implementation is also fast enough for daily sequencing.
Zhang, Sheng, Bo Wang, Lin Wan, and Lei M. Li. "Estimating Phred scores of Illumina base calls by logistic regression and sparse modeling." BMC bioinformatics 18, no. 1 (2017): 335.
In this study, we used logistic regression models to evaluate quality scores from predictive features, which include different aspects of the sequencing signals as well as local DNA contents. Sparse models were further obtained by three methods: backward deletion with either AIC or BIC, and the L1 regularization learning method. The L1-regularized model was then compared with the Illumina scoring method.
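As a rough illustration of how a logistic model's output relates to a quality score, the sketch below uses the standard Phred definition Q = -10 log10(p), where p is the probability that a base call is wrong. The function names are hypothetical, and the assumption that the model outputs the log-odds of a correct call is ours for illustration; it is not taken from the paper.

```python
import math


def error_prob_from_logit(logit_correct: float) -> float:
    """Error probability, assuming the model outputs log-odds of a CORRECT call."""
    return 1.0 / (1.0 + math.exp(logit_correct))


def phred_from_error_prob(p_error: float) -> float:
    """Standard Phred definition: Q = -10 * log10(p_error)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


if __name__ == "__main__":
    # An error probability of 1 in 1000 corresponds to roughly Q30.
    print(round(phred_from_error_prob(0.001), 2))
```

Under this convention, a larger log-odds of a correct call gives a smaller error probability and therefore a higher quality score.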
Lin Wang, Weipeng Cao, Qizhai Li, Yuqing Qiu, Xingjie Liang, Baoyun Sun, Yuliang Zhao, Jie Meng, Lei M. Li, Induction of apoptosis through ER stress and TP53 in MCF-7 cells by the nanoparticle [Gd@C82(OH)22]n: A systems biology study, Methods, 67(3):394-406. doi:10.1016/j.ymeth.2014.01.007. Epub 2014 Jan 15.
We applied sub–sub normalization with one-knot SPLINE to Agilent two-color microarray. We made gene set enrichment analysis and transcriptional inferences. We proposed the BASE2.0 method to separately infer a TF’s up- and down-regulation. [Gd@C82(OH)22]n induces apoptosis through ER stress and TP53 in the MCF-7 cells. [Gd@C82(OH)22]n modifies a network around TP53 including HOXA5, PLZF, and FOXO3.
Shijian Chen, Anqi Wang, Lei M. Li (2013), SEME: A fast mapper of Illumina sequencing reads with statistical evaluation. Journal of Computational Biology, 847–860, DOI: 10.1089/cmb.2013.0111
We propose a fast-mapping approach, referred to as "SEME," which has two core steps: First it scans a read sequentially in a specific order for a k-mer exact match seed; next it extends the alignment on both sides allowing, at most, one short INDEL each using a novel method called “auto-match function.”
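The seeding step described above can be sketched in a few lines. This is a generic k-mer exact-match seed search, not SEME itself: the left-to-right scan order, the function names, and the dictionary index are all assumptions made for illustration, and the extension step with the "auto-match function" is not shown.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seed(read: str, index: dict, k: int):
    """Scan the read left to right (an assumed order) for the first k-mer
    that has an exact match in the reference.

    Returns (offset_in_read, reference_positions), or None if no k-mer matches.
    """
    for offset in range(len(read) - k + 1):
        hits = index.get(read[offset:offset + k])
        if hits:
            return offset, hits
    return None


if __name__ == "__main__":
    idx = build_kmer_index("ACGTACGTTAGC", 4)
    print(find_seed("GGTTAGC", idx, 4))
```

A real mapper would go on to extend the alignment on both sides of the seed and score the result statistically.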
Isaac Kremsky, Todd E. Morgan, Xiaogang Hou, Lei Li, Caleb E. Finch, Age-changes in gene expression in primary mixed glia cultures from young vs. old rat cerebral cortex are modified by interactions with neurons, Brain, Behavior, and Immunity, 2012 Jul;26(5): 797-802.
Astrocytic GFAP expression increases during normal aging in many brain regions and in primary astrocyte cultures derived from aging rodent brains. As shown below, we unexpectedly found that the age-related increase of GFAP expression was suppressed in mixed glia (astrocytes + microglia). However, the age-related increase of GFAP was observed when E18 neurons were co-cultured with mixed glia. Thus, the presence of microglia can suppress the age-related increase of GFAP, in primary cultures of astrocytes. To more broadly characterize how aging and co-culture with neurons alters glial gene expression, we profiled gene expression in mixed glia from young (3 mo) and old (24 mo) male rat cerebral cortex by Affymetrix microarray (Rat230 2.0). The majority of age changes were independent of the presence of neurons. Overall, the expression of 2-fold more genes increased with age than decreased with age. The minority of age changes that were either suppressed or revealed by the presence of neurons may be useful to analyze glial-neuron interaction during aging. Some in vitro changes are shared with those of aging rat hippocampus in studies from the Landfield group (Rowe et al., 2007; Kadish et al., 2009).
Jong Hyun Kim, Woo-Cheol Kim, Lei M. Li and Sanghyun Park (2011), HapEdit: an accuracy assessment viewer for haplotype assembly using massively parallel DNA-sequencing technologies. Nucl. Acids Res., first published online May 16, 2011, doi:10.1093/nar/gkr354
The massively parallel sequencing technologies have recently flourished and dramatically cut the cost to sequence personal human genomes. Haplotype assembly from personal genomes sequenced using the massively parallel sequencing technologies is becoming a cost-effective and promising tool for human disease study. Computational assembly of haplotypes has been proved to be very accurate, but obviously contains errors. Here we present a tool, HapEdit, to assess the accuracy of assembled haplotypes and edit them manually. Using this tool, a user can break erroneous haplotype segments into smaller segments, or concatenate haplotype segments if the concatenated haplotype segments are sufficiently supported. A user can also edit bases with low-quality scores. HapEdit displays haplotype assemblies so that a user can easily navigate and pinpoint a region of interest. As inputs, HapEdit currently takes reads from the Polonator, Illumina, SOLiD, 454 and Sanger sequencing technologies.
Ge, Huanying; Wei, Min; Fabrizio, Paola; Hu, Jia; Cheng, Chao; Longo, Valter; Li, Lei M (2010) Comparative analyses of time-course gene expression profiles of the long-lived sch9Δ mutant, Nucleic Acids Research, doi:10.1093/nar/gkp849.
In an attempt to elucidate the underlying longevity-promoting mechanisms of mutants lacking SCH9, which live three times as long as wild type chronologically, we measured their time-course gene expression profiles. We interpreted their expression time differences by statistical inferences based on prior biological knowledge, and identified the following significant changes: (i) between 12 and 24 h, stress response genes were up-regulated by larger fold changes and ribosomal RNA (rRNA) processing genes were down-regulated more dramatically; (ii) mitochondrial ribosomal protein genes were not up-regulated between 12 and 60 h as wild type were; (iii) electron transport, oxidative phosphorylation and TCA genes were down-regulated early; (iv) the up-regulation of TCA and electron transport was accompanied by deep down-regulation of rRNA processing over time; and (v) rRNA processing genes were more volatile over time, and three associated cis-regulatory elements [rRNA processing element (rRPE), polymerase A and C (PAC) and glucose response element (GRE)] were identified. Deletion of AZF1, which encodes the transcriptional factor that binds to the GRE element, reversed the lifespan extension of sch9Δ. The significant alterations in these time-dependent expression profiles imply that the lack of SCH9 turns on the longevity programme that extends the lifespan through changes in metabolic pathways and protection mechanisms, particularly, the regulation of aerobic respiration and rRNA processing.
Kim JH, Kim W-C, Waterman MS, Park S, and Li LM (2009) HAPLOWSER: a whole-genome haplotype browser for personal genome and metagenome, Bioinformatics, doi:10.1093/bioinformatics/btp399
Haplotype assembly is becoming a very important tool in genome sequencing of human and other organisms. Although haplotypes were previously inferred from genome assemblies, there has never been a comparative haplotype browser that depicts a global picture of whole-genome alignments among haplotypes of different organisms. We introduce a whole-genome HAPLotype brOWSER (HAPLOWSER), providing evolutionary perspectives from multiple aligned haplotypes and functional annotations. Haplowser enables the comparison of haplotypes from metagenomes, and associates conserved regions or the bases at the conserved regions with functional annotations and custom tracks. The associations are quantified for further analysis and presented as pie charts. Functional annotations and custom tracks that are projected onto haplotypes are saved as multiple files in FASTA format. Haplowser provides a user-friendly interface, and can display alignments of haplotypes with functional annotations at any resolution.
Wei M, Fabrizio P, Madia F, Hu J, Ge H, Li LM, Longo VD. (2009) Tor1/Sch9-regulated carbon source substitution is as effective as calorie restriction in life span extension, PLoS Genet. 5(5):e1000467. Epub 2009 May 8.
The effect of calorie restriction (CR) on life span extension, demonstrated in organisms ranging from yeast to mice, may involve the down-regulation of pathways, including Tor, Akt, and Ras. Here, we present data suggesting that yeast Tor1 and Sch9 (a homolog of the mammalian kinases Akt and S6K) is a central component of a network that controls a common set of genes implicated in a metabolic switch from the TCA cycle and respiration to glycolysis and glycerol biosynthesis. During chronological survival, mutants lacking SCH9 depleted extracellular ethanol and reduced stored lipids, but synthesized and released glycerol. Deletion of the glycerol biosynthesis genes GPD1, GPD2, or RHR2, among the most up-regulated in long-lived sch9Δ, tor1Δ, and ras2Δ mutants, was sufficient to reverse chronological life span extension in sch9Δ mutants, suggesting that glycerol production, in addition to the regulation of stress resistance systems, optimizes life span extension. Glycerol, unlike glucose or ethanol, did not adversely affect the life span extension induced by calorie restriction or starvation, suggesting that carbon source substitution may represent an alternative to calorie restriction as a strategy to delay aging.
Cheng C, Li LM, Alves P, Gerstein M. (2009) Systematic identification of transcription factors associated with patient survival in cancers. BMC Genomics.10:225.
In this paper, we propose a computational approach that integrates microarray expression data with the transcription factor binding site information to systematically identify transcription factors associated with patient survival given a specific cancer type. This approach was applied to two gene expression data sets for breast cancer and acute myeloid leukemia. We found that two transcription factor families, the steroid nuclear receptor family and the ATF/CREB family, are significantly correlated with the survival of patients with breast cancer; and that a transcription factor named T-cell acute lymphocytic leukemia 1 is significantly correlated with acute myeloid leukemia patient survival.
Wei M, Fabrizio P, Jia H, Ge H, Cheng C, Li LM, Longo VD. (2008) Life Span Extension by Calorie Restriction Depends on Rim15 and Transcription Factors Downstream of Ras/PKA, Tor, and Sch9. PLoS Genet. 4(1): e13. doi:10.1371/journal.pgen.0040013
Calorie restriction (CR), the only non-genetic intervention known to slow aging and extend life span in organisms ranging from yeast to mice, has been linked to the down-regulation of Tor, Akt, and Ras signaling. In this study, we demonstrate that the serine/threonine kinase Rim15 is required for yeast chronological life span extension caused by deficiencies in Ras2, Tor1, and Sch9, and by calorie restriction. Deletion of stress resistance transcription factors Gis1 and Msn2/4, which are positively regulated by Rim15, also caused a major although not complete reversion of the effect of calorie restriction on life span. The deletion of both RAS2 and the Akt and S6 kinase homolog SCH9 in combination with calorie restriction caused a remarkable 10-fold life span extension, which, surprisingly, was only partially reversed by the lack of Rim15. These results indicate that the Ras/cAMP/PKA/Rim15/Msn2/4 and the Tor/Sch9/Rim15/Gis1 pathways are major mediators of the calorie restriction-dependent stress resistance and life span extension, although additional mediators are involved. Notably, the anti-aging effect caused by the inactivation of both pathways is much more potent than that caused by CR.
Chao Cheng, Lei M. Li, Inferring MicroRNA Activities by Combining Gene Expression with MicroRNA Target Prediction
We propose a method to infer the effective regulatory activities of miRNAs by integrating microarray expression data with miRNA target predictions. The method is based on the idea that regulatory activity changes of miRNAs could be reflected by the expression changes of their target transcripts measured by microarray. To validate this method, we apply it to the microarray data sets that measure gene expression changes in cell lines after transfection or inhibition of several specific miRNAs. The results indicate that our method can detect activity enhancement of the transfected miRNAs as well as activity reduction of the inhibited miRNAs with high sensitivity and specificity. Furthermore, we show that our inference is robust with respect to false positives of target prediction.
Ge H, Cheng C and Li LM (2008), A Probe-Treatment-Reference (PTR) Model for the Analysis of Oligonucleotide Expression Microarrays, BMC Bioinformatics, 9: 194.
We propose a Probe-Treatment-Reference (PTR) model to streamline normalization and summarization by allowing multiple references. We estimate parameters in the model by the Least Absolute Deviations (LAD) approach and implement the computation by median polishing. We show that the LAD estimator is robust in the sense that it has bounded influence in the three-factor PTR model. This model fitting, implicitly, defines an "optimal reference" for each probe-set. We evaluate the effectiveness of the PTR method by two Affymetrix spike-in data sets. Our method reduces the variations of non-differentially expressed genes and thereby increases the detection power of differentially expressed genes.
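Median polishing, mentioned above as the computational device for the LAD fit, is easiest to see on a two-way table. The sketch below is Tukey's generic two-way median polish under an additive model (value = overall + row effect + column effect + residual); it is not the three-factor PTR fit from the paper, and the function name and the fixed iteration count are assumptions for illustration.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Two-way median polish: table[i][j] ~ overall + row_eff[i] + col_eff[j].

    Returns (overall, row_eff, col_eff, residuals). The decomposition is exact
    at every step: each input value equals the sum of its four parts.
    """
    rows, cols = len(table), len(table[0])
    resid = [list(r) for r in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [x - m for x in resid[i]]
        # Re-centre the column effects, moving their median into the overall term.
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians out of the residuals into the column effects.
        for j in range(cols):
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        # Re-centre the row effects in the same way.
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


if __name__ == "__main__":
    # A perfectly additive table: 10 + row effect + column effect.
    data = [[7, 9, 11], [8, 10, 12], [9, 11, 13]]
    print(median_polish(data))
```

Because medians rather than means are swept out, a single outlying cell has limited influence on the fitted effects, which is the robustness property the abstract refers to.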
Chao Cheng, Xiting Yan, Fengzhu Sun and Lei M Li, (2007), Inferring activity changes of transcription factors by binding association with sorted expression profiles, BMC Bioinformatics, 8:452.
We propose a novel method, referred to as BASE (binding association with sorted expression), to infer TF activity changes from microarray expression profiles with the help of binding affinity data. It searches for the maximum association between the binding affinity profile of a TF and the expression change profile along the direction of sorted differentiation. The method does not make a hard selection of target genes; rather, the significance of TF activity changes is evaluated by permutation tests of binding association at the end. To show the effectiveness of this method, we apply it to three typical examples using different kinds of binding affinity data, namely, ChIP-chip data, motif discovery data, and positional weighted matrix scanning data, respectively. The implications obtained from all three examples are consistent with established biological results. Moreover, the inferences suggest new and biologically meaningful hypotheses for further investigation.