EST-Course20040318.ppt

EST
Exploring the transcriptome
李瑞强 (Li Ruiqiang), Beijing Genomics Institute (BGI), 2004-03-18

Outline
1. What is an EST?
2. Why EST sequencing?
3. Processing of ESTs
4. Usage of ESTs

What is an EST?
1. Take a cell or tissue of interest.
2. Isolate mRNAs from the tissue(s).
3. Reverse transcribe the mRNAs into cDNA, which reflects parts of the RNAs.
4. Clone the cDNAs into a vector (often in random orientation).
5. Sequence the ends of the clones.
EST: expressed sequence tag.

[Figure: An overview of the process of protein synthesis. Image adapted from http://ncbi.nlm.nih.gov/About/primer/est.html]
[Figure: An overview of how ESTs are generated. mRNA is isolated from a cell or tissue and reverse transcribed into cDNA; the cDNA fragments are cloned into vectors to make a cDNA library; a clone is picked and the 5' and 3' ends of its cDNA insert are sequenced to give ESTs. Image adapted from ncbi.nlm.nih.gov/About/primer/est.html]

Why EST sequencing?
• Systematic sampling of the transcribed portion of the genome (the "transcriptome")
• Provides experimental evidence for the positions of exons
• Provides regions coding for potentially new proteins
• Provides clones for DNA microarrays
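The generation steps under "What is an EST?" can be mimicked with a toy model. This is a minimal sketch, not a lab protocol: it assumes a full-length, directionally cloned insert written 5' to 3' in the mRNA sense (T in place of U), error-free reads and a fixed read length, and the function names are hypothetical. It shows why a 3' EST from a polyadenylated transcript typically begins with a run of T's.

```python
# Toy model of single-pass EST reads from one cDNA clone (illustration only).
# Assumptions, not from the slides: the insert is a full-length, directionally
# cloned copy of the transcript written 5'->3' in the mRNA sense (T for U),
# reads are error-free, and every read has the same fixed length.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def single_pass_ests(insert: str, read_len: int = 500) -> tuple[str, str]:
    """Return the (5' EST, 3' EST) pair for one hypothetical clone insert.

    The 5' read starts at the 5' end of the sense strand. The 3' read starts at
    the other end of the insert and reads the opposite strand, so it equals the
    reverse complement of the last read_len bases.
    """
    if read_len < 1:
        raise ValueError("read_len must be positive")
    five_prime = insert[:read_len]
    three_prime = reverse_complement(insert[-read_len:])
    return five_prime, three_prime


if __name__ == "__main__":
    # Toy transcript: a short coding stretch followed by a poly(A) tail.
    transcript = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG" + "A" * 20
    five, three = single_pass_ests(transcript, read_len=15)
    print("5' EST:", five)   # begins at the start codon of the toy transcript
    print("3' EST:", three)  # the poly(A) tail is read back as poly(T)
```

The default read length of 500 bases is chosen to fall in the 400-600 bp range typical of ESTs. With real libraries, inserts are often cloned in random orientation, so the strand of a read cannot simply be assumed as it is here.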
Characteristics of ESTs
• 400-600 bp long
• Only fragments of genes, not complete coding sequences
• Highly redundant
• Low sequence quality
• (Cheap)
• Reflect expressed genes
• May be tissue- or stage-specific

Processing
1. Trim off low-quality sequence: phred, Q20.
2. Screen out vector and bacterial contaminant sequences: cross_match, with the vector and contaminant sequences as the library.
3. Remove mtRNAs and rRNAs: compare against mtRNA and rRNA sequences using cross_match, blastn, etc.
4. Mask transposons: RepeatMasker.
5. Discard sequences shorter than 100 bp.
6. Clustering: associate individual EST sequences with unique transcripts or genes by sequence similarity (D2_cluster).
7. Assembly: derive consensus sequences from overlapping ESTs belonging to the same cluster (Phrap, cap3).

Functional annotation:
• InterPro
• GO
• KEGG

Pipelines:
• UniGene
• HGI (Human Gene Index)
• TIGR Assembler
• STACK (Sequence Tag Alignment and Consensus Knowledgebase)
• CAT

TIGR_ASSEMBLER
• THC_BUILD: BLAST/FASTA identify all overlaps, which are stored.
• TIGR Assembler then uses rapid oligonucleotide comparison and assembles non-repeat overlaps (95% identity over 40 bp).
• Matching constraints on sequence ends
• Minimum sequence identity within a sequence group; assemblies are more fragmented as a result
• Other TIGR approaches are similar.

UniGene

EST database: dbEST
dbEST release 030504. Total: 20,151,345 public entries from 660 organisms. Entries per organism (largest first):
Homo sapiens (human): 5,487,412
Mus musculus + domesticus (mouse): 4,067,826
Rattus sp. (rat): 592,059
Triticum aestivum (wheat): 549,926
Gallus gallus (chicken): 460,385
Danio rerio (zebrafish): 450,652
Zea mays (maize): 393,719
Xenopus laevis (African clawed frog): 368,783
Bos taurus (cattle): 365,581
Hordeum vulgare + subsp. vulgare (barley): 356,848
Glycine max (soybean): 346,582
Xenopus tropicalis: 300,267
Oryza sativa (rice): 283,935
Drosophila melanogaster (fruit fly): 274,367
Sus scrofa (pig): 272,188
Caenorhabditis elegans (nematode): 231,096
Arabidopsis thaliana (thale cress): 204,396

Problems in EST sequencing
• Low sequence quality, frameshifts
• Chimeric cDNA clones
• Retained introns
• Other limitations

Usage of ESTs:
• Get coding regions: cDNA sequences can reveal many new protein-coding genes.
• Assess genome coverage
• Help genome annotation
• Compare expression patterns
• Detect alternative splicing
• Find SNPs (single nucleotide polymorphisms)
• Provide data for arrays

Genome annotation: Ensembl

Analysis of gene expression: tissue specificity
• Counting the frequency of ESTs derived from a specific tissue within one sequence cluster
• Searching for clusters/contigs that are tissue-specific (e.g., tumor)
• Searching for alternative splice variants that are potentially tissue-specific

Types of alternative splicing
• Skipped exons
• Retained introns
• Alternative donor or acceptor sites

Alternative splicing
[Figure: three subassemblies revealing a potential alternate expression form]

Detect SNPs from ESTs
[Figure: SNP or base-calling error]

Large-Scale Statistical Analyses of Rice ESTs Reveal Correlated Patterns of Gene Expression
Genome Research 1054-9803/99
In this report, we go a step further in showing that computer analyses of plant EST data can be used to generate evidence of correlated expression patterns of genes across various tissues. Furthermore, tissue types and organs can be classified with respect to one another on the basis of their global gene expression patterns. As in previous studies, expression profiles are first estimated from EST counts. By clustering gene expression profiles or whole cDNA library profiles, we show that genes with similar functions, or cDNA libraries expected to share patterns of gene expression, are grouped together.
Promising uses of this technique include functional genomics, in which evidence of correlated expression might complement (or substitute for) those of sequence similarity in the annotation of anonymous genes and identification of surrogate markers. The analysis presented here combines the application of a correlation-based clustering method with a graphical color map allowing intuitive visualization of patterns within a large table of expression measurements.

EST Analysis of the Cnidarian Acropora millepora Reveals Extensive Gene Loss and Rapid Sequence Divergence in the Model Invertebrates
Current Biology, Vol. 13, 2190–2195, December 16, 2003
A significant proportion of mammalian genes are not represented in the genomes of Drosophila, Caenorhabditis or Saccharomyces, and many of these are assumed to have been vertebrate innovations. To test this assumption, we conducted a preliminary EST project on the anthozoan cnidarian, Acropora millepora, a basal metazoan. More than 10% of the Acropora ESTs with strong metazoan matches to the databases had clear human homologs but were not represented in the Drosophila or Caenorhabditis genomes; this category includes a surprising diversity of transcription factors and metabolic proteins that were previously assumed to be restricted to vertebrates. Consistent with higher rates of divergence in the model invertebrates, three-way comparisons show that most Acropora ESTs match human sequences much more strongly than they do any Drosophila or Caenorhabditis sequence. Gene loss has thus been much more extensive in the model invertebrate lineages than previously assumed and, as a consequence, some genes formerly thought to be vertebrate inventions must have been present in the common metazoan ancestor. The complexity of the Acropora genome is paradoxical, given that this organism contains apparently few tissue types and the simplest extant nervous system consisting of a morphologically homogeneous nerve net.

Thanks!
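Processing step 1 (trim low-quality sequence with phred at Q20) and step 5 (discard sequences shorter than 100 bp) can be sketched in Python. The slides name the cutoff but not the trimming rule, so this sketch swaps in a Mott-style maximum-scoring segment; that choice is an assumption, and phred's own trimming options may behave differently. A Phred quality Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance that a base call is wrong. Quality values are assumed to arrive as one list of integers per read, and `quality_trim` and `process_reads` are hypothetical names.

```python
def phred_error_probability(q: int) -> float:
    """A Phred quality q means an error probability of 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def quality_trim(seq: str, quals: list[int], q_cutoff: int = 20) -> str:
    """Keep the highest-scoring contiguous segment of a read (Mott-style).

    Each base scores (error rate at the cutoff) - (its own error rate), so bases
    better than the cutoff add to the running score and worse bases subtract.
    The maximum-sum segment is found with Kadane's algorithm.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    limit = phred_error_probability(q_cutoff)
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += limit - phred_error_probability(q)
        if run_sum <= 0:
            # A segment with a non-positive score never helps: restart after base i.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return seq[best_start:best_end]


def process_reads(reads: dict[str, tuple[str, list[int]]],
                  min_len: int = 100, q_cutoff: int = 20) -> dict[str, str]:
    """Quality-trim every read, then drop reads shorter than min_len."""
    kept = {}
    for name, (seq, quals) in reads.items():
        trimmed = quality_trim(seq, quals, q_cutoff)
        if len(trimmed) >= min_len:
            kept[name] = trimmed
    return kept


if __name__ == "__main__":
    seq = "NNNNN" + "ACGT" * 30 + "NNNNN"   # 130 bp toy read
    quals = [5] * 5 + [30] * 120 + [5] * 5   # poor ends, good middle
    reads = {"est_1": (seq, quals), "est_2": ("ACGT" * 10, [30] * 40)}
    print({name: len(s) for name, s in process_reads(reads).items()})  # {'est_1': 120}
```

Here `est_1` loses its two low-quality ends and keeps 120 bp, while `est_2` is of good quality but only 40 bp long, so step 5 removes it.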
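The TIGR Assembler overlap criterion on the TIGR_ASSEMBLER slide (95% identity over 40 bp) can be illustrated with a much simpler, ungapped check. This is a sketch under stated assumptions: TIGR Assembler's own oligonucleotide comparison, alignment procedure and end-matching constraints are not modeled, and `best_overlap` and `merge` are hypothetical names. The sketch finds the longest suffix of one read that matches a prefix of another with at least 95% identity over at least 40 bp, then joins the reads, taking the overlapping bases from the first read.

```python
def best_overlap(a: str, b: str, min_len: int = 40, min_identity: float = 0.95) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    An overlap of length L compares a[-L:] with b[:L] position by position
    (no gaps) and is accepted when L >= min_len and the fraction of identical
    bases is >= min_identity. Returns 0 if no overlap qualifies.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # Try the longest possible overlap first, shrinking down to min_len.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        matches = sum(x == y for x, y in zip(a[-length:], b[:length]))
        if matches / length >= min_identity:
            return length
    return 0


def merge(a: str, b: str, overlap: int) -> str:
    """Join two reads, taking the overlapping bases from `a`."""
    return a + b[overlap:]


if __name__ == "__main__":
    core = "ACGTTGCATG" "CCATAGGCTA" "ACGTTCAGTG" "CAAGTCCTAG" "GATCC"  # 45 bp shared region
    read_a = "TTTTT" + core
    read_b = core[:20] + "T" + core[21:] + "GGGGG"  # one mismatch inside the overlap
    n = best_overlap(read_a, read_b)
    print(n, merge(read_a, read_b, n) if n else "no overlap")
```

The single mismatch leaves 44 of 45 bases identical (about 97.8%), so the 45 bp overlap passes the 95% threshold. A real assembler must also handle sequencing indels, which an ungapped comparison cannot.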
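The tissue-specificity slide and the rice EST study both start from EST counts per cluster and per cDNA library. A minimal sketch, assuming hypothetical count dictionaries and library names: a cluster's expression profile is its EST count in each library divided by that library's total EST count, the share of a cluster's ESTs that comes from each tissue measures tissue specificity, and the Pearson correlation compares two profiles. The rice study's exact normalization and clustering procedure are not reproduced here.

```python
from math import sqrt


def tissue_share(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of one cluster's ESTs that come from each library/tissue."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cluster has no ESTs")
    return {lib: n / total for lib, n in counts.items()}


def expression_profile(counts: dict[str, int], library_sizes: dict[str, int]) -> list[float]:
    """EST frequency of one cluster in each library, in sorted library order.

    Dividing by library size keeps deeply sequenced libraries from dominating.
    """
    return [counts.get(lib, 0) / library_sizes[lib] for lib in sorted(library_sizes)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    libraries = {"root": 1000, "leaf": 2000, "seed": 500}  # hypothetical ESTs per library
    clusters = {
        "cluster_A": {"root": 10, "leaf": 2},
        "cluster_B": {"root": 20, "leaf": 4},
        "cluster_C": {"leaf": 2, "seed": 10},
    }
    profiles = {name: expression_profile(c, libraries) for name, c in clusters.items()}
    print(tissue_share(clusters["cluster_A"]))
    print(round(pearson(profiles["cluster_A"], profiles["cluster_B"]), 3))
    print(round(pearson(profiles["cluster_A"], profiles["cluster_C"]), 3))
```

Clusters A and B have proportional profiles and correlate perfectly; cluster C is mostly seed-expressed and correlates negatively with A. Correlation-based clustering would group A and B together and keep C apart.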
