6-Re-sequencing (BGI high-throughput sequencing internal training material).pdf

41 pages
  • Seller [uploader]: ali****an
  • Document ID: 119112337
  • Upload date: 2020-01-06
  • Format: PDF
  • Size: 1.42MB
Genetic variation detection for Solexa Re-sequencing

Outline
• Why Re-sequencing?
• When Re-sequencing?
• What type of Re-sequencing?
• Flowchart of Re-sequencing data analysis
• Methodology of Re-sequencing based genetic variation detection

1 Why Re-sequencing?
Re-sequencing, which includes both DNA and RNA sequencing, aims to obtain the sequence of a region for which a reference sequence is already available. DNA re-sequencing is fundamental to identifying genetic variation, while RNA sequencing can discover novel alternative splicing events and supports digital expression profiling. Both are essential tools for studying biological questions:
• Clinical research: bringing science to medicine
• Population genetic studies
• Association studies
• Evolutionary studies
• ……

2 When Re-sequencing?
• To identify genetic variation among different individuals
• To identify genetic variation between different strains/species that share a recent common ancestor (closely related species, about 99% identity)
• To identify genetic variation between populations
• To survey the diversity of a de novo genome against a closely related species that already has a genome sequence
• To survey the heterozygosity of a de novo assembly

3 What type of Re-sequencing?
– Whole genome re-sequencing
– Target region re-sequencing
  • PCR based re-sequencing
  • Exon capture re-sequencing
– Expression profile analysis
– Epigenetics
….

3.1 Whole genome re-sequencing
• Genome-wide variation detection (SNPs, short InDels, structural variation)
• Chromosome ploidy studies (e.g. Down syndrome)

3.2 Target region re-sequencing
PCR based re-sequencing and exome re-sequencing
• Only the target region of interest is sequenced; targets are usually designed based on previous studies.
• Cost effective, but wet-lab intensive, especially PCR based re-sequencing.
• More biased than whole genome re-sequencing.
[Figure: Coverage and depth distribution of exon capture]

4 Flowchart of Re-sequencing data analysis
Most re-sequencing projects involve three steps:
• Alignment
• Variation detection, including SNPs, short InDels and structural variation
• Variation-based annotation and study of biological questions

4 Methodology of Re-sequencing based genetic variation detection
Genetic variation detection is the key step in re-sequencing, and BGI is contributing to this field. SOAP (Short Oligonucleotide Analysis Package) provides a solution for variation detection (for more information, please refer to …):
• SOAPaligner: alignment of short reads (free download)
• SOAPsnp: consensus calling

Scoring matrix; dynamic programming and trace-back
Next-Gen aligner: indexing & bitwise operation
Does blastall/blat still work?

4.1.2 What aligners are available for short reads?
SOAP: developed by BGI ()
MAQ: developed by Sanger (
ELAND: developed by Illumina
Bowtie: developed by University of Maryland (http://bowtie-
RMAP: developed by Cold Spring Harbor Laboratory (http://rulai.cshl.edu/rmap/)
SHRiMP: developed by University of Toronto (http://compbio.cs.toronto.edu/shrimp/)
Myrialign: GNU project, written in Python (http://savannah.nongnu.org/projects/myrialign)
……

4.1.3 Algorithm of SOAPaligner
• Indexing: split each read into parts that are used to anchor exact-matching regions in the reference, excluding most of the irrelevant sequence.
• 2way-BWT (Burrows-Wheeler transform) provides an excellent solution to the computational complexity:
  – Memory efficient (about 7 GB of memory needed for a 3 Gb genome)
  – Fast indexing (2 minutes to align 1M 35 bp single-end reads)
• Thread parallel computing: makes full use of the processors and saves time.
• Bitwise operation: encode each base as 2 binary bits and use exclusive-or (XOR) to check whether two bases are the same.

Bitwise operation
• Memory efficient: both the reference and the reads are encoded as bits; each base occupies two bits, so one byte holds 4 bases.
• Fast: the bit is the basic unit in a computer, so bitwise operations are faster than string operations. Comparing two bases becomes an exclusive-or operation.

Encoding: A (00), C (01), T (10), G (11)

XOR      A (00)  C (01)  T (10)  G (11)
A (00)     00      01      10      11
C (01)     01      00      11      10
T (10)     10      11      00      01
G (11)     11      10      01      00

4.1.3 Algorithm of SOAPaligner: 2way-BWT (Burrows-Wheeler transform)
• Memory efficient: repeated sequence is compressed, so the 3 Gb human reference can be stored in 1.3 GB.
• Fast indexing: a suffix array is applied to index the BWT-compressed reference.
[Figure: 2BWT reference building: Reference → BWT Reference, Suffix Array, Common Prefix HASH]

4.2 SNP identification
Single-nucleotide polymorphism (SNP) is the most common type of genetic variation among individuals. Next-generation sequencing technology provides a cost-effective tool for SNP detection. SOAPsnp was developed for consensus calling and SNP detection based on Solexa sequencing technology. SOAPsnp uses Bayes' theorem as the statistical model for SNP calling and considers:
• Sequencing quality
• Likelihood calculation based on observed data
• Experiment factors
• Prior probability
• Alignment uniqueness and accuracy
• Using dbSNP as prior probability
[Figure: SNP identification from reads aligned to a reference sequence; panels: Original, Quality Calibra…]
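The "Indexing" step of 4.1.3 splits each read into parts so that exact matches can anchor the alignment. The usual justification is the pigeonhole principle: if a read aligns with at most k mismatches and is cut into k + 1 non-overlapping parts, at least one part must match the reference exactly. The Python sketch below illustrates only that idea. It is not SOAPaligner's code, it uses plain `str.find` in place of a BWT index, and the function names (`split_into_seeds`, `candidate_positions`, `count_mismatches_at`) are hypothetical.

```python
def split_into_seeds(read: str, parts: int) -> list[tuple[int, str]]:
    """Cut a read into `parts` contiguous, non-overlapping seeds.

    Returns (offset_in_read, seed) pairs; the first seeds take any remainder.
    """
    base, extra = divmod(len(read), parts)
    seeds, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


def candidate_positions(read: str, reference: str, max_mismatches: int) -> list[int]:
    """Reference start positions suggested by exact seed matches.

    With max_mismatches + 1 seeds, any alignment with at most max_mismatches
    mismatches contains at least one seed that matches exactly (pigeonhole).
    """
    candidates = set()
    for offset, seed in split_into_seeds(read, max_mismatches + 1):
        pos = reference.find(seed)
        while pos != -1:
            start = pos - offset  # shift the seed hit back to the read's start
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
            pos = reference.find(seed, pos + 1)
    return sorted(candidates)


def count_mismatches_at(read: str, reference: str, start: int) -> int:
    """Hamming distance between the read and the reference window at start."""
    window = reference[start:start + len(read)]
    return sum(a != b for a, b in zip(read, window))


reference = "TTTTACGTACGTAAAA"
read = "ACGTACTT"  # reference[4:12] with one substitution (G -> T)
cands = candidate_positions(read, reference, max_mismatches=1)
hits = [s for s in cands if count_mismatches_at(read, reference, s) <= 1]
print(cands, hits)  # [4, 8] [4]
```

Every true hit is guaranteed to appear among the candidates, but candidates can be false positives (position 8 here), which is why each one is verified against the full read.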
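The bitwise-operation slides encode A, C, T and G as 00, 01, 10 and 11 and compare bases with exclusive-or. A minimal Python sketch of that idea follows, using the slide's encoding: it packs sequences 4 bases per byte and counts mismatches between two equal-length sequences with one XOR over the packed values. Python integers stand in for machine words here, and the helper names (`pack`, `to_bytes`, `count_mismatches`) are illustrative, not part of SOAP.

```python
# 2-bit codes from the slide: A=00, C=01, T=10, G=11
CODE = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base (first base highest)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def to_bytes(seq: str) -> bytes:
    """Store 4 bases per byte; the last byte is padded with A (00)."""
    padded = seq + "A" * (-len(seq) % 4)
    return bytes(pack(padded[i:i + 4]) for i in range(0, len(padded), 4))


def count_mismatches(a: str, b: str) -> int:
    """Count differing positions of two equal-length sequences via XOR."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0
    x = pack(a) ^ pack(b)             # a 2-bit slot is 00 exactly when the bases match
    low_bits = int("01" * len(a), 2)  # mask with the low bit of every slot set
    # OR each slot's high bit onto its low bit, keep only low bits, and count them.
    return bin((x | (x >> 1)) & low_bits).count("1")


print(pack("ACTG") == 0b00011011)                # True
print(to_bytes("ACTGG").hex())                   # 1bc0
print(count_mismatches("ACGTACGT", "ACGTACTT"))  # 1
```

A C implementation would typically use fixed-width words and a hardware popcount instruction for the final count; the Python version keeps the same XOR-and-mask logic so the slide's table can be checked directly.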
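The 2way-BWT slide says the reference is BWT-compressed and indexed with a suffix array. SOAP2's bidirectional BWT and its common-prefix hash are beyond a short example, so the sketch below swaps in the standard one-directional BWT with FM-index backward search, which shows how exact matches are found by extending the pattern from right to left. The construction sorts suffixes naively and is only suitable for toy inputs; the function names are hypothetical.

```python
def build_bwt(text: str) -> tuple[str, list[int]]:
    """Return (BWT of text + '$', suffix array); naive construction for small inputs."""
    s = text + "$"  # '$' sorts before A, C, G, T
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in sa), sa


def build_fm_index(bwt: str) -> tuple[dict[str, int], dict[str, list[int]]]:
    """C[c]: count of characters smaller than c; occ[c][i]: count of c in bwt[:i]."""
    symbols = sorted(set(bwt))
    C, total = {}, 0
    for c in symbols:
        C[c] = total
        total += bwt.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in symbols}
    for i, ch in enumerate(bwt):
        for c in symbols:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ


def backward_search(pattern: str, C, occ, n: int) -> tuple[int, int]:
    """Suffix-array interval [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return 0, 0
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(pattern: str, text: str) -> list[int]:
    """All start positions of pattern in text, found through the FM-index."""
    bwt, sa = build_bwt(text)
    C, occ = build_fm_index(bwt)
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(sa[lo:hi])  # the suffix array maps the interval back to positions


print(build_bwt("banana")[0])            # annb$aa
print(locate("ACG", "ACGTACGTTAGCACG"))  # [0, 4, 12]
```

Real FM-index implementations typically sample the occurrence table and the suffix array to save memory; this sketch stores both in full for clarity.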
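SOAPsnp is described as applying Bayes' theorem, combining a likelihood computed from the observed bases and their sequencing quality with a prior probability that can be informed by dbSNP. The sketch below shows that structure for a single diploid site, P(G | D) ∝ P(D | G) · P(G) over the 10 possible genotypes. It is a deliberately simplified model, not SOAPsnp's: error rates come straight from Phred scores (no quality calibration), reads are treated as independent, alignment uniqueness is ignored, and the prior rates (`het_rate`, `hom_rate`, `other_rate`) are hypothetical placeholders that a dbSNP-aware caller would adjust at known SNP positions.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def base_likelihood(observed: str, allele: str, quality: int) -> float:
    """P(observed base | true allele) from a Phred quality score."""
    # Cap the error at 0.75 so a Q0 base is uninformative rather than impossible.
    error = min(10 ** (-quality / 10), 0.75)
    return 1 - error if observed == allele else error / 3


def genotype_prior(genotype: tuple[str, str], ref: str,
                   het_rate: float = 1e-3, hom_rate: float = 5e-4,
                   other_rate: float = 1e-6) -> float:
    """Hypothetical prior; each rate is split evenly over 3 genotypes."""
    a1, a2 = genotype
    if a1 == a2 == ref:
        return 1 - het_rate - hom_rate - other_rate
    if ref in genotype:    # reference/alternative heterozygote
        return het_rate / 3
    if a1 == a2:           # homozygous non-reference
        return hom_rate / 3
    return other_rate / 3  # heterozygote with two non-reference alleles


def call_genotype(pileup: list[tuple[str, int]], ref: str) -> tuple[tuple[str, str], float]:
    """Most probable diploid genotype and its posterior for one site.

    pileup holds (base, phred_quality) for every read covering the site.
    """
    log_post = {}
    for g in combinations_with_replacement(BASES, 2):
        # Each read comes from either chromosome copy with probability 1/2.
        log_lik = sum(
            math.log(0.5 * base_likelihood(b, g[0], q) + 0.5 * base_likelihood(b, g[1], q))
            for b, q in pileup
        )
        log_post[g] = log_lik + math.log(genotype_prior(g, ref))
    top = max(log_post.values())  # subtract the max before exp() to avoid underflow
    norm = sum(math.exp(v - top) for v in log_post.values())
    best = max(log_post, key=log_post.get)
    return best, math.exp(log_post[best] - top) / norm


site = [("A", 30)] * 6 + [("G", 30)] * 6
print(call_genotype(site, ref="A")[0])                # ('A', 'G')
print(call_genotype([("G", 30)] * 10, ref="A")[0])    # ('G', 'G')
```

Working in log space keeps deep pileups from underflowing, and passing the prior rates as parameters is where a per-site dbSNP prior would plug in.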
