Next-Generation Sequencing Experiments and Sequencing Principles
Library Preparation and Sequencing Principles of Next-Generation Sequencing

He Youyu (何有裕)
yyhe@sibs.ac.cn, yyhe@biosino.com.cn
Shanghai Center for Bioinformation Technology (上海生物信息技术研究中心)
Shanghai Zhongxin Biotechnology Co., Ltd. (上海众信生物技术有限公司)
Suzhou Zhongxin Biotechnology Co., Ltd. (苏州众信生物技术有限公司)

Contents
  Overview of sample processing and sequencing principles
  Roche 454
  Illumina Solexa
  Raw data quality control

TruSeq RNA and DNA Sample Preparation

Cluster Generation Overview
  1000-6000 molecules per cluster

Cluster Generation: Template Hybridization
  1st cycle denaturation

Cluster Generation: Bridge PCR

Template preparation: bridge PCR (Trends in Genet 24:133 (2008))
  Adaptor ligation
  Surface attachment
  Bridge amplification
  Denaturation

Sequencing by Synthesis Overview
  Cycle 1: add sequencing reagents; first base incorporated
  Detect signal
  Cleave terminator and dye
  Cycle 2-n: add sequencing reagents and repeat

Cyclic reversible termination (Trends in Genet 24:133 (2008); Nat Rev Genet 11:31 (2010))
  All four labeled reversible terminators are added per cycle
  Remove unincorporated bases and detect signal
  Remove the terminating group and the fluorescent dye (fluorophore cleavage)

Base calling

Flowcell layout on GAII
  A flow cell contains 8 lanes (Lane 1, Lane 2, ..., Lane 8)
  Each lane contains 2 columns (Column 1, Column 2)
  Each column contains 60 tiles
  Each tile is imaged 4 times per cycle

Primary Data Analysis by Firecrest and Bustard in RTA/OLB
  TIFF image file -> Firecrest -> intensity file -> Bustard -> sequence file

Cluster Generation: Sequencing Primer Hybridization (processing steps for single-read sequencing)

Sequence multiple samples in the same lanes
  Library elements: DNA insert, Index, Rd1 SP, Index SP, Rd2 SP (SP = sequencing primer)
  Reads: Read 1, Index Read, Read 2

Multiplexing multiple samples in the same lanes

Advantages of paired-end sequencing

Mate-pair library preparation and sequencing (Molecular Ecology Resources (2011))

Template preparation: emulsion PCR (Trends in Genet 24:133 (2008))

Pyrosequencing (Trends in Genet 24:133 (2008))
  A single dNTP type flows per cycle
  Inorganic pyrophosphate (PPi) drives visible light through a series of reactions
  Remove unincorporated nucleotide

Base calling
  Homopolymer error

Flexible multi-sample tagging technology

454 and Solexa sequencing modes

Semiconductor sequencing
  Detect H+ released as a voltage change
  Fast
  Common microchip design standards
  Low-cost manufacturing
  Sequencing volume is increasing

FASTA sequence format

FASTQ format: each sequence is recorded in 4 lines
  Line 1 begins with the '@' character, followed by the sequence identifier and description
  Line 2 is the sequence itself
  Line 3 begins with the '+' character; the rest may be empty or repeat line 1
  Line 4 encodes the quality values for the sequence in line 2 and must have the same length as line 2

Example:
  @HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG
  CGACAATTTTTTTTGATATTAATAAAGATAGAACTTTCTTCCTATGAGTTTTCTCTC
  +
  CCCFFDFFHHHHGJJGHIIJGIIJJJJIIJJHJJJJJIJJIIIGIIIJGGIHJDIJIGAHEHFFGHGHE

Illumina sequence identifiers
  Before Casava 1.8: @HWI-EAS364_0004:4:1:995:9044#0/1
  Casava 1.8:        @HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG

Sequence quality
  Note: before Solexa 1.3 the quality score was computed with the Solexa formula
  Q_solexa = -10 * log10(p / (1 - p)), where p is the estimated probability that the base call is wrong.

Quality encodings
  The chart aligns each encoding with the printable ASCII characters starting at '!' (ASCII 33); the marked ASCII codes are 33, 59, 64, 73 and 104.
  S - Sanger         Phred+33,  raw reads typically (0, 40)
  X - Solexa         Solexa+64, raw reads typically (-5, 40)
  I - Illumina 1.3+  Phred+64,  raw reads typically (0, 40)
  J - Illumina 1.5+  Phred+64,  raw reads typically (3, 40), with 0 = unused, 1 = unused, 2 = Read Segment Quality Control Indicator
  L - Illumina 1.8+  Phred+33,  raw reads typically (0, 41)

Q values and their corresponding ASCII codes

454 raw data: images, SFF format, FASTA format (with qual)

FASTA record:
>HSAPGDX01D1KDA length=181 xy=1540_3788 region=1 run=R_2012_08_01_00_39_39
ACGTGTTCTGAGCCATATTGCGGTACTGGAAGGTGCGCCTGCACTGTCTGAGCACTGGTCACTGCTCGATACCAATGAAGCCTTATTTGATGAGGCGCGCACCACGCAGGCGGCGACTATTATCTTCTCGTTTGATCCAGAATAACCAAATCGAAAACGCTGGCAAGGCACACAGGGGATA

Matching qual record (181 values, 60 per line):
>HSAPGDX01D1KDA length=181 xy=1540_3788 region=1 run=R_2012_08_01_00_39_39
40 40 40 40 40 40 40 39 37 38 36 34 24 23 19 19 19 24 20 19 18 18 26 26 18 18 19 18 20 20 20 25 25 26 19 20 20 22 22 22 25 28 26 24 22 22 22 25 24 28 28 28 29 29 28 30 30 30 26 26
26 27 27 27 31 31 30 28 28 28 30 30 30 30 26 21 21 20 20 26 27 28 24 25 20 20 20 20 19 19 19 27 28 28 30 30 31 30 28 28 30 31 31 32 32 31 31 30 30 30 31 27 24 24 22 20 20 20 22 26
26 22 22 23 16 16 16 19 22 16 13 13 13 16 22 23 23 23 26 26 24 24 26 13 13 11 11 12 12 19 22 18 18 11 11 13 13 18 24 24 24 24 26 26 26 27 29 29 31 33 32 31 31 27 27 27 29 29 28 26
22
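
The 454 FASTA and qual records above share the same read ID, so they can be merged into one four-line FASTQ record. The following is a minimal sketch of that merge, assuming Phred+33 output encoding; the function names `read_records` and `fasta_qual_to_fastq` and the tiny input record are hypothetical, not part of any 454 tool.

```python
def read_records(text):
    """Group FASTA-style text into {read_id: [payload lines]}."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The read ID is the first whitespace-delimited token of the header.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return records


def fasta_qual_to_fastq(fasta_text, qual_text, offset=33):
    """Merge FASTA and qual text into FASTQ text (offset 33 = Phred+33)."""
    seqs = read_records(fasta_text)
    quals = read_records(qual_text)
    chunks = []
    for read_id, seq_lines in seqs.items():
        seq = "".join(seq_lines)
        # Qual files hold space-separated integers, possibly wrapped over lines.
        scores = [int(tok) for line in quals[read_id] for tok in line.split()]
        if len(scores) != len(seq):
            raise ValueError(f"{read_id}: {len(seq)} bases but {len(scores)} scores")
        quality = "".join(chr(q + offset) for q in scores)
        chunks.append(f"@{read_id}\n{seq}\n+\n{quality}\n")
    return "".join(chunks)


if __name__ == "__main__":
    fasta = ">read1 length=4\nAC\nGT\n"       # hypothetical record
    qual = ">read1 length=4\n40 40\n39 2\n"   # hypothetical scores
    print(fasta_qual_to_fastq(fasta, qual), end="")
```

The length check mirrors the FASTQ rule that line 4 must be as long as line 2; a read that is missing from the qual text raises a `KeyError` in this sketch.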
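
The quality encodings listed earlier (Sanger, Solexa, Illumina 1.3+, 1.5+, 1.8+) differ only in ASCII offset and score scale. The sketch below decodes a quality string and converts scores to error probabilities using the standard definitions Q = -10 log10(p) for Phred and Q = -10 log10(p / (1 - p)) for Solexa. All names (`OFFSETS`, `decode_quality`, `guess_offset`) are hypothetical, and the offset guess is a heuristic based on the typical raw-read ranges, not a guaranteed detection.

```python
import math

# ASCII offset for each encoding named in the slides.
OFFSETS = {
    "sanger": 33,
    "solexa": 64,
    "illumina1.3": 64,
    "illumina1.5": 64,
    "illumina1.8": 33,
}


def decode_quality(quality, encoding="illumina1.8"):
    """Turn a FASTQ quality string into a list of integer scores."""
    offset = OFFSETS[encoding]
    return [ord(ch) - offset for ch in quality]


def phred_error_prob(q):
    """Error probability for a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def solexa_error_prob(q):
    """Error probability for a Solexa score: p = 1 / (1 + 10^(Q/10))."""
    return 1 / (1 + 10 ** (q / 10))


def guess_offset(quality):
    """Heuristic: return 33, 64, or None when the string is ambiguous."""
    codes = [ord(ch) for ch in quality]
    if min(codes) < 59:   # below the lowest +64 character (Solexa Q = -5)
        return 33
    if max(codes) > 74:   # above 'J', i.e. above Q41 with a +33 offset
        return 64
    return None


if __name__ == "__main__":
    scores = decode_quality("CCCFFDFFHH", "illumina1.8")
    print(scores)
    print([round(phred_error_prob(q), 5) for q in scores])
    print(math.isclose(solexa_error_prob(0), 0.5))
```

A string that only uses characters between ASCII 59 and 74 fits both offsets, which is why `guess_offset` returns `None` rather than forcing an answer.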
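
The two Illumina identifier styles shown earlier can be told apart by their separators: the Casava 1.8 form has a space followed by `read:filtered:control:index`, while the older form ends in `#index/read`. Below is a minimal parsing sketch; the field names follow the commonly documented Illumina layout (instrument, run, flowcell, lane, tile, x, y), and the function name `parse_identifier` and the `style` labels are hypothetical.

```python
import re

# Casava 1.8: instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
NEW_STYLE = re.compile(
    r"^@?(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r" (?P<read>\d+):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S*)$"
)

# Before Casava 1.8: instrument:lane:tile:x:y#index/read
OLD_STYLE = re.compile(
    r"^@?(?P<instrument>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+)"
    r":(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/\s]+)/(?P<read>[12])$"
)


def parse_identifier(header):
    """Return the identifier fields as a dict of strings, plus a 'style' key."""
    header = header.strip()
    for style, pattern in (("casava-1.8", NEW_STYLE), ("pre-casava-1.8", OLD_STYLE)):
        match = pattern.match(header)
        if match:
            fields = match.groupdict()
            fields["style"] = style
            return fields
    raise ValueError(f"unrecognized identifier: {header!r}")


if __name__ == "__main__":
    for h in (
        "HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG",
        "HWI-EAS364_0004:4:1:995:9044#0/1",
    ):
        print(parse_identifier(h))
```

The leading `@` is optional in both patterns so that the identifiers can be passed with or without the FASTQ line marker.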
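
The GAII flowcell layout given earlier (8 lanes, 2 columns per lane, 60 tiles per column, 4 images per tile per cycle) fixes how many images the primary analysis has to process. A small sketch of that arithmetic follows; the constant and function names are hypothetical, and the 36-cycle run length in the usage line is only an illustrative assumption.

```python
# Layout figures taken from the "Flowcell layout on GAII" slide.
LANES_PER_FLOWCELL = 8
COLUMNS_PER_LANE = 2
TILES_PER_COLUMN = 60
IMAGES_PER_TILE_PER_CYCLE = 4


def tiles_per_flowcell():
    """Total number of tiles on one flowcell."""
    return LANES_PER_FLOWCELL * COLUMNS_PER_LANE * TILES_PER_COLUMN


def images_per_cycle():
    """Images acquired across the whole flowcell in one sequencing cycle."""
    return tiles_per_flowcell() * IMAGES_PER_TILE_PER_CYCLE


def images_per_run(cycles):
    """Images acquired over a run with the given number of cycles."""
    return images_per_cycle() * cycles


if __name__ == "__main__":
    print(tiles_per_flowcell())   # 960
    print(images_per_cycle())     # 3840
    print(images_per_run(36))     # 138240, for a hypothetical 36-cycle run
```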