
Bioinformatics: Function Prediction and Annotation.pdf

77 pages
  • Uploaded by: w****i
  • Document ID: 108479310
  • Upload date: 2019-10-24
  • Format: PDF
  • Size: 1.89 MB
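http://zhangroup.aporc.org
Bioinformatics: Function Prediction and Annotation
Ling-Yun Wu (吴凌云), Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Genome annotation
• Use bioinformatics methods to identify the components of a genome and annotate their biological functions
• Main topics
  – Gene identification and functional annotation
  – Identification and functional annotation of non-coding genes
  – Identification and functional annotation of regulatory elements
  – Sequences that affect chromosome structure and dynamics

Gene identification and functional annotation
• Gene prediction
• Sequence search
• Sequence motifs
• Clustering of orthologous sequences (COG)
• Subcellular localization
• Structure comparison
• Proteomics

Sequence search
• Assumption: sequence similarity = homology = functional similarity
• Databases
  – NCBI-NT (non-redundant nucleotide sequence database)
  – NCBI-NR (non-redundant protein sequence database)
  – InterPro (Swissprot) (protein sequence database)
  – KEGG
  – PDBseq (sequences of proteins with known three-dimensional structure)

Sequence motifs
• Find local features in a sequence
• Used when sequence homology is not evident
• Building a motif database
  – Perform multiple sequence alignment on the members of a protein family
• Databases
  – Prosite

Homology
• Orthologs
  – Genes in different species that evolved from the same ancestral gene
  – Functions are largely consistent
• Paralogs
  – Genes within the same genome that arose through gene duplication
  – Functions differ more

Orthologs vs. paralogs

Clustering of orthologous sequences
• Assumption: orthology = functional similarity
• Databases
  – COGs (Clusters of Orthologous Groups of proteins)
  – Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.

Subcellular localization
• Assumption: the subcellular localization of a protein is related to its function
• Predict function by predicting subcellular localization

Structure comparison
• Assumption: structure determines function
• Predict the protein structure of an uncharacterized gene, then predict its function by structure comparison

Proteomics
• Assumption: functionally related proteins tend to interact with each other
• Predict protein function from protein interaction networks or other biomolecular networks

Function
• The word "function" in a biological context is an evolving concept and is used in many ways.
• Function can be described at many levels, ranging from biochemical function to biological processes, all the way up to the organism level.
• Merely stating that a protein "has some function" conveys little to a biologist.

Classification of Function
• EC – Enzyme Commission scheme
• FunCat – MIPS Functional Catalogue
• GO – Gene Ontology

EC
• Enzyme Commission number
  – Based on the chemical reactions that enzymes catalyze.
  – Every EC number is associated with a recommended name for the respective enzyme.
  – Strictly speaking, EC numbers do not specify enzymes, but enzyme-catalyzed reactions.
  – If different enzymes (for instance from different organisms) catalyze the same reaction, they receive the same EC number.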
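Because an EC number is a four-level code for a reaction, two enzymes can be compared by how many leading levels of their EC numbers agree. The following is a minimal sketch of that idea; the function names `parse_ec`, `shared_levels` and `top_level_class` are hypothetical helpers introduced here for illustration and do not come from the slides.

```python
from typing import Tuple

# Top-level EC classes (first digit of the EC number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec: str) -> Tuple[str, ...]:
    """Split an EC number such as 'EC 3.4.21.4' into its four levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = tuple(text.split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four levels in EC number, got {ec!r}")
    return parts


def shared_levels(ec_a: str, ec_b: str) -> int:
    """Count how many leading levels two EC numbers have in common.

    A '-' marks an unspecified level and never counts as a match.
    """
    depth = 0
    for a, b in zip(parse_ec(ec_a), parse_ec(ec_b)):
        if a != b or a == "-":
            break
        depth += 1
    return depth


def top_level_class(ec: str) -> str:
    """Return the name of the top-level class for an EC number."""
    return EC_CLASSES[int(parse_ec(ec)[0])]


if __name__ == "__main__":
    print(top_level_class("EC 3.4.21.4"))
    print(shared_levels("1.1.1.1", "1.1.1.2"))
```

A shared depth of 4 means the two entries describe the same catalyzed reaction, which is consistent with the point above that enzymes from different organisms receive the same EC number when they catalyze the same reaction.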
KEGG Pathway

MIPS FunCat

Gene Ontology (GO)
• Unify the representation of gene and gene product attributes across all species
  – Maintain and further develop its controlled vocabulary of gene and gene product attributes
  – Annotate genes and gene products, and assimilate and disseminate annotation data
  – Provide tools to facilitate access to all aspects of the data provided by the Gene Ontology project

GO Domains
• Three separate GO domains
  – Molecular functions
  – Biological processes
  – Cellular components
• Each gene or gene product may
  – have more than one molecular function
  – take part in more than one biological process
  – act in more than one cellular component

Structure of GO
• Shows the relations between different terms
  – One term may be a more specific description of another, more general term
• Directed Acyclic Graph (DAG)
  – Similar to a hierarchy
  – Allows a child node to have more than one parent

Example of GO Graph

Relations in GO
• Three relations
  – is_a (is a subtype of)
  – part_of
  – regulates, negatively regulates, positively regulates

Status of protein function annotation

Function prediction
• Data
  – Sequence
  – Global structure
  – Local structure
  – Protein interactions
• Methods

Sequence, Structure, Function

Function Annotation (Liu et al., AA, 2008)

Prediction from Domain

Local Structure

Representation of local structure

Definition of local structure
• Pocket
  – A pocket is an empty concavity on a protein surface into which solvent can gain access, i.e. these concavities have mouth openings connecting their interior with the outside bulk solution.
• Void
  – A void is an interior unoccupied space that is not accessible to the solvent probe. It has no mouth openings to the outside bulk solution.

Detection of local structure
• Computational Geometry
  – Voronoi Diagram
  – Delaunay Triangulation
  – Alpha Shape

Pocket Similarity Network (Liu et al., PPL, 2008)

Community Structure Property

Small World Property

Scale Free Property

Hub Pockets

Clustering (Liu et al., IJBRA, 2008)

Cluster Example

Functional Association

Prediction Workflow

Scoring Scheme

GO Specificity
• GO term probability
• GO depth

Influence of GO Specificity

Influence of Sequence Similarity
• Remove the redundancy
  – Sequence similarity
  – Multiple experiments
• The PDBselect database is a subset of PDB that does not contain highly homologous sequences
• PDBselect 25
  – No two proteins have more than 25% sequence similarity

Influence of Global Structure

Protein Similarity Network

Local vs Global Structure

Functional Structure Motif

Protein interactions
• Physical interaction

Protein interaction network
• Undirected graph
• Nodes: proteins
• Edges: interactions

Genetic Interaction
• Two mutations have a combined effect not exhibited by either mutation alone

PPI & Protein annotation
• Figure legend: biological process, molecular function, cellular component, un-annotated protein

Example: probability estimation
• Edge types (labelled C, B and M in the figure): protein complex interaction, protein binary interaction, correlated microarray expression
• P(S|M) = ?  P(S|B) = ?  P(S|C) = ?
• S: the functional similarity between proteins X and Y

Function assignment
• Edge types: protein complex interaction, protein binary interaction, microarray coexpression
• A, B and C each have known functions {Function i, i = 1…n}
• Given the known probabilities P(S|M), P(S|B) and P(S|C), how should a function Fi be assigned to the uncharacterized protein X?
• A, B and C are all of, and the only, proteins of known function that interact with protein X

Neighbor Counting
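The preview breaks off at the neighbor counting slide, so the lecture's exact formulation is not visible. As a rough illustration only, the sketch below uses the simplest majority-vote reading of the idea: tally the known functions of the proteins that interact with X and propose the most frequent ones. It ignores the edge-type probabilities P(S|M), P(S|B) and P(S|C); the function name `neighbor_counting`, the `top_k` parameter and the toy data are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def neighbor_counting(
    protein: str,
    interactions: Dict[str, Set[str]],
    annotations: Dict[str, Set[str]],
    top_k: int = 3,
) -> List[Tuple[str, int]]:
    """Rank candidate functions for `protein` by neighbor frequency.

    interactions: undirected PPI network as an adjacency map.
    annotations:  known functions per protein (unannotated proteins absent).
    Returns up to `top_k` (function, count) pairs, most frequent first.
    """
    counts: Counter = Counter()
    for neighbor in interactions.get(protein, set()):
        # Neighbors without known functions contribute nothing.
        counts.update(annotations.get(neighbor, set()))
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy network: X interacts with the annotated proteins A, B and C.
    ppi = {
        "X": {"A", "B", "C"},
        "A": {"X"},
        "B": {"X"},
        "C": {"X"},
    }
    known = {
        "A": {"F1", "F2"},
        "B": {"F1"},
        "C": {"F1", "F3"},
    }
    print(neighbor_counting("X", ppi, known, top_k=2))
```

In the toy network, function F1 is carried by all three neighbors and therefore ranks first. A version that weights each neighbor by the probability attached to its edge type would be the natural next step, but that goes beyond what the visible slides specify.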
