Big Data Technology Exchange (大数据技术交流.ppt)

39 pages
  • Seller (uploader): 工****
  • Document ID: 601278147
  • Uploaded: 2025-05-16
  • Format: PPT
  • Size: 34.80 MB
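Information Management: Big Data Platform Technical Exchange
Wu Minda (吴敏达), Senior Technical Consultant
2011 IBM Corporation

What is big data
Big data technology is the ability to quickly extract valuable information from huge volumes of data of every type.
  • Variety: managing complex, multi-faceted relational and non-relational data. (Are you leaving unstructured data out of your decisions?)
  • Velocity: streaming data, or large volumes of data in motion. (Do you want real-time operations to deliver better results?)
  • Volume: data volumes ranging from TB scale to ZB scale. (Have you collected all your data, and are you using it?)
  • Veracity: one third of leaders do not trust the information they receive when making business decisions.

Big data reference architecture: beyond the traditional data warehouse
  • Tiers: stream computing, Internet scale, traditional data warehouse
  • Diagram labels: In-Motion Analytics; Data Analytics; Data Operations & Model Building; Results; Internet Scale; Database & Warehouse; At-Rest Data Analytics; Ultra Low Latency Results; InfoSphere BigInsights
  • Inputs: traditional/relational data sources and non-traditional/non-relational data sources
  • Cloud | Mobile | Security

IBM big data platform and application framework
  • Analytic Applications: BI/Reporting, Exploration/Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
  • IBM Big Data Platform: Visualization & Discovery, Applications & Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Contextual Discovery, Information Integration & Governance
  • Capability callouts on the slide:
    • Collect, extract, and explore data through visualization.
    • Application accelerators speed up application development and shorten the time to analytic value.
    • Analyze streaming data and gain insight from big data.
    • Data governance (data quality, lifecycle, ...).
    • Analyze PB-scale structured and unstructured data at low cost.
    • Run analytics embedded in the data warehouse on operational or historical data.
    • Contextual Discovery: indexed and federated context-aware analysis.

Agenda
  • IBM Hadoop platform: BigInsights
  • IBM stream computing: Streams
  • IBM data warehouse platform: PureData
  • Data analysis on the big data platform: Data Explorer
  • Summary of IBM big data advantages

Forrester Wave report on big data

BigInsights Enterprise Edition
  • Section labels: Connectivity and integration; Infrastructure; Analytics and discovery; Applications; Optional IBM products; Management and development tools
  • Components shown: Streams, Netezza, Text processing engine and library, JDBC, Flume, Jaql, Hive, Pig, HBase, MapReduce, HDFS, ZooKeeper, Indexing, Lucene, Adaptive MapReduce, Oozie, Text compression, Enhanced security, Flexible scheduler, DB2, BigSheets, Web Crawler, Distrib file copy, DB export, Boardreader, DB import, Ad hoc query, Machine learning, Data processing, R, Integrated installer, Cognos BI, Big SQL, Accelerator for machine data analysis, Accelerator for social data analysis, Guardium, DataStage, Data Explorer, Sqoop, HCatalog, GPFS FPO
  • Management console: monitor cluster health, jobs, etc.; add/remove nodes; start/stop services; inspect job status; inspect workflow status; deploy applications; launch apps/jobs; work with the distributed file system; work with the spreadsheet interface; support for a REST-based API
  • Eclipse development tools: text analytics; MapReduce programming; Jaql, Hive, and Pig development; BigSheets plug-in development; Oozie workflow generation
  • Legend: Open Source, IBM

BigInsights advantages
The slide groups these under two headings: enterprise-grade infrastructure and enterprise-grade analytics.
  • High Performance & Availability: GPFS-FPO; at least 2x faster than open source Hadoop; 17x throughput speedup for document index lookups; fault resistance for real-time data; POSIX; Adaptive MapReduce
  • SQL interface (Big SQL)
  • Integrated install and management consoles
  • Security: LDAP+
  • High-speed LZO compression
  • Development tooling: environment, testing, and optimization
  • Warehouse RDBMS and Streams integration
  • SystemT text analytics: blazing fast; works on unstructured data without requiring it to be structured (MapReduce); customized annotators
  • BigSheets: insight engine for analytics on massive amounts of data in BigInsights; puts the power of Map/Reduce within reach of the business professional through a familiar spreadsheet-like environment; built-in visualizations
  • SystemML machine learning (Watson): ML algorithms implemented directly on MapReduce; deep statistics and mining embedded in the BigInsights platform
  • BigIndex: distributed indexing and search; parallel indexing and search

GPFS-FPO versus HDFS, metric by metric
Each entry reads: BigInsights GPFS-FPO / open source HDFS or other solutions.
  • Robustness: no single point of failure, 99.99% / the NameNode is a single point of failure
  • Data consistency: high / data may be lost
  • Scalability: thousands of nodes, 4000+ measured / thousands of nodes
  • POSIX compatibility: fully compatible / limited
  • Data management: security, backup, snapshots, caching, replication / limited
  • Performance for traditional applications: good, balanced read and write performance / poor random read/write performance
  • Security: supports ACLs, capacity limits, and security authentication / not supported

IBM Adaptive MapReduce
IBM Adaptive MapReduce provides powerful enterprise-grade management for running distributed applications and big data analytics on a scalable shared grid.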

Adaptive MapReduce accelerates dozens of parallel applications, delivering results sooner and making better use of all available resources.
  • Benchmark labels: TeraSort, Throughput, SWIM; 10 times fewer CPU cores; 6 times faster; 60 times faster
  • Berkeley SWIM is a workload benchmark developed at the University of California, Berkeley.
  • Measure core scheduling efficiency of MapReduce workloads at Hadoop World 2011
  • Multi-tenant resource management
  • 10x less hardware for the fastest TeraSort score

Big SQL: native SQL support for Hadoop
  • Native SQL support in BigInsights: ANSI SQL 92+; standard syntax support (joins, data types, ...)
  • True JDBC/ODBC: prepared statements; cancel support; database metadata API support; secure socket connections (SSL)
  • Optimization: leverages MapReduce parallelism, or direct access for low-latency queries
  • Multiple data sources: HBase (including secondary indexes), CSV, delimited files, sequence files, JSON, Hive tables
  • Diagram labels: Application; JDBC/ODBC Driver; JDBC/ODBC Server; Big SQL Engine; SQL; BigInsights Data Sources (Hive tables, HBase tables, CSV files)

Using reporting tools
  • Cognos BI Server can push computation down to BigInsights.
  • Faster response times
  • None of Hive's limitations
  • Diagram labels: Application (Map-Reduce); Storage (HBase, HDFS); InfoSphere BigInsights; Cognos BI Server; Explore & Analyze; Report & Act; SQL interface via JDBC

Existing tools can be used: SQuirreL SQL
  • Using existing SQL tooling against BigData
  • Support for "standard" authentication (not supported for Hive, but supported by Big SQL)

Existing tools can be used: Eclipse
  • Using existing SQL tooling against BigData
  • Same setup as for existing SQL so...
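
The Big SQL slides list standard join syntax and prepared statements as features that existing SQL tooling can rely on. The following is a minimal sketch of what such a query looks like from application code. It is an illustration under stated assumptions, not IBM's API: Python's built-in `sqlite3` module stands in for a Big SQL JDBC/ODBC connection, and the `customers` and `orders` tables, their columns, and the `total_by_region` helper are hypothetical names that do not appear in the slides.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Big SQL
# JDBC/ODBC endpoint; nothing here talks to BigInsights.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and sample rows, invented for illustration.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "east"), (2, "west")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0)],
)


def total_by_region(connection, region):
    """Sum order amounts for one region with a parameterized join."""
    query = (
        "SELECT c.region, SUM(o.amount) "
        "FROM orders o JOIN customers c ON o.customer_id = c.id "
        "WHERE c.region = ? "  # placeholder bound at execution time
        "GROUP BY c.region"
    )
    return connection.execute(query, (region,)).fetchone()


east_total = total_by_region(conn, "east")
west_total = total_by_region(conn, "west")
print(east_total, west_total)
```

The `?` placeholder keeps the query text fixed while the region value is bound at execution time, which is the general idea behind the prepared statements the slide mentions. Against a real Big SQL server the connection setup would differ and would go through a JDBC or ODBC driver.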
