Huawei - Cloud Computing and Cloud Data Management.ppt

171 pages
  • Uploader: 博****1
  • Document ID: 588727233
  • Upload date: 2024-09-08
  • Document format: PPT
  • Document size: 3.97 MB
  • Cloud computing overview
  • Google cloud computing technologies: GFS, Bigtable and MapReduce
  • Yahoo cloud computing technologies and Hadoop
  • Challenges of cloud data management

  • Overview of distributed systems
  • Survey of distributed cloud computing technologies
  • Distributed cloud computing platforms
  • Distributed cloud computing application development

  • Chapter 2: Client-server architecture
  • Chapter 3: Distributed objects
  • Chapter 4: Common Object Request Broker Architecture (CORBA)

Survey of cloud computing
  • Chapter 5: Introduction to cloud computing
  • Chapter 6: Cloud services
  • Chapter 7: Comparison of cloud-related technologies
    • 7.1 Grid computing and cloud computing
    • 7.2 Utility computing and cloud computing
    • 7.3 Parallel and distributed computing and cloud computing
    • 7.4 Cluster computing and cloud computing

Cloud computing platforms
  • Chapter 8: The three core technologies of the Google cloud platform
  • Chapter 9: Technologies of the Yahoo cloud platform
  • Chapter 10: Technologies of the Aneka cloud platform
  • Chapter 11: Technologies of the Greenplum cloud platform
  • Chapter 12: Technologies of the Amazon Dynamo cloud platform

Cloud computing platform development
  • Chapter 13: Development on Hadoop
  • Chapter 14: Development on HBase
  • Chapter 15: Development on Google Apps
  • Chapter 16: Development on MS Azure
  • Chapter 17: Development on Amazon EC2

Cloud computing

Why do we use cloud computing?
  • Case 1: Write a file, save it
    • Computer down, file is lost
    • Files are always stored in the cloud, never lost
  • Case 2:
    • Use IE --- download, install, use
    • Use --- download, install, use
    • Use C++ --- download, install, use
    • ……
    • Get the service from the cloud

What is cloud and cloud computing?
  • Cloud: on-demand resources or services over the Internet, with the scale and reliability of a data center.
  • Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
  • Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

Characteristics of cloud computing
  • Virtual: software, databases, Web servers, operating systems, storage and networking as virtual servers.
  • On demand: add and subtract processors, memory, network bandwidth, storage.
Types of cloud service
  • IaaS: Infrastructure as a Service
  • PaaS: Platform as a Service
  • SaaS: Software as a Service

SaaS: software delivery model
  • No hardware or software to manage
  • Service delivered through a browser
  • Customers use the service on demand
  • Instant scalability

SaaS: examples
  • Your current CRM package is not managing the load, or you simply don't want to host it in-house. Use a SaaS provider such as Salesforce.
  • Your email is hosted on an Exchange server in your office and it is very slow. Outsource this using Hosted Exchange.

PaaS: platform delivery model
  • Platforms are built upon infrastructure, which is expensive
  • Estimating demand is not a science!
  • Platform management is not fun!

PaaS: examples
  • You need to host a large file (5Mb) on your website and make it available to 35,000 users for only two months. Use CloudFront from Amazon.
  • You want to start storage services on your network for a large number of files and you do not have the storage capacity. Use Amazon S3.

IaaS: computer infrastructure delivery model
  • A platform virtualization environment
  • Computing resources, such as storing and processing capacity
  • Virtualization taken a step further

IaaS: examples
  • You want to run a batch job but you don't have the infrastructure necessary to run it in a timely manner. Use Amazon EC2.
  • You want to host a website, but only for a few days. Use Flexiscale.

Cloud computing and other computing techniques

The 21st Century Vision Of Computing
  • Leonard Kleinrock, one of the chief scientists of the original Advanced Research Projects Agency Network (ARPANET) project which seeded the Internet, said: "As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of 'computer utilities' which, like present electric and telephone utilities, will service individual homes and offices across the country."
  • Sun Microsystems co-founder Bill Joy also indicated: "It would take time until these markets to mature to generate this kind of value. Predicting now which companies will capture the value is impossible. Many of them have not even been created yet."

Definitions
  • Utility computing is the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility.
  • A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer.
  • Grid computing is the application of several computers to a single problem at the same time, usually a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data.
  • Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

Grid Computing & Cloud Computing
  • They share a lot in common: intention, architecture and technology
  • They differ in: programming model, business model, compute model, applications, and virtualization
Grid Computing & Cloud Computing
  • The problems are mostly the same:
    • manage large facilities;
    • define methods by which consumers discover, request and use resources provided by the central facilities;
    • implement the often highly parallel computations that execute on those resources.

Grid Computing & Cloud Computing: virtualization
  • Grid: does not rely on virtualization as much as Clouds do; each individual organization maintains full control of its resources
  • Cloud: virtualization is an indispensable ingredient for almost every Cloud

Any questions or comments?

Google cloud computing techniques

The Google File System (GFS)
  • A scalable distributed file system for large distributed data-intensive applications
  • Multiple GFS clusters are currently deployed. The largest ones have:
    • 1000+ storage nodes
    • 300+ terabytes of disk storage
    • heavy access by hundreds of clients on distinct machines

Introduction
  • Shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
  • The GFS design has been driven by four key observations of Google's application workloads and technological environment

Observations (1)
  • 1. Component failures are the norm
    • Constant monitoring, error detection, fault tolerance and automatic recovery are integral to the system
  • 2. Huge files (by traditional standards)
    • Multi-GB files are common
    • I/O operations and block sizes must be revisited

Observations (2)
  • 3. Most files are mutated by appending new data
    • This is the focus of performance optimization and atomicity guarantees
  • 4. Co-designing the applications and APIs benefits the overall system by increasing flexibility

Design
  • A cluster consists of a single master and multiple chunkservers and is accessed by multiple clients

Master
  • Maintains all file system metadata: namespace, access control info, file-to-chunk mappings, chunk (including replica) locations, etc.
  • Periodically communicates with chunkservers in HeartBeat messages to give instructions and check state
  • Helps make sophisticated chunk placement and replication decisions, using global knowledge
  • For reading and writing, the client contacts the master to get chunk locations, then deals directly with chunkservers
  • The master is not a bottleneck for reads/writes

Chunks
  • Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle.
  • The handle is assigned by the master at chunk creation
  • Chunk size is 64 MB
  • Each chunk is replicated on 3 (default) servers

Clients
  • Linked to apps using the file system API
  • Communicate with the master and chunkservers for reading and writing
    • Master interactions only for metadata
    • Chunkserver interactions for data
  • Only cache metadata information; data is too large to cache
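A fixed 64 MB chunk size means a client can work out which chunks a byte range touches before it asks the master for their locations. The following is a minimal sketch of that arithmetic only; the function name `chunk_requests` and the tuple layout are illustrative assumptions, not part of any GFS client library.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated on the slide


def chunk_requests(offset, length, chunk_size=CHUNK_SIZE):
    """Split a file byte range into (chunk_index, offset_in_chunk, nbytes) pieces.

    Hypothetical helper: it only shows how a byte range maps onto
    fixed-size chunks; it performs no I/O.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size            # which chunk holds this byte
        start_in_chunk = offset % chunk_size    # position inside that chunk
        nbytes = min(chunk_size - start_in_chunk, end - offset)
        pieces.append((index, start_in_chunk, nbytes))
        offset += nbytes
    return pieces
```

A read that straddles a chunk boundary produces two pieces, one per chunk, so the client would need the locations of both chunks.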
Chunk Locations
  • The master does not keep a persistent record of the locations of chunks and replicas.
  • It polls chunkservers for this at startup, and when chunkservers join or leave.
  • It stays up to date by controlling placement of new chunks and through HeartBeat messages (when monitoring chunkservers)

Operation Log
  • Record of all critical metadata changes
  • Stored on the master and replicated on other machines
  • Defines the order of concurrent operations
  • Also used to recover the file system state

System Interactions: Leases and Mutation Order
  • Leases maintain a mutation order across all chunk replicas
  • The master grants a lease to a replica, called the primary
  • The primary chooses the serial mutation order, and all replicas follow this order
  • Minimizes management overhead for the master

Atomic Record Append
  • The client specifies the data to write; GFS chooses and returns the offset it writes to and appends the data to each replica at least once
  • Heavily used by Google's distributed applications
  • No need for a distributed lock manager
  • GFS chooses the offset, not the client

Atomic Record Append: How?
  • Follows a similar control flow to other mutations
  • The primary tells secondary replicas to append at the same offset as the primary
  • If a record append fails at any replica, it is retried by the client. So replicas of the same chunk may contain different data, including duplicates, whole or in part, of the same record
  • GFS does not guarantee that all replicas are bitwise identical. It only guarantees that data is written at least once as an atomic unit.
  • Data must be written at the same offset on all chunk replicas for success to be reported.
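Because record append is at-least-once, a reader may see the same record more than once. One way an application can cope is to tag each record with a unique identifier and skip repeats while reading. The sketch below assumes a hypothetical `(record_id, payload)` record format; it is an application-level illustration, not GFS code.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record format: (unique record id, payload bytes).
Record = Tuple[str, bytes]


def deduplicate(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each record once, in first-seen order, dropping retried duplicates."""
    seen = set()
    for record_id, payload in records:
        if record_id in seen:
            continue  # a retried append already delivered this record
        seen.add(record_id)
        yield record_id, payload
```

The set of seen identifiers grows with the number of distinct records, so a real reader would likely bound it, for example per file or per time window.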
Stale Replicas
  • The master keeps a chunk version number to distinguish up-to-date and stale replicas
  • The version is increased when granting a lease
  • If a replica is not available, its version is not increased
  • The master detects stale replicas when chunkservers report their chunks and versions
  • Stale replicas are removed during garbage collection

Garbage collection
  • When a client deletes a file, the master logs it like other changes and renames the file to a hidden name.
  • The master removes files hidden for longer than 3 days when scanning the file system namespace; their metadata is also erased
  • In HeartBeat messages, each chunkserver sends the master a subset of its chunks, and the master tells it which chunks have no metadata.
  • The chunkserver removes these chunks on its own

Fault Tolerance: High Availability
  • Fast recovery
    • Master and chunkservers can restart in seconds
  • Chunk replication
  • Master replication
    • "Shadow" masters provide read-only access when the primary master is down
    • Mutations are not done until recorded on all master replicas

Fault Tolerance: Data Integrity
  • Chunkservers use checksums to detect corrupt data
  • Since replicas are not bitwise identical, chunkservers maintain their own checksums
  • For reads, the chunkserver verifies the checksum before sending the chunk
  • Checksums are updated during writes

Introduction to MapReduce

Insight
  • "Consider the problem of counting the number of occurrences of each word in a large collection of documents"
  • How would you do it in parallel?

Programming Model
  • Inspired by the map and reduce operations commonly used in functional programming languages like Lisp.
  • Users implement an interface of two primary methods:
    • 1. Map: (key1, val1) → (key2, val2)
    • 2. Reduce: (key2, [val2]) → [val3]

Map operation
  • Map, a pure function written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs, e.g. (doc-id, doc-content)
  • By analogy with SQL, map can be visualized as the group-by clause of an aggregate query.

Reduce operation
  • On completion of the map phase, all the intermediate values for a given output key are combined into a list and given to a reducer.
  • Can be visualized as an aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute.

Word count (pseudocode)
    map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
            EmitIntermediate(w, "1");

    reduce(String output_key, Iterator intermediate_values):
        // output_key: a word
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
            result += ParseInt(v);
        Emit(AsString(result));

MapReduce: Execution overview

MapReduce: Example

MapReduce in Parallel: Example

MapReduce: Fault Tolerance
  • Handled via re-execution of tasks. Task completion is committed through the master
  • What happens if a Mapper fails?
    • Re-execute completed + in-progress map tasks
  • What happens if a Reducer fails?
    • Re-execute in-progress reduce tasks
  • What happens if the Master fails?
    • Potential trouble!!

Walk through of One more Application

MapReduce: PageRank
  • PageRank models the behavior of a "random surfer".
  • C(t) is the out-degree of t, and (1-d) is a damping factor (random jump)
  • The "random surfer" keeps clicking on successive links at random, not taking content into consideration.
  • Each page distributes its rank equally among all pages it links to.
  • The damping factor models the surfer "getting bored" and typing an arbitrary URL.

PageRank: Key Insights
  • The effects at each iteration are local: the (i+1)th iteration depends only on the ith iteration
  • At iteration i, PageRank for individual nodes can be computed independently

PageRank using MapReduce
  • Use a sparse matrix representation (M)
  • Map each row of M to a list of PageRank "credit" to assign to out-link neighbours.
  • These prestige scores are reduced to a single PageRank value for a page by aggregating over them.
  • Map: distribute PageRank "credit" to link targets
  • Reduce: gather up PageRank "credit" from multiple sources to compute the new PageRank value
  • Iterate until convergence
  • Source of image: Lin 2021

Phase 1: Process HTML
  • The map task takes (URL, page-content) pairs and maps them to (URL, (PRinit, list-of-urls))
    • PRinit is the "seed" PageRank for URL
    • list-of-urls contains all pages pointed to by URL
  • The reduce task is just the identity function

Phase 2: PageRank Distribution
  • The reduce task gets (URL, url_list) and many (URL, val) values
  • Sum the vals and fix up with d to get the new PR
  • Emit (URL, (new_rank, url_list))
  • Check for convergence using a non-parallel component

MapReduce: Some More Apps
  • Distributed grep
  • Count of URL access frequency
  • Clustering (K-means)
  • Graph algorithms
  • Indexing systems
  • [Figure: MapReduce programs in the Google source tree]

MapReduce: Extensions and similar apps
  • PIG (Yahoo)
  • Hadoop (Apache)
  • DryadLinq (Microsoft)

Large Scale Systems Architecture using MapReduce

BigTable: A Distributed Storage System for Structured Data

Introduction
  • BigTable is a distributed storage system for managing structured data.
  • Designed to scale to a very large size
    • Petabytes of data across thousands of servers
  • Used for many Google projects
    • Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
  • Flexible, high-performance solution for all of Google's products

Motivation: lots of (semi-)structured data at Google
  • URLs:
    • Contents, crawl metadata, links, anchors, pagerank, …
  • Per-user data:
    • User preference settings, recent queries/search results, …
  • Geographic locations:
    • Physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
  • Scale is large
    • Billions of URLs, many versions/page (~20K/version)
    • Hundreds of millions of users, thousands of queries/sec
    • 100TB+ of satellite image data

Why not just use a commercial DB?
  • Scale is too large for most commercial databases
  • Even if it weren't, cost would be very high
    • Building internally means the system can be applied across many projects for low incremental cost
  • Low-level storage optimizations help performance significantly
    • Much harder to do when running on top of a database layer

Goals
  • Want asynchronous processes to be continuously updating different pieces of data
    • Want access to the most current data at any time
  • Need to support:
    • Very high read/write rates (millions of ops per second)
    • Efficient scans over all or interesting subsets of data
    • Efficient joins of large one-to-one and one-to-many datasets
  • Often want to examine data changes over time
    • E.g. contents of a web page over multiple crawls

BigTable
  • Distributed multi-level map
  • Fault-tolerant, persistent
  • Scalable
    • Thousands of servers
    • Terabytes of in-memory data
    • Petabytes of disk-based data
    • Millions of reads/writes per second, efficient scans
  • Self-managing
    • Servers can be added/removed dynamically
    • Servers adjust to load imbalance

Building Blocks
  • Building blocks:
    • Google File System (GFS): raw storage
    • Scheduler: schedules jobs onto machines
    • Lock service: distributed lock manager
    • MapReduce: simplified large-scale data processing
  • BigTable's use of the building blocks:
    • GFS: stores persistent data (SSTable file format for storage of data)
    • Scheduler: schedules jobs involved in BigTable serving
    • Lock service: master election, location bootstrapping
    • MapReduce: often used to read/write BigTable data

Basic Data Model
  • A BigTable is a sparse, distributed, persistent multi-dimensional sorted map: (row, column, timestamp) -> cell contents
  • Good match for most Google applications

Example
  • Want to keep a copy of a large collection of web pages and related information
  • Use URLs as row keys
  • Various aspects of a web page as column names
  • Store the contents of web pages in the contents: column under the timestamps when they were fetched.
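The data model above, a sorted map from (row, column, timestamp) to cell contents, can be mimicked in a few lines. This is a toy in-memory sketch under stated assumptions: the class name `TinyTable` and its methods are invented for illustration and are not the Bigtable API.

```python
import bisect


class TinyTable:
    """Toy in-memory model of a (row, column, timestamp) -> contents map.

    Illustrative only: the names here are hypothetical, and nothing is
    distributed or persistent.
    """

    def __init__(self):
        self._rows = {}         # row key -> {column -> {timestamp -> value}}
        self._sorted_keys = []  # row keys kept in lexicographic order

    def put(self, row, column, timestamp, value):
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)  # keep row keys sorted
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at a timestamp, or the newest one if none is given."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self, start_row, end_row):
        """Return row keys in [start_row, end_row) in sorted order."""
        lo = bisect.bisect_left(self._sorted_keys, start_row)
        hi = bisect.bisect_left(self._sorted_keys, end_row)
        return self._sorted_keys[lo:hi]
```

For example, storing two crawls of a page under a made-up row key such as `"com.example.www/index.html"` in the `"contents:"` column with timestamps 1 and 2 lets `get` return the newer crawl by default and the older one on request. Keeping row keys sorted is what makes short range scans cheap.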
Rows
  • Name is an arbitrary string
    • Access to data in a row is atomic
    • Row creation is implicit upon storing data
  • Rows are ordered lexicographically
    • Rows close together lexicographically are usually on one or a small number of machines

Rows (cont.)
  • Reads of short row ranges are efficient and typically require communication with a small number of machines.
  • Can exploit this property by selecting row keys so they get good locality for data access.
  • Example: , , , VS , , ,

Columns
  • Columns have a two-level name structure:
    • family:optional_qualifier
  • Column family
    • Unit of access control
    • Has associated type information
  • Qualifier gives unbounded columns
    • Additional levels of indexing, if desired

Timestamps
  • Used to store different versions of data in a cell
    • New writes default to the current time, but timestamps for writes can also be set explicitly by clients
  • Lookup options:
    • "Return most recent K values"
    • "Return all values in timestamp range (or all values)"
  • Column families can be marked with attributes:
    • "Only retain most recent K values in a cell"
    • "Keep values until they are older than K seconds"

Implementation: Three Major Components
  • Library linked into every client
  • One master server, responsible for:
    • Assigning tablets to tablet servers
    • Detecting addition and expiration of tablet servers
    • Balancing tablet-server load
    • Garbage collection
  • Many tablet servers
    • Tablet servers handle read and write requests to their tablets
    • They split tablets that have grown too large

Implementation (cont.)
  • Client data doesn't move through the master server. Clients communicate directly with tablet servers for reads and writes.
  • Most clients never communicate with the master server, leaving it lightly loaded in practice.
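The timestamp lookup options and retention attributes described above amount to simple filters over the versions of one cell. The sketch below models a cell as a plain dict from timestamp to value; the function names are hypothetical, chosen only to mirror the quoted options.

```python
def most_recent(versions, k):
    """'Return most recent K values': up to k (timestamp, value) pairs, newest first."""
    return sorted(versions.items(), reverse=True)[:k]


def in_range(versions, start, end):
    """'Return all values in timestamp range': pairs with start <= ts <= end, newest first."""
    return sorted(((ts, v) for ts, v in versions.items() if start <= ts <= end),
                  reverse=True)


def retain_most_recent(versions, k):
    """'Only retain most recent K values in a cell'."""
    return dict(most_recent(versions, k))


def retain_newer_than(versions, now, max_age):
    """'Keep values until they are older than K seconds' (K is max_age here)."""
    return {ts: v for ts, v in versions.items() if now - ts <= max_age}
```

With a cell holding versions at timestamps 1, 2 and 3, retaining the most recent one leaves only timestamp 3, while an age limit of 8 seconds evaluated at time 10 drops only the version written at time 1.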
Tablets
  • Large tables are broken into tablets at row boundaries
    • A tablet holds a contiguous range of rows
    • Clients can often choose row keys to achieve locality
    • Aim for ~100MB to 200MB of data per tablet
  • Serving machine responsible for ~100 tablets
    • Fast recovery: 100 machines each pick up 1 tablet from the failed machine
    • Fine-grained load balancing:
      • Migrate tablets away from an overloaded machine
      • The master makes load-balancing decisions

Tablet Location
  • Since tablets move around from server to server, given a row, how do clients find the right machine?
    • Need to find the tablet whose row range covers the target row

Tablet Assignment
  • Each tablet is assigned to one tablet server at a time.
  • The master server keeps track of the set of live tablet servers and the current assignment of tablets to servers. It also keeps track of unassigned tablets.
  • When a tablet is unassigned, the master assigns the tablet to a tablet server with sufficient room.

Operations
  • Create/delete tables, column families, change metadata
  • Writes (atomic)
    • Set(): write cells in a row
    • DeleteCells(): delete cells in a row
    • DeleteRow(): delete all cells in a row
  • Reads
    • Scanner: read arbitrary cells in a bigtable
      • Each row read is atomic
      • Can restrict returned rows to a particular range
      • Can ask for just data from 1 row, all rows, etc.
      • Can ask for all columns, just certain column families, or specific columns

Refinements: Compression
  • Many opportunities for compression
    • Similar values in the same row/column at different timestamps
    • Similar values in different columns
    • Similar values across adjacent rows
  • Two-pass custom compression scheme
    • First pass: compress long common strings across a large window
    • Second pass: look for repetitions in a small window
  • Speed emphasized, but good space reduction (10-to-1)

Refinements: Bloom Filters
  • A read operation has to read from disk when the desired SSTable isn't in memory
  • Reduce the number of accesses by specifying a Bloom filter.
    • Allows us to ask if an SSTable might contain data for a specified row/column pair.
    • A small amount of memory for Bloom filters drastically reduces the number of disk seeks for read operations
    • Their use implies that most lookups for non-existent rows or columns do not need to touch disk

Yahoo! Cloud computing

Yahoo! Cloud Stack
  • [Figure: Yahoo! Cloud Stack. Layers: EDGE (YCS, YCPI, Brooklyn), WEB (VM/OS, yApache, PHP, App Engine), APP (VM/OS, Serving Grid), STORAGE (Sherpa, MObStor), BATCH (Hadoop). Cross-cutting: Provisioning (self-serve), Monitoring/Metering/Security, Data Highway, Horizontal Cloud Services.]

Yahoo! Data Management
  • Large data analysis (Hadoop)
    • Scan-oriented workloads
    • Focus on sequential disk I/O
    • $ per CPU cycle
  • Structured record storage (PNUTS/Sherpa)
    • CRUD
    • Point lookups and short scans
    • Index-organized table and random I/Os
    • $ per latency
  • Blob storage (SAN/NAS)
    • Object retrieval and streaming
    • Scalable file storage
    • $ per GB

The World Has Changed
  • Web serving applications need:
    • Scalability! Preferably elastic
    • Flexible schemas
    • Geographic distribution
    • High availability
    • Reliable storage
  • Web serving applications can do without:
    • Complicated queries
    • Strong transactions

PNUTS/SHERPA: To Help You Scale Your Mountains of Data

Yahoo! Serving Storage Problem
  • Small records - 100KB or less
  • Structured records - lots of fields, evolving
  • Extreme data scale - tens of TB
  • Extreme request scale - tens of thousands of requests/sec
  • Low latency globally - 20+ datacenters worldwide
  • High availability - outages cost $millions
  • Variable usage patterns - as applications and users change

What is PNUTS/Sherpa?
  • CREATE TABLE Parts (ID VARCHAR, StockNumber INT, Status VARCHAR …)
  • Parallel database
  • Geographic replication
  • Structured, flexible schema
  • Hosted, managed infrastructure
  • [Figure: the Parts table (records A 42342 E, B 42521 W, C 66354 W, D 12352 E, E 75656 C, F 15677 E) partitioned across servers and replicated across regions]

What Will It Become?
  • Indexes and views
  • [Figure: the replicated Parts table extended with indexes and views]

Design Goals
  • Scalability
    • Thousands of machines
    • Easy to add capacity
    • Restrict query language to avoid costly queries
  • Geographic replication
    • Asynchronous replication around the globe
    • Low-latency local access
  • High availability and fault tolerance
    • Automatically recover from failures
    • Serve reads and writes despite failures
  • Consistency
    • Per-record guarantees
    • Timeline model
    • Option to relax if needed
  • Multiple access paths
    • Hash table, ordered table
    • Primary, secondary access
  • Hosted service
    • Applications plug and play
    • Share operational cost

Technology Elements
  • Applications
  • PNUTS API, Tabular API
  • PNUTS: query planning and execution, index maintenance
  • Distributed infrastructure for tabular data: data partitioning, update consistency, replication
  • YDOT FS: ordered tables
  • YDHT FS: hash tables
  • Tribble: pub/sub messaging
  • Zookeeper: consistency service
  • YCA: authorization

Data Manipulation
  • Per-record operations
    • Get
    • Set
    • Delete
  • Multi-record operations
    • Multiget
    • Scan
    • Getrange

Tablets: Hash Table
  • [Figure: a table with Name, Description and Price columns (fruit records such as Apple, Lemon, Grape, Orange, Lime, Strawberry, Kiwi, Avocado, Tomato, Banana), partitioned into tablets by key hash at boundaries 0x0000, 0x2AF3, 0x911F, 0xFFFF]

Tablets: Ordered Table
  • [Figure: the same records sorted by Name, partitioned into tablets at key boundaries A, H, Q, Z]

Flexible Schema
    Posted date   Listing id   Item    Price
    6/1/07        424252       Couch   $570
    6/1/07        763245       Bike    $86
    6/3/07        211242       Car     $1123
    6/5/07        421133       Lamp    $15
  • Some records carry additional fields: Color (Red), Condition (Good, Fair)

Detailed Architecture
  • [Figure: clients, REST API, controller and Tribble in the local region; remote regions]

Tablet Splitting and Balancing
  • Each storage unit has many tablets (horizontal partitions of the table)
  • Tablets may grow over time
  • Overfull tablets split
  • A storage unit may become a hotspot
  • Shed load by moving tablets to other servers

QUERY PROCESSING

Accessing Data
  • [Figure: 1. Get key k (to router), 2. Get key k (to storage unit), 3. Record for key k, 4. Record for key k]

Bulk Read
  • [Figure: 1. the client sends {k1, k2, … kn} to the scatter/gather server, 2. which issues Get k1, Get k2, Get k3 to the storage units]

Range Queries in YDOT
  • Clustered, ordered retrieval of records
  • [Figure: the router holds an interval map over the sorted keys (Apple … Watermelon): keys before Cantaloupe go to storage unit 1, Cantaloupe up to Lime to storage unit 3, Lime up to Strawberry to storage unit 2, Strawberry onwards to storage unit 1. The range query "Grapefruit…Pear?" is split into "Grapefruit…Lime?" and "Lime…Pear?"]

Updates
  • [Figure: write path through routers, storage units and message brokers: 1-3. Write key k, 5. SUCCESS, 6. Write key k, 7-8. Sequence # for key k]

REPLICATION AND CONSISTENCY

Asynchronous Replication
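The range-query figure suggests the router keeps a sorted list of boundary keys and splits a query range at each boundary it crosses. Here is a minimal sketch using the boundaries from that figure; treating each boundary as the inclusive start of the next interval is an assumption, as are the names `route_key` and `split_range`.

```python
import bisect

# Interval map taken from the figure: boundary keys and the storage unit
# serving each interval. Boundaries are assumed to be inclusive lower bounds.
BOUNDARIES = ["Cantaloupe", "Lime", "Strawberry"]
UNITS = ["Storage unit 1", "Storage unit 3", "Storage unit 2", "Storage unit 1"]


def route_key(key):
    """Return the storage unit whose interval contains key."""
    return UNITS[bisect.bisect_right(BOUNDARIES, key)]


def split_range(start, end):
    """Split the key range [start, end) into per-storage-unit sub-ranges."""
    pieces = []
    i = bisect.bisect_right(BOUNDARIES, start)
    lo = start
    # Cut the range at every boundary that falls strictly inside it.
    while i < len(BOUNDARIES) and BOUNDARIES[i] < end:
        pieces.append((lo, BOUNDARIES[i], UNITS[i]))
        lo = BOUNDARIES[i]
        i += 1
    pieces.append((lo, end, UNITS[i]))
    return pieces
```

Calling `split_range("Grapefruit", "Pear")` yields one sub-range ending at Lime for storage unit 3 and one starting at Lime for storage unit 2, which matches the split shown in the figure.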
Consistency Model
  • Goal: make it easier for applications to reason about updates and cope with asynchrony
  • What happens to a record with primary key "Alice"?
  • [Figure: record timeline. The record is inserted, updated several times (v. 1 to v. 8, Generation 1), then deleted]
  • As the record is updated, copies may get out of sync.

Example: Social Alice
  • [Figure: Alice's status record (Busy, then Free) as seen in the West and East regions; some copies show "???" while an update is in flight, next to the record timeline]

Consistency Model: reads and writes
  • [Figure: versions v. 1 to v. 8 of Generation 1, with one current version and two stale versions]
  • Read: in general, reads are served using a local copy
  • Read up-to-date: but the application can request and get the current version
  • Read ≥ v.6: or variations such as "read forward". While copies may lag the master record, every copy goes through the same sequence of changes
  • Write: achieved via a per-record primary copy protocol
    • To maximize availability, record mastership is automatically transferred if a site fails
    • Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors)
  • Write if = v.7 (returns ERROR in the figure): test-and-set writes facilitate per-record transactions

Mastering Techniques
  • Per-record mastering
    • Each record is assigned a "master region"
      • May differ between records
    • Updates to the record are forwarded to the master region
    • Ensures consistent ordering of updates
  • Tablet-level mastering
    • Each tablet is assigned a "master region"
    • Inserts and deletes of records are forwarded to the master region
    • The master region decides tablet splits
  • These details are hidden from the application
    • Except for the latency impact!
  • [Figure: three replicas of the Parts table, with per-record master regions and a tablet master]

Bulk Insert/Update/Replace
  • 1. The client feeds records to the bulk manager
  • 2. The bulk loader transfers records to the SUs in batches
    • Bypasses routers and message brokers
    • Efficient import into the storage unit

Bulk Load in YDOT
  • YDOT bulk inserts can cause performance hotspots
  • Solution: preallocate tablets

Index Maintenance
  • How to have lots of interesting indexes and views, without killing performance?
  • Solution: asynchrony!
    • Indexes/views are updated asynchronously when the base table is updated

CONTEXT

Types of Record Stores
  • Query expressiveness, from simple to feature rich:
    • Object retrieval: S3
    • Retrieval from a single table of objects/records: PNUTS
    • SQL: Oracle
  • Consistency model, from best effort to strong guarantees:
    • Eventual consistency: S3 (object-centric consistency)
    • Timeline consistency: PNUTS
    • ACID: Oracle (program-centric consistency)
  • Data model, from flexibility and schema evolution to optimized for fixed schemas:
    • CouchDB, PNUTS (object-centric consistency)
    • Oracle (consistency spans objects)
  • Elasticity (ability to add resources on demand), from inelastic to elastic:
    • Limited (via data distribution): Oracle
    • VLSD (Very Large Scale Distribution/Replication): PNUTS, S3

Data Stores Comparison
  • User-partitioned SQL stores
    • Microsoft Azure SDS
    • Amazon SimpleDB
  • Multi-tenant application databases
    • Salesforce
    • Oracle on Demand
  • Mutable object stores
    • Amazon S3
  • Versus PNUTS
    • More expressive queries
    • Users must control partitioning
    • Limited elasticity
    • Highly optimized for complex workloads
    • Limited flexibility to evolving applications
    • Inherit limitations of the underlying data management system
    • Object storage versus record management

Application Design Space
  • [Figure: systems placed on two axes, records vs. files and "get a few things" vs. "scan everything": Sherpa, MObStor, Everest, Hadoop, YMDB, MySQL, Filer, Oracle, BigTable]

Comparison Matrix
  • [Figure: Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo and Cassandra compared on elasticity, operability, global low latency, availability, structured access, updates, consistency model and SQL/ACID]

How do you scale up applications?
  • Run jobs processing 100's of terabytes of data
  • Takes 11 days to read on 1 computer
  • Need lots of cheap computers
    • Fixes the speed problem (15 minutes on 1000 computers), but…
    • Reliability problems
      • In large clusters, computers fail every day
      • Cluster size is not fixed
  • Need common infrastructure
    • Must be efficient and reliable

Open Source Apache Project
  • Hadoop Core includes:
    • Distributed File System - distributes data
    • Map/Reduce - distributes application
  • Written in Java
  • Runs on
    • Linux, Mac OS/X, Windows, and Solaris
    • Commodity hardware

Hardware Cluster of Hadoop
  • Typically in a 2-level architecture
    • Nodes are commodity PCs
    • 40 nodes/rack
    • Uplink from rack is 8 gigabit
    • Rack-internal is 1 gigabit

Distributed File System
  • Single namespace for the entire cluster
    • Managed by a single namenode.
    • Files are single-writer and append-only.
    • Optimized for streaming reads of large files.
  • Files are broken into large blocks.
    • Typically 128 MB
    • Replicated to several datanodes, for reliability
  • Access from Java, C, or command line.
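To make the block layout concrete, the arithmetic below estimates how many 128 MB blocks a file occupies and how much raw disk its replicas use. It is a back-of-the-envelope sketch: the function names are hypothetical, and the replication factor is left as a caller-supplied parameter.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # "typically 128 MB", per the slide


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for file_size bytes (the last block may be partial)."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    return (file_size + block_size - 1) // block_size  # integer ceiling division


def raw_bytes(file_size, replication):
    """Raw disk consumed when every block is stored `replication` times.

    Assumes a partial last block only occupies its actual size.
    """
    return file_size * replication
```

Under these assumptions a 1 TiB file spans 8192 blocks, and storing three copies of each block would consume 3 TiB of raw disk.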
Placement
• Default is 3 replicas, but settable
• Blocks are placed (writes are pipelined):
  • On same node
  • On different rack
  • On the other rack
• Clients read from closest replica
• If the replication for a block drops below target, it is automatically re-replicated.

is Yahoo using Hadoop?
• Started with building better applications
  • Scale up web scale batch applications (search, ads, …)
  • Factor out common code from existing systems, so new applications will be easier to write
  • Manage the many clusters

Production WebMap
• Search needs a graph of the "known" web
  • Invert edges, compute link text, whole graph heuristics
• Periodic batch job using Map/Reduce
  • Uses a chain of ~100 map/reduce jobs
• Scale
  • 1 trillion edges in graph
  • Largest shuffle is 450 TB
  • Final output is 300 TB compressed
  • Runs on 10,000 cores
  • Raw disk used 5 PB

Sort Benchmark
• Started by Jim Gray at Microsoft in 1998
• Sorting 10 billion 100-byte records
• Hadoop won the general category in 209 seconds
  • 910 nodes
  • 2 quad-core Xeons @ 2.0 GHz / node
  • 4 SATA disks / node
  • 8 GB RAM / node
  • 1 Gb Ethernet / node
  • 40 nodes / rack
  • 8 Gb Ethernet uplink / rack
• Previous record was 297 seconds

clusters
• We have ~20,000 machines running Hadoop
• Our largest clusters are currently 2000 nodes
• Several petabytes of user data (compressed, unreplicated)
• We run hundreds of thousands of jobs every month

Cluster Usage
(Figure)

Who Uses Hadoop?
• Amazon/A9
• AOL
• Facebook
• Fox Interactive Media
• Google / IBM
• New York Times
• PowerSet (now Microsoft)
• Quantcast
• Rackspace/Mailtrust
• Veoh
• Yahoo!
• More at ://

For more information:
• Website: :///core
• Mailing lists:
  • core-dev@hadoop.apache
  • core-user@hadoop.apache

Outline
• Cloud computing overview
• Google cloud computing technologies: GFS, Bigtable, and MapReduce
• Yahoo cloud computing technologies and Hadoop
• Challenges of cloud data management

Reading
• Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan. Efficient Bulk Insertion into a Distributed Ordered Table. SIGMOD 2008.
• Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.
• Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan. Asynchronous View Maintenance for VLSD Databases. SIGMOD 2009.
• Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data, O'Reilly Media, 2009.

Reading
• F. Chang et al. Bigtable: A distributed storage system for structured data. In OSDI, 2006.
• J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.
• G. DeCandia et al. Dynamo: Amazon's highly available key-value store. In SOSP, 2007.
• S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proc. SOSP, 2003.
• D. Kossmann. The state of the art in distributed query processing. ACM Computing Surveys, 32(4):422–469, 2000.
