The Tukwila Data Integration System
University of Washington
Advanced Database Technology lecture slides

Contents
- Tukwila Architecture
- Adaptive Query Execution
  - The Need for Adaptivity
  - Interleaving Planning and Execution
  - Adaptive Query Operators
- Implementation
- Future Work

Tukwila Architecture
- Client/user interface
  - The user poses queries in terms of the mediated relational schema.
  - Queries are fed to the reformulator.
- Reformulator
  - Uses the MiniCon algorithm to map the input query from the mediated schema to the data sources.
  - Consults the data source catalog.
- Data source catalog
  - Mappings from the available data sources to the mediated schema.
  - Overlap information about pairs of data sources.
  - Estimated access costs, relation sizes, etc.
- Optimizer
  - Produces partial plans.
  - Incrementally re-optimizes a query.
  - Generates event-condition-action rules.
- Adaptive query execution engine
  - Operator execution: processes query plans.
  - Event handler: dynamically interprets rules.
- Wrappers
  - Available for ODBC-compliant sources.
  - Translate data formats.
  - Location-independent.

Adaptive Query Execution

The Need for Adaptivity
- Absence of statistics (cardinalities, histograms).
- Unpredictable data arrival characteristics: initial delays before data arrives, then bursty arrival.
- Overlap and redundancy among sources: slow or unavailable data sources can often be replaced by overlapping or mirrored sources.

Interleaving Planning and Execution
- Query plans
- Query execution
- Event handling

Query Plans
- Makeup: a partially ordered set of fragments, plus a set of global rules.
- Execution:
  - Fragments related in the partial order execute in that order.
  - Fragments unrelated in the partial order may execute in parallel.
  - At the end of a fragment, its results are materialized; the rest of the plan can then be re-optimized or rescheduled, and the next fragment is chosen by rule or by the partial order.

Fragments and Operators
- Fragment: a fully pipelined tree of physical operators, plus a set of local rules.
- Operator:
  - An algebraic operation: selection, join, ...
  - A chosen physical implementation, e.g. the double pipelined join.
  - The node's children.
  - The memory allocated to it.

Rules
- Re-optimization
  - A cardinality estimate differs significantly from the actual size.
  - Checked at the end of a fragment.
- Contingent planning
  - Which fragment to select next.
  - Decided at the end of a fragment.
- Adaptive operators
  - The memory-overflow resolution policy in the double pipelined join.
  - The source-selection policy in the dynamic collector.
- Rescheduling
  - When a plan should be rescheduled, e.g. a source times out (as in query scrambling).

Rule structure
- Owner: an operator or a fragment.
- Trigger: a rule can fire only while it is active and its owner is active.
- Example:
  when closed(frag1) if card(join1) > 2 * est_card(join1) then replan

Rule events and conditions
- Events:
  - open, closed: a fragment/operator starts or completes.
  - error: operator failure.
  - timeout(n): a data source has not responded in n msec.
  - out_of_memory: a join has insufficient memory.
  - threshold(n): n tuples have been processed by an operator.
- Conditions (comparator terms):
  - state(operator): current state.
  - card(operator): number of tuples produced.
  - time(operator): time spent waiting since the last tuple.
  - memory(operator): memory used so far.

Rule actions
- Set the overflow method for a double pipelined join.
- Alter a memory allotment.
- Deactivate an operator or fragment.
- Reschedule the query operator tree.
- Re-optimize the plan.
- Return an error to the user.
Note:
- All actions execute before another event is fired.
- Rules with an inactive owner are themselves inactive.

Event Handling
- Events may be generated at any time and are placed on an event queue.
- For each event, the handler uses a hash table to find all matching rules in the active set.
- All actions execute before the next event is fired.

Adaptive Query Operators
- Dynamic collectors
- Double pipelined join, with two policies for handling memory overflow:
  - incremental left flush (fewer I/Os);
  - incremental symmetric flush (reduced latency).

Dynamic Collectors
- Task
  - Perform a union over overlapping sources (a disjunction over the leaves).
  - Handle errors, e.g. decide to ignore a slow mirror source by rules.
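The event-handling loop described above can be sketched in a few lines. This is a minimal illustration, not Tukwila's actual API: the class name `EventHandler` and its methods are hypothetical, but the sketch follows the stated design (rules indexed in a hash table by event, rules with an inactive owner never fire, and all actions for one event run before the next event is dequeued).

```python
from collections import defaultdict, deque

class EventHandler:
    """Toy sketch of an ECA-rule event handler (hypothetical names)."""
    def __init__(self):
        self.rules = defaultdict(list)   # hash table: event -> [(owner, cond, action)]
        self.active_owners = set()
        self.queue = deque()             # event queue

    def add_rule(self, event, owner, condition, action):
        self.rules[event].append((owner, condition, action))

    def post(self, event, payload=None):
        self.queue.append((event, payload))

    def run(self):
        fired = []
        while self.queue:
            event, payload = self.queue.popleft()
            # hash-table lookup of matching rules; a rule with an inactive
            # owner is itself inactive, so it is skipped
            for owner, cond, action in self.rules[event]:
                if owner in self.active_owners and cond(payload):
                    fired.append(action(payload))
            # all actions ran before the next event is dequeued
        return fired

# Mirrors "when closed(frag1) if card(join1) > 2 * est_card(join1) then replan":
h = EventHandler()
h.active_owners.add("frag1")
h.add_rule("closed(frag1)", "frag1",
           lambda p: p["card"] > 2 * p["est_card"],
           lambda p: "replan")
h.post("closed(frag1)", {"card": 500, "est_card": 100})
print(h.run())   # ['replan']
```

The dict-based rule index is the "hash table" mentioned on the Event Handling slide; draining the queue one event at a time enforces the rule that all actions execute before another event fires.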
Dynamic Collectors (cont.)
- Operator's information
  - A set of children (the sources).
  - A policy for contacting the sources: a set of triples, effectively event-condition-action rules, where
    - i is the i-th child,
    - a_i is an activation condition, and
    - t_i is a termination condition.
- Example:
  when opened(coll1) if true then activate(coll1, A); activate(coll1, B)
  when threshold(A, 10) if true then deactivate(coll1, B)
  when threshold(B, 10) if true then deactivate(coll1, A)
  when timeout(A) if true then activate(coll1, C); deactivate(coll1, B); deactivate(coll1, A)

Double Pipelined Join
- Character
  - A symmetric, incremental join.
  - Produces tuples almost immediately, masking slow data-source transmission rates.
  - Data-driven processing: each join relation sends tuples as quickly as possible; each incoming tuple probes the hash table of the other relation and is then added to the hash table of its own relation; a tuple that joins is output immediately.
- Advantages
  - Time to first output is minimized; tuples are output as quickly as the data sources allow.
  - The operator is symmetric, so the optimizer does not need to choose an "inner" relation.
  - It compensates for (masks) a slow data source: being data-driven, it processes the other source more quickly, making efficient use of the CPU by working on one source while waiting on the other.
- Disadvantages and solutions
  - Disadvantage 1: the operator is data-driven and bottom-up, but the system is top-down and iterator-based.
    - Solution: multithreading, with one thread each for the output, the left child, and the right child; tuples from the children are placed on a tuple transfer queue.
  - Disadvantage 2: it requires memory for both join relations.
    - Solutions: trade off some total execution time to get the first tuples sooner; the optimizer may choose conventional joins instead; several strategies deal efficiently with memory overflow.

Incremental Left Flush
- Essentials
  - Read tuples only from the right-side relation.
  - Flush a bucket from the left-side relation's hash table.
  - Gradually degrades into a hybrid hash join, flushing buckets lazily.
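Both flush strategies hinge on the same marking rule: a tuple that probes a bucket already flushed to disk cannot find its matches in memory, so it is marked and deferred to the final recursive hybrid hash join. A toy sketch of just that rule, with a hypothetical helper function and a simple modulo bucketing scheme (not Tukwila's code):

```python
def probe_with_marking(tuple_key, table, flushed_buckets, n_buckets):
    """Return (matches, marked): the in-memory matches for tuple_key, and
    whether the tuple must be marked because its bucket was flushed."""
    bucket = tuple_key % n_buckets
    if bucket in flushed_buckets:
        # possible matches are on disk -> defer tuple to the recursive join
        return [], True
    return list(table.get(tuple_key, [])), False

# After flushing the bucket that held key 3, only key 6 remains in memory.
table_a = {6: ["a6"]}
flushed = {3 % 4}
print(probe_with_marking(3, table_a, flushed, 4))   # ([], True)
print(probe_with_marking(6, table_a, flushed, 4))   # (['a6'], False)
```

Since only unmarked-with-unmarked pairs are guaranteed complete in memory, the recursive pass must join marked tuples against everything on the other side, exactly as steps (6) and (4) below enumerate.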
Incremental Left Flush
- Process:
  (1) Pause reading from A (A is the left input, B the right).
  (2) Flush some buckets from A's hash table.
  (3) Read tuples from B and probe A's (partial) hash table; mark a tuple if its matching bucket has been flushed.
  (4) If B's hash table runs out of memory, flush some of its buckets as well.
  (5) Once all of B has been read, resume processing tuples from A as in (3): tuples are probed or marked.
  (6) When A and B have been fully processed, run a recursive hybrid hash join over:
      - unmarked tuples from A with marked tuples from B, and
      - marked tuples from A with both unmarked and marked tuples from B.

Incremental Symmetric Flush
- Essentials
  - Flush a bucket from each relation's hash table.
  - Gradually degrades into a hybrid hash join, flushing buckets lazily.
- Process:
  (1) Flush a bucket from each of A's and B's hash tables.
  (2) Continue reading tuples from A and B.
  (3) For each newly arrived tuple, probe the opposite hash table; mark the tuple if its matching bucket has been flushed.
  (4) When A and B have been fully processed, run a recursive hybrid hash join over:
      - unmarked tuples with marked tuples, and
      - marked tuples with both unmarked and marked tuples.
      (Unmarked tuples have already been joined with unmarked tuples.)

Implementation
- Communication
  - Built on pre-existing standards: sockets, XML, Unicode.
  - TCP socket interfaces between modules.
- Characteristics
  - Scalability and distribution: the components can share a single machine or run on separate machines.
  - Language- and platform-independent: the execution engine is C++ on Windows NT; the optimizer and wrappers are Java.
- Plan representation
  - XML-based query plans.
- Architecture
  - Multithreaded, with prefetching for the dynamic collectors and the double pipelined join (performed by the OS, controlled by the execution engine).
  - Custom memory-management system.

Future Work
- Fully integrated support for processing XML data, based on the x-scan operator.
- Support for output of incremental results.
- Providing "preview" answers to queries:
  - semantics of preview results;
  - optimization and execution strategies for "preview queries".
- Development of techniques and algorithms for "convergent query processing":
  - re-optimizing a query at any point;
  - handling fairly complex classes of queries.
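As a closing illustration, the data-driven core of the double pipelined join described earlier fits in a few lines. This is a minimal in-memory sketch under assumed names and encodings (the function `double_pipelined_join` and the `(side, key, value)` tuple format are inventions for this example); it ignores threading and memory overflow and simply interleaves arrivals from both inputs.

```python
from collections import defaultdict

def double_pipelined_join(stream):
    """stream: iterable of (side, key, value) with side in {'L', 'R'},
    in arrival order. Yields (key, left_value, right_value) immediately."""
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, value in stream:
        other = "R" if side == "L" else "L"
        for match in tables[other][key]:     # probe the opposite hash table
            if side == "L":
                yield (key, value, match)
            else:
                yield (key, match, value)
        tables[side][key].append(value)      # then insert into own table

# Tuples arrive interleaved from either source; no "inner" relation exists.
arrivals = [("L", 1, "a"), ("R", 1, "x"), ("R", 2, "y"), ("L", 2, "b")]
print(list(double_pipelined_join(arrivals)))
# [(1, 'a', 'x'), (2, 'b', 'y')]
```

Because every tuple probes first and inserts second, each matching pair is emitted exactly once, as soon as its second member arrives, which is what minimizes time to first output.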