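Chapter 12: MPEG Video Coding II
Slide 1

Contents
• MPEG-4 overview
• Visual object coding
• Synthetic object coding

MPEG-4 overview
Slide 3

Features of MPEG-4 visual object coding
• Integration: natural and synthetic audio-visual objects are combined
• Interactivity: selective playback, hyperlinks, etc.
• High-efficiency compression: 1/5 to 1/10 of the MPEG-2 bit rate at nearly the same quality

Coding of MPEG-4 visual objects
Slide 5

First-generation video coding
• The smallest entity in a picture is a pixel with its associated texture (color) and motion
• Message to be coded for every pixel: texture (color) + motion
Slide 6

Shortcomings of first-generation video coding
– It differs fundamentally from how human vision works
– Individual objects in a scene are hard to control
– Its potential is limited
Slide 7

Second-generation video coding
• A scene is divided into its constituent objects, and each object is coded separately
Slide 8

Second-generation video coding
Slide 9

Second-generation video coding
• The smallest entity in a picture is an object with its associated shape, texture (color), and motion
• Message to be coded for every object: shape + texture (color) + motion
Slide 10

The MPEG-4 audio-visual scene
Slide 11

Description of the MPEG-4 audio-visual scene
• In MPEG-4, an audio-visual scene is described in an object-based way: the scene is composed hierarchically (as a tree) from media objects, and the leaf nodes are primitive media objects, for example:
– still images (a fixed background)
– video objects (a talking person without the background)
– audio objects (the voice of that person)
– others, such as text and graphics
• Primitive media objects can be natural or synthetic, and 2D or 3D
• The BIFS (Binary Format for Scenes) language describes the composition of the scene and the spatio-temporal relationships among its audio-visual objects
Slide 12

The MPEG-4 audio-visual scene
[Figure labels: hypothetical viewer position; video compositor; projection plane; scene coordinate system; user input; downloaded multiplexed data/control stream; uploaded multiplexed data/control stream. Logical structure of the scene (tree nodes): scene, person, 2D background, furniture, presentation, globe, desk, voice, teacher.]
Slide 13

Advantages of MPEG-4 scene description
• All kinds of objects can be integrated: natural media (from microphones, cameras, etc.) and synthetic media (computer-generated), as well as real-time and stored information, are combined seamlessly. An AVO can be mono/stereo/multichannel audio or single-/dual-/multi-camera 2D/3D video.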

• Stronger interactivity: the objects in the scene (person, desk, globe, whiteboard, the person's voice) and the presentation audio are each coded independently as single objects, so the user can interact selectively with one or several of them
• Good reusability: audio-visual objects (AVOs) can be recombined to build new scenes
Slide 14

BIFS example
Slide 15

MPEG-4 video stream structure
Visual Object Sequence (VS)
Video Object (VO)
Video Object Layer (VOL)
Group of VOPs (GOV)
Video Object Plane (VOP)
Slide 16

VOP coding
• A VOP is described by its shape, motion, and texture
[Figure: VOP encoder. Blocks: input VOP (VOP of arbitrary shape), shape coding, motion estimation, motion compensation, previously reconstructed VOP, texture coding, MUX, buffer. Outputs: shape info, motion info, texture info.]
Slide 17

VOP-based motion compensation
• MC-based VOP coding in MPEG-4 again involves three steps:
– Motion estimation
– MC-based prediction
– Coding of the prediction error
• Only pixels within the current (target) VOP are considered for matching in MC.
• To facilitate MC, each VOP is divided into macroblocks (MBs). MBs are by default 16×16 in luminance images and 8×8 in chrominance images.
Slide 18

Motion Compensation
Slide 19

Padding
An example of repetitive padding in a boundary macroblock of a reference VOP: (a) original pixels within the VOP, (b) after horizontal repetitive padding, (c) followed by vertical repetitive padding.
Slide 20

Motion Vector
• Each macroblock in the target VOP searches the reference VOP for its best-matching macroblock.
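
A common way to score candidate matches in this search is a sum of absolute differences (SAD) restricted to pixels inside the target VOP. The form below is a sketch under assumptions: $R$ (the reference VOP), $(x_0, y_0)$ (the origin of the macroblock), and $(i, j)$ (the candidate displacement) are names introduced here for illustration.

```latex
\mathrm{SAD}(i, j) = \sum_{p=0}^{N-1} \sum_{q=0}^{N-1}
  \left| C(x_0 + p,\; y_0 + q) - R(x_0 + i + p,\; y_0 + j + q) \right|
  \cdot \mathrm{Map}(x_0 + p,\; y_0 + q)
```

The displacement $(i, j)$ with the smallest SAD within the search range is then typically taken as the motion vector.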

• N is the size of the MB. Map(p, q) = 1 when C(p, q) is a pixel within the target VOP; otherwise Map(p, q) = 0.
• Motion vectors are coded predictively, as in H.263
Slide 21

Texture Coding
• Texture coding in MPEG-4 can be based on:
– DCT, or
– Shape Adaptive DCT (SA-DCT).
• I. Texture coding based on DCT
– In an I-VOP, the gray values of the pixels in each MB of the VOP are coded directly using the DCT followed by VLC, similar to what is done in JPEG.
– In a P-VOP or B-VOP, MC-based coding is employed: it is the prediction error that is sent to DCT and VLC.
Slide 22

Texture Coding (cont.)
• Coding for the interior MBs:
– Each MB is 16×16 in the luminance VOP and 8×8 in the chrominance VOP.
– Prediction errors from the six 8×8 blocks of each MB are obtained after the conventional motion estimation step.
• Coding for boundary MBs:
– For portions of the boundary MBs in the target VOP that lie outside the VOP, zeros are padded to the block sent to DCT, since ideally prediction errors would be near zero inside the VOP.
– After MC, texture prediction errors within the target VOP are obtained.
Slide 23

Shape Adaptive DCT
• Advantage: no superfluous coefficients are produced
• Disadvantage: an extra mask is needed to record the original shape
• Shape Adaptive DCT (SA-DCT) is another texture coding method for boundary MBs.
• Due to its effectiveness, SA-DCT has been adopted for coding boundary MBs in MPEG-4 Version 2.
Slide 24

Shape Adaptive DCT (cont.)
Slide 25

Shape Coding
• MPEG-4 supports two types of shape information: binary and gray scale.
• Binary shape information can take the form of a binary map (also known as a binary alpha map) of the same size as the rectangular bounding box of the VOP.
• A value of '1' (opaque) or '0' (transparent) in the bitmap indicates whether the pixel is inside or outside the VOP.
• Alternatively, gray-scale shape information refers to the transparency of the shape, with gray values ranging from 0 (completely transparent) to 255 (opaque).
Slide 26

Sprite Coding
• The segmented foreground image is coded as a VO of arbitrary shape.
• The background image (sprite) is assembled before coding by extracting the background from a series of video frames and stitching it together.
• It is transmitted only once, with the first frame of the video sequence, and kept in a background buffer; afterwards only the 8 parameters describing the camera motion are transmitted.
• Using these 8 parameters, an affine transformation is applied to the background to reconstruct the background of each frame.

Synthetic object coding in MPEG-4
Slide 28

2D Mesh Coding
• Uniform mesh
• Delaunay mesh
Slide 29

Coding of Delaunay Triangulation
• Except for the first location $(x_0, y_0)$, all subsequent coordinates are coded differentially; that is, for $n \ge 1$:
  $dx_n = x_n - x_{n-1}$, $dy_n = y_n - y_{n-1}$
• Afterward, $dx_n$ and $dy_n$ are variable-length coded.
Slide 30

2D Mesh Motion Coding
• A new mesh structure can be created only in an intra-frame, and its triangular topology does not change in the subsequent inter-frames; this enforces a one-to-one mapping in 2D mesh motion estimation.
• For any MOP (Mesh Object Plane) triangle $(P_i, P_j, P_k)$, if the motion vectors of $P_i$ and $P_j$ are known to be $MV_i$ and $MV_j$, a prediction $\mathrm{Pred}_k$ is made for the motion vector of $P_k$ and rounded to half-pixel precision:
  $\mathrm{Pred}_k = 0.5 \cdot (MV_i + MV_j)$
• The prediction error $e_k$ is coded as
  $e_k = MV_k - \mathrm{Pred}_k$
Slide 31

Coding of 3D synthetic objects
• Face animation
– MPEG-4 defines Face Definition Parameters (FDP) and Face Animation Parameters (FAP), as well as model parameters and animation parameters for the body.

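– The face model in the decoder can be driven by the received animation parameters to produce various movements, such as facial expressions and speech.
– A specific face can also be generated from a generic face model by downloading the face's model parameters.
Slide 32

Slide 33

Slide 34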
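
The Padding slide names two passes, horizontal and then vertical repetitive padding, without spelling out the per-pixel rule. The Python sketch below shows one common reading of it, under stated assumptions: an exterior pixel copies its nearest VOP pixel in the same row, or takes the integer average when VOP pixels lie on both sides, and the vertical pass treats horizontally padded pixels as known. The function names `pad_line` and `repetitive_padding` and the rounding choice are illustrative, not taken from the slides.

```python
from typing import List


def pad_line(values: List[int], inside: List[bool]) -> List[bool]:
    """Pad one row or column in place and return its updated 'known' mask.

    Assumed rule: an unknown pixel between two known pixels gets their
    integer average; an unknown pixel with a known pixel on one side only
    copies that pixel. A line with no known pixel is left untouched.
    """
    known = [k for k, flag in enumerate(inside) if flag]
    if not known:
        return inside[:]  # nothing to copy from; left for the other pass
    for k in range(len(values)):
        if inside[k]:
            continue
        left = max((j for j in known if j < k), default=None)
        right = min((j for j in known if j > k), default=None)
        if left is not None and right is not None:
            values[k] = (values[left] + values[right]) // 2
        elif left is not None:
            values[k] = values[left]
        else:
            values[k] = values[right]
    return [True] * len(values)


def repetitive_padding(block: List[List[int]],
                       mask: List[List[bool]]) -> List[List[int]]:
    """Return a padded copy of a boundary macroblock (horizontal, then vertical)."""
    height, width = len(block), len(block[0])
    pixels = [row[:] for row in block]
    known = [row[:] for row in mask]

    # Horizontal pass: each row is padded from its own VOP pixels.
    for y in range(height):
        known[y] = pad_line(pixels[y], known[y])

    # Vertical pass: rows that had no VOP pixels are filled column by column,
    # treating the horizontally padded pixels as known.
    for x in range(width):
        column = [pixels[y][x] for y in range(height)]
        column_known = pad_line(column, [known[y][x] for y in range(height)])
        for y in range(height):
            pixels[y][x] = column[y]
            known[y][x] = column_known[y]
    return pixels


T, F = True, False
example_block = [[0, 0, 0, 0],
                 [0, 10, 0, 30],
                 [0, 0, 0, 0],
                 [0, 50, 0, 0]]
example_mask = [[F, F, F, F],
                [F, T, F, T],
                [F, F, F, F],
                [F, T, F, F]]
padded = repetitive_padding(example_block, example_mask)
```

In this small example the two rows that contain VOP pixels are filled first, and the two empty rows are then filled from the rows above and below them. A macroblock that lies entirely outside the VOP is left unchanged by this sketch.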

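
The two mesh-coding rules from the 2D mesh slides, differential coding of node coordinates and prediction of a node's motion vector from two known neighbours, can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names are hypothetical, ties in the half-pixel rounding are rounded up (the slides do not state a tie rule), and the variable-length coding step is omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]
Vector = Tuple[float, float]


def delta_encode(points: List[Point]) -> Tuple[Point, List[Point]]:
    """Keep the first node as is; code every later node as (dx_n, dy_n)."""
    deltas = [(x - prev_x, y - prev_y)
              for (prev_x, prev_y), (x, y) in zip(points, points[1:])]
    return points[0], deltas


def delta_decode(first: Point, deltas: List[Point]) -> List[Point]:
    """Invert delta_encode by accumulating the differences."""
    points = [first]
    for dx, dy in deltas:
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


def round_half_pixel(value: float) -> float:
    """Round to the nearest multiple of 0.5 (ties rounded up: an assumption)."""
    return math.floor(value * 2 + 0.5) / 2


def predict_mv(mv_i: Vector, mv_j: Vector) -> Vector:
    """Pred_k = 0.5 * (MV_i + MV_j), rounded to half-pixel precision."""
    return (round_half_pixel(0.5 * (mv_i[0] + mv_j[0])),
            round_half_pixel(0.5 * (mv_i[1] + mv_j[1])))


def mv_residual(mv_k: Vector, mv_i: Vector, mv_j: Vector) -> Vector:
    """e_k = MV_k - Pred_k, the value that would then be entropy coded."""
    pred = predict_mv(mv_i, mv_j)
    return (mv_k[0] - pred[0], mv_k[1] - pred[1])


nodes = [(10, 20), (13, 18), (13, 25)]
first_node, node_deltas = delta_encode(nodes)
prediction = predict_mv((1.0, 2.0), (2.5, -1.0))
residual = mv_residual((2.5, 1.0), (1.0, 2.0), (2.5, -1.0))
```

Because neighbouring mesh nodes and neighbouring motion vectors tend to be similar, both the coordinate differences and the prediction errors are usually small values, which is what makes the subsequent variable-length coding effective.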