
4D Matrix DCT Coding Algorithm for Stereo Video

[Abstract] An effective stereo video coding method is proposed based on four-dimensional matrix DCT (4D-MDCT) theory. Conventional methods perform both disparity estimation (DE) and motion estimation (ME) for every stereo pair. The proposed method uses the 4D-MDCT to remove the redundancy between the two views, so only one third of the ME computation is needed and no DE is performed at all. The first stereo pair is called the “I stereo pair”; each subsequent group of three adjacent stereo pairs is called a “P stereo pair” and is stored in the form of 4D units (4DUs). ME is performed only on the blocks at the middle time instant of a “P stereo pair”, and the resulting motion vector is used as the uniform motion vector of the whole 4DU for motion compensation (MC). The 4D-MDCT is then applied to further remove temporal and spatial redundancy. Experimental results show that the method achieves good coding performance at a greatly reduced computational complexity.

[Keywords] stereo video coding; stereo pair; 4D matrix DCT; 4D unit

As early as 1928, John Logie Baird in Great Britain and Paul Nipkow in Germany mentioned stereoscopic television[1]. Since then, however, few people paid attention to the issue. During the last 5 years the situation has changed, and stereoscopic television has again become important in both research and industry, owing to advances in camera technology, computing power, rendering technology and 3D displays. Stereo video gives users 3D scene perception by showing two different frames to the two eyes simultaneously, so it has wide application in many fields, such as 3D television, 3D video applications, robot vision, virtual machine, medical surgery and so on[2]. Although stereo video is attractive, the amount of video data and the computational complexity are doubled. A good coding system is required to handle this large amount of data within limited bandwidth.

Coding schemes for stereo video have to be adapted to the selected scene representation. One such representation is polygonal data given by 3D points and their connectivity. Typically these are stored and transmitted in text format, which is a tremendous waste of capacity. ISO MPEG has recognized the importance of efficient 3D mesh compression and included in MPEG-4 a tool called 3D Mesh Compression (3DMC) for static meshes, which exploits the spatial dependencies of adjacent polygons[3]. The common system is a straightforward solution that encodes all the video signals independently using a state-of-the-art video codec such as MPEG-2/H.264[4] or a wavelet-based codec[5].
These stereo video systems treat one of the stereo video sequences as the main sequence and encode it with the MPEG-2/H.264 video coding standard at high visual quality. They treat the other sequences as auxiliary sequences and predict them both from the main sequence using disparity compensation (DC) and from the previous auxiliary frames using motion compensation (MC)[6-10].

The main principle of image compression techniques is to remove data redundancy, which is inherent in spatial and temporal data correlation. Instead of calculating both disparity estimation (DE) and motion estimation (ME) for every stereo pair, this paper proposes a novel method based on 4D matrix DCT (4D-MDCT) theory. Only every third ME calculation and no DE calculation is needed, so the computational complexity is greatly reduced.

The rest of the paper is organized as follows. Section 1 describes the proposed stereo video coding system. Experimental results are shown in Section 2. Finally, Section 3 gives the conclusion.

1 Proposed stereo video coding system

In the proposed method, only every third ME is calculated, whereas traditional algorithms compute both DE and ME for every stereo pair. The 4D-MDCT was then used to further eliminate the spatial and temporal redundancy.

According to the characteristics of the human visual system (HVS), a quantization table based on the 4D matrix (4D-M) was generated and used. The transformed coefficients were then scanned according to different modes and encoded by context-based adaptive variable length coding[4].

1.1 Group of stereo pairs

A group of stereo pairs as described in this paper consisted of one “I stereo pair” and three “P stereo pairs”. In Fig. 1, the first stereo pair was taken as the “I stereo pair”, and each subsequent set of three adjacent stereo pairs was referred to as a “P stereo pair”. The left and right frames of a stereo pair were labeled L and R, respectively, and the width and height of each stereo frame were denoted X and Y, respectively.
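The grouping described here can be sketched as a simple partition of time indices. The following is a minimal illustration, not the authors' implementation: the function name `group_stereo_pairs`, the dictionary keys, and the handling of trailing pairs that do not fill a complete group of three are all assumptions made for this example. It also marks the middle time instant of each group, which is the only one that requires ME in the proposed scheme.

```python
def group_stereo_pairs(num_pairs, k=0):
    """Split stereo pairs at T=k .. k+num_pairs-1 into one I stereo pair
    followed by P stereo pairs of three adjacent time instants each.

    Hypothetical helper for illustration only.
    """
    last = k + num_pairs - 1
    p_groups = []
    t = k + 1  # T=k itself is the I stereo pair
    while t + 2 <= last:
        # ME is needed only at the middle instant of each group of three.
        p_groups.append({"times": (t, t + 1, t + 2), "me_time": t + 1})
        t += 3
    # Assumption: pairs that do not fill a whole group are returned unhandled.
    leftover = list(range(t, last + 1))
    return k, p_groups, leftover


i_time, p_groups, leftover = group_stereo_pairs(7)
for group in p_groups:
    print(group["times"], "ME at T =", group["me_time"])
```

With seven stereo pairs starting at T=0, this yields one I stereo pair and two groups of three, so ME is run at two of the six predicted time instants, which matches the one-third figure quoted in the text.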
Only the frames at the middle time instant of a “P stereo pair” were calculated by ME from the former reconstructed frames.

As shown in Fig. 1, the first “P stereo pair” consisted of T=k+1, T=k+2 and T=k+3. The shaded blocks in the first “P stereo pair” were taken into a uniform 4D unit (4DU), which could be described in 4D matrix form as $P_{I \times J \times 2 \times 3} = [a_{ijck}]_{I \times J \times 2 \times 3}$. The four dimensions of the 4DU are image width, image height, stereo video channel and the three adjacent frames in each channel, respectively. ME was calculated only between the blocks at T=k+2 and T=k. $MV_{k+2}$ is the motion vector of T=k+2, but it was taken as the uniform motion vector $MV_s$ of the shaded 4DU in the first “P stereo pair”, i.e. $MV_s = MV_{k+2}$. The MC of the shaded 4DU was then calculated with this uniform $MV_s$ from the former reconstructed frames, so the computational complexity of ME could be greatly reduced.

Fig. 1 A group of stereo pairs

1.2 Architecture of 4D-MDCT based stereo video encoder

Fig. 2 shows the architecture of the 4D-MDCT based stereo video encoder. In Fig. 2, $I_k$ represented the “I stereo pair” at T=k, and $P_{k+n}$ represented a “P stereo pair”. F′ represented the reconstructed reference frame. D was the residual information, and D′ consisted of some erro
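To make the 4DU of Section 1.1 concrete, the sketch below builds an I×J×2×3 unit from co-located blocks of three left and three right frames and transforms it. The exact definition of the 4D-MDCT is not given in this section, so the example substitutes a separable orthonormal type-II DCT applied along each of the four axes in turn; this is an assumption, as are the names `make_4du`, `dct_1d` and `dct_4d` and the dictionary-based storage. Motion compensation, quantization and scanning are left out.

```python
import math
from itertools import product


def make_4du(left_frames, right_frames, x0, y0, size):
    """Collect co-located size x size blocks from three left and three right
    frames into a dict keyed by (i, j, c, k): column, row, channel, time."""
    unit = {}
    for c, frames in enumerate((left_frames, right_frames)):  # c=0: L, c=1: R
        for k, frame in enumerate(frames):                    # time instants
            for j in range(size):
                for i in range(size):
                    unit[(i, j, c, k)] = float(frame[y0 + j][x0 + i])
    return unit, (size, size, 2, len(left_frames))


def dct_1d(x):
    """Orthonormal type-II DCT of a 1D sequence."""
    n = len(x)
    out = []
    for u in range(n):
        scale = math.sqrt((1.0 if u == 0 else 2.0) / n)
        acc = sum(x[m] * math.cos(math.pi * (2 * m + 1) * u / (2 * n))
                  for m in range(n))
        out.append(scale * acc)
    return out


def dct_4d(unit, shape):
    """Apply dct_1d along each of the four axes in turn (separable stand-in
    for the paper's 4D-MDCT, which is not defined in this section)."""
    data = dict(unit)
    for axis in range(4):
        others = [range(s) for a, s in enumerate(shape) if a != axis]
        for rest in product(*others):
            # All index tuples along this axis with the other three fixed.
            keys = [rest[:axis] + (t,) + rest[axis:]
                    for t in range(shape[axis])]
            for key, value in zip(keys, dct_1d([data[q] for q in keys])):
                data[key] = value
    return data


# Toy data: three flat 4x4 frames per channel, gray level 10 (L) and 12 (R).
left = [[[10] * 4 for _ in range(4)] for _ in range(3)]
right = [[[12] * 4 for _ in range(4)] for _ in range(3)]
unit, shape = make_4du(left, right, 0, 0, 4)
coeffs = dct_4d(unit, shape)
significant = {q: v for q, v in coeffs.items() if abs(v) > 1e-9}
print(len(unit), "samples ->", len(significant), "non-zero coefficients")
```

For this flat toy input the 96 samples collapse into two non-zero coefficients, one for the mean level and one for the difference between the left and right channels. That is the kind of inter-view and temporal redundancy the transform is meant to remove without any explicit disparity estimation.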