Unlike H.264/AVC, where parallelism was an afterthought, the current HEVC draft contains several proposals that aim to make the codec easier to parallelize. H.264/AVC supports slices, which were introduced mainly to prevent loss of quality in the case of transmission errors but can also be used to parallelize the decoder. Employing slices for parallelism, however, has several problems. First and foremost, using many slices to increase parallelism incurs significant coding losses. Second, the number of slices is determined by the encoder, so a decoder that relies on slices to obtain real-time performance may not achieve it when it receives a video sequence with only one or a few slices per frame. One of the two parallelization approaches included in HEVC is Wavefront Parallel Processing (WPP). WPP allows creating picture partitions that can be processed in parallel without incurring high coding losses.
Wavefront Parallel Processing (WPP) processes rows of treeblocks in parallel while preserving all coding dependencies. Since a treeblock being processed requires the left, top-left, top, and top-right treeblocks to be available for predictions to operate correctly, a shift of at least two treeblocks is enforced between consecutive rows of treeblocks processed in parallel. WPP therefore requires additional inter-core communication compared to Tiles in the non-cross-border filtering mode. Inter-core communication is typically not a burden for today’s multi-core processor architectures, so WPP is suited for both software and hardware implementations. In particular, implementations of WPP are straightforward, since WPP does not affect the ability to perform single-step processing, i.e. entropy coding, predictive coding, and in-loop filtering can be applied in a single processing step. An example use case for WPP may be high-quality streaming over robust channels. In combination with Dependent Slices, this tool can also be used in ultra-low-delay applications.
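The two-treeblock shift between rows can be illustrated with a small scheduling sketch. It is not taken from any HEVC decoder: the function name wpp_schedule is hypothetical, and the model assumes one thread per treeblock row and a cost of one time step per treeblock. Each treeblock waits for its left neighbour and for the top-right neighbour in the row above (the top neighbour at the last column); because rows are processed left to right, this also covers the top and top-left treeblocks.

```python
def wpp_schedule(rows, cols):
    """Earliest time step at which each treeblock can start.

    Simplified model (assumption): one thread per row, unit cost per treeblock.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            t = 0
            if c > 0:
                # Left neighbour in the same row must be finished.
                t = max(t, start[r][c - 1] + 1)
            if r > 0:
                # Top-right neighbour must be finished; the last column
                # has no top-right neighbour, so it waits for the top one.
                dep = min(c + 1, cols - 1)
                t = max(t, start[r - 1][dep] + 1)
            start[r][c] = t
    return start


if __name__ == "__main__":
    # Print the wavefront for a picture of 4 x 6 treeblocks.
    for row in wpp_schedule(4, 6):
        print(" ".join(f"{t:2d}" for t in row))
```

In this model each row starts two time steps after the row above it, so treeblock (r, c) starts at step 2r + c whenever a row has at least two treeblocks. The threads ramp up and down at the beginning and end of each picture, which is where parallel efficiency is lost.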
Overlapped Wavefront (OWF) allows overlapping the execution of consecutive pictures using Wavefronts. When a thread has finished a treeblock row in the current picture and no more rows are available, it can start processing the next picture instead of waiting for the current picture to finish.
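The effect of overlapping can be sketched with a toy simulation that compares picture-by-picture WPP with OWF for a fixed number of threads. The function name simulate and all of its parameters are hypothetical. The model assumes unit cost per treeblock and, as a loud simplification, ignores dependencies between pictures; a real decoder must also make sure that the reference areas needed for motion compensation are already available.

```python
def simulate(pictures, rows, cols, threads, overlap):
    """Count time steps needed to process all pictures (toy model).

    Assumptions: unit cost per treeblock, no inter-picture dependencies.
    """
    done = [[0] * rows for _ in range(pictures)]  # treeblocks finished per row
    total_rows = pictures * rows
    next_row = 0    # next unassigned row, counted across all pictures
    rows_done = 0
    active = []     # (picture, row) pairs currently owned by a thread
    steps = 0
    while rows_done < total_rows:
        # Hand free threads the next rows in decoding order.
        while len(active) < threads and next_row < total_rows:
            p, r = divmod(next_row, rows)
            if not overlap and rows_done < p * rows:
                break  # plain WPP: wait until the current picture is complete
            active.append((p, r))
            next_row += 1
        # A row advances if it is the top row of its picture or if the
        # top-right treeblock in the row above is already finished.
        ready = [(p, r) for p, r in active
                 if r == 0 or done[p][r - 1] >= min(done[p][r] + 2, cols)]
        for p, r in ready:
            done[p][r] += 1
            if done[p][r] == cols:
                active.remove((p, r))
                rows_done += 1
        steps += 1
    return steps


if __name__ == "__main__":
    print("WPP:", simulate(2, 4, 6, 4, overlap=False))
    print("OWF:", simulate(2, 4, 6, 4, overlap=True))
```

For two pictures of 4 x 6 treeblocks and four threads, this model needs 24 steps without overlap and 18 steps with overlap, because threads that would otherwise idle at the end of the first picture already begin the second one.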
Related Publications
Chi Ching Chi, Mauricio Alvarez-Mesa, Ben Juurlink, Gordon Clare, Félix Henry, Stéphane Pateux and Thomas Schierl:
Parallel Scalability and Efficiency of HEVC Parallelization Approaches,
IEEE Transactions on Circuits and Systems for Video Technology, IEEE TCSVT, Special Issue on Emerging Research and Standards in Next Generation Video Coding, to appear 2012.
Chi Ching Chi, Mauricio Alvarez-Mesa, Ben Juurlink, Valeri George, and Thomas Schierl:
Improving the Parallelization Efficiency of HEVC Decoding,
Proceedings of IEEE International Conference on Image Processing (ICIP 2012), Orlando, FL, USA, September 2012, accepted.
Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink, Valeri George, and Thomas Schierl:
Parallel Video Decoding In The Emerging HEVC Standard,
Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan, March 2012.