After the great success of Intel's MMX technology, growing demand for
more complex algorithms based on floating-point calculations drove Intel
to define another new technology. This time, Intel defined a new set of
instructions and data types for floating-point algorithms, such as 3D and
advanced signal and image processing, and extended MMX technology support
for integer-based algorithms, all while maintaining compatibility with
existing software designed for the Intel architecture. The new technology
also included memory operations that can accelerate memory-intensive
algorithms, especially multimedia applications, which typically work on
large blocks of memory.
Subsequent projects in 3D and video applications have demonstrated that
the Pentium III processor is well suited to multimedia applications.
One of the most impressive of these projects is a high-resolution,
real-time MPEG2 encoder. This paper describes how the Pentium III
processor and Streaming SIMD Extensions can improve the performance of
integer-based applications, using examples from the MPEG encoder
application.
Motion Estimation & Motion Compensation
To make the examples easier to follow, this section first introduces two
of the most basic operations in video compression: Motion Estimation (ME)
and Motion Compensation (MC).
ME is performed during encoding. It exploits the fact that each frame in
a sequence is usually almost the same as the previous frame. For a given
block in the current frame, the technique searches for the block's
location in the previous frame by comparing it to a set of candidate
blocks in the previous frame. The output of this operation for each block
is a motion vector.
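As a rough illustration of the search described above, the following C sketch does an exhaustive block match using the sum of absolute differences (SAD) as the comparison metric. It is not taken from the encoder discussed in this paper: the 16x16 block size, the exhaustive search strategy, and the names `MotionVector`, `block_sad`, and `estimate_motion` are all assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BLK 16  /* block width and height in pixels (assumed) */

/* Hypothetical type: displacement of the best match in the previous frame. */
typedef struct { int dx, dy; } MotionVector;

/* Sum of absolute differences between two BLK x BLK blocks. */
static unsigned block_sad(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            sad += (unsigned)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* Exhaustive search: try every displacement within +/-range of (bx, by)
 * and keep the one with the lowest SAD. Candidates that would fall
 * outside the previous frame are skipped. */
static MotionVector estimate_motion(const uint8_t *cur, const uint8_t *prev,
                                    int width, int height, int stride,
                                    int bx, int by, int range)
{
    assert(bx >= 0 && by >= 0 && bx + BLK <= width && by + BLK <= height);

    MotionVector best = {0, 0};
    unsigned best_sad = UINT_MAX;

    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + BLK > width || ry + BLK > height)
                continue;
            unsigned sad = block_sad(cur + by * stride + bx,
                                     prev + ry * stride + rx, stride);
            if (sad < best_sad) {
                best_sad = sad;
                best.dx = dx;
                best.dy = dy;
            }
        }
    }
    return best;
}
```

A real encoder would normally replace the exhaustive loop with a faster search pattern; the inner SAD loop is the part that benefits most from SIMD.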
MC is the opposite operation. Given a motion vector and a difference
block, MC builds a new block by taking the block in the previous frame
that the motion vector points to and adding the difference block to it.
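The reconstruction step can be sketched in C as follows. This is only an illustration under stated assumptions: the 16x16 block size, the signed 16-bit layout of the difference block, the clamping to the 0..255 pixel range, and the names `clamp_u8` and `motion_compensate` are not from the original text.

```c
#include <assert.h>
#include <stdint.h>

#define BLK 16  /* block width and height in pixels (assumed) */

/* Clamp an intermediate result to the valid 8-bit pixel range. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Rebuild the block at (bx, by): fetch the block that the motion vector
 * (dx, dy) points to in the previous frame, add the signed difference
 * block (stored row by row, BLK values per row), and store the result. */
static void motion_compensate(uint8_t *out, const uint8_t *prev,
                              const int16_t *diff, int stride,
                              int bx, int by, int dx, int dy)
{
    assert(bx + dx >= 0 && by + dy >= 0);

    const uint8_t *pred = prev + (by + dy) * stride + (bx + dx);
    uint8_t *dst = out + by * stride + bx;

    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            dst[y * stride + x] =
                clamp_u8(pred[y * stride + x] + diff[y * BLK + x]);
}
```

The caller is assumed to guarantee that the displaced block lies entirely inside the previous frame.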
Streaming SIMD Extensions
The Streaming SIMD Extensions answer the demand for specialized
operations that serve as basic building blocks in video and
communication algorithms.
The Streaming SIMD Extensions include the following instructions:
pavgb - SIMD rounded average of unsigned byte-sized operands. A crucial
operation in MC & ME algorithms.
psadbw - Sum of absolute differences of unsigned byte-sized operands.
Crucial for block matching algorithms.
pminub, pmaxub, pminsw & pmaxsw - SIMD minimum or maximum of unsigned
byte-sized or signed word-sized operands.
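To make the semantics of these instructions concrete, here is a scalar C model of the byte-sized forms operating on eight bytes, the width of one MMX register. The function names (`pavgb_model`, `psadbw_model`, `pminub_model`, `pmaxub_model`) are made up for this sketch; the code models the arithmetic only and says nothing about performance.

```c
#include <stdint.h>

/* pavgb: rounded average of each pair of unsigned bytes,
 * computed as (a + b + 1) >> 1 with a 9-bit intermediate. */
static void pavgb_model(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* psadbw: absolute difference of each pair of unsigned bytes,
 * summed into a single value (at most 8 * 255 = 2040). */
static unsigned psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (unsigned)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}

/* pminub: per-byte unsigned minimum. */
static void pminub_model(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = a[i] < b[i] ? a[i] : b[i];
}

/* pmaxub: per-byte unsigned maximum. */
static void pmaxub_model(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = a[i] > b[i] ? a[i] : b[i];
}
```

Note that pavgb rounds up on ties, so averaging 1 and 2 gives 2 rather than 1.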
As the following examples show, these new instructions simplify and
speed up many of the basic kernels in video applications and other
integer-based algorithms.
The following example shows the basic loop for MC using MMX technology:
Motion_Comp_Loop:
    Movq mm0,[eax+ecx]      // read eight pixels from the first block.
    Movq mm4,[eax+ecx+8]    // next eight pixels.
    Movq mm1,[ebx+ecx]      // read eight pixels from the second block.
    Movq mm5,[ebx+ecx+8]    // next eight pixels.
    Movq mm2,mm0
    Movq mm3,mm1
    Movq mm6,mm4            // No MMX registers left.
    // mm7 was initialized to be zero.
    Punpcklbw mm0,mm7       // convert the first four pixels
    Punpcklbw mm1,mm7       // from byte format to short format.
    Punpcklbw mm4,mm7
    Punpckhbw mm2,mm7       // convert the second four pixels
    Punpckhbw mm3,mm7       // from byte format to short format.
    Punpckhbw mm6,mm7
    // Calculate the average values.
    Paddw mm0,mm1           // after add values are 9 bits.
    Paddw mm2,mm3
    Movq mm1,mm5            // Now mm1 is free.
    Punpcklbw mm5,mm7
    Punpckhbw mm1,mm7
    Paddw mm4,mm5
    Paddw mm6,mm1
    Psrlw mm0,1             // divide by two.
    Psrlw mm2,1             // after division values are 8 bits.
    Psrlw mm4,1             // divide by two.
    Psrlw mm6,1             // after division values are 8 bits.
    Packuswb mm0,mm2        // convert back to byte format.
    Packuswb mm4,mm6        // convert back to byte format.
    Movq [edx+ecx],mm0      // store results.
    Movq [edx+ecx+8],mm4    // store results.
    // Increment pointer to the next line.
    // Jump back to Motion_Comp_Loop while not at the end of the macroblock.
Example 1. Motion Compensation Using MMX Technology
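For readers less familiar with MMX assembly, the loop in Example 1 can be modeled in scalar C roughly as follows. This is a sketch of the arithmetic only: the function name `average_blocks`, the 16-row macroblock height, and the `stride` parameter (the listing does not show how the pointer is advanced to the next line) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define MB_SIZE 16  /* macroblock width and height in pixels (assumed) */

/* Average two blocks pixel by pixel, as the MMX loop does:
 * widen each byte to 16 bits, add (9-bit result), shift right by one,
 * and narrow back to a byte. The shift truncates, matching Psrlw. */
static void average_blocks(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, int stride)
{
    assert(stride >= MB_SIZE);

    for (int row = 0; row < MB_SIZE; row++) {
        for (int col = 0; col < MB_SIZE; col++) {
            uint16_t sum = (uint16_t)(src1[col] + src2[col]);
            dst[col] = (uint8_t)(sum >> 1);
        }
        /* Move all three pointers to the next line of the macroblock. */
        src1 += stride;
        src2 += stride;
        dst += stride;
    }
}
```

Each iteration of the assembly loop handles one 16-pixel row, which corresponds to one pass of the inner `col` loop here.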
Since the sum of two 8-bit pixels can need nine bits, you have to convert
the values to short (16-bit) format before calculating the average.
Shifting each pixel right by one bit (dividing by two) before the addition
would avoid the conversion, but it would cost one bit of accuracy.
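The accuracy loss is easy to check in scalar C. The sketch below compares the widened average with the shift-before-add shortcut over all pairs of 8-bit values; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Average computed as in Example 1: widen, add, then shift. */
static uint8_t avg_widened(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint16_t)a + b) >> 1);
}

/* Shortcut that stays within 8 bits: shift each operand first, then add.
 * The low bit of each operand is discarded before the addition. */
static uint8_t avg_preshift(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1));
}

/* Count the input pairs for which the two methods disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (avg_widened((uint8_t)a, (uint8_t)b) !=
                avg_preshift((uint8_t)a, (uint8_t)b))
                mismatches++;
    return mismatches;
}
```

The two methods disagree exactly when both inputs are odd, which is 128 x 128 = 16384 of the 65536 possible pairs; in each such case the shortcut is low by one. For example, averaging 1 and 1 gives 1 with the widened method and 0 with the shortcut.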