**Pulling It All Together**

Now that we've stepped all the way down through the rasterization hierarchy, let's
go back and look again at the rasterization descent overview we started with,
this time with a detailed understanding of what's going on.

Figure
27 shows a triangle and a 64×64 tile to which the triangle is to be drawn, with
the tile subdivided into 16×16 blocks; Figure 27 is a repeat of Figure 8, but
this time I've added dashed extensions of the edges to the border of the tile,
so we can see what blocks and pixels are on what sides of the edges.

To
rasterize the triangle in Figure 27, we first calculate the values of the
triangle's three edge equations at the tile's trivial accept and trivial reject
corners and find that the tile is neither trivially rejected nor trivially
accepted by any edge. (Again, this would actually only be done for a large
triangle; we would use bounding box tests for such a small triangle.) We set up
the various step tables we'll use, and then we step the edge equations to their
respective trivial accept and trivial reject corners of the 16 blocks, each
16×16 in size, that make up the tile, and make a mask containing the signs of
the results.
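
To make the corner test concrete, here is a minimal Python sketch of classifying one block against one edge. It is an illustration under stated assumptions, not Larrabee's vector code: the edge function is taken as E(x, y) = A*x + B*y + C with E >= 0 meaning "inside" (fill-rule details are ignored), samples sit at integer pixel coordinates, and the name `classify_block` is made up for this example.

```python
def classify_block(edge, x0, y0, size):
    """Classify a size x size block whose first pixel is (x0, y0) against one edge.

    Assumed convention (not from the article): E(x, y) = A*x + B*y + C,
    and E >= 0 means the sample is on the inside of the edge.
    """
    A, B, C = edge
    last = size - 1

    # Trivial reject corner: the corner where E is largest (most inside).
    # If even that corner is outside, the whole block is outside.
    rx = x0 + (last if A > 0 else 0)
    ry = y0 + (last if B > 0 else 0)
    if A * rx + B * ry + C < 0:
        return "reject"

    # Trivial accept corner: the diagonally opposite corner, where E is smallest.
    # If even that corner is inside, the whole block is inside.
    ax = x0 + (0 if A > 0 else last)
    ay = y0 + (0 if B > 0 else last)
    if A * ax + B * ay + C >= 0:
        return "accept"

    return "partial"


# Hypothetical vertical edge: inside is x >= 10.
edge = (1, 0, -10)
results = [classify_block(edge, x0, 0, 4) for x0 in (0, 8, 16)]
```

Because the choice of corner depends only on the signs of A and B, it can be made once per edge and reused for every block.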

We
then bit-scan through the resulting mask, find that 12 of the 16 blocks are
trivially rejected, and descend into each of the remaining 4 blocks in turn. In
three of the blocks, we'll ultimately find that there's nothing to draw, so for
the purposes of this discussion, we'll ignore those and look at the more
interesting case of what happens when we descend into the block that the
triangle lies inside - the block outlined in yellow. (Note that if the triangle
were large enough to fully cover a 16×16 block, that block would be trivially
accepted and no further descent into that block would be required.)
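
The bit-scan over the block mask can be sketched as follows. Python has no bit-scan instruction, so this sketch isolates the lowest set bit with `mask & -mask` instead; the helper name `set_bits` and the mask value are hypothetical, chosen only so that 12 of the 16 blocks are rejected, as in the walkthrough.

```python
def set_bits(mask):
    """Yield the indices of the set bits in mask, lowest first."""
    while mask:
        lowest = mask & -mask            # isolate the lowest set bit
        yield lowest.bit_length() - 1    # convert it to a bit index
        mask ^= lowest                   # clear it and continue


# Hypothetical 16-bit mask: a set bit means "trivially rejected by some edge".
reject_mask = 0xF99F
survivors = ~reject_mask & 0xFFFF        # blocks that still need a descent

blocks_to_descend = list(set_bits(survivors))
print(blocks_to_descend)                 # [5, 6, 9, 10]
```

The loop runs once per surviving block rather than once per block, which is the point of scanning the mask instead of testing all 16 bits.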

Before
we look at what happens when we descend into the 16×16 block containing the
triangle, there's one more thing in Figure 27 that we should examine. You may
have noticed that in the earlier version of this figure, Figure 8, only the one
block in yellow was found, not the three green blocks. Why did the bit-scan
find 4 blocks this time, when the triangle is entirely contained in one block?
The reason is that the Larrabee rasterization approach, as discussed in this
article, can only eliminate blocks by trivially rejecting them. If you look
closely, you will see that none of the three green blocks is trivially rejected
by any edge. This is an inefficiency of this rasterization method, although
there are techniques, which are beyond the scope of this article, that remove
much of the waste.

Descending
the rasterization hierarchy, we take the 16×16 block containing the triangle,
subdivide it into 16 4×4 blocks, and evaluate which of those are touched by the
triangle by stepping to evaluate the edge equation at each of their trivial
accept and trivial reject corners for each edge, as in Figure 28. We find that 10
of the blocks are trivially rejected, and that none of the 6 remaining blocks
are trivially accepted against all three edges.
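
As a rough sketch of this step, the following Python evaluates each edge once at the first 4×4 block's trivial reject and trivial accept corners and then adds a per-block step value to reach the other 15 blocks. The scalar loop stands in for what would be a single 16-wide vector operation; the function name `block_masks`, the E >= 0 "inside" convention, and the example triangle are all assumptions made for illustration, so the counts differ from those in the figure.

```python
def block_masks(edges, x0, y0, block, grid=4):
    """Return (reject, accept, partial) bit masks over a grid x grid set of blocks.

    Bit i corresponds to the block at column i % grid, row i // grid.
    Assumed convention: E(x, y) = A*x + B*y + C, with E >= 0 meaning inside.
    """
    n = grid * grid
    last = block - 1
    reject = 0
    accept = (1 << n) - 1
    for A, B, C in edges:
        # Step table: how much E changes going from block 0 to block i.
        steps = [A * (i % grid) * block + B * (i // grid) * block for i in range(n)]
        # E at block 0's trivial reject corner (largest E in the block)...
        e_reject0 = A * (x0 + (last if A > 0 else 0)) + B * (y0 + (last if B > 0 else 0)) + C
        # ...and at its trivial accept corner (smallest E in the block).
        e_accept0 = A * (x0 + (0 if A > 0 else last)) + B * (y0 + (0 if B > 0 else last)) + C
        for i, step in enumerate(steps):
            if e_reject0 + step < 0:
                reject |= 1 << i       # rejected by this edge
            if e_accept0 + step < 0:
                accept &= ~(1 << i)    # not fully inside this edge
    partial = ~(reject | accept) & ((1 << n) - 1)
    return reject, accept, partial


# Hypothetical right triangle covering x >= 0, y >= 0, x + y <= 15.
edges = [(1, 0, 0), (0, 1, 0), (-1, -1, 15)]
reject, accept, partial = block_masks(edges, 0, 0, block=4)
```

For this made-up triangle, 6 blocks are trivially rejected, 6 are trivially accepted, and the 4 blocks along the diagonal are left for per-pixel evaluation.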

We've finally reached the bottom of the rasterization hierarchy, so we can bit-scan
through the partial-accept mask generated for the 16×16 block to find the partially
accepted 4×4 blocks, and generate the 4×4 pixel mask for each of those blocks in
turn, as in Figure 29.
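
A minimal sketch of generating one 4×4 pixel mask might look like the following. Each edge produces a 16-bit mask of pixels on its inside, and the three masks are ANDed together. The name `pixel_mask_4x4`, the E >= 0 "inside" convention, and the example triangle are assumptions for illustration; the inner loop stands in for one 16-wide vector add and compare.

```python
def pixel_mask_4x4(edges, x0, y0):
    """Return a 16-bit coverage mask for the 4x4 block whose first pixel is (x0, y0).

    Bit i corresponds to the pixel at (x0 + i % 4, y0 + i // 4).
    Assumed convention: E(x, y) = A*x + B*y + C, with E >= 0 meaning inside.
    """
    mask = 0xFFFF
    for A, B, C in edges:
        e0 = A * x0 + B * y0 + C                 # E at the block's first pixel
        edge_mask = 0
        for i in range(16):
            # Step from the first pixel to pixel i.
            if e0 + A * (i % 4) + B * (i // 4) >= 0:
                edge_mask |= 1 << i
        mask &= edge_mask                        # pixel must be inside every edge
    return mask


# Hypothetical right triangle covering x >= 0, y >= 0, x + y <= 15.
edges = [(1, 0, 0), (0, 1, 0), (-1, -1, 15)]
mask = pixel_mask_4x4(edges, 12, 0)
print(hex(mask))                                 # 0x137f
```

In the resulting mask, the first row of the block is fully covered and each following row loses one more pixel as the diagonal edge cuts across it.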

Here,
we see once again that the reliance on trivial reject to eliminate blocks has
caused a false positive on a block that actually doesn't touch the triangle
(the left-most block). It's possible to do bounding box tests to eliminate such
blocks, but it's not clear whether that's more efficient than just testing for
empty masks - that is, masks with no pixels enabled - and skipping those
blocks.
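
The false positive is easy to reproduce in a small sketch. Below, a made-up sliver triangle has its tip at (8, 5); the 4×4 block just to the left of the tip is not trivially rejected by any single edge, yet no pixel in it is inside all three edges, so its mask comes out empty and the block is skipped. The function names, the E >= 0 "inside" convention, and the triangle are assumptions for illustration.

```python
def not_trivially_rejected(edges, x0, y0, size):
    """True if no edge trivially rejects the block (E >= 0 means inside)."""
    last = size - 1
    return all(
        # Evaluate each edge at its trivial reject corner (largest E in the block).
        A * (x0 + (last if A > 0 else 0)) + B * (y0 + (last if B > 0 else 0)) + C >= 0
        for A, B, C in edges
    )


def pixel_mask_4x4(edges, x0, y0):
    """16-bit mask of the pixels in the 4x4 block that are inside all edges."""
    mask = 0
    for i in range(16):
        x, y = x0 + i % 4, y0 + i // 4
        if all(A * x + B * y + C >= 0 for A, B, C in edges):
            mask |= 1 << i
    return mask


# Hypothetical sliver with vertices (8, 5), (15, 6), (15, 4).
edges = [(1, -7, 27), (1, 7, -43), (-1, 0, 15)]

drawn = {}
for x0, y0 in [(4, 4), (8, 4)]:
    if not_trivially_rejected(edges, x0, y0, 4):
        mask = pixel_mask_4x4(edges, x0, y0)
        if mask == 0:
            continue                 # false positive: nothing to draw, skip it
        drawn[(x0, y0)] = mask
```

Here both blocks survive the trivial reject test, but only the block at (8, 4) ends up with pixels to draw.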

After completing this 16×16 block, we pop back up to rasterize the other 16×16 blocks
that weren't trivially rejected (which, in this case, turn out not to contain
any of the triangle). And that's really all there is to it!

**Notes on Rasterization**

Now
that we understand the basic rasterization algorithm, let's take a quick look
at some interesting implementation refinements.

In
software, we don't have the luxury of custom data and ALU sizes, but we do have
the luxury of adapting to input data, and this adaptive rasterization helps
boost our efficiency. For example, edge evaluations need 48 bits in the worst
case. Since we're in software and Larrabee has no 48-bit integer support, we
have to use 64 bits for those cases. However, we don't have to do that at all
for the 90+% of all triangles that fit in a 128×128 bounding box because, in
those cases, 32 bits is enough.

When
we do have to do 64-bit edge evaluation, we only have to use it for tile
assignment. As it turns out, within tiles up to 128×128 in size (and 128×128 is
our largest tile size), any edge that the tile is not trivially accepted or
rejected against can always be rasterized using 32 bits.

We
can also detect triangles that fit in a 16×16 bounding box and process them
with one less descent level, less set-up, and no trivial accept test (because
there will rarely be trivially accepted 4×4s in such small triangles). Finally,
triangles that fit in very small bounding boxes can be handled simply by directly
calculating the masks for the 16 or 32 pixels, with little set-up and
minimal processing.
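
One way to picture this adaptive behavior is as a dispatch on the triangle's bounding box. The sketch below is only an illustration: the 16×16 and 128×128 thresholds come from the discussion above, but the 4×4 cutoff for the "very small" case, the function name `choose_raster_path`, and the returned labels are all hypothetical.

```python
def choose_raster_path(xmin, ymin, xmax, ymax):
    """Pick a rasterization path from an inclusive pixel bounding box."""
    width = xmax - xmin + 1
    height = ymax - ymin + 1
    largest = max(width, height)

    if largest <= 4:
        # Hypothetical cutoff for "very small": compute the pixel mask directly.
        return "direct_pixel_mask"
    if largest <= 16:
        # One less descent level, less set-up, no trivial accept test.
        return "small_16x16"
    if largest <= 128:
        # 32-bit edge evaluation is enough.
        return "hierarchical_32bit"
    # Large triangle: 64-bit edge evaluation, needed only for tile assignment.
    return "hierarchical_64bit_tile_assignment"


paths = [
    choose_raster_path(10, 10, 12, 13),     # 3 x 4 box
    choose_raster_path(0, 0, 15, 9),        # 16 x 10 box
    choose_raster_path(0, 0, 99, 127),      # 100 x 128 box
    choose_raster_path(0, 0, 500, 20),      # 501 x 21 box
]
```

Because the check is just a few comparisons on values that are already needed for binning, the cost of choosing a path is small next to the work it can save.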

In
fact, for small triangles we could even take the z value of the closest vertex
and compare it to the z buffer for the triangle's bounding box, and possibly
z-reject the triangle before we even rasterize it!
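
A minimal sketch of that early test follows. It assumes that smaller z means closer and that a pixel passes the depth test only when its z is less than the stored value; under those assumptions, no pixel of the triangle can be closer than its closest vertex, so the test is conservative. The function name and the plain list-of-lists z buffer are hypothetical.

```python
def can_z_reject(vertex_zs, zbuffer, xmin, ymin, xmax, ymax):
    """Return True if the whole triangle is certainly hidden.

    Assumptions (not from the article): smaller z is closer, and the
    depth test passes only when the incoming z is less than the stored z.
    """
    nearest = min(vertex_zs)     # no pixel of the triangle can be closer than this
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            if nearest < zbuffer[y][x]:
                return False     # the triangle might be visible at this pixel
    return True                  # hidden everywhere in its bounding box


# 4x4 z buffer filled with depth 0.5, with one farther pixel at (2, 1).
zbuffer = [[0.5] * 4 for _ in range(4)]
zbuffer[1][2] = 0.9

hidden = can_z_reject([0.6, 0.7, 0.8], zbuffer, 0, 2, 3, 3)     # True
maybe_visible = can_z_reject([0.6, 0.7, 0.8], zbuffer, 0, 0, 3, 3)  # False
```

The test can only say "definitely hidden" or "don't know", so a False result simply means the triangle goes on to be rasterized as usual.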

There
are other optimization possibilities I won't get into because there's just not
space in this article, and of course, there's no telling how well they'll work
until we try them. But one nice thing about software is that it's easy to run
the experiments to check them out.

**Final Thoughts**

And
with that, we conclude our lightning tour of the Larrabee rasterization
approach, and our examination of how vector programming can be applied to a
semi-parallel task. As I mentioned earlier, software rasterization will never
match dedicated hardware peak performance and power efficiency for a given area
of silicon, but so far, it's proven to be efficient enough. It also has a
significant advantage, in that because it uses general-purpose cores, the same
resources that are used for rasterization can be used for other purposes at
other times, and vice versa.

As Tom Forsyth puts it, because the whole chip is
programmable, we can effectively bring more square millimeters to bear on any
specific task as needed - up to and including the whole chip. In other words,
the pipeline can dynamically reconfigure its processing resources as the
rendering workload changes. If we get a heavy rasterization load, we can have
all the cores working on it.

It wouldn't be the most efficient rasterizer per
square millimeter, but it would be one heck of a lot of square millimeters of
rasterizer, all doing what was most important at that moment; in contrast to a
traditional graphics chip with a hardware rasterizer, where most of the
circuitry would be idle when there was a heavy rasterization load. A little
while later, when the load switches to shading, the whole Larrabee chip can
become a shader if necessary. Software simply brings a whole different set of
strengths and weaknesses to the table.

There's
a lot to learn and rethink with Larrabee, and a lot of potential to be
exploited. Only time will tell how well it all works out - but meanwhile, it
certainly is an interesting time to be a performance programmer!

Further
information about Larrabee is available at www.intel.com/software/graphics.

**About the Author**

Michael
Abrash is a programmer at RAD Game Tools working on the Larrabee project, a video game programming veteran, and the author of numerous books and
articles on graphics programming and performance optimization.

*[This article was originally published in association with Gamasutra's sister publication Dr. Dobb's, which delivers in-depth coverage of the art and business of
software development from a cross-platform, language-independent point of view.
Dr. Dobb's Report (print), Digest (digital) and Online all contain articles
written exclusively for professional software development leaders and
architects, and provide tools and techniques using relevant, real-world
solutions.]*