**Tile Assignment**

As
noted earlier, Larrabee uses "chunked" (also known as "binned" or "tiled")
rendering, where the target is divided into multiple rectangles or tiles. The
rendering commands are sorted according to the tiles they touch and stored in
the corresponding bins, then the contents of each bin are rendered separately
to the corresponding tile. It's a bit complex, but it considerably improves
cache coherence and parallelization.

For
chunking, rasterization consists of two steps: The first identifies which tiles
a triangle touches, and the second rasterizes the triangle within each tile. So
it's a two-stage process and I'm going to discuss the two stages separately.

Figure
12 shows an example of a triangle to be drawn to a tiled render target. The
light blue area is a 256×256 render target, subdivided into four 128×128 tiles.

With
Larrabee's chunking architecture, the first step in rasterizing the triangle in
Figure 12 is to determine which tiles the triangle touches and put the triangle
in the bins for those tiles (tiles 1 and 3). Once all triangles have been
binned, the second step, intra-tile rasterization, is to rasterize the triangle
to tile 1 when the tile 1 bin is rendered, and to rasterize the triangle to
tile 3 when the tile 3 bin is rendered.

Assignment
of triangles to tiles can easily be performed for relatively small triangles -
say, up to a tile in size, which covers 90% of all triangles - by doing
bounding box tests. For example, it would be easy with bounding box tests to
find out what two tiles the triangle in Figure 12 is in. Larger triangles are
currently assigned to tiles by simply walking the bounding box and testing each
tile against the triangle; that doesn't sound very efficient, but
tile-assignment time is generally an insignificant part of total rendering time
for larger triangles because there's usually a lot of shading work and the like
to do for those triangles. However, if large-triangle tile-assignment time does
turn out to be significant, we could use a sweep approach, as discussed
earlier, or a variation of the hierarchical approach used for intra-tile
rasterization, which I'll discuss next. This is a good example of how a CPU
makes it easy to use two completely different approaches in order to do the 90%
case well and the 10% case adequately (or well, but in a different way), rather
than having to have one size fit all.

Large-triangle
assignment to tiles is performed with scalar code for simplicity and because
it's not a significant performance factor. Let's look at how that scalar
process works because it will help us understand vectorized intra-tile
rasterization later. I'll use a small triangle for the example to make the
figures legible, but as previously noted, such a small triangle normally would
be assigned to its tile or tiles using bounding box tests.

Once
we've set up the equation for an edge (by calculating B and C, as discussed
when we looked at Figure 1), the first thing we do is calculate its value at
the trivial reject corner of each tile. The trivial reject corner is the corner
at which an edge's equation is most negative within a tile; the selection of
the trivial reject corner for a given edge is based on its slope, as we'll see
shortly. We set things up so that negative means "inside" in order to allow us
to generate masks directly from the sign bit. So you can think of the trivial
reject corner as the point in the tile that's most inside the edge. If this
point isn't inside the edge, no point in the tile can be inside the edge, and
therefore, the whole triangle can be ignored for that tile.

Figure
13 shows the trivial reject test in action. Tile 0 is trivially rejected for
the black edge and can be ignored because its trivial reject corner is
positive, and therefore, the whole tile must be positive and must lie outside
the triangle. Meanwhile, the other three tiles must be investigated further.
You can see here how the trivial reject corner is the corner of each tile most
inside the black edge; that is, the point with the most negative value in the
tile.

Note
that determining which corner is the trivial reject corner will vary from edge
to edge depending on slope. For example, it would be the lower left corner of
each tile for the edge shown in red in Figure 14 because that's the corner
that's most inside that edge.

If
you understand what we've just discussed, you're ninety percent of the way to
understanding the whole Larrabee rasterizer. The trivial reject test is
actually very straightforward once you understand it - it's just a matter of
evaluating the sign of a simple equation at the right point - but it can take a
little while to get it, so you may find it useful to re-read the previous
section if you're at all uncertain or confused.

So
that's the tile trivial reject test. The other tile test is the trivial accept
test. For this, we take the value at the trivial reject corner (the corner we
just discussed) and add the amount that the edge equation changes for a step
all the way to the diagonally opposite tile corner, the tile trivial accept
corner. This is the point in the tile where the edge equation is most positive.
You can think of this as the point in the tile that's most outside the edge. If
the trivial accept corner for an edge is negative, that whole tile is trivially
accepted for that edge and there's no need to consider that edge when rasterizing
within the tile.

Figure
15 shows the trivial accept test in action. Because the trivial accept corner
is the corner at which the edge's equation is most positive, if this point is
negative - and therefore inside the edge - all points in the tile must be
inside the edge. Thus, tiles 0 and 1 are not trivially accepted for the black
edge, because the equation for the black edge is positive at their trivial
accept corners, but tiles 2 and 3 are trivially accepted, so rasterization of
this triangle in tiles 2 and 3 can ignore the black edge entirely, saving a
good bit of work.

There's
an important asymmetry here. When we looked at trivial reject, we saw that it
applies to the whole triangle in the tile being drawn to, in the sense that
trivially rejecting a tile against any edge means the triangle doesn't touch
that tile, so the triangle can be ignored in that tile. However, trivial accept
applies only to the edge being checked; that is, trivially accepting a tile
against an edge only means that the specific edge doesn't have to be checked
when rasterizing the triangle to that tile because the whole tile is inside
that edge; it has no direct implication for the whole triangle. The tile may be
trivially accepted against one edge, but not against the others. In fact, it
may be trivially rejected against one or both of the other edges, in which
case, the triangle won't be drawn to the tile at all. This is illustrated in
Figure 16, where tile 3 is trivially accepted against the black edge, so the
black edge wouldn't need to be considered in rasterizing the triangle to that
tile, but tile 3 is trivially rejected against the red edge, and that means
that the triangle doesn't have to be rasterized to tile 3 at all.

In
Figure 17, however, tile 3 is trivially accepted by all three edges, and here
we come to a key point.

If
all three edges are negative at their respective trivial accept corners, then
the whole tile is inside the triangle, and no further rasterization tests are
needed - and this is what I meant earlier when I said the rasterizer takes
advantage of CPU smarts by not-rasterizing whenever possible. The
tile-assignment code can just store a draw-whole-tile command in the bin. Then
the bin rendering code can simply do the equivalent of two nested loops around
the shaders, resulting in a fullscreen triangle rasterization speed of
approximately infinity - one of my favorite performance numbers!

By
the way, this whole process should be familiar to 3D programmers because
testing of bounding boxes against planes (for example, for frustum culling) is
normally done in exactly the same way - although in three dimensions instead of
two - with the same use of signs to indicate inside and outside for trivial
accept and reject. Also, structures such as octrees employ a 3D version of the
hierarchical recursion used by the Larrabee rasterizer.

That
completes our overview of how rasterization of large triangles for tile
assignment works. As I said, this is done as a scalar evaluation in the
Larrabee pipeline, so the trivial accept and reject tests for each tile are
performed separately. Intra-tile rasterization, to which we turn next, is much
the same, but vectorized. And it is this vectorization that will give us
insight into applying the Larrabee New Instructions to semi-parallel tasks.