Callbacks
One of the basic needs of a threading infrastructure is the
ability to dispatch work to a thread running in parallel with the main thread. A callback system (sometimes called a job
system) is one way to achieve this.
Callbacks are function pointers executed by
a thread in a thread pool. The dispatching thread maintains no connection to
the callback and does not explicitly wait for its completion. Instead, the callback
sets a flag to be checked later in the dispatching thread's execution.
Callback systems using native threading solutions often
implement this technique by putting the function pointer and parameter into a
queue, which is serviced by threads in the thread pool whenever they complete a
task.
This approach can lead to performance degradation under heavy
contention, as demonstrated by Sample 1. Using a concurrent container does not
completely avoid this problem, since worker threads may still wait to get a
task popped off the queue. Intel TBB avoids this performance bottleneck by
maintaining a task queue per thread and implementing an efficient task-stealing
system to spread the work around.
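The shared-queue technique described above can be sketched in standard C++ as follows. This is a minimal illustration only, not the article's Sample 1 and not Intel TBB code; SimpleCallbackPool, incrementCounter, and runSharedQueueExample are hypothetical names. Note the single mutex that every worker must take to pop a task, which is where contention builds up.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names throughout; a sketch of the "native" approach only.
typedef void (*FunctionPointer)(void *);

class SimpleCallbackPool {
public:
    explicit SimpleCallbackPool(std::size_t threadCount) {
        for (std::size_t i = 0; i < threadCount; ++i)
            m_workers.emplace_back([this] { workerLoop(); });
    }

    // Lets the workers drain any queued callbacks, then joins them.
    ~SimpleCallbackPool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stopping = true;
        }
        m_wake.notify_all();
        for (std::thread &worker : m_workers) worker.join();
    }

    // Fire and forget: the caller keeps no handle to the callback.
    void doCallback(FunctionPointer fCallback, void *pParam) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::make_pair(fCallback, pParam));
        }
        m_wake.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::pair<FunctionPointer, void *> job;
            {
                // Every worker contends for this one lock.
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return m_stopping || !m_queue.empty(); });
                if (m_queue.empty()) return;  // stopping and fully drained
                job = m_queue.front();
                m_queue.pop();
            }
            job.first(job.second);  // run the callback outside the lock
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<std::pair<FunctionPointer, void *> > m_queue;
    std::vector<std::thread> m_workers;
    bool m_stopping = false;
};

// Example callback: bumps a shared counter passed through the void* parameter.
void incrementCounter(void *pParam) {
    static_cast<std::atomic<int> *>(pParam)->fetch_add(1);
}

int runSharedQueueExample() {
    std::atomic<int> counter(0);
    {
        SimpleCallbackPool pool(4);
        for (int i = 0; i < 100; ++i) pool.doCallback(&incrementCounter, &counter);
    }  // destructor drains the queue and joins the workers
    return counter.load();
}
```

With many short callbacks, most of the workers' time can go to acquiring m_mutex rather than running tasks, which is the bottleneck that per-thread queues with task stealing are meant to avoid.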
Figure B: Callback is spawned and not waited on
Figure B shows an Intel TBB callback system that attaches a
task to a pre-existing root, which hosts multiple callback tasks at once. This
allows the callbacks to be waited on as a group, which is helpful for proper
cleanup on exit. In this tree the child task is spawned directly, because the
dispatching thread does not wait and a work tree root must be waited on whenever
it is the target of a spawn. This Intel TBB approach allows an arbitrary number
of callbacks to be in flight at once and guarantees that they complete on exit,
much as a work queue does in the traditional approach.
Intel TBB processes the work tree in Figure B in the
following way: Execution begins with the Task, which is the parameter of the
call to spawn(). The Task calls the specified callback function, which may trigger
the creation of additional subtasks, represented by More. If subtasks are
created, those subtasks are processed next. The net effect of this execution is
that the callback function is called asynchronously, and it has an opportunity
to spawn additional work, which will also be executed asynchronously.
Presumably the final step in the callback is to indicate to the spawning thread
that the computation is complete, although this notification step is not
enforced.
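The flow just described (a callback runs asynchronously, creates further subtasks, and finally signals the dispatching thread) can be sketched in standard C++. This is an analogy under stated assumptions rather than Intel TBB code: std::thread stands in for spawn(), std::async stands in for the additional subtasks, and SumRequest, sumRange, sumCallback, and runSubtaskExample are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical parameter block shared by the dispatcher and the callback.
struct SumRequest {
    std::vector<int> values;
    long result;
    std::atomic<bool> done;  // the flag the dispatcher checks later
    SumRequest() : result(0), done(false) {}
};

static long sumRange(const std::vector<int> *pValues, std::size_t begin, std::size_t end) {
    long total = 0;
    for (std::size_t i = begin; i < end; ++i) total += (*pValues)[i];
    return total;
}

// The callback splits its work into two subtasks (the "More" nodes),
// combines their results, and sets the completion flag as its last step.
void sumCallback(void *pParam) {
    SumRequest *pRequest = static_cast<SumRequest *>(pParam);
    const std::size_t half = pRequest->values.size() / 2;
    std::future<long> lower = std::async(std::launch::async, sumRange,
                                         &pRequest->values, std::size_t(0), half);
    std::future<long> upper = std::async(std::launch::async, sumRange,
                                         &pRequest->values, half, pRequest->values.size());
    pRequest->result = lower.get() + upper.get();
    pRequest->done.store(true, std::memory_order_release);
}

long runSubtaskExample() {
    SumRequest request;
    for (int i = 1; i <= 100; ++i) request.values.push_back(i);

    std::thread worker(sumCallback, &request);  // stands in for spawn()

    // The dispatcher would do other work here, then check the flag.
    while (!request.done.load(std::memory_order_acquire)) std::this_thread::yield();

    worker.join();
    return request.result;  // sum of 1..100
}
```

Nothing forces sumCallback to set the flag; if it omitted that step, the dispatcher in this sketch would spin forever, which mirrors the point that the notification step is a convention rather than something the system enforces.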
Sample 4: Creating a work tree for a callback
void doCallback(FunctionPointer fCallback, void *pParam)
{
    // allocation with "placement new" syntax, see TBB reference documents
    CallbackTask *pCallbackTask = new(
        s_pCallbackRoot->allocate_additional_child_of(*s_pCallbackRoot)
    ) CallbackTask(fCallback, pParam);
    s_pCallbackRoot->spawn(*pCallbackTask);
}
Sample 4 shows the steps to create the work tree in Figure
B. Instead of creating a root for each callback task, this code uses the same
root for all callbacks. Again, it is the callback function's responsibility to
create additional subtasks and to notify the calling thread when the
calculation is complete.
Callbacks meet a basic need of game architectures: how to
run multiple functional operations simultaneously without introducing
contention for hardware resources. Intel TBB enhances this solution further by
providing a clear path to combine callbacks with loop parallelism.
Promises
In many cases, it is advantageous to keep track of
asynchronous calls and to explicitly wait for them to complete. Promises are one way to accomplish this.
When handling a request for asynchronous execution, a promise system provides
to the calling thread an object that acts as a link to the asynchronous call.
This promise object allows the calling thread to wait on the asynchronous call
when necessary and to actively contribute to the calculation if the call
introduced additional parallel work.
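As a rough standard C++ analogy, the link object can be thought of as a thin wrapper around std::future. The sketch below is not the article's Promise class: SimplePromise, squareInPlace, and runPromiseExample are hypothetical names, and unlike a TBB-based promise the waiting thread here simply blocks instead of contributing to the work.

```cpp
#include <cassert>
#include <future>

typedef void (*FunctionPointer)(void *);

// Hypothetical stand-in for a promise object: it owns the only link
// to the asynchronous call and lets the caller wait on it later.
class SimplePromise {
public:
    // Starts the asynchronous call and remembers the link to it.
    void start(FunctionPointer fCallback, void *pParam) {
        m_future = std::async(std::launch::async, fCallback, pParam);
    }

    // True while there is a call that has not been waited on yet.
    bool isPending() const { return m_future.valid(); }

    // Blocks until the call completes. Assumes a single waiting thread.
    void waitUntilDone() {
        if (m_future.valid()) m_future.get();  // get() also releases the link
    }

private:
    std::future<void> m_future;
};

// Example asynchronous work: squares the int behind the void* parameter.
void squareInPlace(void *pParam) {
    int *pValue = static_cast<int *>(pParam);
    *pValue = (*pValue) * (*pValue);
}

int runPromiseExample() {
    int value = 12;
    SimplePromise promise;
    promise.start(&squareInPlace, &value);
    // ... the calling thread is free to do other work here ...
    promise.waitUntilDone();
    return value;
}
```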
Figure C: Promise system keeps track of the asynchronous call.
Figure C shows that the work tree for a promise system is
fundamentally similar to the work tree for a callback system, shown in Figure
B. The primary difference is that the system waits on the root of the work tree.
Another difference is that the root of the tree is encapsulated inside a promise
object. The root is not run and released automatically when the child task
completes; it stays alive until the promise explicitly waits on it. This prevents
the root from disappearing without the promise knowing about it.
Sample 5(a): Implementing a promise
void doPromise(FunctionPointer fCallback, void *pParam, Promise *pPromise)
{
    // allocation with "placement new" syntax, see TBB reference documents
    tbb::task *pParentTask = new(tbb::task::allocate_root()) tbb::empty_task();
    assert(pPromise != NULL);
    pPromise->setRoot(pParentTask);
    PromiseTask *pPromiseTask = new(pParentTask->allocate_child())
        PromiseTask(fCallback, pParam, pPromise);
    // set the ref count to 2, which accounts for the 1 child and for the
    // eventual wait_for_all
    pParentTask->set_ref_count(2);
    pParentTask->spawn(*pPromiseTask);
}
Sample 5(a) shows how the work tree in Figure C is created.
The primary differences between this code and the callback code in Sample 4 are
that a root is created for each call, the root is wrapped in an object (pPromise
above), and the reference count of the root is set one higher than its number of
children, so that the root stays in place until the promise waits on it. This
avoids the specific problem of having the root execute asynchronously and then
get deleted and reused by the Intel TBB memory manager as an unrelated task in
the tree.
Sample 5(b): Waiting on a promise
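void waitUntilDone()
{
    if(m_pRoot != NULL)
    {
        // serialize access: only one thread at a time may wait and clean up.
        // The lock must be a named object; an unnamed scoped_lock would not
        // hold the mutex for the rest of this block.
        tbb::spin_mutex::scoped_lock lock(m_tMutex);
        if(m_pRoot != NULL)
        {
            m_pRoot->wait_for_all();
            m_pRoot->destroy(*m_pRoot);
            m_pRoot = NULL;
        }
    }
}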
Sample 5(b) shows how the promise object ensures the
completion of the asynchronous call. A mutex is introduced here to ensure that
the waitUntilDone call is thread-safe. If waitUntilDone will be called by only
one thread (presumably the thread that submitted the asynchronous call in the
first place), this additional mutex is not necessary. The root is never spawned
because of the increased reference count used in the call to doPromise. Since
the root is never spawned, it is never executed and therefore never deleted by
Intel TBB. This code deletes the root explicitly once the asynchronous call is
complete.
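The same wait-once, clean-up-once pattern can be sketched in standard C++ with std::mutex and std::thread in place of the TBB types. JoinablePromise and runWaitExample are hypothetical names, and this is an analogy rather than the article's implementation. The guard is a named object so that the mutex is held for the whole block, and the sketch omits the unlocked first check because that read would race in standard C++ unless the checked state were atomic.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical analog of the promise's wait logic, safe to call from
// several threads: exactly one caller joins, the others find nothing to do.
class JoinablePromise {
public:
    explicit JoinablePromise(std::function<void()> work)
        : m_worker(std::move(work)) {}

    ~JoinablePromise() { waitUntilDone(); }

    void waitUntilDone() {
        // Named guard: holds m_mutex until the end of this function.
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_worker.joinable()) {
            m_worker.join();  // plays the role of wait_for_all plus destroy
        }
    }

private:
    std::mutex m_mutex;
    std::thread m_worker;
};

int runWaitExample() {
    std::atomic<int> result(0);
    JoinablePromise promise([&result] { result.store(7); });

    // Two threads wait on the same promise; only one performs the join.
    std::thread first([&promise] { promise.waitUntilDone(); });
    std::thread second([&promise] { promise.waitUntilDone(); });
    first.join();
    second.join();

    return result.load();
}
```

If only the submitting thread ever waits, the mutex can be dropped here for the same reason given for Sample 5(b).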
Promises provide a comprehensive scheme for dispatching and
waiting on parallel work. An Intel TBB implementation of promises has the
additional benefit of providing an efficient work-while-waiting policy. These
features make promises broadly applicable to game architectures.