January 20, 2009

Advances in Rendering for Physics

Meshula.net, the consistently informative blog run by veteran programmer Nick Porcino, has a useful overview of a free-access online physics journal, the New Journal of Physics.

To suit his audience, Porcino has highlighted a series of articles drawn from the NJP focusing on applying the principles explored by the journal to areas like rendering and simulation.

For example: Flow visualization and field line advection in computational fluid dynamics: application to magnetic fields and turbulent flows, published last month by Pablo Mininni, Ed Lee, Alan Norton, and John Clyne. An excerpt from the abstract follows:

"Accurately interpreting three dimensional (3D) vector quantities output as solutions to high-resolution computational fluid dynamics (CFD) simulations can be an arduous, time-consuming task. Scientific visualization of these fields can be a powerful aid in their understanding. However, numerous pitfalls present themselves ranging from computational performance to the challenge of generating insightful visual representations of the data. In this paper, we briefly survey current practices for visualizing 3D vector fields, placing particular emphasis on those data arising from CFD simulations of turbulence."

Other topics singled out by Porcino include visualization of spiral and scroll waves, visualizing a silicon quantum computer, and the simulation and visualization of a number of astrophysical bodies and phenomena.

January 08, 2009

Sponsored Feature: Multi-Core Simulation of Soft-Body Characters Using Cloth

Here, Intel senior software engineer Brad Werth explains how multicore CPUs can be leveraged for an efficient method of representing soft-body characters by way of cloth simulation.

Because CPUs are constrained by limited computational power, most character animation today is done with a bones-and-skin method, which uses a small number of invisible control points, linked together as bones, to control the figure. To "skin" the character, the visual portions of the figure are then bound to specific bones.

While this method provides a relatively fast way to animate characters given CPU constraints, the advent of dual-core and quad-core processors is allowing for more sophisticated character animation techniques:

"With additional processing power, the bones-and-skin method can be extended for more detailed animation. More bones can be added for additional degrees of freedom. But there is no need to create additional bones unless a new joint is being modeled. Improvements are also possible by focusing on more complicated movement of the bones.

Instead of just blending between canned animations, animations can be blended with physics to create dynamic motion in the bones. This is already implemented in games that use "rag doll" corpses, and has been implemented in some middleware products also.

Bones-and-skin is one method for character animation, but it is not the only viable choice assuming that increased computational power is available. Since the skin is the only visible part of the character, an alternative is to ignore bones and calculate the shape and movement of the skin directly.

If the skin is disengaged from bones, then only local forces and constraints maintain the character's form. The resulting skin can be manipulated equally well from internal and external forces. A character built this way is sometimes called a "soft-body" character.

In a soft-body character, simulated cloth can be used as a skin. Forces applied to the cloth create the form of the soft-body character. Sock puppets are a simple example of this technique. The cloth provides local constraints to maintain the form of the sock, and the hand provides the forces to give the character volume.

When simulating cloth computationally, the hand becomes an invisible mathematical construct and the sock is attached to that construct at key points. By expanding on this concept, a variety of soft-body characters can be created."
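The sock-puppet idea can be sketched in a few lines of mass-spring code. This is an illustrative Python toy, not code from Werth's article (which targets multi-core C++): particles are advanced with Verlet integration, springs provide the local constraints that maintain the cloth's form, and pinned points stand in for the invisible "hand" construct. All names and constants here are made up for the example.

```python
def step_cloth(positions, prev_positions, springs, pinned, dt=1/60, gravity=-9.8):
    """Advance a list of 2D cloth particles by one timestep.

    positions/prev_positions: lists of [x, y]
    springs: list of (i, j, rest_length) local constraints that hold the form
    pinned: dict {index: (x, y)} -- the invisible "hand" driving the puppet
    """
    # Verlet integration: next = curr + (curr - prev) + accel*dt^2
    new = []
    for p, q in zip(positions, prev_positions):
        vx, vy = p[0] - q[0], p[1] - q[1]
        new.append([p[0] + vx, p[1] + vy + gravity * dt * dt])
    # Satisfy spring constraints by iterative (Jakobsen-style) relaxation
    for _ in range(4):
        for i, j, rest in springs:
            dx = new[j][0] - new[i][0]
            dy = new[j][1] - new[i][1]
            dist = (dx * dx + dy * dy) ** 0.5 or 1e-9
            corr = 0.5 * (dist - rest) / dist
            new[i][0] += dx * corr; new[i][1] += dy * corr
            new[j][0] -= dx * corr; new[j][1] -= dy * corr
        for i, (x, y) in pinned.items():   # re-attach cloth to the construct
            new[i] = [x, y]
    return new, positions
```

Each independent relaxation pass over the springs is exactly the kind of homogeneous work that maps well onto multiple cores in a real implementation.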

January 07, 2009

Visualizing Floats

Floating point numbers permeate almost every area of game programming. They are used to represent everything from position, velocity, and acceleration, to fuzzy AI variables, texture coordinates, and colors.

Yet, despite their ubiquitous role, few programmers really take the time to study the underlying mechanics of floating point numbers, their inherent limitations, and the specific problems these can bring to games.

In an independently written article, Neversoft co-founder and veteran programmer Mick West lays out the specifics to help visualize the problems with floats. The piece starts off as follows:

"A float consists of 32 bits: a sign bit, an 8-bit exponent (e), and a 23-bit significand (s).

To visualize the problems with floats, it's useful to visualize the differences between floats and integers. Consider how the 32-bit integers represent space. There are 2^32 integers; each one can be thought of as representing a region between two points on a line.

If each integer represents 1 millimeter, then you can represent any distance using integers from 1mm to 2^32mm. That's any distance up to about 4,295km, about 2,669 miles, with a resolution of 1mm."
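The sign/exponent/significand breakdown West describes can be inspected directly. This short Python sketch (an addition for illustration, not from the article) reinterprets a 32-bit float's bit pattern and slices out the three fields:

```python
import struct

def float_bits(f):
    """Decompose a 32-bit float into (sign, 8-bit exponent, 23-bit significand)."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned int
    bits = struct.unpack('>I', struct.pack('>f', f))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # biased by 127
    significand = bits & 0x7FFFFF       # implicit leading 1 not stored
    return sign, exponent, significand
```

For example, 1.0 decomposes to sign 0, exponent 127 (the bias), significand 0; doubling the value only bumps the exponent, which is why float resolution coarsens as magnitude grows.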

You can now read the full Gamasutra feature on visualizing floats, floats versus integers and other useful calculations (no registration required, please feel free to link to this feature from other websites).

December 23, 2008

The Catch for Multi-threading Decision Making

Contract-based developer Alex J. Champandard runs the AiGameDev.com site and blog, excellent resources for game makers working in the artificial intelligence space. The site is regularly updated with posts, Q&As, videos, and other features centering around AI.

Some of the content is directly related to multithreading and even Intel in particular, as numerous articles deal with the ins and outs of multithreading AI-related tasks, and examples frequently make use of Intel's Threading Building Blocks C++ library.

For example, The Catch for Multi-threading Decision Making contains a video Q&A with Champandard addressing the notion of dedicating processor cores specifically to AI.

"It's harder to throw processing power at the problem than you realize," he offers. "It's easier to throw, let's say, many more animations and more behaviors, but you don't necessarily have a linear growth in computation when you have more assets; you will structure them in such a way that your decision-making is always not linear. If you have N behaviors, you have to make sure your decision-making is not [in] order N, but has some structure to it, so when all these behaviors need to be selected, you're going to do that in a scalable way. ...Processing power is not something you use linearly."
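Champandard's point about keeping decision-making sub-linear in the number of behaviors can be made concrete with a toy contrast. This Python sketch is illustrative only (the names and tree shape are invented): a flat selector scores every behavior each frame, order N, while a hierarchical selector descends one branch per level, so cost tracks tree depth instead of total behavior count.

```python
def select_flat(behaviors, context):
    # Every behavior is scored every frame: cost is order N.
    return max(behaviors, key=lambda b: b['score'](context))

def select_hierarchical(node, context):
    # Only one branch is descended per level: cost is order of the depth.
    if 'children' not in node:
        return node
    best = max(node['children'], key=lambda c: c['score'](context))
    return select_hierarchical(best, context)
```

With thousands of behaviors grouped under a few high-level categories, the hierarchical version scores only the handful of nodes along one path, which is what makes adding assets cheap relative to adding raw computation.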

As well as going further in-depth into that topic with visual aids, the post follows up on a practical primer called Hierarchical Logic and Multi-threaded Game AI, and links to a feature on Multi-threading a Simple Hierarchical Planner to Estimate Performance (note that the latter requires registration on the site, which is free).

December 05, 2008

Video Demonstrations of the Smoke Framework

Following up on the recent Smoke framework overview, here are two videos from the Intel Software Network featuring Intel's Orion Granatir, Brad Werth, and Omar Rodriguez demonstrating the Smoke framework demo application, and explaining some of the concepts behind the project.

The demo was run on an Intel Core i7 using all four cores, plus four additional virtual threads via Hyper-Threading. "There's lots of ways you can do threading," said Granatir. "It's a difficult problem." He noted:

"The Smoke demo is a technical demo we put together at Intel to showcase threading in a game environment. We wanted a lot of thread interaction between a lot of common game systems, and show you could thread these in an effective manner. ...The demo has two purposes: it shows off this threading, and it shows off how you can harness the power of Intel's modern CPUs."

Speaking on the project, Werth explained:

"In development, we ran [this demo] under two different threading systems. We built what we call the native thread pool system using the Windows native APIs, then we built a threading system using the same API for the Intel Threading Building Blocks, which is an open-source library that was originally produced at Intel, but we were so happy with it we opened it up to the open source community."
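Werth's approach of swapping threading backends behind one API is worth sketching. The following Python toy is illustrative only (Smoke itself is C++ over Windows threads and TBB): two scheduler backends expose the same `run` interface, so the game systems that submit tasks never need to know which one is underneath.

```python
from concurrent.futures import ThreadPoolExecutor

class SerialScheduler:
    """Baseline backend: runs each task on the calling thread."""
    def run(self, tasks):
        return [t() for t in tasks]

class PoolScheduler:
    """Thread-pool backend: same interface, tasks run on worker threads."""
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
    def run(self, tasks):
        futures = [self.pool.submit(t) for t in tasks]
        return [f.result() for f in futures]   # preserves submission order

def update_systems(scheduler):
    # Game systems submitted as independent tasks through the common API.
    tasks = [lambda: 'physics', lambda: 'ai', lambda: 'audio']
    return scheduler.run(tasks)
```

Because both backends honor the same contract, a profiling pass can compare them without touching any calling code, which is essentially the experiment Werth describes.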

In a second video, Rodriguez shows off the demo in more depth, characterizing it as a "multi-threaded demo with multiple game systems interacting in a highly-threaded environment." It allows the user to toggle between the number of available threads, and measures total CPU utilization, frames per second, available threads, and a list of systems interacting, including physics, audio, AI, animation, scripting, graphics, and others.

December 01, 2008

Performance Scaling With Cores: Introducing The Smoke Framework

In an overview piece posted on Gamasutra, Intel Software and Services Group application engineer Orion Granatir and PC Perspective editor-in-chief Ryan Shrout have outlined Intel's Smoke framework, which attempts to intelligently optimize games for multi-threaded processors. The goal is to ameliorate some of the difficulties encountered by game developers who are unaccustomed to the increasingly important world of multicore development, which is "moving in the direction of 'more cores' rather than 'more clocks.'"

In addition to explaining the purpose and architecture of Smoke, the piece gives a practical example based on a demo application, showing relative performance based on scaling to various numbers of cores.

As the pair explains:

"Smoke is a model framework that maximizes the performance of the processor in a purely gaming environment. Built to take advantage of all available threads, it works equally efficiently on standard dual-core Intel Celeron processors as well as on new Intel Core i7 Processors with Intel Hyper-Threading Technology.

"The Smoke video demonstration, shown at many trade shows and technology events, uses modern game-development technologies, including Havok for physics processing, FMOD for audio playback, Ogre 3D and DirectX 9 for graphics rendering, and more. As you would expect for an internally developed demo, the code shows the Smoke framework as a well-partitioned and configurable product.

"Intel developed Smoke mainly as a teaching tool to demonstrate the ability to create a framework that can scale to any number of threads. Developers are encouraged to explore the technology by examining new threading techniques and learning about the interactions between various game engine systems that typically hold back a game’s potential threadability.

"Intel’s goal through this and other efforts is to help prove that multi-threaded gaming can be done effectively and is an investment in time that is worth taking."

The full article is available to read on Gamasutra; no registration is required.

November 26, 2008

Adding YouTube Integration to Games

In a detailed technical feature with sample code posted on Gamasutra, Team Bondi programmer Claus Höfele delves into the practical steps for letting your players get gameplay footage automagically uploaded online.

Noting that "increasingly, YouTube integration is seen as a valuable feature addition to games," Höfele points to Maxis' Spore Creature Creator, which makes extensive use of YouTube integration, features its own YouTube channel, and ties its community into the site.

Since YouTube provides an API that makes this feature possible for just about any game developer, even indie game makers can offer their users a bonus that might otherwise be prohibitive in terms of the development effort and server capacity required.

Höfele includes not only code examples in his article but also a demo application with included full source code. In the feature, he addresses numerous major steps to adding YouTube integration: recording gameplay, choosing a video format, encoding screenshots, dealing with YouTube's API, and uploading the videos.

As Höfele summarizes:

"Adding YouTube upload to your game is straightforward, but requires some planning.

"To start with, you need to decide how you are going to record the game footage. From the discussed options, a framebuffer capture is the easiest to add to an existing game, but recording game state or player input gives you more flexibility to apply changes to the recording. If you want to change the camera angle later on, for example, game state recordings allow you to do that.

"Finally, an application can upload videos with YouTube's RESTful interface; a protocol based on HTTP. The demo application for this article comes with an implementation for authentication, search, and video upload requests that you can use for your own game."
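The shape of such a RESTful upload can be sketched without any network code. Note the boundary string, metadata format, and header names below are illustrative stand-ins, not YouTube's actual current API (which has changed since this article): the point is simply that an authenticated HTTP request carries the metadata and video bytes as a multipart body.

```python
def build_upload_request(token, title, video_bytes, boundary='xxBOUNDARYxx'):
    """Assemble (headers, body) for a hypothetical multipart video upload."""
    headers = {
        'Authorization': 'Bearer ' + token,   # auth token travels in a header
        'Content-Type': 'multipart/related; boundary=' + boundary,
    }
    metadata = '<entry><title>%s</title></entry>' % title
    body = (('--%s\r\nContent-Type: application/atom+xml\r\n\r\n%s\r\n'
             '--%s\r\nContent-Type: video/mp4\r\n\r\n')
            % (boundary, metadata, boundary)).encode()
    body += video_bytes + ('\r\n--%s--\r\n' % boundary).encode()
    return headers, body
```

An application would POST this body to the service's upload endpoint; keeping request assembly separate from transport also makes the upload path easy to unit-test, as Höfele's demo application does with real requests.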

November 10, 2008

Simulating Reflections For Wet Asphalt

The C0de517e blog, written by a game rendering engineer, has an in-depth post up entitled "Impossible is approximatively possible," dealing with simulating realistic reflections on wet asphalt -- in particular, filling in the gaps where data is impossible to simulate conclusively, using approximations that can be generated on the GPU.

As the author describes:

"We have two things that we don't know, the reflection direction and the travelled light ray distance between the track and the bike, and those are possible to compute only using raytracing... Let's try now to fill the holes using some approximations that we can easily compute on a GPU.

"First of all we need the direction, that's easy, if we consider our reflections to be perfectly specular, the BRDF will be a dirac impulse, it will have only one direction for which it's non zero, and that is the reflected direction of the view ray (camera to track) around the (track) normal.

"The second thing that we don't know is the distance it travelled, we can't compute that, it would require raytracing. In general reflections would require that, why are the planar mirror ones an exception? Because in that case the reflection rays are coherent, visibility can be computed per each point on the mirror using a projection matrix, but that's what rasterization is able to do!

"If we can render planar mirrors, we can also compute the distance of each reflected object to the reflection plane. In fact it's really easy! So we do have a measure of the distance, but not the one that we want, the distance our reflected rays travels according to the rough asphalt normals, but the one it travels according to a smooth, marble-like surface. It's still something!"
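The "one direction for which it's non zero" the author mentions is the standard mirror reflection of the view ray around the surface normal, R = V - 2(V.N)N. A minimal sketch (not the author's shader code, which runs on the GPU):

```python
def reflect(view, normal):
    """Mirror the view ray around the surface normal: R = V - 2(V.N)N.

    view: camera-to-surface direction; normal: unit surface normal.
    """
    d = sum(v * n for v, n in zip(view, normal))
    return tuple(v - 2 * d * n for v, n in zip(view, normal))
```

For a perfectly specular BRDF this single direction carries all the reflected light; the wet-asphalt trick then perturbs it with the rough asphalt normals while borrowing distances from the smooth planar-mirror case.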

The author includes a few code examples, based on previous R&D work he did on a shipped racing title, and has even posted a followup blog entry that delves deeper into his blurring algorithm.

October 27, 2008

Sponsored Feature: Intel Calls All Game Companies as Partners

In an Intel-sponsored feature, the company explains the benefits of the Intel Software Partner Program, a free initiative for game companies which includes development help, tools, and computer discounts.

The piece notes the growth of integrated graphics, and points out various steps developers can take to squeeze unexpected performance out of such hardware:

"Integrated graphics already dominate over discrete graphics at a ratio of nearly 2:1 in terms of market segment share, and that ratio is expected to grow substantially over the next several years.

"Discrete graphics will continue to be widely used as well, of course, so enabling games to scale across the spectrum from mainstream graphics to high-end discrete solutions is just good business sense.

"A growing number of game companies are debunking the myth that 3D-based mainstream games require discrete graphics solutions, and as of second quarter 2008, Intel had more than a 47 percent market segment share for graphics hardware, with year-to-year growth of 46 percent.

"Scott Brown, president of NetDevil Ltd., a game-development company in Louisville, Colorado, sums it up: 'People with high-end machines need to see their investment pay off with our games, but at the same time, we'd be crazy not to target mainstream graphics hardware as well.'

"By tuning the scalable aspects of gameplay to the resources available, game companies can effectively expand their target user base, while still providing an optimal visual experience for everyone."

The feature goes on to explain specific benefits of the Intel program, including development assistance and discounts on development hardware.

October 23, 2008

Sponsored Video: Threading Quake 4 and Quake Wars

In this sponsored video, Intel senior software engineer Anu Kalra discusses the principles, challenges, and lessons learned in providing multithreading assistance to the teams behind Raven Software's Quake 4 and Splash Damage's spinoff title Enemy Territory: Quake Wars.

Among other topics, Kalra noted that certain multithreading aspects improved from game to game, including one notable area concerning efficiency:

"The key thing here is that the data is not being shared across threads as much," he explains. "There definitely is data that is being shared with the rendering thread and the graphics driver thread, but between the engine and the renderer, there really isn't a whole lot of data that is being shared. In the case of Quake 4, all the dynamic meshes that are generated per frame had to be buffered and shared between the two threads, which isn't the case with Quake Wars."

In addition to the video, there are full slides available from a talk given by Kalra alongside developer Jan Paul van Waveren of Quake series creator id Software.

October 22, 2008

Feature: Optimizing Asset Processing

In an in-depth technical feature posted on Gamasutra, Neversoft co-founder Mick West discusses performance concerns when optimizing asset processing for games, including the basic nature of the common problems and in-depth solutions for keeping the pipeline efficient.

Referring to asset processing tools as "the ugly stepsister" of game development, West warns against ignoring their role.

He notes that, because of their simplicity, effort is often not spent to properly optimize them, which can end up eating away time later in the project.

Multithreading is a great way to make these tools more efficient, West notes. "Most PCs now have some kind of multicore and/or hyper-threading," he writes. "If your tools are written in the traditional mindset of a single processing thread, you’re wasting a significant amount of the silicon you paid for, as well as the time of the artists and level designers as they wait for their assets to be converted.

"Since asset data generally consists of large chunks of homogeneous data, such as lists of vertices and polygons, it's generally very amenable to data-level parallelization with worker threads, where the same code is run on multiple chunks of similar data concurrently, taking advantage of the cache."
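The data-level parallelization West describes can be sketched as follows. This is an illustrative Python structure (a native C++ tool would get true parallelism from its worker threads, whereas Python's GIL limits CPU-bound gains): the asset data is cut into chunks and the same transform runs over every chunk on a pool of workers, with order preserved.

```python
from concurrent.futures import ThreadPoolExecutor

def process_vertices(vertices, transform, workers=4, chunk=1024):
    """Apply the same transform to every vertex, one chunk per worker task."""
    chunks = [vertices[i:i + chunk] for i in range(0, len(vertices), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves chunk order, so results reassemble deterministically
        results = pool.map(lambda c: [transform(v) for v in c], chunks)
    out = []
    for r in results:
        out.extend(r)
    return out
```

Tuning the chunk size so each worker's data fits in cache is where most of the practical speedup comes from in a real tool.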

Surprisingly, he even admits that "bad code" is permissible when it comes to processing tools, as long as the risks are properly weighed:

"In-house tools don’t always need to be up to the same code standards as the code you use in your commercially released games. Sometime you can get performance benefits by making certain dangerous assumptions about the data you’re processing and the hardware it will be running on.

"Instead of constantly allocating buffers as needed, try allocating a 'reasonable' chunk of memory as a general purpose buffer. If you have debugging code, make sure you can switch it off. Logging or other instrumenting functions can end up taking more time than the code they are logging. If earlier stages in the pipeline are robust enough, then (very carefully) consider removing error and bounds checking from later stages if you can see they are a significant factor.

"If you have a bunch of separate programs, consider bunching them together into one uber-tool to cut the load times. All these are bad practices, but for their limited lifetime, the risks may be outweighed by the rewards."

You can now read the full Gamasutra feature on the subject, posted as an independently authored piece within the Intel Visual Computing section, and which includes sample code (no registration required; please feel free to link to this feature from external websites).

October 16, 2008

Sponsored Video: Robust N-Core Capable Game Engine Design

In this video session, Intel graphics and game technologist Ron Fosner introduces the principles of designing game engines that scale to an arbitrary number of cores.

There are two main approaches, Fosner points out. "One is the traditional way of threading for speed; you parallelize things to make them faster on the GPU. You want to break your game tasks up into chunks. The other way is programming and threading for features. You take advantage of all the extra CPU power that's beyond the minimal gameplay, and take it to the point where it's increasing the user experience. What you can do is continue to add content or work while you have CPU power available. You can increase the user experience as the CPU power increases, design your game engine to be multicore aware."
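"Threading for features" boils down to scaling optional content with whatever hardware threads exist, above a fixed gameplay baseline. A hedged sketch (the function name, numbers, and the particle-count example are all invented for illustration):

```python
import os

def feature_budget(base_particles=500, per_core=250):
    """Scale optional content (e.g. particle count) with available cores.

    The baseline runs on one core; each extra hardware thread buys more
    eye candy without affecting minimal gameplay.
    """
    cores = os.cpu_count() or 1
    extra = max(0, cores - 1) * per_core
    return base_particles + extra
```

The same pattern applies to crowd sizes, cloth resolution, or AI ambient behaviors: the game remains playable on the minimum spec, and richer machines simply get more of everything.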

In part two of the session, Fosner delves more into practical techniques:

"Games are really very challenging to thread," he admits. "If you can manage to thread your game engine successfully, you can thread anything."

October 13, 2008

Progress In Contouring

In an interesting historically-tinted blog post, Nick Porcino looks at various methods of contouring in graphics rendering, starting off with the marching cubes algorithm first introduced during Siggraph '87. The method essentially divides a volume into cubes and replaces the cubes with corresponding polygons to approximate the original shape.

After quickly recapping that original principle, Porcino moves onto evolutions of that algorithm: convex contouring ("Where marching cubes generates triangles based on face intersections, convex contouring generates polygonal shapes that enclose the convex negative space within each cell"), dual contouring ("Since vertices are placed in the interior of the cubes and not on the edges of the cubes, there is more freedom as to where the vertex can be placed"), and dual marching cubes.

Dual marching cubes was introduced in 2004. As Porcino describes:

"This method aligns vertices in the tessellation with features of the implicit function. The tessellation itself occurs on a grid that is the dual of the structured sampling grid. The result is that thin features missed, or requiring a lot of subdivision are preserved with a much sparser polygonization than that resulting from a structured grid. The method proceeds by creating an octree describing the volume to be contoured, then a dual grid of the octree is created by linking vertices at the centers of each grid cell to its topological neighbours.

"The surface is then extracted using a simple extension of marching cubes to dual grids. Since each cell in the grid is topologically equivalent to a cube, the standard marching cubes tables can be used to generate the surface interior to the cell. Since the underlying representation of the data is an octree, the resulting tessellation is much sparser than Dual Contouring or Marching Cubes."

Porcino includes visual aids that demonstrate how the dual marching cubes algorithm manages to represent geometry better than the other sampling methods, even with a lower polygon count.

Despite the cube-centric tack of the post, Porcino notes that progress is being made using tetrahedral meshes, and points to the website for Jonathan Shewchuk's UC Berkeley computer science course as a useful resource to that end.

October 06, 2008

Feature: Unlocking Processing Potential: Randi Rost On CPU-Based Graphics Architecture

The latest Gamasutra visual computing interview shines the spotlight on Intel Graphics' external relations manager Randi Rost, a 25-year development veteran instrumental in the company's graphics hardware efforts.

One of Rost's focuses is lowering the barrier of entry to development with graphics hardware, a goal achieved in part by working directly with universities to get feedback and provide training. In this excerpt he discusses the importance of that angle:

"Randi Rost: Universities are where a lot of new technology gets dreamed up, where new algorithms get invented. Universities, particularly in the visual computing space, have been relatively shackled by the existing graphics hardware capabilities -- where the entire rendering pipeline has been built into fixed-functionality silicon.

"This provides scant flexibility for researchers to innovate in terms of rendering algorithms. Recently, within the last half-dozen years, the hardware pipeline has gotten to be more programmable, but there are still a lot of constraints.

"With our upcoming graphics architecture, built around a completely general-purpose CPU-based design, we're basically removing all the constraints for the rendering pipeline. We're telling researchers: 'Hey, here's an architecture where you can effectively do everything you want in software. There's no fixed functionality to get in your way. If you want to experiment with new rendering algorithms, with ray tracing, with hybrid rendering systems, if you want to replace the rasterization unit, if you want to have procedural geometry so that you can render spheres analytically (rather than breaking them down into polygons) -- all of those things are possible.'

"It's a completely open, general, high-performance platform for highly parallel floating-point workloads, such as graphics. And, the basic programming model is simple: C++ code that targets x86 cores."

Rost goes on to discuss the opportunities this presents in hiring talent, and the importance of preparing the next generation of developers for the increasingly complex world of graphics hardware. You can read the full interview (no registration required; please feel free to link to this feature from other websites).

September 29, 2008

Intel Game Demo Contest Winners Announced

Intel has released the results of its 2008 Game Demo Contest, with winners in four categories receiving cash prizes of $12,500 as well as passes to Game Developers Conference 2009. Runners up received smaller cash prizes. All finalists were helpfully given a suite of developer tools as well as International Game Developers Association memberships.

Top honors for best overall threaded game went to Tommy Refenes of PillowFort for his Goo! demo, which also placed in the graphics category. You can see some of the evolution of Refenes' multithreading techniques in his recent feature article.

Other top honorees include Добряк's Magic Worlds, Tandem Games' Pixel & Vega in: Crunch Time, and Яков Сумыгин's Deadly Light. The winners were narrowed down from 329 entries. All demos that placed up to fifth in their categories are available for download from Intel's site.

Feature: Procedural Spooling In Games

In the latest in-depth Gamasutra technical feature, Neversoft co-founder Mick West examines how procedurally generated content and compression can lead to expanding vistas for your open-world games.

Open-world game environments and objects are typically spooled from the disc as players move through an area, with scene complexity often determined by the data transfer rate of spooling and the virtual speed of the player within the world.

If a world has too much complexity, then glitches may result when data cannot be spooled fast enough as players move from one region to another. To prevent these problems, developers can restrict players' maximum speed so there is sufficient time for the world to load, and they can place limits on scene complexity and allowable variation between regions.

To allow for more complex environments, West suggests that developers take advantage of procedural content -- content generated from mathematical descriptions of underlying forms and parameters describing the specific instance of that content -- and procedural compression:

"Procedural compression is simply storing a piece of geometry as a set of procedural parameters rather than as the final model. While this is not compression in the normal sense of the word, the effects are essentially the same, only with a vastly increased (even arbitrarily large) compression ratio.

"The disc spooling bandwidth requirements are thus greatly reduced, allowing us to pack vastly more level geometry into a small percentage of that bandwidth. The trade-off is that artists have reduced flexibility in the models they can represent, since they are constrained to the possible output of the procedural algorithms.

"We also trade some CPU resources, since the generation of geometry may require more CPU time than the standard spooling and decompressing of the raw data."
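The trade-off West describes can be made concrete with a toy example. This Python sketch is illustrative, not from the article: a "mesh" is stored as four parameters and regenerated on demand, so the asset that would have been sixteen vertices on disc is instead four numbers plus some CPU time.

```python
import math

def generate_ring(params):
    """Regenerate a ring of 2D vertices from procedural parameters.

    params: (center_x, center_y, radius, segments) -- the entire stored asset.
    """
    cx, cy, radius, segments = params
    return [(cx + radius * math.cos(2 * math.pi * i / segments),
             cy + radius * math.sin(2 * math.pi * i / segments))
            for i in range(segments)]
```

Bumping `segments` to 10,000 costs no extra disc bandwidth at all, which is the "arbitrarily large compression ratio" in action; the cost has moved entirely onto the CPU at load time.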

You can read the full technical article, which includes more information on procedural compression and solutions to the resource problems that come with the technique (no registration required, please feel free to link to this feature from other websites).

September 22, 2008

A Tip On (And Discussion Of) HDR Rendering Methods

LucasArts engineer Marco Salvi has posted an in-depth explanation of his method for HDR rendering, prompted by a SIGGRAPH 2006 lecture series by Valve employees. Salvi first touches on Valve's own HDR process (added to the Source engine in 2005), which he points out runs "with MSAA on relatively old hardware [and] executes tone mapping and MSAA resolve in the proper correct order with no extra performance cost, something that a lot of modern games can't still get right today."

"Through image segmentation techniques," Valve's code "’simply’ tries to determine if the previous frame has been under or over exposed and a new exposure value is adjusted to compensate for problems with the previous frame(s)." Salvi notes that this method doesn't allow for reliably determining average logarithmic luminance, which in his experience has led to "an overall flat and over or under saturated feeling." So he introduces his own plan: "get rid of the exposure search through previous frames feedback and compute it the proper way!"

Salvi's method computes per-pixel logarithmic luminance and outputs that to the alpha channel; he includes a code example of the relevant math, and delves into further specific details.
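The quantity Salvi computes "the proper way" is the frame's log-average luminance, the usual Reinhard-style exposure key. A minimal CPU-side sketch of the math (Salvi's version runs per-pixel in a shader, writing log luminance to the alpha channel; the weights below are the standard Rec. 709 coefficients):

```python
import math

def luminance(r, g, b):
    """Relative luminance from linear RGB (Rec. 709 weights)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def log_average_luminance(pixels, delta=1e-4):
    """Geometric-mean luminance of a frame; delta avoids log(0) on black."""
    total = sum(math.log(delta + luminance(*p)) for p in pixels)
    return math.exp(total / len(pixels))
```

Because the geometric mean is dominated far less by a few very bright pixels than an arithmetic mean would be, exposure derived from it avoids the flat, over- or under-saturated look Salvi attributes to feedback-based schemes.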

As a bonus, Valve's Gary McTaggart and Chris Green, both presenters of the lectures that spurred Salvi's post, drop by in the blog's comment section to explain their reasoning behind the route they took for the Source engine's HDR implementation.

September 21, 2008

Sponsored Feature: A Landmark in Image Processing: DMIP

In the latest Intel-sponsored feature, Lightspeed Publishing's Lee Purcell lays out deferred mode image processing, a new addition to the Intel Integrated Performance Primitives Library, which speeds up complex image-processing tasks with up to 3X performance increases.

Purcell begins by laying out current issues in dealing with image processing, which becomes more challenging with the ever-increasing resolution and color demands that come alongside the evolutions in high-definition imaging. Intel's DMIP attempts to address the issue. As software engineer Alexander Kibkalo, quoted by Purcell, explains:

"Fundamentally, DMIP optimizes overall image-processing tasks within an application while individual Intel IP functions can be optimized without requiring knowledge of the environment or the conditions of the function call. DMIP provides the detailed descriptions of each task in DAG form and the appropriate preferences can then be applied for optimizing the routines.

"In terms of parallelization on Intel processors, DMIP tries to maintain a balance of the slice size. Keeping it comparatively small lets the slice fit into L2 cache and enable efficient pipelining. Splitting the slices into comparatively large sub-slices allows them to be efficiently handled by the available processor cores (for example, by 4 lines for a quad-core processor or by 8 lines for an 8-core processor). For achieving the best performance, the actual method of splitting should be tailored to the individual processor on which the application is being run."
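
The slicing scheme Kibkalo describes can be illustrated with a small sketch (a simplification: DMIP itself operates on DAGs of image operations, and these helper names are invented for the example):

```python
def make_slices(height, rows_per_slice):
    """Split image rows into slices small enough to stay L2-cache-resident."""
    return [(start, min(start + rows_per_slice, height))
            for start in range(0, height, rows_per_slice)]

def split_for_cores(slice_range, cores):
    """Split one slice into per-core sub-slices (e.g. 4 for a quad-core)."""
    start, end = slice_range
    rows = end - start
    base, extra = divmod(rows, cores)
    subs, cursor = [], start
    for i in range(cores):
        size = base + (1 if i < extra else 0)
        if size:
            subs.append((cursor, cursor + size))
        cursor += size
    return subs
```

As the quote notes, the right slice size is processor-specific: small enough to pipeline through cache, large enough to split evenly across the available cores.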

Purcell also notes several secondary benefits:

"For example, matching the Intel IP function algorithms to the low-level optimizations for the Intel Streaming SIMD Extensions (Intel SSE) (from Intel SSE through Intel SSE4) can improve overall application performance substantially. Software developers and their application designs benefit from well-established, highly refined algorithms that address many of the most important programming operations. Intel's support network also adds value and utility to these libraries."

You can now read the full feature, which includes detailed information and visual aids.

September 11, 2008

Sponsored Video: Michael Langmayer On 3ds Max's Multithreading

In this video, Autodesk 3D application specialist Michael Langmayer gives a demonstration of 3ds Max and how it takes advantage of multi-core computing. 3ds Max integrates the Mental Ray rendering engine and uses distributed processing, including over a network.

"Mental Ray allows you to take advantage of multiple processors -- not only processors in this machine, but processors in the network," he says. "We have something called distributed bucket rendering. If I switch this one on, and the network here is connected to other machines, I could basically connect to four other machines having dual-core processors installed, and I could then render with the power of four machines."

He then demonstrates the rendering process: "At the bottom you can see we're getting all those different buckets. Each bucket is actually representing one core of this machine I'm running here. It's helping the artists or the creator of any office or design-related building to render a lot quicker. ...We can utilize this in the games market for texture baking, for example. ...We're really harnessing the power of those multi-threaded machines."

Langmayer also demonstrates the 3D sculpting tool Mudbox.

September 08, 2008

A Guide To Dual-Paraboloid Reflections

dual_paraboloid.jpg Kyle Hayward, a Purdue University senior studying graphics research who we've featured before, posted a useful primer on dual-paraboloid reflections, which he describes as a "view-independent method of rendering reflections just like cubemaps."

Unlike cubemaps, which can require updating up to six textures when rendering dynamic scenes, dual-paraboloid reflections only require updating two textures.

Hayward comments: "They won't give you the quality of cubemaps, but they are faster to update and require less memory. And in the console world, requiring less is almost reason enough to pick this method over cubemaps."

He goes on to share several reference articles and a math overview before providing code for developers to experiment with generating their own reflections with paraboloid maps.

Hayward notes that there are a couple of faults to watch out for in addition to reduced quality: "One drawback of paraboloid maps is that the environment geometry has to be sufficiently tessellated or we will have noticeable artifacts on our reflector. Another drawback is that on spherical objects, we will see seams. However, with objects that have convex and concave polygons, the seam won't be as noticeable."
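
For reference, the core of the technique is the paraboloid projection itself. Here is a minimal sketch (the function name and conventions are illustrative, following the standard front/back-map formulation rather than Hayward's exact code):

```python
def paraboloid_uv(d):
    """Map a unit direction to (uv, which_map) for a dual-paraboloid lookup.

    Directions with z >= 0 fall on the front map, the rest on the back map;
    each hemisphere projects to u, v in [-1, 1], so two textures cover the
    full sphere where a cubemap needs six faces.
    """
    x, y, z = d
    if z >= 0.0:
        denom = 1.0 + z
        return (x / denom, y / denom), "front"
    denom = 1.0 - z
    return (x / denom, y / denom), "back"
```

The poles of each hemisphere land at the map centers, which is where the seams Hayward mentions appear when the two maps meet at z = 0.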

Sponsored Feature: Multi-Threading Goo!: A Programmer's Diary

In the latest Intel-sponsored feature, Goo! developer Tommy Refenes of PillowFort matches wits with multi-threading in four gripping acts -- and emerges victorious.

Refenes begins by describing his initial attitude to threading, one which proved a poor choice: he dedicated a thread to the specific tasks of collision, only calling that thread's functions when needed and keeping it in a nonfunctioning loop otherwise. Here, he explains his mistake:

"The model of having two threads sleeping and waiting for work (at least in the way in which they were waiting) was horrible. The threads waited in an infinite loop for work, and if they didn't find any, they performed a Sleep(0) and then continued waiting for work. At the time, I thought the Sleep(0) would free the processor enough so that other threads running elsewhere on the OS would be scheduled efficiently. Boy, was I wrong. The work check loop had the two processors constantly running at 90 percent to 100 percent capacity -- an absolute waste of time. To understand why, picture this scenario.

"Let's say you are in charge of five construction workers tasked with paving a road. You know that you will need at least two of them to lay blacktop and at least the remaining three to grade the road, dig the ditches, and so on. Being the efficient supervisor that you are, you wouldn't think of allowing the two workers that need to lay the blacktop to sleep in the middle of the road until the blacktop is ready to be laid. No, to work as efficiently as possible, you would put all five of them to work grading and then all five paving."
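
The alternative to the Sleep(0) spin loop is to block until work actually arrives, so an idle worker costs essentially no CPU. A minimal sketch using a blocking queue (illustrative Python, not PillowFort's engine code):

```python
import queue
import threading

def worker(tasks, results):
    """Blocks on the queue instead of spinning: the OS wakes the thread
    only when work (or a shutdown sentinel) arrives."""
    while True:
        job = tasks.get()        # blocks; no busy-wait, no Sleep(0)
        if job is None:          # sentinel: shut down
            break
        results.append(job * 2)  # stand-in for real collision work
        tasks.task_done()

def run(jobs, n_workers=2):
    tasks, results = queue.Queue(), []
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for j in jobs:
        tasks.put(j)
    tasks.join()                 # wait for all queued work to finish
    for _ in threads:
        tasks.put(None)          # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)
```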

Refenes goes on to describe how he then created a new custom multi-threaded collision engine -- which he once again threw away and rewrote, since that version of the game would not run on single-core machines. By the end of Act 4, he has programmed another engine, one which dynamically scales based on the number of cores available.

You can now read the full feature, which includes specific technical details and visual aids.

September 05, 2008

Paper: Multi-Threaded Shadow Volumes on Mainstream Graphics Hardware

shadowvolumes.jpg The Intel Software Network has posted "Multi-Threaded Shadow Volumes on Mainstream Graphics Hardware," a white paper from software engineer David Bookout and Intel senior applications engineer Satheesh Subramanian, which describes techniques for generating shadows on mainstream graphics hardware in realtime.

The paper notes that though shadow volumes are able to create physically accurate shadows, thus providing a more realistic 3D scene, having an abundance of dynamic shadow-generating objects or moving light sources can affect performance.

To combat that problem, the paper covers "how shadow volumes generated on the CPU using a multithreaded approach could free up the graphics hardware from this bottleneck-creating task."

The paper also provides a shadow volumes implementation based on the EmptyProject example from the Microsoft DirectX 9.0 SDK.

August 27, 2008

Supported Feature: The Whimsy of Domain-Specific Languages

greenblat.jpg In this technical article, originally published in Game Developer magazine, Neversoft co-founder Mick West explores making your own mini-languages for games by making Whimsy, a domain-specific language that creates art based on the abstract paintings of Parappa creator Rodney Alan Greenblat.

Unlike domain-specific languages, general-purpose languages must support a large amount of functionality, such as variables, data structures, conditional expressions, looping constructs, and functions. To create the domain-specific language Whimsy, West defined his domain as the works of Rodney Alan Greenblat, using pieces from the artist's Elemental tour, a collection of semi-abstract paintings in a distinctive brightly colored and geometric style:

"The idea was this: If such a style of artwork were to be used in a video game, then it might be very useful to have a DSL that encapsulated that style and allowed for easy creation of similar pieces for use in-game.

"The first step in creating a DSL is to get a rough idea of the elements that the domain comprises. Looking at the Elemental works, we can see a number of common aspects. There are concentric oval shapes, with petals adjoined to various sections.

"Many of the works have segmented circles with colored circles inside them. There are little propellers and various other shapes that repeat both within individual works and within Greenblat's overall collection.

"I decided the best way to approach creating this DSL would be to pick one piece and attempt to replicate parts of it. I chose the painting 'Lunar Module' (see image). Many common elements hold the piece in its style: solid circles, concentric ovals with color gradients, petals, and stars."

You can read the full technical feature on Whimsy, Mick West's domain-specific language, which includes code and examples of potential uses of DSLs in games.

August 26, 2008

Sponsored Video: Greg Corson On COLLADA Tools

For this week's video, SCEA software engineer Greg Corson, who has worked on interchange file format COLLADA since 2005, talks about taking advantage of the format with game development tools such as COLLADA Refinery and Coherency Test.

Corson believes that COLLADA is helping developers move away from using custom-created solutions and formats: "This was the same issue in the movie industry -- they all had their own custom software, or they bought all their software that they used from one company. Everyone realized that they can't really do that anymore. No one company has everything that you need to build a game or to do a movie.

He continues: "It's getting more and more important that you have a lot of tools that work together. Other tools tend to get discarded now, instead of just worked around. That's a big change for the industry, finally seeing everyone say 'We can buy tools from anyone we want and have it work.'"

August 19, 2008

Sponsored Video: Demo of the Thread Profiler, Threading Building Blocks

Intel senior software engineer Gary Carleton returns for this week's featured video to again discuss Intel's software development tools, this time demonstrating the Intel Thread Profiler and Intel Threading Building Blocks.

With the Intel Thread Profiler, developers can see lock transitions for a program and when threads are active on a particular core. Carleton explained, "You can kind of get a sense of how much lock thrashing or lock activity is occurring and maybe get a visual look at how much synchronization overhead is occurring with the application."

Developers can also click on a particular thread transition to look at the source code for both the thread releasing the lock and the thread acquiring the lock. That way, they can examine the code if they feel that there's too much lock activity occurring.

Said Carleton, "The goal is to measure the amount of parallelization of the system [to see if we] are in fact using all the execution cores in the system, and then to be able to drill down into more detail as to how the threads are interacting."

August 15, 2008

Intel Publishes First Details On Larrabee

Ahead of its SIGGRAPH 2008 presentation previewing the Larrabee graphics processing unit, Intel posted a paper on the GPU titled Larrabee: A Many-Core x86 Architecture for Visual Computing, which will be delivered at the Los Angeles conference.

According to the paper, Larrabee uses "multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed function logic blocks." Intel notes that its pipeline is derived from the dual-issue Intel Pentium processor.

The many-core architecture, which Intel claims is a first for the industry, allows for "dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads." It is also designed to increase the flexibility and programmability of the architecture as compared to standard GPUs.

Furthermore, Larrabee's native programming model supports "a variety of highly parallel applications, including those that use irregular data structures. This enables development of graphics APIs, rapid innovation of new graphics algorithms, and true general purpose computation on the graphics processor with established PC software development tools."

The chip manufacturer also added that the first Larrabee-based products are expected to launch in late 2009 or early 2010, and will target "discrete graphics applications, support DirectX and OpenGL, and run existing games and programs."

August 13, 2008

Sponsored Video: Visual Computing with Next-Generation Intel Integrated Graphics Part 1

In this first part of a three-part video series taken from an Intel Developer Forum session, Intel marketing director Steve Skibiski discusses the company's vision and strategy for visual computing with next-generation Intel integrated graphics.

Speaking on Intel's 4-series graphics architecture, Skibiski predicts that 2008 is the year of high definition and the increased importance of high definition video: "The reason for that is that we have greater broadband penetration. With this capability, you will see standard definition on web sites going to high definition. You will start seeing high definition being introduced as widgets on web sites."

He went on to emphasize the importance of the CPU to the overall experience that users get on platforms: "The processor continues to be the most important driver in the visual computing experience, but innovations in high definition video processing, like accelerators and post-processing algorithms, give platform engineers more options in developing media and entertainment PCs."

August 11, 2008

Sponsored Feature: Havok Talks Simulating Real-World Physics

In this Intel-sponsored feature, Havok managing director David O'Meara discusses the middleware firm's new products -- from Havok Destruction to Havok Cloth -- its acquisition by Intel, and plans for the future.

Speaking on how Havok benefited from the recent acquisition, O'Meara noted that Intel made it possible for the company to offer Havok Complete, its physics and animation software toolset, available to game developers free of charge (for non-commercial use) in February 2008. Both Havok and Intel sought to "boost creative game development throughout the industry" with the initiative.

He also believes that the acquisition will enable Havok to extend its capabilities beyond the entertainment industry and into serious gaming: "Serious gaming includes non-entertainment applications, such as industrial and military simulations and training. Intel offers Havok a lot of capabilities and resources to enable us to go beyond the entertainment space in a couple of years' time."

O'Meara added that there are Intel-developed technologies that Havok could take part in: "It would be very nice if we could create some commercial applications for some of the technologies that are being developed - take them out of Intel and put them into Havok. And that is part of the concept for Intel and Havok."

August 08, 2008

Sample Framework For Connecting Several Post Processes

Kyle Hayward, a senior studying graphics research at Purdue University's computer science department, has put up a useful post-processing framework sample structured to handle a myriad of post processes.

He begins by placing a PostProcessComponent (component) at the very bottom of the framework's hierarchy: "What this class represents, is a single atomic process, that is, it cannot be broken up into smaller pieces/objects. Each component has an input and an output in the form of textures. A component is also able to update the backbuffer. It is the simplest object in the hierarchy and is combined with other components to form a PostProcessEffect (effect)."

An effect contains a chain of components for implementing a single post process and handles passing output from one component to the next. With this arrangement, "components can be independent of one another, and the effect handles linking all the components that it owns. Because of this, each component is very simple. Also, like components, effects have inputs and outputs in the form of textures. Effects can also be enabled/disabled dynamically at runtime."

He continues: "Next we have the PostProcessManager (manager). The manager is comprised of a chain of effects, and it handles linking the output of one effect to the input of the next effect. And just like with components, this enables each effect to be independent of the next. The manager also takes care of linking the backbuffer and depth buffer to components that need it."

Because each component is independent, each one is simple in its implementation, making it easier to understand the code and track down possible errors in a component. "Another nice feature about this framework is that you do not need to create a separate class deriving from PostProcessEffect for each effect. In this way, you can dynamically create your effects at run time. Most of the 'magic' occurs in the LoadContent() function inside PostProcessManager."
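
The component/effect/manager hierarchy can be sketched in miniature (the class names follow Hayward's description, but the bodies are illustrative stand-ins: plain lists play the role of textures here):

```python
class PostProcessComponent:
    """Atomic step: one input texture in, one output texture out."""
    def __init__(self, fn):
        self.fn = fn
    def apply(self, texture):
        return self.fn(texture)

class PostProcessEffect:
    """A chain of components; the output of each feeds the next."""
    def __init__(self, components, enabled=True):
        self.components = components
        self.enabled = enabled  # effects can be toggled at runtime
    def apply(self, texture):
        if not self.enabled:
            return texture
        for component in self.components:
            texture = component.apply(texture)
        return texture

class PostProcessManager:
    """A chain of effects, linked output-to-input the same way."""
    def __init__(self, effects):
        self.effects = effects
    def apply(self, texture):
        for effect in self.effects:
            texture = effect.apply(texture)
        return texture
```

Because each layer only links outputs to inputs, a new post process is just a new chain assembled at run time, with no per-effect subclassing required.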

August 06, 2008

Supported Feature: Random Scattering: Creating Realistic Landscapes

In this technical article, originally published in Game Developer magazine, Neversoft co-founder Mick West continues his acclaimed analyses by showcasing an algorithm for procedurally scattering objects such as trees across a game level.

West details two competing procedural content generation methodologies, teleological and ontogenic: the teleological approach builds an accurate physical model of the environment and the processes that shape it, then simulates the results, whereas the ontogenic approach observes the end results of those processes and reproduces them directly.

For this task of simulating randomly scattered trees, you'll want to make sure that the trees are not overlapping, yet are still evenly distributed: "What if we started off with the trees perfectly evenly distributed (say on a square grid), and then simply move each tree a random amount, but not so far that they can overlap? ... This solution actually works quite well."

West continues: "[The results] (pictured) look something like a cross between the pure random scatter and the minimum distance scatter. The trees are evenly distributed, but we still see some minor clumping, including two trees that overlap slightly. But overall this algorithm produces much nicer looking results than our first attempt, and it's simpler and cheaper to implement than the second."
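
West's jittered-grid idea can be sketched directly. This variant bounds the jitter so overlap is impossible (West's own version permits slightly more movement, hence the occasional overlapping pair he mentions):

```python
import random

def jittered_scatter(cols, rows, spacing, radius, seed=0):
    """Place one tree per grid cell, offset by a bounded random amount.

    Limiting the jitter to (spacing/2 - radius) per axis guarantees that
    trees of the given radius can never overlap, while still breaking up
    the obvious regularity of the grid.
    """
    rng = random.Random(seed)
    max_jitter = spacing / 2.0 - radius
    trees = []
    for j in range(rows):
        for i in range(cols):
            x = i * spacing + rng.uniform(-max_jitter, max_jitter)
            y = j * spacing + rng.uniform(-max_jitter, max_jitter)
            trees.append((x, y))
    return trees
```

With spacing 10 and radius 2, adjacent tree centers can approach no closer than 4 units (twice the radius), so the worst case is touching, never overlapping.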

August 04, 2008

Sponsored Video: Performance and Threading Tools for Game Developers

In this week's featured video, Intel senior software engineer Gary Carleton provides an overview of Intel tools built to help developers thread code as easily as possible, improve performance, identify bottlenecks, and take advantage of multicore.

In addition to discussing the Intel Thread Profiler, which we've featured previously when comparing lock and lock-free code, Carleton covers several other Intel threading analysis tools and development products, such as the Intel Thread Checker and VTune Performance Analyzer.

On Intel Threading Building Blocks, Carleton describes the tool as a "template library that sits on top of the threading APIs and allows the programmer to not have to focus on thread creation and the thread APIs in great detail." Programmers present Threading Building Blocks with a task, and it breaks up the work inside the task and partitions the work out to whatever CPU cores are available for execution.

He added that the template library has multiple purposes: "[It makes] threading easier to release the programmer from being concerned about the details of programming, [and it makes] the program scalable. The threading building blocks at runtime will determine what CPU cores are available and actually dynamically do that. As cores become available or unavailable, it will modify the workflow according to that."
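
The task-based model Carleton describes, handing a library a task and letting it partition the work across whatever cores are available, can be sketched with a thread pool (a Python analogy to illustrate the idea, not TBB itself):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_map(task, items, workers=None):
    """Split a task's items across however many cores are available,
    in the spirit of a work-partitioning runtime: the caller supplies
    the task, the pool decides how to distribute it."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, items))
```

The caller never touches thread creation; sizing the pool from os.cpu_count() is the (much simpler) analogue of TBB discovering available cores at runtime.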

August 01, 2008

Paper: Geometry-Aware Framebuffer Level of Detail

Hong Kong University of Science and Technology's Pedro Sander and Lei Yang, along with University of Virginia assistant professor Jason Lawrence, have published and made available online a paper titled "Geometry-Aware Framebuffer Level of Detail" for Eurographics Symposium on Rendering 2008.

The paper introduces a framebuffer level of detail algorithm for controlling an interactive rendering application's pixel workload: "Our basic strategy is to evaluate the shading in a low resolution buffer and, in a second rendering pass, resample this buffer at the desired screen resolution. The size of the lower resolution buffer provides a trade-off between rendering time and the level of detail in the final shading."

To reduce approximation error, a feature-preserving reconstruction technique that approximates the shading near depth and normal discontinuities is used. The paper also demonstrates "how intermediate components of the shading can be selectively resized to provide finer-grained control over resource allocation," as well as a simple control mechanism that "continuously adjusts the amount of resizing necessary to maintain a target framerate."

The techniques covered do not require any preprocessing and are straightforward to implement on GPUs. In addition, they have been shown to provide "significant performance gains for several pixel-bound scenes."
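
The paper's two-pass structure, shade at low resolution and then resample to screen resolution, can be sketched as follows (nearest-neighbour resampling stands in for the paper's feature-preserving reconstruction, and all names are illustrative):

```python
def shade(x, y):
    # Stand-in for an expensive per-pixel shading function.
    return (x + y) * 0.5

def render_low_res(width, height, factor):
    """Pass 1: evaluate shading in a reduced-resolution buffer,
    cutting the pixel workload by factor**2."""
    w, h = width // factor, height // factor
    return [[shade(x * factor, y * factor) for x in range(w)]
            for y in range(h)]

def upsample_nearest(buf, factor):
    """Pass 2: resample the low-res buffer at screen resolution.

    (The paper uses a feature-preserving reconstruction near depth and
    normal discontinuities; plain nearest-neighbour stands in for it.)
    """
    return [[buf[y // factor][x // factor]
             for x in range(len(buf[0]) * factor)]
            for y in range(len(buf) * factor)]
```

The buffer size (via factor) is exactly the knob the paper describes: larger factors trade shading detail for rendering time, which a controller can adjust to hold a target framerate.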

[Via Level of Detail]

July 30, 2008

Sponsored Video: Allegorithmic's Real-Time Graphics Texture Streaming for Games

Demonstrating developer Allegorithmic's Substance engine, marketing and sales director Alexis Khouri spoke with Intel Software Network about real-time graphics texture streaming for games.

According to Khouri, Allegorithmic's optimized engine allows for bigger game worlds, richer content, and more details: "What is really interesting is that our technology scales very well with multi-cores. We have more than 90% scalability, which means, for example, with 4 cores, we can have almost 14 megabytes per second of compressed textures."

For the demonstration, Allegorithmic produced its own level of Unreal Tournament 3 to show off what can be expected with the company's procedural texture technology: "What is interesting is that the whole texture package of this demonstration fits in 280 KB instead of 300 MB. So, what we did was divide the size of the textures by a thousand."

July 28, 2008

Arauna Real-Time Ray Tracing Updated

NHTV's International Game Architecture and Design senior lecturer and team manager Jacco Bikker has released a new demo and source package for his Arauna real-time ray tracing application. The package - which includes static library files for building a demo application, three tutorial applications, and source code needed to build the libraries and the demo application - can be used freely for non-commercial applications.

According to Bikker's site, Arauna is an experimental but stable real-time ray tracer developed specifically for game development. While the application isn't yet capable of delivering the performance needed to produce graphics comparable to modern games using a GPU, it is one of the fastest renderers in its class.

Arauna's features include "real-time ray tracing of large triangle meshes (up to 2M per 1GB of memory), full HDR pipeline with post-processing for HDR glow, recursive reflection and refraction, accurate shadows from an unlimited amount of point lights, texturing with bilinear filtering and normal mapping," and more.

Two games have been developed with Arauna so far, both of which were created by students enrolled in NHTV University of Applied Sciences' IGAD program. The most recently completed game, Let There Be Light (pictured), was released earlier this month and is available for download at the Arauna site.

July 25, 2008

Supported Feature: Practical Fluid Dynamics: Part 2

Following up his recent article on practical fluid dynamics, Neversoft co-founder Mick West further explains the technical details - including source code - of creating dynamic fluid systems such as smoke for video games, using nothing more complex than basic algebra.

On creating smoke, West notes that simulating the surface of the fluid isn't the goal; rather, it is more interesting to visualize a substance suspended in and carried around by the fluid. "With water, we might have silt, sand, ink, or bubbles. In air, we could see dust, steam, or smoke. You can even use the velocity field techniques outlined here to move larger objects, like leaves or paper, in a highly realistic manner."

He continues: "It's important that what we're talking about is a suspension of one substance in another. We are generally not so interested in simulating two fluids that do not mix (like oil and water)."

Modeling smoke should be approached as a suspension of tiny particles in the air and not as a gas: "These tiny particles are carried around by the air, and they comprise a very small percent of the volume occupied by the air. So we do not need to be concerned about smoke displacing air."

To simulate smoke, West suggests adding "another advected density field, where the value at each cell represents the density of smoke in that region." In his accompanying code, he refers to this as "ink," as it's similar to the density of air, except "the density of smoke or ink is more of a purely visual thing and does not affect the velocity field."
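
The advected density field can be sketched in one dimension with a semi-Lagrangian step (illustrative code, not West's, whose article works on a 2D grid):

```python
def advect(ink, velocity, dt):
    """Semi-Lagrangian advection of a 1-D density ("ink") field.

    For each cell, trace backwards along the velocity field and linearly
    interpolate the density found there. The velocity field itself is
    left untouched, matching the point that ink is purely visual and
    does not affect the flow.
    """
    n = len(ink)
    out = []
    for i in range(n):
        src = i - velocity[i] * dt          # back-trace to the source point
        src = max(0.0, min(n - 1.0, src))   # clamp to the grid
        i0 = int(src)
        i1 = min(i0 + 1, n - 1)
        t = src - i0
        out.append(ink[i0] * (1 - t) + ink[i1] * t)
    return out
```

With a uniform rightward velocity, a blob of ink simply drifts one cell per step; in the full scheme this same routine advects smoke density through the simulated air.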

July 23, 2008

Sponsored Video: Game Creators: Threading for Games

For this week's featured video, Game Creators CEO Lee Bamber describes how the company approached Intel about obtaining logos to use with its Windows Vista and DirectX 10 game creation tool, FPS Creator X10, only to come away with the goal of integrating multicore support into its software.

Game Creators picked three software-slowing areas to target with multicore, the first being artificial intelligence: "The problem was that when we had a hundred characters rendered into the scene, the CPU slowed down on a single core because there was just too much AI processing going on. We spread out the AI calculation across all the cores, which brought our speed back up... so we could have a hundred characters running around a confined space, and we also got our performance back."

The team then accelerated light mapping, having found that with a single core, calculating all the shadows in a scene with lots of lights could sometimes take up to 180 seconds. "When we optimized with multicore technology on a quad core, the same thing took 53 seconds. So, [there was] a dramatic reduction in the waiting time to be able to go and play the game."

The third area Game Creators focused on was video capture. The studio wanted players to be able to capture footage to share with their friends and actually incorporate into their game-making process for cutscenes, but as soon as the feature was added, the game slowed to a crawl: "What we decided to do was get a separate core, and [we] said that core is just going to be video capture. That core sits there, and then captures the game which was on the other cores. The idea is that when you switch on video capture, the game doesn't slow down. So, you can capture the video at full frame rate, and you actually get a cool video at the end of it that you can use during your game-making process."

July 21, 2008

Deriving Exponential Shadow Maps Simplified

LucasArts PlayStation 3 engineer Marco Salvi has posted a conceptually simpler, more intuitive way to derive exponential shadow map (ESM) equations, which he discovered while working on an improved version of exponential shadow maps.

According to Salvi, there is "no need to invoke Markov's inequality, higher order moments, or convolutions." He goes on to rewrite the "basic percentage closer filtering (PCF)" equation with several adjustments before arriving at the more streamlined ESM occlusion term formula pictured.

He concludes: "Exponential shadow maps can be seen as a very good approximation of a PCF filter when all the occluders are located in front of our receiver (no receiver self shadowing within the filtering window). There's not much else to add, if not that this new derivation clearly shows the limits of this technique and that any future improvements will necessarily be based on a relaxed version of the planar receiver hypothesis."
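
For reference, the ESM occlusion term Salvi derives is exp(c*(z_occluder - z_receiver)), clamped to one. A minimal sketch (the constant and function names are illustrative):

```python
import math

C = 80.0  # sharpness constant; larger values approximate hard PCF more closely

def esm_store(depth):
    """Value written to the shadow map: exp(c * occluder depth).

    Because exp is monotonic and the occlusion term factors into a
    product, this value can be filtered linearly (blurs, mipmaps) and
    still yield a meaningful occlusion estimate.
    """
    return math.exp(C * depth)

def esm_visibility(filtered_value, receiver_depth):
    """ESM occlusion term: exp(c*(z_occluder - z_receiver)), clamped to 1."""
    return min(1.0, filtered_value * math.exp(-C * receiver_depth))
```

A receiver at the occluder's depth comes out fully lit, while one even slightly behind it falls off exponentially toward full shadow; the clamp is what breaks down when occluders sit behind the receiver, the planar-receiver limit Salvi points to.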

July 16, 2008

Sponsored Video: Optimizing DirectX on Multi-Core Architecture Part 1

Intel technical marketing engineer Brad Werth delivers this week's featured video, discussing an informative set of slides on "Optimizing DirectX on Multi-Core Architectures" put together by Intel applications engineer Leigh Davies.

Referring to Amdahl's Law, which states that "the amount of speed up you can get from a parallel processor is limited by the portion of that work which can only be done serially (or what portion of your game has to run on only one processor and only one core at one time)," Werth notes that for games, that portion of work is generally rendering.

He goes on to advise that developers maximize portions of rendering that can be moved off onto other cores, minimizing aspects of rendering which must be done serially. "You need to be testing this and analyzing it during development. You want to be able to know whether each of the changes you make are in fact optimizations or not in terms of your balance of graphics computations and CPU optimizations."
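
Amdahl's Law as quoted can be written down directly; for example, if half a frame's work must stay serial, four cores yield only a 1.6x speedup, which is why moving rendering work off the serial path matters so much (a quick illustrative helper):

```python
def amdahl_speedup(serial_fraction, cores):
    """Maximum speedup when a fraction of the work must run serially.

    speedup = 1 / (s + (1 - s) / n), where s is the serial fraction
    and n the number of cores.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)
```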

July 14, 2008

Getting Started On Programming, 3D Effects

Hoping to set budding programmers in the right direction, rendering engineer Angelo Pesce has provided several useful resources for anyone looking to get started with programming and 3D effects.

He begins by suggesting that novices look into Processing: "It's the most fun language I know of, and it's Java basically, so you will do graphics in a mainstream programming language. There are a few tutorials and courses on the processing site itself, there is an incredibly active community, and Java is an incredibly widespread language, you won't have any problems in finding tutorials and books for starters." He went on to also suggest Hackety Hack, a Ruby-based environment targeted at beginners.

Pesce then advises that aspiring programmers try C++ (or C#) and OpenGL. "You could start with the famous NeHe graphics programming tutorials. Another good way is C# and XNA, especially if you have a 360. Then you will need CG to code shaders." From there, programmers should start reading relevant books, papers, and anything else they can get their hands on.

He concludes: "Don't EVER think to know enough. You don't. If you've been programming for 4-5 years and took a 5 years university course, then you will have just the basics that are required to be able to understand almost anything, with some effort. They give you only the alphabet, from there on, there is the real knowledge!"

July 11, 2008

Distribution-Based BRDFs Paper Released

SCEA programmer Naty Hoffman pointed us to a useful technical report from researchers Michael Ashikhmin and Simon Premoze on the bidirectional reflectance distribution function (BRDF), a formalism often used for representing surface reflection properties in computer graphics. Titled Distribution-based BRDFs, the paper is now available online.

Ashikhmin and Premoze outline several characteristics desirable in a useful BRDF representation:

  • represent a significant number of real-world materials with sufficient accuracy for visual applications;
  • use measured data and allow acquisition of the necessary information for existing materials quickly and easily;
  • be able to model new materials from scratch, i.e., not rely exclusively on measured data;
  • respect basic physical properties of non-negativity, reciprocity and energy conservation;
  • allow efficient sampling in a Monte-Carlo rendering system;
  • allow straightforward hardware implementation;
  • be compact;
  • have at least a semi-intuitive interpretation and be simple to use for non-BRDF experts.

Though many BRDF models have been proposed, the report notes that most of them lack at least some of these desirable practical properties. Distribution-based BRDFs aims to present a simple, flexible model satisfying many of the above requirements: "We show that the proposed model provides a good approximation for many real world materials, obeys basic physical restrictions, allows straightforward hardware implementation and provides for efficient sampling in a Monte-Carlo rendering system. A procedure to fit the model to BRDF measurement data is presented which suggests a simplified way of measuring surface reflection."

July 10, 2008

Supported Feature: Programming Responsiveness

If you can't control your actions in a game, might the game be to blame? In this Intel-supported technical article originally published in Game Developer magazine, Neversoft co-founder Mick West examines the problem of response lag in games, along with a number of possible solutions.

Response lag can be described as "the delay between the player triggering an event and the player receiving feedback (usually visual) that the event has occurred." Whenever the delay is too long, the game feels unresponsive. It's easy to see how responsiveness can make or break a game at first impression.

According to West, if your game is unresponsive, it could be the result of cumulative effects of several different factors. "Adjusting one factor alone may not make a perceptible difference, but addressing all the factors can lead to a noticeable improvement."
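To illustrate how small per-stage delays accumulate, here is a back-of-the-envelope tally; the individual figures are hypothetical, not taken from West's article:

```python
# Hypothetical per-stage lag sources, measured in frames (at 60 Hz,
# one frame is roughly 16.7 ms). None of these numbers are West's.
lag_sources = {
    "input sampling":   1,   # input read at the start of the frame
    "game logic":       1,   # state updated one frame before render
    "render buffering": 2,   # double/triple buffering in the GPU queue
    "display":          1,   # scan-out of the finished frame
}

frame_ms = 1000.0 / 60.0
total_frames = sum(lag_sources.values())
print(f"{total_frames} frames ≈ {total_frames * frame_ms:.0f} ms")  # 5 frames ≈ 83 ms
```

No single stage looks alarming in isolation, which is exactly West's point: the perceptible lag is the sum.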

Players, and sometimes even designers, cannot always put into words what feels wrong when a game is unresponsive, sometimes simply concluding that the game sucks, without really understanding why. West argues that "designers and programmers need to be aware of response lag and the negative effect it has on a game, even if test players do not directly report it as a factor."

July 09, 2008

Sponsored Video: Mobile Gaming – A New Way to Play Multi-User Games

In our featured video for the week, Intel Software Network blogger and engineering manager Scott Crabtree describes “Carry Small, Game Large,” a new gameplay model for mobile video games which promises a big, shared multiplayer gaming experience.

Crabtree boasts that with the new model, players can "walk up with a laptop, a handheld, or any mobile computer that can browse the internet, connecting to a multiplayer game projected on a big screen." This way, everyone is playing together in the same place.

Some of the example games the engineer cites for this setup include a group jigsaw puzzle, a multiplayer tank game, and a poker game. He went on to explain the appeal of the gameplay model: "Typically people are at home, looking at their own private view of the world. [With] this way of playing, everyone comes together, and everyone can talk to each other as they're playing whatever game is up on the big screen."

July 07, 2008

Sponsored Post: Intel - Multicore Outperforms Graphics Processors With Ray Tracing?

Delivering a technology futures speech in mid-June, Intel chief technology officer Justin Rattner stated that Intel's "aggressive multicore" approach trumps a graphics processor when the cores use ray tracing as opposed to rasterization, according to financial news site Forbes.com.

It's possible that this line of thinking could explain what the company has planned with all of the processor cores it has in its road map for future chips. Rattner affirmed, "Ray tracing is squarely in Intel's future."

A company spokesman added at a recent Intel research briefing that ray tracing goes hand-in-hand with parallel computing, as it is capable of scaling across thousands of cores. Rattner believes that graphics processors are "fundamentally tied" to the raster architecture.

AMD chief technology officer Raja Koduri, however, argued the opposite: graphics processors are better suited for ray tracing than a general-purpose multicore processor, as they can be "tuned to the application." Koduri claims that multicore graphics processors, along with accelerator technology found in graphics subsystems, can also be used for ray tracing.

With a wide range of 3D modeling applications ideal for ray tracing, such as engineering and motion picture animation, it's in both companies' interest to produce the chips best suited for ray tracing. But it will be consumers who decide in the end which solution works best.

July 02, 2008

Sponsored Video: Behind the Scenes at Project Offset

In this week's featured video, Project Offset technical director Sam McGrath shares demo footage for the studio's first-person shooter, Project Offset (working title), which takes advantage of future graphics cards. Built on the developer's Offset Engine, the game is set in "an epic fantasy world rendered at cinematic quality."

As McGrath points out, all of the game footage in the above video is rendered in real time inside Project Offset's game engine. "Every object in the world casts and receives shadows, including corrective self-shadowing on all objects, even complex objects."

The engine is also capable of rendering thousands of particles, each casting its own soft shadow into the scene. He adds that the engine applies motion blur uniformly as part of the rendering process: "Rather than being a simple special effect that only works in certain situations, the motion blur works uniformly on everything."

June 30, 2008

Implementing Fluid Effects

Simulating convincing fluids in computer games is not only computationally expensive, but often mentally expensive as well, with even introductory papers on the subject requiring the reader have math skills at least at the undergraduate calculus level. In his technical article originally printed in Game Developer magazine, Neversoft co-founder Mick West explains how fluid effects work without using advanced equations, providing example code and explaining how to simulate fluids without expensive iterative diffusion and projection steps.

West begins by describing the two common styles for simulating the motion of fluids: grid methods and particle methods. With the grid method, he explains, "The fluid is represented by dividing up the space a fluid might occupy into individual cells and storing how much of the fluid is in each cell. In a particle method, the fluid is modeled as a large number of particles that move around and react to collisions with the environment, interacting with nearby particles. Let's focus first on simulating fluids with grids."

He then notes that the simplest way to discuss the grid method is in respect to a regular two-dimensional grid: "At the most basic level, to simulate fluid in the space covered by a grid you need two grids: one to store the density of liquid or gas at each point and another to store the velocity of the fluid."

In addition to the two grids, programmers can use any number of other matching grids that store various attributes: "Each will be stored as a matching array of floats, which can store factors such as the temperature of the fluid at each point or the color of the fluid (whereby you can mix multiple fluids together). You can also store more esoteric quantities such as humidity, for example, if you were simulating steam or cloud formation."
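The two-grid layout West describes can be sketched in a few lines; this is an illustrative toy (names and the crude forward-advection step are mine, not West's code):

```python
# Minimal 2D grid-fluid sketch: one grid of densities and one of
# (vx, vy) velocities per cell, as described in the article.
N = 8
density  = [[0.0] * N for _ in range(N)]
velocity = [[(0.0, 0.0)] * N for _ in range(N)]
density[4][4] = 1.0              # drop some fluid in the middle
velocity[4][4] = (1.0, 0.0)      # pushing to the right

def advect(density, velocity, dt=1.0):
    """Move each cell's density along the velocity field
    (naive forward advection, clamped at the grid edges)."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            vx, vy = velocity[y][x]
            nx = min(N - 1, max(0, int(round(x + vx * dt))))
            ny = min(N - 1, max(0, int(round(y + vy * dt))))
            out[ny][nx] += density[y][x]
    return out

density = advect(density, velocity)
print(density[4][5])   # 1.0 -- the blob moved one cell to the right
```

Extra attribute grids (temperature, color, humidity) would simply be more arrays of the same shape, advected by the same velocity field.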

June 25, 2008

Sponsored Video: Getting The Most Out Of Your Game With Integrated Graphics

For our featured video this week, NetDevil president and Intel Software Partner Program member Scott Brown took a few minutes to share with viewers how the studio works with Intel's chipsets, as well as how they've collaborated with the semiconductor maker in the past to fine-tune their games.

With Jumpgate Evolution, NetDevil's sci-fi MMO, the developer specifically set out to make their game run on as many different computers and setups as possible. According to Brown, the title owed a significant portion of its accessibility to Intel's integrated chipsets: "We've been able to get a lot more users capable of playing our game than without."

In addition to using VTune extensively for all of its performance analysis, NetDevil brought Intel in to help with its vehicular combat MMO, Auto Assault. The team was not only able to add server-side optimizations, but also to push the game toward supporting Intel's entire line of video chipsets.

For its next big project, Lego Universe, the developer is again focusing on supporting all the chipsets: "That's meant to be a kids game ... With kids, they always get the old computer in the family, so you got to make sure, especially with kids software, that you really support it as much as you can."

On future trends, Brown added: "I think you're gonna see more CPU-side work. Now with dual core and multicore, the chips can handle much more than what the video games are giving them. I think you'll see a lot of people doing some really cool stuff with that and taking advantage of the next generation of chips."

June 23, 2008

Game AI Papers Posted For SIGGRAPH

Game artificial intelligence site AiGameDev has rounded up a collection of noteworthy research project papers posted so far for the coming SIGGRAPH 2008 conference, picking out papers most relevant for creating artificial intelligent characters, including the two summarized below.

In Group Motion Editing, a paper and demo put together by Taesoo Kwon, Kang Hoon Lee, Jehee Lee, and Shigeo Takahashi, the researchers present an approach to "editing group motion as a whole while maintaining its neighborhood formation and individual moving trajectories in the original animation as much as possible."

As explained in the abstract: "The user can deform a group motion by pinning or dragging individuals. Multiple group motions can be stitched or merged to form a longer or larger group motion while avoiding collisions... The usefulness and flexibility of our approach is demonstrated through examples in which the user creates and edits complex crowd animations interactively using a collection of group motion clips."

In Real-time Motion Retargeting to Highly Varied User-Created Morphologies, a technical paper on Spore's procedural animation system, the authors describe "a novel system for animating characters whose morphologies are unknown at the time the animation is created."

According to the abstract: "Our authoring tool allows animators to describe motion using familiar posing and key-framing methods. The system records the data in a morphology-independent form, preserving both the animation's structural relationships and its stylistic information. At runtime, the generalized data are applied to specific characters to yield pose goals that are supplied to a robust and efficient inverse kinematics solver."

With this system, characters with highly varying skeleton morphologies can be animated despite those morphologies not existing when the animation was authored. As a result, the character's animation can turn out to be "radically different" from what the original animator envisioned.

June 20, 2008

Solving Old Triangle Rendering Problems

Programmer Timothy Farrar has collected several solutions he has come across for old problems he has encountered during his trials with triangle rendering.

On level of detail and large view distance occlusion, Farrar suggests you build surfaces up with layers of small triangles: "Each layer independent. New LOD blends in new fine layer, blends out a previous coarse layer, keeping enough middle layers to ensure a good effect. Coarse layers are mostly inner hulls, fine LOD layers mostly extend the surface, or split into disjoint shapes."

He describes the method as similar to an artist painting a scene, laying out rough shapes of color (lower LOD layers) before refining them (fine LOD layers): "Occlusion culling is an easy and solved problem, with occlusion queries and a hierarchical layered world structure."

With the LOD problem solved, developers can now control the overall size of triangles in a given region on the screen: "Small triangles enables more work to be pushed back to the vertex shader. Even soft shadows and diffuse transparency effects. Gets rid of the need for multiple passes for lighting. All lighting + dynamic global illumination computed per vertex, interpolate spherical harmonics for single pass per pixel lighting."

Farrar continues: "Keep in mind at 720P at 30fps, the GPU is solving for only 30 Mpix/sec, but has the capacity of over 200 Mtri/sec in setup. So small triangles are not a problem (until reaching the size of micro triangles, with bad pixel quad utilization)."
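Farrar's pixel figure is easy to verify:

```python
# 720p at 30 frames per second:
width, height, fps = 1280, 720, 30
pixels_per_sec = width * height * fps
print(f"{pixels_per_sec / 1e6:.1f} Mpix/sec")   # 27.6 Mpix/sec
```

At roughly 28 Mpix/sec of shaded output against 200+ Mtri/sec of triangle setup, the GPU can indeed afford several triangles per pixel before setup becomes the bottleneck.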

June 18, 2008

Lock and Lock-Free Code Compared, Optimized With Intel Thread Profiler

CD Projekt Red (The Witcher) senior programmer Maciej Sinilo has posted his experience using an evaluation version of Intel Thread Profiler on a dual-core machine to see whether there was any real difference between lock-based and lock-free versions of his multi-threaded experiments.

Aimed at helping users "tune multi-threaded applications faster, for optimal performance on Intel multi-core processors," Intel Thread Profiler enables developers to visualize what percent of code is optimally parallel and where application performance issues exist.

Using Intel Thread Profiler's timeline view, Sinilo found that the average concurrency for his test using code based on locks was 1.99, with 5902.61 transitions per second. His results for the lock-free implementation test showed average concurrency at 3.04 with 57.36 transitions per second.

Sinilo found the lock-free implementation better, but not dramatically so. He tried a few methods to optimize the results, including eliminating as many semaphore waits as possible and a trick he found in Intel's Threading Building Blocks: "Every worker thread has its own task queue, if it's empty it tries to steal work from another thread. It won't sleep immediately, instead spin a little bit trying to steal something (yielding from time to time and pausing for very short periods of time… It may be tricky to fine tune this)."

He continued, "Eventually, it may wait, but it shouldn't happen that often. Wake-up events are not signaled every time task is added, only when changing queue state from empty to full (possible contention here, as I guard gate state variable, but it's very short). I do not need to implement work-stealing, as all threads acquire tasks from one queue, so it more or less auto-balances itself. I simply spin a little bit waiting for new task to arrive."
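The spin-then-wait scheme Sinilo describes can be sketched as follows; all names here are my own and this is not his code, just an illustration of the pattern (spin briefly before blocking, and signal the wake-up event only on the empty-to-non-empty transition):

```python
import collections
import threading
import time

tasks = collections.deque()
lock = threading.Lock()
work_available = threading.Event()
results = []

def push(task):
    with lock:
        was_empty = not tasks
        tasks.append(task)
    if was_empty:                      # signal only on empty -> non-empty
        work_available.set()

def try_pop():
    with lock:
        if tasks:
            return tasks.popleft()
        work_available.clear()         # queue drained; next push re-signals
        return None

def worker(spin_tries=100):
    while True:
        task = None
        for _ in range(spin_tries):    # spin a little before sleeping
            task = try_pop()
            if task is not None:
                break
            time.sleep(0)              # yield while spinning
        if task is None:
            # Timeout guards against a lost wake-up between clear() and set().
            work_available.wait(timeout=0.05)
            continue
        if task == "stop":
            return
        results.append(task * 2)

t = threading.Thread(target=worker)
t.start()
for i in range(5):
    push(i)
push("stop")
t.join()
print(sorted(results))   # [0, 2, 4, 6, 8]
```

As Sinilo notes, tuning the spin count is the tricky part: too few iterations and threads sleep on every brief gap, too many and they burn CPU that other threads could use.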

His final lock-free implementation had an average concurrency of 3.58 with 6.29 transitions per second. Sinilo admitted that the improved results didn't amount to much on his dual-core system: "Anyway, what's [the] 'real' difference between all those versions? Not that big, honestly, fractions of one frame. I blame [the] test machine, partially. I may test it on my quad-core work machine and will get back with the results (if they're interesting)."

Supported Feature: 'Microsoft Flight Simulator X & Multi-Threading'

The seminal Flight Simulator franchise is embracing multithreading with the latest version, Microsoft Flight Simulator X, and in this supported feature for Intel's Visual Computing microsite, engineers explain the threading techniques that help enhance the sim's visuals.

Though it might not be obvious at first, processor demands are considerable for a complex application like Microsoft Flight Simulator, which requires launching a computer-generated aircraft, tracking and displaying the craft's movements above diverse landscapes, and responding to flight maneuver physics: "For many years (since 1982 when the IBM* PC version was released), Microsoft Flight Simulator has pushed the boundaries of processing power and graphics display capabilities.

Not everyone realizes that the first version of Flight Simulator, created by Bruce Artwick, flew on an Apple* II computer in 1980, where budding pilots had to use a lot of imagination with only a four-color or monochrome screen to display the surroundings and a rudimentary two-gauge panel that delivered airspeed and altitude data.

The second generation Microsoft release, FS 1.0, modeled the behavior of a Cessna 182, improving on the prior Apple version by offering eight gauges, an improved coordinate system, four unique scenery areas with 20 airports to choose from, a pair of COM radios, and distance measurement equipment (DME).

The simulator factored weather into the flight performance, giving the user nine different view directions, but the display characteristics were closer to abstract art than photorealism, with only four colors plus dithering to replicate the cockpit and scenery."

Improvements in Microsoft Flight Simulator over the past 25 years have followed improvements in personal computing and displays, and an Intel-Microsoft joint effort has led to multi-threading-specific changes in the latest version:

"The collaborative engineering engagement between Microsoft and Intel took place over approximately six months, beginning in December 2006. Early in the engagement, the Microsoft developers and Intel development support team targeted their efforts on enhancing the visual quality of Flight Simulator X, rather than on improving the frame rate.

The Flight Simulator X Service Pack 1 download includes these performance improvements and graphics enhancements along with the architecture optimizations and the multi-threading capabilities."

You can now read the full feature, with engineers explaining the threading techniques that have helped enhance Microsoft Flight Simulator X's visuals (no registration required, please feel free to link to this feature from other websites).

June 16, 2008

Paul Debevec To Speak At Mundos Digitales, Procams 2008

In addition to his SIGGRAPH 2008 sessions on high-dynamic-range imaging and image-based lighting, Paul Debevec, associate director of graphics research at the USC Institute for Creative Technologies (ICT) and HDRI/IBMR pioneer, will be appearing at a couple more upcoming events which you might want to mark on your calendar.

At Mundos Digitales 2008, a five-day (July 1-5) international festival in Spain focusing on animation, visual effects, and videogames, Debevec will hold an Electronic Theatre screening and a session titled "New Techniques for Acquiring, Rendering, and Displaying Human Performances." He plans to present "recent work for acquiring, rendering, and displaying photo real models of people, objects, and dynamic performances."

Other notable points for his Mundos Digitales talk include "a new 3D face scanning process that captures high-resolution skin detail by estimating surface orientation from the skin's reflection of polarized spherical gradient illumination," as well as "a new 3D display that leverages 5,000 frames per second video projection to show autostereoscopic, interactive 3D imagery to any number of viewers simultaneously."

At Procams 2008, a projector-camera systems international workshop co-located with SIGGRAPH 2008 in Los Angeles, Debevec will be offering attendees an open-house tour of the USC ICT graphics laboratory, where "an impressive array of projector-camera systems" are being used and developed.

June 13, 2008

Rendering With Correct Math And Physics

Rendering engineer Angelo Pesce has posted an interesting two-part piece on the importance of rendering with correct math and physics instead of hacks. He argues that when rendering engineers don't know what they're doing, they're giving artists models that can't achieve the result they're aiming for or that are too complicated.

According to Pesce, two problems can arise when artists take whatever models they're given and tweak them in unpredictable ways to make them fit the idea they want to express: "The first one is that such tweaking could end up with suboptimal use of our precious, scarce, computing resources. The second, somewhat related to the other, is that bad models could be too complicated to fit [and] to find parameters that achieve the desired look."

In order to avoid these problems, he suggests working together with artists to see what they're trying to do and asking them to make prototypes with their DCC tools. That way, you can see whether it's possible to express their art in a more procedural way.

Pesce also advises finding out what your artists need, and basing models on those needs and on good math: "Good math does not mean correct physics, we are far from that in real-time rendering, but reasonable physics, models that are based on solid ideas."

June 11, 2008

Shading Calculations At A Lower Than Per-Pixel Frequency

3D graphics programmer Jeremy Shopf has written up an interesting post documenting techniques for "performing shading calculations at a frequency lower than per-pixel." Depending on lighting effect frequency and surface orientation relative to the camera, developers can compute values at a lower resolution or with adaptive sampling.

On likely the most widely used technique, Off-screen Particles, Shopf explains, "The basic idea is that you blend your particles into a lower than screen resolution texture to save on fill rate. Particles, like smoke, are notorious fill rate hogs due to massive amounts of overdraw and potentially expensive per-pixel lighting computations. In order to get correct particle occlusion, a manually downsampled depth buffer has to be used for depth testing the lower resolution particle buffer." He goes on to mention a problem that can develop with particle-scene intersections, offering a solution and suggestions for optimizations.
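The "manually downsampled depth buffer" step can be illustrated on the CPU; a real implementation does this on the GPU, and whether you keep the max or min of each block depends on your depth convention (this helper is mine, not Shopf's):

```python
# Toy 2x2 depth downsample for the off-screen particle technique:
# collapse a w x h depth buffer to (w/2) x (h/2), keeping the
# farthest depth in each 2x2 block.
def downsample_depth(depth, w, h):
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            row.append(max(depth[y][x], depth[y][x + 1],
                           depth[y + 1][x], depth[y + 1][x + 1]))
        out.append(row)
    return out

depth = [[0.1, 0.2],
         [0.3, 0.4]]
print(downsample_depth(depth, 2, 2))   # [[0.4]]
```

The low-resolution particle buffer is then depth-tested against this reduced buffer rather than the full-resolution one.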

Shopf also summarizes Bilateral Upsampling: "The concept is to bilinearly interpolate lower resolution illumination results while applying a weighting function to interpolants so that values aren't interpolated across boundaries. In some situations it is required to perform additional computation near boundaries. This situation is handled similarly to the off-screen particle method: edge detection and selective higher resolution calculation in edge regions."
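A one-dimensional sketch of that weighting idea (illustrative only, not from Shopf's post): bilinear weights are multiplied by a depth-similarity term so lighting does not bleed across a depth edge.

```python
def depth_weight(d_lo, d_hi, sigma):
    """Large when the low-res sample's depth matches the high-res pixel's."""
    return 1.0 / (abs(d_lo - d_hi) / sigma + 1e-4)

def bilateral_upsample(lighting_lo, depth_lo, depth_hi, t, sigma=1.0):
    """Interpolate between two low-res lighting samples at fraction t,
    reweighting each by depth similarity to the high-res pixel."""
    w0 = (1.0 - t) * depth_weight(depth_lo[0], depth_hi, sigma)
    w1 = t * depth_weight(depth_lo[1], depth_hi, sigma)
    return (w0 * lighting_lo[0] + w1 * lighting_lo[1]) / (w0 + w1)

# Across a depth edge (depths 1.0 vs 10.0), a pixel whose true depth is
# 1.0 takes nearly all its lighting from the matching-depth neighbor:
print(round(bilateral_upsample([0.2, 0.9], [1.0, 10.0], 1.0, t=0.5), 2))   # 0.2
```

With plain bilinear filtering the same pixel would land at 0.55, halfway between the two samples, producing a visible halo along the edge.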

The third and final method Shopf examines is Adaptive Soft Shadows with Push-Pull Reconstruction, a technique introduced by Gaël Guennebaud: "This trick computes soft shadow amounts at adaptive resolution rather than simply at a lower resolution as in the previously discussed methods. While not as simple to upsample, computing values adaptively based on surface orientation and shadow frequency allocates more fidelity in regions that need it. The other methods simply pick some lower resolution and forgo any higher frequency information."

Supported Feature: A More Accurate Volumetric Particle Rendering Method Using the Pixel Shader

Many games, even on current "next-gen" hardware, render particles using camera facing quads. In his Intel-supported Visual Computing piece, veteran coder Mike Krazanowski (Tomb Raider: Anniversary) suggests a neat alternative solution using pixel shaders and a little bit of math.

Though rendering particles using camera facing quads has been a practice employed in games for many years, Krazanowski's method uses shader technology to "give a more accurate visual representation of the simulated volumes as well as potentially decrease the necessary number of particles, which in turn will help to improve render performance."

According to Krazanowski, shaders have served as a significant advancement for the software developer's ability to define the functions used to render a scene in the hardware: "Even on cheap consumer graphics hardware, the software developer can define almost any function imaginable (usually only limited by available registers and functions made available to the shading language)."

He continues: "Using pixel shaders, render-to-texture technology and a little bit of math, I claim that we can more correctly simulate the volumes that the particles were intended to represent."

June 09, 2008

SIGGRAPH 2008 Highlights Announced

Organizers behind annual computer graphics conference SIGGRAPH have announced several highlights for the 2008 event, including an expanded format for the Computer Animation Festival and featured speaker presentations.

The expanded Computer Animation Festival will feature a variety of competition screenings at Los Angeles' Nokia Theatre, on-site awards presentations, talks, discussions, panels, and more. In addition, SIGGRAPH 2008 will host the return of FJORG!, the 32-hour international computer graphics "iron-animator" competition.

Walt Disney Animation Studios and Pixar Animation Studios president (and Pixar co-founder) Ed Catmull has been scheduled as a featured speaker, as has artist and U2 3D film director Catherine Owens. Also, new tech demos for innovative technologies and applications, such as Origami Optics, Rome Reborn, and ZCam, will be available for attendees to interact with.

SIGGRAPH 2008 will take place on August 11 to August 15 in Los Angeles, California and is expected to draw an estimated 30,000 industry professionals from around the globe.

June 05, 2008

The Best Way To Learn OpenGL?

OpenGL is, of course, an important and seminal standard for cross-platform graphics, and on her blog, student Gail Carmichael recently asked an open question about the best way to learn it.

There are a couple of different approaches espoused in the comments - though we'd be interested in hearing your own perspectives.

In particular, commenter Robert notes: "This is going to sound dumb, but I find the best thing to learn opengl from is... the opengl specification document. Yeah, it's pretty terse, but it does actually explain things quite well and you know for sure it won't give you someone's crazy misleading interpretation of how things work, because you are reading the gospel."

Alternatively, ARBaboon notes: "NeHe (http://nehe.gamedev.net) is the best resource that I have found. I have to second [toolkit API, mentioned by a previous commenter] GtkGLExt. Gtk+ is very well designed and GtkGLExt makes it as simple as saying 'make this widget use OpenGL.' It also provides offscreen drawing, which can be a pain to do cross-platform. If you want to do special effects, that is all you need. If you want to do complex 3D, you are going to want to think about a scene graph."

May 30, 2008

Engel Details Light Pre-Pass Renderer

Rockstar lead graphics programmer and notable visualization community figure Wolfgang Engel recently updated his Diary Of A Graphics Programmer blog, a must-read for anyone interested in visual computing, with an idea for a new rendering design.

Describing the design as a Light Pre-Pass Renderer, Engel explained: "The idea is to fill up a Z-buffer first and also store normals in a render target. This is like a G-buffer with normals and Z values. So compared to a deferred renderer, there is no diffuse color, specular color, material index, or position data stored in this stage. Next the light buffer is filled up with light properties. So the idea is to differ between light and material properties."
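The three stages Engel outlines can be sketched per pixel; this is a toy illustration of the data flow only (a real renderer does all of this on the GPU, and the numbers here are arbitrary):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Pass 1: geometry pass stores depth and normal per pixel
# (the minimal G-buffer Engel describes -- no material data).
gbuffer = [{"depth": 0.5, "normal": (0.0, 0.0, 1.0)}]

# Pass 2: for each light, accumulate N.L * intensity into a light
# buffer, using only the stored normals.
lights = [{"dir": (0.0, 0.0, 1.0), "intensity": 0.8},
          {"dir": (0.0, 1.0, 0.0), "intensity": 0.5}]
light_buffer = []
for px in gbuffer:
    light_buffer.append(sum(max(0.0, dot(px["normal"], l["dir"])) * l["intensity"]
                            for l in lights))

# Pass 3: a forward pass applies material properties to the prelit buffer.
albedo = 0.75
final = [albedo * lit for lit in light_buffer]
print(round(final[0], 2))   # 0.6
```

Separating light accumulation (pass 2) from material application (pass 3) is the key distinction from a full deferred renderer, which would have stored diffuse and specular color alongside the normals.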

As the Light Pre-Pass Renderer is designed to be flexible and scalable, Engel expects programmers to produce different results based on the rendering design: "Obviously, my initial approach is only scratching the surface of the possibilities." More recently, he's revealed that he spoke at the University of California, San Diego about the concept, and plans to publish more information on it in the near future.

May 29, 2008

Sponsored Feature: Procedural Terrain Generation With Fractional Brownian Motion

Accompanying the launch of the Visual Computing microsite, Intel Software and Solutions Group software engineer Jeff Freeman has put together an article demonstrating several techniques (including the source code) for creating realistic terrain scenes on systems with integrated graphics solutions.

For the demonstration, Freeman mixed terrain patch generating techniques proposed by Dr. F. Kenton Musgrave with texture blending and Shader Model 3.0 to create a synthetic scene on integrated graphics solutions: "Our implementation was inspired by Musgrave's work in Texturing and Modeling: A Procedural Approach, showcasing three methods from that text: simple fBm, hybrid fBm, and the ridged multifractal algorithm, each based on Perlin's noise algorithm. The output from these methods is used to perturb the Z direction of a fixed size polygon mesh."
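The simple fBm idea is a sum of noise octaves with increasing frequency and decreasing amplitude. A minimal sketch, using bilinearly interpolated value noise in place of the Perlin noise Freeman's implementation uses (all names and constants here are illustrative, not from the article):

```python
import random

random.seed(7)
lattice = [[random.random() for _ in range(16)] for _ in range(16)]

def value_noise(x, y):
    """Bilinearly interpolated random lattice values."""
    xi, yi = int(x) % 15, int(y) % 15
    fx, fy = x - int(x), y - int(y)
    a = lattice[yi][xi] * (1 - fx) + lattice[yi][xi + 1] * fx
    b = lattice[yi + 1][xi] * (1 - fx) + lattice[yi + 1][xi + 1] * fx
    return a * (1 - fy) + b * fy

def fbm(x, y, octaves=4, lacunarity=2.0, gain=0.5):
    """Sum noise octaves, doubling frequency and halving amplitude."""
    height, amplitude, freq = 0.0, 1.0, 1.0
    for _ in range(octaves):
        height += amplitude * value_noise(x * freq, y * freq)
        amplitude *= gain
        freq *= lacunarity
    return height

# The fBm output perturbs the Z direction of a fixed-size mesh:
heightfield = [[fbm(x * 0.3, y * 0.3) for x in range(8)] for y in range(8)]
print(min(map(min, heightfield)) >= 0.0)   # True: value noise is non-negative
```

The hybrid fBm and ridged multifractal variants Freeman showcases modify how octaves are combined (weighting each octave by the previous result, or folding the noise about a ridge value) rather than this basic accumulation loop.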

Freeman notes that a number of interesting fractal terrain generation problems still need to be tackled: "Applications of automatic landscape generation face decisions associated with conflicting game-play elements in the storyline or unrealistic features that present themselves from both fBm and other fractional models of terrain."

He continues: "In addition, most terrain generation methods are calculation intensive and are not real-time. While some fractal algorithms lend themselves easily to multi-threading, the result is still time consuming as is the case with the Mandelbrot set and may not apply well without significantly reducing the size of the perturbed surface greatly or reducing the number of iterations inspected."


This specially written weblog combines Gamasutra and Intel knowhow to present and deconstruct the latest happenings in visual computing and game technology.

Editor: Chris Remo