November 10, 2008

Simulating Reflections For Wet Asphalt

The C0de517e blog, written by a game rendering engineer, has an in-depth post up entitled "Impossible is approximatively possible," dealing with simulating realistic reflections on wet asphalt -- in particular, with filling the gaps left by data that cannot be computed exactly, using approximations the GPU can generate.

As the author describes:

"We have two things that we don't know, the reflection direction and the travelled light ray distance between the track and the bike, and those are possible to compute only using raytracing... Let's try now to fill the holes using some approximations that we can easily compute on a GPU.

"First of all we need the direction, that's easy, if we consider our reflections to be perfectly specular, the BRDF will be a dirac impulse, it will have only one direction for which it's non zero, and that is the reflected direction of the view ray (camera to track) around the (track) normal.

"The second thing that we don't know is the distance it travelled, we can't compute that, it would require raytracing. In general reflections would require that, why are the planar mirror ones an exception? Because in that case the reflection rays are coherent, visibility can be computed per each point on the mirror using a projection matrix, but that's what rasterization is able to do!

"If we can render planar mirrors, we can also compute the distance of each reflected object to the reflection plane. In fact it's really easy! So we do have a measure of the distance, but not the one that we want, the distance our reflected rays travels according to the rough asphalt normals, but the one it travels according to a smooth, marble-like surface. It's still something!"

The author includes a few code examples, based on previous R&D work he did on a shipped racing title, and has even posted a follow-up blog entry that delves deeper into his blurring algorithm.
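
The two quantities discussed in the quoted passage can be sketched in a few lines. This is a minimal illustration rather than the author's shader code: the hypothetical `reflect` mirrors the view ray about a unit-length surface normal, and the hypothetical `distance_to_plane` measures how far a point sits from a mirror plane, which is the smooth-surface distance the post settles for.

```python
def reflect(view, normal):
    """Mirror the incoming view direction about a unit-length normal: r = v - 2(v.n)n."""
    d = sum(v * n for v, n in zip(view, normal))
    return tuple(v - 2.0 * d * n for v, n in zip(view, normal))

def distance_to_plane(point, plane_point, plane_normal):
    """Signed distance from a point to a plane whose normal has unit length."""
    return sum((p - q) * n for p, q, n in zip(point, plane_point, plane_normal))

up = (0.0, 1.0, 0.0)                       # normal of a flat, mirror-like track
bounce = reflect((1.0, -1.0, 0.0), up)     # view ray heading down onto the track
height = distance_to_plane((0.0, 3.0, 0.0), (0.0, 0.0, 0.0), up)
```

With rough asphalt the normal varies per pixel, so the reflected direction changes while the distance estimate still comes from the flat plane; that mismatch is the approximation the post accepts.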

October 27, 2008

Sponsored Feature: Intel Calls All Game Companies as Partners

In an Intel-sponsored feature, the company explains the benefits of the Intel Software Partner Program, a free initiative for game companies which includes development help, tools, and computer discounts.

The feature notes the growth of integrated graphics, and points out that there are various steps developers can take to squeeze unexpected performance out of such hardware:

"Integrated graphics already dominate over discrete graphics at a ratio of nearly 2:1 in terms of market segment share, and that ratio is expected to grow substantially over the next several years.

"Discrete graphics will continue to be widely used as well, of course, so enabling games to scale across the spectrum from mainstream graphics to high-end discrete solutions is just good business sense.

"A growing number of game companies are debunking the myth that 3D-based mainstream games require discrete graphics solutions, and as of second quarter 2008, Intel had more than a 47 percent market segment share for graphics hardware, with year-to-year growth of 46 percent.

"Scott Brown, president of NetDevil Ltd., a game-development company in Louisville, Colorado, sums it up: 'People with high-end machines need to see their investment pay off with our games, but at the same time, we'd be crazy not to target mainstream graphics hardware as well.'

"By tuning the scalable aspects of gameplay to the resources available, game companies can effectively expand their target user base, while still providing an optimal visual experience for everyone."

The feature goes on to explain specific benefits of the Intel program, including development assistance and discounts on development hardware.

October 23, 2008

Sponsored Video: Threading Quake 4 and Quake Wars

In this sponsored video, Intel senior software engineer Anu Kalra discusses the principles, challenges, and lessons learned in providing multithreading assistance to the teams behind Raven Software's Quake 4 and Splash Damage's spinoff title Enemy Territory: Quake Wars.


Among other topics, Kalra noted that certain multithreading aspects improved from game to game, including one notable area concerning efficiency:

"The key thing here is that the data is not being shared across threads as much," he explains. "There definitely is data that is being shared with the rendering thread and the graphics driver thread, but between the engine and the renderer, there really isn't a whole lot of data that is being shared. In the case of Quake 4, all the dynamic measures that are generated per frame had to be buffered and shared between the two threads, which isn't the case with Quake Wars."

In addition to the video, there are full slides available from a talk given by Kalra alongside developer Jan Paul van Waveren of Quake series creator id Software.

October 22, 2008

Feature: Optimizing Asset Processing

In an in-depth technical feature posted on Gamasutra, Neversoft co-founder Mick West discusses performance concerns when optimizing asset processing for games, including the basic nature of the common problems and in-depth solutions for keeping the pipeline efficient.

Referring to asset processing tools as "the ugly stepsister" of game development, West warns against ignoring their role.

He notes that, because of their simplicity, effort is often not spent to properly optimize them, which can end up eating away time later in the project.

Multithreading is a great way to make these tools more efficient, West notes. "Most PCs now have some kind of multicore and/or hyper-threading," he writes. "If your tools are written in the traditional mindset of a single processing thread, you’re wasting a significant amount of the silicon you paid for, as well as the time of the artists and level designers as they wait for their assets to be converted.

"Since the nature of asset data is generally to be large chunks of homogeneous data, such as lists of vertices and polygons, it’s generally very amenable to data level parallelization with worker threads, where the same code is run on multiple chunks of similar data concurrently, taking advantage of the cache."

Surprisingly, he even admits that "bad code" is permissible when it comes to processing tools, as long as the risks are properly weighed:

"In-house tools don’t always need to be up to the same code standards as the code you use in your commercially released games. Sometimes you can get performance benefits by making certain dangerous assumptions about the data you’re processing and the hardware it will be running on.

"Instead of constantly allocating buffers as needed, try allocating a 'reasonable' chunk of memory as a general purpose buffer. If you have debugging code, make sure you can switch it off. Logging or other instrumenting functions can end up taking more time than the code they are logging. If earlier stages in the pipeline are robust enough, then (very carefully) consider removing error and bounds checking from later stages if you can see they are a significant factor.

"If you have a bunch of separate programs, consider bunching them together into one uber-tool to cut the load times. All these are bad practices, but for their limited lifetime, the risks may be outweighed by the rewards."

You can now read the full Gamasutra feature on the subject, posted as an independently authored piece within the Intel Visual Computing section, and which includes sample code (no registration required; please feel free to link to this feature from external websites).
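
The data-level parallelism West describes, where the same code runs on several chunks of similar data at once, can be sketched as follows. This is an illustration under stated assumptions, not code from the feature: `scale_chunk` is a hypothetical stand-in for a real conversion step, and the chunk size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

def scale_chunk(chunk, factor):
    """Hypothetical per-chunk conversion step: uniformly scale vertex positions."""
    return [(x * factor, y * factor, z * factor) for x, y, z in chunk]

def process_vertices(vertices, factor, chunk_size=1024, workers=4):
    """Split the vertex list into chunks and run the same code on each chunk."""
    chunks = [vertices[i:i + chunk_size] for i in range(0, len(vertices), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scale_chunk, chunks, [factor] * len(chunks))
        # map() yields results in input order, so the vertex order is preserved.
        return [v for chunk in results for v in chunk]

verts = [(float(i), 0.0, 1.0) for i in range(10)]
scaled = process_vertices(verts, 2.0, chunk_size=3, workers=2)
```

In CPython, threads running pure Python code do not execute in parallel, so a real tool would likely swap in `ProcessPoolExecutor` or native code for the per-chunk work; the chunking structure stays the same.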

October 16, 2008

Sponsored Video: Robust N-Core Capable Game Engine Design

In this video session, Intel graphics and game technologist Ron Fosner introduces the principles of designing game engines that scale to an arbitrary number of cores.

There are two main approaches, Fosner points out. "One is the traditional way of threading for speed; you parallelize things to make them faster on the GPU. You want to break your game tasks up into chunks. The other way is programming and threading for features. You take advantage of all the extra CPU power that's beyond the minimal gameplay, and take it to the point where it's increasing the user experience. What you can do is continue to add content or work while you have CPU power available. You can increase the user experience as the CPU power increases, design your game engine to be multicore aware."
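
As a rough sketch of "threading for features," an engine might size its optional work from the number of cores it finds at startup. The function name and the numbers below are hypothetical and only illustrate the idea; they do not come from the video.

```python
import os

def particle_budget(base=1000, per_extra_core=500, cores=None):
    """Scale an optional effect with core count; one core gets the baseline only."""
    cores = cores or os.cpu_count() or 1
    return base + per_extra_core * max(0, cores - 1)

budget = particle_budget()  # depends on the machine running the code
```

The baseline keeps gameplay identical on a single core, while extra cores only add content on top of it.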

In part two of the session, Fosner delves more into practical techniques:

"Games are really very challenging to thread," he admits. "If you can manage to thread your game engine successfully, you can thread anything."

October 13, 2008

Progress In Contouring

In an interesting historically-tinted blog post, Nick Porcino looks at various methods of contouring in graphics rendering, starting off with the marching cubes algorithm first introduced during SIGGRAPH '87. The method essentially divides a volume into cubes and replaces the cubes with corresponding polygons to approximate the original shape.
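
The first step of that process, classifying each cube by which of its corners lie inside the surface, can be sketched as below. This is a minimal illustration, not code from Porcino's post; the corner ordering and the "below the iso level means inside" convention are assumptions.

```python
def cube_index(corner_values, iso_level):
    """Return the 8-bit case index for one cell.

    Bit i is set when corner i lies below the iso level (treated here as
    'inside'). The corner ordering is an assumption of this sketch.
    """
    index = 0
    for i, value in enumerate(corner_values):
        if value < iso_level:
            index |= 1 << i
    return index

one_corner_inside = cube_index([0.0] + [1.0] * 7, 0.5)
```

In the full algorithm this index selects one of 256 table entries describing which polygons to emit; cells with index 0 or 255 lie entirely outside or inside the surface and produce nothing.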

After quickly recapping that original principle, Porcino moves on to evolutions of that algorithm: convex contouring ("Where marching cubes generates triangles based on face intersections, convex contouring generates polygonal shapes that enclose the convex negative space within each cell"), dual contouring ("Since vertices are placed in the interior of the cubes and not on the edges of the cubes, there is more freedom as to where the vertex can be placed"), and dual marching cubes.

Dual marching cubes was introduced in 2004. As Porcino describes:

"This method aligns vertices in the tessellation with features of the implicit function. The tessellation itself occurs on a grid that is the dual of the structured sampling grid. The result is that thin features missed, or requiring a lot of subdivision are preserved with a much sparser polygonization than that resulting from a structured grid. The method proceeds by creating an octree describing the volume to be contoured, then a dual grid of the octree is created by linking vertices at the centers of each grid cell to its topological neighbours.

"The surface is then extracted using a simple extension of marching cubes to dual grids. Since each cell in the grid is topologically equivalent to a cube, the standard marching cubes tables can be used to generate the surface interior to the cell. Since the underlying representation of the data is an octree, the resulting tessellation is much sparser than Dual Contouring or Marching Cubes."

Porcino includes visual aids that demonstrate how the dual marching cubes algorithm manages to represent geometry better than the other sampling methods, even with a lower polygon count.

Despite the cube-centric tack of the post, Porcino notes that progress is being made using tetrahedral meshes, and points to the website for Jonathan Shewchuk's UC Berkeley computer science course as a useful resource to that end.

October 06, 2008

Feature: Unlocking Processing Potential: Randi Rost On CPU-Based Graphics Architecture

The latest Gamasutra visual computing interview shines the spotlight on Intel Graphics' external relations manager Randi Rost, a 25-year development veteran instrumental in the company's graphics hardware efforts.

One of Rost's focuses is lowering the barrier of entry to development with graphics hardware, a goal achieved in part by working directly with universities to get feedback and provide training. In this excerpt he discusses the importance of that angle:

"Randi Rost: Universities are where a lot of new technology gets dreamed up, where new algorithms get invented. Universities, particularly in the visual computing space, have been relatively shackled by the existing graphics hardware capabilities -- where the entire rendering pipeline has been built into fixed-functionality silicon.

"This provides scant flexibility for researchers to innovate in terms of rendering algorithms. Recently, within the last half-dozen years, the hardware pipeline has gotten to be more programmable, but there are still a lot of constraints.

"With our upcoming graphics architecture, built around a completely general-purpose CPU-based design, we're basically removing all the constraints for the rendering pipeline. We're telling researchers: 'Hey, here's an architecture where you can effectively do everything you want in software. There's no fixed functionality to get in your way. If you want to experiment with new rendering algorithms, with ray tracing, with hybrid rendering systems, if you want to replace the rasterization unit, if you want to have procedural geometry so that you can render spheres analytically (rather than breaking them down into polygons) -- all of those things are possible.'

"It's a completely open, general, high-performance platform for highly parallel floating-point workloads, such as graphics. And, the basic programming model is simple: C++ code that targets x86 cores."

Rost goes on to discuss the opportunities this presents in hiring talent, and the importance of preparing the next generation of developers for the increasingly complex world of graphics hardware. You can read the full interview (no registration required; please feel free to link to this feature from other websites).

September 29, 2008

Intel Game Demo Contest Winners Announced

Intel has released the results of its 2008 Game Demo Contest, with winners in four categories receiving cash prizes of $12,500 as well as passes to Game Developers Conference 2009. Runners-up received smaller cash prizes. All finalists also received a suite of developer tools as well as International Game Developers Association memberships.

Top honors for best overall threaded game went to Tommy Refenes of PillowFort for his Goo! demo, which also placed in the graphics category. You can see some of the evolution of Refenes' multithreading techniques in his recent feature article.

Other top honorees include Добряк's Magic Worlds, Tandem Games' Pixel & Vega in: Crunch Time, and Яков Сумыгин's Deadly Light. The winners were narrowed down from 329 entries. All demos that placed up to fifth in their categories are available for download from Intel's site.

Feature: Procedural Spooling In Games

In the latest in-depth Gamasutra technical feature, Neversoft co-founder Mick West examines how procedurally generated content and compression can lead to expanding vistas for your open-world games.

Open-world game environments and objects are typically spooled from the disc as players move through an area, with scene complexity often determined by the data transfer rate of spooling and the virtual speed of the player within the world.

If a world has too much complexity, then glitches may result when data cannot be spooled fast enough as players move from one region to another. To prevent these problems, developers can restrict players' maximum speed so there is sufficient time for the world to load, and they can place limits on scene complexity and allowable variation between regions.

To allow for more complex environments, West suggests that developers take advantage of procedural content -- content generated from mathematical descriptions of underlying forms and parameters describing the specific instance of that content -- and procedural compression:

"Procedural compression is simply storing a piece of geometry as a set of procedural parameters rather than as the final model. While this is not compression in the normal sense of the word, the effects are essentially the same, only with a vastly increased (even arbitrarily large) compression ratio.

"The disc spooling bandwidth requirements are thus greatly reduced, allowing us to pack vastly more level geometry into a small percentage of that bandwidth. The trade-off is that artists have reduced flexibility in the models they can represent, since they are constrained to the possible output of the procedural algorithms.

"We also trade some CPU resources, since the generation of geometry may require more CPU time than the standard spooling and decompressing of the raw data."

You can read the full technical article, which includes more information on procedural compression and solutions to the resource problems that come with the technique (no registration required, please feel free to link to this feature from other websites).
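
The trade-off West describes, storing parameters on disc and spending CPU time to rebuild the geometry, can be illustrated with a deliberately tiny example. The `generate_ring` function and its parameters are hypothetical and not taken from the article.

```python
import math

def generate_ring(radius, segments):
    """Expand two procedural parameters into a closed ring of 2D vertices."""
    step = 2.0 * math.pi / segments
    return [(radius * math.cos(i * step), radius * math.sin(i * step))
            for i in range(segments)]

params = {"radius": 5.0, "segments": 256}   # what would be stored on disc
vertices = generate_ring(**params)          # what is rebuilt at load time
ratio = (len(vertices) * 2) / len(params)   # coordinates generated per stored value
```

Raising `segments` increases the output size without changing what is stored, which is why the effective compression ratio can grow arbitrarily large; the cost is that only shapes the generator can express are available to artists.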

September 22, 2008

A Tip On (And Discussion Of) HDR Rendering Methods

LucasArts engineer Marco Salvi has posted an in-depth explanation of his method for HDR rendering, prompted by a SIGGRAPH 2006 lecture series by Valve employees. Salvi first touches on Valve's own HDR process (added to the Source engine in 2005), which he points out runs "with MSAA on relatively old hardware [and] executes tone mapping and MSAA resolve in the proper correct order with no extra performance cost, something that a lot of modern games can't still get right today."

"Through image segmentation techniques," Valve's code "’simply’ tries to determine if the previous frame has been under or over exposed and a new exposure value is adjusted to compensate for problems with the previous frame(s)." Salvi notes that this method doesn't allow for reliably determining average logarithmic luminance, which in his experience has led to "an overall flat and over or under saturated feeling." So he introduces his own plan: "get rid of the exposure search through previous frames feedback and compute it the proper way!"

Salvi's method computes per-pixel logarithmic luminance and outputs that to the alpha channel; he includes a code example of the relevant math, and delves into further specific details.
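
For readers unfamiliar with the term, average logarithmic luminance is commonly computed as the exponential of the mean of per-pixel log luminance. The sketch below shows that common formulation on the CPU; it is not Salvi's shader code, and the Rec. 709 luminance weights and the small `delta` offset are assumptions of this example.

```python
import math

def luminance(rgb):
    """Relative luminance using Rec. 709 weights (an assumption of this sketch)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def log_average_luminance(pixels, delta=1e-4):
    """Geometric mean of luminance: exp(mean(log(delta + L)))."""
    total = sum(math.log(delta + luminance(p)) for p in pixels)
    return math.exp(total / len(pixels))

frame = [(1.0, 1.0, 1.0), (100.0, 100.0, 100.0)]
average = log_average_luminance(frame)   # close to 10, not the arithmetic mean of 50.5
```

Because the mean is taken in log space, a few very bright pixels pull the result up far less than they would in a plain average, which is what makes it useful for choosing an exposure.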

As a bonus, Valve's Gary McTaggart and Chris Green, both presenters of the lectures that spurred Salvi's post, drop by in the blog's comment section to explain their reasoning behind the route they took for the Source engine's HDR implementation.

September 21, 2008

Sponsored Feature: A Landmark in Image Processing: DMIP

In the latest Intel-sponsored feature, Lightspeed Publishing's Lee Purcell lays out deferred mode image processing, a new addition to the Intel Integrated Performance Primitives Library, which speeds up complex image-processing tasks with up to 3X performance increases.

Purcell begins by laying out current issues in dealing with image processing, which becomes more challenging with the ever-increasing resolution and color demands that come alongside the evolutions in high-definition imaging. Intel's DMIP attempts to address the issue. As software engineer Alexander Kibkalo, quoted by Purcell, explains:

"Fundamentally, DMIP optimizes overall image-processing tasks within an application while individual Intel IP functions can be optimized without requiring knowledge of the environment or the conditions of the function call. DMIP provides the detailed descriptions of each task in DAG form and the appropriate preferences can then be applied for optimizing the routines.

"In terms of parallelization on Intel processors, DMIP tries to maintain a balance of the slice size. Keeping it comparatively small lets the slice fit into L2 cache and enable efficient pipelining. Splitting the slices into comparatively large sub-slices allows them to be efficiently handled by the available processor cores (for example, by 4 lines for a quad-core processor or by 8 lines for an 8-core processor). For achieving the best performance, the actual method of splitting should be tailored to the individual processor on which the application is being run."

Purcell also notes several secondary benefits:

"For example, matching the Intel IP function algorithms to the low-level optimizations for the Intel Streaming SIMD Extensions (Intel SSE) (from Intel SSE through Intel SSE4) can improve overall application performance substantially. Software developers and their application designs benefit from well-established, highly refined algorithms that address many of the most important programming operations. Intel's support network also adds value and utility to these libraries."

You can now read the full feature, which includes detailed information and visual aids.
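
The slice-size balancing Kibkalo describes can be pictured with a small planning function. This is only a sketch of the general idea and does not use or represent the DMIP API; `plan_slices` and all of its parameters are hypothetical.

```python
def plan_slices(width, height, bytes_per_pixel, cache_bytes, cores):
    """Return (rows_per_slice, rows_per_core) for an image processed in row slices."""
    row_bytes = width * bytes_per_pixel
    rows_fit = max(1, cache_bytes // row_bytes)   # rows that fit the cache budget
    # Round down to a multiple of the core count so each core gets equal rows;
    # fall back to one row per core when even that exceeds the budget.
    rows_per_slice = max(cores, (rows_fit // cores) * cores)
    rows_per_slice = min(rows_per_slice, height)
    return rows_per_slice, max(1, rows_per_slice // cores)

plan = plan_slices(width=1920, height=1080, bytes_per_pixel=4,
                   cache_bytes=1024 * 1024, cores=4)
```

For a 1920-pixel-wide, 4-bytes-per-pixel image and a 1 MiB budget, this gives 136-row slices split into 34 rows per core.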

September 11, 2008

Sponsored Video: Michael Langmayer On 3ds Max's Multithreading

In this video, Autodesk 3D application specialist Michael Langmayer gives a demonstration of 3ds Max and how it takes advantage of multi-core computing. 3ds Max integrates the Mental Ray rendering engine and uses distributed processing, including over a network.

"Mental Ray allows you to take advantage of multiple processors -- not only processors in this machine, but processors in the network," he says. "We have something called distributed bucket rendering. If I switch this one on, and the network here is connected to other machines, I could basically connect to four other machines having dual-core processors installed, and I could then render with the power of four machines."

He then demonstrates the rendering process: "At the bottom you can see we're getting all those different buckets. Each bucket is actually representing one core of this machine I'm running here. It's helping the artists or the creator of any office or design-related building to render a lot quicker. ...We can utilize this in the games market for texture baking, for example. ...We're really harnessing the power of those multi-threaded machines."

Langmayer also demonstrates the 3D sculpting tool Mudbox.

September 08, 2008

A Guide To Dual-Paraboloid Reflections

Kyle Hayward, a Purdue University senior studying graphics research who we've featured before, posted a useful primer to dual-paraboloid reflections, which he describes as a "view-independent method of rendering reflections just like cubemaps."

Unlike cubemaps, which can require updating up to six textures when rendering dynamic scenes, dual-paraboloid reflections only require updating two textures.

Hayward comments: "They won't give you the quality of cubemaps, but they are faster to update and require less memory. And in the console world, requiring less is almost reason enough to pick this method over cubemaps."

He goes on to share several reference articles and a math overview before providing code for developers to experiment with generating their own reflections with paraboloid maps.

Hayward notes that there are a couple of faults to watch out for in addition to reduced quality: "One drawback of paraboloid maps is that the environment geometry has to be sufficiently tessellated or we will have noticeable artifacts on our reflector. Another drawback is that on spherical objects, we will see seams. However, with objects that have convex and concave polygons, the seam won't be as noticeable."
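
The core of the technique is the mapping from a direction to a texel in one of the two maps. The sketch below uses the commonly published paraboloid projection and is not Hayward's code; the function name and the choice of +z as the front map are assumptions.

```python
import math

def paraboloid_uv(direction):
    """Map a 3D direction to (map_index, u, v) with u, v in [0, 1].

    map_index 0 is the front (+z) paraboloid, 1 is the back (-z) one.
    """
    x, y, z = direction
    length = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / length, y / length, z / length
    if z >= 0.0:
        denom, index = 1.0 + z, 0
    else:
        denom, index = 1.0 - z, 1
    return index, 0.5 * x / denom + 0.5, 0.5 * y / denom + 0.5

straight_ahead = paraboloid_uv((0.0, 0.0, 1.0))   # centre of the front map
sideways = paraboloid_uv((1.0, 0.0, 0.0))         # on the rim, where the maps meet
```

Directions with z near zero land on the rim of both maps, which is where the seams mentioned in the quote tend to show up.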

Sponsored Feature: Multi-Threading Goo!: A Programmer's Diary

In the latest Intel-sponsored feature, Goo! developer Tommy Refenes of PillowFort matches wits with multi-threading in four gripping acts -- and emerges victorious.

Refenes begins by describing his initial attitude to threading, one which proved a poor choice: he dedicated a thread to the specific tasks of collision, only calling that thread's functions when needed and keeping it in a nonfunctioning loop otherwise. Here, he explains his mistake:

"The model of having two threads sleeping and waiting for work (at least in the way in which they were waiting) was horrible. The threads waited in an infinite loop for work, and if they didn't find any, they performed a Sleep(0) and then continued waiting for work. At the time, I thought the Sleep(0) would free the processor enough so that other threads running elsewhere on the OS would be scheduled efficiently. Boy, was I wrong. The work check loop had the two processors constantly running at 90 percent to 100 percent capacity -- an absolute waste of time. To understand why, picture this scenario.

"Let's say you are in charge of five construction workers tasked with paving a road. You know that you will need at least two of them to lay blacktop and at least the remaining three to grade the road, dig the ditches, and so on. Being the efficient supervisor that you are, you wouldn't think of allowing the two workers that need to lay the blacktop to sleep in the middle of the road until the blacktop is ready to be laid. No, to work as efficiently as possible, you would put all five of them to work grading and then all five paving."

Refenes goes on to describe how he then created a new custom multi-threaded collision engine -- which he once again threw away and rewrote since that version of the game would not run on single-core machines. By the end of Act 4, he has programmed another engine, one which dynamically scales based on the number of cores available.

You can now read the full feature, which includes specific technical details and visual aids.
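
The polling loop Refenes describes in his first attempt has a well-known alternative: let the worker block on a queue, so the operating system parks the thread until work arrives. The sketch below is a generic illustration in Python, not code from Goo!; squaring a number is a hypothetical stand-in for collision work.

```python
import queue
import threading

def worker(tasks, results):
    """Block on the queue until work arrives instead of polling in a loop."""
    while True:
        job = tasks.get()          # blocks without spinning until an item is available
        if job is None:            # sentinel: shut the worker down
            break
        results.append(job * job)  # hypothetical stand-in for collision work

tasks, results = queue.Queue(), []
thread = threading.Thread(target=worker, args=(tasks, results))
thread.start()
for n in (1, 2, 3):
    tasks.put(n)
tasks.put(None)
thread.join()
```

An idle worker written this way uses essentially no CPU, unlike a loop that checks for work and calls `Sleep(0)` on every pass.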

September 05, 2008

Paper: Multi-Threaded Shadow Volumes on Mainstream Graphics Hardware

The Intel Software Network has posted "Multi-Threaded Shadow Volumes on Mainstream Graphics Hardware," a white paper from software engineer David Bookout and Intel senior applications engineer Satheesh Subramanian, which describes techniques for generating shadows on mainstream graphics hardware in realtime.

The paper notes that though shadow volumes are able to create physically accurate shadows, thus providing a more realistic 3D scene, having an abundance of dynamic shadow-generating objects or moving light sources can affect performance.

To combat that problem, the paper covers "how shadow volumes generated on the CPU using a multithreaded approach could free up the graphics hardware from this bottleneck-creating task."
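
One CPU-side step commonly used when building shadow volumes is finding the silhouette edges of a mesh as seen from the light. The sketch below illustrates that general step only and is not taken from the paper; the function name and mesh layout (vertex tuples plus index triples with counter-clockwise winding) are assumptions.

```python
def silhouette_edges(vertices, triangles, light_pos):
    """Return the set of edges bordering the light-facing part of a mesh."""

    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    edges = set()
    for i0, i1, i2 in triangles:
        a, b, c = vertices[i0], vertices[i1], vertices[i2]
        normal = cross(sub(b, a), sub(c, a))
        if dot(normal, sub(light_pos, a)) > 0.0:   # triangle faces the light
            for u, v in ((i0, i1), (i1, i2), (i2, i0)):
                # Toggle: an edge shared by two lit triangles cancels out.
                edges ^= {(min(u, v), max(u, v))}
    return edges

quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2), (0, 2, 3)]
outline = silhouette_edges(quad, tris, light_pos=(0.5, 0.5, 5.0))
```

Each triangle is tested independently of the others, so the loop could be split across worker threads per object or per triangle range, with the per-thread edge sets combined afterwards.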

The paper also provides a shadow maps implementation based on the EmptyProject example from the Microsoft DirectX 9.0 SDK.

August 28, 2008

Intel Announces Visual Adrenaline Developer Program

Delivering her IDF 2008 keynote, Intel's Software and Solutions Group VP and GM Renee James announced Visual Adrenaline, a new developer program targeted specifically at visual computing.

According to Intel's Visual Adrenaline portal and technology website bit-tech, which attended the event, the new program will offer developers, animators and other gaming and digital content professionals resources to take advantage of Intel's technologies and hardware, particularly multi-core CPUs and Larrabee.

Visual Adrenaline will also offer an online developer community and a new digital magazine (free subscriptions) titled Visual Adrenaline. The magazine will feature profiles of games, software, artists, and industry figures.

James promised that the developer program will include "focused content, online activities, training, as well as SDKs and tools specifically designed at using Intel platforms and future platforms ... for visual computing and gaming."

August 27, 2008

Supported Feature: The Whimsy of Domain-Specific Languages

In this technical article, originally published in Game Developer magazine, Neversoft co-founder Mick West explores making your own mini-languages for games by building Whimsy, a domain-specific language that creates art based on the abstract paintings of Parappa creator Rodney Alan Greenblat.

Unlike a domain-specific language, a general-purpose language must support a large amount of functionality, such as variables, data structures, conditional expressions, looping constructs, and functions. To create Whimsy, West defined his domain as the works of Rodney Alan Greenblat, using pieces from the artist's Elemental tour, a collection of semi-abstract paintings in a distinctive brightly colored and geometric style:

"The idea was this: If such a style of artwork were to be used in a video game, then it might be very useful to have a DSL that encapsulated that style and allowed for easy creation of similar pieces for use in-game.

"The first step in creating a DSL is to get a rough idea of the elements that the domain comprises. Looking at the Elemental works, we can see a number of common aspects. There are concentric oval shapes, with petals adjoined to various sections.

"Many of the works have segmented circles with colored circles inside them. There are little propellers and various other shapes that repeat both within individual works and within Greenblat's overall collection.

"I decided the best way to approach creating this DSL would be to pick one piece and attempt to replicate parts of it. I chose the painting 'Lunar Module' (see image). Many common elements hold the piece in its style: solid circles, concentric ovals with color gradients, petals, and stars."

You can read the full technical feature on Whimsy, Mick West's domain-specific language, which includes code and examples of potential uses of DSLs in games.
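
To give a feel for how small a domain-specific language can be, here is a toy interpreter for a made-up shape language. It is not Whimsy and does not reflect Whimsy's syntax; the `circle` and `rings` commands are invented for this sketch, loosely inspired by the solid circles and concentric shapes West mentions.

```python
def run_dsl(source):
    """Interpret a tiny, made-up shape language into a list of shape records."""
    shapes = []
    for line in source.strip().splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        command, args = words[0], words[1:]
        if command == "circle":
            # circle x y radius color
            x, y, radius = map(float, args[:3])
            shapes.append({"kind": "circle", "x": x, "y": y, "r": radius, "color": args[3]})
        elif command == "rings":
            # rings x y outer_radius count color -> concentric circles
            x, y, outer = map(float, args[:3])
            count = int(args[3])
            for i in range(count):
                shapes.append({"kind": "circle", "x": x, "y": y,
                               "r": outer * (count - i) / count, "color": args[4]})
        else:
            raise ValueError(f"unknown command: {command}")
    return shapes

program = """
# hypothetical program in the made-up language
circle 10 10 4 red
rings 50 50 20 4 blue
"""
shapes = run_dsl(program)
```

The language has no variables, loops, or functions of its own; a single `rings` line expands into several shapes, which is the kind of domain shorthand a DSL is meant to provide.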

August 26, 2008

Sponsored Video: Greg Corson On COLLADA Tools

For this week's video, SCEA software engineer Greg Corson, who has worked on interchange file format COLLADA since 2005, talks about taking advantage of the format with game development tools such as COLLADA Refinery and Coherency Test.

Corson believes that COLLADA is helping developers move away from using custom-created solutions and formats: "This was the same issue in the movie industry -- they all had their own custom software, or they bought all their software that they used from one company. Everyone realized that they can't really do that anymore. No one company has everything that you need to build a game or to do a movie."

He continues: "It's getting more and more important that you have a lot of tools that work together. Other tools tend to get discarded now, instead of just worked around. That's a big change for the industry, finally seeing everyone say 'We can buy tools from anyone we want and have it work.'"

August 19, 2008

Sponsored Video: Demo of the Thread Profiler, Threading Building Blocks

Intel senior software engineer Gary Carleton returns for this week's featured video to again discuss Intel's software development tools, this time demonstrating the Intel Thread Profiler and Threading Building Blocks.

With the Intel Thread Profiler, developers can see lock transitions for a program and when threads are active on a particular core. Carleton explained, "You can kind of get a sense of how much lock thrashing or lock activity is occurring and maybe get a visual look at how much synchronization overhead is occurring with the application."

Developers can also click on a particular thread transition to look at the source code for both the thread releasing the lock and the thread acquiring the lock. That way, they can examine the code if they feel that there's too much lock activity occurring.

Said Carleton, "The goal is to measure the amount of parallelization of the system [to see if we] are in fact using all the execution cores in the system, and then to be able to drill down into more detail as to how the threads are interacting."

August 15, 2008

Intel Publishes First Details On Larrabee

Ahead of its presentation and preview of its Larrabee graphics processing unit at SIGGRAPH 2008, Intel posted a paper on the GPU titled Larrabee: A Many-Core x86 Architecture for Visual Computing, which will be presented at the Los Angeles conference.

According to the paper, Larrabee uses "multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed function logic blocks." Intel notes that its pipeline is derived from the dual-issue Intel Pentium processor.

The many-core architecture, which Intel claims is a first for the industry, allows for "dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads." It is also designed to increase the flexibility and programmability of the architecture as compared to standard GPUs.

Furthermore, Larrabee's native programming model supports "a variety of highly parallel applications, including those that use irregular data structures. This enables development of graphics APIs, rapid innovation of new graphics algorithms, and true general purpose computation on the graphics processor with established PC software development tools."

The chip manufacturer also added that the first Larrabee-based products are expected to launch in late 2009 or early 2010, and will target "discrete graphics applications, support DirectX and OpenGL, and run existing games and programs."

August 13, 2008

Sponsored Video: Visual Computing with Next-Generation Intel Integrated Graphics Part 1

In this first part of a three-part video series taken from an Intel Developer Forum session, Intel marketing director Steve Skibiski discusses the company's vision and strategy for visual computing with next-generation Intel integrated graphics.

Speaking on Intel's 4-series graphics architecture, Skibiski predicts that 2008 is the year of high definition and the increased importance of high definition video: "The reasons for that is that we have greater broadband penetration. With this capability, you will see standard definition on web sites going to high definition. You will start seeing high definition being introduced as widgets on web sites."

He went on to emphasize the importance of the CPU to the overall experience that users get on platforms: "The processor continues to be the most important driver in the visual computing experience, but innovations in high definition video processing, like accelerators and post-processing algorithms, give platform engineers more options in developing media and entertainment PCs."

August 11, 2008

Sponsored Feature: Havok Talks Simulating Real-World Physics

In this Intel-sponsored feature, Havok managing director David O'Meara discusses the middleware firm's new products -- from Havok Destruction to Havok Cloth -- its acquisition by Intel, and plans for the future.

Speaking on how Havok benefited from the recent acquisition, O'Meara noted that Intel enabled the company to make Havok Complete, its physics and animation software toolset, available to game developers free of charge (for non-commercial use) in February 2008. Both Havok and Intel sought to "boost creative game development throughout the industry" with the initiative.

He also believes that the acquisition will enable Havok to extend its capabilities beyond the entertainment industry and into serious gaming: "Serious gaming includes non-entertainment applications, such as industrial and military simulations and training. Intel offers Havok a lot of capabilities and resources to enable us to go beyond the entertainment space in a couple of years' time."

O'Meara added that there are Intel-developed technologies that Havok could take part in: "It would be very nice if we could create some commercial applications for some of the technologies that are being developed - take them out of Intel and put them into Havok. And that is part of the concept for Intel and Havok."

August 08, 2008

Sample Framework For Connecting Several Post Processes

Kyle Hayward, a senior studying graphics research at Purdue University's computer science department, has put up a useful post processing framework sample structured to handle a myriad of post processes.

He begins by placing a PostProcessComponent (component) at the very bottom of the framework's hierarchy: "What this class represents, is a single atomic process, that is, it cannot be broken up into smaller pieces/objects. Each component has an input and an output in the form of textures. A component is also able to update the backbuffer. It is the simplest object in the hierarchy and is combined with other components to form a PostProcessEffect (effect)."

An effect contains a chain of components for implementing a single post process and handles passing the output of one component to the input of the next. With this arrangement, "components can be independent of one another, and the effect handles linking all the components that it owns. Because of this, each component is very simple. Also, like components, effects have inputs and outputs in the form of textures. Effects can also be enabled/disabled dynamically at runtime."

He continues: "Next we have the PostProcessManager (manager). The manager is comprised of a chain of effects, and it handles linking the output of one effect to the input of the next effect. And just like with components, this enables each effect to be independent of the next. The manager also takes care of linking the backbuffer and depth buffer to components that need it."

Because each component is independent, each one is simple in its implementation, making it easier to understand the code and track down possible errors in a component. "Another nice feature about this framework is that you do not need to create a separate class deriving from PostProcessEffect for each effect. In this way, you can dynamically create your effects at run time. Most of the 'magic' occurs in the LoadContent() function inside PostProcessManager."
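The component, effect, and manager hierarchy described above can be sketched in a few lines. The following Python is a rough illustration only, not Hayward's code: the class names come from his description, but the `draw` methods, the `pass_fn` callback, and the use of plain lists of floats in place of GPU textures are assumptions made for the example, and backbuffer and depth buffer linking is left out.

```python
class PostProcessComponent:
    """A single atomic pass: one input texture in, one output texture out."""

    def __init__(self, name, pass_fn):
        self.name = name
        self.pass_fn = pass_fn  # hypothetical stand-in for a shader pass

    def draw(self, input_texture):
        return self.pass_fn(input_texture)


class PostProcessEffect:
    """A chain of components; links each output to the next input."""

    def __init__(self, name, components, enabled=True):
        self.name = name
        self.components = list(components)
        self.enabled = enabled

    def draw(self, input_texture):
        texture = input_texture
        for component in self.components:
            texture = component.draw(texture)
        return texture


class PostProcessManager:
    """A chain of effects; disabled effects are skipped at run time."""

    def __init__(self, effects):
        self.effects = list(effects)

    def draw(self, scene_texture):
        texture = scene_texture
        for effect in self.effects:
            if effect.enabled:
                texture = effect.draw(texture)
        return texture


# Textures are modeled as flat lists of intensities in [0, 1].
def brighten(texture):
    return [min(1.0, 2.0 * p) for p in texture]


def invert(texture):
    return [1.0 - p for p in texture]


tone = PostProcessEffect("tone", [PostProcessComponent("brighten", brighten)])
negative = PostProcessEffect("negative", [PostProcessComponent("invert", invert)])
manager = PostProcessManager([tone, negative])
```

Because effects are built from generic components rather than subclasses, a new effect is just another list of passes, which mirrors the point about creating effects dynamically at run time.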

August 06, 2008

Supported Feature: Random Scattering: Creating Realistic Landscapes

In this technical article, originally published in Game Developer magazine, Neversoft co-founder Mick West continues his acclaimed analyses by showcasing an algorithm for procedurally scattering objects such as trees across a game level.

West details two competing procedural content generation methodologies, teleological and ontogenic, describing the former approach as creating an accurate physical model of the environment and process to simulate the results, whereas the ontogenic approach observes and reproduces the end results of the process.

For this task of simulating randomly scattered trees, you'll want to make sure that the trees are not overlapping, yet are still evenly distributed: "What if we started off with the trees perfectly evenly distributed (say on a square grid), and then simply move each tree a random amount, but not so far that they can overlap? ... This solution actually works quite well."

West continues: "[The results] (pictured) look something like a cross between the pure random scatter and the minimum distance scatter. The trees are evenly distributed, but we still see some minor clumping, including two trees that overlap slightly. But overall this algorithm produces much nicer looking results than our first attempt, and it's simpler and cheaper to implement than the second."
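The jittered-grid idea in the quote is short enough to sketch. The Python below is an illustration rather than West's implementation: the function name, parameters, and square-grid layout are assumptions, and the jitter here is deliberately capped so that trees can never overlap, which is stricter than the result West describes (his picture still shows two slightly overlapping trees).

```python
import random


def jittered_grid_scatter(columns, rows, spacing, tree_radius, seed=None):
    """Place one tree per grid cell, then nudge each by a bounded random offset.

    Each tree stays at least tree_radius away from its cell border, so two
    trees in neighboring cells are always at least 2 * tree_radius apart.
    """
    rng = random.Random(seed)
    max_offset = spacing / 2.0 - tree_radius
    if max_offset < 0.0:
        raise ValueError("spacing must be at least twice tree_radius")

    positions = []
    for row in range(rows):
        for column in range(columns):
            # Start from the center of the cell (the "perfectly even" layout).
            center_x = (column + 0.5) * spacing
            center_y = (row + 0.5) * spacing
            positions.append((
                center_x + rng.uniform(-max_offset, max_offset),
                center_y + rng.uniform(-max_offset, max_offset),
            ))
    return positions
```

Raising the jitter limit above `spacing / 2 - tree_radius` trades the no-overlap guarantee for a more natural, less grid-like look.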

August 04, 2008

Sponsored Video: Performance and Threading Tools for Game Developers

In this week's featured video, Intel senior software engineer Gary Carleton provides an overview of Intel tools built to help developers thread code as easily as possible, improve performance, identify bottlenecks, and take advantage of multicore.

In addition to discussing the Intel Thread Profiler, which we've featured previously when comparing lock and lock-free code, Carleton covers several other Intel threading analysis tools and development products, such as the Intel Thread Checker and VTune Performance Analyzer.

On Intel Threading Building Blocks, Carleton describes the tool as a "template library that sits on top of the threading APIs and allows the programmer to not have to focus on thread creation and the thread APIs in great detail." Programmers present Threading Building Blocks with a task, and it breaks up the work inside the task and partitions the work out to whatever CPU cores are available for execution.

He added that the template library has multiple purposes: "[It makes] threading easier to release the programmer from being concerned about the details of programming, [and it makes] the program scalable. The threading building blocks at runtime will determine what CPU cores are available and actually dynamically do that. As cores become available or unavailable, it will modify the workflow according to that."
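To make the partitioning idea concrete, here is a small Python sketch of splitting a range of work into chunks sized to the number of available cores. It is only an analogy and does not use or mirror the Threading Building Blocks API (a C++ template library): `split_range` and `parallel_sum` are hypothetical names, the split is static rather than dynamic, and Python threads will not speed up CPU-bound work the way native threads do.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into contiguous, near-equal chunks (stop >= start)."""
    total = stop - start
    parts = max(1, min(parts, total)) if total > 0 else 1
    base, extra = divmod(total, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        hi = lo + base + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def parallel_sum(values, worker_count=None):
    """Sum a list by handing one chunk to each worker, then combining results."""
    # Ask the runtime how many cores exist instead of hard-coding a count.
    worker_count = worker_count or os.cpu_count() or 1
    chunks = split_range(0, len(values), worker_count)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        partials = pool.map(lambda c: sum(values[c[0]:c[1]]), chunks)
        return sum(partials)
```

The caller describes the work once and never creates a thread by hand, which is the part of the description this sketch is meant to show.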

August 01, 2008

Paper: Geometry-Aware Framebuffer Level of Detail

Hong Kong University of Science and Technology's Pedro Sander and Lei Yang, along with University of Virginia assistant professor Jason Lawrence, have published and made available online a paper titled "Geometry-Aware Framebuffer Level of Detail" for Eurographics Symposium on Rendering 2008.

The paper introduces a framebuffer level of detail algorithm for controlling an interactive rendering application's pixel workload: "Our basic strategy is to evaluate the shading in a low resolution buffer and, in a second rendering pass, resample this buffer at the desired screen resolution. The size of the lower resolution buffer provides a trade-off between rendering time and the level of detail in the final shading."

To reduce approximation error, a feature-preserving reconstruction technique that approximates the shading near depth and normal discontinuities is used. The paper also demonstrates "how intermediate components of the shading can be selectively resized to provide finer-grained control over resource allocation," as well as a simple control mechanism that "continuously adjusts the amount of resizing necessary to maintain a target framerate."

The techniques covered do not require any preprocessing and are straightforward to implement on GPUs. In addition, they have been shown to provide "significant performance gains for several pixel-bound scenes."
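The core idea, shading at low resolution and then resampling with awareness of geometry, can be sketched on a single scanline. The Python below is a simplified illustration and not the paper's reconstruction filter: it works in 1D, uses only depth (the paper also considers normals), and the function name, the inverse-distance depth weight, and `depth_epsilon` are assumptions made for the example.

```python
import math


def upsample_shading(low_shading, low_depth, high_depth, depth_epsilon=1e-3):
    """Upsample low-resolution shading along one scanline using depth weights.

    Each output pixel blends its two nearest low-resolution samples. The usual
    linear weight is multiplied by a depth-similarity weight, so samples from
    the far side of a depth discontinuity contribute very little.
    """
    low_count = len(low_shading)
    scale = len(high_depth) / low_count
    result = []
    for i, depth in enumerate(high_depth):
        # Position of this pixel's center in low-resolution sample space.
        x = (i + 0.5) / scale - 0.5
        left = math.floor(x)
        frac = x - left
        total_weight = 0.0
        total_value = 0.0
        for index, spatial_weight in ((left, 1.0 - frac), (left + 1, frac)):
            index = min(max(index, 0), low_count - 1)  # clamp at the borders
            depth_weight = 1.0 / (depth_epsilon + abs(depth - low_depth[index]))
            weight = spatial_weight * depth_weight
            total_weight += weight
            total_value += weight * low_shading[index]
        result.append(total_value / total_weight)
    return result
```

With a dark near surface next to a bright far one, plain linear interpolation would smear the two across the edge, while the depth weight keeps each output pixel close to the shading of the surface it actually belongs to.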

[Via Level of Detail]

July 30, 2008

Sponsored Video: Allegorithmic's Real-Time Graphics Texture Streaming for Games

Demonstrating developer Allegorithmic's Substance engine, marketing and sales director Alexis Khouri spoke with Intel Software Network about real-time graphics texture streaming for games.

According to Khouri, Allegorithmic's optimized engine allows for bigger game worlds, richer content, and more details: "What is really interesting is that our technology scales very well with multi-cores. We have more than 90% scalability, which means, for example, with 4 cores, we can have almost 14 megabytes per second of compressed textures."

For the demonstration, Allegorithmic produced its own level of Unreal Tournament 3 to show off what can be expected with the company's procedural texture technology: "What is interesting is that the whole texture package of this demonstration fits in 280 KB instead of 300 MB. So, what we did was divide the size of the textures by a thousand."

July 28, 2008

Arauna Real-Time Ray Tracing Updated

NHTV's International Game Architecture and Design senior lecturer and team manager Jacco Bikker has released a new demo and source package for his Arauna real-time ray tracing application. The package - which includes static library files for building a demo application, three tutorial applications, and source code needed to build the libraries and the demo application - can be used freely for non-commercial applications.

According to Bikker's site, Arauna is an experimental and stable real-time ray tracer developed specifically for game development. While the application isn't yet capable of delivering the performance needed to produce graphics comparable to modern games using a GPU, it is one of the fastest renderers in its class.

Arauna's features include "real-time ray tracing of large triangle meshes (up to 2M per 1GB of memory), full HDR pipeline with post-processing for HDR glow, recursive reflection and refraction, accurate shadows from an unlimited amount of point lights, texturing with bilinear filtering and normal mapping," and more.

Two games have been developed with Arauna so far, both of which were created by students enrolled in NHTV University of Applied Sciences' IGAD program. The most recently completed game, Let There Be Light (pictured), was released earlier this month and is available for download at the Arauna site.

July 25, 2008

Supported Feature: Practical Fluid Dynamics: Part 2

Following up his recent article on practical fluid dynamics, Neversoft co-founder Mick West further explains the technical details - including source code - of creating dynamic fluid systems such as smoke for video games, using nothing more complex than basic algebra.

On creating smoke, West notes that simulating the surface of the fluid isn't the goal; rather, it is more interesting to visualize substances suspended in and carried around by the fluid. "With water, we might have silt, sand, ink, or bubbles. In air, we could see dust, steam, or smoke. You can even use the velocity field techniques outlined here to move larger objects, like leaves or paper in a highly realistic manner."

He continues: "It's important that what we're talking about is a suspension of one substance in another. We are generally not so interested in simulating two fluids that do not mix (like oil and water)."

Modeling smoke should be approached as a suspension of tiny particles in the air and not as a gas: "These tiny particles are carried around by the air, and they comprise a very small percent of the volume occupied by the air. So we do not need to be concerned about smoke displacing air."

To simulate smoke, West suggests adding "another advected density field, where the value at each cell represents the density of smoke in that region." In his accompanying code, he refers to this as "ink," as it's similar to the density of air, except "the density of smoke or ink is more of a purely visual thing and does not affect the velocity field."
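A passive "ink" field of this kind can be sketched as follows. This Python is not West's accompanying code: it uses backward (semi-Lagrangian) tracing with bilinear interpolation, a common way to advect a scalar field, and the function name, grid layout (`ink[y][x]`), and clamped borders are assumptions made for the example.

```python
import math


def advect_ink(ink, velocity_x, velocity_y, dt):
    """Move a passive ink density field along a velocity field for one step.

    For each cell, trace backward along the velocity to find where its ink
    came from, then sample the old field there with bilinear interpolation.
    The velocity grids are only read, never written: ink is purely visual.
    """
    height = len(ink)
    width = len(ink[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Source position, clamped so it stays inside the grid.
            source_x = min(max(x - dt * velocity_x[y][x], 0.0), width - 1.0)
            source_y = min(max(y - dt * velocity_y[y][x], 0.0), height - 1.0)
            x0 = int(math.floor(source_x))
            y0 = int(math.floor(source_y))
            x1 = min(x0 + 1, width - 1)
            y1 = min(y0 + 1, height - 1)
            fx = source_x - x0
            fy = source_y - y0
            top = (1.0 - fx) * ink[y0][x0] + fx * ink[y0][x1]
            bottom = (1.0 - fx) * ink[y1][x0] + fx * ink[y1][x1]
            result[y][x] = (1.0 - fy) * top + fy * bottom
    return result
```

Since the function returns a new grid and leaves both velocity grids untouched, any number of ink-like fields (smoke, dust, dye) can ride on the same simulation without changing the flow.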

July 23, 2008

Sponsored Video: Game Creators: Threading for Games

For this week's featured video, Game Creators CEO Lee Bamber describes how the company approached Intel about obtaining logos to use with its Windows Vista and DirectX 10 game creation tool, FPS Creator X10, only to come away with the goal of integrating multicore support with its software.

Game Creators picked three software-slowing areas to target with multicore, the first being artificial intelligence: "The problem was that when we had a hundred characters rendered into the scene, the CPU slowed down on a single core because there was just too much AI processing going on. We spread out the AI calculation across all the cores, which brought our speed back up... so we could have a hundred characters running around a confined space, and we also got our performance back."

The team then accelerated light mapping, having found that with a single core, calculating all the shadows in a scene with lots of lights could sometimes take up to 180 seconds. "When we optimized with multicore technology on a quad core, the same thing took 53 seconds. So, [there was] a dramatic reduction in the waiting time to be able to go and play the game."

The third area Game Creators focused on was video capture. The studio wanted players to be able to capture footage to share with their friends and actually incorporate into their game-making process for cutscenes, but as soon as the feature was added, the game slowed to a crawl: "What we decided to do was get a separate core, and [we] said that core is just going to be video capture. That core sits there, and then captures the game which was on the other cores. The idea is that when you switch on video capture, the game doesn't slow down. So, you can capture the video at full frame rate, and you actually get a cool video at the end of it that you can use during your game-making process."

July 21, 2008

Deriving Exponential Shadow Maps Simplified

LucasArts PlayStation 3 engineer Marco Salvi has posted a conceptually simpler, more intuitive way to derive exponential shadow map (ESM) equations, which he discovered while working on an improved version of exponential shadow maps.

According to Salvi, there is "no need to invoke Markov's inequality, higher order moments, or convolutions." He goes on to rewrite the "basic percentage closer filtering (PCF)" equation with several adjustments before arriving at the more streamlined ESM occlusion term formula pictured.

He concludes: "Exponential shadow maps can be seen as a very good approximation of a PCF filter when all the occluders are located in front of our receiver (no receiver self shadowing within the filtering window). There's not much else to add, if not that this new derivation clearly shows the limits of this technique and that any future improvements will necessarily be based on a relaxed version of the planar receiver hypothesis."
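The relationship between the two filters can be checked numerically. The Python sketch below compares a plain PCF average of binary depth tests with the usual exponential form, in which exp(c * z) is averaged over the filter window and then scaled by exp(-c * d) for a receiver depth d that is constant across the window. It is an illustration of the commonly cited ESM formulation, not a transcription of the formula pictured in Salvi's post; the function names, uniform filter weights, and the `sharpness` constant c = 80 are assumptions.

```python
import math


def pcf_visibility(occluder_depths, receiver_depth):
    """Percentage closer filtering: average of binary depth tests."""
    lit = sum(1 for z in occluder_depths if z >= receiver_depth)
    return lit / len(occluder_depths)


def esm_visibility(occluder_depths, receiver_depth, sharpness=80.0):
    """Exponential shadow map estimate for depths in [0, 1].

    Assumes every stored depth z satisfies z <= receiver_depth, so the step
    function can be replaced by exp(sharpness * (z - receiver_depth)).
    Because the receiver depth is constant over the window (planar receiver),
    exp(-sharpness * receiver_depth) factors out of the filtered sum.
    """
    filtered = sum(math.exp(sharpness * z) for z in occluder_depths)
    filtered /= len(occluder_depths)
    # Clamp: rounding, or depths behind the receiver, can push this above 1.
    return min(1.0, math.exp(-sharpness * receiver_depth) * filtered)
```

When half the window is occluded and half sits at the receiver depth, both functions return roughly 0.5. The exponential version is only usable because the exp(c * z) values can be blurred ahead of time like an ordinary texture.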