Gamasutra.com Features - Skinned Mesh Export: Optimization
It's free to join Gamasutra!|Have a question? Want to know who runs this site? Here you go.|Targeting the game development market with your product or service? Get info on advertising here.||For altering your contact information or changing email subscription preferences.
Registered members can log in here.Back to the home page.

Search articles, jobs, buyers guide, and more.

Gamasutra
April 18, 2007

Skinned Mesh Export: Optimization

arrowrightPage One
arrowrightPage Two
arrowrightPage Three

Printer Friendly Version




Latest Letters to the Editor:
Perpetual Layoffs by Alexander Brandon [09.21.2007]

Casual friendliness in MMO's by Colby Poulson [09.20.2007]

Scrum deals and 'What is Scrum?' by Tom Plunket [08.29.2007]


[Submit Letter]

[View All...]
  


Skinned Mesh Export: Optimization


Cheapest Insertion

Instead of providing the order in which to send the partitions to the hardware, we can also query the hardware and search for a good solution. In general, we want to get an indication of the impact of rendering a partition when the hardware is in a specific state. Minimizing the impact of each insertion is called cheapest insertion. A naïve implementation takes a lot more time to come up with a solution for, but it typically outperforms the solution quality of the simple algorithm.

The algorithm for cheapest insertion walks through the complete list of the partitions and inserts every partition in the cheapest position in the solution (and not just at the end, like the “simple algorithm”). Inserting a partition in a solution requires that we re-evaluate the cost of the complete solution because the insertion might induce costs and/or savings for partitions that are processed after the inserted partition.

Cheapest insertion is also sensitive to the order in which the partitions are added, although to a much lesser degree compared to the simple algorithm. Using random ordering gives results that are comparable to the simple algorithm with good ordering, but the same orderings as used in the simple algorithm normally gives the best result.

The algorithm is shown in pseudocode 2.

An Example

To illustrate our algorithm, let us assume our fictional target hardware supports a shader with Matrix Palette Skinning. We assume at most eight joint matrices can be used in one rendering pass (the shader has eight joint ‘slots’). Our simple mesh has three partitions, with every partition using five joints.


A shader that supports eight joint-matrices needs to process three partitions which all have five joints.

We will solve the partition using the simple algorithm while showing the cost calculation. In the beginning we have a new, clean hardware state. We want to render a partition that contains five unique joint indices. The “cost” to render this partition is the uploading of five matrices and the cost to create a new palette. The second partition also uses five unique joint indices, but four of these are also in the first partition. The additional cost to render this partition is now the uploading of just one matrix because the other matrices are already available. Our third partition also has five unique joints and two of them are already available in the current palette. Because the complete partition does not fit in the palette (three new entries are needed, but only two are available), the hardware object will create a new palette. In this new palette, we upload the five unique matrices of the third partition. Having processed all the partitions, we now are ready to export the solution. The relevant portions of the resulting export file is shown below.

Benefits

Simulating hardware to optimize the export of mesh data has a few important benefits. The specific knowledge of the hardware is located in one point (the “hardware” subclass), and this makes it easy to maintain the code. Your technical programmer can supply an implementation for the Xbox, PS2, PSP, PC or some other platform. Furthermore, programmers can supply different implementations for the same hardware platform. Maybe there are new insights in how the hardware can perform optimally.

The best thing about working with abstracted hardware is that it does not need any special implicit or explicit knowledge about properties of joints. We implement a function that estimates the cost to change the hardware state and by trying different orderings we look for good solutions. This reduces the effort required by the programmer considerably: we let the computer do what it does best: considering a great number of alternatives and selecting the best one.

The Software

A piece of software has been written to create a software utility that is easily extendable. The program is written in C++ and can be compiled using MSVC++ or GNU C++. At Wishbone Games we use the utility to optimize our meshes multiple hardware targets. On the PC, the partitioning is fairly simple and different techniques show similar results (the same number of render passes). For consoles however, we were able to improve the speeds significantly: by using a more elaborate hardware simulation we were able to reduce joint matrix data sent to the hardware by 60 percent, thus improving overall performance.

Conclusion

Different hardware platforms need different optimization techniques when exporting skinned meshes. Using our framework, it is possible to create and plug-in new hardware or a new technique fairly easily. Just create a subclass of the “Hardware” class and put in special hardware considerations.

The currently implemented ways to generate solutions are pretty standard, but we found it works just fine for us. If you have difficult hardware simulation the cost function might take too much time to perform cheapest insertion. You could try to make a better estimate of the insertion point for a partition or maybe use other optimization techniques like simulated annealing or genetic algorithms. For our purposes however, the software works just fine as it is.

This utility is created to optimize the partitioning of a mesh based on skinning information. It can be altered, however, to take other splitting criteria in mind. For instance, you could split on materials (texture or shader), physics, fracture points or some other criteria. Furthermore, other utilities can be created to process the mesh even further by, for instance, stripify the mesh to accelerate rendering or perform polygon reduction. These routines are also hardware dependent, based on the size of the cache and other criteria. On the PS2 for instance, non-textured polygons optimally have a width that doesn’t exceed 32 pixels2. Such specific knowledge might not be used normally when designing an export utility, but when we create the tool we could easily plug-in a new routine using this knowledge. We can then export the models using both techniques and compare the rendering speeds. You could even estimate the cost of rendering such polygons, but it might be too much trouble for complicated systems that use caching, multiple processors and difficult bus simulation models. If you create such a utility however, I am sure many developers would appreciate it if the source code was shared.

The conversion framework as well as an implementation for PC hardware (source code and executable) can be downloaded here [.ZIP link]. On a final note, I would like to thank my friend Erik van der Pluym for reviewing this article.

2. PS2 Programming Optimisations, page 25.




join | contact us | advertise | write | my profile
news | features | companies | jobs | resumes | education | product guide | projects | store



Copyright © 2006 CMP Media LLC

privacy policy
| terms of service