Gamasutra.com Features - Skinned Mesh Export: Optimization
It's free to join Gamasutra!|Have a question? Want to know who runs this site? Here you go.|Targeting the game development market with your product or service? Get info on advertising here.||For altering your contact information or changing email subscription preferences.
Registered members can log in here.Back to the home page.

Search articles, jobs, buyers guide, and more.

Gamasutra
April 18, 2007

Skinned Mesh Export: Optimization

arrowrightPage One
arrowrightPage Two
arrowrightPage Three

Printer Friendly Version




Latest Letters to the Editor:
Perpetual Layoffs by Alexander Brandon [09.21.2007]

Casual friendliness in MMO's by Colby Poulson [09.20.2007]

Scrum deals and 'What is Scrum?' by Tom Plunket [08.29.2007]


[Submit Letter]

[View All...]
  


Skinned Mesh Export: Optimization


On current hardware (as of Vertex Shader 2.0), the constant table contains 256 four dimensional vectors. Depending on the number of constants used for other parameters, this amounts to around 70-80 joints. On older graphics cards the constant table can only contain 96 four dimensional vectors and the joint limit can be as low as 20 to 28 joints (although theoretically the limit is 96/3=32 joints).

If the number of joints used in the mesh exceeds the vertex shader joint limit, we need to partition the mesh and render parts of the mesh separately. When we partition a mesh, we typically want to use the least number of shader passes because uploading constants to the shader is an expensive operation.

Compared to the PC, handheld devices like the Playstation Portable (PSP) are completely different. Because the hardware is very different from PC hardware, this means we also need different algorithms to partition the mesh if we want to allow a large number of joints to be used in our models.

Optimally partitioning an arbitrary mesh is an NP-complete problem (which means it cannot be solved exactly for meshes of a practical size), but by using heuristics we can generate solutions that perform quite well in practice. Before creating such a tool however, we need to decide where we want to place this partition algorithm in our toolchain.

Exporting Skinned Meshes

The skinned meshes are typically created by the artist in a 3D modeling package like Maya or 3D Studio Max and are exported to a format that can be read by the game. At Wishbone Games, we use Maya in combination with a third party plug-in that is especially well suited for PC related hardware. When we started on our PSP project, we found that we needed to implement some major changes to accommodate for the PSP hardware. We decided to create a separate utility to handle these properties, sitting between the modeling package and the game engine (figure 2).

The main design considerations of the tool were:

  1. It should support multiple hardware platforms, all using the same utility.

  2. It should be easily extendable to accommodate for new hardware or new insights in existing hardware.

  3. It should have multiple optimization techniques that have different speeds. Fast turnaround time is key for the artists, but the final export or nightly build is allowed more time to optimize the mesh to improve performance.


The optimization utility sits between the modeling package and the game engine.

When you support multiple hardware platforms, we found it helps if you have a uniform toolchain. So exporting from a modeling package always requires putting it through an optimization program. For some target hardware this program is more difficult than for others, but, ideally, we would have one conversion utility that can create optimized meshes for any of these platforms and takes advantage of the architecture of the hardware. Much of the functionality for the hardware platforms will be shared (like reading the input file format) and by putting all the functionality in one utility we avoid discrepancies between different builds of the software.

The second design consideration deals with extendability. When new hardware is released, this hardware will have specific properties. Maybe uploading complete shader constants will be cheaper, or new functionality might become available. Such new hardware might require a complete new design of the way you render a mesh. Programmers should be able to quickly create a new path to export for this hardware when needed. Alteration of existing paths is also commonly encountered during the period of a game project. When you work extensively with some hardware you might have new insights in what works well on a specific platform, or maybe you have read an interesting article and want to try a new technique. This can all be done easily by creating new subclasses. A sample hierarchy is shown in figure 3.


The class hierarchy used to represent different hardware platforms.

The third design consideration is speed. When creating models, artists should be allowed to quickly (pre)view their work. Just as programmers want excellent compile speed, artists should have excellent export speed. While the artist is still working on a character model, the mesh does not have to be extensively optimized for rendering speed but the execution speed is the most important factor. By supporting different optimization techniques, we can control the time spent on optimization. Fully optimized models can be generated in the night build or the final export.

Building the Solution

To support skin partitioning we use a separate utility program that is called with command line options to indicate the target platform (Xbox, PC, PSP, etc), optimization options (quality or speed), the input file and the output file. The program reads the file, which is an ASCII text file similar to the widely known OBJ file format. It then partitions the mesh and writes the new mesh to the output file, which is like a list of commands to be processed by the hardware. For different hardware the export formats may vary.

The data needs to be read into memory and processed independently of the target hardware. Every vertex has at most four joints that influence it and a triangle has exactly three vertices. This means that in the worst case we use 12 unique joints to render a single triangle. Because a triangle is the smallest thing we can render, we want to make sure that the maximal 12 joints are all available when we want to render it. When we create the list of unique joints that are used by the triangle, we put the joint indices in ascending order so “5 2 4” becomes the same as “2 4 5”. It is very likely that many of the triangles in the mesh have the same joint indices and such triangle-groups can be rendered in one pass. We call such a triangle-group a partition.

The problem we want to solve is the order in which to send these partitions to the GPU so we can achieve maximal performance. This is where the representation of the hardware architecture comes into play. We create an object that simulates the relevant portions of the target hardware. For the PC, this object represents a vertex shader with a limited number of constant registers. You can query the hardware object to estimate the cost to render a partition.

A Simple Algorithm

A very simple method to create a solution is to just walk through the complete list of partitions and append them to the solution. This way, the initial ordering of the partition defines the quality of such a solution. If the list of partitions is imperfect, we might perform unnecessary work which could lead to disappointing rendering speeds.

By ordering the list of partitions we can improve our result significantly. Sorting on the number of unique joints used (’1’,’2’,’3’,’1 2’,’2 3’) or in ascending order (’1’, ‘1 2’, ’2’, ’2 3’, ’3’) are commonly used orderings. Please note that we don’t use the hardware representation here. We do an informed guess about a good ordering and execute it. The biggest pro of this algorithm is its speed. The relative cost of updating the hardware to render a partition is not even used, the algorithm just assumes the ordering will be good enough. This assumption is also a disadvantage because the quality of the solution completely depends only on the hard-coded sorting of the input data. Careless (or just plain bad) orderings may generate sub-optimal solutions. This simple algorithm is easy to implement, it is very fast (there is no searching) and in practice it can generate good solutions if the programmer has chosen a nice ordering (which is normally one of the two orderings mentioned earlier). Most skin partitioning algorithms use this technique. The algorithm is shown in pseudocode 1.




join | contact us | advertise | write | my profile
news | features | companies | jobs | resumes | education | product guide | projects | store



Copyright © 2006 CMP Media LLC

privacy policy
| terms of service