|
3. Struct Padding
Figuring out the size of the data types and the amount of padding between them is not enough. Consider this array:
WaypointInfo m_waypoints[10];
Compilers will often pad structures for performance purposes so that every element of such an array is aligned correctly.
In the example above, if we assume 32-bit integers, 8-bit Booleans, a 3-byte padding, 32-bit floats, and 8-bit characters, we might think the structure is 21 bytes. In reality, it will probably be padded to 24 bytes so subsequent structures in the array will be aligned on a 32-bit boundary. Some compilers might go as far as to pad it to 32 bytes.
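A minimal sketch of how you might measure this on a given compiler. The member list of WaypointInfo here is a hypothetical stand-in chosen to add up to the 21 bytes discussed above, and the helper names are assumptions, not part of the original example.

```cpp
#include <cstddef>   // std::size_t, offsetof
#include <cstdint>   // int32_t

// Hypothetical stand-in for WaypointInfo; the real member list may differ.
struct WaypointInfo
{
    int32_t m_id;          // 4 bytes
    bool    m_enabled;     // 1 byte, typically followed by 3 bytes of padding
    float   m_position[3]; // 12 bytes
    char    m_type;        // 1 byte, typically followed by trailing padding
};

// Sum of the members' own sizes, ignoring every byte of padding.
constexpr std::size_t kWaypointMemberBytes =
    sizeof(int32_t) + sizeof(bool) + sizeof(float[3]) + sizeof(char);

// Bytes the compiler added between and after the members.
constexpr std::size_t kWaypointPaddingBytes =
    sizeof(WaypointInfo) - kWaypointMemberBytes;

// Distance in bytes between two consecutive array elements.
inline std::size_t WaypointStride()
{
    WaypointInfo waypoints[2];
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&waypoints[1]) -
        reinterpret_cast<const char*>(&waypoints[0]));
}
```

The stride always equals sizeof(WaypointInfo), which is why the trailing padding exists at all: it keeps the second and later elements aligned. Printing sizeof(WaypointInfo) and offsetof for each member on every target compiler is a quick way to compare layouts.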
What's a poor data baker to do with all those rules?
The varying size of basic data types is often dealt with by creating user-defined data types that have a well-defined size. See Listing 1 for an example. For each target platform, you can provide definitions for those data types so their sizes are the same.
Listing 1: Example Data Definition for One Platform
typedef __int64 int64;
typedef signed int int32;
typedef unsigned int uint32;
typedef unsigned short uint16;
typedef signed short int16;
typedef unsigned char byte;
Doesn't it seem wasteful that we're all redefining our own data types just so we can know their exact sizes? It gets even worse when middleware providers do the same thing. We end up with many different "basic" data types of varying sizes and properties all over the same code base.
Fortunately, C99 introduced a new header file, stdint.h, which, among other things, declares integer data types with exact sizes, such as int8_t, uint8_t, int16_t, uint16_t, and so forth.
Do yourself a favor and start using those data types whenever exact size is important. If your compiler isn't yet C99 compliant (tsk, tsk, Visual Studio 2005!), you can get a third-party header file that adds those definitions. (See Resources.)
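As a sketch of how the exact-width types might be used, the following defines a file header with them and asks the compiler to reject any build where the sizes are not what the baked data expects. BakedFileHeader and its members are hypothetical names invented for this illustration, and static_assert assumes a C++11 or later compiler.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // int8_t, uint16_t, uint32_t, ...

// Fail the build if a basic assumption about type sizes does not hold.
static_assert(CHAR_BIT == 8, "baked data assumes 8-bit bytes");
static_assert(sizeof(int8_t) == 1, "int8_t must be 1 byte");
static_assert(sizeof(uint16_t) == 2, "uint16_t must be 2 bytes");
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

// Hypothetical header at the start of a baked data file.
struct BakedFileHeader
{
    uint32_t m_magic;      // file type identifier
    uint16_t m_version;    // format version
    uint16_t m_entryCount; // number of entries that follow
    uint32_t m_dataSize;   // size of the payload in bytes
};

// 4 + 2 + 2 + 4 bytes; any other size means the compiler added padding.
static_assert(sizeof(BakedFileHeader) == 12,
              "unexpected padding in BakedFileHeader");
```

The exact-width types are technically optional in the standard, but mainstream platforms provide them. Putting the checks next to the structure means a layout mismatch shows up at compile time rather than as corrupted data at load time.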
The rules for member and struct padding aren't defined in the C++ standard, so it's completely up to each compiler implementation to decide how to do it. Fortunately, a lot of common compilers (most notably Visual Studio and gcc) support the #pragma pack directive, which allows you to specify the byte alignment desired in your structures.
You can either use #pragma pack everywhere that matters, or you can learn the padding rules for your compiler by implementing those structures and seeing what the compiler creates.
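A minimal sketch of the #pragma pack approach, using the push/pop form that Visual Studio and gcc both accept. The record types below are hypothetical and exist only to contrast the default layout with a packed one.

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // uint8_t, uint16_t, uint32_t

// Default layout: the compiler is free to insert padding
// (typically 12 bytes in total on common 32- and 64-bit compilers).
struct NaturalRecord
{
    uint8_t  m_kind;
    uint32_t m_value;
    uint16_t m_count;
};

// Packed layout: members are placed back to back on 1-byte boundaries.
#pragma pack(push, 1)
struct PackedRecord
{
    uint8_t  m_kind;
    uint32_t m_value;
    uint16_t m_count;
};
#pragma pack(pop)

// With 1-byte packing the layout is 1 + 4 + 2 bytes and nothing else.
static_assert(sizeof(PackedRecord) == 7, "PackedRecord should have no padding");
static_assert(offsetof(PackedRecord, m_value) == 1, "m_value follows m_kind");
static_assert(offsetof(PackedRecord, m_count) == 5, "m_count follows m_value");
```

The trade-off is that packed members can end up unaligned, which may be slower to access and can fault on some hardware, so it is worth restricting packing to the structures that are actually written to disk.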
Another common source of problems is bitfields. C language bitfields are a handy way to pack flags into a small amount of space:
struct EntityState
{
bool m_active : 1;
bool m_invisible : 1;
bool m_invulnerable : 1;
bool m_playerControlled : 1;
bool m_inVehicle : 1;
// ....
};
The C++ standard guarantees that each of those flags can hold one bit. What it doesn't make any promises about is exactly how those bits will be laid out or how they will be padded.
You either need to find out how the compiler in your target platform does it, or replace those flags with something you have control over, such as explicit bit masks on a 32-bit unsigned integer.
struct EntityState
{
uint32_t m_flags;
// .....
};
#define ENTITYSTATE_ACTIVE 0x00000001
#define ENTITYSTATE_INVISIBLE 0x00000002
#define ENTITYSTATE_INVULNERABLE 0x00000004
#define ENTITYSTATE_PLAYERCONTROLLED 0x00000008
#define ENTITYSTATE_INVEHICLE 0x00000010
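The masks above might be used as in the following sketch. The structure and masks are repeated so the example stands on its own; SetFlag, ClearFlag, and HasFlag are hypothetical helper names, not part of the original listing.

```cpp
#include <cstdint>   // uint32_t

struct EntityState
{
    uint32_t m_flags;
};

#define ENTITYSTATE_ACTIVE           0x00000001
#define ENTITYSTATE_INVISIBLE        0x00000002
#define ENTITYSTATE_INVULNERABLE     0x00000004
#define ENTITYSTATE_PLAYERCONTROLLED 0x00000008
#define ENTITYSTATE_INVEHICLE        0x00000010

// Hypothetical helpers: turn the bits in mask on.
inline void SetFlag(EntityState& state, uint32_t mask)
{
    state.m_flags |= mask;
}

// Turn the bits in mask off and leave every other bit untouched.
inline void ClearFlag(EntityState& state, uint32_t mask)
{
    state.m_flags &= ~mask;
}

// True if at least one of the bits in mask is set.
inline bool HasFlag(const EntityState& state, uint32_t mask)
{
    return (state.m_flags & mask) != 0;
}
```

Because each flag has an explicit bit position, the value of m_flags means the same thing to every compiler; only its byte order in memory still depends on the platform.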
In general, you need to watch out for anything that the standard doesn't explicitly dictate, and that's left up to each implementation.
For each of those cases, you should either substitute it with something that is well defined and consistent, or learn how each implementation defines it and make it part of the rules of your data baking.
Little End or Big End?
Once you've figured out the size of your data types and their offsets in the structure, you still need to know how exactly they're stored in memory. You might know that an integer is 32 bits, but what bit pattern describes a particular number?
There are two parts to that answer. The first one relates to how data types are represented in different hardware. And here, there's good news: Most modern platforms use the same method to represent basic data types.
Signed integers are represented with two's complement, and floating point numbers use the IEEE 754 standard for both 32- and 64-bit numbers (sign, mantissa, and exponent). A few platforms might not support floating point numbers, in which case we'll need to translate the data to fixed point or some other format. But in most cases, this is not something we have to worry about.
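If you would rather verify these representations than assume them, a sketch like the following can help. FloatBits and Int32Bits are hypothetical helper names, and the static_asserts assume a C++11 or later compiler.

```cpp
#include <cstdint>   // int32_t, uint32_t
#include <cstring>   // std::memcpy
#include <limits>    // std::numeric_limits

// Refuse to build on a platform whose float is not a 32-bit IEEE 754 value.
static_assert(std::numeric_limits<float>::is_iec559,
              "float is expected to follow IEEE 754");
static_assert(sizeof(float) == sizeof(uint32_t),
              "float is expected to be 32 bits");

// Return the raw bit pattern of a float as an unsigned integer.
inline uint32_t FloatBits(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// Return the two's complement bit pattern of a signed 32-bit integer.
inline uint32_t Int32Bits(int32_t value)
{
    return static_cast<uint32_t>(value);
}
```

For example, FloatBits(1.0f) yields 0x3F800000 and Int32Bits(-1) yields 0xFFFFFFFF. These are values, not memory layouts: the order in which their bytes are stored is a separate question.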
That's not the end of the story, though. The second part of the answer relates to how that number is stored in memory. In all modern platforms, a byte (8 bits) is the smallest addressable memory unit. Data types that are just a byte long (like a char) are simply stored at a particular memory address in a single byte, with nothing more to it.
The problem comes with data types that are larger than a single byte. Integers and floats are often 32 bits long, which is 4 bytes. How are those bytes arranged in memory? This is such a fundamental issue that you would hope there were one standard everybody followed. Unfortunately, for historical reasons, there are two standard ways to do it.
Figure 2A shows the 32-bit integer 0x0A0B0C0D broken down into four bytes. The bit on the far left has the highest potential value (2^31) and is called the most significant bit (msb).
Conversely, the bit on the right has the lowest potential value (2^0) and is called the least significant bit (lsb). Extending this to bytes, the byte containing the lsb is called the least significant byte (LSB) and the one containing the msb is the most significant byte (MSB).

Figure 2: a) 32-bit data in a register. b) Memory layout in big-endian format. c) Memory layout in little-endian format.
One approach, known as the big-endian format, stores bytes in memory starting with the MSB (see Figure 2B). The other approach, little-endian, stores bytes in memory starting with the LSB (see Figure 2C).
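The two layouts can be sketched in code as follows, using the same value, 0x0A0B0C0D. All function names here are hypothetical; the store functions work from the value with shifts, so they produce the same bytes no matter which format the host itself uses.

```cpp
#include <cstdint>   // uint32_t
#include <cstring>   // std::memcpy

// True if the machine running this code stores the LSB first.
inline bool HostIsLittleEndian()
{
    const uint32_t probe = 0x0A0B0C0D;
    unsigned char first;
    std::memcpy(&first, &probe, 1);   // read the byte at the lowest address
    return first == 0x0D;
}

// Write value into out starting with the MSB (big-endian layout).
inline void StoreBigEndian32(uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Write value into out starting with the LSB (little-endian layout).
inline void StoreLittleEndian32(uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>(value & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[3] = static_cast<unsigned char>((value >> 24) & 0xFF);
}

// Reverse the byte order of a 32-bit value.
inline uint32_t SwapBytes32(uint32_t value)
{
    return ((value & 0x000000FFu) << 24) |
           ((value & 0x0000FF00u) << 8)  |
           ((value & 0x00FF0000u) >> 8)  |
           ((value & 0xFF000000u) >> 24);
}
```

StoreBigEndian32(0x0A0B0C0D, out) leaves the bytes 0A 0B 0C 0D in out, and StoreLittleEndian32 leaves 0D 0C 0B 0A. A baking tool that knows its target's format can write data with the matching function and avoid any conversion at load time.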
The names big-endian and little-endian come from Jonathan Swift's Gulliver's Travels, in which tensions are high between two rival nations because one cracks its eggs on the big end and the other cracks them on the little end, and each is convinced that its way is the correct way.
|