The key to using many of the 3D user interface techniques found in the VR and 3D UI literature is having an input device that supports 6 DOF. In this case, 6 DOF means the device provides both the position (x, y, and z) and the orientation (roll, pitch, and yaw) of the controller or user in the physical world.
In some cases, it is also important to be able to track the user's head position and orientation. This will be somewhat problematic with current motion controller hardware, but these techniques can certainly be modified when no head tracking is available.
6 DOF is essentially the Holy Grail when it comes to 3D spatial interfaces and, fortunately, this is the way the video game industry is going (e.g., Sony Move, Sixense TruMotion, and Microsoft Natal).
One can still support 3D spatial interfaces with a device like the Nintendo Wii Remote, but it is more challenging, since the device provides only some of these DOF and only under certain conditions. As part of the reading list at the end of the article, I provide some papers that discuss how to deal with the issues surrounding the Nintendo Wii Remote and 3D spatial interaction.
The motor component of navigation is known as travel (e.g., viewpoint movement). There are several issues to consider when dealing with travel in 3D UIs.
One such issue is the control of velocity and/or acceleration. There are many methods for doing this, including gesture, speech controls, sliders, etc. Next, one must consider whether motion should be constrained in any way, for example by maintaining a constant height or by following the terrain.
Finally, at the lowest level, the conditions of input must be considered: when and how does motion begin and end (click to start/stop, press to start, release to stop, stop automatically at target location, etc.)? Four of the more common 3D travel techniques are gaze-directed steering, pointing, map-based travel, and "grabbing the air".
Gaze-directed steering is probably the most common 3D travel technique and was first discussed in 1995, although the term "gaze" is really misleading. Usually no eye tracking is being performed, so the direction of gaze is inferred from tracking the user's head orientation.
This is a simple technique, both to implement and to use, but it is somewhat limited in that you cannot look around while moving. Potential examples of gaze-directed steering in video games would be controlling vehicles or traveling around the world in a real-time strategy game.
To implement gaze-directed steering, typically a callback function is set up that executes before each frame is rendered. Within this callback, first obtain the head tracker information (usually in the form of a 4x4 matrix). This matrix gives you a transformation between the base tracker coordinate system and the head tracker coordinate system.
By also considering the transformation between the world coordinate system and the base tracker coordinates (if any), you can get the total composite transformation. Now, consider the vector (0,0,-1) in head tracker space (the negative z-axis, which usually points out the front of the tracker).
This vector, expressed in world coordinates, is the direction you want to move. Normalize this vector, multiply it by the speed, and then translate the viewpoint by this amount in world coordinates. Note: current "velocity" is in units/frame. If you want true velocity (units/second), you must keep track of the time between frames and then translate the viewpoint by an amount proportional to that time.
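The per-frame update described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes 4x4 matrices stored as row-major nested lists, a column-vector convention (so the head tracker matrix maps head coordinates into base tracker coordinates), and a frame time dt supplied by the caller. The names mat_mul, transform_direction, and steer are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_direction(m, d):
    """Apply only the upper-left 3x3 part of m to direction d.
    Directions ignore the translation column of the matrix."""
    return [sum(m[i][j] * d[j] for j in range(3)) for i in range(3)]

def steer(viewpoint, world_from_base, base_from_head, speed, dt):
    """Return the viewpoint position after one frame of steering.
    speed is in units/second and dt is the time since the last frame
    in seconds; pass dt=1 to get the simpler units/frame behavior."""
    # Composite transformation: head tracker space -> world space.
    world_from_head = mat_mul(world_from_base, base_from_head)
    # The negative z-axis is assumed to point out the front of the tracker.
    forward = transform_direction(world_from_head, [0.0, 0.0, -1.0])
    length = math.sqrt(sum(c * c for c in forward))
    if length == 0.0:
        return list(viewpoint)  # degenerate matrix: do not move
    step = speed * dt
    return [p + (c / length) * step for p, c in zip(viewpoint, forward)]
```

Under the same assumptions, passing a hand tracker matrix in place of base_from_head would give the steering direction from the hand instead of the head.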
Pointing is also a steering technique (one where the user continuously specifies the direction of motion) and was developed in the mid-1990s. In this case, the hand's orientation is used to determine direction.
This technique is somewhat harder to learn for some users, but is more flexible than gaze-directed steering. Pointing is implemented in exactly the same way as gaze-directed steering, except a hand tracker is used instead of the head tracker. Pointing could be used to decouple line of sight and direction of motion in first and third person shooter games.
The map-based travel technique is a target-based technique. The user is represented as an icon on a 2D map of the environment. To travel, the user drags this icon to a new position on the map (see Figure 1). When the icon is dropped, the system smoothly animates the user from the current location to the new location indicated by the icon. Map-based travel could be used to augment many of the 2D game maps currently found in many game genres.
Figure 1. Dragging a user icon to move to a new location in the world. This image was taken in 1998.
To implement this technique, two things must be known about the way the map relates to the world. First, we need to know the scale factor, the ratio of the map's size to the size of the virtual world. Second, we need to know which point on the map represents the origin of the world coordinate system. We assume here that the map model is originally aligned with the world (i.e., the x direction on the map, in its local coordinate system, represents the x direction in the world coordinate system).
When the user presses the button while the stylus is intersecting the user icon on the map, the icon needs to be moved with the stylus each frame. One cannot simply attach the icon to the stylus, because we want the icon to remain on the map even if the stylus does not.
To do this, we first find the position of the stylus in the map coordinate system. This may require a transformation between coordinate systems, since the stylus is not a child of the map. The x and z coordinates of the stylus position are the point to which the icon should be moved. We do not cover here what happens if the stylus is dragged off the map, but the user icon should "stick" to the side of the map until the stylus is moved back inside the map boundaries, since we don't want the user to move outside the world.
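One way the dragging step might look is sketched below. It assumes the map's placement in the world is rigid (no scaling) and is described by a world-space origin plus unit-length, perpendicular x and z axes, and that the map's extent is given as (min_x, max_x, min_z, max_z) in map coordinates. All names here are hypothetical. Clamping to the bounds is a simple way to get the "sticking" behavior described above.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def stylus_to_map(stylus_world, map_origin_world, map_x_axis, map_z_axis):
    """Express a world-space stylus position as (x, z) in the map's frame.
    Assumes the map axes are unit length and perpendicular."""
    offset = [s - o for s, o in zip(stylus_world, map_origin_world)]
    return dot(offset, map_x_axis), dot(offset, map_z_axis)

def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))

def drag_icon(stylus_world, map_origin_world, map_x_axis, map_z_axis, bounds):
    """Return the icon's new (x, z) position in map coordinates.
    The stylus height above the map is ignored, and the icon sticks
    to the map edge when the stylus leaves the map."""
    x, z = stylus_to_map(stylus_world, map_origin_world,
                         map_x_axis, map_z_axis)
    min_x, max_x, min_z, max_z = bounds
    return clamp(x, min_x, max_x), clamp(z, min_z, max_z)
```

This function would be called once per frame while the button is held, with the result assigned to the icon's position on the map.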
When the button is released, we need to calculate the desired position of the viewpoint in the world. This position is calculated using a transformation from the map coordinate system to the world coordinate system, which is detailed here.
First, find the offset in the map coordinate system from the point corresponding to the world origin. Then, divide by the map scale (if the map is 1/100 the size of the world, this corresponds to multiplying by 100). This gives us the x and z coordinates of the desired viewpoint position.
Since the map is 2D, we can't get a y coordinate from it. Therefore, the technique should have some way of calculating the desired height at the new viewpoint. In the simplest case, this might be constant. In other cases, it might be based on the terrain height at that location or some other factors.
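The map-to-world conversion can be sketched as below. This is only an illustration: map_to_world and its parameters are hypothetical names, map_scale is the ratio of map size to world size (0.01 for a map 1/100 the size of the world), and height_at stands in for whatever rule the application uses to choose the y coordinate.

```python
def map_to_world(icon_xz, origin_on_map_xz, map_scale, height_at):
    """Convert an icon position on the map to a world-space viewpoint.

    icon_xz          -- (x, z) of the icon in map coordinates
    origin_on_map_xz -- (x, z) of the map point that represents the
                        world origin
    map_scale        -- map size / world size
    height_at        -- function (world_x, world_z) -> desired height
    """
    # Offset from the map point corresponding to the world origin.
    dx = icon_xz[0] - origin_on_map_xz[0]
    dz = icon_xz[1] - origin_on_map_xz[1]
    # Dividing by the scale converts map units to world units.
    world_x = dx / map_scale
    world_z = dz / map_scale
    return (world_x, height_at(world_x, world_z), world_z)

def constant_height(world_x, world_z):
    """Simplest case: a fixed height (the value 1.7 is arbitrary)."""
    return 1.7
```

A terrain-following variant would replace constant_height with a lookup into the terrain data at (world_x, world_z).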
Once we know the desired viewpoint, we have to set up the animation of the viewpoint. The move vector represents the amount of translation to do each frame (we are assuming a linear path). To find it, we subtract the current position from the desired position (the total movement required), divide this by the distance between the two points (calculated using the distance formula), and multiply by the desired velocity. This gives us the amount to move in each dimension each frame.
The only remaining calculation is the number of frames this movement will take: distance/velocity frames. Note that again velocity is measured here in units/frame, not units/second, for simplicity.
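A rough sketch of the animation setup follows, with velocity in units/frame as in the text. The function names are hypothetical. Rounding the frame count up and snapping to the target on the last frame are additions made here so the viewpoint does not overshoot when distance/velocity is not a whole number.

```python
import math

def plan_move(current, target, velocity):
    """Return (move_vector, frame_count) for a linear path.
    velocity is in units/frame and must be positive."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    # Total movement required: desired position minus current position.
    delta = [t - c for c, t in zip(current, target)]
    distance = math.sqrt(sum(d * d for d in delta))
    if distance == 0.0:
        return [0.0] * len(delta), 0
    # Unit direction scaled by velocity gives the per-frame translation.
    move = [d / distance * velocity for d in delta]
    frames = math.ceil(distance / velocity)
    return move, frames

def animate(current, target, velocity):
    """Return the list of viewpoint positions, one per frame."""
    move, frames = plan_move(current, target, velocity)
    positions = []
    pos = list(current)
    for _ in range(frames - 1):
        pos = [p + m for p, m in zip(pos, move)]
        positions.append(pos)
    if frames > 0:
        # Snap to the target on the final frame to avoid overshoot.
        positions.append(list(target))
    return positions
```

In a real application the positions would typically be consumed one per rendered frame rather than computed up front.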