Building an AI Sensory System: Examining The Design of Thief: The Dark Project
By Tom Leonard
The term "senses" in game development is a useful metaphor for understanding, designing, and discussing that part of the AI that gathers information about items of interest in the simulated environment of the game. Non-player characters visually presented as humans, animals, or creatures with eyes and ears in a realistic three-dimensional space lend themselves well to the metaphor.
This engineering metaphor is best not applied too literally. In spite of the seemingly physical nature of the AIs in the game world, the analogy of game AI senses is not a physiological or neurological one. The line between "sense" and "knowledge" in a game is a blurry one. Sense incorporates the idea of awareness of another entity in the game, includes elements of value and knowledge, and can have game-relevant logic wired directly in.
A game sensory system must be designed in a way that is subservient to the game design and efficient in implementation. The senses need only be as sophisticated as is needed to be entertaining and robust. The result of their work must be perceivable and understandable by the player. Few game designs require AIs with a sense of taste, touch, or smell; thus senses primarily are concerned with vision or hearing. Used wisely, senses can be an invaluable tool to make simple state machines more interesting by providing them with a broad range of environmental input.
This paper describes an approach to designing and implementing a high-fidelity sensory system for a stealth-oriented first-person AI system. The techniques described are derived from experience constructing the AI for Thief: The Dark Project, as well as familiarity with the code of Half-Life. Initially, the basic concepts of AI senses are laid out using Half-Life as a motivating example. The paper then examines the more stringent sensory requirements of a stealth game design. Finally, the sensory system built for Thief is described.
An Introductory Example: Half-Life
Half-Life is not a game that centers on stealth and senses. With a strong tactical combat element, however, it does require a reasonable sensory system. This makes it a perfect case to explore the basics of AI sensory systems. AIs in Half-Life have sight and hearing, a system for managing information about sensed entities, and present interesting examples of leveraging basic senses into appealing behaviors.
In a simple sensory system, AIs periodically "look" at and "listen" to the world. Unlike real vision and hearing, where stimuli arrive at the senses whether desired or not, these are active events. The AI examines the world based on its interest, and decides according to a set of rules whether it sees or hears another element in the game. These probes are designed to emulate real senses while limiting the amount of work done. A greater amount of resources is dedicated to the things that are important for the game mechanics.
For example, in Half-Life the core sensory logic that is run periodically is:
If I am close to player then...
--Gather a list of entities within a specified distance
--For each entity found...
----If I want to look for them and
----If they are in my viewcone and
----If I can raycast from my eyes to their eyes then...
------If they are the player and
------If I have been told to not see the player until they see me and
------If they do not see me
--------Do not register the player as seen
------Otherwise
--------Set various signals depending on my relationship with the seen entity
--For each sound being played...
----If the sound is carrying to my ears...
------Add the sound to a list of heard sounds
------If the sound is a real sound...
--------Set a signal indicating heard something
------If the sound is a "smell" pseudo-sound
--------Set a signal indicating smelled something
The first concept illustrated by this pseudo-code is that the senses are closely tied to the properties of the AI, its relationship with the subject, and the relevance of the AI to the player's experience. This is in part motivated by optimization concerns, but made available by game mechanics. In the Half-Life game design an AI that is not near the player is not relevant and need not sense the world. Even when near the player, the AI needs only to look at things that are known to produce reactions of fear or hatred later.
The logic also demonstrates the basic construction of vision as a view distance, a view cone, line-of-sight, and eye position (Figure 1). Each AI has a length-limited two-dimensional field of view within which it will cast rays to interesting objects. Unblocked ray casts indicate visibility.
There are two important things to note. First, the operations of sensing are ordered from least expensive to most expensive. Second, for player satisfaction, vision is a game of peek-a-boo. In a first-person game, the player's sense of body is weak, and a player who is seen by an opponent they do not see often feels cheated.
Most interesting is the snippet that restrains the AI's ability to see the player until seen by the player, which is purely for coordinating the player's entertainment. This is an example of how higher-level game goals can be simply and elegantly achieved by simple techniques in lower level systems.
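The vision test described above (view distance, view cone, then line of sight) can be sketched in a few lines. This is a minimal illustration, not code from Half-Life: the names Viewer, can_see, and ray_blocked are hypothetical, the geometry is two-dimensional, and the raycast is assumed to be supplied by the engine as a callback. It shows the checks ordered from cheapest to most expensive, so the raycast only runs when the cheaper tests pass.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Vec2 = Tuple[float, float]


@dataclass
class Viewer:
    eye: Vec2              # eye position of the AI
    facing: Vec2           # unit vector the AI is looking along
    view_distance: float   # maximum sight range
    half_angle_deg: float  # half of the field-of-view angle


def can_see(viewer: Viewer, target_eye: Vec2,
            ray_blocked: Callable[[Vec2, Vec2], bool]) -> bool:
    """Return True if the target's eyes are visible to the viewer.

    ray_blocked stands in for the engine's raycast (an assumption here);
    it returns True when world geometry blocks the segment.
    """
    dx = target_eye[0] - viewer.eye[0]
    dy = target_eye[1] - viewer.eye[1]
    dist_sq = dx * dx + dy * dy

    # 1. Cheapest test: squared distance, no square root needed.
    if dist_sq > viewer.view_distance ** 2:
        return False

    # 2. View cone: compare the cosine of the angle to the target
    #    against the cosine of the cone's half angle.
    if dist_sq > 0.0:
        dist = math.sqrt(dist_sq)
        cos_to_target = (dx * viewer.facing[0] + dy * viewer.facing[1]) / dist
        if cos_to_target < math.cos(math.radians(viewer.half_angle_deg)):
            return False

    # 3. Most expensive test last: eye-to-eye raycast.
    return not ray_blocked(viewer.eye, target_eye)
```

The "do not see the player until they see me" rule would sit on top of this as one more inexpensive test before any signals are set.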
The logic for hearing is much simpler than vision. The basic element of a hearing component is the definition and tuning of what it means for a sound to carry to the AI audibly. In the case of Half-Life, hearing is a straightforward heuristic of the volume of the sound multiplied by a "hearing sensitivity" yielding a distance within which the AI hears the sound. More interesting is the demonstration of the utility of hearing as a catchall for general world information gathering. In this example, the AI "hears" pseudo-sounds, fictional smells emanating from nearby corpses.
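As a rough sketch of that heuristic, the following assumes a sound is described by a position, a volume, and a flag marking it as a "smell" pseudo-sound. The names Sound, hearing_radius, and listen are illustrative and not taken from the Half-Life source; the only rule borrowed from the text is that volume multiplied by hearing sensitivity yields the distance within which the sound is heard.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sound:
    position: Tuple[float, float]
    volume: float
    is_smell: bool = False  # pseudo-sound, e.g. emitted by a corpse


def hearing_radius(volume: float, sensitivity: float) -> float:
    # Volume scaled by the listener's sensitivity gives an audible distance.
    return volume * sensitivity


def listen(listener_pos: Tuple[float, float], sensitivity: float,
           sounds: Sequence[Sound]) -> Tuple[List[Sound], bool, bool]:
    """Return (sounds that carried, heard_something, smelled_something)."""
    carried = [
        s for s in sounds
        if math.dist(s.position, listener_pos)
        <= hearing_radius(s.volume, sensitivity)
    ]
    heard_something = any(not s.is_smell for s in carried)
    smelled_something = any(s.is_smell for s in carried)
    return carried, heard_something, smelled_something
```

Because smells travel through the same path as sounds, adding a new kind of world information only requires a new tag, not a new sense.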
Senses as Gameplay Focus: Thief
Thief: The Dark Project and its successors present a lightly scripted game world where the central game mechanic, stealth, challenges the traditional form of the first-person 3D game. The Thief player moves slowly, avoids conflict, is penalized for killing people, and is entirely mortal. The gameplay centers on the ebb and flow of AI sensory knowledge of the player as they move through the game space. The player is expected to move through areas populated with stationary, pacing, and patrolling AIs without being detected, creeping among shadows and taking care not to make alerting sounds. Though the game AI's senses are built on the same core concepts as those of Half-Life, the mechanics of skulking, evading, and surprising require a more sophisticated sensory system.
The primary requirement was creating a highly tunable sensory system that operated within a wide spectrum of states. On the surface, stealth gameplay is about fictional themes of hiding, evasion, surprise, quiet, light and dark. One of the things that makes that kind of experience fun is broadening out the gray zone between safety and danger that in most first-person games is razor thin. It's about getting the player's heart pounding by holding them on the cusp of either state, then letting loose once the zone is crossed. This demanded "broad-spectrum" senses that didn't tend to polarize rapidly to the extremes of "player sensed" and "player not sensed."
A secondary requirement was that the sense system be active much more frequently and operating on more objects than is typical of a first-person shooter. During the course of the game, the player can alter the state of the world in ways that the AIs are supposed to take notice of, even when the player is not around. These things, like body hiding, require reliable sensing. Together with the first requirement, these created an interesting challenge when weighed against the perennial requirement for game developers: performance.
Finally, it was necessary that both players and designers understand the inputs and outputs of the sensory system, and that the outputs match learned expectations based on the inputs. This suggested a solution with a limited number of player-perceivable inputs, and discrete valued results.
Expanding the Senses
At heart, the sensory system described here is very similar to that found in Half-Life. It is a viewcone and raycast based vision system and simple hearing system with hooks to support optimization, game mechanics, and pseudo-sensory data. Like the Half-Life example, most of the sense gathering is decoupled from the decision process that acts on that information. This system expands some of these core ideas, and introduces a few new ones.
Figure 2, Basic components and relationships
The design of the system and the flow of data through it are derived from its definition as an information gathering system that is customizable and tunable, but stable and intelligible in its output.
In this system, AI senses are framed in terms of "awareness." Awareness is expressed as a range of discrete states that represent an AI's certainty about the presence, location, and identity of an object of interest. These discrete states are the only representation of the internals of the system exposed to the designer, and are correlated by the higher-level AI to an alertness state. In Thief's AI, the range of alertness states is similar to awareness states. The alertness state of the AI is fed back into the sensory system in various ways to alter the behavior of the system.
Awareness is stored in sense links that associate either a given AI to another entity in the game, or to a position in space. These relations store game relevant details of the sensing (time, location, line-of-sight, etc.), as well as cached values used to reduce calculations from think cycle to think cycle. Sense links are, in effect, the primary memory of the AI. Through verbalization and observation sense links can be propagated among peer AIs, with controls in place to constrain knowledge cascades across a level. They may also be manipulated by game logic after base processing.
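One way to picture a sense link is as a small record keyed by the sensed entity or position. The sketch below is an assumption-laden illustration, not the Dark Engine's data layout: the four awareness names, the field names, and the "firsthand" flag are all hypothetical. The flag shows just one possible control on knowledge cascades, where links learned from a peer are never passed on again.

```python
from dataclasses import dataclass, replace
from enum import IntEnum
from typing import Dict, Hashable, Tuple


class Awareness(IntEnum):
    # Hypothetical discrete awareness states, lowest to highest.
    NONE = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class SenseLink:
    awareness: Awareness
    position: Tuple[float, float, float]  # last known location
    time: float                           # game time of last contact
    line_of_sight: bool
    firsthand: bool = True                # False if learned from a peer


class SenseMemory:
    """Per-AI store of sense links, keyed by entity id or position."""

    def __init__(self) -> None:
        self.links: Dict[Hashable, SenseLink] = {}

    def record(self, key: Hashable, link: SenseLink) -> None:
        self.links[key] = link

    def share_with(self, peer: "SenseMemory") -> int:
        """Propagate firsthand links to a peer; return how many were shared."""
        shared = 0
        for key, link in self.links.items():
            if not link.firsthand:
                continue  # assumed cascade control: no re-sharing of hearsay
            known = peer.links.get(key)
            if known is None or known.awareness < link.awareness:
                peer.links[key] = replace(link, firsthand=False)
                shared += 1
        return shared
```

With this rule, a guard who sees the player can alert a neighbor, but that neighbor cannot in turn alert the rest of the level on hearsay alone.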
Figure 3, Sense Links
Each object of interest in the game has an intrinsic visibility value independent of any viewer. Depending on the state of the game and the nature of the object the level of detail of this value and the frequency of update are scaled in order to keep the amount of processor time spent deriving the value within budgets.
Visibility is defined as the lighting, movement, and exposure (size, separation from other objects) of the entity. The meaning of these is closely tied to the game requirements. For example, the lighting of the player is biased towards the lighting near the floor below the player, as this provides the player with an objective, perceivable way to anticipate their own safety. These values and their aggregate sum visibility are stored as 0..1 analog values.
Rather than having a single two-dimensional field of view, the Thief senses implement a set of ordered three-dimensional viewcones described as an XY angle, a Z angle, a length, a set of parameters describing both general acuity and sensitivity to types of stimuli (e.g., motion versus light), and relevance given the alertness of the AI. The viewcones are oriented according to the direction an AI's head is facing.
At any time, for a given object being sensed, only the first viewcone the object is in is considered in sense calculations. For simplicity and gameplay tunability, each viewcone is presumed to produce a constant output regardless of where in the viewcone the subject is positioned.
For example, the AI represented in Figure 4 has five viewcones. An object at point A will be evaluated using viewcone number 3. The viewcone used for calculating the vision sense awareness for an entity at either point B or point C is viewcone number 1, where identical visibility values for an object will yield the same result.
Figure 4, Viewcones, Top-view
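The "first matching viewcone wins" rule can be sketched as follows. This is a simplified illustration under stated assumptions: the Viewcone fields, the yaw_offset_deg parameter used to aim a cone away from the head direction, and the example cone sizes are all hypothetical, and head orientation is reduced to a single yaw angle.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Viewcone:
    name: str
    xy_angle_deg: float          # full horizontal angle of the cone
    z_angle_deg: float           # full vertical angle of the cone
    length: float                # maximum range
    yaw_offset_deg: float = 0.0  # assumed: 180 aims the cone backwards


def contains(cone: Viewcone, head_pos: Vec3, head_yaw_deg: float,
             point: Vec3) -> bool:
    dx = point[0] - head_pos[0]
    dy = point[1] - head_pos[1]
    dz = point[2] - head_pos[2]
    if math.sqrt(dx * dx + dy * dy + dz * dz) > cone.length:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed horizontal angle from the cone axis, wrapped to [-180, 180).
    yaw_diff = (bearing - head_yaw_deg - cone.yaw_offset_deg + 180.0) % 360.0 - 180.0
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return (abs(yaw_diff) <= cone.xy_angle_deg / 2.0
            and abs(pitch) <= cone.z_angle_deg / 2.0)


def first_viewcone(cones: Sequence[Viewcone], head_pos: Vec3,
                   head_yaw_deg: float, point: Vec3) -> Optional[Viewcone]:
    """Return the first cone, in list order, that contains the point."""
    for cone in cones:
        if contains(cone, head_pos, head_yaw_deg, point):
            return cone
    return None
```

Because the list is ordered, a narrow, long cone placed first takes priority over a wide, short one that overlaps it, which is how direct and peripheral vision can be layered.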
When probing interesting objects in the world, the senses first determine which viewcone, if any, applies to the subject. The intrinsic visibility is then passed through a "look" heuristic along with the viewcone to output a discrete awareness value.
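The article treats the "look" heuristic as a tuned black box, so the following is only a guess at its general shape. It assumes visibility is split into lighting, movement, and exposure components in the 0..1 range, that each viewcone carries per-stimulus weights and an overall acuity, and that fixed thresholds map the analog score to a discrete awareness. Every name, weight, and threshold here is hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Awareness(IntEnum):
    # Hypothetical discrete awareness states, lowest to highest.
    NONE = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass
class Visibility:
    lighting: float  # 0..1
    movement: float  # 0..1
    exposure: float  # 0..1


@dataclass
class ConeTuning:
    acuity: float           # general sharpness of this cone
    light_weight: float     # sensitivity to lighting
    motion_weight: float    # sensitivity to movement
    exposure_weight: float  # sensitivity to size and separation


# Assumed cut-offs, checked from the highest state down.
THRESHOLDS = (
    (0.75, Awareness.HIGH),
    (0.50, Awareness.MODERATE),
    (0.25, Awareness.LOW),
)


def look(vis: Visibility, cone: ConeTuning) -> Awareness:
    """Map analog visibility to a discrete awareness for one viewcone."""
    total = cone.light_weight + cone.motion_weight + cone.exposure_weight
    if total <= 0.0:
        return Awareness.NONE
    stimulus = (vis.lighting * cone.light_weight
                + vis.movement * cone.motion_weight
                + vis.exposure * cone.exposure_weight) / total
    score = min(1.0, cone.acuity * stimulus)
    for threshold, level in THRESHOLDS:
        if score >= threshold:
            return level
    return Awareness.NONE
```

Note that the position of the subject inside the cone does not appear anywhere in the function, which matches the stated simplification that each viewcone produces a constant output for a given visibility.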
The motivation for multiple viewcones is to enable the expression of such things as direct vision, peripheral vision, or a distinction between objects directly forward and on the same Z plane as opposed to forward but above and below. Cone number 5 in the diagram above is a good example of leveraging the low-level to express a high level concept. This "false vision" cone is configured to look backwards and configured to be sensitive to motion, giving the AI a "spidey-sense" of being followed too closely even if the player is silent.
The sense management system is designed as a series of components each taking a limited and well-defined set of data and outputting an even more limited value. Each stage is intended to be independently scalable in terms of the processing demands based on relevance to game play. In terms of performance, these multiple scalable layers can be made to be extremely efficient.
Figure 5, Information Pipeline
The core sensory system implements heuristics for accepting visibility, sound events, current awareness links, designer and programmer configuration data, and current AI state, and outputting a single awareness value for each object of interest. These heuristics are considered a black box tuned by the AI programmer continually as the game develops.
Vision is implemented by filtering the visibility value of an object through the appropriate viewcone, modifying the result based on the properties of the individual AI. In mundane cases a simple raycast for line-of-sight is used. In more interesting cases, like the player, multiple raycasts occur to include the spatial relation of the AI to the subject in the weighing of the subject's exposure.
Thief had a sophisticated sound system wherein sounds, both rendered and not rendered, were tagged with semantic data and propagated through the 3D geometry of the world. When a sound "arrived" at an AI, it arrived from the direction it would in the real world, tagged with attenuated awareness values, possibly carrying information from other AIs if it was a spoken concept. These sounds joined other awareness-inducing things (like the Half-Life smell example) as awareness relations to positions in space.
Once the look and listen operations are complete, their awareness results are passed to a method responsible for receiving periodic pulses from the raw senses, and resolving them into a single awareness relationship, storing all the details in the associated sense link. Unlike the analog data used in the pipeline to this point, the data in this process is entirely discrete. The result of this process is to create, update, or expire sense links with the correct awareness value.
This is a three-step process. First, the sound and vision input values are compared, one declared dominant, and that becomes the value for awareness. The accessory data each produces is then distilled together into a summary of the sense event.
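The first step might look like the sketch below. The Pulse and SenseEvent records and the tie-breaking rule (vision wins ties) are assumptions made for illustration; the article only states that one input is declared dominant and that the accessory data is distilled into a summary.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Awareness(IntEnum):
    # Hypothetical discrete awareness states, lowest to highest.
    NONE = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass
class Pulse:
    awareness: Awareness
    position: Tuple[float, float, float]  # where this sense places the subject
    line_of_sight: bool = False           # only meaningful for vision


@dataclass
class SenseEvent:
    awareness: Awareness
    position: Tuple[float, float, float]
    line_of_sight: bool
    dominant: str  # "vision" or "sound"


def resolve(vision: Pulse, sound: Pulse) -> SenseEvent:
    """Pick the dominant sense and summarize both pulses in one event."""
    if vision.awareness >= sound.awareness:  # assumed: ties go to vision
        winner, name = vision, "vision"
    else:
        winner, name = sound, "sound"
    return SenseEvent(
        awareness=winner.awareness,
        position=winner.position,            # trust the dominant sense's location
        line_of_sight=vision.line_of_sight,  # always taken from vision
        dominant=name,
    )
```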
Second, if the awareness pulse is an increase from previous readings, it is passed through a time-based filter that controls whether the actual awareness will increase. The time delay is a property only of the current state, not the goal state. This is how reaction delays and player forgiveness factors are implemented. Once the time threshold is passed, the awareness advances to the goal state without passing through intermediate states.
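A minimal sketch of such a rise filter follows. The delay table and the RiseFilter name are hypothetical, and times are plain seconds passed in by the caller. The two properties taken from the text are that the delay depends only on the current state, and that awareness jumps straight to the goal state once the delay has elapsed.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Awareness(IntEnum):
    # Hypothetical discrete awareness states, lowest to highest.
    NONE = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Assumed reaction delays in seconds, indexed by the CURRENT state only.
RISE_DELAY = {
    Awareness.NONE: 0.5,
    Awareness.LOW: 0.3,
    Awareness.MODERATE: 0.1,
    Awareness.HIGH: 0.0,
}


@dataclass
class RiseFilter:
    current: Awareness = Awareness.NONE
    pending_since: Optional[float] = None  # when the increase was first requested

    def pulse(self, goal: Awareness, now: float) -> Awareness:
        """Feed one resolved pulse; return the filtered awareness."""
        if goal <= self.current:
            # Not an increase: forget any pending rise. Lowering awareness
            # is left to a separate decay mechanism.
            self.pending_since = None
            return self.current
        if self.pending_since is None:
            self.pending_since = now
        if now - self.pending_since >= RISE_DELAY[self.current]:
            self.current = goal  # jump directly, skipping intermediate states
            self.pending_since = None
        return self.current
```

A brief glimpse of the player that ends before the delay expires leaves awareness unchanged, which is one way a forgiveness factor can emerge from this filter.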
Finally, if the new pulse value is below current readings, a capacitor is used to allow awareness to degrade gradually and smoothly. Awareness decreases across some amount of time, passing through all the intermediate states. This softens the behavior of the AI once the object of interest is no longer actively sensed, but is not the mechanism by which the core AI's alertness is controlled.
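The capacitor can be approximated as a timer that releases one awareness step at a time. In this sketch the DecayCapacitor name and the fixed hold time per state are assumptions; the behavior taken from the text is that awareness falls gradually and visits every intermediate state on the way down.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Awareness(IntEnum):
    # Hypothetical discrete awareness states, lowest to highest.
    NONE = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass
class DecayCapacitor:
    current: Awareness = Awareness.NONE
    hold_seconds: float = 2.0            # assumed time spent in each state
    decay_start: Optional[float] = None  # when the current state began draining

    def pulse(self, goal: Awareness, now: float) -> Awareness:
        """Feed one resolved pulse; return the smoothed awareness."""
        if goal >= self.current:
            # Subject is still sensed at this level or above: recharge.
            # Raising awareness is left to a separate rise filter.
            self.decay_start = None
            return self.current
        if self.decay_start is None:
            self.decay_start = now
        # Drop one state per elapsed hold period, never skipping a state.
        while self.current > goal and now - self.decay_start >= self.hold_seconds:
            self.current = Awareness(self.current - 1)
            self.decay_start += self.hold_seconds
        return self.current
```

If the subject is sensed again partway through a hold period, the timer resets, so a guard who keeps catching glimpses of the player does not calm down.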
If an object of interest is no longer generating pulses, the senses incorporate a degree of free knowledge which is scaled based on the state of the AI. This mechanism produces the appearance of deduction on the part of the AI when an object has left the field of view without overtly demonstrating cheating to the player.
The system described here was designed for a single-player, software-rendered game. Because of this, all authoritative information about game entities was available to it. Unfortunately, in a game engine with a client/server architecture and a hardware-only renderer, this may not be true. Determining the lit-ness field of an object's visibility may not be straightforward. Thus, incorporating a system such as the one described here is something to do deliberately and with care, as it will place information demands on other systems.
Furthermore, although efficient in what it does, it is designed for a game that in many ways centers around the system's output. In Thief it consumes a non-trivial amount of the AI's CPU budget. This will take time away from pathing, tactical analysis, and other decision processes.
However, any game can benefit from investing in its sensing code. By gathering and filtering more information about the environment and serving it up in a well-defined manner, senses can be leveraged to produce engaging AI behaviors without significantly increasing the complexity of the decision state machines. A robust sense system also provides a clean hook for expressing "pre-conscious" behaviors by controlling and manipulating the core knowledge inputs. Finally, a multi-state sense system provides the player with an AI opponent or ally that exhibits varied and subtle reactions and behaviors without adding complexity to the core decision machines.
Because of the highly data-driven nature of the Dark Engine on which Thief was built, most of the concepts presented in this paper and all of the configuration details may be explored first-hand using a copy of the tools available at http://www.thief-thecircle.com.
Copyright © UBM Tech, All rights reserved