Pathfinding
AIs tend to fall short in one of two ways: either they are so
incredibly effective and efficient that they don't convey intelligence
as much as omniscience, or they are so incredibly inept that human
observers can't help but feel frustration watching them. The first case
is exemplified by an agent which ignores locally acceptable paths in
favor of the ideal global path, thereby giving away the secret that it
has knowledge of the entire map. The second case can be seen in a wide
variety of stupid behavior, but my favorite example is the agent which
travels along a river to avoid the relatively small penalty of crossing
water, only to plow through an enemy camp and ultimately meet its doom.
Anyone familiar with real-time-strategy gaming has no doubt met this
breed of pathfinder before.
The
problem, assuming we choose not to endow our pathfinders with God-like
knowledge of the Universe, is that we are trying to simulate the sort
of decision making employed by a computer that we still don't
understand - the human brain. Rather than trying to hard code something
this complex, perhaps we should be allowing our algorithms to develop
over thousands and thousands (or millions and millions) of generations,
using the same evolutionary mechanism that nature has relied on all
along.
Genetic Programming
Genetic
programming is a technique which uses the genetic operators of
reproduction, mutation and cross-breeding to find programs which solve
a given problem. The goal is to develop abilities in a population of
programs in the same way living organisms have developed their
abilities - namely, through evolution. Although at first the idea of
computer programs reproducing sounds bizarre, it is less so when you
realize that we are simply abstracting biological concepts to the point
where we can sensibly apply them to programming.
Genetic
programming, and genetic algorithms in general, are approximate methods
for finding a solution to a given problem, meaning they aren't
guaranteed to produce the correct solution, but rather a solution which
has a high probability of being correct. If our problem is to find a
reasonably intelligent and realistic pathfinding algorithm, this isn't
a deal breaker, since we make no assumption about there being only one
correct solution. In fact, anything that looks good to the human eye
will do. Ultimately, then, we can set the rules for a population and
its method of natural selection, and let it go. We stop the process
when the most-fit algorithm looks as if it is behaving both wisely and
realistically.
Rules and Requirements
Although
applying genetic programming techniques to the pathfinding problem is
fairly easy, there are several rules and requirements that must be
considered before beginning. This article is based on a project called
Hampton, named after the Hampton Court hedge maze. The project is
available at http://www.stromcode.com/Hampton
along with a technical write-up and source code. Although requirements
will vary depending on the project, the Hampton project was developed
with an RTS in mind; specifically it evolves pathfinders good at
maneuvering through square grids on which individual squares are either
unpassable, or passable with varying penalties.
The Map
For
our purposes, maps are square grids 40x40 in size. Moving from one
square in the grid to another constitutes a single move, and upon
moving the agent acquires a penalty in the range 0-9 for that square.
Additionally, squares can be impassable, and moves outside the map are
impossible. If an agent attempts an illegal move, it immediately "dies"
and effectively fails.
Hampton allows you to create custom maps and can generate random maps (which we'll see when talking about adaptation).
Fitness
In
order to evaluate the programs in a given generation, we need a way to
score a program's performance. While the specific fitness function used
will vary depending on the specific application, a simple method, and
the one used by Hampton, is to simply run the program on the map and
sum the number of moves it makes with the penalties it has accumulated
along the way. Because we are looking for programs that ultimately
reach their goal, we penalize programs that fail to do so by adding a
substantial value - a "death penalty" if you will - to their final
score (a value which will assuredly put the failing program's score
outside of the range of scores a successful program might achieve).
Once
this process is complete for all programs in the population, we can
easily find the most fit program by simply sorting the programs
according to fitness, and selecting that program which has achieved the
lowest score.
The Language
Of
primary importance is the syntax of the language which GP uses to build
candidate programs. LISP is common for genetic programming, although
ultimately it will depend on the application. Most importantly, it
should have a structure that lends itself to reproduction and mutation.
Programs that can be represented as trees work well for this, as we
will soon see.
For
the Hampton project, I developed a specific language consisting of a
set of terminals and functions. The functions are questions which the
program can ask about its current situation, while the terminals are
the moves it can make once it has asked the questions it deems
necessary. Since we want our agent to behave realistically, we strive
to give it access only to the information and moves a human player
would have access to. For example, we limit its environmental knowledge
only to a few squares immediately ahead of it - to see more, it will
have to physically move to another location and begin asking questions
anew.
The
set of functions and terminals is shown below. The function set
developed gradually, as earlier function sets were determined to be too
limiting. For example, the isGoal* functions were added when it was
recognized that programs performed better with a sense of direction.
Without these functions, agents would wander around in circles until
they blindly hit their goal. The doBoth function was added to give
programs memory, and will be discussed in detail when the topic of
memory is addressed.
Function
Description
isWaterAheadX
Detect water within X squares
isEnemyAhead
Detect enemy units one square ahead
isBlockedAheadX
Detect any roadblock within X squares
isSteeperAhead
TRUE if penalty ahead greater than current penalty
isLessSteepAhead
TRUE if penalty ahead less than current penalty
isGoalLeft
test relative direction of goal
isGoalRight
test relative direction of goal
isGoalAhead
test relative direction of goal
isGoalBehind
test relative direction of goal
doBoth
Execute both children
Our
set of terminals (moves) is much simpler. An agent can either move
forward one square, or rotate itself left or right by one quarter turn.
These nodes are represented as moveForward, turnLeft and turnRight.
The Interpreter
Once
our language is defined and the fitness function determined, we need to
be able to quickly and easily run the program against the map and
determine its score. This is where we see the benefit of requiring
programs that can be represented as trees. Programs are run by
traversing the tree until a terminal is reached, and then returning to
the root node and repeating, until either the goal is reached or the
program makes an illegal move and dies.
Because
the function set is composed of only boolean functions, our trees are
binary. When a function node is encountered, the system determines the
answer to the question, and follows the right branch if the answer is
yes, or the left branch if the answer is no. Once a terminal is read,
the move is made and program execution continues again from the root.
Memory
The
exception to the previous discussion on processing function nodes is
the doBoth function, which was added to the function set in order to
give programs memory. When the interpreter meets a doBoth terminal,
both of its child nodes are executed. To prevent programs from
executing in multiple threads, at least one of the children of a doBoth
function node is required to be a terminal.
Effectively,
this allows a program to inspect the landscape ahead of it, make a turn
(or a move, if that seems reasonable) and ask questions about its new
situation, without forgetting what was asked about its previous
situation. If the justification for giving programs memory is not
immediately clear, consider the situation where an agent analyzes a
position and opts to turn left. If each of the remaining three
directional orientations is worse than the first, the agent must either
choose a worse direction to move forward, or enter a loop state. After
all, without memory, once the agent returns to its original
orientation, it has forgotten it has been there before and will again
turn to a new orientation.
The
effect of adding the doBoth function to the function set was impressive
- programs began to actually demonstrate recognizable strategies. For
example, programs without memory were prone to falling into traps such
as long hallways, but with memory they quickly adopted wall-hugging
strategies which allowed them to enter a hallway and find their way out
of it. Memory also allows programs to return to previous locations and
take a path which it previously passed up. Without memory, returning to
a previous location could only result in another loop condition.
There
are other ways to give your programs memory. You could set up a small
memory register, for example, and give the program some means of
writing and reading from it. Or you could represent your programs as
finite state machines, and give them a stack to use to store
information about their environment (essentially creating a pushdown
automaton).
I
can't imagine what advantage one approach would have over another, but
the idea of embedding memory into the program itself (via a function
like doBoth) seems to be the simplest and most elegant solution.