I am a robotics researcher and hobbyist game developer. My specialization is in high dimensional motion planning for robot arms. Motion planning is a highly relevant topic for game developers, as it’s needed every time you want to move an AI controlled character from one place to another.
In the course of learning game development, I’ve come across many tutorials which discuss motion planning (usually called "Path Finding" in game development literature), but most of these do not go into great detail about what motion planning is all about from a theoretical perspective. As far as I can tell, most games rarely use any motion planning beyond these big three algorithms: A* Grid Search, Visibility Graphs, and Flow Fields. There is a whole world of theoretical motion planning research beyond these three, and some of it could be useful to game developers.
In this article I wanted to discuss the common techniques of motion planning in context, and explain some of the advantages and disadvantages of each. I also want to present some of the basic techniques beyond those commonly used in video games, and hopefully shed light on how these might be useful in game development.
Let’s start by assuming we have an AI-controlled character in a game, such as an enemy in a shooter, or a unit in a strategy game. We will call this character the Agent. The Agent is at a particular place in the world, called its “configuration.” In two dimensions, the configuration of the Agent can be represented by just two numbers (typically, x and y). The Agent wants to get to some other place in the world, which we will call the Goal.
Let’s assume for now that the Agent is a circle, and that he lives in a flat, two dimensional world. Assume the Agent can move in any direction he pleases. If there is nothing but empty space between the Agent and his Goal, the Agent can just move in a straight line from his current configuration to the Goal. When he arrives at the Goal, he stops. We will call this the “Walk To Algorithm.”
Algorithm: WALK TO
If not at goal:
Move toward goal.
If the world is completely empty, this is perfectly fine. It may seem obvious that we need to move in a straight line from the start configuration to the Goal configuration, but consider that the agent could have moved in a totally different way to get from its current configuration to the Goal. He could have looped around before going to the Goal, or perhaps walked fifty miles away from the Goal before finally coming back and landing on the Goal.
So why do we think the straight line is obvious? Well, because in some sense, it’s the “best” way. If we assume the Agent can only move at a fixed speed in any direction, the straight line is the shortest and fastest way for him to get from one point to another. In this sense, the Walk To Algorithm is Optimal. We will discuss optimality a lot in this article. Basically, when we say an algorithm is optimal, we are saying that it “does the best it can” given a certain definition of “best.” If you want to get from point A to point B as fast as possible, and there’s nothing in the way, you can’t beat a straight line.
And indeed, in many games (I would say, even most games), the straight-line Walk To Algorithm will work perfectly fine for you. If all you have are little 2D enemies that live in hallways or rooms, and who don’t need to search through mazes or follow orders from the player, you’ll never need anything more complicated than this.
But what if there is something in the way?
In this case, the thing in the way (which is called an Obstacle) blocks the path of the agent to the Goal. Since the agent can no longer move towards the goal, he simply stops. What has happened here?
While the Walk To Algorithm is Optimal, it is not Complete. A “Complete” algorithm is able to solve all solvable problems in its domain within finite time. In our case, the domain is circles moving around on a plane with obstacles. Clearly, just going straight toward the Goal blindly will not work for all problems in this domain.
In order to solve all of the problems in this domain, and solve them optimally, we will need to do something a little more sophisticated.
What would you do in the presence of an obstacle? When you need to get to some goal (like say, the front door) and something is in the way (like say, a chair), you simply walk around it and continue on toward your goal. This sort of behavior is called “Obstacle Avoidance,” and the idea can be used to solve our problem of getting the Agent to his goal.
One of the simplest obstacle avoidance algorithms is called the Bug Algorithm. It works like this:
Draw a straight line from the Agent to the Goal. Call this the M-Line.
The Agent takes a single step along the M-Line until one of the following conditions is met:
If the agent reaches the goal, stop.
If the agent touches an obstacle, follow the obstacle clockwise until we reach the M-Line again, and also if we are closer to the goal than we were initially upon reaching the M-line. When we reach the M-Line, continue from step 2.
Step 2.2 needs some explanation. Basically, the Agent remembers all of the times its encountered the M-line for a given obstacle, and only leaves the obstacle if it is on the M-line, and is closer to the goal than any time before. This is to avoid the Agent getting into an infinite loop.
In the figure above, the dotted line is the M-Line, and the orange arrows represent the steps our Agent takes before reaching the Goal. This is called the Agent’s Trajectory. Amazingly, the Bug Algorithm solves many problems of this type very easily. Try drawing a bunch of crazy shapes, and you will see how well it works! But does the Bug Algorithm work for every problem? And why does it work?
The simple Bug Algorithm has very many interesting features; and demonstrates many of the key concepts we’ll keep coming back to in this article. These concepts are: Trajectory, Policy, Heuristic, Completeness, and Optimality.
We already mentioned the concept of the Trajectory, which is just a sequence of configurations or movements that the Agent takes along its path. The Bug Algorithm could be considered a Trajectory Planner, which is an algorithm which takes in a start configuration and a goal configuration, and returns a Trajectory from the start to the goal. (This is also sometimes called a "path planner" in game development literature).
But we should also mention the concept of a Policy. If instead of interpreting the Bug Algorithm as returning a full Trajectory from the start and goal, we interpret it as giving a set of instructions to Agent which allow it to get to the Goal from any location, just using its own configuration as a guide, then the Bug Algorithm could be said to be a Control Policy. A control policy is just a set of rules which, given a state of the Agent, determines what action the Agent should take next.
In the Bug Algorithm, the M-Line is a strong indicator of where the Agent should go next. Why did we choose the M-line to follow, and not, say, the line from the current Agent position to the goal? The answer is that the M-line is just a Heuristic. A heuristic is just a general rule we use to inform the next step of an algorithm that imparts some human knowledge on the algorithm. Choosing a heuristic is often an important part of developing such an algorithm. Choosing the wrong heuristic can result in the algorithm failing to work altogether.
Believe it or not (and I did not believe it at first), the Bug Algorithm solves all solvable problems in the domain of two dimensional circles moving from a start to a goal position. That means, no matter where you put the Agent or the Goal, if the Agent can reach the Goal, the Bug Algorithm will allow him to do so. The proof of this fact is beyond the scope of this article, but you can read it if you wish. If an Algorithm can solve all problems in its domain for which there is a solution, it is called a Complete Algorithm.
Even though the Bug Algorithm can solve all problems of this type, the trajectories it produces are not always the best possible trajectories the Agent could have taken to get to the goal. Indeed, the Bug Algorithm will cause the Agent to do some pretty stupid things from time to time (just try making an obstacle which is tiny in the clockwise direction, and billions of miles long in the other direction, and you will see my point. If we define “best” as meaning “taking the shortest time,” then the Bug Algorithm is in no way Optimal.
So what if we wanted to solve the problem of moving around circles among obstacles in an Optimal (shortest) way? To do this, we can exploit geometry to figure out what an optimal algorithm might look like for this problem. First, note the following facts:
The definitions of “convexity,” “non-convexity,” and “convex hull” are not really important in understanding this article, but these definitions are all we need to know to construct an optimal trajectory planner. Let’s make the following assumptions:
Then, we can find an Optimal plan from the Agent to his goal by constructing what is called the Visibility Graph of the world. Besides the geometry, we also need to know a little bit about Graphs. Graph theory is not the topic of this article, but basically Graphs are just groups of abstract objects (called Nodes), linked to other Nodes by abstract objects called Edges. We can use the power of Graphs to describe the world the Agent lives in. We’ll say a Node is any place an agent can be without colliding with something. An Edge is any Trajectory between nodes that the agent can travel along.
With those ideas in mind, we define the Visibility Graph Planner in the following way:
Algorithm: VISIBILITY GRAPH PLANNER
Create an empty Graph called G.
Add the Agent as a vertex in G.
Add the Goal as a vertex in G.
Place all of the vertices of all of the polygons in the world on a list.
Connect all of the polygon’s vertices in G according to their edges.
For each vertex V on the list:
Try to connect V to the Agent and the Goal. If the straight line between them does not intersect any polygon, add this line as an edge in G.
For every other vertex in every other polygon, try to connect it to V using a straight line. If the line does not intersect any polygon, add the line as an edge in G.
Now, run a Graph Planner (like Dijkstra’s or A*) on the Visibility graph to find the shortest path from the Agent to the Goal.
The visibility graph looks like this: (dotted lines)
And the resulting path looks like this:
Because of our assumptions that the agent is a perfectly small point, and that it can travel in straight lines from any location to any other, we are left with a trajectory that perfectly passes through corners, edges, and free space to exactly minimize the amount of distance the Agent has to travel to the goal.
But what if our Agent isn’t a point? Notice how the orange path takes the Agent between the trapezoid and the rectangle. The Agent clearly can’t fit through that space, so we’ve got to do something about this problem.
One solution is to assume that the Agent is a disk rather than a point. If this is the case, we can simply inflate all of the obstacles by the radius of the Agent, and then plan as if the Agent were a point in this new, inflated world.
In this case, the Agent instead takes the path around the left of the rectangle rather than the right, since it simply can’t fit through the space between the trapezoid and the rectangle. This “inflating” operation is actually incredibly important. In essence, we’ve changed the world so that our assumption that the Agent is a point is true.
This operation is called computing the Configuration Space of the world. The Configuration Space (or C-Space) is a coordinate system whereby our Agent’s configuration is represented by a single point. An obstacle in the Work Space is transformed into an obstacle in the Configuration Space by first putting the Agent at the specified configuration and then testing the agent for collision. If the agent collides with an obstacle in the Work Space at that configuration, then there is also an obstacle in the Configuration Space. If we have a non-circular agent, we can use what is called the Minkowski Sum to inflate obstacles.
This might sound tedious for two dimensional cases, but that is really what we’re doing when we “inflate” the obstacles. The new “inflated” space is actually the Configuration Space. In higher dimensional spaces, things will not be quite so easy!
The Visibility Graph Planner is pretty cool. It’s both Optimal, and Complete, and it does not rely on any Heuristic. Another major advantage of the Visibility Graph Planner is that it is reasonably Multi-Query. A Multi-Query planner is one which can re-use old computations when going to new goals. In the case of the Visibility Graph, we only need to rebuild the edges of the graph from the Agent and the Goal whenever the Goal or Agent is unexpectedly moved. These advantages make the planner extremely attractive for game programmers.
Indeed, many modern games use Visibility Graphs for planning. A popular variant of the Visibility Graph idea is sometimes called a Nav(igation) Mesh. Nav Meshes sometimes use Visbility Graphs as a base (along with other heuristics for distance to enemies, items, and so on). Nav Meshes can then be modified by a designer or programmer. Each Nav Mesh is then stored as a file and used for a particular level.
Here, I have a screenshot of a Nav Mesh used in a modern video game: Overlord
For all their advantages, Visibility Graph methods have significant disadvantages. For one, it can be a fairly costly operation to compute the visibility graph. As the number of obstacles increases, the size of the Visibility Graph increases dramatically. The cost of computing the visibility graph increases as the square of the number of vertices. Additionally, if any obstacles change, it’s likely that the entire Visibility Graph needs to be recomputed in order to remain optimal.
Game programmers have invented all sorts of tricks to speed up graph construction by using Heuristics on which nodes to attempt to connect via an edge. Additionally, game designers often give hints to the Visibility Graph algorithm to determine which nodes to connect beforehand. All of these tricks tend to make Visibility Graphs no longer optimal, and sometimes involve tedious programmer intervention.
However, the biggest problem with Visibility Graphs (in my opinion) is that they simply don’t generalize well. They solve one problem, and solve it incredibly well: planning in the 2D plane with polygon obstacles. What if you’re not on the 2D plane? What if your agent can’t be easily represented by a circle? What if you don’t know what the obstacles are, or they can’t be represented by polygons? Well, then you could not easily use Visibility Graphs to solve your problem.
Luckily, there are other methods which have a more “fire and forget” feel to them, and which address some of these issues.
The reason Visibility Graphs work is that they exploit the optimizing qualities of abstract graph search, along with the fundamental rules of Euclidean geometry. The Euclidean geometry part gives you the “fundamentally right” graph to search. The abstract search itself takes care of the optimization part.
So what if we abandoned the Euclidean geometry altogether and just did the entire problem using abstract graph search? That way, we’d be able to cut out the middle man of geometry, and solve many more kinds of problems.
The problem is, there are many different approaches we could take to “massage” our problem into one of graph search. What are our nodes? What exactly are our edges? What does it mean for one node to be connected to another? The way we answer these questions will dramatically alter the performance of our algorithm.
Even if a graph search algorithm gives you the “optimal” answer, it is only giving you the “optimal” answer in terms of its own internal structure. This does not imply that the “optimal” answer according to the graph search algorithm is the answer that you want.
With that in mind, let’s go ahead and define the graph in the most common way: the Grid Lattice. Here’s how it works:
Algorithm: DISCRETIZE SPACE
Assume the Configuration Space has some fixed size in all its dimensions.
Discretize each dimension so that it has a fixed number of cells.
For each cell whose center is inside an obstacle in the Configuration Space, mark it Impassable.
Likewise, for each cell whose center is outside an obstacle, mark it Passable.
Each Passable cell is now a Node.
Each Node connects to all its “adjacent” Passable neighbors in the graph.
“Adjacent” is another definition we have to worry about. Two of the most common definitions are “Any cell which shares at least one corner with the current cell” (this is called “Euclidean Adjacency”) or “Any cell which shares at least one face with the current cell” (this is called “Manhattan Adjaceny”). Depending on which one you choose, the answer the graph search algorithm will return is going to be entirely different.
Here’s what the Discretization step looks like in our toy example:
If we search this graph according to the Euclidean Adjacency metric, the path we get will look something like this:
There are many different Graph search algorithms you could use to solve the lattice grid search problem, but by far the most common one is called A*. A* is a relative of Dijkstra’s algorithm which searches nodes forward from the starting node using a Heuristic. A* is extremely well explored in a number of other articles and tutorials, so I am not going to explain it here.
A* and other lattice grid graph search methods are probably the most common planning algorithms in games, and the discretized lattice grid is probably the most common way of structuring A*. For many games, this search method is ideal, since so many games use either tiles or voxels to represent the world anyway.
Graph search methods over a discretized grid (including A*) are Optimal according to their internal structure. They are also Resolution Complete. This is a weaker form of completeness which states that as the fineness of the lattice grid approaches infinity (ie, as the cells get infinitely small), the algorithm solves more and more of all solvable problems.
In this particular example, the lattice grid search method is Single-Query rather than multi-query, since the graph search cannot usually re-use old information when generalizing to new goal and start configurations. The discretization step, however, can naturally be re-used.
Another key advantage (and in my opinion the main advantage) of graph-based methods is that they are purely abstract. That means additional costs (such as safety, smoothness, nearness to desired objects, etc.) can automatically be considered and optimized. Additionally, totally different problems can use the same underlying abstract code. In DwarfCorp, we use the same A* planner for motion planning, and Symbolic Action Planning (which is about representing actions Agents can take as an abstract graph).
However, Lattice Grid graph search has at least one serious disadvantage: It is incredibly memory intensive. If every node of the grid is built naively from the start, you will very quickly run out of memory as the size of the world increases. A large part of this problem is the fact that the lattice grid has to store vast amounts of free space in its grid. Optimization techniques exist to alleviate this problem, but fundamentally, as the size of the world increases, and the dimension of the problem increases, the amount of memory in the lattice grid blows up enormously. This makes is unsuitable for many more complicated problems.
Another serious disadvantage is that the graph search itself can take quite a long time. If you have many thousands of agents moving around, or many obstacles changing positions, lattice grid searching may not work for you. In DwarfCorp, we have several threads dedicated to planning. It eats up a lot of our CPU time. It might do so for you too!
Another way of approaching the motion planning problem is to stop thinking about it in terms of Trajectory Planning, and start thinking about it in terms of Control Polices. Remember when we said that the Bug Algorithm could be thought of as both a Trajectory Planner and a Control Policy? In that paragraph, we defined a Policy as a set of rules which take in a Configuration, and return an Action (or a “Control Input”).
The Bug Algorithm can be thought of as a simple Control Policy which just tells the Agent to move toward the Goal until hitting an obstacle, and then to follow the obstacle clockwise. The Agent could literally follow these rules as it moves along its path to the goal.
The key advantages of Control Policies is that they generally don’t depend on the Agent having anything more than local knowledge of the world, and that they are generally extremely fast to compute. Many thousands (or millions) of Agents can all follow the same control policy in parallel quite easily.
So can we think of any control policies which work better than the Bug Algorithm? One useful one to look at is called the “Potential Field Policy.” It works by simulating the Agent as a charged particle which is attracted to the Goal, and repelled by obstacles:
Algorithm: POTENTIAL FIELD POLICY
Define real number constants a and b
Find the vector in the configuration space from the Agent to the Goal and call it g.
Find the nearest obstacle point in the configuration space, call it o.
Find the vector from to the Agent and call it r.
The control input u = a * g^ + B * r^ where v^ refers to the normalized vector v.
Move the agent according to the control input.
This policy requires some knowledge of Linear Algebra. Basically, the control input is the weighted sum of two terms: the Attractive term, and the Repulsive term. The choice of the weights on each term dramatically influence the resulting trajectory that the Agent takes.
Using the Potential Field Policy, you can get your agents to move in incredibly realistic and smooth ways. Additional terms (such as nearness to a desired object, or distance from an enemy) can be added in easily.
Since the Potential Field Policy can be computed extremely quickly in parallel, thousands upon thousands of agents can be controlled in this way very efficiently. Sometimes, this control policy is computed offline and stored in a grid for each level. It can then be modified by a designer as he or she sees fit (then, it is usually called a Flow Field).
Some games (especially strategy games) use this to great effect. I’ve attached an image of flow fields being used in the strategy game Supreme Commander 2 to get units to naturally avoid each other and stay in formation:
Of course, flow fields and potential field functions have serious disadvantages. First, they are in no way Optimal. Agents will be blown about by the flow field without any regard to how long they are taking to get to the goal. Secondly (and in my opinion most importantly) they are in no way Complete. What if the control input cancelled out to zero before the agent reached the goal? In that case, we would say that the agent has reached a “Local Minimum”. You might think that such cases are rare, but in fact they are quite easy to construct. Just put a big U-shaped obstacle in front of the Agent.
Finally, flow fields and other control policies are notoriously difficult to tune. How do we choose the weights on each term? Ultimately, these need to be hand-tuned to get good results.
Typically, designers will have to go in and hand-tune flow fields to avoid local minima. These problems limit the potential usefulness of flow fields in the general case. Nevertheless, they are useful in many scenarios.
So now we’ve discussed the three most popular motion planning algorithms in video games: lattice grids, visibility graphs, and flow fields. These algorithms are extremely useful, very easy to implement, and quite well studied. For 2D problems with circular Agents moving around in all directions in the plane, they work beautifully. Luckily, that covers almost all problems in video games, and the rest is faked by clever tricks and a lot of hard work on the designer’s part. It is no wonder that they are used so extensively in the game industry.
So what is it that researchers in motion planning have been doing for the past few decades? The answer is: everything else. What if your Agent isn’t a circle? What if he doesn’t live on the plane? What if he can’t move in every direction? What if the goals, or the obstacles, are constantly changing? Well, then the answers aren’t so simple.
Let’s start with a very simple case that seems like it should be easy to solve using the same methods we just described, but which in fact can’t be solved by those methods without seriously modifying them.
What if instead of being a circle, our Agent is a rectangle? What if the rotation of the Agent matters? Here’s a picture of what I’m describing:
In the above case, the Agent (in red) is sort of like a shopping cart that can move in any direction. We want to move the agent so that it gets exactly on top of the goal, and is rotated in the correct direction.
Here, we could presumably use the Bug Algorithm, but we’d have to be careful about how to rotate the Agent so that he doesn’t collide with the obstacles. We’d quickly find ourselves with a mess of rules that is nowhere near as elegant as the one for a circular Agent.
What if we wanted to use the Visibility Graph? Well, that requires that the Agent be a point. Remember when we did the trick of inflating the obstacles for the circular agent? Maybe we could do something similar in this case. But how much do we inflate the obstacles by?
One easy solution is just to take the “worst case scenario” and plan with that assumption in mind. In that case, we just take the Agent and find its bounding circle, and then assume the agent is a circle of that size. We then inflate the obstacles by the necessary amount.
This works, but we sacrifice completeness. There are many problems which this just won’t solve. Take for example this problem, where the long, thin Agent must fit through a “Keyhole” to get to the goal:
In this scenario, the Agent could get to the goal by going into the keyhole, turning 90 degrees, going to the next keyhole, turning 90 degrees, and exiting the keyhole. The conservative bounding circle approximation would tell us this is impossible.
The issue is, we have not properly taken into consideration the configuration space of the Agent. Up until this point, we’ve really only discussed 2D planning among 2D obstacles. In those cases, the configuration space nicely maps from 2D obstacles onto another 2D plane, and we see the effect of “inflating” the obstacles.
But what’s really going on here is that we’ve added another dimension to our planning problem. Not only does the Agent have an x and a y position, but he also has a rotation. Our problem is actually a 3D planning problem. Now, the mapping between the Workspace and the Configuration space becomes much more complicated.
One way to think about it is like this: imagine taking the agent at a particular rotation (which we’ll call theta). Now, for every point in the Workspace, we move the agent there at the given rotation, and if the Agent collides with an obstacle, we set that point as a collision in the configuration space. Again, this could be done for polygonal obstacles using the Minkowski Sum.
In the above diagram, we have a single sweep of the Agent through an obstacle. The red outlines of the Agent represent configurations which are in collision, the green outlines represent collision-free configurations. This naturally leads to another “inflation” of the obstacles. If we do this for all possible (x, y) positions of the Agent, we create a 2D slice of the configuration space.
Now, we just adjust theta by some amount, and do it again. We get another 2D slice.
We can then stack the slices on top of each other so that their x and y coordinates line up, just like stacking sheets of graph paper:
If we slice up theta infinitely thin, what we end up with is the 3D configuration space of the Agent. It’s a continuous cube, and theta wraps around from the bottom to the top. The obstacles get transformed into weird, cylindrical shapes. The Agent and its goal just becomes a point in this new, strange space.
This can be quite difficult to wrap your head around, but in a way, it’s quite beautiful and elegant. If we are able to represent obstacles in two dimensions, where each dimension is just the position in the x and y coordinates of the Agent, we can also represent obstacles in three dimensions, where the third dimension is the rotation of the agent.
What if we wanted to add another degree of freedom to the agent? Say it could also increase or decrease its length? Or maybe it has arms and legs we have to worry about? Well in those cases, we do exactly the same thing – we just keep adding dimensions to the configuration space. At that point, it becomes completely impossible to visualize, and you have to start using terms like “hyperplane”, “constraint manifold” and “hyperball” to understand them.
An example of a 4-dimensional planning problem. The Agent is made of two segments with a pin between them).
Now that our agent is just a point in 3D space, we can use the usual algorithms to find a solution to any motion planning problem. If we wanted to, we could grid up the new, 3D space into voxels and use A*. If we were clever, we could also find a polygonal mesh representation of the configuration space and then use a visibility graph in 3D.
Similarly, for other dimensions, we could just do the following:
Unfortunately, this has two huge problems:
Where NP-hard is a technical computer science term meaning the problem gets exponentially harder as the size of the problem increases, and a whole lot of other nasty theoretical baggage.
What this means in practice is that as we start adding dimensions to our planning problems, we quickly either run out of computation power, memory, or both. This is known as the Curse of Dimensionality. For 3-dimensions, we might be able to get away with A*, but as soon as we get to 4, 5, or 6 dimensions, A* quickly becomes useless (in spite of some recent research by Max Likachev to make it work well enough).
The simplest example of a 6-dimensional planning problem is called the “Piano Mover’s Problem.” It’s stated as follows: how can a rigid object (say a piano) get from point A to point B in 3D space if it can move and rotate in any direction?
(An example of the Piano Mover’s problem in OMPL)
Surprisingly, this problem remained unsolved for decades, in spite of its simplicity.
For these reasons, games have mostly stuck to the easy 2 dimensional problems, and fake everything else (even when having a solution to something like the Piano Mover’s Problem seems like a fundamental part of 3D game AI).
In the 70’s and 80’s, motion planning research focused on fast ways of computing the configuration space, and nice compression heuristics to make some planning problems dissolve into fewer dimensions. Unfortunately, this research did not result in easy, general solutions to high-dimensional problems. It was not until the late 90’s and early 00’s that robotics researchers began making headway into general high-dimensional problems.
Nowadays, planning in high dimensional spaces is much better explored. This is thanks largely to two major advances: randomized planners, and fast trajectory optimizers. I will explore these in the following sections.
I noticed that when I published a draft of this article, some people were confused by my terminology. In game development literature, the act of getting from point A to point B is usually called "pathfinding," yet I've been calling it "motion planning." Basically, motion planning is a generalization of pathfinding with a looser definition of "point" and "path" to include higher dimensional spaces (like rotations, or joints).
Motion planning can be thought of as pathfinding in the configuration space of the Agent,
One of the first general solutions to high-dimensional planning problems is called the Probabilistic Road Map. It takes ideas from the Navigation Mesh and Visibility Graph, and adds one more component: randomness.
First, some background. We already know that the hard parts of planning in high dimensional spaces involve computing the configuration space, and then actually finding the optimal path through that space.
PRMs solve both these issues by throwing out the idea of computing the configuration space altogether, and abandoning any notion of optimality. They work by randomly creating a visibility graph in the space using distance heuristics, and then searching that visibility graph for a solution. We are also allowed to make the assumption that we can collision check any configuration of the Agent relatively inexpensively.
Here is the algorithm in full:
Algorithm: PROBABALISTIC ROAD MAP (PRM)
Create an empty graph G
Add the start and goal nodes to G.
For N iterations:
Find a single uniform random sample of the configuration space and call it R. To do this, just interpret each dimension of the configuration space as a random number and generate exactly one point as a list of those random numbers.
If the agent is in collision at configuraiton R, continue.
Otherwise, add R to the graph G.
Find all the nodes in G which fall within distance d of R.
For each node which meets that criterion, N
Try to fit a collision-free straight line between R and N
If the edge is collision free, add it to G.
Plan using A* or Dijkstra’s Algorithm from the start to the goal.
In words, at each step we create a random point, and attach it to the neighbors in the graph based on which neighbor nodes in the graph it can "see", that is, if we are confident a straight line can be made betwen any new node and neighbors in the graph.
We can see right away that this is extremely similar to the Visibility Graph algorithm, except that we never ever reference “for every obstacle” or “for every vertex”, since, of course, it is NP-complete to compute those.
Instead, the PRM focuses on the structure of the random graph it creates alone, and the collisions between nearby nodes.
Properties of the PRM
We have already said that the algorithm is no longer optimal nor complete. But what can we say about its performance?
The PRM is what is called “Probabilistically Complete”. That means, as the number of iterations N approaches infinity, the probability that the PRM solves any solvable planning query becomes one. This sounds like a very loose statement (and it is), but in practice the PRM converges to the correct solution very quickly. It does mean, however, that the algorithm will have random performance, and may hang forever without finding a solution.
PRMs also have an interesting relationship with optimality. As the size of the PRM increases (and N grows to infinity), the number of possible paths increases to infinity, and the optimal path through the PRM becomes the true optimal path through the configuration space.
As you might imagine, due to its randomness, the PRM can produce some very ugly, long, and dangerous paths. In this sense the PRM is in no way optimal.
Like Visibility Graphs and Navigation Meshes, PRMs are fundamentally Multi-Query planners. They allow an agent to re-use the random graph for many planning queries efficiently. Level designers can also “bake in” a single PRM for each level.
The key “trick” of the PRM (and other randomized algorithm) is to represent the Configuration Space statistically rather than explicitly. It allows us to trade off the optimality of our solution with speed by accepting “good enough” solutions where they exist. It’s an extremely powerful tool – as it allows the programmer to decide how much work the planner is to do before returning a solution. This property is called the online property of the algorithm.
Essentially, the complexity of the PRM grows with the number of samples and the distance parameter d, rather than with the size or dimension of the configuration space. Compare this with A* or Visibility Graphs, which take exponentially more time as the dimension of the problem increases.
This property allows the PRM to plan for hundreds of dimensions quite quickly.
Since the invention of the PRM, many different variants have been proposed which increase its efficiency, path quality, and ease of use.
Sometimes, the Multi-Query quality of PRMs is unnecessary. Many times, its more desirable just to get from point A to point B without knowing anything about previous planning queries. For instance, if the environment changes between planning queries, its often better just to re-plan from scratch rather than rebuilding any stored information we have from previous planning queries.
In these cases, we can alter our random graph creation idea to account for the fact that we only want to plan from point A to point B once. One way of doing this is by changing our notion of a Graph to one of a Tree.
A tree is just a special kind of Graph where nodes are organized into “parents” and “children” such that every node in the tree has exactly one parent or zero or more children.
We can see how motion planning might be represented as a tree. Imagine it as though the Agent was exploring the configuration space systematically.
If our agent is in some state, there are several other states he can go to from there (or in our case, an infinite number of states). From each of those states, he can go to more states (but wouldn’t want to backtrack, since he’s already covered the state from which he came).
If we store where the Agent has been as the “Parent” nodes, and all of the places the Agent goes from there as “Children”, we can form a tree-like structure which grows from the Agent’s current state to all of the places the agent could potentially be. Eventually, the Agent ‘s tree encompasses the Goal state, and we have a solution.
This idea is very similar to A*, which expands states systematically and adds them to a tree-like structure (technically, a directed acyclic graph).
So could we randomly grow a tree from the Agent’s start configuration until it encompasses the goal configuration?
There are two classes of algorithms which attempt to do this. Both were invented in the late 90’s – early 00’s. One view is a “node-centric” view which picks a random node in the tree, and grows in a random direction. This class of algorithms is called “EST” for “Expansive Space Trees”. The other view is “sample centric”, and starts by randomly sampling nodes in the space, and then growing the closest node toward that random sample. Algorithms of this type are called “RRTs” for “Rapidly Exploring Randomized Trees.”
Generally, RRTs are much better than ESTs, for reasons I will not describe here.
Here’s how the RRT works:
Algorithm: RAPIDLY EXPLORING RANDOMIZED TREE (RRT)
Create an empty tree T.
Add the agent’s start configuration to T.
For N iterations, or until the goal is reached
Randomly sample a node R.
Find the closest node in T to R. Call it K.
Step along the ray from K to R by a small amount epsilon until:
If there is a collision, go back to sampling randomly.
Otherwise, add a new node at this configuration to T.
If we’ve reached a maximum distance d from K, go back to sampling randomly.
If the goal node is now within distance d from any node in the tree, we have a solution.
The above figures show roughly one step of the algorithm midway. We take a random sample (orange, r), find its nearest node (black, k), and then add one or more new nodes which “step toward” the random sample.
The tree will continue to grow in this way, randomly, until the goal is reached. Occasionally, we will want to randomly sample to goal as well, and greedily try to make a straight line to it. In practice, the probability of sampling the goal could be anywhere from 30% to 80% for the best results, depending on how cluttered the configuration space is.
The RRT is relatively simple. It can be implemented in just a few lines of code (assuming we’re able to easily find the nearest node in a tree).
You may not be surprised at this point to find that the RRT is only Probabilistically Complete. It’s also non-Optimal (in many cases, hilariously non-optimal).
(Robot arms are considered very high dimensional systems. In this case, we have two 7-joint robot arms holding a 1-joint wire cutter. This is a 15 dimensional system. Recent motion planning research has focused on such high dimensional systems.)
Due to its simplicity, RRT tends to also be extremely fast (for high dimensional problems, at least). Seven dimensional or higher planning problems can be solved in a matter of milliseconds with the fastest variants of RRT. When the number of dimensions approaches the dozens, RRT usually excels over all other planners in solving these problems. But note that "fast" in the high-dimensional planning community usually means taking a second or so to run the entire pipeline for 7 dimensions, or up to a minute for 20 or more dimensions. The reason for this is that RRT paths are often too terrible to be used directly, and must go through a lengthy pre-processing stage. That might shock programmers used to A* returning in milliseconds on a 2 dimensional problem -- but try running A* on a 7 dimensional problem, it will never return a solution!
Since RRTs invention and extreme success, there have been many attempts to extend it to other domains or increase its performance. Here are some RRT variants worth knowing about:
RRTConnect – grows two trees from the start and the goal, and attempts to connect them with a straight line at random intervals. As far as I know, this is the fastest motion planner known for very high dimensional problems. Here's a pretty picture of an RRT connect implementation I made (white is obstacles, blue is free space):
RRT* - Invented in the last couple of years. Guarantees optimality by re-balancing the tree. Thousands of times slower than RRTConnect.
T-RRT – Tries to create higher quality RRT paths by exploiting the gradient of a cost function.
Constrained RRT – Allows planning with arbitrary constraints (such as distance constraints, cost constraints, etc). Slower than RRT, but not by much.
Kinodynamic RRT – Planning in the space of control inputs rather than the configurations pace. Allows for planning of cars, carts, ships, and other agents which can’t change their rotation at will. Widely used in the DARPA Grand Challenge. Much much, MUCH slower than non-kinodynamic planners.
Discrete RRT – Application of RRTs to grids. Competitive with A* in 2D, faster in 3D or higher.
DARRT – Invented last year. Application of RRTs to symbolic action planning (previously considered the domain of A*).
Since randomized planners don’t provide any optimality guarantees, the trajectories they produce can be quite awful. So, rather than being used by the agent directly, trajectories produced by randomized planners are often fed through complex optimization stages which improve their quality. These optimization stages today make up the majority of planning time for randomized planners. A seven dimensional RRT might take only a few milliseconds to return a candidate path, but it can take up to 5 seconds to optimize the resulting trajectory.
In general, a Trajectory Optimizer is an algorithm which takes in a seed trajectory and a cost function, and modifies the seed trajectory so that it has a lower cost. For instance, if the cost function we wanted to minimize was trajectory length, an optimizer over that cost function would take a seed trajectory, and modify that trajectory so that it was shorter. This kind of optimizer is called a Trajectory Shortcutter.
One of the most common and extremely simple trajectory shortcutters (which could be called a Stochastic Hill Climbing Shortcutter) works like this:
Algorithm: Stochastic Hill Climbing Shortcutter
For N iterations:
Take two random points on the trajectory T (called t1 and t2)
If connecting a straight line between t1 and t2 would not make the trajectory any shorter, continue.
Otherwise, attempt to connect t1 and t2 with a straight line
If a collision-free straight line could be found, shortcut the trajectory T from t1 to t2, throwing out everything in between
This trajectory shortcutter could easily be modified by replacing the concept of “shorter” with any other cost (like distance from obstacles, smoothness, safety, etc.) Over time, the trajectory will end up falling into what is called a “local minimum” of the cost function. The length (or cost) of the trajectory will not decrease any longer.
There are far more complicated trajectory optimizers which go much faster than the hill climbing one I just described. These optimizers often exploit knowledge of the planning problem domain to speed things up.
Another way we can solve high dimensional motion planning problems is to distance ourselves from the idea of “searching a space” to find a path, and instead focus on trajectory optimization directly.
If it’s true that we can optimize a trajectory so that it is shorter, or meets some other criterion, then is it possible to solve the entire problem using optimization directly?
There are a large number of motion planners which use optimization to directly solve planning problems. Typically, these work by representing a trajectory using a parametric model, and then intelligently modifying the parameters of the model until a local minimum of a cost function is found.
One way of doing this is a process called gradient descent. If we have some cost function C(T), where T is a trajectory, it’s possible to compute its gradient D_C(T), which, when given a trajectory, tells you the way you must modify it in order to reduce its cost the most.
In practice, this is very complicated. You can imagine it sort of like this:
Algorithm: Gradient Descent Trajectory Optimizer
The more a trajectory passes through obstacles, the more it costs.
Start with a straight line from the agent to the goal. Call this T_0.
At all the points where the trajectory collides with obstacles, compute the direction which would take the point out of the obstacle the fastest.
Imagine that the trajectory is an elastic rubber band. Now imagine pulling each point in the direction given in step 3.
Pull all the points a little bit according to 4.
Repeat until the entire trajectory is out of collision.
Over time, the trajectory will tend to get “pushed” away from obstacles, while still remaining smooth. In general, such a planner can’t be complete or optimal, since it relies on the starting guess of a straight line as a strong heuristic. It will however be locally complete and locally optimal (and in practice will solve most problems).
(Note, in the above figure, the point gradients represent the *negative* gradient of the cost function)
In reality, trajectory optimization as planning is quite a hairy and complicated topic which is difficult to do justice to in this article. What exactly does it mean to compute the gradient of a trajectory? What does it mean to treat the trajectory like a rubber band? These questions have very complicated answers related to the calculus of variations.
Just know that trajectory optimization is a strong alternative to search-based or randomized planners for high-dimensional motion planning. Their advantages include extreme flexibility, theoretical guarantees on optimality, extremely high path quality, and relative speed.
(CHOMP being used for a robot to pick up a cup (Image courtesy of Dragan et. al))
In recent years, a number of very fast trajectory optimizers have become available which are providing strong competition to RRTs and PRMs in solving high dimensional problems. One of these, CHOMP, works through gradient descent and by using a clever representation of obstacles using a distance field. Another, TrajOpt, represents obstacles as convex sets and uses sequential quadratic programming techniques.
There is a whole world of theoretical motion planning beyond what I've covered here. I just want to give a sense for what else is out there.
Thus far we've only considered the case where the agent can move in any direction at any speed. What if the agent can't? One example of this is a car. A car cannot simply slide laterally, but must go forward or backward. These cases are called "non-holonomic" planning problems. There are several solutions to this problem -- but none of them are fast! The state of the art car planner can take up to a *minute* for the average plan.
Non-holonomic planning falls into another set of problems called "constrained planning problems". Besides constraints on motion, there can also be constraints on the physical configuration of the agent, on the maximum amount of force the agent can use to move, or regions where the agent is constrained by things other than obstacles. Again, there are many solutions to general constrained planning, but there are few *fast* solutions.
We can add time as another dimension in our plans, so that Agents can do things like track a moving target, or avoid moving obstacles. If done correctly, this can have some amazing results. However, this problem is made intractable by the fact that it's often impossible for the agent to predict the future.
What if our agen'ts plan can't be easily represented as a single trajectory? What if the agent has to interact with objects along the way (and not just avoid them)? These kinds of problems usually fall into the "hybrid planning" category -- there is usally some abstract aspect tied to the geometric aspect of the problem.Again, for these kinds of problems, there are no clear, universally accepted solutions.
In this article, we went over some of the basic concepts of motion planning, and presented several classes of algorithms which attempt to solve the problem of getting an Agent from point A to point B, from the dead simple to the extraordinarily complex.
Recent advances in motion planning research have led to a wide range of previously unsolvable problems now being within the reach of personal computers (and potentially, video game engines).
Unfortunately, many of these algorithms are still too slow for realtime applications -- especially for cars, ships, or other vehicles which can't move in any direction. That still requires further research.
I hope this wasn’t too dense, and that somebody finds something useful out of it!
Here's a summary of some of the algorithms we covered in table form (note, this assumes non-kinodynamic planning):