# Too Many Clicks! Unit-Based Interfaces Considered Harmful

**August 23, 2006**

**Page 5 of 6**

**Work calculations**

You can compute the work *W* that a player must do to specify a move using your user interface. If you measure *W*
in terms of game variables, such as the size of the gameplay area and
the number of player units, you can then compare different possible
user interfaces, even if you can only estimate *W*.

For an example, consider a UI for sculpting virtual clay. You have a voxel array of size *n*×*n*×*n*, and each voxel can be on or off. We’ll consider four possible user interfaces.

In the first interface, the user types in the coordinates of each voxel
that she wants turned on. The number of keystrokes it takes to specify
a number from 1 to *n* is proportional to log_{2}(*n*), so this takes work proportional to 3log_{2}(*n*) to specify each on-voxel. (From now on, I’ll drop the phrase “proportional to” and write *W*=f(*x*), with the understanding that it means *W*=O(f(*x*)).) If
we suppose the artist will efficiently specify only a surface, and not
fill in the entire inside of the sculpture, then the total work will be
*W*=3*n*^{2}log_{2}(*n*). (The size of the surface is proportional to *n*^{2}.)
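To make the logarithmic keystroke count concrete, here is a minimal sketch that counts how many digits it takes to name one of *n* coordinate values. The function names and the `base` parameter are illustrative assumptions, not part of the article; changing the base only changes the constant factor, which is why the estimate is stated as proportional to log_{2}(*n*).

```python
def keystrokes_per_coordinate(n: int, base: int = 2) -> int:
    """Digits needed to name one of n distinct values in the given base."""
    if n < 1 or base < 2:
        raise ValueError("need n >= 1 and base >= 2")
    digits, capacity = 0, 1
    # Each extra digit multiplies the number of nameable values by `base`.
    while capacity < n:
        capacity *= base
        digits += 1
    return digits


def keystrokes_per_voxel(n: int, base: int = 2) -> int:
    """A voxel needs three coordinates, hence the factor of 3."""
    return 3 * keystrokes_per_coordinate(n, base)


if __name__ == "__main__":
    for n in (16, 256, 1024):
        print(n, keystrokes_per_voxel(n), keystrokes_per_voxel(n, base=10))
```

Multiplying the per-voxel count by the roughly *n*^{2} surface voxels gives the total quoted above.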

In the second user interface, the user chooses a point in three-space with
a three-dimensional mouse (such as 3DConnexion’s Spaceball), and clicks
the mouse to toggle the on/off state of the voxel at that point. The work needed to go to a
particular point in 3-space is then the combination of 3 movements;
each movement is on an axis ranging from 1 to *n*, with (let’s suppose) an average value of *n*/2. It seems as if it would then take work *W*=*n*^{5}/8
to create a sculpture: *n*^{2} surface voxels, each costing (*n*/2)^{3}. However, let’s suppose that after turning on one
voxel, the user moves to one of the 26 neighboring voxels. We will say
that this takes work proportional to the information needed to specify
one choice out of 26, which is log_{2}(26). We’ll approximate it as log_{2}(27) = log_{2}(3^{3}) = 3log_{2}(3), because the constant 3, both here and in our previous value for *W*, comes from the three-dimensional nature of the sculpture. The total work is then *W*=3*n*^{2}log_{2}(3). This interface looks like an improvement over the first one.
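The two estimates for the 3D-mouse interface can be written out step by step. The subscripts "naive" and "neighbor" are labels introduced here for illustration, and reading *n*^{5}/8 as surface size times per-voxel travel is an assumption about how that figure arises.

```latex
\begin{align*}
W_{\text{naive}} &= \underbrace{n^{2}}_{\text{surface voxels}} \cdot
                    \underbrace{\left(\tfrac{n}{2}\right)^{3}}_{\text{three axis moves}}
                  = \frac{n^{5}}{8},\\[4pt]
W_{\text{neighbor}} &= n^{2}\,\log_{2} 27
                     = n^{2}\,\log_{2} 3^{3}
                     = 3n^{2}\log_{2} 3 \approx 4.75\,n^{2}.
\end{align*}
```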

In
our third interface, we’ll start the user off with a sphere of clay of
about the same volume as the desired sculpture, and the user will use
an ordinary 2D mouse to move a cursor around on the surface of the
clay, clicking the left button to push the voxel under the cursor down
(perpendicular to the surface) and the right button to pull it
up. We’ll use an accelerating push/pull interface in which the
number of voxels pushed or pulled doubles when the clay is moved in the
same direction as the last click, and halves when it is moved in the opposite
direction, so that the proper position can be found with a binary
search taking time proportional to log_{2}(*n*). Suppose again that the user only needs to work each point on the surface once. The work needed to move to the next voxel is 2log_{2}(3), because this movement is in 2 dimensions. The total work is then *W* = *n*^{2} × 2log_{2}(3) × log_{2}(*n*) = 2*n*^{2}log_{2}(*n*)log_{2}(3).
This is worse than the previous UI, even though we’re restricting
movement to be on a surface, because of the number of clicks that it
takes to push and pull the virtual clay.
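A quick way to see how much worse is to divide the third interface's estimate by the second's. The subscripts on *W* are labels introduced here, and the value *n* = 1024 is only an illustrative resolution.

```latex
\begin{align*}
\frac{W_{3}}{W_{2}}
  &= \frac{2n^{2}\log_{2}(n)\log_{2}(3)}{3n^{2}\log_{2}(3)}
   = \tfrac{2}{3}\log_{2}(n),\\[4pt]
\frac{W_{3}}{W_{2}} > 1
  &\iff n > 2^{3/2} \approx 2.83,
  \qquad\text{e.g. } n = 1024 \;\Rightarrow\; \frac{W_{3}}{W_{2}} = \tfrac{20}{3} \approx 6.7.
\end{align*}
```

So under these estimates the push/pull interface costs more than the 3D mouse for any array larger than 2 voxels on a side, and the gap grows with resolution.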

In our
fourth interface, the user will move around the surface with a 2D
mouse, and push and pull points in and out as before. However, the
surface of the clay will have tension, so that pushing a point in or
out will drag all the neighboring voxels along. The result is that a
surface can be sculpted in a way similar to the way you can define a
curve using control points and a spline. Then *W* = 2c^{2}log_{2}(*n*)log_{2}(3), where *c*,
the number of control points, is now a function of the irregularity of
the sculpture, not of the number of voxels. For very high-resolution
modeling, *n*>>*c*, and *W* = O(log_{2}(*n*)). This is a vastly superior user interface.
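The four estimates can be put side by side with a short script. This is a sketch under the article's own assumptions (constants dropped, one pass over the surface); the function names and the control-point count `c = 20` are hypothetical values chosen only to show the trend.

```python
from math import log2


def work_typed(n: int) -> float:
    """UI 1: type three coordinates for each of ~n^2 surface voxels."""
    return 3 * n**2 * log2(n)


def work_3d_mouse(n: int) -> float:
    """UI 2: pick one of 27 neighbours for each surface voxel."""
    return 3 * n**2 * log2(3)


def work_push_pull(n: int) -> float:
    """UI 3: 2D step on the surface, then a binary-search push/pull."""
    return 2 * n**2 * log2(n) * log2(3)


def work_control_points(n: int, c: int) -> float:
    """UI 4: like UI 3, but over c control points instead of n voxels."""
    return 2 * c**2 * log2(n) * log2(3)


if __name__ == "__main__":
    c = 20  # hypothetical number of control points
    header = ("n", "typed", "3D mouse", "push/pull", "control pts")
    print("{:>6} {:>12} {:>12} {:>12} {:>12}".format(*header))
    for n in (64, 256, 1024, 4096):
        print(f"{n:>6} {work_typed(n):>12.0f} {work_3d_mouse(n):>12.0f} "
              f"{work_push_pull(n):>12.0f} {work_control_points(n, c):>12.0f}")
```

As *n* grows with *c* held fixed, the first three columns grow at least quadratically while the last grows only logarithmically.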

To incorporate cognitive ergonomics into *W*,
you would also count the amount of memory the player needs to remember
the meaning of each keyboard shortcut, clickable icon, etc., and
incorporate a measure of the work done to convert displayed information
into relevant usable information (this is the tricky part). You could
also add a term for memory retrieval time. Retrieval time estimates for
different types of memories are given in (Anderson 1974); for
remembering one of a list of options (say, possible commands for a
unit), the time is *K* + *an*, where *n* is the number of options and *K* and *a*
are constants. Cognitive terms may be summed separately if you don’t
know how to scale them so as to be comparable with the non-cognitive terms.
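As a minimal sketch of the linear retrieval-time term, the function below evaluates *K* + *an* for a list of command-menu sizes. The values used for `K` and `a` are placeholders with no empirical basis; real values would have to come from the cited study or from your own measurements.

```python
def retrieval_time(n_options: int, K: float, a: float) -> float:
    """Linear retrieval-time model: fixed cost K plus a per-option cost a."""
    if n_options < 0:
        raise ValueError("n_options must be non-negative")
    return K + a * n_options


if __name__ == "__main__":
    K, a = 1.0, 0.5  # placeholder constants, not measured values
    for n_options in (4, 8, 16):
        print(n_options, retrieval_time(n_options, K, a))
```

Keeping this term in its own function, rather than folding it into *W*, matches the suggestion to sum cognitive terms separately when their scale relative to clicks and keystrokes is unknown.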
