Beyond AIML: Chatbots 102
by Bruce Wilcox
August 14, 2008
 

Flaws of AIML

Complaint #1: The biggest flaw of AIML is that it is simply too wordy and requires huge numbers of effectively redundant categories.

Since AIML's pattern matching is so primitive and generic, it takes a lot of category information to perform a single general task. If you want the system to respond to a keyword, either alone or as a prefix, infix, or suffix in a sentence, it takes four categories to do so, one for each condition (with three of them remapping to the fourth using <srai>):


<category>
<pattern>MOTHER</pattern>
<template> Tell me more about your family. </template>
</category>

<category>
<pattern>* MOTHER</pattern>
<template><srai>MOTHER</srai> </template>
</category>

<category>
<pattern>MOTHER *</pattern>
<template><srai>MOTHER</srai> </template>
</category>

<category>
<pattern>* MOTHER *</pattern>
<template><srai>MOTHER</srai> </template>
</category>

Had AIML used regular expressions for patterns, this could have been reduced to a single category. This leads to a critical point: conciseness is good, and having to write multiple flavors of the same rule is bad. The more you write, the harder it is to keep it organized, debug it, and so on.
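To make the point concrete, here is a minimal Python sketch (Python rather than AIML, since AIML has no regular expressions) of the four MOTHER categories collapsed into one regular-expression rule. The names `MOTHER_RULE` and `respond` and the test sentences are illustrative assumptions, not part of any AIML interpreter.

```python
import re
from typing import Optional

# One rule stands in for the four AIML categories MOTHER, * MOTHER, MOTHER *, * MOTHER *.
# \b anchors on word boundaries, so "motherboard" does not match.
MOTHER_RULE = re.compile(r"\bmother\b", re.IGNORECASE)


def respond(sentence: str) -> Optional[str]:
    """Return the family response if 'mother' appears anywhere as a whole word."""
    if MOTHER_RULE.search(sentence):
        return "Tell me more about your family."
    return None


if __name__ == "__main__":
    for s in ["Mother", "I miss my mother", "Mother is home", "My mother is home", "My motherboard died"]:
        print(f"{s!r} -> {respond(s)}")
```

Using `search` rather than a whole-input match is what makes position irrelevant: the rule fires whether the keyword stands alone, comes first, comes last, or sits in the middle.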

This was a frequent problem for large-scale expert systems (or any software, for that matter). Of course, regular expressions are devilishly hard to read, and being able to understand your rules easily is also important. But I find even XML like this hard to read. It lacks conciseness, and the intrusion of the XML tag structure makes it slow to skim what you do have. XML is not readable; it is barely legible.

AIML's wildcard * matches one or more words, but there is no wildcard that matches zero or more. Since a pattern must consume every word of the input, this forces you to write separate patterns for a keyword at the start, in the middle, and at the end of a sentence. A zero-or-more wildcard would be much better.
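The difference a zero-or-more wildcard makes can be shown with a small word-level matcher. This is a hypothetical Python sketch, not AIML's actual matching algorithm (which walks a tree of patterns and has its own priority rules); the `min_star` parameter is an assumption introduced here to switch `*` between AIML's one-or-more behavior and the zero-or-more behavior argued for above.

```python
from functools import lru_cache
from typing import Sequence


def matches(pattern: Sequence[str], words: Sequence[str], min_star: int = 1) -> bool:
    """Word-level match in which each '*' must swallow at least `min_star` words.

    min_star=1 mimics AIML's one-or-more '*'; min_star=0 gives a
    zero-or-more wildcard. As in AIML, the whole input must be consumed.
    """

    @lru_cache(maxsize=None)
    def go(p: int, w: int) -> bool:
        if p == len(pattern):
            return w == len(words)  # success only if no input words are left over
        if pattern[p] == "*":
            # try every span length this wildcard is allowed to take
            return any(go(p + 1, w + k) for k in range(min_star, len(words) - w + 1))
        return w < len(words) and pattern[p] == words[w] and go(p + 1, w + 1)

    return go(0, 0)


if __name__ == "__main__":
    pattern = "* MOTHER *".split()
    for s in ["MOTHER", "MY MOTHER", "MOTHER IS HOME", "MY MOTHER IS HOME"]:
        print(s, "| one-or-more:", matches(pattern, s.split(), 1),
              "| zero-or-more:", matches(pattern, s.split(), 0))
```

With `min_star=0`, the single pattern `* MOTHER *` covers all four cases that took four AIML categories; with `min_star=1` it only covers the keyword in the middle.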

AIML is a self-contained system that manipulates words without any knowledge of them. If you had a dictionary to back you up (e.g., the downloadable WordNet dictionary of around 125,000 words), you could use part-of-speech wildcards for better patterns. A.L.I.C.E., for example, dedicates around 1,000 patterns to handling various adverbs at the beginning of a sentence, remapping each input by stripping off the adverb. If you could use a keyword like %adverb in a pattern, they could all have been reduced to a single pattern that would cover far more cases than are actually covered at present.
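Here is a rough Python sketch of what a %adverb wildcard buys you. The tiny `LEXICON` dictionary is a hypothetical stand-in for a real dictionary such as WordNet, and `token_matches` and `strip_leading_adverb` are illustrative names; the point is that one rule keyed on a part of speech replaces one category per adverb.

```python
from typing import Dict

# Hypothetical mini-lexicon standing in for a real dictionary such as WordNet;
# a real system would look parts of speech up rather than hard-code them.
LEXICON: Dict[str, str] = {
    "actually": "adverb", "basically": "adverb", "frankly": "adverb",
    "honestly": "adverb", "really": "adverb",
    "i": "pronoun", "like": "verb", "cats": "noun",
}


def token_matches(token: str, word: str) -> bool:
    """A pattern token such as '%adverb' matches any word with that part of speech;
    any other token must match the word exactly (case-insensitively)."""
    if token.startswith("%"):
        return LEXICON.get(word.lower()) == token[1:]
    return token.lower() == word.lower()


def strip_leading_adverb(sentence: str) -> str:
    """One rule, '%adverb *' -> '*', instead of one category per adverb."""
    words = sentence.split()
    if len(words) > 1 and token_matches("%adverb", words[0]):
        return " ".join(words[1:])
    return sentence


if __name__ == "__main__":
    for s in ["Honestly I like cats", "Frankly I like cats", "I like cats"]:
        print(strip_leading_adverb(s))
```

Adding a new adverb to the dictionary extends the rule automatically, which is why a dictionary-backed pattern covers more cases than a hand-written list of categories.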

Thus the basic 40,000-rule A.L.I.C.E. bot really has fewer than 5,000 rules once you remove all the waste in its pattern definitions. And 5,000 rules is generally not enough to get interesting behavior from an expert system in a limited domain, much less a broad one.

Complaint #2: Similarly, each word in a pattern must be an exact word. It would be nice if you could match a list of synonyms in a single specification, e.g., I (know believe think) you, and also declare a named list of synonyms so you can reuse it, e.g., Synonym ~believe = know believe think; Then you could write the pattern I ~believe you. Of the 40,000+ patterns in the publicly available standard A.L.I.C.E., 9,000 are in the Reductions file, which remaps input. These include:

<category><pattern>I AM JACK</pattern><template><srai>call me jack</srai></template></category>

<category><pattern>I AM JAKE</pattern><template><srai>call me jake</srai></template></category>

<category><pattern>I AM JAMES</pattern><template><srai>call me james</srai></template></category>

<category><pattern>I AM JANE</pattern><template><srai>call me jane</srai></template></category>

... and so on for lots of other names.

I would much rather write a synonym set for a list of names and write this as a single pattern.
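A minimal Python sketch of named synonym sets, assuming a syntax modeled on the hypothetical "Synonym ~believe = know believe think;" declaration above. `SYNONYMS` and `compile_pattern` are illustrative names, and the compilation to regular expressions is one possible implementation, not an existing AIML feature.

```python
import re

# Hypothetical named synonym sets, written in the spirit of
# "Synonym ~believe = know believe think;"
SYNONYMS = {
    "~believe": ["know", "believe", "think"],
    "~name": ["jack", "jake", "james", "jane"],
}


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile a pattern such as 'I ~believe you' into a regex.

    Each '~set' token becomes a capturing alternation of its members;
    every other token must match literally. Matching ignores case.
    """
    parts = []
    for token in pattern.split():
        if token in SYNONYMS:
            parts.append("(" + "|".join(map(re.escape, SYNONYMS[token])) + ")")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


if __name__ == "__main__":
    believe = compile_pattern("I ~believe you")
    print(bool(believe.fullmatch("I think you")))  # matched through the ~believe set
    call_me = compile_pattern("I am ~name")
    m = call_me.fullmatch("I am James")
    if m:
        # one rule in place of a separate Reductions category per name
        print("call me " + m.group(1).lower())
```

Because the set is declared once and referenced by name, adding a name to `~name` updates every pattern that uses it.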

Complaint #3: Continuing the issue of exact word matching, because you can't use wildcard sets of words, you cannot easily handle generic set-based sentences. Google, for example, has no problem taking in "what is two hundred plus twelve thousand and 2" and spitting out the correct answer (along with a bunch of search alternatives).

But since you can't tell AIML you want a %number in your pattern, the best you can do is create a pattern like * PLUS * and then have the template pass the two wildcard values to some other code for processing. Here is the snippet from the AIML math file that does this. (<think> is a template tag meaning "execute but don't output." Inside it are assignments to two variables, x and y; then comes the literal output text "The answer is", and finally JavaScript that does the math.)

<category>
<pattern>* PLUS *</pattern>
<template>
<think>
<set name="x"> <star/></set>
<set name="y"><person><star index="2"/></person></set>
</think>
The answer is
<script language="JavaScript">
var result = <get name="x"/> + <get name="y"/>;
document.write("<br/>result = " + result, "<br/>");
</script>
</template>
</category>

But if it turns out that the * values do not contain numbers, you have already matched the pattern and are screwed on output. The AIML reference guide says: "As soon as the first complete match is found, the process stops, and the template that belongs to the category whose pattern was matched is processed by the AIML interpreter to construct the output."

It seems to say nothing about what happens if the result of executing the template is nada. So it seems to me you really want a more reliable match BEFORE you commit to the template. (When I tried this with A.L.I.C.E., it did produce other output, suggesting that it goes beyond AIML itself.)
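Here is a rough Python sketch of a set-based %number wildcard in which the rule only counts as matched when both operands really are numbers, so nothing is committed until the match is verified. `parse_number`, `answer_plus`, and the small number-word vocabulary are hypothetical; the parser handles only simple English number phrases up to the millions and is meant to illustrate the idea, not to be complete.

```python
import re
from typing import List, Optional

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def parse_number(words: List[str]) -> Optional[int]:
    """Return the value of a %number word sequence, or None if it isn't one."""
    total, current, seen = 0, 0, False
    for w in words:
        if w == "and" and seen:
            continue  # as in "twelve thousand and 2"
        if w.isdigit():
            current += int(w)
        elif w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred" and seen:
            current *= 100
        elif w in SCALES and seen:
            total, current = total + current * SCALES[w], 0
        else:
            return None  # not a number word, so the %number wildcard fails
        seen = True
    return total + current if seen else None


PLUS = re.compile(r"^(?:what is\s+)?(.+?)\s+plus\s+(.+?)\??$", re.IGNORECASE)


def answer_plus(sentence: str) -> Optional[str]:
    """Match '%number PLUS %number'; return None (no match) if either side isn't a number."""
    m = PLUS.match(sentence.strip())
    if not m:
        return None
    x = parse_number(m.group(1).lower().split())
    y = parse_number(m.group(2).lower().split())
    if x is None or y is None:
        return None  # verified before committing: the rule simply doesn't match
    return f"The answer is {x + y}"


if __name__ == "__main__":
    for s in ["what is two hundred plus twelve thousand and 2", "What is 3 plus 4?", "cats plus dogs"]:
        print(f"{s!r} -> {answer_plus(s)}")
```

Returning `None` on a failed number check lets a caller fall through to other rules instead of emitting a broken answer.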

Complaint #4: Continuing the issue of matches that generate no output, AIML creates a system with the potential for many rules that overlap and mask each other. For example:

<category><pattern>DO YOU WANT MOVIES</pattern><template><srai>WANT MOVIES</srai></template></category>

<category><pattern>DO YOU WANT *</pattern><template>No, I do not want <star/>.</template></category>

Both can match the input "Do you want movies", but the AIML definition makes the first category match, so the second is never tried with this input. The first category remaps the input to "want movies", and if THAT fails to match, you lose: no output. It would be better if, when a match generated no output, the match were treated as never having happened and the system moved on to another attempt.
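A minimal Python sketch of the alternative proposed above: treat an empty result as a non-match and keep trying. Rules here are plain regexes tried in list order, a simplification of AIML's priority scheme (where exact words beat `*`); `respond` and `RULES` are hypothetical names, and the first rule's recursive call plays the role of `<srai>`.

```python
import re
from typing import Callable, List, Optional, Tuple

# A rule pairs a regex with a template function. A template may return None
# (or "") to signal "no output". These names are hypothetical, not AIML's.
Rule = Tuple[str, Callable[..., Optional[str]]]


def respond(rules: List[Rule], sentence: str) -> Optional[str]:
    """Try rules in order. A rule whose template produces no output is treated
    as if it had never matched, so the next rule still gets its chance."""
    for pattern, template in rules:
        m = re.fullmatch(pattern, sentence, re.IGNORECASE)
        if m:
            out = template(m)
            if out:  # None or "" means "no output": keep looking
                return out
    return None


# The two categories from the text. The first remaps (like <srai>) to
# "want movies", for which no rule exists, so its template yields nothing.
RULES: List[Rule] = [
    (r"do you want movies", lambda m: respond(RULES, "want movies")),
    (r"do you want (.+)", lambda m: f"No, I do not want {m.group(1)}."),
]

if __name__ == "__main__":
    print(respond(RULES, "Do you want movies"))
```

Under strict first-match semantics this input would produce nothing; with fall-through, the second rule answers instead.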

Complaint #5: Categories using <srai> are hard to connect visually to the categories they remap to, making it impossible to see or guess what will happen. This is true even in the normal case, and some applications sort patterns alphabetically into separate files by starting letter (a common and useful thing to do), which totally destroys the ability to see how patterns interconnect.
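One partial remedy is tooling rather than a language change: extract every `<srai>` edge from the AIML files so the connections can be listed (or drawn) regardless of how the files are sorted. The following Python sketch uses only the standard library's `xml.etree.ElementTree`; `srai_graph`, `unmatched_targets`, and the `SAMPLE` categories are hypothetical. Since an srai target is itself matched against wildcard patterns, a target with no identical pattern is only flagged for review, not proven dead.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SAMPLE = """<aiml>
<category><pattern>MOTHER</pattern><template>Tell me more about your family.</template></category>
<category><pattern>* MOTHER</pattern><template><srai>MOTHER</srai></template></category>
<category><pattern>I AM JACK</pattern><template><srai>call me jack</srai></template></category>
</aiml>"""


def normalize(text: str) -> str:
    """Upper-case and collapse whitespace, the way AIML patterns are written."""
    return " ".join(text.upper().split())


def srai_graph(aiml_text: str) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Return all patterns plus a map from each pattern to the inputs it remaps to."""
    root = ET.fromstring(aiml_text)
    patterns: Set[str] = set()
    edges: Dict[str, List[str]] = defaultdict(list)
    for category in root.iter("category"):
        pattern = normalize(category.findtext("pattern", default=""))
        patterns.add(pattern)
        for srai in category.iter("srai"):
            edges[pattern].append(normalize("".join(srai.itertext())))
    return patterns, dict(edges)


def unmatched_targets(patterns: Set[str], edges: Dict[str, List[str]]) -> List[str]:
    """Srai targets with no identical pattern: candidates for a closer look."""
    return sorted({t for targets in edges.values() for t in targets if t not in patterns})


if __name__ == "__main__":
    patterns, edges = srai_graph(SAMPLE)
    for source, targets in sorted(edges.items()):
        print(f"{source} -> {', '.join(targets)}")
    print("review:", unmatched_targets(patterns, edges))
```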

Complaint #6: It is hard to organize a collection of related chat information. There are only the category and topic mechanisms (and the file system), so everything must be shoehorned into them. Topics do not do a good job of encapsulating themselves. To enter a topic, you have to manually put a "set-topic" command in the template of some category, which by definition means that category is outside the topic (or it couldn't have matched). That is poor encapsulation and means lots of boring set-topics lying around. I would rather have the underlying engine manage topics automatically and keep all topic data within the topic.
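As a rough sketch of what engine-managed topics might look like (hypothetical Python, not an existing chatbot engine's API): each `Topic` carries its own entry keywords and rules, and the `Engine` switches topics itself, so no category outside the topic has to issue a set-topic. Matching on single keywords stands in for a real pattern matcher to keep the example short; `FAMILY` and `PETS` are made-up sample topics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Topic:
    """A self-contained topic: its entry keywords, an opening line, and its own rules.
    This structure is hypothetical; it is not AIML's <topic> element."""
    name: str
    keywords: Set[str]
    opener: str
    rules: Dict[str, str] = field(default_factory=dict)  # keyword -> response

    def reply(self, words: Set[str]) -> Optional[str]:
        for key, response in self.rules.items():
            if key in words:
                return response
        return None


class Engine:
    """The engine, not some category outside the topic, decides which topic is active."""

    def __init__(self, topics: List[Topic]) -> None:
        self.topics = topics
        self.current: Optional[Topic] = None

    def respond(self, sentence: str) -> Optional[str]:
        words = set(sentence.lower().strip("?!.").split())
        # 1. Rules of the active topic get the first try.
        if self.current is not None:
            reply = self.current.reply(words)
            if reply:
                return reply
        # 2. Otherwise enter any topic whose own keywords appear in the input.
        for topic in self.topics:
            if words & topic.keywords:
                self.current = topic
                return topic.reply(words) or topic.opener
        return None


FAMILY = Topic("family", {"mother", "father", "family"}, "Let's talk about your family.",
               {"mother": "Tell me more about your family.", "sister": "Older or younger?"})
PETS = Topic("pets", {"dog", "cat", "pet"}, "Do you have any pets?",
             {"dog": "What breed is your dog?"})

if __name__ == "__main__":
    bot = Engine([FAMILY, PETS])
    for line in ["I have a sister", "My mother is home", "I have a sister", "My dog barks"]:
        print(f"{line!r} -> {bot.respond(line)}")
```

In this design the `sister` rule lives inside the family topic and only fires once that topic is active, which is the kind of encapsulation the complaint asks for.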

 
 
Comments

Mike Rozak
What you really need to do is include probabilities in AIML. For example, instead of the synonym "George Bush" -> "George W Bush", include a probability of the synonym being correct, such as 90%. Likewise, "George" -> "George W Bush" might have a 1% chance. You might also have "George" -> "George Washington" with a 2% chance.

Also associate a probability with the context. If a player asks, "Does George like flying in Air Force One?", this will be parsed to "Does George W Bush like flying in Air Force One?" (1%) as well as "Does George Washington like flying in Air Force One?" (2%). However, some context logic will know that George W Bush is associated with Air Force One and assign a higher probability to that context (90% context probability for a modern president, 1% for anyone else).

Then, a combination of sentence-parse probabilities and context probabilities (1% x 90% = 0.9% vs. 2% x 1% = 0.02%) can disambiguate the meaning of a statement. This is a common speech recognition trick. (So you might want to learn about speech recognition, Viterbi searches, and Hidden Markov Models.)

I've already implemented this and am using it in my game, http://www.CircumReality.com .

You might find its use of text-to-speech interesting too. You'll find that your AIML tags for responses are completely inadequate and need to include facial emotions, spoken emotions, and nuanced prosody.

You'll also find that hand-coding millions of responses isn't worth the work. Most of what players want to ask is more procedural, such as "Where is the nearest merchant/guard/toilet?" and "Did you see where Frank went before the murder occurred?"

Kyle laozhao
Well, the dialog is very interesting.
Very interesting!

You will find more at http://sglab.cn/blog

Meng Mao
@Mike Rozak

Yeah, but your game has dialog like this:
http://www.circumreality.com/ScreenPreRelease4b.jpg

Mike Rozak
@Meng Mao

If you send me email, I'll go into detail, but basically, without a mostly menu-driven dialogue system, players don't know what to say and/or run into the ye olde "guess the verb" problem that Zork and other interactive fiction (IF) games often have.

