Gamasutra: The Art & Business of Making Games
Beyond Façade: Pattern Matching for Natural Language Applications

March 15, 2011
 

[Is the age of natural speech here? Telltale's Bruce Wilcox delves into techniques for natural language processing and contrasts AIML and Façade's approach against his own award-winning ChatScript, showing a path forward for word-based game technology.]

Understanding open-ended simple natural language (NL) requires huge amounts of knowledge and a variety of reasoning skills. And then there's dealing with bad spelling, bad grammar, and terse context-dependent input. No wonder games severely limit their use of NL.

But speech is a coming interface. It solves input issues on small devices as well as being more appropriate than a mouse or controller for some games. Those old text adventure games were a lot of fun even with extremely limited vocabulary and parsers. Google now provides free servers translating speech to text for web pages, so the focus will shift to using that text.

Scribblenauts accepts text nouns and adjectives in its latest puzzle game and Telltale Games has a planned product mixing nouns and verbs. In other words, full natural language games are only a matter of time. Bot Colony is currently attempting this and Façade already did so (sort of) five years ago.

In this paper I will look at successive refinements to NLP (natural language processing) as they apply to games, reviewing AIML, Façade, and ChatScript. I will cover how the state of the art has evolved from matching patterns of words to matching patterns of meaning.

This paper will make the following general points:

  1. Script syntax and semantics have a big impact on ease of content authoring.
  2. Matching sets of words instead of single words approximates matching meaning.
  3. The absence of words is as important as the presence.
  4. Wildcard matching (in place of parsing) should be reined in to reduce false positives.

AIML (1995)

Unrestricted NL input has existed for decades in chatbots like Eliza and A.L.I.C.E. But the output is barely useful. It takes a lot of data to make any plausible approximation to natural language understanding. This places a premium on making it easy to author content.

A.L.I.C.E. is written in AIML. AIML is a miserable system for authoring. As a programmer, I think of AIML as recursive self-modifying code at the assembly level of language processing; in other words, a content creation and maintenance nightmare. And its pattern matching is feeble.

Pattern Matching

AIML is simple. You describe a category (what others call a rule) as a pattern of words and wildcards and what to do when the rule matches the current input. AIML's syntax is XML based, with a pattern (input) clause and a template (output) clause, like the rule below:

<category>
<pattern> I NEED HELP * </pattern>
<template>Can you ask for help in the form of a question?</template>
</category>

The pattern must cover the entire input, which is treated as case-insensitive words with punctuation discarded. The wildcard * binds to one or more words, so a single pattern can match many inputs.
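To make the matching rule concrete, here is a minimal Python sketch of it, not A.L.I.C.E.'s actual code. The names normalize and match are hypothetical, and trying the shortest wildcard binding first is an assumption the description above does not settle.

```python
import re

def normalize(text):
    """Drop punctuation, uppercase, and split into words."""
    return re.sub(r"[^\w\s]", "", text).upper().split()

def match(pattern, words):
    """Return the wildcard bindings if pattern covers ALL words, else None.

    pattern and words are lists of tokens; each * must bind at least one word.
    """
    if not pattern:
        # A pattern only matches if no input words are left over.
        return [] if not words else None
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Assumption: try the shortest non-empty binding first.
        for n in range(1, len(words) + 1):
            tail = match(rest, words[n:])
            if tail is not None:
                return [" ".join(words[:n])] + tail
        return None
    if words and words[0] == head:
        return match(rest, words[1:])
    return None

rule = "I NEED HELP *".split()
print(match(rule, normalize("I need help, please!")))  # ['PLEASE']
print(match(rule, normalize("I need help")))           # None: * needs a word
```

The second call fails because the wildcard has nothing to bind to, which is the behavior that forces the extra rules discussed next.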

Problems with authoring arise instantly. Matching the sequence "I love you" in all sentences requires four rules, and they still can't match "I really love you".

<category>
<pattern> I LOVE YOU </pattern>
<template>Whatever</template>
</category>

<category>
<pattern> * I LOVE YOU </pattern>
<template>Whatever</template>
</category>

<category>
<pattern> I LOVE YOU * </pattern>
<template>Whatever</template>
</category>

<category>
<pattern> * I LOVE YOU * </pattern>
<template>Whatever</template>
</category>

If only AIML wildcards matched zero or more words, a single pattern would do. You can make the above patterns match "I really love you" by adding a wildcard between I and LOVE, but then they would also accept sentences like "I will never love you". Wildcards in AIML are inadequate.
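The trade-off can be checked with a small Python sketch. It translates a pattern into a regular expression, which is a stand-in chosen for brevity and not how AIML interpreters work; compile_pattern, matches, and the min_words switch are hypothetical names, and min_words=0 models the "zero or more words" wildcard that AIML lacks.

```python
import re

def compile_pattern(pattern, min_words=1):
    """Build a regex over text in which every word is followed by one space."""
    quantifier = "+" if min_words == 1 else "*"
    parts = []
    for token in pattern.upper().split():
        if token == "*":
            parts.append(r"(?:\S+ )" + quantifier)  # a run of whole words
        else:
            parts.append(re.escape(token) + " ")
    return re.compile("".join(parts))

def matches(pattern, sentence, min_words=1):
    words = re.sub(r"[^\w\s]", "", sentence).upper().split()
    text = "".join(word + " " for word in words)
    return compile_pattern(pattern, min_words).fullmatch(text) is not None

FOUR_RULES = ["I LOVE YOU", "* I LOVE YOU", "I LOVE YOU *", "* I LOVE YOU *"]

# The four rules together still miss the adverb case.
print(any(matches(p, "I really love you") for p in FOUR_RULES))   # False

# With a zero-or-more wildcard, one pattern replaces all four rules.
print(matches("* I LOVE YOU *", "I love you", min_words=0))       # True

# Adding a wildcard after I catches the adverb, and the false positive.
print(matches("I * LOVE YOU", "I really love you"))               # True
print(matches("I * LOVE YOU", "I will never love you"))           # True
```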

Recursive Substitution

The power of AIML lies in recursive substitution. It can submit input to itself using the <srai> tag and put the contents of * into the output using <star/>. The rules for mapping "Can you please tell me what LINUX is right now" to the question "What is LINUX?" might look like this:

<category>
<pattern> * RIGHT NOW </pattern>
<template> <srai><star/></srai></template>
</category>

=> CAN YOU PLEASE TELL ME WHAT LINUX IS and then

<category>
<pattern> CAN YOU PLEASE * </pattern>
<template> <srai> Please <star/></srai></template>
</category>

=> PLEASE TELL ME WHAT LINUX IS and then

<category>
<pattern> PLEASE TELL ME WHAT * </pattern>
<template> <srai> TELL ME WHAT <star/></srai></template>
</category>

=> TELL ME WHAT LINUX IS and then

<category>
<pattern> TELL ME WHAT * IS </pattern>
<template> <srai> WHAT IS <star/></srai></template>
</category>

=> WHAT IS LINUX and finally

<category>
<pattern> WHAT IS LINUX </pattern>
<template> LINUX is an operating system. </template>
</category>

=> LINUX is an operating system.
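The chain of rewrites above can be traced with a short Python sketch. It is an illustration under stated assumptions, not an AIML interpreter: RULES, to_regex, and respond are hypothetical names, the {star} placeholder stands in for <star/>, the resubmit flag stands in for <srai>, and rules are simply tried in list order, which is not necessarily the order a real interpreter would use.

```python
import re

# (pattern, template, resubmit). resubmit=True plays the role of <srai>.
RULES = [
    ("* RIGHT NOW", "{star}", True),
    ("CAN YOU PLEASE *", "PLEASE {star}", True),
    ("PLEASE TELL ME WHAT *", "TELL ME WHAT {star}", True),
    ("TELL ME WHAT * IS", "WHAT IS {star}", True),
    ("WHAT IS LINUX", "LINUX is an operating system.", False),
]

def to_regex(pattern):
    """Turn a word/wildcard pattern into a regex; * captures one or more words."""
    return re.compile(
        " ".join("(.+)" if t == "*" else re.escape(t) for t in pattern.split())
    )

def respond(text, trace=None, depth=0):
    """Answer text, recording each intermediate input in trace if given."""
    if depth > 20:  # guard against rules that rewrite in a loop
        raise RecursionError("too many substitutions")
    words = " ".join(re.sub(r"[^\w\s]", "", text).upper().split())
    if trace is not None:
        trace.append(words)
    for pattern, template, resubmit in RULES:
        m = to_regex(pattern).fullmatch(words)
        if m:
            star = m.group(1) if m.groups() else ""
            output = template.format(star=star)
            return respond(output, trace, depth + 1) if resubmit else output
    return None

steps = []
print(respond("Can you please tell me what LINUX is right now?", steps))
print(len(steps))  # 5 inputs were matched before an answer came out
```

Note that reading any single rule in RULES does not tell you which inputs eventually reach it; you have to replay the whole chain.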

AIML does everything this way. A.L.I.C.E. has 1300 rules merely to remove what it considers useless adverbs like "really" and "accordingly" in sentences.

Extended Patterns

AIML allows you to augment patterns in two ways. One is with a topic you can set on the output side and test on the pattern side. This prioritizes rules inside a matching topic over rules outside of it. So if the topic became set to "tasty breakfast food" then this rule would get priority.

<topic name="* breakfast *">
<category>
<pattern>I like fish</pattern>
<template> Do you like sushi? </template>
</category>
</topic>

The problem with this is that you have to dedicate rules outside of this topic to set the topic name for you to match. How many rules you will need and how to craft them will be significant issues.

The other pattern augmentation mechanism is the <that> clause, which adds a pattern match on the last output sentence given by the system. Rules that match a <that> clause have priority over rules that don't. You can use <that> to script continuations based on expected reactions to your output.

<category>
<pattern>yes</pattern>
<that> Do * sushi </that>
<template>I hate sushi</template>
</category>

This rule responds to a simple yes only if the last output was a "Do" question that ends in sushi.
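The priority given to <that> rules can be sketched in Python as follows. This is a simplified model: covers, RULES, and respond are hypothetical names, and the fallback rule without a <that> clause (with its "Yes what?" reply) is invented here purely to show the ordering.

```python
import re

def covers(pattern, text):
    """True if the word/wildcard pattern covers all of text (* = 1+ words)."""
    words = " ".join(re.sub(r"[^\w\s]", "", text).upper().split())
    regex = " ".join(
        r"\S+(?: \S+)*" if t == "*" else re.escape(t)
        for t in pattern.upper().split()
    )
    return re.fullmatch(regex, words) is not None

RULES = [
    # Hypothetical fallback, not from the article.
    {"pattern": "YES", "that": None, "template": "Yes what?"},
    # The sushi rule shown above.
    {"pattern": "YES", "that": "DO * SUSHI", "template": "I hate sushi"},
]

def respond(user_input, last_output):
    candidates = [r for r in RULES if covers(r["pattern"], user_input)]
    # Rules whose <that> matches the previous output win, whatever their position.
    for rule in candidates:
        if rule["that"] is not None and covers(rule["that"], last_output):
            return rule["template"]
    for rule in candidates:
        if rule["that"] is None:
            return rule["template"]
    return None

print(respond("yes", "Do you like sushi?"))    # I hate sushi
print(respond("yes", "Nice weather today."))   # Yes what?
```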

With pattern augmentation, you can write some obscurely clever code. But obscure code is generally self-punishing and few people can write or read such code.

Complex Computation

On output, aside from writing text and sending itself new input, you can call the OS system shell or use JavaScript. AIML lets you set variables, but other than the topic variable you can't consult them on the pattern side. AIML uses variables to handle pronoun resolution. When you author an output, you also author what pronouns you are affecting by specifying a new value for their corresponding variable and writing patterns to remap inputs with pronouns. This adds to the authoring burden, but it works.

Implementation

Implementing AIML is easy. All the data is precompiled into a tree structure and input is matched in a specific order against nodes of the tree, effectively walking a decision tree. The first matching node then executes its template, which may start up a new match or do something else.
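A rough Python sketch of such a tree is shown below. It is a simplification under assumptions: add_rule, walk, and respond are hypothetical names, templates are plain strings, and the walk tries an exact word before the * branch, which is one plausible reading of "a specific order" and not a description of any particular interpreter.

```python
import re

TEMPLATE = None  # sentinel key: a node holding it ends a complete pattern

def add_rule(tree, pattern, template):
    """Store a pattern as a path of nested dicts, one level per token."""
    node = tree
    for token in pattern.upper().split():
        node = node.setdefault(token, {})
    node[TEMPLATE] = template

def walk(node, words):
    """Return the template of the first pattern that covers all words."""
    if not words:
        return node.get(TEMPLATE)
    exact = node.get(words[0])
    if exact is not None:  # assumption: exact words are tried before *
        found = walk(exact, words[1:])
        if found is not None:
            return found
    star = node.get("*")
    if star is not None:
        for n in range(1, len(words) + 1):  # * absorbs one or more words
            found = walk(star, words[n:])
            if found is not None:
                return found
    return None

def respond(tree, text):
    words = re.sub(r"[^\w\s]", "", text).upper().split()
    return walk(tree, words)

tree = {}
add_rule(tree, "I NEED HELP *", "Can you ask for help in the form of a question?")
add_rule(tree, "I LOVE YOU", "Whatever")
add_rule(tree, "* I LOVE YOU", "Whatever")

print(respond(tree, "You know I love you!"))  # Whatever
print(respond(tree, "I really love you"))     # None
```

Patterns that share a prefix, such as the two beginning with I, share nodes in the tree, so the cost of a lookup depends on the input length and branching, not on the total number of rules.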

Inherent Weakness

AIML's basic mechanic is also its weakness. Recursively self-modifying input creates problems.

First, you can't glance at code; you have to read it and think about it. XML is not reader friendly. It's wordy, which is fine for a machine but hard on a human. Just consider the simple <that> rule about sushi above. You can't just look at it and know what it does. You have to think about it.

Second, you can't see a rule and know if it will match the input. You have to know all the prior transformations.

Because of this, writing and debugging become hard, and maintenance impossible.

Summary

AIML is clever and simple, and a good start for beginners writing simple bots. But after 15 years, A.L.I.C.E. has a meager 120K rules. AIML depends on self-modifying the input, so if you don't know all the transformations available, you can't competently write new rules. And with AIML's simple wildcard, it's easy to get false positives (matches you don't want) and hard to write more discriminating patterns.

Still, A.L.I.C.E. is a top chatbot, coming in 2nd in the 2010 Loebner Competition.


Comments


Ryan Andonian
I saw you during the poster session at GDC. I have to say it put a smile on my face to stand next to Michael Mateas during your session ;)

Mark Taylor
Seems to me you fall into Wittgenstein's Trap of thinking that the meaning of a word can only be mapped to yet more words. You 'parse' an expression and then look up the appropriate response, but all in all the algorithm understands nothing of the expression. Should one expect an intelligent response when the responder does not comprehend the speech? Language is more than a circular game of definitions. These NL interfaces will go so far, but no further.

Brian Moriarty
Enjoyed your poster session at GDC. Interesting work!

Sun Moon Hwang
Thank you for this great article. I see the advancement. Hope to see some more in the future!

JoseArias NikanoruS
Quite interesting...

Right now I'm designing a Visual Novel and I want it to give the player/reader a greater feeling of agency in a complex story using some techniques that I've been trying to develop.

What I was hoping was that in the near future we could implement ChatScript along my branching-story techniques (and even other "engines", maybe one to simulate "character personality" based on a neutral script) to give the player the "Full Experience".

Let's work toward that!!

Matthew Mouras
Fascinating read on NL applications. I knew next to nothing and now I know a lot more - thank you! Congrats on your Loebner win. Brilliant stuff.

