Beyond Façade: Pattern Matching for Natural Language Applications


March 15, 2011
 

ChatScript (2010)

I created a new chatbot language for Avatar Reality called CHAT-L, and later revised it into ChatScript, an open-source chatbot engine. It covers Façade's NLP capabilities more generally and more simply.

Suzette is the chatbot I wrote. She won Best New Bot in her debut at the 2009 Chatterbox Challenge and won the 2010 Loebner Competition (Turing Test), fooling 1 of the 4 judges.

AIML was a simple word pattern-matcher. Façade pattern-matched into discourse acts, a tightly restricted form of meaning. ChatScript aims to pattern-match on general meaning.

It therefore focuses on detecting equivalence, paying heavy attention to sets of words and canonical representation. It also makes data available in a machine searchable form (fact triples).

ChatScript Patterns

ChatScript uses a simple visual syntax, borrowing from neither XML nor LISP, and it drops the need to match every word of the input. This avoids both the 4-rule syndrome and a surfeit of wildcards on the ends of patterns. Here is a simple ChatScript rule:

s: ( I love meat ) Do you really?

The rule tests statements (s:). The pattern is in parens (parens mean find in sequence). It matches if the words I, love, and meat appear in direct sequence anywhere in the input. The output is whatever follows the parens.
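To make "in direct sequence anywhere in the input" concrete, here is a minimal Python sketch of that one behavior. It is not ChatScript's implementation; the function name matches_sequence is hypothetical, and the sketch ignores punctuation, canonical forms, and rule types.

```python
def matches_sequence(pattern, sentence):
    """Return True if the pattern words occur as one contiguous run
    anywhere in the sentence (case-insensitive, whitespace-tokenized)."""
    words = sentence.lower().split()
    target = pattern.lower().split()
    n = len(target)
    # Slide a window of len(target) words across the input.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(matches_sequence("I love meat", "Well I love meat a lot"))  # True
print(matches_sequence("I love meat", "I love red meat"))         # False
```

The second call fails because red breaks the sequence; allowing such gaps is what wildcards are for.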

Rule Types

s: and ?: are rule types that say the rule gets applied to statements and questions respectively. Façade and AIML discard punctuation and with it the distinction between statements and questions. ChatScript tracks if the input has a question mark or has the structural form of a question. Rules can be restricted to statements, questions, or u: for the union of both.

You can script witty repartee using continuations (a:, b:, etc.), which test the input immediately following a successful ChatScript rule output. It's much clearer than AIML's <that> tag.

s: ( I like spinach ) Are you a fan of the Popeye cartoons?

a: ( yes ) I used to watch him as a child. Did you lust after Olive Oyl?

b: ( no ) Me neither. She was too skinny.

b: ( yes ) You probably like skinny models.

a: ( no ) What cartoons do you watch?

b: ( none ) You lead a deprived life.

b: ( Mickey Mouse ) The Disney icon.
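One way to picture continuations is as a tree: each rule's a: children are only tested against the very next input, and each a: rule's b: children against the input after that. The Python sketch below models the spinach example that way. The tuple layout and the names RULE, contains, and converse are assumptions for illustration, and the sketch simply stops when nothing matches, whereas a real engine would fall back to its other rules.

```python
# Each rule is (pattern, reply, continuations).
RULE = ("i like spinach", "Are you a fan of the Popeye cartoons?", [
    ("yes", "I used to watch him as a child. Did you lust after Olive Oyl?", [
        ("no", "Me neither. She was too skinny.", []),
        ("yes", "You probably like skinny models.", []),
    ]),
    ("no", "What cartoons do you watch?", [
        ("none", "You lead a deprived life.", []),
        ("mickey mouse", "The Disney icon.", []),
    ]),
])


def contains(pattern, sentence):
    """True if the pattern words appear contiguously in the sentence."""
    words, target = sentence.lower().split(), pattern.split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def converse(top_rule, inputs):
    """Feed successive inputs; only the continuations of the rule that
    just fired are candidates for the next input."""
    replies, candidates = [], [top_rule]
    for sentence in inputs:
        for pattern, reply, children in candidates:
            if contains(pattern, sentence):
                replies.append(reply)
                candidates = children
                break
        else:
            break  # nothing matched; this sketch just ends the exchange
    return replies


for line in converse(RULE, ["I like spinach", "no", "Mickey Mouse"]):
    print(line)
```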

Concepts

ChatScript supports sets of words called concepts, which can represent word synonyms or affiliated words or a natural ordering of words:

concept: ~meat ( bacon ham beef meat flesh veal lamb chicken pork steak cow pig )

Here ~meat means a close approximation of meat and is the equivalent of a Façade {iMeat} but is easier to read and write. Then you can create a rule that responds to all sorts of meat:

s: ( I love ~meat ) Do you really? I am a vegan.

The ordered concept below shows the start of hand ordering in poker.

concept: ~pokerhand ( "royal flush" "straight flush" "4 of a kind" "full house" )

The pattern:

?: ( which * better * ~pokerhand * or * ~pokerhand ) …

detects questions like "which is better, a full house or a royal flush", and the system has functions that can exploit the ordered concept to provide a correct answer.
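The article does not name the functions that exploit ordering, so the following Python sketch only illustrates the idea: if the concept lists hands strongest first (an assumption based on the listing above), comparing two hands reduces to comparing their positions. The names POKERHAND and better_hand are hypothetical.

```python
# Ordered concept, assumed to run from strongest to weakest hand.
POKERHAND = ["royal flush", "straight flush", "4 of a kind", "full house"]


def better_hand(a, b, order=POKERHAND):
    """Return whichever hand appears earlier in the ordered concept."""
    return a if order.index(a) < order.index(b) else b


print(better_hand("full house", "royal flush"))  # royal flush
```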

You can nest concepts within concepts, so this is fine:

concept: ~food ( ~meat ~dessert lasagna ~vegetables ~fruit )

Hierarchical inheritance is important both as a means of pattern generalization and as a mechanism for efficiently selecting rules to test. Concepts can be used to create full ontologies of verbs, nouns, adjectives, and adverbs, allowing one to match general or idiomatic meanings.
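Nested concepts behave like sets that may contain other sets. A minimal Python sketch of expanding such a hierarchy follows; the ~dessert members are invented for illustration, ~food is trimmed to the concepts defined here, and the names CONCEPTS and expand are hypothetical rather than anything ChatScript exposes.

```python
CONCEPTS = {
    "~meat": ["bacon", "ham", "beef", "meat", "flesh", "veal",
              "lamb", "chicken", "pork", "steak", "cow", "pig"],
    "~dessert": ["cake", "pie"],            # hypothetical members
    "~food": ["~meat", "~dessert", "lasagna"],
}


def expand(concept, concepts=CONCEPTS, seen=None):
    """Return every plain word reachable from a concept, following
    nested concepts and guarding against cycles."""
    seen = set() if seen is None else seen
    if concept in seen:
        return set()
    seen.add(concept)
    words = set()
    for member in concepts.get(concept, []):
        if member.startswith("~"):
            words |= expand(member, concepts, seen)
        else:
            words.add(member)
    return words


print("ham" in expand("~food"))    # True
print("bread" in expand("~food"))  # False
```

A pattern written against ~food therefore picks up every word added later to ~meat without being touched, which is the generalization benefit described above.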

Canonical Form

Instead of Façade's stemming, ChatScript simultaneously matches both original and canonical forms of input words if you use the canonical form in a pattern.

For nouns, plurals canonize to the singular, and the possessive suffixes ' and 's transform to the word 's. Verbs switch to the infinitive. Adjectives and adverbs revert to their base form. The determiners a, an, the, some, these, those, and that all become a. Text numbers like two thousand and twenty one transcribe into digit format, and floating point numbers migrate to integers if the values match exactly. Personal pronouns like me, my, myself, and mine move to the subject form I, while whom, whomever, and whoever shift to who, and anyone, somebody, and anybody become someone.

Façade wrote the following hard-to-read rule which only works in present tense:

(defrule positional_Is

(template (tor am are is seem seems sound sounds look looks))

=> (assert (iIs ?startpos ?endpos)))

ChatScript's simple concept below accepts all tenses and conjugations of the listed verbs:

concept: ~be ( be seem sound look )

If you quote words or use words not in canonical form, the system will restrict itself to what you used in the pattern:

u: ( I 'like you ) This matches I like you but not I liked you.

s: ( I was ) This matches I was and Me was but not I am
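The three cases above (canonical pattern word, quoted word, non-canonical word) can be modelled in a few lines of Python. This is a sketch under stated assumptions: the tiny CANONICAL table stands in for the engine's real dictionary, and canon, token_matches, and matches are hypothetical names.

```python
# Toy canonical-form table; a real engine derives this from a dictionary.
CANONICAL = {"liked": "like", "likes": "like", "was": "be",
             "am": "be", "is": "be", "me": "i", "my": "i"}


def canon(word):
    return CANONICAL.get(word, word)


def token_matches(pattern_token, input_word):
    word = input_word.lower()
    if pattern_token.startswith("'"):       # quoted: original form only
        return word == pattern_token[1:].lower()
    token = pattern_token.lower()
    if canon(token) != token:               # non-canonical: literal only
        return word == token
    # Canonical pattern word: accept the original or its canonical form.
    return word == token or canon(word) == token


def matches(pattern, sentence):
    tokens, words = pattern.split(), sentence.split()
    n = len(tokens)
    return any(all(token_matches(t, w) for t, w in zip(tokens, words[i:i + n]))
               for i in range(len(words) - n + 1))


print(matches("I 'like you", "I liked you"))  # False
print(matches("I like you", "I liked you"))   # True
print(matches("I was", "Me was"))             # True
print(matches("I was", "I am"))               # False
```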

WordNet Ontology

Façade failed to use a major value of WordNet, its ontology. In ChatScript, WordNet ontologies are invoked by naming the word and meaning you want.

concept: ~buildings ( shelter~1 living_accommodations~1 building~3 )

The concept ~buildings represents 760 general and specific building words found in the WordNet dictionary: any word that is a child of definition 1 of shelter, definition 1 of living accommodations, or definition 3 of building in WordNet's ontology.

Pattern Operators

AND word relations are done using ( ) or using a quoted string.

You can do OR choices using a concept or [ ]:

s: ( I ~love [ bacon ham steak pork ( fried egg ) "green egg" ] ) Me, too.

Similarly, ChatScript supports OPTIONAL using { }:

u: ( I ~go to { a } store ) What store?

NOT, the absence of words, is written with ! and means the word must not be found anywhere after the current match location:

u: ( ![ not never rarely ] I * ~ingest * ~meat ) You eat meat.

u: ( !~negativeWords I * ~like * ~meat ) You like meat.

And ChatScript finds words UNORDERED using << >>, a simpler syntax than Façade's. Finding words in any order makes it easy to write a single pattern like:

u: ( << you ~like ~meat >> ) I do like meat.

to cover "Do you like meat?", "Is bacon something you desire?", and "Ham is something you like."
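Unordered matching amounts to checking that every term is present somewhere, regardless of position. A rough Python sketch, assuming small hand-made ~like and ~meat sets (the real concepts are far larger) and the hypothetical names term_found and matches_unordered:

```python
CONCEPTS = {
    "~like": {"like", "desire", "love", "adore"},   # assumed members
    "~meat": {"bacon", "ham", "beef", "meat", "steak", "pork"},
}


def term_found(term, words):
    """A concept matches if any input word is a member; a plain word
    matches if it is present."""
    if term.startswith("~"):
        return any(w in CONCEPTS[term] for w in words)
    return term in words


def matches_unordered(terms, sentence):
    words = [w.strip("?.!,").lower() for w in sentence.split()]
    return all(term_found(t, words) for t in terms)


pattern = ["you", "~like", "~meat"]
for text in ("Do you like meat?",
             "Is bacon something you desire?",
             "Ham is something you like."):
    print(matches_unordered(pattern, text))   # True each time
```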

If you need to know where the start or end of the sentence is, you can use operators < or >. But commonly one doesn't care.

s: ( < I know ) You say you know.

?: ( what is love > ) Love is a wonderful thing.

Wildcards

ChatScript has a collection of wildcards. The unrestricted wildcard * means 0 or more words. The ranged wildcard *~2 means 0-2 words, while *~3 would be 0-3 words. Ranged wildcards are useful because they keep control: unrelated words far apart in the input cannot combine into a match. Specific length wildcards *1, *2, *3 … require exactly that many words. And there are even backward wildcards like *-2, which will find the word two words before the current position.

Consider the following ranged wildcard pattern:

s: ( I *~2 ~love *~2 ~meat )

This allows "I love ham" and "I almost really love completely uncooked steak" but won't accept "I love you and hate bacon".
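The effect of a ranged wildcard can be sketched as "the next term must appear within N words". The Python below handles only plain words, concepts, and *~N tokens; the concept members and the names word_ok, match_at, and matches are assumptions for illustration, not the engine's algorithm.

```python
CONCEPTS = {"~love": {"love", "adore"},           # assumed members
            "~meat": {"ham", "steak", "bacon"}}


def word_ok(token, word):
    if token.startswith("~"):
        return word in CONCEPTS[token]
    return word == token.lower()


def match_at(tokens, words, pos, gap=0):
    """Match tokens from position pos; gap is how many words the next
    non-wildcard token is allowed to skip."""
    if not tokens:
        return True
    head, rest = tokens[0], tokens[1:]
    if head.startswith("*~"):                     # ranged wildcard *~N
        return match_at(rest, words, pos, int(head[2:]))
    for skip in range(gap + 1):
        i = pos + skip
        if i < len(words) and word_ok(head, words[i]) \
                and match_at(rest, words, i + 1):
            return True
    return False


def matches(pattern, sentence):
    tokens, words = pattern.split(), sentence.lower().split()
    return any(match_at(tokens, words, start) for start in range(len(words)))


P = "I *~2 ~love *~2 ~meat"
print(matches(P, "I love ham"))                                      # True
print(matches(P, "I almost really love completely uncooked steak"))  # True
print(matches(P, "I love you and hate bacon"))                       # False
```

In the last input bacon sits three words after love, one more than *~2 allows, so the unrelated clause cannot complete the match.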

The ability to match concepts combined with ranged wildcards yields both economy and precision. You can detect thousands of insults with this simple pattern:

( !~negativewords you *~2 ~negativeAffect ) Why are you insulting me?

Using the ~negativeAffect set of 4000 words, the above responds to "you dork" and "you have an ugly face" but does not react to "you aren't stupid" (~negativewords includes not, never, rarely, hardly), nor will it react to "you can always tell the poor from the merely stupid".

AIML automatically binds matching wildcard text so you can retrieve it during output. Façade generates no text output so it doesn't capture anything. ChatScript allows you to capture matched words using an _ prefix. If you say "I really adore raw chicken in the morning and soggy beans", the following pattern captures the specific meat and vegetable words found in the input and echoes them in the output:

s: ( I * ~like * _~meat * and * _~vegetable ) I hate _0 and _1.
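Because every term in that pattern is separated by *, matching reduces to finding the terms in order with arbitrary gaps, remembering the words that matched the _-prefixed terms. Here is a small Python sketch of that reading; the concept members and the names hit and capture_in_order are hypothetical, and the example input is one that contains the literal word and.

```python
CONCEPTS = {"~like": {"like", "love", "adore"},       # assumed members
            "~meat": {"chicken", "ham", "steak"},
            "~vegetable": {"beans", "peas", "carrots"}}


def hit(term, word):
    if term.startswith("~"):
        return word in CONCEPTS[term]
    return word == term.lower()


def capture_in_order(terms, sentence):
    """Find each term left to right with any gap between terms.
    Returns the words matched by _-prefixed terms, or None on failure."""
    words = sentence.lower().split()
    captures, pos = [], 0
    for term in terms:
        keep = term.startswith("_")
        name = term.lstrip("_")
        while pos < len(words) and not hit(name, words[pos]):
            pos += 1
        if pos == len(words):
            return None
        if keep:
            captures.append(words[pos])
        pos += 1
    return captures


terms = ["I", "~like", "_~meat", "and", "_~vegetable"]
found = capture_in_order(
    terms, "I really adore raw chicken in the morning and soggy beans")
print("I hate {} and {}.".format(*found))   # I hate chicken and beans.
```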

Variables

User variables in ChatScript start with a $ and contain text (which auto-converts to and from numbers as needed). They can be used within patterns as well as for output.

s: ( I be * ~genderWords ) refine()

a: ( ~maleGenderWords ) $gender = male

a: ( ~femaleGenderWords ) $gender = female

?: ( $gender << what be I gender >> ) You are $gender.

s: ( $gender=male I ~like ~male ) I prefer women.

The first rule detects the user saying they are some form of gender (girl, king, policeman) and immediately refines it by testing each continuation until it finds a match and then stores away the user's gender. The second rule (for later) says if the user's gender has been defined and they ask what it is, tell them. The third rule matches only if the user's gender is male and the rest matches.

ChatScript also has system variables including %rand (a random value), %length (sentence length), %month (current month) and %tense. You can use an infinitive verb concept and yet still request verbs be in past tense as follows:

u: ( %tense=PAST I ~like you ) What happened?

Fact Triples

ChatScript supports fact triples and represents concepts and other things internally using them. You can write tables of data yourself, and process them however you want, typically forming sets or user-defined graphs, storing information about each member. A table definition names the columns of the table, then the code for processing each table line, and then you just fill in the table. For example:

table: ~malebooks( ^author ^title ^copyright )

createfact( ^author member ~author )

createfact( ^title member ~book )

createfact( ^title exemplar ^author )

if ( ^copyright != "*" ) { createfact( ^copyright birth ^title ) }

DATA:

"Orson Scott Card" "Ender's Game" 1985

"Daniel Defoe" "Robinson Crusoe" *

With appropriate rules, you can then recognize when the user types in a title or author and know which is connected to what, the genre, and when the book was published. For example:

u: ( who * [ write author pen ] * _~book objectquery( _0 exemplar ) ) @0object

The rule reacts to a book's name only if it knows the author. The object query (a predefined function) looks for all facts where the contents of _0 are the subject and the verb is exemplar. If it fails, the pattern fails. Queries always store found facts in specified places (the default for this query is the fact set @0). The output prints out the object field of one of @0's found facts (e.g., "Robinson Crusoe" exemplar "Daniel Defoe"), i.e. Daniel Defoe.
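To see what the table produces and what the query retrieves, the following Python sketch mirrors the four createfact lines and a subject/verb lookup. It is only a model of the data flow: Fact, build_facts, and object_query are hypothetical names, not the engine's API.

```python
from collections import namedtuple

Fact = namedtuple("Fact", "subject verb object")


def build_facts(rows):
    """Apply the table's per-line processing to each (author, title,
    copyright) row and return the resulting fact triples."""
    facts = []
    for author, title, copyright_year in rows:
        facts.append(Fact(author, "member", "~author"))
        facts.append(Fact(title, "member", "~book"))
        facts.append(Fact(title, "exemplar", author))
        if copyright_year != "*":            # "*" marks an unknown year
            facts.append(Fact(copyright_year, "birth", title))
    return facts


def object_query(facts, subject, verb):
    """Return all facts with the given subject and verb."""
    return [f for f in facts if f.subject == subject and f.verb == verb]


FACTS = build_facts([
    ("Orson Scott Card", "Ender's Game", "1985"),
    ("Daniel Defoe", "Robinson Crusoe", "*"),
])

found = object_query(FACTS, "Robinson Crusoe", "exemplar")
print(found[0].object)   # Daniel Defoe
```

An empty result list corresponds to the query failing, which in the rule above makes the whole pattern fail.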

Functions

You can define pattern functions and output functions. One could create a WHAT_IS routine to detect all forms of what-is questions and use it like this:

u: ( WHAT_IS ( LINUX ) )

This doesn't involve recursive modification of the input. You know what the function is supposed to do by its name and you can separately inspect the function to see if it covers all forms of the question with direct patterns.

Summary

ChatScript patterns are compact and powerful yet understandable, easy to read and write. Unless you go out of your way to be obscure, you can know immediately if the incoming sentence should be matched by your pattern. So patterns can be debugged and maintained readily (you can ask the system what concepts a word belongs to and vice versa). You can even add a special comment in front of a rule that gives a prototypical input sentence. This both documents what the rule should match and allows the system to verify that it does as a part of a regression test.

