Are there any challenges in Kinect engineering that you haven't had a chance to tackle yet, or is there something that you're looking forward to tackling?
DQ: I think the big one coming up is speech. We pushed speech pretty hard in Sports 2. There was speech in the first round of launch titles; Kinectimals obviously had speech. But from day one [on Sports 2], the entire UI was going to be speech-driven. Every game event had to have speech incorporated into it.
But it was also a very say-what-you-see approach; in golf, you change clubs [by saying] "four iron," that kind of thing. What I'd like to see, and what we're investigating now, is a more natural, conversational way of talking to the Kinect, so you can say, "Hey, caddy, give me a five iron," or "Hey, caddy, what should I use now?"
We're looking at that now, improving the speech system, so I think that would probably be the one that I'm personally the most interested in, mainly because I did so much work with speech in Sports 2.
Do you think that at this point with most of the visual input, whether it's 3D data or skeleton data, you've now encountered enough situations where you have a good toolbox to solve any of those problems?
DQ: Yeah, I think so. It's interesting now, as we look at new ideas, how quickly the engineers who've worked with Kinect a lot can pick out what the challenges will be. "If we do this event or this style of game, these are the things that we're going to have to deal with." That's just because we have so much experience with it now.
Our 13 sports now have been so varied -- as we said before, the gestures vary from sport to sport, so we have a good cross-section of what we've been doing and how we've solved problems in the past. As new ideas come in, we can all think, "This will be a challenge," or "Yes, we could do that pretty easily; we can copy what we did in track and field."
I think the only place where you might find new frontiers is a totally new genre, like an adventure game. We recently did an article with Blitz Games about Puss in Boots, and one thing discussed was that if the developers did one-to-one tracking with the character, the character didn't look heroic on-screen anymore, because people have an exaggerated idea of how cool they look when they're doing things -- which isn't exactly a problem you have with Sports.
DQ: Yeah. If you look at Star Wars, they've done some really interesting stuff there, blending that one-to-one tracking with extra animation so you use both at once. That means you get your power moment.
I've played it a couple of times, and it's interesting. When you stand there and realize that the character on screen is really puffing up their chest and getting ready for a swing, you find yourself mimicking that and doing it yourself, because you're getting into the thing. We call it "augmenteering" at Rare, as a counterpart to avateering, which is that one-to-one mapping of the player's movement onto the avatar's animation. We did a little bit of augmenteering in Sports, but most of the time we were trying to get the one-to-one -- the player in the game -- as much as we could.
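The blend described above, one-to-one tracking mixed with authored animation, can be pictured as a per-joint interpolation between the tracked pose and an animated pose. The sketch below is a hypothetical illustration, not Rare's or the Star Wars team's code: the joint names, the `blend_pose` function, and the use of plain positions are all assumptions (a production engine would typically blend joint rotations, e.g. with quaternion slerp).

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
Pose = Dict[str, Vec3]  # hypothetical: joint name -> position


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linearly interpolate between two 3D points."""
    return tuple(ai + (bi - ai) * t for ai, bi in zip(a, b))


def blend_pose(tracked: Pose, authored: Pose, weight: float) -> Pose:
    """Mix the player's tracked pose with an authored animation pose.

    weight 0.0 = pure one-to-one avateering; 1.0 = pure authored animation.
    Joints missing from the authored pose keep their tracked position.
    """
    weight = min(max(weight, 0.0), 1.0)  # clamp to [0, 1]
    return {
        joint: lerp(pos, authored.get(joint, pos), weight)
        for joint, pos in tracked.items()
    }
```

Raising the weight only during a "power moment" (a wind-up before a swing, say) would keep the player in control most of the time while still letting the character look heroic.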
When it comes to speech, how much of a problem do you have with accents?
DQ: The speech system at the moment has what we call acoustic models. I'm Australian, but I actually run the UK English model because I think I've been in England long enough that I've lost my twang. Say we have execs come across from the States; if we leave the kits in U.S. mode, recognition does drop off for the UK people speaking. So the acoustic models are quite tailored to their regions. The UK model covers Scottish, Irish, the thick pommy accents, whereas the U.S. model has the Southern accent and all of the other American ones.
The reason those models exist and are different is that they have to include those accents for the regions. Our biggest challenge -- we have a Scottish guy at work, and he has the thickest thick accent. He actually interviewed me, and I could hardly understand what he was saying. If we know it works for him, we know it works. He's our test case, basically. "Good. It works for him." (laughs)
Whenever you do speech, they always recommend getting native speakers in front of the game, so we were sending people out to Japan, to Germany, and everywhere else to get native speakers talking and testing in front of the game.
Basically, what we're doing is lowering a number [the recognition confidence threshold] so it's as low as it needs to be to detect the speech, but still high enough to reject false accepts. It's just tuning; we dial it back and forth. We always have that infamous week at Rare where I turn it down too low and the game's jumping around on any noise because it's accepting everything. My name is mud for a week, and then we turn it up again. It's really iterative, just trying to find that sweet spot -- and that spot is different for each acoustic model, so the U.S. number is different from the UK number. It's just a tuning process.
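The tuning loop described above can be sketched as a per-acoustic-model confidence threshold plus a search that lowers it until false accepts exceed a budget. This is a hypothetical illustration, not Rare's tooling: the threshold values, the `Utterance` record, the `tune` helper, and the false-accept budget are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical per-acoustic-model thresholds; placeholder values, not Rare's.
THRESHOLDS = {"en-GB": 0.70, "en-US": 0.65}


@dataclass
class Utterance:
    confidence: float  # recognizer's confidence score in [0, 1]
    was_command: bool  # ground truth from a test session: real command or noise


def accept(confidence: float, model: str) -> bool:
    """Accept a recognition result only if it clears this model's threshold."""
    return confidence >= THRESHOLDS[model]


def error_rates(samples: List[Utterance], threshold: float) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at a given threshold."""
    noise = [s for s in samples if not s.was_command]
    commands = [s for s in samples if s.was_command]
    fa = sum(s.confidence >= threshold for s in noise) / max(len(noise), 1)
    fr = sum(s.confidence < threshold for s in commands) / max(len(commands), 1)
    return fa, fr


def tune(samples: List[Utterance], max_false_accept: float = 0.05,
         step: float = 0.01) -> float:
    """Lower the threshold as far as possible while false accepts stay in budget."""
    threshold = 1.0
    best = threshold
    while threshold >= 0.0:
        fa, _ = error_rates(samples, threshold)
        if fa > max_false_accept:
            break  # too low: noise is now being accepted as commands
        best = threshold
        threshold = round(threshold - step, 4)  # round to avoid float drift
    return best
```

Running `tune` separately on test recordings for each acoustic model would yield a different threshold per model, which matches the point that the U.S. number differs from the UK number.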