While the Xbox 360's Kinect has proven popular with the mass consumer, developing games that accurately reflect player movement, and really take advantage of the 3D motion-sensing capabilities, has been a major challenge.
Here, David Quinn, who works at Microsoft's Rare studio in the UK as a Kinect engineer, details the different challenges he has encountered when developing for the system, and how he's handled them, over the course of creating Kinect Sports and its sequel.
How do you do a game like darts, where most of the player's arm is occluded by the body? How do you handle golf, when the Kinect camera loses track of the player's arms during the swing? How do you handle different accents across the UK and the U.S.?
Since Rare is a Microsoft first party, does the stuff you write end up going back into the Kinect SDK?
DQ: There are a couple of things Rare has done that have gone into the SDK. The avateering system, we did that at Rare; that's what takes the 20-joint skeleton and turns it into the 70-joint avatar. And this machine learning system that we've recently built with the platform team for Kinect Sports 2, we helped out with that as well. They did the more mathematical side, and we worked on the tools.
Have you seen implementations of Kinect in third party games that have impressed you or that do things that you weren't expecting?
DQ: Sure. What Mass Effect has recently done with Kinect's speech system is an excellent use of speech. We pushed speech in Sports 2; that was always going to be a huge thing for us. It was going to be a key thing, a differentiator from Sports 1. But what the Mass Effect guys have done is bring it into a core title, showing it could be used with a controller. It doesn't have to be the "get up and dance" kind of experience. You can use speech in Kinect in a more core title, and it really demonstrated that. I think from here on in you'll see a lot of speech in core games.
Are you primarily concentrating on the skeleton and the visual tracking, or do you work a lot with speech as well?
DQ: I work with both of them, yeah. It's odd; Kinect is like a brand, but it's actually a group of technologies, really. I'm kind of the Kinect rep at the studio, so I kind of touch both. I did all the speech work for Sports 2, basically by myself, and then quite a bit of gesture work as well. The machine learning system in golf was kind of my responsibility as well.
Can you describe what that accomplishes?
DQ: For golf, the major problem is the player's side faces the camera, so we don't actually get a great feed off the skeleton tracking system, because the back half of the body is completely occluded. All those joints are kind of inferred, basically. It gives a good guess of where it thinks each joint is, but that guess has no real meaning.
So when the player does a backswing, the camera loses their hands a little, and for detecting when they do a forward swing, we worked out a hacky job -- "hacky" is a bad word -- an unscientific job of running the animation. But when the player actually hits the ball and it flies off into the air, that has to be very reliable, because getting it wrong is so detrimental to gameplay. Obviously, that's the entire game: hitting the ball.
So, in the early days of golf, we kind of had it so you could do a full backswing and then just kind of drop your hands, because you didn't want the ball to go, but our hand-coded system would actually release the ball.
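To make that concrete, the sketch below shows the kind of hand-coded, threshold-based heuristic being described: driving the swing off recent hand motion while distrusting joints the tracker can only infer. The frame structure, joint names, and thresholds are illustrative assumptions, not Rare's actual code.

```python
# Hypothetical sketch of a hand-coded swing heuristic, not Rare's actual code.
# Assumes each skeleton frame gives a hand-joint position plus a flag saying
# whether the joint was genuinely tracked or merely inferred by the sensor.
from dataclasses import dataclass

@dataclass
class HandJoint:
    y: float        # vertical position in metres (assumed)
    tracked: bool   # False when the position is only inferred

def classify_swing(prev: HandJoint, cur: HandJoint, dt: float) -> str:
    """Very rough per-frame classification from vertical hand velocity."""
    if not (prev.tracked and cur.tracked):
        return "hold"              # don't trust frames built on inferred joints
    vy = (cur.y - prev.y) / dt
    if vy > 0.5:
        return "backswing"         # hands rising: run the backswing animation
    if vy < -1.5:
        return "forward_swing"     # hands dropping fast: looks like a swing...
    return "hold"

# The weakness is exactly the one described above: dropping your hands after a
# full backswing looks, to a rule like this, much like the start of a real
# swing, so the hand-coded system sometimes released the ball when the player
# never meant to hit it.
```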
That's when we went to the ATG guys, the advanced tech group in Microsoft: "This is kind of what we're seeing. We've got a problem with the golf swing; do you have any recommendations?" They came back with this idea of creating a machine learning system for gestures.
What we basically ended up doing was recording about 1600 clips of people doing golf swings in front of Kinect, tagging in the clip where the ball should release, and then getting the computer itself to work out what's consistent among all those clips.
Then what happens is it creates a trainer and a classifier, and we run that classifier at runtime, so we can pipe a live feed into it and it can go, "Yes, the ball should release now," because it's been trained on a load of clips. It knows when it should happen. When the golf ball flies off, it's done by that system; there's no hand-written code. It's all mathematical.
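As a rough illustration of the pipeline described above (an offline trainer fitted on hand-tagged clips, then a classifier queried against the live skeleton feed), here is a minimal sketch. The windowing, the features, and the choice of scikit-learn's gradient boosting model are assumptions made purely for illustration; the internals of the actual ATG/Rare system aren't public.

```python
# Illustrative sketch only, not the actual ATG/Rare system: fit a per-frame
# "should the ball release now?" classifier from hand-tagged clips, then
# query it on the live skeleton feed at runtime.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

WINDOW = 5  # frames of recent motion per training sample (assumed)

def features(frames):
    """Flatten a short window of skeleton frames (WINDOW x joints x 3 coords)
    so the classifier sees recent motion rather than one static pose."""
    return np.asarray(frames, dtype=float).ravel()

def build_training_set(clips, release_frames):
    """clips: list of per-frame joint-position arrays; release_frames: the
    hand-tagged frame index in each clip where the ball should release."""
    X, y = [], []
    for clip, release in zip(clips, release_frames):
        for i in range(WINDOW, len(clip)):
            X.append(features(clip[i - WINDOW:i]))
            y.append(1 if i == release else 0)
    return np.array(X), np.array(y)

def train(clips, release_frames):
    """The offline 'trainer': run once over the ~1,600 tagged clips."""
    X, y = build_training_set(clips, release_frames)
    return GradientBoostingClassifier().fit(X, y)

def should_release(classifier, recent_frames, threshold=0.9):
    """The runtime 'classifier': pipe in the live feed and ask every frame."""
    p = classifier.predict_proba([features(recent_frames)])[0][1]
    return p >= threshold
```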
At some point, I'd love to have "semi-intelligent conversations" with NPCs by using my voice (as opposed to the hackneyed dialog wheel/tree, where none of the presented choices are things I would ever do or say).
Dynamically interacting with NPCs has been around since the old King's Quest and Ultima games (where you could type in keywords and have characters respond to your queries), and now that the technology is here, can't we improve upon it?
I'm not asking for a completely revolutionary artificially intelligent avatar system (a la Milo); I'd just like to be able to interact in a way that is less "mechanical" (static dialog trees) and more natural (in a way that resembles a "conversation").
Wouldn't it be cool to play an open-world detective game (a la LA Noire) where one component of the game would be interviewing people (witnesses, suspects) to find clues using your voice, and NPCs might respond to specific queries/keywords (e.g. "Where were you Friday night?", "What do you know about Fredrick Pierce?")?
...Or have a way of "bartering" with NPCs over price in an RPG? ("I'll give you $500... How about $580...$520 is my final offer")
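A crude version of that keyword-driven questioning could be hacked together with nothing fancier than pattern matching over whatever transcript the speech engine hands back. Here's a purely hypothetical sketch; the phrases and prices are just the examples above, and none of this is tied to a real game or to Kinect's actual speech API.

```python
# Purely hypothetical sketch: match a recognized utterance against
# keyword-triggered NPC responses, roughly in the spirit of the old
# King's Quest/Ultima keyword parsers.
import re

RESPONSES = [
    (re.compile(r"\bwhere were you\b.*\bfriday\b", re.I),
     "I was at the docks all night. Ask the foreman."),
    (re.compile(r"\bfredrick pierce\b", re.I),
     "Pierce? Haven't seen him since the incident."),
]
OFFER = re.compile(r"\$?(\d+)\s*(?:dollars)?", re.I)

def npc_reply(utterance: str, asking_price: int = 600) -> str:
    # Keyword-triggered lines first.
    for pattern, reply in RESPONSES:
        if pattern.search(utterance):
            return reply
    # Then a tiny bartering rule: compare any offered amount to the asking price.
    match = OFFER.search(utterance)
    if match:
        offer = int(match.group(1))
        if offer >= 0.9 * asking_price:
            return "Fine, it's yours."
        return "Don't insult me. %d or nothing." % asking_price
    return "I don't know anything about that."

print(npc_reply("Where were you Friday night?"))
print(npc_reply("I'll give you $520"))
```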
Seems to me the voice aspect of Kinect has the most potential, and it's the one most criminally underutilized.
On a different note, it's interesting that articles like this one, and the one a few days ago from Harmonix, are coming out just as Steel Battalion is released to terrible reviews that call it flat-out broken and a black mark on Kinect itself.
You might be right (I sure hope not). Google does have a developer API that transcribes as you talk:
http://android-developers.blogspot.com/2011/12/add-voice-typing-to-your-ime.html
(but similar to Siri, it does require internet access). The use of speech on Kinect that I've seen seems half-baked most of the time. (I still laugh at the memory of "Lightsaber... ON!" as presented at E3.) That being said, I've been very pleased with Google's recognition accuracy, even without a grammar of preset "options" to choose from, like Kinect uses.
I'm not sure if the previous Kinect voice-enabled games (e.g. Mass Effect 3) suffer from a "limited grammar" due to technical reasons (i.e. the recognition accuracy just isn't up to par for anything more advanced) or from a lack of "inspiration" (BioWare didn't want to invest much time adapting the experience for Kinect and, in effect, making two separate games).
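For what it's worth, the "limited grammar" approach can be approximated in a few lines: the recognizer only has to snap its guess onto the closest phrase in a small, game-defined list, which is a big part of why constrained systems can stay accurate without a server round-trip. This is just an illustrative sketch, not the actual Kinect speech API.

```python
# Illustrative sketch of grammar-constrained speech matching: instead of
# transcribing arbitrary speech, snap the recognizer's (possibly noisy) guess
# onto the closest phrase in a small, game-defined grammar.
import difflib

GRAMMAR = ["open inventory", "draw weapon", "lightsaber on", "pause game"]

def match_command(recognized_text: str, cutoff: float = 0.6):
    """Return the best grammar phrase for the recognizer's guess, or None."""
    hits = difflib.get_close_matches(recognized_text.lower(), GRAMMAR,
                                     n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(match_command("light saber on"))        # -> "lightsaber on"
print(match_command("tell me about pierce"))  # -> None (out of grammar)
```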
I'd just like to see something from Microsoft moving toward the Natal/Kinect vision they sold us three or four years ago. (I'm not talking full-fledged "Milo" here; I'm just asking for some rudimentary, non-critical character interactions.) It's one of those things where, if you can show us something compelling (and provide us the tools), we'd jump at the chance to offer new, interesting experiences with the tech.
http://www.xbitlabs.com/news/multimedia/display/20120620215832_John_Carmack_Virtual_Reality_Gaming_Is_The_Next_Big_Step.html
which I think was by far the best thing at E3.
http://itunes.apple.com/us/itunes-u/linguistics-lectures/id425738097
On top of context, there's a chaotic pattern of cadence, speed, tonal sweeps, etc., that humans use to understand each other.
If you listen to isolated words from natural language speech it's freakin hilarious.
That said, if Siri was a DARPA project licensed to private industry, I wonder what the military is using right now to monitor phone conversations? I wonder when they will allow private industry to license that? Did IBM's Watson use a speech recognition system, or did they fake it? I wonder if they are reaching out to the game developer community?
Gaming studios have been disappearing over the course of this generation: it's quite startling, and I can't help but fear that Rare is next. In an ideal world, I feel Rare should go the way of Bungie. With Scott Henson at the helm, I doubt it's possible at this point, but in my eyes, it's probably their best bet to survive.