Charging at the Challenges of Kinect
June 25, 2012 Page 2 of 3
Does that have more overhead than other solutions?
DQ: You'd be surprised. 1600 clips sounds like a lot, but the thing is we record them quite quickly. I just wrote a tool, basically, that we ran on five dev kits at once. We had family days at Rare, so everyone would bring in their kids and partners. We wanted a wide cross-section of people doing the swings. Everyone would stand in front of their dev kits, and we would say, "Turn to the side. Do a golf swing." And we would just record them all onto the server.
The other interesting thing is, once we had all those clips, the engineers don't really need to tag them up. We actually gave them to our testers and said, "Here's a hundred clips. Spend the next hour tagging them." They can just go through in a video-editing tool and say, "Here it is. Here it is. Here it is."
So it's not really an engineering-driven problem. That really helps as well. That's basically how we did all those tags. Now that we've done that with golf, we're actually doing that with all of our events.
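As a rough illustration of the tagging workflow described above, the sketch below turns a tester's "here it is" frame mark into per-frame labels that a learning system could train on. The names `TaggedClip`, `event_frame`, and `tolerance` are hypothetical and chosen for illustration; the interview does not describe Rare's actual data format or tools.

```python
from dataclasses import dataclass


@dataclass
class TaggedClip:
    """One recorded swing plus the frame a tester marked (hypothetical format)."""
    clip_id: str
    num_frames: int
    event_frame: int  # frame index where the tester said "here it is"


def frame_labels(clip: TaggedClip, tolerance: int = 2) -> list[int]:
    """Return one 0/1 label per frame.

    A frame is labelled 1 when it lies within `tolerance` frames of the
    tagged event, which gives a small window of positive examples.
    """
    return [
        1 if abs(i - clip.event_frame) <= tolerance else 0
        for i in range(clip.num_frames)
    ]


# Example: a 10-frame clip whose event was tagged at frame 6.
clip = TaggedClip(clip_id="swing_0001", num_frames=10, event_frame=6)
labels = frame_labels(clip, tolerance=1)
```

The point of the sketch is that the only human input is a single frame index per clip, which is why the job can be handed to testers rather than engineers.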
Is it a more effective way to determine natural motion -- the kind of motions players will do?
DQ: It's another tool in the tool belt, basically. The machine learning system we use in golf is very discrete; it's good at detecting specific events: the ball should release now. Table tennis, by contrast, is a very analog, skill-driven system, so it's a different kind of gesture.
You have to look at what you're trying to detect and then pick the right tool. Machine learning is just another one of those tools -- a very powerful one. I don't think we could have done golf to the level that we did without having that system.
You've worked with Kinect since the Project Natal days. Has Kinect come further in terms of recognizing people's movements and recognizing multiple people in front of the camera than you actually anticipated?
DQ: Yeah, I think so. Since the Kinect launched, we've had two upgrades to the tracking system: more data sets, more training. Every time that's happened we've seen it getting better and better. Whether it's beyond my expectations, I was pretty blown away the first time I saw it (laughs), so it's a very high bar.
I know you're working on Sports, and that sort of does limit things. You're going to pick specific sports. When you're working with the designers, do they have to come to you and say, "This is the idea we want to do. Can you figure out a way to do this in engineering?" Or is it more of a back-and-forth where you're like, "This is the kind of tracking that is possible"?
DQ: It's definitely a back-and-forth. I'd say for Sports 2, they picked a ton of tough ones for us: darts, baseball, and golf. When they first suggested darts, I was almost in disbelief.
Because your hand's going to be right in front of your face.
DQ: Absolutely. For the precise motion that they wanted, I was almost one of the guys going, "No, no, no. We can't do that." But then you look at it and start asking, "How could we do this?" Darts is actually brilliant; it's one of my favorite games in Sports 2.
Darts uses a system nobody used at all in Sports 1; it's built almost entirely around the depth feed, that image feed of how far everyone is from the sensor. We actually don't use the skeleton as much.
That's not something we really did much in Sports 1. It's just looking at all the information Kinect gives you and working out which bits you should look at to run the system that you want. The skeleton tracking, the depth feed -- all kinds of stuff.
Is it as much about excluding information as it is about including information?
DQ: It's definitely working out the context -- exactly what you're looking at. An example of tailoring the information was the boxing punch in Sports 1. Initially, we were looking at the skeleton feed, thinking that would be the best way to detect it; obviously, as the hand launches forward, that's a punch.
But since the hand's in front of your body, it's one of those occlusion issues. The skeleton feed can struggle with occlusion. So in the end, we turned to the depth feed and painted these panes of glass in front of the player. When you punched through the panes, if they all broke, that was a punch. That's how we did it.
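The "panes of glass" idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Rare's implementation: the depth frame is assumed to be a 2D list of distances in millimetres (0 meaning no reading), and `region`, `body_depth_mm`, `pane_offsets_mm`, and `min_pixels` are hypothetical parameters invented for the example.

```python
def broken_panes(depth_mm, region, body_depth_mm, pane_offsets_mm, min_pixels=4):
    """Report which virtual panes in front of the player have been broken.

    Each pane sits `offset` mm in front of the player's body. A pane counts
    as broken when at least `min_pixels` depth samples inside `region`
    (top, left, bottom, right) are closer to the sensor than the pane.
    """
    top, left, bottom, right = region
    results = []
    for offset in pane_offsets_mm:
        pane_depth = body_depth_mm - offset
        count = 0
        for row in depth_mm[top:bottom]:
            for d in row[left:right]:
                if 0 < d < pane_depth:  # skip 0, which marks "no reading"
                    count += 1
        results.append(count >= min_pixels)
    return results


def is_punch(depth_mm, region, body_depth_mm, pane_offsets_mm, min_pixels=4):
    """A punch is registered only when every pane has been broken."""
    return all(broken_panes(depth_mm, region, body_depth_mm,
                            pane_offsets_mm, min_pixels))


def make_frame(fist_depth_mm):
    """Build a toy 6x6 frame: body at 2000 mm with a 2x2 'fist' patch."""
    frame = [[2000] * 6 for _ in range(6)]
    for r in (2, 3):
        for c in (2, 3):
            frame[r][c] = fist_depth_mm
    return frame


offsets = (200, 350, 500)   # panes at 1800, 1650 and 1500 mm
whole_frame = (0, 0, 6, 6)
full_extension = is_punch(make_frame(1400), whole_frame, 2000, offsets)
half_extension = broken_panes(make_frame(1700), whole_frame, 2000, offsets)
```

Because the check only asks whether something has crossed each plane, it does not depend on the skeleton tracker locating the hand, which is what makes it tolerant of the occlusion problem described above.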
It's one of those instances of taking the consistent information the game is receiving, and trying to look at specific bits [of that information]. That can vary from sport to sport depending on whether it's an analog-y moving game, or a precise dart throw, or a specific moment like a golf swing for when we want to release the ball. So they're all quite different problems.