2D cameras as game peripherals already exist, but 3DV's work in progress, called the Z-Cam, is designed to sense images in full 3D using infrared. 3DV's marketing VP Tomer Barel outlined to Gamasutra how the company's depth sensing and gestural recognition technology could lead to new methods of game control.
Gamasutra spoke to Barel about the Z-Cam, which the company is aiming to release in the second half of 2008 for as yet unrevealed platforms, and he explained how the camera might work for development, synchronize with games, and change the way players interact with them.
Can you give me a picture of what differentiates this camera from some of the other pointing and sensing peripherals that have already been released on the market?
Tomer Barel: From a purely technical point of view, in any pointing device that you have, you have one, two -- or as many pointing devices as that might have -- points in space. You can add some information about that, but still these are points in space. What you have here is complete coverage of the whole field of view of the camera, in terms of depth information, or distance information.
So it's the difference between one or two points, to the whole area. And this allows you to really have control of games however is fit, or most natural, to control the game. So this is real natural input, while, if you think about it, doing boxing with these two pointers in your hands is unnatural.
It's more natural than using a traditional controller, that's true, but it's unnatural still. And the way people play boxing, as I said earlier, demonstrates that. When you have full coverage of the whole field of view, either of your whole body, or whatever part of the body you want to use, you can be much more natural.
And you talked about how this sends its own light pulses forward from the sensor array on the camera, so it's not reliant on the lighting, and since it does depth, it's not reliant on the background; can you talk a little bit about that?
TB: So, if you compare this to standard 2D cameras, the way this works involves transmitting active illumination. That solves the light problem -- which is a big problem with 2D cameras, and the EyeToy, or Microsoft Vision. That's, obviously, not because they're not good cameras, they are good cameras, good technologies, but there are limitations to what you can do without light with 2D.
Here, you have your own illumination source, that enables you to work in any lighting conditions, including complete darkness -- there's no problem with that. That's one thing, the other thing that makes many people critical about 2D cameras is the background issues. And again, in 2D you cannot really separate your mother in the background from yourself.
Because it doesn't understand that one is the gamer and the other is someone that has nothing to do with it. Here, based on distance, you can decide to open a window, or just refer to a certain area, and have like a wall behind that area, and ignore everything else. So you can have a wall, a kind of a virtual wall around the player, and ignore whatever is in the background.
That's another big advantage over 2D cameras. I would say another thing that is, I think, even more important than the other two things that I mentioned, compared to other 2D cameras, and that is the fact that -- it's not only when the background is far apart, it's also your own background, or your own body.
You cannot detect gestures that are in front of the body. If you think about all the games you're playing with cameras that are 2D, they're always motion to the side. And that is very unnatural, because we do not do a lot -- if you think of our day-to-day lives -- we do not do a lot of things to the side. We do not do like this [gestures like someone playing EyeToy.]
We interact to the front. That's almost impossible to detect with 2D information; that's very easy to detect with 3D information. Even if it's not big movement. It could be static, still something, anything, is detectable because of the depth information.
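To make the "virtual wall" idea concrete, here is a minimal Python sketch of depth windowing. It assumes a depth frame arrives as rows of per-pixel distances in centimeters; the frame layout, the function name `player_mask`, and the thresholds are all hypothetical and are not taken from 3DV's software.

```python
def player_mask(depth_cm, near_cm, far_cm):
    """Return a mask that is True where a pixel lies inside the depth window.

    depth_cm is a list of rows of distances in centimeters (assumed layout).
    Anything closer than near_cm or farther than far_cm is treated as
    background and ignored, which acts like a virtual wall around the player.
    """
    return [[near_cm <= d <= far_cm for d in row] for row in depth_cm]


# Hypothetical 3x4 frame: the player is about 1.2 m away, the room wall at 3 m.
frame = [
    [300, 300, 120, 300],
    [300, 115, 118, 300],
    [300, 300, 300, 300],
]

mask = player_mask(frame, near_cm=50, far_cm=200)
```

Because the test is on distance rather than color or motion, a hand held in front of the torso still falls inside the window, which is the case Barel says 2D cameras struggle with.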
One thing that you showed me was a current application to drop out the background and replace it with video, with the implication that it could be used with games that immersively take place around you. Could you talk about that a little bit?
TB: We have here two complementing capabilities: One is the ability to recognize gestures -- that I was talking about -- and the other, also based on the depth information, the capability to replace background. Because as I said, you can ignore the background, and if you can ignore it, you can easily replace it.
We do that real-time, sixty frames per second. You just replace it with whatever you want. The combination is that you can put yourself in a virtual environment, or the game environment, and control the game with gestures.
So actually for game design, you have two options: One is to control games by gestures and be represented either as an avatar instead of you, or in a first-person, kind-of have your hands or whatever, kind-of virtual hands, without the concept of your own image; and the alternative is to have your own image in the game, as you see in some of the EyeToy games, but also use the control of gestures when you are in the game.
We can see a lot of potential for that in social networking -- anything that is online -- have yourself in the game. Not all gamers would like that, but in some situations, that's relevant for social networking, for MMOs, etc.
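The background replacement Barel describes can be sketched as a per-pixel choice between the camera image and a substitute backdrop, decided by depth. This is only an illustration under assumptions: the three frames are taken to be equally sized lists of rows, and `replace_background` and `far_cm` are hypothetical names, not part of any 3DV interface.

```python
def replace_background(color, depth_cm, backdrop, far_cm):
    """Keep camera pixels nearer than far_cm; use the backdrop everywhere else.

    color, depth_cm and backdrop are equally sized lists of rows (assumed).
    """
    out = []
    for color_row, depth_row, back_row in zip(color, depth_cm, backdrop):
        out.append([
            c if d <= far_cm else b
            for c, d, b in zip(color_row, depth_row, back_row)
        ])
    return out


# Hypothetical 2x2 frames, with strings standing in for pixel values.
color = [["wall", "face"], ["wall", "hand"]]
depth = [[300, 120], [300, 110]]
backdrop = [["sky", "sky"], ["grass", "grass"]]

composited = replace_background(color, depth, backdrop, far_cm=200)
```

In this toy frame the two "wall" pixels are swapped for the backdrop while "face" and "hand" are kept, so the player appears inside the new scene.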
You were showing me the detection, and it seemed like it could detect five fingers and your palm pretty easily; is that, would you say, an accurate representation of what its capabilities are, presently?
TB: I also showed you the full skeleton. It can recognize whatever...
But I mean in terms of 'number of points'.
TB: It could generate any number of points. I mean, the thing to understand is: What the camera generates is the depth information. After you have the depth information, you decide what image processing algorithms to use to analyze this data.
So if you invest a lot in algorithms, you can have fifty points on your body, and generate, and recognize your head, your palms, your elbow, your shoulders, your feet; whatever you want. It's just a matter of what you need, and what you invest in terms of processing.
We are working, as a service to the developer community, on skeleton tracking. At the moment, we already have it recognizing the head, the two palms, and the torso; and on recognizing fingers, because we believe that this is the basis of developing further things. But, based on the application, if the application requires recognizing any other point on the body, there is no problem doing that. It's just a matter of investing in the processing to do that.
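As a toy example of deriving a "point" from depth data, the sketch below picks the pixel closest to the camera, which in a simple pose might be an outstretched hand. It is not 3DV's skeleton tracking, which would be far more involved; the frame layout and the name `nearest_point` are assumptions made for illustration.

```python
def nearest_point(depth_cm):
    """Return (row, col, distance) of the closest pixel in the frame.

    depth_cm is a list of rows of distances in centimeters (assumed layout).
    Returns None for an empty frame.
    """
    best = None
    for r, row in enumerate(depth_cm):
        for c, d in enumerate(row):
            if best is None or d < best[2]:
                best = (r, c, d)
    return best


# Hypothetical frame: torso at about 150 cm, one hand pushed forward to 110 cm.
frame = [
    [300, 150, 150, 300],
    [300, 150, 110, 300],
    [300, 150, 150, 300],
]

point = nearest_point(frame)  # (1, 2, 110)
```

Each additional body point would need its own analysis pass over the same depth data, which matches Barel's remark that the number of points depends on how much processing is invested.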
What kind of resolution does a version that could go to market generate? What resolution image is it?
TB: Well, we haven't made a decision about the concrete specs of the camera that will go to the market. I can say that this camera that you see here, the Z-Cam, has an X-Y resolution of 320 by 240, and that's definitely enough to recognize fingers, as you could see.
That's in X-Y, and in depth, it has a resolution of 1 centimeter or more. Now there is a big advantage to the technology here, because you can open a window around yourself, and then the resolution spreads across a smaller area. So, the 256 values are spread across that window, and the resolution is as high as you need.
If you are not interested in the whole range of the camera, you can limit yourself to around the gamer, and get better resolution, in terms of depth.
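The arithmetic behind the depth window can be worked through in a few lines of Python. The sketch assumes the 256 depth values mentioned in the interview are spaced evenly across whatever window is selected; that linear spacing, and the function names, are assumptions rather than published Z-Cam specifications.

```python
LEVELS = 256  # depth values mentioned in the interview; even spacing is assumed


def depth_step_cm(near_cm, far_cm, levels=LEVELS):
    """Distance covered by one depth level when the window spans near..far."""
    return (far_cm - near_cm) / levels


def quantize(distance_cm, near_cm, far_cm, levels=LEVELS):
    """Map a distance inside the window to an integer level in 0..levels-1."""
    if not near_cm <= distance_cm <= far_cm:
        raise ValueError("distance lies outside the depth window")
    step = depth_step_cm(near_cm, far_cm, levels)
    return min(int((distance_cm - near_cm) / step), levels - 1)


full_range = depth_step_cm(50, 300)   # 0.9765625 cm per level over 0.5 m to 3 m
windowed = depth_step_cm(100, 200)    # 0.390625 cm per level over a 1 m window
```

Under that assumption, the prototype's full range of half a meter to three meters works out to roughly one centimeter per level, while a one-meter window around the player gives steps of about four millimeters.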
I see. So if you limit it from the perspective of the depth that you pick, the movement control could be finer, basically.
TB: Yes. Exactly.
And about how far back could you potentially go from the camera, and still have it be an effective input method? Is that something you've tested?
TB: This prototype has a range of half a meter to three meters, or ten feet. And this is because this is what we seem to need for living room applications. The technology we have used in previous versions has been used up to a hundred meters. So, there is no real limit; you just need much more light, and that will make it much more costly to buy.
It's just a matter of what you want it for, and we don't want you to pay for things that you don't need the camera for. So this is why we limit ourselves to three meters.
And, speaking of costs, you talked about why you're not nailing down the cost of a potential production version yet. You think it will be in a comfortable range for mainstream applications. Could you talk about where you see that right now?
TB: Well, first of all, the camera will not be released branded as a 3DV camera, so we will have a partner working with us -- obviously, a "bigger partner" could be a publisher bundling a game with this, it could be one of the first parties, it could be a hardware player of any other sort.
I guess, any pricing decision will be made together. But what I can say in terms of the cost is, as you said, it will be comfortably in the zone that is relevant for peripherals in the gaming arena, and it will be definitely possible to sell it as a bundle for the kind-of $69.99 price point that one can feel comfortable about. Or, sell it together with any webcam that is on the shelves in Best Buy or Fry's. Similar price-point.
So it's definitely comparable to the PlayStation Eye, or even just to, say, a Guitar Hero controller, potentially.
TB: Oh, the Guitar Hero controller is much more expensive. Although, if you think about Rock Band -- people who say that you cannot charge for peripherals should open their eyes and see what's happening with Rock Band, and how many units they've sold. This will be definitely cheaper than Rock Band. It will be comparable in range to the PlayStation Eye, definitely.
One thing we talked about is that there is both a potential in the casual games, in the Wii Sports sort of vein, and for using this as a peripheral for more serious gamer-games, like an FPS or an RTS. Can you talk about how you see those two different, similar potentials rising?
TB: Because of the way that the market has developed around the Wii, and what we see happening around us, the natural thing to think about is: Instead of playing boxing with these two silly things in your hand, and doing it very quickly, you can do real boxing. Or you can do real baseball -- you can take your own bat from your room, and do the whole thing. And the camera, if the sophistication is enough, can detect the subtleties of how exactly you have held the bat, and how you hit, and you can even use gestures for the capture, and all that -- I'm not that familiar with baseball, because we don't play baseball in Israel!
But, it seems like a good direction. That's just one direction to kind-of take the game -- casual games, sports games -- and make the experience more natural, either with holding something, or not holding, depending on the concrete application.
Another direction is to use this for hardcore gaming. And we think the same kind of logic that makes you think, "Well, let's hold something that looks like a real baseball bat, or is a real baseball bat," holds for action games and first-person shooters.
You can even combine a traditional controller and some relevant body gestures. You could duck, you could dodge, you could jump, you could sneak behind something. You could use your hands to do things that today are done kind-of with touching some buttons, like open a lock, throw stuff, throw a grenade, pick up a weapon, etc., and you can go all the way to thinking about new control schemes that are more convenient to control games.
Think about RTS: How difficult and unsuccessful it is for consoles. The reason is that there hasn't been found, so far, a control scheme that is convenient, that is nearly as comfortable for people as the mouse and PC. And we think that with your two hands, you can grab a unit, you can put it somewhere, you can use sort-of iPhone-like gestures to zoom in, zoom out, move your map, build something with your hands... So, just using your two hands, you can get an accuracy that is much better than any pointer, because you're used to that.
We can measure, because of the depth information, with respect to your eye, and this combination can create a control that can make RTS work on consoles maybe for the first time. So there's a wide range of genres and games that you can apply this control scheme to.
I can also envision a control scheme where you'd have your controller for things that are natural for your controller, but then there are things you could manipulate -- because you could have this camera basically running all the time while you play, but it isn't required for every aspect of...
TB: Absolutely. Absolutely. We definitely don't think that -- there are things that the controllers we are used to using for many years are good for, and we are not trying to say, "Let's throw them away." But this could be a real enhancement to games that fit the controller. Because most games you use controllers for -- I mean, you use controllers for every game, but -- even for games that fit the controller, you do a lot of things that can be much more fun if you used your body for them.
Like the FPS, the action games; think about spells, think about, I don't know, Harry Potter. How much fun -- they are now trying this with the Wii, but think about the context of using your hands to do this; holding a real wand, whatever wand you want, not this piece of metal; using both your hands, detecting very fine gestures that you do with your finger -- all this can make even a controller-based game much more fun.
One thing I want to talk about is, you talk about a 'handlebar' that you guys mocked up to do a racing game; could you talk about that a little bit?
TB: Yeah, so, again this is a concept -- the camera is not limited to body. It just looks at what's in front of it, and recognizes what is happening. So any shape can be a part of what the camera recognizes, and to illustrate that, we thought: "Well, there are many games that you don't want to play with your bare hands; you want to hold something."
So we tried that with a motorcycle game. We bought a grip in a bike shop -- a plastic grip -- and we used that to control the gas. So in order to make it very easy to detect very fine movements of the grip, we put reflective tape -- which costs, I don't know, a cent or something -- on the grip, and every time you rotate it, you see the tape moving, and you can distinguish very fine movement between gas on and off, and even level of gas.
And think about the potential of this in the wider context: It doesn't have to be a grip. It could be a baseball bat, it could be a ball that you throw at something, it could be a basketball, it could be a sword, it could be a lightsaber, it could be anything that you would want to hold as part of the game. So the opportunities, the possibilities, are endless.
Are there any current partnerships that you're pursuing, or currently involved with, or are you just looking for people to work with on this project?
TB: I cannot comment on relationships that we have in this industry, and people outside of the industry, because of NDAs.
But you are still looking for people to work with, or interested parties?
TB: We are talking to many people now. We are mainly looking now for developers to experiment with the camera, and think about -- because again, as I said, the possibilities are endless, and what we are looking for is the creative power in the industry.
So we now have the Z-Cam in larger quantities, and we are really interested in developers coming -- independent developers or whatever -- in coming to us and cooperating around creating content. That's the main goal of this interview and launching, announcing this product.
And when do you think that it could potentially reach the market?
TB: Again, this depends. As I said, we are not going to release this on the market on our own; we're going to partner, and it depends on the relationships that we have, and again I cannot comment on specifics. What I can say is that the product is going to be ready for mass production in the second half of 2008.