It is quite common to use terms taken from spoken language and storytelling as metaphors for the game camera's point of view. 1st person shooters put you in the eyes of the character, with your virtual hands or weapon visible at the bottom of the screen. In 3rd person games you see your avatar from behind. I usually use the term 2nd person experience for looking at the front side of your character, in a mirror-like visualization.
At first, 3rd person looks like an easy direction for our gesture games (especially if you consider the 'On-Body' ideas presented in the previous post). However, if your game also involves side-facing interactions, 3rd person POV introduces fundamental difficulties I didn't anticipate.
If the tracking system is based on data collected from a front-mounted camera (such as the Kinect, or the most common installation of the PrimeSense sensor), the best-tracked limbs will be those closer to the TV set. The arm and leg that are farther from the set have a high probability of suffering from occlusion. Naturally, the tracking of occluded limbs is less accurate and depends heavily on statistical pose models.
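As a rough illustration (not PrimeSense's actual pipeline), here is a minimal sketch of how a game layer might soften this effect, assuming the tracker reports each joint as a position plus a confidence value, which OpenNI-style skeletons typically do in some form. Low-confidence joints keep most of their previous pose instead of jumping along with the statistical model's guesses:

```cpp
#include <cstdio>

// Hypothetical per-joint sample. Real trackers (e.g. OpenNI/NiTE skeletons)
// expose a similar position-plus-confidence pair, but names and ranges differ.
struct Joint {
    float x, y, z;     // position in sensor space (meters)
    float confidence;  // 0.0 = pure guess, 1.0 = well observed
};

// Blend a new sample into the pose we render: a joint that is probably
// occluded (low confidence) keeps most of its previous, stable value instead
// of jittering along with the statistical pose model's guesses.
Joint filterJoint(const Joint& previous, const Joint& sample,
                  float minConfidence = 0.5f)
{
    if (sample.confidence >= minConfidence)
        return sample;                 // well tracked: trust it as-is

    Joint out = previous;              // mostly keep the old pose
    const float pull = 0.2f;           // small pull toward the new guess
    out.x += pull * (sample.x - previous.x);
    out.y += pull * (sample.y - previous.y);
    out.z += pull * (sample.z - previous.z);
    out.confidence = sample.confidence;
    return out;
}

int main() {
    Joint prev     = {0.30f, 1.20f, 2.00f, 0.9f};
    Joint occluded = {0.90f, 1.00f, 2.40f, 0.1f};  // far arm hidden behind the body
    Joint shown = filterJoint(prev, occluded);
    std::printf("rendered x = %.2f (raw guess was %.2f)\n", shown.x, occluded.x);
    return 0;
}
```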
Let's go back to the POV question: a 3rd person representation shows the user the back side of his avatar. When the user faces sideways, the occluded limb becomes the most visible one on screen and draws most of the attention. This effect is sometimes so strong that players will think something is wrong with the tracking algorithms.
In 2nd person / mirror view, the non-occluded limbs are also the ones most visible to the user. The inaccuracy of the occluded limb is easier to forgive, as it appears occluded in the avatar's rendering too.
I am not claiming we should avoid 3rd person completely, merely pointing out the challenges involved and the considerations one should be aware of from the game design phase.
Some historical examples for 1st and 3rd person views:
1st person history: 3D Monster Maze (1981)
1st person shooters history: Wolfenstein 3D (1992)
3rd person history: Tomb Raider (1996)
Simple game scenes
When people ask how you naturally walk forward in a gesture game, I usually warn of the possibility of hitting the TV set. Navigation is one of the biggest challenges for 3D games in general, not only gesture games. Regardless of the chosen paradigm, walking in a 3D virtual world represented on our 2D displays can be a very frustrating experience: it's hard to get a good sense of the depth of the rendered items and scene walls. This is also why so many 3D platform games suck. Jumping to a higher platform and realizing it is too far only after falling into a lake of lava is not an enjoyable experience! In the end, only extremely talented level designers manage to create fun 3D platformers. For this blog, let's be lazy this time and postpone the navigation challenge to future posts…
Rail game
Actually, there are many fun experiences you can create without dealing with the navigation problems at all:
- Static camera shooting games, where enemies pop out from behind cover or come closer to the avatar to practice some martial arts
- 'On-the-rails' shooters, where the camera motion is predefined. The camera can stop when reaching enemies or when the avatar is behind cover, and continue when the area is cleared
- 2nd person martial arts and melee combat
- 2nd person dancing games
Some historical examples for static scene + on-the-rails games:
Static camera shooters history: Prohibition (1987)
Static camera shooters history: Operation Wolf (1987)
Rails history: Operation Thunderbolt (1988)
Rails history: House of the Dead
In AngryBotsNI, we tried to play with different POV schemes:
- You begin in a 2nd person 'get to know your avatar' scene
- Once you learn how to teleport, you can jump to different scenes or levels
- Each level has a different POV, so you can also experience the 3rd person view
Microsoft Kinect did exceptionally well in bringing a full-body gesture gaming platform to the masses. The system consists of a 3D sensing solution from PrimeSense, together with a fancy 4-microphone beamforming array, speech recognition algorithms and top-notch computer vision algorithms running on the Xbox. It's not only HW and algorithms: the Microsoft Studios division also did a great job in bringing new experiences, such as those found in Kinect Adventures, and newer interaction models found in some of the Kinect Fun Labs titles (check out Air Band!).
But putting aside several success stories, such as Kinect Sports, The Gunstringer, Halfbrick's Fruit Ninja Kinect and Harmonix's Dance Central, the majority of the available Kinect games leave much to be desired. It looks like gesture gaming creates ultra-casual experiences at best. In some cases it even seems like a step backwards from the Nintendo Wii catalog.
Of course, the current generation of the technology has its own limitations, such as a limited field of view (FOV) and less-than-finger resolution. However, the fidelity of the available input is still light years ahead of anything else we had in the past.
Until recently, the degrees of freedom available in the controller were predefined and out of the scope of the game designer (Harmonix's Guitar Hero and Rock Band are of course an exception). The controller was something pre-defined by the console maker, the PC HW vendor, or the OS (i.e. if you use a Windows machine, you can assume a keyboard and a two-button wheel mouse). On the other hand, endless degrees of freedom are not only a blessing: it takes a tremendous amount of design and research to define a successful new control scheme (think of how all the 1st person shooters on the PC converged to a nearly identical keyboard and mouse mapping).
Atari 2600 joystick
Xbox 360 controller
Back in the 80s, the Atari 2600 featured a single-button, 8-direction joystick and a paddle. The input was well defined and accessible to both gamers and the rest of us. The NES added more buttons, and, addressing gamers' endless requests for power, the latest generations of controllers found on the Xbox and PlayStation feature 2 analog sticks, 1 D-pad, 4 action buttons and 4 triggers. They are masterpieces of design and ergonomics. If you manage to get the hang of them, you get an exceptional level of control, most probably exceeding what is found in many professional and even military devices. The F-16 joystick is years behind what the average teen uses to fly his virtual battlefield!
PS Move
Xbox Kinect
The downside of this evolution is that it left behind the majority of humanity. Those not interested in spending the time necessary to master all the buttons and sticks have drifted apart from those who are. We now call the two groups 'gamers' and 'casual gamers'. After my first play experience with the Sony PS Move controller, I tried to compare it to Kinect and reached two main conclusions: "It's better in that you have a button under your pointing finger. But it's worse because of all the rest of the buttons there… WTF!?"
Yes, I am old now. I don't have the patience to learn what to press; it's too much. I just keep pressing buttons until something happens. But I certainly refuse to be cast out of the gamers' club!
One aspect of the GUI revolution was making the display 'soft'. This means the application can alter the display according to its state, to reflect the most useful controls at any given time. The contrast of an analog mechanical gauge might be better than its LCD representation, but the benefit of being able to change the display dynamically makes it worth the sacrifice. Gesture interfaces, and the NUI direction in general, are 'softening' the input device. You can create a 'virtual' controller that best suits the current application. You have a wheel when you need to drive and a sword when you need to cut (only fruits, of course! Make salad, not war!)
Going back to the main topic: I believe it is entirely possible to create responsive and challenging gesture games. Games that will be really fun, and not only in 'party mode'.
Many people imagine gesture control as an interface in which users need to memorize complicated gestures that, once detected, trigger some kind of operation. So gestures are simply a complicated full-body encoding of the controller buttons? Doesn't sound fun to my taste. I am much more in favor of 1:1 gesture mapping, where motions are mapped analogically onto control parameters.
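To make the distinction concrete, here is a minimal sketch of 1:1 mapping, assuming we already get hand and shoulder positions in meters from the skeleton tracker. Instead of detecting a discrete 'steer left' gesture, the hand's lateral offset relative to the shoulder is mapped continuously onto a steering value in [-1, 1], the same range an analog stick would produce (the reach constant is a made-up tuning value):

```cpp
#include <algorithm>
#include <cstdio>

// Continuous ("analog") gesture mapping: the lateral hand offset relative to
// the shoulder becomes a steering value, clamped to the analog-stick range.
float steeringFromHand(float handX, float shoulderX, float maxReach = 0.45f)
{
    float offset = (handX - shoulderX) / maxReach;   // normalize by arm reach
    return std::max(-1.0f, std::min(1.0f, offset));  // clamp to [-1, 1]
}

int main() {
    std::printf("hand 0.30 m right of shoulder -> steering %.2f\n",
                steeringFromHand(0.30f, 0.0f));
    std::printf("hand 0.60 m right of shoulder -> steering %.2f\n",
                steeringFromHand(0.60f, 0.0f));
    return 0;
}
```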
PrimeSense, as the provider of the 3D technology, created several concept gaming interactions over the years. Thankfully, many of those demos are available on the OpenNI Arena website.
Initial prototypes demonstrated a boxing experience, as well as basic body mapping: moving an avatar around a boxed game area.
Once the skeleton tracking algorithms were stable enough, we also demonstrated full-body tracking. Mapping the motion to Ogre3D's Sinbad character was an instant success, and not only due to the excellent work of the original modeler or the talented computer vision researchers. You could draw the swords Sinbad carries on his back merely by bringing both hands behind your neck. Drawing the swords was an extremely pleasing experience. You could easily imagine battling enemies attacking Sinbad (even though this demo does not have any enemies). This was the first demonstration of a concept in gesture games I call 'Items on Body':
- You don't need to memorize gestures, as the interactions are defined by the visible items on your avatar. It's natural to touch what you see.
- Since the interactions are mapped to your body, you don't encounter the depth perception problems often experienced when trying to touch items in the virtual world. You can even enjoy 'muscle memory' once you know how to operate your items. The swords, for example, are always on your back. In the heat of battle you can keep your eyes on the enemies, just as you would with real weapons you have mastered.
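To make the idea concrete, here is a rough sketch of an on-body hot zone (my own illustration, not the code of the actual demo): the sword grip is defined relative to the neck joint, so it travels with the player and requires both hands, Sinbad-style. The offsets, radius and coordinate convention are all assumptions for the example.

```cpp
#include <cmath>
#include <cstdio>

// Skeleton joints in meters; this sketch assumes +z points away from the
// sensor, i.e. behind a player who faces it.
struct Vec3 { float x, y, z; };

static float dist(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// The sword grip is not a fixed point in the world: it is defined relative to
// the neck joint, so it follows the player and requires no depth judgment.
// Both hands must be inside the sphere for a two-handed draw.
bool swordsGripReached(const Vec3& neck, const Vec3& leftHand,
                       const Vec3& rightHand)
{
    const Vec3 grip = { neck.x, neck.y + 0.10f, neck.z + 0.15f };  // behind the neck
    const float radius = 0.20f;                                    // generous hot zone
    return dist(leftHand, grip) < radius && dist(rightHand, grip) < radius;
}

int main() {
    Vec3 neck = { 0.00f, 1.50f, 2.00f };
    Vec3 lh   = {-0.05f, 1.58f, 2.12f };
    Vec3 rh   = { 0.06f, 1.57f, 2.14f };
    std::printf("draw swords: %s\n", swordsGripReached(neck, lh, rh) ? "yes" : "no");
    return 0;
}
```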
This all sounds good and easy, but reality is always more challenging. To actually make it work, you will also need to deal with retargeting and false activations.
If the avatar model's body proportions are significantly different from those of the user, naively mapping the user's skeleton joints to the avatar will have undesired results. Imagine the user touching his head while the avatar's arms are much shorter. If you implement the detection with a collision box on the avatar, it will never be triggered. You can implement a collision box on the user's skeleton instead, but then you sacrifice the learning curve, as learning to interact with new items will no longer follow the visible interaction of the avatar's hand with the item. This can be solved by a smarter retargeting algorithm that takes the proportion differences into account, or by scaling the model's joints to match the user's dimensions.
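Here is a minimal sketch of the proportion-correction idea: the user's hand offset relative to the shoulder is multiplied by the ratio between the avatar's arm length and the user's arm length, so the avatar's hand reaches its own head exactly when the user's hand reaches theirs. Joint names and numbers are illustrative only, not taken from any real rig.

```cpp
#include <cstdio>

// A point in either user (sensor) space or avatar (model) space, in meters.
struct Vec3 { float x, y, z; };

// Scale the user's hand offset by the arm-length ratio before applying it to
// the avatar's shoulder, so both hands reach "full arm's length" together.
Vec3 retargetHand(const Vec3& userShoulder, const Vec3& userHand,
                  float userArmLength, float avatarArmLength,
                  const Vec3& avatarShoulder)
{
    const float s = avatarArmLength / userArmLength;  // proportion correction
    Vec3 out;
    out.x = avatarShoulder.x + s * (userHand.x - userShoulder.x);
    out.y = avatarShoulder.y + s * (userHand.y - userShoulder.y);
    out.z = avatarShoulder.z + s * (userHand.z - userShoulder.z);
    return out;
}

int main() {
    Vec3 userShoulder   = {0.20f, 1.40f, 2.00f};
    Vec3 userHand       = {0.20f, 1.95f, 2.00f};  // user touching their head
    Vec3 avatarShoulder = {0.15f, 1.10f, 0.00f};  // shorter avatar, model space
    // 0.55 m user arm vs 0.40 m avatar arm: the hand lands at full avatar reach.
    Vec3 hand = retargetHand(userShoulder, userHand, 0.55f, 0.40f, avatarShoulder);
    std::printf("avatar hand y = %.2f\n", hand.y);
    return 0;
}
```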
False activations are commonly related to the limitations of the tracking algorithm and its behavior under occlusion, complicated poses, or motion blur. You can safely expect that the user's hand will reach an item's hot zone unintentionally.
False triggering can be compensated for by adding additional activation requirements:
- Temporal requirement: a short pause before triggering the item (during the pause there should be some animation)
- Require two-handed operation (like Sinbad's swords)
- Require touching the item and then moving in a certain direction
  - Examples:
    - To remove the crown, you need to touch it and then raise your hand
    - To remove a sword, dagger or arrow, you need to touch it and move in a sensible direction
  - If, after touching, the hand moves in another direction, the operation is canceled
  - One simple implementation is defining two collision boxes for each item and requiring the user to pass through both in the right order to actually activate the item (a sketch follows this list). As always, it is highly recommended to add as much visual feedback as possible for correct operations (popping the sword out a bit, glowing, playing a sound)
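Here is the two-box implementation sketched in the last point, combined with a short dwell requirement; moving the wrong way cancels the operation. All box sizes, offsets and timings are made-up tuning values, not parameters of any real SDK.

```cpp
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

struct Box {
    Vec3 center, halfSize;
    bool contains(const Vec3& p) const {
        return std::fabs(p.x - center.x) <= halfSize.x &&
               std::fabs(p.y - center.y) <= halfSize.y &&
               std::fabs(p.z - center.z) <= halfSize.z;
    }
};

class ItemActivator {
public:
    // touch = box on the item itself, pull = box in the "sensible direction";
    // the two boxes should overlap slightly so the hand cannot fall in between.
    ItemActivator(Box touch, Box pull, float dwellSeconds)
        : touchBox(touch), pullBox(pull), dwellNeeded(dwellSeconds) {}

    // Call once per frame with the tracked hand joint; returns true when the
    // item is activated (e.g. the sword is drawn).
    bool update(const Vec3& hand, float dt) {
        switch (state) {
        case State::Idle:
            if (touchBox.contains(hand)) { state = State::Touching; dwell = 0.0f; }
            break;
        case State::Touching:
            if (!touchBox.contains(hand)) { state = State::Idle; break; }  // left too early
            dwell += dt;                  // play the glow / pop-out feedback here
            if (dwell >= dwellNeeded) state = State::Armed;
            break;
        case State::Armed:
            if (pullBox.contains(hand)) { state = State::Idle; return true; }  // activated
            if (!touchBox.contains(hand)) state = State::Idle;  // wrong direction: cancel
            break;
        }
        return false;
    }

private:
    enum class State { Idle, Touching, Armed };
    Box touchBox, pullBox;
    float dwellNeeded;
    float dwell = 0.0f;
    State state = State::Idle;
};

int main() {
    Box onSword {{0.0f, 1.6f, 0.15f}, {0.15f, 0.15f, 0.15f}};  // on the item
    Box pullUp  {{0.0f, 1.9f, 0.15f}, {0.20f, 0.20f, 0.20f}};  // "raise your hand"
    ItemActivator drawSword(onSword, pullUp, 0.3f);

    // Hand dwells on the sword for a few frames, then moves upward.
    Vec3 path[] = {{0,1.6f,0.15f},{0,1.6f,0.15f},{0,1.6f,0.15f},{0,1.6f,0.15f},{0,1.9f,0.15f}};
    for (const Vec3& hand : path)
        if (drawSword.update(hand, 0.1f)) std::printf("sword drawn!\n");
    return 0;
}
```

In a real game you would run one such activator per item (and per hand where relevant), and drive the feedback animation from the Touching state.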
Some of those ideas are demonstrated in the new Unity3D integration example: AngryBotsNI