It is quite common to use terms taken from spoken language and storytelling as metaphors for the game camera's point of view. 1st person shooters put you in the eyes of the character, with your virtual hands or weapon visible at the bottom of the screen. In 3rd person games you see your avatar from behind. I usually use the term 2nd person experience for looking at the front side of your character, in a mirror-like visualization.
At first, 3rd person looks like an easy direction for our gesture games (especially if you consider the 'On-Body' ideas presented in the previous post). However, if your game also involves side-facing interactions, 3rd person POV introduces fundamental difficulties I didn't anticipate.
If the tracking system is based on data collected from a front-mounted camera (such as the Kinect, or the most common installation of the PrimeSense sensor), the best-tracked limbs will be those closer to the TV set. The arm and leg that are farther from the set have a high probability of suffering from occlusion. Naturally, the tracking of occluded limbs is less accurate and depends heavily on statistical pose models.
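As a rough illustration (not PrimeSense's actual pipeline), here is a minimal sketch of how a game layer might soften this effect, assuming the tracker reports each joint as a position plus a confidence value, which OpenNI-style skeletons typically do in some form. Low-confidence joints keep most of their previous pose instead of jumping along with the statistical model's guesses:

```cpp
#include <cstdio>

// Hypothetical per-joint sample. Real trackers (e.g. OpenNI/NiTE skeletons)
// expose a similar position-plus-confidence pair, but names and ranges differ.
struct Joint {
    float x, y, z;     // position in sensor space (meters)
    float confidence;  // 0.0 = pure guess, 1.0 = well observed
};

// Blend a new sample into the pose we render: a joint that is probably
// occluded (low confidence) keeps most of its previous, stable value instead
// of jittering along with the statistical pose model's guesses.
Joint filterJoint(const Joint& previous, const Joint& sample,
                  float minConfidence = 0.5f)
{
    if (sample.confidence >= minConfidence)
        return sample;                 // well tracked: trust it as-is

    Joint out = previous;              // mostly keep the old pose
    const float pull = 0.2f;           // small pull toward the new guess
    out.x += pull * (sample.x - previous.x);
    out.y += pull * (sample.y - previous.y);
    out.z += pull * (sample.z - previous.z);
    out.confidence = sample.confidence;
    return out;
}

int main() {
    Joint prev     = {0.30f, 1.20f, 2.00f, 0.9f};
    Joint occluded = {0.90f, 1.00f, 2.40f, 0.1f};  // far arm hidden behind the body
    Joint shown = filterJoint(prev, occluded);
    std::printf("rendered x = %.2f (raw guess was %.2f)\n", shown.x, occluded.x);
    return 0;
}
```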
Let's go back to the POV question: a 3rd person representation shows the user the back side of his avatar. When the user faces sideways, the occluded limb becomes the most visible one on screen and draws most of the attention. This effect is sometimes so strong that players will think something is wrong with the tracking algorithms.
In 2nd person / mirror view, the non-occluded limbs are also the ones most visible to the user. The inaccuracy of the occluded limb is easier to forgive, as it appears occluded in the avatar's rendering too.
I am not claiming we should avoid 3rd person completely, merely pointing out the challenges involved and the considerations one should be aware of from the game design phase.
Some historical examples for 1st and 3rd person views:
1st person history: 3D Monster Maze (1981)
1st person shooters history: Wolfenstein 3D (1992)
3rd person history: Tomb Raider (1996)
Simple game scenes
When people ask how you naturally walk forward in a gesture game, I usually warn of the possibility of hitting the TV set. Navigation is one of the biggest challenges for 3D games in general, not only gesture games. Regardless of the chosen paradigm, walking in a 3D virtual world represented on our 2D displays can be a very frustrating experience: it's hard to get a good sense of the depth of the rendered items and scene walls. This is also why so many 3D platform games suck. Jumping to a higher platform and realizing it is too far only after falling into a lake of lava is not an enjoyable experience! In the end, only extremely talented level designers manage to create fun 3D platformers. For this blog, let's be lazy this time and postpone the navigation challenge to future posts…
Rail game
Actually, there are many fun experiences you can create without dealing with the navigation problems at all:
- Static camera shooting games, where enemies pop out from behind cover or come closer to the avatar to practice some martial arts
- 'On-the-rails' shooters, where the camera motion is predefined. The camera can stop when reaching enemies or when the avatar is behind cover, and continue when the area is cleared
- 2nd person martial arts and melee combat
- 2nd person dancing games
Some historical examples for static scene + on-the-rails games:
Static camera shooters history: Prohibition (1987)
Static camera shooters history: Operation Wolf (1987)
Rails history: Operation Thunderbolt (1988)
Rails history: House of the Dead
In AngryBotsNI, we tried to play with different POV schemes:
- You begin in a 2nd person 'get to know your avatar' scene
- Once you learn how to teleport, you can jump to different scenes or levels
- Each level has a different POV, so you can also experience the 3rd person view
Microsoft Kinect did exceptionally well in bringing a full-body gesture gaming platform to the masses. The system consists of a 3D sensing solution from PrimeSense, together with a fancy 4-microphone beamforming array, speech recognition algorithms and top-notch computer vision algorithms running on the Xbox. It's not only HW and algorithms: the Microsoft Studios division also did a great job in bringing new experiences, such as those found in Kinect Adventures, and newer interaction models found in some of the Kinect Fun Labs titles (check out Air Band!).
But putting aside several success stories, such as Kinect Sports, The Gunstringer, Halfbrick's Fruit Ninja Kinect and Harmonix's Dance Central, the majority of the available Kinect games leave much to be desired. It looks like gesture gaming creates ultra-casual experiences at best. In some cases it even seems like a step backwards from the Nintendo Wii catalog.
Of course, the current generation of the technology has its own limitations, such as a limited field of view (FOV) and less-than-finger resolution. However, the fidelity of the available input is still light years ahead of anything else we had in the past.
Until recently, the degrees of freedom available in the controller were predefined and out of the scope of the game designer (Harmonix's Guitar Hero and Rock Band are of course an exception). The controller was something pre-defined by the console maker, the PC HW vendor, or the OS (i.e. if you use a Windows machine, you can assume a keyboard and a two-button wheel mouse). On the other hand, endless degrees of freedom are not only a blessing: it takes a tremendous amount of design and research to define a successful new control scheme (think of how all the 1st person shooters on the PC converged to a nearly identical keyboard and mouse mapping).
Atari 2600 joystick
Xbox 360 controller
Back in the 80s, the Atari 2600 featured a single-button, 8-direction joystick and a paddle. The input was well defined and accessible to both gamers and the rest of us. The NES added more buttons, and, addressing gamers' endless requests for power, the latest generations of controllers found on the Xbox and PlayStation feature 2 analog sticks, 1 D-pad, 4 action buttons and 4 triggers. They are masterpieces of design and ergonomics. If you manage to get the hang of them, you get an exceptional level of control, most probably exceeding what is found in many professional and even military devices. The F-16 joystick is years behind what the average teen uses to fly his virtual battlefield!
PS Move
Xbox Kinect
The downside of this evolution is that it left behind the majority of humanity. Those not interested in spending the time necessary to master all the buttons and sticks have drifted apart from those who are. We now call the two groups 'gamers' and 'casual gamers'. After my first play experience with the Sony PS Move controller, I tried to compare it to Kinect and reached two main conclusions: "It's better in that you have a button under your pointing finger. But it's worse because of all the rest of the buttons there… WTF!?"
Yes, I am old now. I don't have the patience to learn what to press; it's too much. I just keep pressing buttons until something happens. But I certainly refuse to be cast out of the gamers' club!
One aspect of the GUI revolution was making the display 'soft'. This means the application can alter the display according to its state, to reflect the most useful controls at any given time. The contrast of an analog mechanical gauge might be better than its LCD representation, but the benefit of being able to change the display dynamically makes it worth the sacrifice. Gesture interfaces, and the NUI direction in general, are 'softening' the input device. You can create a 'virtual' controller that best suits the current application. You have a wheel when you need to drive and a sword when you need to cut (only fruits, of course! Make salad, not war!)
Going back to the main topic: I believe it is entirely possible to create responsive and challenging gesture games. Games that will be really fun, and not only in 'party mode'.
Many people imagine gesture control as an interface in which users need to memorize complicated gestures that, once detected, trigger some kind of operation. So gestures are simply a complicated full-body encoding of the controller buttons? Doesn't sound fun to my taste. I am much more in favor of 1:1 gesture mapping, where motions are mapped analogically onto control parameters.
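To make the distinction concrete, here is a minimal sketch of 1:1 mapping, assuming we already get hand and shoulder positions in meters from the skeleton tracker. Instead of detecting a discrete 'steer left' gesture, the hand's lateral offset relative to the shoulder is mapped continuously onto a steering value in [-1, 1], the same range an analog stick would produce (the reach constant is a made-up tuning value):

```cpp
#include <algorithm>
#include <cstdio>

// Continuous ("analog") gesture mapping: the lateral hand offset relative to
// the shoulder becomes a steering value, clamped to the analog-stick range.
float steeringFromHand(float handX, float shoulderX, float maxReach = 0.45f)
{
    float offset = (handX - shoulderX) / maxReach;   // normalize by arm reach
    return std::max(-1.0f, std::min(1.0f, offset));  // clamp to [-1, 1]
}

int main() {
    std::printf("hand 0.30 m right of shoulder -> steering %.2f\n",
                steeringFromHand(0.30f, 0.0f));
    std::printf("hand 0.60 m right of shoulder -> steering %.2f\n",
                steeringFromHand(0.60f, 0.0f));
    return 0;
}
```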
PrimeSense, as the provider of the 3D technology, created several concept gaming interactions over the years. Thankfully, many of those demos are available on the OpenNI Arena website.
Initial prototypes demonstrated a boxing experience, as well as basic body mapping: moving an avatar around a boxed game area.
Once the skeleton tracking algorithms were stable enough, we also demonstrated full-body tracking. Mapping the motion to Ogre3D's Sinbad character was an instant success, and not only due to the excellent work of the original modeler or the talented computer vision researchers. You could draw the swords Sinbad carries on his back merely by bringing both hands behind your neck. Drawing the swords was an extremely pleasing experience. You could easily imagine battling enemies attacking Sinbad (even though this demo does not have any enemies). This was the first demonstration of a concept in gesture games I call 'Items on Body':
- You don't need to memorize gestures, as the interactions are defined by the visible items on your avatar. It's natural to touch what you see.
- Since the interactions are mapped to your body, you don't encounter the depth perception problems often experienced when trying to touch items in the virtual world. You can even enjoy 'muscle memory' once you know how to operate your items. The swords, for example, are always on your back. In the heat of battle you can keep your eyes on the enemies, just as you would with real weapons you have mastered.
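To make the idea concrete, here is a rough sketch of an on-body hot zone (my own illustration, not the code of the actual demo): the sword grip is defined relative to the neck joint, so it travels with the player and requires both hands, Sinbad-style. The offsets, radius and coordinate convention are all assumptions for the example.

```cpp
#include <cmath>
#include <cstdio>

// Skeleton joints in meters; this sketch assumes +z points away from the
// sensor, i.e. behind a player who faces it.
struct Vec3 { float x, y, z; };

static float dist(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// The sword grip is not a fixed point in the world: it is defined relative to
// the neck joint, so it follows the player and requires no depth judgment.
// Both hands must be inside the sphere for a two-handed draw.
bool swordsGripReached(const Vec3& neck, const Vec3& leftHand,
                       const Vec3& rightHand)
{
    const Vec3 grip = { neck.x, neck.y + 0.10f, neck.z + 0.15f };  // behind the neck
    const float radius = 0.20f;                                    // generous hot zone
    return dist(leftHand, grip) < radius && dist(rightHand, grip) < radius;
}

int main() {
    Vec3 neck = { 0.00f, 1.50f, 2.00f };
    Vec3 lh   = {-0.05f, 1.58f, 2.12f };
    Vec3 rh   = { 0.06f, 1.57f, 2.14f };
    std::printf("draw swords: %s\n", swordsGripReached(neck, lh, rh) ? "yes" : "no");
    return 0;
}
```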
This all sounds good and easy, but reality is always more challenging. To actually make it work, you will also need to deal with retargeting and false activations.
If the avatar model's body proportions are significantly different from those of the user, naively mapping the user's skeleton joints to the avatar will have undesired results. Imagine the user touching his head while the avatar's arms are much shorter. If you implement the detection with a collision box on the avatar, it will never be triggered. You can implement a collision box on the user's skeleton instead, but then you sacrifice the learning curve, as learning to interact with new items will no longer follow the visible interaction of the avatar's hand with the item. This can be solved by a smarter retargeting algorithm that takes the proportion differences into account, or by scaling the model's joints to match the user's dimensions.
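Here is a minimal sketch of the proportion-correction idea: the user's hand offset relative to the shoulder is multiplied by the ratio between the avatar's arm length and the user's arm length, so the avatar's hand reaches its own head exactly when the user's hand reaches theirs. Joint names and numbers are illustrative only, not taken from any real rig.

```cpp
#include <cstdio>

// A point in either user (sensor) space or avatar (model) space, in meters.
struct Vec3 { float x, y, z; };

// Scale the user's hand offset by the arm-length ratio before applying it to
// the avatar's shoulder, so both hands reach "full arm's length" together.
Vec3 retargetHand(const Vec3& userShoulder, const Vec3& userHand,
                  float userArmLength, float avatarArmLength,
                  const Vec3& avatarShoulder)
{
    const float s = avatarArmLength / userArmLength;  // proportion correction
    Vec3 out;
    out.x = avatarShoulder.x + s * (userHand.x - userShoulder.x);
    out.y = avatarShoulder.y + s * (userHand.y - userShoulder.y);
    out.z = avatarShoulder.z + s * (userHand.z - userShoulder.z);
    return out;
}

int main() {
    Vec3 userShoulder   = {0.20f, 1.40f, 2.00f};
    Vec3 userHand       = {0.20f, 1.95f, 2.00f};  // user touching their head
    Vec3 avatarShoulder = {0.15f, 1.10f, 0.00f};  // shorter avatar, model space
    // 0.55 m user arm vs 0.40 m avatar arm: the hand lands at full avatar reach.
    Vec3 hand = retargetHand(userShoulder, userHand, 0.55f, 0.40f, avatarShoulder);
    std::printf("avatar hand y = %.2f\n", hand.y);
    return 0;
}
```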
False activations are commonly related to the limitations of the tracking algorithm and its behavior under occlusion, complicated poses, or motion blur. You can safely expect that the user's hand will reach an item's hot zone unintentionally.
False triggering can be compensated for by adding additional activation requirements:
- Temporal requirement: a short pause before triggering the item (during the pause there should be some animation)
- Require two-handed operation (like Sinbad's swords)
- Require touching the item and then moving in a certain direction
  - Examples:
    - To remove the crown, you need to touch it and then raise your hand
    - To remove a sword, dagger or arrow, you need to touch it and move in a sensible direction
  - If, after touching, the hand moves in another direction, the operation is canceled
  - One simple implementation is defining two collision boxes for each item and requiring the user to pass through both in the right order to actually activate the item (a sketch follows this list). As always, it is highly recommended to add as much visual feedback as possible for correct operations (popping the sword out a bit, glowing, playing a sound)
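Here is the two-box implementation sketched in the last point, combined with a short dwell requirement; moving the wrong way cancels the operation. All box sizes, offsets and timings are made-up tuning values, not parameters of any real SDK.

```cpp
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

struct Box {
    Vec3 center, halfSize;
    bool contains(const Vec3& p) const {
        return std::fabs(p.x - center.x) <= halfSize.x &&
               std::fabs(p.y - center.y) <= halfSize.y &&
               std::fabs(p.z - center.z) <= halfSize.z;
    }
};

class ItemActivator {
public:
    // touch = box on the item itself, pull = box in the "sensible direction";
    // the two boxes should overlap slightly so the hand cannot fall in between.
    ItemActivator(Box touch, Box pull, float dwellSeconds)
        : touchBox(touch), pullBox(pull), dwellNeeded(dwellSeconds) {}

    // Call once per frame with the tracked hand joint; returns true when the
    // item is activated (e.g. the sword is drawn).
    bool update(const Vec3& hand, float dt) {
        switch (state) {
        case State::Idle:
            if (touchBox.contains(hand)) { state = State::Touching; dwell = 0.0f; }
            break;
        case State::Touching:
            if (!touchBox.contains(hand)) { state = State::Idle; break; }  // left too early
            dwell += dt;                  // play the glow / pop-out feedback here
            if (dwell >= dwellNeeded) state = State::Armed;
            break;
        case State::Armed:
            if (pullBox.contains(hand)) { state = State::Idle; return true; }  // activated
            if (!touchBox.contains(hand)) state = State::Idle;  // wrong direction: cancel
            break;
        }
        return false;
    }

private:
    enum class State { Idle, Touching, Armed };
    Box touchBox, pullBox;
    float dwellNeeded;
    float dwell = 0.0f;
    State state = State::Idle;
};

int main() {
    Box onSword {{0.0f, 1.6f, 0.15f}, {0.15f, 0.15f, 0.15f}};  // on the item
    Box pullUp  {{0.0f, 1.9f, 0.15f}, {0.20f, 0.20f, 0.20f}};  // "raise your hand"
    ItemActivator drawSword(onSword, pullUp, 0.3f);

    // Hand dwells on the sword for a few frames, then moves upward.
    Vec3 path[] = {{0,1.6f,0.15f},{0,1.6f,0.15f},{0,1.6f,0.15f},{0,1.6f,0.15f},{0,1.9f,0.15f}};
    for (const Vec3& hand : path)
        if (drawSword.update(hand, 0.1f)) std::printf("sword drawn!\n");
    return 0;
}
```

In a real game you would run one such activator per item (and per hand where relevant), and drive the feedback animation from the Touching state.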
Some of those ideas are demonstrated in the new Unity3D integration example: AngryBotsNI