Looking at modern UI designs might get you thinking that everything today must be smooth. Ten years ago, a button click triggered an instantaneous screen change as the application switched states. Now we know better: we understand that users get disoriented by unnatural, immediate changes. After all, we rarely experience such scene changes in real life (except when we wake up from a nightmare – certainly not the desired experience for our daily interactions…)
So we want to animate our visuals. But blindly adding animation to every interaction results in an unresponsive system. In order to effectively plan visual feedback, we need to examine the interaction's mental model and consciously plan the sensation we are targeting.
Starting a car with an electric ignition switch by turning the key is far easier than cranking a dynamo lever to generate a spark in an antique car
Pressing a physical button is easier than pulling a lever down to engage a high-voltage power switch
Clicking a mouse button feels softer than pressing a physical button
Clicking a virtual button on a touch screen feels just like touching a piece of glass
Clicking something in mid-air in a 3D interface, by itself, does not feel like anything at all. Crap.
Bagel cursor states
For each softening step, designers had to compensate for its side effects – most commonly, the reduction of natural feedback. Virtual buttons get pressed down and pop back up when clicked, and 1:1 animation happens when you scroll a list of items on your iOS device. You can actually hear a recording of a loud mechanical shutter when a modern camera phone shoots (in the poorly designed ones you also hear the motor winding the virtual film…). When we migrated to touch screens, we lost the button-click sensation. Going to an in-the-air interface does away with even the most elementary tactile feedback of touching a physical surface.
In a simplified model, we are mentally capable of doing two operations at the same time. But those operations are not symmetric. Our visual focus can only keep one target at its center, while peripheral vision cannot reach a comparable resolution or attention level. Adding sound effects is critical in order to allow the user to operate the system with that secondary attention 'slot'. Try to take a picture with your smartphone without looking at the screen. Naturally, the sound effects supply reassurance, so you know the machine followed your intention.
Shadow touch cues
Consider simulating a virtual touch screen using gesture detection:
When interacting with a real surface, we receive some visual cues as we get closer and closer to it. You will see a drop shadow that gets closer and darker as your finger approaches the touch point. On a glossy surface, you might see a blurry reflection that merges with your fingertip upon touch. If you think stereo vision is the dominant cue here – try touching a non-glossy, back-illuminated screen that lacks the other cues (your screen can be a perfect candidate). Try to do it slowly. Can you accurately anticipate when you will reach the touch point?
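As a rough illustration, here is how such a cue might be driven, assuming a hypothetical tracker that reports the finger's distance from the virtual surface; all names and ranges below are illustrative, not taken from any specific SDK:

```python
def shadow_cue(distance_mm, max_range_mm=300.0):
    """Map finger-to-surface distance to a drop-shadow cue.

    Returns (opacity, offset_px): the shadow darkens and slides under the
    fingertip cursor as the finger approaches the virtual touch point.
    """
    d = max(0.0, min(distance_mm, max_range_mm))   # clamp to the cue range
    closeness = 1.0 - d / max_range_mm             # 0 far away, 1 at touch
    opacity = 0.2 + 0.8 * closeness                # never fully invisible
    offset_px = 40.0 * (1.0 - closeness)           # shadow converges on the cursor
    return opacity, offset_px
```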
After implementing several continuous cursor feedback mechanisms, users gain some initial depth perception. But then we encountered another issue. Waving your hands in the air feels like… well, just that. It certainly does not feel like touching anything. But the virtual surface simulation was not about limitless in-the-air interaction! The moment of touch should feel different, just as it does on a real hard surface. While the feedback during hovering is continuous, the touch moment must create a non-continuous sensation. An immediate, non-continuous visual change, combined with crafted sound effects on click and release, is important.
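A minimal sketch of turning the continuous hover distance into a discrete click, assuming the same hypothetical distance input; the two thresholds form a small hysteresis band (an assumption, tune to taste) so the click does not flicker near the surface, and the sound callback stands in for whatever audio engine you use:

```python
class VirtualTouchPoint:
    """Fires discrete press/release events from a continuous distance stream."""

    PRESS_MM = 10.0      # crossing below this counts as a touch
    RELEASE_MM = 25.0    # must retreat beyond this to release (hysteresis)

    def __init__(self, play_sound):
        self.pressed = False
        self.play_sound = play_sound    # e.g. lambda name: audio.play(name)

    def update(self, distance_mm):
        if not self.pressed and distance_mm < self.PRESS_MM:
            self.pressed = True
            self.play_sound("click_down")   # immediate, non-continuous feedback
            return "press"
        if self.pressed and distance_mm > self.RELEASE_MM:
            self.pressed = False
            self.play_sound("click_up")
            return "release"
        return None
```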
Non-continuous touch point
Kinectimals / Frontier Developments
When Microsoft launched the Kinect, back in November 2010, we took all 6 launch titles for a spin. Reaching Frontier Developments' "Kinectimals" – and after patiently waiting for the annoying opening video to pass – we got a bit confused. There we stood, several gesture-savvy engineers and researchers, petting an adorable cub and trying to figure out how it managed to track our finger interactions using the same PrimeSensor system found at the heart of the Kinect. Of course, after a few embarrassing minutes we figured it out – it didn't! The game's virtual hand avatar brilliantly interacts with the pet in a natural, expected way. If you put your hand on top of the furry head, you just can't help petting!
The motion parallax effect has been used in computer games for decades. In the 80s it was something you actually looked for in a game. While Space Invaders began with a static camera, a moving craft was quickly simulated by moving the background. At some point, shooter and platformer rendering added the motion parallax effect by showing more than one layer of background, each moving at a different speed. Back then, this was not simple – especially if you didn't have access to hardware 'blitting'.
In our time, 3D rendering is a commodity, and game programmers don't need to give it a second thought – moving the virtual camera within the scene also creates the motion parallax. So for 3D content we can simply take it for granted. But history repeats itself – and motion parallax is going strong today on mobile platforms.
Posters
For every revolutionary process there are countless evolutions that derive from it or support it. After all, market mechanics do not tend to throw away all existing materials upon change. Upon the wide adoption of TVs, some shows were actually just music playing over a static slide, or even just a shot of the orchestra. For a notable period, commercials were a mixture of an announcer citing the vendor's slogans and a still frame showing the printed advertisement poster. We continue to consume the older media with the newer inventions, as the cost of creating everything from scratch is intolerable.
With the dropping cost of flat-panel displays, digital signage is replacing printed posters. Some of these signs go on to add a level of interaction – but what kind of content will populate them?
Totally new content can be customized for the poster's dimensions, the user's state of mind, and the technical capabilities of the medium (like touch and gesture detection). Just like the common annoying little Flash banners, such signs can offer to let you kick a ball, punch SpongeBob, leave virtual graffiti, or take a funny augmented picture of yourself.
The whole topic deserves several posts of its own, of course – there are countless amazing possibilities here!
But before the full revolution happens, we already see those signs running legacy content:
Non-interactive TV commercials
Interactive internet Flash advertisements
Slideshows of static posters – designed for the correct location and audience, but not enjoying the interactive and dynamic possibilities of the medium
“It's allllive!”
If we can track the viewer of the poster, wouldn't it be nice to give him some motion parallax as he moves? But our legacy content is not 3D… Do we really need to redesign all our posters for that?
Let's look at the design process of a poster. The designer thinks about where the poster will be placed and at what distance the audience might be standing. When done right, the visual composition is cleverly designed to draw attention to the main merchandise and brand.
Technically speaking, the designer will be using Adobe Photoshop or a similar package. Specifically, he will use layers extensively: there will be at least one layer for the background, and separate layers for the objects, captions and more. Before production, those layers are flattened and sent to print.
Going back to the motion parallax – what if we take the layers before this flattening step? The layer separation, combined with an estimate of the designated virtual viewer distance and of the depth of the printed items, almost automatically provides us with what we need in order to create a motion parallax sign!
The focus here is not on the technology but on the production process. Using this small idea, we can utilize existing content and commonly used tools. Solving the content evolution may be one of the kicks the industry needs in order to justify the cost of adopting this new technology!
Conversion process:
The process begins by taking the original PSD file that was used to create a standard poster
Estimate the viewer distance that was targeted by the designer at the time of creation
Estimate the appropriate distance of each layer from the plane of the poster
Simple geometry can now give us the 'native' scale and distance of each layer (see the sketch below)
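A sketch of that 'simple geometry', assuming we only know the viewing distance the designer had in mind and an estimated depth for each Photoshop layer (the function and parameter names are mine, for illustration):

```python
def native_layer_transform(viewer_distance, layer_depth):
    """Place a flattened poster layer back into 3D space.

    viewer_distance: distance between the intended audience and the poster plane.
    layer_depth:     how far behind the poster plane the layer should sit
                     (a negative value pops it in front of the plane).

    By similar triangles, a layer pushed back by layer_depth must be enlarged
    by (viewer_distance + layer_depth) / viewer_distance so that, from the
    nominal viewpoint, it looks exactly like the original flat poster.
    """
    distance_from_viewer = viewer_distance + layer_depth
    scale = distance_from_viewer / viewer_distance
    return distance_from_viewer, scale
```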
Rendering process:
Pick the active viewer (e.g. using face detection)
Using the active viewer's position, pan and scale each layer
A nice addition is a 'shallow depth of field' effect on the background. It need not be accurate – our brain is not that sensitive (our pupil's f-number changes all the time with varying lighting conditions anyhow). As the viewer gets closer and closer, the background can blur away. It also nicely draws attention towards the main object. A combined sketch of this loop follows.
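Putting the two lists together, one frame of such a renderer could look roughly like this; the viewer position is assumed to come from face detection, and `layer.draw`, the blur factor and the coordinate conventions are placeholders for whatever engine you actually use:

```python
def render_frame(layers, viewer, nominal_distance):
    """Composite poster layers with motion parallax for one tracked viewer.

    layers: objects with .depth (behind the poster plane, same units as the
            viewer position), .scale (the 'native' scale from the conversion
            step) and .draw(offset, scale, blur).
    viewer: (x, y, z) of the detected face; the poster plane is z = 0 and
            x = y = 0 is the poster center.
    """
    vx, vy, vz = viewer
    for layer in sorted(layers, key=lambda l: l.depth, reverse=True):  # back to front
        # Parallax: background layers (depth > 0) shift with the viewer,
        # foreground layers (depth < 0) shift against him.
        parallax = layer.depth / (vz + layer.depth)
        offset = (vx * parallax, vy * parallax)

        # Projected size equals the original flat poster exactly when the
        # viewer stands at the distance the designer assumed.
        scale = layer.scale * vz / (vz + layer.depth)

        # Shallow depth of field: blur deep layers more as the viewer comes
        # closer than the nominal distance. Crude on purpose - tune by eye.
        closeness = max(0.0, 1.0 - vz / nominal_distance)
        blur = closeness * max(layer.depth, 0.0) * 0.05

        layer.draw(offset=offset, scale=scale, blur=blur)
```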
If you follow this blog and still haven't experienced the different gesture game modalities by yourself, then – mister – something surely went wrong! It's all too easy to get everything up and running on your home desktop – there should not be any excuses :)
Today's post features some technical clarifications (mainly for readers new to the Kinect hacking community)
Get the HW
PrimeSense-powered devices: Xtion and Kinect
Get yourself an off-the-shelf openNI-compliant depth sensor: it can be either a Kinect or an ASUS Xtion.
Amazingly, until recently such equipment was considered an ultra-high-end piece of technology – now that the Kinect is out, you can buy it online for around $150…
Nite:
An openNI-compliant computer vision middleware, free for users of PrimeSense-based sensors (like the Kinect). Nite provides full-body tracking, hand tracking, user segmentation and more.
SensorKinect:
An openNI-compliant Kinect sensor driver that feeds the depth stream to the Nite computer vision algorithms.
A package of openNI, Nite and the Xtion sensor driver can be downloaded at:
In general: I highly recommend getting to know Unity. This game engine has recently gained enormous momentum, due to many well-made aspects: from its amazing IDE and workflow to its equally impressive cross-platform nature. Even the business model is impressive – you can start for free!
openNI Arena:
A part of the openNI website. Basically it's all free downloads, many of which come with full sources. You may register freely, enjoy the available content and upload your own demos!
AngryBotsNI:
Based on Unity3D's AngryBots sample project, this is our concepts playground. Don't be tricked into thinking this is just another example. Inside you can find many game modalities and experiments. The current version demonstrates: on-body items, shooting, POV changes, cloning, etc.
AngryBotsNI sources and binaries are freely available in the openNI Arena
The guys keep evolving this – and coming versions will also include walking schemes, flying, gliding and more – so stay tuned!
(Credits for this work belong to Len Zhong, Geva Tal and Ran Shani - thanks guys!)
If you are serious about getting into gesture gaming, I suggest downloading it and going over the various demonstrated experiences. I can easily imagine how picking and polishing the right modalities, combined with your original assets and game design, could come together into a really excellent, non-casual game.
…At no point in this movie is the line "Me Tarzan, you Jane" spoken. When Jane and Tarzan meet, it is she who initiates the verbal exchange, repeatedly indicating herself and giving her name until he repeats it. She then points to him, indicating that she wants to know if there's a word for who he is as "Jane" is the word for who she is, until eventually he understands and says, "Tarzan."
It seems human communication combines words and gestures – could Tarzan and Jane have communicated by voice only? And how did he get that cool haircut living in the jungle? Man – it does not make any sense
Jungle Hunt (1982) …totally unrelated, sorry :)
The story of the personal virtual assistant
When I speak with people about natural user interfaces (NUI), some just think that voice is the answer to everything. "After all, talking is the most natural thing, right?" Yes, talking is natural – no argument about it. But the verbal language is only a part of the picture.
Sci-fi has always been much into the idea of having a digital servant to help us with our day-to-day tasks. And many wonder how come this hasn't become our common everyday interface with machines. But the full story is not only about the accuracy of the speech recognition algorithms. Having a butler that follows you around is not a pleasant experience if that butler is simply incapable. In order to really help you, he needs to be able to do things, and he needs to have a solid context of his master and surroundings.
Perhaps this is the main reason why Apple's Siri is actually meaningful: for the first time, the assistant has solid context and capabilities. Siri knows me, where I am, who the people in my contact list are, and she can send messages, add reminders and even solve algebra (with the help of Wolfram Alpha). Apple just managed to bring the context to an interesting level.
Gesture + Voice
Adults use voice to communicate. But if you close your eyes, your comprehension level will drop significantly. We look at each other and use our whole body language when we communicate. In many cases, body language holds more information than the spoken word.
Imagine you go shopping – when you are asked which shoe you want, you will probably just say 'this one'. The communicated information is now encapsulated by your pointing gesture and the surrounding context.
Now - back to reality.
Imagine an application that asks you which item you want to choose, and you just point at it and say 'this!'. Hey – I don't even need voice recognition for that! Just detecting the pointing gesture together with some synchronized vocal burst might be enough. And it will work in any language! (Just like Tarzan and Jane…). If you don't want to give up on the decades of work done in speech recognition, you can simply use it to dramatically improve accuracy. Body language carries a huge part of our communication context.
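A sketch of this 'point and grunt' selection, assuming a skeleton tracker that reports pointing rays with timestamps and an audio front end that reports short energy bursts – both interfaces are assumptions, not a specific API:

```python
SYNC_WINDOW_S = 0.4   # how close in time the voice burst and the gesture must be

def select_by_point_and_voice(pointing_events, audio_bursts, items):
    """Pick the item the user pointed at while making a short vocal burst.

    pointing_events: list of (timestamp, ray) from the skeleton tracker.
    audio_bursts:    list of timestamps of short loud utterances ('this!').
    items:           scene items exposing .hit_by(ray).
    """
    for t_voice in audio_bursts:
        for t_point, ray in pointing_events:
            if abs(t_voice - t_point) <= SYNC_WINDOW_S:
                for item in items:
                    if item.hit_by(ray):
                        return item   # pointed at and 'named' - select it
    return None
```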
Assistant vs. tool
Another way to look at modern life is that we all want to be served, just like the kings of prior centuries. But now everyone gets to be a king!
A dining king
So imagine the king and queen sitting down to dine. They have something like 5 cooks and 10 waiters. This surrounding staff is responsible for doing what is hard (e.g. preparing a steak) or bringing what is out of reach. But if the food is already on his plate, the king will prefer to take the fork and bring the food to his mouth by himself (asking a servant to do it would be awkward and freaky).
Yes – sometimes we prefer to do stuff ourselves. In such cases, we reach for tools.
Making sounds while playing
Actually, I could continue the philosophical MMI discussion for an hour – but you guys read this to have fun, right? Let's bring on the gaming!
There are only a few game experiences that combine gestures and voice. Some examples:
But let's just stretch our imagination a bit further…
"Bang bang, my baby shot me down"
If you look at kids when they play around, you will notice many of their games are actually role-playing. They imagine they are some hero and try to imitate the appropriate comic gestures. It does not end with gestures – they also imitate the sound effects!
·Cowboys yell 'bang bang' as they shoot
·Kung-fu masters make 'shhhh, ffff' sounds to simulate impossibly quick karate chops
·Wizards and other supernatural beings make all sorts of sounds (KAMEHAMEHA!)
By analyzing the audio stream, we can detect sounds coordinated with the gesture and give them meaning:
·Karate chops and kicks get more powerful with the appropriate sounds: you see white trail effects and they actually inflict more damage!
·A boxing hit explodes when the user says 'boom'
·A tennis racket swing gets emphasized once the player shouts on the hit
Instead of 'collecting' magic spells and scrolls, the master wizard can show you how to move and what to say in order to invoke magic!
This way you actually learn the magic spells that work in the imaginary virtual world of the game. The learning is practically done in the user's mind – just as it is really imagined in the fantasy story!
Another example is triggering a 'bullet time' slow-motion scene using sounds.
Imagine the player encountering many enemies coming towards him. He stands in a battle pose and starts saying 'ta-ka-ta-ka-ta-ka'. Then the system continues the ticking sounds. The enemies and world physics are now in slow motion. The player can easily hit all the enemies. After a timeout, world time returns to normal and all the enemies fall to the floor together!
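A possible shape for that mechanic, as a small state machine; the chant detection itself and the `world` interface are assumed to exist elsewhere, and the numbers are only starting points:

```python
import time

class BulletTime:
    """Enters slow motion on a detected chant, restores time after a timeout."""

    def __init__(self, world, duration_s=5.0, slow_factor=0.2):
        self.world = world              # assumed to expose .time_scale and sound loops
        self.duration_s = duration_s
        self.slow_factor = slow_factor
        self.until = None

    def on_chant_detected(self, in_battle_pose):
        # Only trigger when the rhythmic 'ta-ka-ta-ka' coincides with the pose.
        if in_battle_pose and self.until is None:
            self.world.time_scale = self.slow_factor
            self.world.play_loop("ticking")     # the system keeps the rhythm going
            self.until = time.time() + self.duration_s

    def update(self):
        if self.until is not None and time.time() >= self.until:
            self.world.time_scale = 1.0         # world time snaps back to normal
            self.world.stop_loop("ticking")
            self.until = None
```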
One of the oldest cross-cultural dreams involves flying. Humans have always looked up with envy at birds as they fly across the sky. We learned to build machines that help us compensate for our lack of wings or sufficient strength, but boxing ourselves in also means we lost the craved free-flight experience. And when we try to get it back, it is extremely dangerous – physics punishes us with the great gravitational pull of Earth. Goodbye Newton – I am switching to my virtual world!
Lilienthal’s “Fliegeberg” (1894)
Gliding
Para-Gliding
Let's first discuss unpowered flight.
A natural full-body gliding control can be inspired by the free-fall skydiving sport:
Glide forward by holding the hands backward
Putting the hands closer to the body reduces the lift and increases the speed of fall (this can be tied to the angle of the hands)
Moving the hands down creates lift. In our simplified model we can ignore the up motions (unlike birds, it is OK if we don't really have to fold our 'wings' in the process)
The lift gets stronger with a faster down motion
Once airborne, the lift gets 3x stronger (so the best way to lift off is to first jump together with a strong flapping motion)
The same mechanics can also support special super-jumps: if the user simply jumps and uses his hands too, he will reach higher altitudes!
A full physical model is of course overkill, but a carefully thought-out simplified model can encapsulate the diversity of behaviors we require. Going back to high-school physics books to refresh our knowledge of moments, torque and trigonometry can get us to a sufficient point (and you thought it would never be useful…).
We assume two rectangular 'wings', without any airfoil
Lift forces are generated by the air drag below the wings, in the upward normal direction. The force magnitude is a factor of the combined virtual speed and the local hand motion, as well as the angle between the wing and the airflow vector
Moving the hands up or down changes the wing angle accordingly
Moving the hands forward/backward may also change the wing rotation
You can freely add constants such as wing surface, drag factor, and the universal gravity constant (g) – all of these should be tuned until you reach a fun experience that matches the dynamics of the game.
And ppppplease: you don't need a hardware-accelerated physics engine to simulate a couple of trigonometric functions per frame…
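To make 'a couple of trigonometric functions per frame' concrete, here is one way the per-wing lift could be computed; every constant here is a tunable placeholder rather than a calibrated physical model:

```python
import math

GRAVITY = 9.8          # pulls the avatar down every frame
WING_AREA = 0.5        # virtual wing surface - tune for fun, not realism
DRAG_FACTOR = 1.2
AIRBORNE_BONUS = 3.0   # lift is x3 stronger once we have left the ground

def wing_lift(airflow_speed, hand_down_speed, wing_angle_rad, airborne):
    """Upward lift produced by one 'wing' (one arm) this frame.

    airflow_speed:   the avatar's current virtual speed through the air.
    hand_down_speed: downward speed of the hand; up strokes are ignored, so
                     the player never has to 'fold' the wings.
    wing_angle_rad:  angle between the wing and the airflow vector.
    """
    flap = max(hand_down_speed, 0.0)
    effective_speed = airflow_speed + flap
    lift = DRAG_FACTOR * WING_AREA * effective_speed ** 2 * math.sin(wing_angle_rad)
    return lift * AIRBORNE_BONUS if airborne else lift

def vertical_acceleration(left_lift, right_lift, mass=1.0):
    """Net vertical acceleration: flap hard enough to beat gravity and you climb."""
    return (left_lift + right_lift) / mass - GRAVITY
```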
Rocketman!
The Flying Moonman / ahillel
Why bother flying with wings when we can have jets?
The control model can simply use the hands as two elements that add drag. This translates to the following gestures (a rough reading sketch follows the list):
·Slow down when both hands are spread out
·Control pitch by moving both hands together forward and backward
·Roll with one hand forward and the other backward
·Change yaw/bearing by spreading only one hand
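Here is a rough reading of those four gestures from the two tracked hand positions, expressed in the torso's coordinate frame (x to the player's right, y up, z forward); the neutral spread and the gains are assumptions to tune per game:

```python
NEUTRAL_SPREAD = 0.6   # typical sideways distance between relaxed hands (meters)

def jet_controls(left_hand, right_hand):
    """Map two hand positions to (drag, pitch, roll, yaw) commands.

    left_hand, right_hand: (x, y, z) relative to the torso,
    x to the player's right, y up, z forward.
    """
    lx, _, lz = left_hand
    rx, _, rz = right_hand

    spread = rx - lx                  # how far apart the hands are sideways
    forward = (lz + rz) / 2.0         # both hands pushed forward/back together
    twist = lz - rz                   # one hand forward, the other back

    drag = max(spread - NEUTRAL_SPREAD, 0.0) * 2.0   # spread both hands -> slow down
    pitch = forward * 1.5                            # hands together fwd/back -> pitch
    roll = twist * 1.5                               # opposite hands -> roll
    yaw = (rx + lx) * 1.0   # only one hand spread -> asymmetric drag, turn that way

    return drag, pitch, roll, yaw
```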
"jetman" Yves Rossy
Chicken and egg problem?
OK – so we can fly. That still does not mean it's going to be fun. If you play any of the available bird/dragon-controlling games, you will discover most don't really give you the satisfaction of flying. If it's too easy to fly, it just feels like another flight simulator. If it's too hard to fly, we are back at the fitness-vs-fun equation of the previous post. We need to find some special game dynamics that will actually make it fun and challenging.
Daedalus and Icarus
You can gate flying behind collecting/applying limited magic. Imagine a game where you drink a special potion that turns your hands into wings, so your flapping lift gets amplified…
For the Darwinists, you can evolve and gradually increase your wing surface (so you begin as a chicken and end up as an easily gliding eagle – of course this has nothing to do with natural selection).
Alternatively, you can create game logic where everything is possible without flying – by walking. You break the fitness/fun equation by only allowing the user to fly a bit in order to jump higher or move faster. The level design should not encourage your players to overuse it.
As I wrote in the 2nd post, navigation in a 3D virtual universe is challenging. It's not that hard to implement a navigation scheme, but it is hard to design an enjoyable 3D level that does not constantly challenge your players' depth perception. Are we really living in a 3D universe?
We spend most of our life on our feet. We don't experience the 3rd dimension at the same level birds or flying insects do. On top of that, we are usually intimidated by full freedom of movement. Imagine walking in a jungle where you can move anywhere freely. Can you feel secure while constantly spending mental effort on choosing your path? Now imagine there is a road you can follow. Maybe even a city with roads and sidewalks? Now let's think again about the 'on-the-rails' types of experiences discussed in the 2nd post – and compare it to your last trip to IKEA. Yes – we sure all like rail shopping :)
But I already had a post with those excuses – and the whole narrative of my posts is definitely not claiming we should all remain in static-camera / on-rails games. Moving around in a 3D virtual world is challenging indeed, but it's also a very powerful experience. For gesture games, we will need to invent creative new ways to make us feel on the move while remaining in front of the TV set.
Fun vs. fitness
Walking and running by jumping in place with alternating legs is something you can experience in Kinect Sports, as well as in several other Kinect games. It is quite effective in the experiential sense – you feel as if you are walking. You even get tired. But if you are not into making fitness games, you should consider finding some ways to empower the user. In fiction, part of the hero's magic is easily achieving things that are considered hard or impossible. Your character's imagined incredible fitness should probably allow him to walk miles without giving it a second thought.
In this post I will discuss several schemes for walking. Not all of these schemes are fully tested, but they are certainly worth the discussion.
Torso tilt - walking and strafing
When the user's torso is tilted beyond a certain angle, motion in the corresponding direction begins. This is quite simple to implement reliably, and you can also apply it to sideways strafes.
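A minimal version of the tilt scheme, assuming the tracker gives us the torso tilt as two angles in degrees (forward/back and sideways); the dead zone and gain are assumptions to tune:

```python
import math

TILT_THRESHOLD_DEG = 12.0   # dead zone so normal posture shifts are ignored
SPEED_GAIN = 0.08           # speed per degree beyond the threshold

def tilt_to_motion(forward_tilt_deg, side_tilt_deg):
    """Convert torso tilt into (forward_speed, strafe_speed)."""
    def axis(angle_deg):
        excess = abs(angle_deg) - TILT_THRESHOLD_DEG
        if excess <= 0.0:
            return 0.0                                   # inside the dead zone
        return math.copysign(excess * SPEED_GAIN, angle_deg)

    return axis(forward_tilt_deg), axis(side_tilt_deg)
```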
The most problematic aspect of going with this scheme all the way is that it is tedious and uncomfortable. The user will need to move his legs in order to maintain his balance, and when he gets lazy, he will begin to use his back muscles and put high pressure on his spine.
When it comes to level design, this scheme is not recommended for cases where the user is expected to walk a lot.
If your game is sci-fi themed, you can use a jet-pack as a nice metaphor that explains this scheme naturally: once the user decides to power on the jets, his body tilt controls the motion.
Rotation
While literal movement in place is limited by the living-room space, rotation in place is not. But unless you are my 2nd-grade teacher, you will not enjoy the graphics once you reach a point where your back is to the screen. We need to tackle two problems: being able to keep facing the screen after rotating, and handling the camera angle changes.
Some simple solutions:
Use the torso tilt for rotation purposes instead of strafing
Begin automatic camera rotation when the user's torso bearing passes a certain threshold.
Another possibility is 'asymmetric rotation mapping'.
For reference, let's define the case of the user facing the TV as 0 degrees. Now the user can turn right or left: a right turn is a positive change in degrees and a left turn is negative.
Let's say the camera rotation happens only when the user's bearing changes away from 0, and ceases when it moves back towards 0. The user can then rotate in any direction and maintain his virtual bearing when he returns to face the screen.
You can also multiply the user's rotation by some factor when moving 'away' from the center bearing and use a smaller factor on the return path – anything will do the trick as long as it is asymmetric.
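One way to express the asymmetric mapping, assuming we get the user's bearing relative to the TV every frame (0 means facing the screen); only the asymmetry of the two gains matters:

```python
AWAY_GAIN = 1.0     # camera follows fully while the user turns away from the screen
RETURN_GAIN = 0.0   # camera ignores the turn back towards the screen (or use a smaller gain)

class AsymmetricRotation:
    """Accumulates the virtual camera bearing from the user's physical bearing."""

    def __init__(self):
        self.prev_bearing = 0.0     # user bearing, 0 = facing the TV
        self.camera_bearing = 0.0

    def update(self, user_bearing_deg):
        delta = user_bearing_deg - self.prev_bearing
        moving_away = abs(user_bearing_deg) > abs(self.prev_bearing)
        self.camera_bearing += delta * (AWAY_GAIN if moving_away else RETURN_GAIN)
        self.prev_bearing = user_bearing_deg
        return self.camera_bearing
```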
In-place walking
Walking in place can be used to walk forward. If that is too tedious (fun or fitness?) you can reserve actual walking in place to signal that the character is running – so you don't expect the user to walk all the time, but when he does, he gets a gratifying sense of empowerment by seeing his avatar run. A crude detection sketch follows this list.
Running is achieved by actually moving the legs up and down, in a walk-in-place fashion
To run forward, the user raises his legs alternately (left – right – left – etc.)
Optionally:
To retreat quickly, the user can step with the same leg repeatedly (left – left – left – etc.)
Walking in place with one leg forward and the other backward rotates the avatar
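A crude detector for the run-in-place part, assuming the tracker reports each foot's height above the floor; the thresholds are placeholders:

```python
STEP_HEIGHT = 0.12     # meters a foot must rise above the floor to count as a step
STEP_TIMEOUT_S = 0.8   # steps further apart than this no longer count as running

class RunInPlaceDetector:
    """Reports 'run' while the user keeps raising alternating feet in place."""

    def __init__(self):
        self.last_foot = None
        self.last_step_time = None

    def update(self, left_foot_height, right_foot_height, now):
        raised = None
        if left_foot_height > STEP_HEIGHT:
            raised = "left"
        elif right_foot_height > STEP_HEIGHT:
            raised = "right"

        if raised is not None and raised != self.last_foot:   # alternating legs only
            self.last_foot = raised
            self.last_step_time = now

        running = (self.last_step_time is not None and
                   now - self.last_step_time < STEP_TIMEOUT_S)
        return "run" if running else "idle"
```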
Hand walking
When we walk naturally we also swing our hands from side to side to maintain balance. This can be used to simulate walking instead of detecting the leg motion.
As mentioned in the opening of this post, navigation is a big challenge. Some users might prefer one scheme over others. For example, a user might find the torso tilt scheme natural and intuitive, but will not enjoy using it for long periods.
Sometimes you can consider cascading several techniques, allowing more than one way to work:
Tilt to walk, walk in place to run
Both sideways tilt and asymmetric torso rotation
Walking until you reach the edge of the play area to begin automatic walking
Requiring a tilt or an in-place motion to initiate/cease automatic walking, instead of asking the user to maintain it