Hands, Gestures and Popping back to ‘Prague’

Just a short post to follow up on this previous post;

Hands, Gestures and a Quick Trip to ‘Prague’

I said that if I ‘found time’ then I’d revisit that post and its code to see whether I could make it work with the Kinect for Windows V2 sensor rather than the Intel RealSense SR300 that I used in that post.

In all honesty, I haven’t ‘found time’ but I’m revisiting it anyway.

I dug my Kinect for Windows V2 and all of its lengthy cabling out of the drawer, plugged it into my Surface Book and … it didn’t work. Instead, I got the flashing white light that usually indicates things aren’t going so well.

Not to be deterred, I did some deep, internal Microsoft research (ok, I searched the web) and came up with this;

Kinect Sensor is not recognized on a Surface Book

and getting rid of the text value within that registry key sorted out the problem, letting me confirm that my Kinect for Windows V2 was working, at least in the sense that the Kinect Configuration Verifier reports;

[Image: Kinect Configuration Verifier results]

which, after many years of experience, I have learned to interpret as “Give it a try!”. I tried out a couple of the SDK samples, they worked fine for me and so I reckoned I was in a good place to get started.
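As an aside, that kind of registry edit can be scripted rather than done by hand in regedit. Here’s a minimal sketch using the .NET registry API; note that the key path and value name below are placeholders, not the real ones, which are in the article linked above;

    using System;
    using Microsoft.Win32;

    class FixKinectOnSurfaceBook
    {
        static void Main()
        {
            // HYPOTHETICAL path and value name - the real ones are in the
            // 'Kinect Sensor is not recognized on a Surface Book' article.
            const string keyPath = @"SYSTEM\CurrentControlSet\<see article>";
            const string valueName = "<see article>";

            // Writing under HKEY_LOCAL_MACHINE needs an elevated process.
            using (var key = Registry.LocalMachine.OpenSubKey(keyPath, writable: true))
            {
                if (key != null && key.GetValue(valueName) != null)
                {
                    key.DeleteValue(valueName);
                    Console.WriteLine("Value removed - re-run the Kinect Configuration Verifier.");
                }
            }
        }
    }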

However, the Project Prague bits were not so happy; I found they were logging a bunch of errors in the ‘Geek View’ about not being able to connect to or initialise either the SR300 or the Kinect camera.

This seemed to get resolved by updating my Kinect drivers. I did an automatic update and Windows found new drivers online, which took me to this version;

[Image: the updated Kinect driver version]

I was surprised that I didn’t have this version already as it’s quite old, but it seemed to make the Project Prague pieces happy and the Geek View was back in business, showing output from the Kinect;

[Image: the ‘Geek View’ showing output from the Kinect]

From the little display window on the left there, it felt like this operated at a range of approximately 0.5m to 1.0m. I wondered whether I could move further away, but that didn’t seem to be possible in the quick experiment that I tried.
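Incidentally, if you want your own code to report the same connect/initialise status that the Geek View surfaces, the gestures service endpoint raises a status event you can hook. This is a minimal sketch from memory of the Microsoft.Gestures API, so treat the exact namespace, type and event names as assumptions rather than gospel;

    using System;
    using System.Threading.Tasks;
    using Microsoft.Gestures.Endpoint;

    class StatusWatcher
    {
        static async Task Main()
        {
            // Connect to the locally running gestures service and log its status
            // transitions (detecting a camera, losing it, etc.).
            var service = GesturesServiceEndpointFactory.Create();
            service.StatusChanged += (s, e) => Console.WriteLine($"Gestures service: {e.Status}");

            await service.ConnectAsync();

            Console.WriteLine("Connected - press Enter to exit.");
            Console.ReadLine();
            service.Dispose();
        }
    }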

The big question for me then was whether the code that I’d previously written and run against the SR300 would “just work” on the Kinect for Windows V2 and, of course, it does. Revisit the previous post for the source code if you’re interested (there’s also a rough sketch of its shape below) but I found that my “counting on four fingers” gesture was recognised quickly and reliably here;

[Image: the ‘counting on four fingers’ gesture being recognised]

This is very cool. It’d be interesting to know exactly what ‘Prague’ relies on from the perspective of the camera and also in terms of system requirements (CPU, RAM, GPU, etc.) but it looks like they’ve got a very decent system going for recognising hand gestures across different cameras.
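To give a flavour of why that camera-independence is plausible, here’s roughly the shape of a Project Prague gesture in code. The gesture in my previous post counted up through a sequence of poses; this is a simplified, single-pose variant and the type and member names are from memory of the Microsoft.Gestures API, so treat them as assumptions;

    using System;
    using System.Threading.Tasks;
    using Microsoft.Gestures;
    using Microsoft.Gestures.Endpoint;

    class FourFingersDemo
    {
        static async Task Main()
        {
            // A single pose with index, middle, ring and pinky extended - a cut-down
            // take on the 'counting on four fingers' gesture from the previous post.
            var fourUp = new HandPose("FourUp",
                new FingerPose(new[] { Finger.Index, Finger.Middle, Finger.Ring, Finger.Pinky },
                               FingerFlexion.Open, PoseDirection.Up));

            var gesture = new Gesture("CountToFour", fourUp);
            gesture.Triggered += (s, e) => Console.WriteLine("Four fingers!");

            // Nothing here names a camera - the gestures service picks up whichever
            // supported sensor (SR300, Kinect V2) it finds.
            var service = GesturesServiceEndpointFactory.Create();
            await service.ConnectAsync();
            await service.RegisterGesture(gesture);

            Console.ReadLine();
        }
    }

The detail that matters is in that last comment; the code describes the hand rather than the sensor, which is presumably what lets the same gesture run against both cameras.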

Context

We all know that the way in which we interact with computing devices has changed and continues to change.

If we go back maybe 10-15 years, we interacted with devices in the ways that they could accommodate rather than the ways that suited us best.

We had graduated from punch-cards to keyboards, screens and mice, and those mechanisms stuck because they were highly productive for a lot of the computing tasks we undertook. But they are mostly about the human bending away from their natural modes of interaction towards ones that a machine could hope to process.

Those modes of interaction work well to the present day and I’m using them to type this blog post. They haven’t gone away and I doubt they will in the foreseeable future.

But in the last decade or so, the devices that we’re using are taking different forms. Forms that don’t always sit well with the traditional mechanisms for input and output. Forms that fit into your pocket or on your wrist or sit on a wall in a shared space.

These form factors often need different types of interaction and so we’ve seen the rise of touch and speech, especially in the realm of the smartphone and the tablet.

The fact that those types of interaction mechanisms even exist though is testament to at least a couple of things;

    • the ever-increasing ‘power’ of our computing devices, where ‘power’ definitely includes the raw processing power of the CPUs and GPUs but also reflects accessibility in terms of dimensions like price, size, production quantities and so on.
    • the ever-growing ‘power’ of cloud services where, again, that notion of ‘power’ is more than just the number or specification of the servers available to you as a developer; it’s some combination of price, availability, connectivity, CPUs, GPUs, memory and so on.

For quite a while on this blog, I’ve been experimenting with various technologies that fit into this space, mostly with a Microsoft slant: technologies like speech, face tracking and hand gestures, and a few more. It feels like this is an area where the technology becomes more capable and available on an almost daily basis.

Recently, I wanted to bring some of this together in a form that’s more structured than some of the posts on this blog and so my colleague Andrew and I made a new show for developers on Channel 9 that we called ‘Context’.

I say ‘for developers’ because a big portion (probably about 80-90%) of each episode is code.

Right now, we have two episodes published.

Episode 1 – Speech

[Image: Episode 1 – Speech]

Episode 2 – Faces

[Image: Episode 2 – Faces]

Why’s the show called ‘Context’?

It’s because we believe that ‘context is king’. I’ve written about this elsewhere, mostly as a way of trying to get my own thoughts in order on the topic.

But the idea is that mechanisms like speech, touch, pen, ink, gaze, gesture, body tracking, mice, keyboards, screens, holograms, etc. only bring benefit when considered in the light of the user’s context, which includes at least;

    • ‘Where are they?’ in a broad sense: location (GPS), place (work/home/football/etc.), activity (driving/walking/cycling/etc.), time of day, etc.
    • What are they trying to do?
    • What device are they trying to do it on?

and perhaps a few more dimensions like network connectivity (there’s a rough sketch of these dimensions as a type below).
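To make that a little more concrete, here’s one hypothetical way of modelling those dimensions as a simple type. This isn’t any official API, just a sketch of the shape of the data;

    // HYPOTHETICAL - a sketch of the 'context' dimensions above, not an official API.
    public enum Activity { Unknown, Stationary, Walking, Cycling, Driving }

    public class UserContext
    {
        public double? Latitude { get; set; }     // location in the GPS sense
        public double? Longitude { get; set; }
        public string Place { get; set; }         // location in the work/home/football sense
        public Activity Activity { get; set; }
        public TimeSpan TimeOfDay { get; set; }
        public string Goal { get; set; }          // what are they trying to do?
        public string Device { get; set; }        // what device are they trying to do it on?
        public bool HasConnectivity { get; set; } // network connection, etc.
    }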

We’ve got more of these shows coming over the next few weeks and months on a 3-4 week release cycle and we’re very keen to take feedback, so if you watch any of them please do leave comments on the Channel 9 page for us to pick up.