Sorry for the title – I couldn’t resist and, no, I’ve not switched to writing a travel blog just yet, although I’ll keep the idea in my back pocket for the time when the current ‘career’ hits the ever-looming buffers.
But, no, this post is about ‘Project Prague’ and hand gestures. I’ve written quite a bit in the past about natural gesture recognition with technologies like the Kinect for Windows V2 and the RealSense F200 and SR300 cameras.
Kinect has great capabilities for colour, depth and infra-red imaging plus a smart (i.e. cloud-trained AI) runtime which can bring all those streams together and give you (human) skeletal tracking of 25 joints on 6 bodies at 30 frames per second. It can also do some facial tracking and has an AI-based gesture recognition system which can be trained to recognise human-body-based gestures like “hands above head” or “golf swing” and so on.
That camera has a range of approximately 0.5m to 4.5m and, perhaps because of this long range, it doesn’t have a great deal of support for hand-based gestures – it can report some hand joints and a few different hand states like open/closed, but it doesn’t go much beyond that.
I’ve also written about the RealSense F200 and SR300 cameras, although I never had a lot of success with the SR300. Those cameras have a much shorter range (< 1m) than the Kinect for Windows V2 but have/had some different capabilities, surfacing functionality like;
- Detailed facial detection providing feature positions etc. and facial recognition.
- Emotion detection providing states like ‘happy’, ‘sad’ etc. (although this was removed from the SDK at a later point)
- Hand tracking features:
  - great support for tracking hands down to the joint level, with > 20 joints reported by the SDK
  - support for hand-based gestures such as “V sign”, “full pinch” etc.
With any of these cameras and their SDKs, the processing happens locally on the (high-bandwidth) data at frame rates of 15/30/60 FPS, so it’s quite different to those scenarios where you selectively capture data and send it to the cloud for processing, as you see with the Cognitive Services. Both approaches have their benefits, though, and are open to being used in combination.
In terms of this functionality around hand tracking and gestures, I bundled some of what I knew about this into a video last year and published it to Channel9 although it’s probably quite a bit out of date at this point;

but it’s been a topic that has interested me for a long time, so when I saw ‘Project Prague’ announced a few weeks ago I was naturally keen to take a look.
My first question on ‘Prague’ was whether it would make use of a local-processing or a cloud-based-processing model and, if the former, whether it would require a depth camera or would work purely with a web cam.
It turns out that ‘Prague’ does its processing locally and requires either a Kinect for Windows V2 camera or a RealSense SR300 camera, with the recommendation on the website being to use the SR300.
I dug my Intel RealSense SR300 out of the drawer where it’s been living for a few months, plugged it in to my Surface Book and set about seeing whether I could get a ‘Prague’ demo up and running on it.
Plugging in the SR300
I hadn’t plugged the SR300 into my Surface Book since I reinstalled Windows, so I wondered how the experience had moved on since the early days of the camera, now that Windows is on the Creators Update (I’m running 15063.447).
I hadn’t installed the RealSense SDK onto this machine but Windows seemed to recognise the device and install it regardless, although I did find that the initial install left some “warning triangles” in Device Manager which had to be resolved by a manual “Scan for hardware changes” from the Device Manager menu. After that, things seemed to sort themselves out and Device Manager showed;

which the modern devices app shows as;

and that seemed reasonable – I didn’t have to visit the troubleshooting page, although, based on my previous experience with the SR300, I wasn’t surprised to see that it existed. Instead, I went off to download ‘Project Prague’.
Installing ‘Prague’
Nothing much to report here – there’s an MSI that you download and run;

and “It Just Worked” so nothing to say about that.
Once installation had finished, the “Microsoft Gestures Service” app ran up as per the docs, and I tried to do as the documentation advised and make sure that the app was recognising my hand – it didn’t seem to be working, as below;

but then I tried with my right hand and things seemed to be working better;

This is actually the window view (called the ‘Geek View’!) of a system tray application (the “gestures service”) which doesn’t seem to be a true service in the NT sense but instead seems to be a regular app configured to run at startup on the system;

so, much like the Kinect Runtime, it seems that this is the code which sits and watches frames from the camera, with applications becoming “clients” of this service. The “DiscoveryClient”, which is also highlighted in the screenshot as being configured to run at startup, is one such demo app – it picks up gestures from the service and (according to the docs) routes them through to the shell.
Here’s the system tray application;

and if I perform the “bloom” gesture (familiar from Windows Mixed Reality) then the system tray app pops up;

and tells me that there are other gestures already active to open the start menu and toggle the volume. The gestures animate on mouse-over to show how to execute them and I had no problem using the gesture to toggle the volume on my machine, but I did struggle a little with the gesture to open the start menu.
The ‘timeline’ view in the ‘Geek View’ here is interesting because it shows gestures being detected (or not) in real time – you can perhaps see on the timeline below how I’m struggling to execute the ‘Shell_Start’ gesture and it’s getting recognised as a ‘Discovery_Tray’ gesture instead. In that screenshot the white blob indicates a “pose” whereas the green blobs represent completed “gestures”.

There’s also a ‘settings’ section here which shows me;

and then there’s the GestPacks section;

which suggests that the service has integration for various apps. At the time of writing, the “get more online” option didn’t seem to link to anything that I could spot, but by running PowerPoint I noticed that the app monitors which app is in the foreground and switches its gestures list to relate to that contextual app.
So, when running PowerPoint, the gesture service shows;

and those gestures worked very well for me in PowerPoint – it was easy to start a slideshow and then advance the slides by just tapping through in the air with my finger. These details can also be seen in the settings app;

which suggests that these gestures are contextual within the app – for example the “Rotate Right 90” option doesn’t show up until I select an object in PowerPoint;

and I can see this dynamically changing in the ‘Geek View’ – here’s the view when no object is selected;

and I can see that there are perhaps 3 gestures registered whereas if I select an object in PowerPoint then I see;

and those gestures worked pretty well for me.
Other Demo Applications
I experimented with the ‘Camera Viewer’ app, which works really well. Once again, from the ‘Geek View’ I can see that this app has registered some gestures and you can perhaps see below that I’m trying out the ‘peace’ gesture – the geek view shows that it is registered and has completed, and the app displays some nice doves to show it’s seen the gesture;

One other interesting aspect of this app is that it displays a ‘Connecting to Gesture Service’ message as you bring it back into focus, suggesting that there’s some sort of ‘connection’ to the gestures service which comes and goes over time.
These gestures worked really well for me and, by this point, I was wondering how these gesture apps plug into the architecture here and how they’re implemented, so I wanted to see if I could write some code. I did notice that the GestPacks seem to live in a folder under the ‘Prague’ installation;

and a quick look at one of the DLLs (e.g. PowerPoint) shows that this is .NET code interop’ing into PowerPoint as you’d expect although the naming suggests there’s some ATL based code in the chain here somewhere;

Coding ‘Prague’
The API docs link leads over to this web page which points to a Microsoft.Gestures namespace that appears to be part of .NET Core 2.0. That would seem to suggest that (right now) you’re not going to be able to reference this from a Universal Windows App project, but you can reference it from a .NET Framework project and so I just referenced it from a command line project targeting .NET Framework 4.6.2.
The assemblies seem to live in the equivalent of;
“C:\Users\mtaulty\AppData\Roaming\Microsoft\Prague\PragueVersions\LatestVersion\SDK”
and I added a reference to 3 of them;

It’s also worth noting that there are a number of code samples over in this github repository;
https://github.com/Microsoft/Gestures-Samples
At the time of writing, though, I haven’t really referred to those too much as I wanted to see what the experience was like ‘starting from scratch’. To that end, I had a quick look at what seemed to be the main assembly in the object browser;

and the structure seemed to suggest that the library is using TCP sockets as an ‘RPC’ mechanism to communicate between an app and the gestures service. A quick look at the gestures service process with Process Explorer did show that it was listening for traffic;

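Just to poke at that a little, a quick (and entirely optional) way to check from code that something is accepting connections is to try a TcpClient against whichever port Process Explorer shows the service listening on – the port number below is just a placeholder of mine, not the real one;
using System;
using System.Net.Sockets;

class PortProbe
{
    static void Main()
    {
        // Placeholder – substitute the port that Process Explorer reports
        // for the gestures service on your machine.
        const int gesturesServicePort = 12345;

        try
        {
            using (var client = new TcpClient())
            {
                client.Connect("localhost", gesturesServicePort);
                Console.WriteLine("Something is listening on that port.");
            }
        }
        catch (SocketException)
        {
            Console.WriteLine("Nothing accepted a connection on that port.");
        }
    }
}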
So, how do you get a connection? It seems fairly easy in that the docs point you to the GesturesEndpointService class, there’s a GesturesEndpointServiceFactory to make those, and IntelliSense popped up as below to reinforce the idea that there is some socket-based comms going on here;

From there, I wanted to define my own gesture which would let the user start with an open, spread hand and then tap their thumb onto each of their four fingers in sequence – which seemed to consist of 5 stages. So, I read the docs around how gestures, poses and motion work and added some code to my console application to see if I could code up this gesture;
namespace ConsoleApp1
{
    using Microsoft.Gestures;
    using Microsoft.Gestures.Endpoint;
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            // Fire-and-forget: completions come back on thread pool threads
            // (see the note after this listing).
            ConnectAsync();

            Console.WriteLine("Hit return to exit...");
            Console.ReadLine();

            ServiceEndpoint.Disconnect();
            ServiceEndpoint.Dispose();
        }
        static async Task ConnectAsync()
        {
            Console.WriteLine("Connecting...");

            try
            {
                var connected = await ServiceEndpoint.ConnectAsync();

                if (!connected)
                {
                    Console.WriteLine("Failed to connect...");
                }
                else
                {
                    // Register the custom gesture with the gestures service.
                    await ServiceEndpoint.RegisterGesture(CountGesture, true);
                }
            }
            catch
            {
                Console.WriteLine("Exception thrown in starting up...");
            }
        }
        static void OnTriggered(object sender, GestureSegmentTriggeredEventArgs e)
        {
            Console.WriteLine($"Gesture {e.GestureSegment.Name} triggered!");
        }
        static GesturesServiceEndpoint ServiceEndpoint
        {
            get
            {
                if (serviceEndpoint == null)
                {
                    serviceEndpoint = GesturesServiceEndpointFactory.Create();
                }
                return (serviceEndpoint);
            }
        }
        static Gesture CountGesture
        {
            get
            {
                if (countGesture == null)
                {
                    var poses = new List<HandPose>();
                    var allFingersContext = new AllFingersContext();

                    // Hand starts upright, forward and with fingers spread...
                    var startPose = new HandPose(
                        "start",
                        new FingerPose(
                            allFingersContext, FingerFlexion.Open),
                        new FingertipDistanceRelation(
                            allFingersContext, RelativeDistance.NotTouching));

                    poses.Add(startPose);

                    // ...then the thumb taps each of the other four fingers in turn,
                    // giving the remaining four poses of the gesture.
                    foreach (Finger finger in
                        new[] { Finger.Index, Finger.Middle, Finger.Ring, Finger.Pinky })
                    {
                        poses.Add(
                            new HandPose(
                                $"pose{finger}",
                                new FingertipDistanceRelation(
                                    Finger.Thumb, RelativeDistance.Touching, finger)));
                    }
                    countGesture = new Gesture("count", poses.ToArray());
                    countGesture.Triggered += OnTriggered;
                }
                return (countGesture);
            }
        }
        static Gesture countGesture;
        static GesturesServiceEndpoint serviceEndpoint;
    }
}
I’m very unsure as to whether my code specifies my gesture ‘completely’ or ‘accurately’ but what amazed me about this is that I really only took one stab at it and it “worked”.
That is, I can run my app and see my gesture being built up from its 5 constituent poses in the ‘Geek View’ and then my console app has its event triggered and displays the right output;

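As a quick (and unverified) check on how these building blocks compose, here’s a sketch of a second, simpler gesture put together from the same pose types as the code above – a ‘pinch then release’ – with the same caveat that I’m guessing at whether these constraints really specify the gesture completely;
static Gesture BuildPinchReleaseGesture()
{
    var allFingers = new AllFingersContext();

    // Pose 1: thumb and index fingertips touching.
    var pinch = new HandPose(
        "pinch",
        new FingertipDistanceRelation(
            Finger.Thumb, RelativeDistance.Touching, Finger.Index));

    // Pose 2: the hand opens up again with no fingertips touching.
    var release = new HandPose(
        "release",
        new FingerPose(allFingers, FingerFlexion.Open),
        new FingertipDistanceRelation(allFingers, RelativeDistance.NotTouching));

    return new Gesture("pinchRelease", new[] { pinch, release });
}
and that could then be registered with the same ServiceEndpoint.RegisterGesture() call as before.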
What I’d flag about my console app code is that it’s bad in the sense that it’s using async/await in a console app, so it’s likely that thread pool threads are being used to dispatch all the “completions”. That means lots of threads are potentially running through this code and interacting with objects which may or may not have thread affinity – I’ve not done anything to mitigate that here.
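For what it’s worth, one general-purpose way to mitigate that (nothing Prague-specific here) would be to funnel everything onto a single thread – for example, having the event handlers queue work items which the main thread then drains instead of just blocking on Console.ReadLine(). A rough sketch;
using System;
using System.Collections.Concurrent;

static class SingleThreadPump
{
    // Work items posted from any thread (e.g. a gesture's Triggered handler)...
    static readonly BlockingCollection<Action> work = new BlockingCollection<Action>();

    public static void Post(Action action) => work.Add(action);

    public static void Shutdown() => work.CompleteAdding();

    // ...get executed here, on whichever single thread calls Run()
    // (the main thread, in my console app).
    public static void Run()
    {
        foreach (var action in work.GetConsumingEnumerable())
        {
            action();
        }
    }
}
With something like that in place, OnTriggered would call SingleThreadPump.Post() rather than touching anything directly and Main would end with SingleThreadPump.Run() rather than Console.ReadLine() – but I’ve not actually reworked my code that way here.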
Other than that, I’m impressed – this was a real joy to work with and I guess the only way it could be made easier would be to allow for the visual drawing or perhaps the recording of hand gestures.
The only other thing that I noticed is that my CPU can get a bit active while using these bits and they seem to run at about 800MB of memory but then Project Prague is ‘Experimental’ right now so I’m sure that could change over time.
I’d like to also try this code on a Kinect for Windows V2 – if I do that, I’ll update this post or add another one.