Baby Steps with Spatial Mapping in 2D and 3D Using XAML and SharpDX

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

I’ve been living in fear and hiding from a particular set of APIs ;)

Ever since HoloLens and Windows Mixed Reality first came along, I’ve been curious about the realities of the spatial mapping APIs and yet I’ve largely just treated them as a “black box”.

Naturally, that’s not to say that I haven’t benefitted from those APIs. I’ve been using them for many months in Unity via the Mixed Reality Toolkit and its spatial mapping support, including the prefab that’s fairly easy to drop into a Unity project, as I first explored in this post last year;

Hitchhiking the HoloToolkit-Unity, Leg 3–Spatial Understanding (& Mapping)

That said, I’ve still had it on my “to do list” for a long time to visit these APIs a little more directly and that’s what this blog post is about.

It’s important to say that this post is mostly just “for fun” and a place for me to write down some explorations – I’m not planning an exhaustive write-up of the APIs, and what I end up with by the end of the post is going to be pretty “rough”.

It’s also important to say that there are official documentation pages which detail a lot more than I’m about to write up in this post.

  • Spatial mapping
  • Spatial mapping in DirectX

but (as usual) I hadn’t really read those documents in nearly enough detail until I started to explore on my own for this post – it’s the exploration that drives the learning.

Additionally, there’s a great official sample that goes along with those documents;

Holographic Spatial Mapping Sample

but, again, I hadn’t actually seen this sample until I was well into writing this post and trying to figure things out. I realised that I was largely trying to produce a much simpler, less functional piece of code targeting a different type of application than the one in the sample, but there are many similarities between where I ended up and that sample.

So, if you want the definitive views on these topics there are lots of links to visit.

In the meantime, I’m going to write up my own experiments here.

Choosing a Sandbox to Play In

Generally speaking, if I want to experiment with some .NET APIs then I write a console application. It seems the quickest, easiest thing to spin up.

In a Mixed Reality world, the equivalent seems to be a 2D XAML application. I find it is much quicker to Code->Deploy->Test->Debug when working on a 2D XAML application than when working on (e.g.) a 3D Unity application.

Of course, the output is then a 2D app rather than an immersive app but if you just want to test out some UWP APIs (which the spatial mapping APIs are) then that’s ok.

Specifically, in this case, I found that trying to make use of these APIs in a 2D environment actually helped me gain some understanding of them because it stopped me from just looking for a quick Unity solution to various challenges, and I definitely felt that I wasn’t losing anything by at least starting my journey inside of a 2D XAML application where I could iterate quickly.

Getting Going – Asking for Spatial Mapping API Access

I made a quick, blank 2D XAML UWP application in Visual Studio and made sure that its application manifest declared the spatialPerception capability that spatial mapping needs.

When I look in Visual Studio today, I don’t see this listed as an option in the manifest designer’s UI and so I hacked the manifest file in the XML editor;

[Screenshot: the manifest XML with the spatialPerception capability added]
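
For reference, the capability entry itself is small – a minimal sketch of the relevant part of Package.appxmanifest (rather than my full manifest) looks like this;

  <Capabilities>
    <uap2:Capability Name="spatialPerception" />
  </Capabilities>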

where the uap2 prefix maps to the namespace;

xmlns:uap2="http://schemas.microsoft.com/appx/manifest/uap/windows10/2"

in case you ever got stuck on that one. From there, I had a blank app where I could write some code to run on the Loaded event of my main XAML page.

Figuring out the SpatialSurfaceObserver

At this point, I had an idea of what I wanted to do and I was fairly sure that I needed to spin up a SpatialSurfaceObserver which does a lot of the work of trying to watch surfaces as they are discovered and refined by HoloLens.

The first steps with the class seem to be to check whether spatial mapping is supported and whether access is allowed via the static IsSupported() and RequestAccessAsync() methods.

Once support is ascertained, you define some “volumes” for the observer to watch for spatial mapping data via the SetBoundingVolume/SetBoundingVolumes methods and then you can interrogate that data via the GetObservedSurfaces method.

Additionally, there’s an event ObservedSurfacesChanged to tell you when the data relating to surfaces has changed because the device has added/removed or updated data.

This didn’t seem too bad and so my code for checking for support ended up looking as below;

  async void OnLoaded(object sender, RoutedEventArgs e)
    {
      bool tryInitialisation = true;

      if (Windows.Foundation.Metadata.ApiInformation.IsApiContractPresent(
          "Windows.Foundation.UniversalApiContract", 4, 0))
      {
        tryInitialisation = SpatialSurfaceObserver.IsSupported();
      }

      if (tryInitialisation)
      {
        var access = await SpatialSurfaceObserver.RequestAccessAsync();

        if (access == SpatialPerceptionAccessStatus.Allowed)
        {
          this.InitialiseSurfaceObservation();
        }
        else
        {
          tryInitialisation = false;
        }
      }
      if (!tryInitialisation)
      {
        var dialog = new MessageDialog(
          "Spatial observation is either not supported or not allowed", "Not Available");

        await dialog.ShowAsync();
      }
    }

Now, as far as I could tell, the SpatialSurfaceObserver.IsSupported() method only became available in V4 of the UniversalApiContract and so, as you can see above, I check whether that contract version is present before calling it.

The next step would be perhaps to try and define volumes and so I ploughed ahead there…

Volumes, Coordinate Systems, Reference Frames, Locators – Oh My ;)

I wanted to keep things as simple as possible and so I chose to look at the SetBoundingVolume method, which takes a single SpatialBoundingVolume; there are a number of ways of creating these based on boxes, frustums and spheres.

I figured that a sphere was a fairly understandable thing and so I went with one, choosing a 5m radius in the hope of gathering all surface information within that distance.

However, to create a volume you first need a SpatialCoordinateSystem and the easiest way I found of getting hold of one of those was to get hold of a frame of reference.

Frames of reference can either be “attached” in the sense of being head-locked and following the device or they can be “stationary” where they don’t follow the device.

A stationary frame of reference seemed easier to think about and so I went that way. To get hold of a frame of reference at all, I seemed to need a SpatialLocator, which has a handy GetDefault() method, and from there I could use the CreateStationaryFrameOfReferenceAtCurrentLocation() method to create my frame.

So…my reasoning here is that I’m creating a frame of reference at the place where the app starts up and that it will never move during the app’s lifetime. Not perhaps the most “flexible” thing in the world, but it seemed simpler than any other options so I went with it.

With that in place, my “start-up” code looks as below;

  void InitialiseSurfaceObservation()
    {
      // We want the default locator.
      this.locator = SpatialLocator.GetDefault();

      // We try to make a frame of reference that is fixed at the current position (i.e. not
      // moving with the user).
      var frameOfReference = this.locator.CreateStationaryFrameOfReferenceAtCurrentLocation();

      this.baseCoordinateSystem = frameOfReference.CoordinateSystem;

      // Make a sphere which is centred at the origin (the user's startup location)
      // with a fixed radius.
      var boundingVolume = SpatialBoundingVolume.FromSphere(
        this.baseCoordinateSystem,
        new SpatialBoundingSphere()
        {
          Center = new Vector3(0, 0, 0),
          Radius = SPHERE_RADIUS
        }
      );
      this.surfaceObserver = new SpatialSurfaceObserver();
      this.surfaceObserver.SetBoundingVolume(boundingVolume);
    }

Ok…I have got hold of a SpatialSurfaceObserver that’s observing one volume for me defined by a sphere. What next?

Gathering and Monitoring Surfaces Over Time

Having now got my SpatialSurfaceObserver with a defined volume, I wanted some class that took on the responsibility of grabbing any surfaces from it, putting them on a list and then managing that list as the observer fired events to flag that surfaces had been added/removed/updated.

In a real application, it’s likely that you’d need to do this in a highly performant way but I’m more interested in experimentation here than performance and so I wrote a small SurfaceChangeWatcher class which I can pass the SpatialSurfaceObserver to.

Surfaces are identified by GUID and so this watcher class maintains a simple Dictionary<Guid,SpatialSurfaceInfo>. On startup, it calls the GetObservedSurfaces method to initially populate its dictionary and then it handles the ObservedSurfacesChanged event to update its dictionary as data changes over time.

It aggregates the changes that it sees and fires its own event to tell any interested parties about them.

I won’t post the whole source code for the class here but will just link to it instead. It’s not too long and it’s not too complicated.

Source SurfaceChangeWatcher.cs.
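
For a flavour of the approach (a minimal sketch of my own rather than the exact code in that file), the shape of such a watcher is something like the below – note that, to keep it short, it only handles added/updated surfaces and ignores removals and any thread-safety concerns;

  using System;
  using System.Collections.Generic;
  using System.Linq;
  using Windows.Perception.Spatial.Surfaces;

  class SimpleSurfaceWatcher
  {
    readonly SpatialSurfaceObserver observer;
    readonly Dictionary<Guid, SpatialSurfaceInfo> surfaces =
      new Dictionary<Guid, SpatialSurfaceInfo>();

    public event EventHandler<List<SpatialSurfaceInfo>> SurfacesChanged;

    public SimpleSurfaceWatcher(SpatialSurfaceObserver observer)
    {
      this.observer = observer;

      // Populate the dictionary with whatever the observer has right now...
      this.LoadSurfaces();

      // ...and keep it up to date as the device adds/refines surfaces.
      this.observer.ObservedSurfacesChanged += (s, e) => this.LoadSurfaces();
    }
    void LoadSurfaces()
    {
      var changed = new List<SpatialSurfaceInfo>();

      foreach (var entry in this.observer.GetObservedSurfaces())
      {
        SpatialSurfaceInfo existing;

        // Either a brand new surface or one with a newer UpdateTime.
        if (!this.surfaces.TryGetValue(entry.Key, out existing) ||
            (existing.UpdateTime < entry.Value.UpdateTime))
        {
          this.surfaces[entry.Key] = entry.Value;
          changed.Add(entry.Value);
        }
      }
      if (changed.Any())
      {
        this.SurfacesChanged?.Invoke(this, changed);
      }
    }
  }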

Checking for Surface Data

At this point, I’ve enough code to fire up the debugger and debug my 2D app on a HoloLens or an emulator and see if I can get some spatial mapping data into my code.

It’s worth remembering that the HoloLens emulator is good for debugging spatial mapping: by default it places itself into a “default room” and it can be switched to a number of other rooms provided with the SDK, or to custom rooms that have been recorded from a HoloLens.

So, debugging on the emulator, I initially see 22 loaded surfaces coming back from the SpatialSurfaceObserver;

[Screenshot: the debugger showing 22 observed surfaces]

and you can see the ID for my first surface and the UpdateTime that it’s associated with.

I also notice that, very early on in the application, the ObservedSurfacesChanged event fires and my code in SurfaceChangeWatcher simply calls back into the LoadSurfaces method shown in the screenshot above, which then attempts to figure out which surfaces have been added, removed or updated since they were last queried.

So, getting hold of the surfaces within a volume and responding to their changes as they evolve doesn’t seem too onerous.

But, how to get the actual polygonal mesh data itself?

Getting Mesh Data

Once you have hold of a SpatialSurfaceInfo, you can attempt to get hold of the SpatialSurfaceMesh which it represents via the TryComputeLatestMeshAsync method.

This method wants a “triangle density” in terms of how many triangles it should attempt to bring back per cubic metre. If you’ve used the Unity prefab then you’ll have seen this parameter before and in my code here I chose a value of 100 and stuck with it.

The method is also asynchronous and so you can’t just demand the mesh in realtime but it’s a fairly simple call and here’s a screenshot of me back in the debugger having made that call to get some data;

[Screenshot: the debugger showing the SpatialSurfaceMesh returned by TryComputeLatestMeshAsync]

That screenshot shows that I’ve got a SpatialSurfaceMesh containing 205 vertices in the R16G16B16A16IntNormalized format and 831 triangle indices in the R16UInt format, and it also gives me the Id and the UpdateTime of the SpatialSurfaceInfo.

It’s also worth noting the VertexPositionScale which needs to be applied to the vertices to reconstruct them.
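
As a rough sketch (using my own names here rather than the exact code in the project), asking a surface for its latest mesh and poking at what comes back looks something like this;

  using System.Diagnostics;
  using System.Threading.Tasks;
  using Windows.Perception.Spatial.Surfaces;

  static async Task DumpMeshAsync(SpatialSurfaceInfo surfaceInfo)
  {
    // The maximum number of triangles to bring back per cubic metre.
    const double trianglesPerCubicMetre = 100.0;

    SpatialSurfaceMesh mesh =
      await surfaceInfo.TryComputeLatestMeshAsync(trianglesPerCubicMetre);

    // The mesh can come back null if it couldn't be computed.
    if (mesh != null)
    {
      // Vertex positions and triangle indices come back as buffers with a
      // format (e.g. R16G16B16A16IntNormalized / R16UInt), an element count
      // and a stride.
      Debug.WriteLine(
        $"{mesh.SurfaceInfo.Id}: " +
        $"{mesh.VertexPositions.ElementCount} vertices ({mesh.VertexPositions.Format}), " +
        $"{mesh.TriangleIndices.ElementCount} indices ({mesh.TriangleIndices.Format}), " +
        $"vertex scale {mesh.VertexPositionScale}");
    }
  }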

Rendering Mesh Data

Now, at this point I felt that I had learned a few things about how to get hold of spatial mapping meshes but I thought that it wasn’t really “enough” if I didn’t make at least some attempt to render the meshes produced.

I thought about a few options around how I might do that given that I’m running code inside of a 2D XAML application.

I wondered whether I might somehow flatten the mesh and draw it with a XAML Canvas, but that seemed unlikely to work well; I suspected that the best road to go down would be to keep the data in the format it was already being provided in and hand it over to DirectX for rendering.

That led me to wonder whether something from Win2D might be able to draw it for me but Win2D stays true to its name and doesn’t (as far as I know) get into the business of wrapping up Direct3D APIs.

So…I figured that I’d need to bite the bullet and see if I could bring this into my 2D app via the XAML SwapChainPanel integration element with some rendering provided by SharpDX.

It’s worth saying that I’ve hardly ever used SwapChainPanel and I’ve never used SharpDX before so I figured that putting them together with this mesh data might be “fun” ;)

A UWP SharpDX SwapChainPanel Sample

In order to achieve that, I went on a bit of a search to see if I could find a basic sample which illustrated how to integrate SharpDX code inside of a XAML application rendering to a SwapChainPanel.

It took me a little while to find such a sample, as quite a few of the SharpDX samples seem to be out of date these days, and I asked around on Twitter before finding this great sample which uses SharpDX and SwapChainPanel to render a triangle inside of a UWP XAML app;

https://github.com/minhcly/UWP3DTest

That let me drop a few SharpDX packages into my project;

[Screenshot: the SharpDX NuGet packages added to the project]

and the sample was really useful in that it enabled me to drop a SwapChainPanel into my XAML UI app and, using code that I lifted and reworked out of the sample, I could get that same triangle to render inside of my 2D XAML application.
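
My understanding of the key piece of glue in that sample (a sketch under my own assumptions rather than a copy of its code) is that SharpDX talks to the XAML SwapChainPanel through the ISwapChainPanelNative COM interface, handing it the DXGI swap chain that Direct3D then renders into;

  using SharpDX;
  using SharpDX.DXGI;

  static void AttachSwapChain(
    Windows.UI.Xaml.Controls.SwapChainPanel panel, SwapChain1 swapChain)
  {
    // Query the panel for its native interface and point it at the swap chain.
    using (var nativePanel = ComObject.As<ISwapChainPanelNative>(panel))
    {
      nativePanel.SwapChain = swapChain;
    }
  }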

That gave me a little hope that I might be able to get the mesh data rendered inside of my application too.

Building a SharpDX Renderer (or trying to!)

I wrote a class SwapChainPanelRenderer (source) which essentially takes the SwapChainPanel and my SurfaceChangeWatcher class and it puts them together in order to retrieve/monitor spatial meshes as they are produced by the SpatialSurfaceObserver.

The essence of that class is that it goes through a few steps;

  1. It initialises D3D via SharpDX, largely following the pattern from the sample I found.
  2. It creates a very simple vertex shader and pixel shader much like the sample does, although I ended up tweaking them a little.
  3. Whenever a new SpatialSurfaceInfo is provided by the SurfaceChangeWatcher, the renderer asks the system to compute the mesh for it and creates a number of data structures from that mesh;
    1. A vertex buffer to match the format provided by the mesh
    2. An index buffer to match the format provided by the mesh
    3. A constant buffer with details of how to transform the vertices provided by the mesh
  4. Whenever the renderer is asked to render, it loads up the right vertex/index/constant buffers for each of the meshes that it knows about and asks the system to render them, passing through a few transformation pieces to the vertex shader.

It’s perhaps worth noting a couple of things around how that code works – the first would be;

  • In order to get hold of the actual vertex data, this code relies on using unsafe C# code and IBufferByteAccess in order to be able to grab the real data buffers rather than copying them – there’s a sketch of that trick just below.
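
For anyone who hasn’t come across it, here’s a minimal version of the idea (my own cut-down sketch, not the exact code in the project) – the COM interface declaration is the well-known one for IBufferByteAccess and the project needs to allow unsafe code;

  using System;
  using System.Runtime.InteropServices;
  using Windows.Storage.Streams;

  // The COM interface implemented by a WinRT IBuffer which hands out a raw
  // pointer to the underlying bytes.
  [ComImport]
  [Guid("905a0fef-bc53-11df-8c49-001e4fc686da")]
  [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  unsafe interface IBufferByteAccess
  {
    void Buffer(out byte* buffer);
  }

  static class BufferExtensions
  {
    public static unsafe byte* GetPointer(this IBuffer buffer)
    {
      byte* bytes = null;
      ((IBufferByteAccess)buffer).Buffer(out bytes);
      return (bytes);
    }
  }

so that (e.g.) mesh.VertexPositions.Data.GetPointer() hands back a pointer which can be passed straight to one of the SharpDX buffer constructors that takes an IntPtr, avoiding a copy of the vertex data.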

The second point that might be worth mentioning is that I spent quite a bit of time trying to see if I could get the mesh rendering right.

  • I’m not 100% there at the time of writing, but what I have managed to get working has come from consulting the official C++ sample (which has a more complex pipeline), specifically around how to make use of the SpatialSurfaceMesh.VertexPositionScale property, and I tried to make my code line up with the sample code as much as possible – roughly along the lines of the sketch below.
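
To give an idea of what I mean (a rough sketch with my assumptions rather than the exact code in the renderer), one way of building a per-mesh transform is to scale by VertexPositionScale first and then move from the mesh’s own coordinate system into the app’s base, stationary coordinate system;

  using SharpDX;
  using Windows.Perception.Spatial;
  using Windows.Perception.Spatial.Surfaces;

  static Matrix? TryBuildMeshTransform(
    SpatialSurfaceMesh mesh, SpatialCoordinateSystem baseCoordinateSystem)
  {
    // This can fail (return null) if the two coordinate systems can't
    // currently be related to one another.
    var meshToBase = mesh.CoordinateSystem.TryGetTransformTo(baseCoordinateSystem);

    if (!meshToBase.HasValue)
    {
      return (null);
    }
    var scale = mesh.VertexPositionScale;
    var m = meshToBase.Value;

    // Scale the packed vertex positions, then transform them into the app's
    // base coordinate system (row-vector convention, so the scale goes on the left).
    return (
      Matrix.Scaling(scale.X, scale.Y, scale.Z) *
      new Matrix(
        m.M11, m.M12, m.M13, m.M14,
        m.M21, m.M22, m.M23, m.M24,
        m.M31, m.M32, m.M33, m.M34,
        m.M41, m.M42, m.M43, m.M44));
  }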

I must admit that I spent a while staring at my code and comparing it to the sample code to figure out whether I could improve the way mine was rendering, and I think I could easily spend more time on it to make it work better.

The last point I’d make is that there’s nothing in the code at the time of writing which attempts to align the HoloLens position, orientation and view with what’s being shown inside of the 2D app. What that means is;

  • The 2D app starts at a position (0,0.5,0) so half a metre above where the HoloLens is in the world.
  • The 2D app doesn’t know the orientation of the user so could be pointing in the wrong direction with respect to the mesh.

This can make the app a little “disorientating” unless you are familiar with what it’s doing :)

Trying it Out

At the time of writing, I’ve mostly been trying this code out on the emulator but I have also experimented with it on HoloLens.

Here’s a screenshot of the official sample 3D app with its fancier shader running on the emulator where I’m using the “living room” room;

[Screenshot: the official 3D sample rendering the ‘living room’ on the emulator]

and here’s my 2D XAML app running in a window but, hopefully, rendering a similar thing albeit in wireframe;

[Screenshot: my 2D XAML app rendering the same room in wireframe]

and, seemingly, there’s something of a mirroring going on in there as well which I still need to dig into!

Wrapping Up & The Source

As I said at the start of the post, this one was very much just “for fun” but I thought I’d write it down so that I can remember it and maybe some pieces of it might be useful to someone else in the future.

If you want the source, it’s all over here on github so feel free to take it, play with it and feel very free to improve it :)

Hands, Gestures and Popping back to ‘Prague’

Just a short post to follow up on this previous post;

Hands, Gestures and a Quick Trip to ‘Prague’

I said that if I ‘found time’ then I’d revisit that post and that code and see if I could make it work with the Kinect for Windows V2 sensor rather than with the Intel RealSense SR300 which I used in that post.

In all honesty, I haven’t ‘found time’ but I’m revisiting it anyway :)

I dug my Kinect for Windows V2 and all of its lengthy cabling out of the drawer, plugged it into my Surface Book and … it didn’t work. Instead, I got the flashing white light which usually indicates that things aren’t going so well.

Not to be deterred, I did some deep, internal Microsoft research (ok, I searched the web :) ) and came up with this;

Kinect Sensor is not recognized on a Surface Book

and getting rid of the text value within that registry key sorted out that problem and let me confirm that my Kinect for Windows V2 was working, in the sense that the configuration verifier says;

[Screenshot: the Kinect configuration verifier results]

which, after many years of experience, I have learned to interpret as “Give it a try!” ;) So I tried out a couple of the SDK samples, they worked fine for me, and I reckoned I was in a good place to get started.

However, the Project Prague bits were not so happy and I found they were logging a bunch of errors in the ‘Geek View’ about not being able to connect to/initialise either the SR300 or the Kinect camera.

This seemed to get resolved by me updating my Kinect drivers – I did an automatic update and Windows found new drivers online which took me to this version;

[Screenshot: the updated Kinect driver version]

which I was surprised I didn’t have already as it’s quite old, but that seemed to make the Project Prague pieces happy and the ‘Geek View’ was back in business showing output from the Kinect;

[Screenshot: the ‘Geek View’ showing output from the Kinect]

and from the little display window on the left there it felt like this operated at a range of approx 0.5m to 1.0m. I wondered whether I could move further away but that didn’t seem to be the case in the quick experiment that I tried.

The big question for me then was whether the code that I’d previously written and run against the SR300 would “just work” on the Kinect for Windows V2 and, of course, it does :) Revisit the previous post for the source code if you’re interested, but I found my “counting on four fingers” gesture was recognised quickly and reliably here;

[Screenshot: the counting gesture being recognised]

This is very cool – it’d be interesting to know exactly what ‘Prague’ relies on from the camera and what the system requirements (CPU, RAM, GPU, etc.) are to make this work, but it looks like they’ve got a very decent system going for recognising hand gestures across different cameras.

Hands, Gestures and a Quick Trip to ‘Prague’

Sorry for the title – I couldn’t resist and, no, I’ve not switched to writing a travel blog just yet although I’ll keep the idea in my back pocket for the time when the current ‘career’ hits the ever-looming buffers ;)

But, no, this post is about ‘Project Prague’ and hand gestures – I’ve written quite a bit in the past about natural gesture recognition with technologies like the Kinect for Windows V2 and the RealSense F200 and SR300 cameras.

Kinect has great capabilities for colour, depth and infra-red imaging plus a smart (i.e. cloud-trained AI) runtime which can bring all those streams together and give you (human) skeletal tracking of 25 joints on 6 bodies at 30 frames per second. It can also do some facial tracking and has an AI-based gesture recognition system which can be trained to recognise human-body based gestures like “hands above head” or “golf swing” and so on.

That camera has a range of approx 0.5m to 4.5m and, perhaps because of this long range, it does not have a great deal of support for hand-based gestures – it can report some hand joints and a few different hand states like open/closed, but it doesn’t go much beyond that.

I’ve also written about the RealSense F200 and SR300 cameras, although I never had a lot of success with the SR300. Those cameras have a much shorter range (< 1m) than the Kinect for Windows V2 but have/had some different capabilities, surfacing functionality like;

  • Detailed facial detection providing feature positions etc and facial recognition.
  • Emotion detection providing states like ‘happy’, ‘sad’ etc (although this got removed from the original SDK at a later point)
  • Hand tracking features
    • The SDK has great support for tracking of hands down to the joint level with > 20 joints reported by the SDK
    • The SDK also has support for hand-based gestures such as “V sign”, “full pinch” etc.

With any of these cameras and their SDKs, the processing happens locally on the (high bandwidth) data at frame rates of 15/30/60 FPS, so it’s quite different to scenarios where you selectively capture data and send it to the cloud for processing, as you see with the Cognitive Services – but both approaches have their benefits and can be used in combination.

In terms of this functionality around hand tracking and gestures, I bundled some of what I knew about this into a video last year and published it to Channel9 although it’s probably quite a bit out of date at this point;

[Screenshot: the Channel9 video]

but it’s a topic that has interested me for a long time and so I was naturally curious when I saw ‘Project Prague’ announced a few weeks ago.

My first question on ‘Prague’ was whether it would make use of a local-processing or a cloud-based-processing model and, if the former, whether it would require a depth camera or be based purely on a web cam.

It turns out that ‘Prague’ is locally processing data and does require either a Kinect for Windows V2 camera or a RealSense SR300 camera with the recommendation on the website being to use the SR300.

I dug my Intel RealSense SR300 out of the drawer where it’s been living for a few months, plugged it in to my Surface Book and set about seeing whether I could get a ‘Prague’ demo up and running on it.

Plugging in the SR300

I hadn’t plugged the SR300 into my Surface Book since I reinstalled Windows and so I wondered how that had progressed since the early days of the camera and since Windows has moved to Creators Update (I’m running 15063.447).

I hadn’t installed the RealSense SDK onto this machine but Windows seemed to recognise the device and install it regardless, although I did find that the initial install left some “warning triangles” in Device Manager which had to be resolved by a manual “Scan for hardware changes” from the Device Manager menu. After that, things seemed to sort themselves out and Device Manager showed;

[Screenshot: Device Manager showing the SR300]

which the modern devices app shows as;

[Screenshot: the Devices app showing the SR300]

and that seemed reasonable. I didn’t have to visit the troubleshooting page (although, based on my previous experience with the SR300, I wasn’t surprised to see that one existed) and, instead, I went off to download ‘Project Prague’.

Installing ‘Prague’

Nothing much to report here – there’s an MSI that you download and run;

[Screenshot: the Project Prague MSI installer]

and “It Just Worked” so nothing to say about that.

Once installation had finished, the “Microsoft Gestures Service” app ran up as per the docs and I tried to do as the documentation advised and make sure that the app was recognising my hand – it didn’t seem to be working, as below;

[Screenshot: the app not recognising my hand]

but then I tried with my right hand and things seemed to be working better;

[Screenshot: the app recognising my right hand]

This is actually the window view (called the ‘Geek View’!) of a system tray application (the “gestures service”) which doesn’t seem to be a true service in the NT sense but instead seems to be a regular app configured to run at startup on the system;

[Screenshot: the gestures service and DiscoveryClient configured to run at startup]

so, much like the Kinect runtime, it seems that this is the code which sits and watches frames from the camera while applications become “clients” of the service. The “DiscoveryClient”, also highlighted in the screenshot as being configured to run at startup, is one such demo app; it picks up gestures from the service and (according to the docs) routes them through to the shell.

Here’s the system tray application;

[Screenshot: the system tray application]

and if I perform the “bloom” gesture (familiar from Windows Mixed Reality) then the system tray app pops up;

[Screenshot: the tray app popping up after the ‘bloom’ gesture]

and tells me that there are other gestures already active to open the start menu and toggle the volume. The gestures animate on mouse over to show how to execute them and I had no problem with using the gesture to toggle the volume on my machine but I did struggle a little with the gesture to open the start menu.

The ‘timeline’ view in the ‘Geek View’ here is interesting because it shows gestures being detected (or not) in real time – you can perhaps see on the timeline below how I’m struggling to execute the ‘Shell_Start’ gesture and it’s getting recognised as a ‘Discovery_Tray’ gesture instead. In that screenshot the white blob indicates a “pose” whereas the green blobs represent completed “gestures”.

[Screenshot: the ‘Geek View’ timeline]

There’s also a ‘settings’ section here which shows me;

[Screenshot: the settings view]

and then on the GestPacks section;

[Screenshot: the GestPacks section]

which suggests that the service has integration for various apps. At the time of writing, the “get more online” option didn’t seem to link to anything that I could spot, but by running PowerPoint I noticed that the service monitors which app is in the foreground and switches its gestures list to match that contextual app.

So, when running PowerPoint, the gesture service shows;

[Screenshot: the gestures registered for PowerPoint]

and those gestures worked very well for me in PowerPoint – it was easy to start a slideshow and then advance the slides by just tapping through in the air with my finger. These details can also be seen in the settings app;

[Screenshot: the PowerPoint gestures in the settings app]

which suggests that these gestures are contextual within the app – for example the “Rotate Right 90” option doesn’t show up until I select an object in PowerPoint;

[Screenshot: the ‘Rotate Right 90’ gesture appearing once an object is selected]

and I can see this dynamically changing in the ‘Geek View’ – here’s the view when no object is selected;

[Screenshot: the ‘Geek View’ with no object selected in PowerPoint]

and I can see that there are perhaps 3 gestures registered whereas if I select an object in PowerPoint then I see;

[Screenshot: the ‘Geek View’ with an object selected in PowerPoint]

and those gestures worked pretty well for me :)

Other Demo Applications

I experimented with the ‘Camera Viewer’ app, which works really well. Once again, from the ‘Geek View’ I can see that this app has registered some gestures; you can perhaps see below that I am trying out the ‘peace’ gesture, the ‘Geek View’ is showing that it has been registered and completed, and the app is displaying some nice doves to show it’s seen the gesture;

[Screenshot: the Camera Viewer app responding to the ‘peace’ gesture]

One other interesting aspect of this app is that it displays a ‘Connecting to Gesture Service’ message as you bring it back into focus suggesting that there’s some sort of ‘connection’ to the gestures service that comes/goes over time.

These gestures worked really well for me and, by this point, I was wondering how these gesture apps plug into the architecture here and how they are implemented, so I wanted to see if I could write some code. I did notice that the GestPacks seem to live in a folder under the ‘Prague’ installation;

[Screenshot: the GestPacks folder under the ‘Prague’ installation]

and a quick look at one of the DLLs (e.g. PowerPoint) shows that this is .NET code interop’ing into PowerPoint as you’d expect although the naming suggests there’s some ATL based code in the chain here somewhere;

[Screenshot: a look inside the PowerPoint GestPack assembly]

Coding ‘Prague’

The API docs link leads over to this web page, which points to a Microsoft.Gestures namespace that the page suggests is part of .NET Core 2.0. That would seem to mean that (right now) you’re not going to be able to reference this from a Universal Windows App project, but you can reference it from a .NET Framework project and so I just referenced it from a command line project targeting .NET Framework 4.6.2.

The assemblies seem to live in the equivalent of;

“C:\Users\mtaulty\AppData\Roaming\Microsoft\Prague\PragueVersions\LatestVersion\SDK”

and I added a reference to 3 of them;

[Screenshot: the three Microsoft.Gestures assembly references]

It’s also worth noting that there are a number of code samples over in this github repository;

https://github.com/Microsoft/Gestures-Samples

although, at the time of writing, I haven’t really referred to those too much as I was trying to see what the experience was like ‘starting from scratch’. To that end, I had a quick look at what seemed to be the main assembly in the Object Browser;

[Screenshot: the Microsoft.Gestures assembly in the Object Browser]

and the structure seemed to suggest that the library uses TCP sockets as an ‘RPC’ mechanism to communicate between an app and the gestures service; a quick look at the gestures service process with Process Explorer did show that it was listening for traffic;

[Screenshot: Process Explorer showing the gestures service listening for traffic]

So, how to get a connection? It seems fairly easy in that the docs point you to the GesturesServiceEndpoint class and there’s a GesturesServiceEndpointFactory to make those, and then IntelliSense popped up as below to reinforce the idea that there is some socket-based comms going on here;

[Screenshot: IntelliSense on GesturesServiceEndpointFactory]

From there, I wanted to define my own gesture which would have the user start with an open, spread hand and then tap their thumb onto each of their four fingers in sequence – a gesture made up of 5 stages. I read the docs around how gestures, poses and motion work and added some code to my console application to see if I could code it up;

namespace ConsoleApp1
{
  using Microsoft.Gestures;
  using Microsoft.Gestures.Endpoint;
  using System;
  using System.Collections.Generic;
  using System.Threading.Tasks;

  class Program
  {
    static void Main(string[] args)
    {
      ConnectAsync();

      Console.WriteLine("Hit return to exit...");

      Console.ReadLine();

      ServiceEndpoint.Disconnect();
      ServiceEndpoint.Dispose();
    }
    static async Task ConnectAsync()
    {
      Console.WriteLine("Connecting...");

      try
      {
        var connected = await ServiceEndpoint.ConnectAsync();

        if (!connected)
        {
          Console.WriteLine("Failed to connect...");
        }
        else
        {
          await serviceEndpoint.RegisterGesture(CountGesture, true);
        }
      }
      catch
      {
        Console.WriteLine("Exception thrown in starting up...");
      }
    }
    static void OnTriggered(object sender, GestureSegmentTriggeredEventArgs e)
    {
      Console.WriteLine($"Gesture {e.GestureSegment.Name} triggered!");
    }
    static GesturesServiceEndpoint ServiceEndpoint
    {
      get
      {
        if (serviceEndpoint == null)
        {
          serviceEndpoint = GesturesServiceEndpointFactory.Create();
        }
        return (serviceEndpoint);
      }
    }
    static Gesture CountGesture
    {
      get
      {
        if (countGesture == null)
        {
          var poses = new List<HandPose>();

          var allFingersContext = new AllFingersContext();

          // Hand starts upright, forward and with fingers spread...
          var startPose = new HandPose(
            "start",
            new FingerPose(
              allFingersContext, FingerFlexion.Open),
            new FingertipDistanceRelation(
              allFingersContext, RelativeDistance.NotTouching));

          poses.Add(startPose);

          foreach (Finger finger in
            new[] { Finger.Index, Finger.Middle, Finger.Ring, Finger.Pinky })
          {
            poses.Add(
              new HandPose(
              $"pose{finger}",
              new FingertipDistanceRelation(
                Finger.Thumb, RelativeDistance.Touching, finger)));
          }
          countGesture = new Gesture("count", poses.ToArray());
          countGesture.Triggered += OnTriggered;
        }
        return (countGesture);
      }
    }
    static Gesture countGesture;
    static GesturesServiceEndpoint serviceEndpoint;
  }
}

I’m very unsure as to whether my code is specifying my gesture ‘completely’ or ‘accurately’ but what amazed me about this is that I really only took one stab at it and it “worked”.

That is, I can run my app and see my gesture being built up from its 5 constituent poses in the ‘Geek View’ and then my console app has its event triggered and displays the right output;

[Screenshot: the console app reporting the triggered gesture]

What I’d flag about that code is that it’s using async/await in a console app, so it’s likely that thread pool threads are being used to dispatch all the “completions”, which means that lots of threads are potentially running through this code and interacting with objects which may/may not have thread affinity – I’ve not done anything to mitigate that here.
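
For what it’s worth, one small improvement (a sketch of an assumption on my part rather than something that’s in the code above) would be to at least block on the connection attempt in Main so that failures surface there rather than being lost on a fire-and-forget task – it doesn’t address the thread affinity point, though;

    static void Main(string[] args)
    {
      // Block on the async connection work (no async Main prior to C# 7.1) so
      // that any exception from connecting/registering shows up here.
      ConnectAsync().GetAwaiter().GetResult();

      Console.WriteLine("Hit return to exit...");
      Console.ReadLine();

      ServiceEndpoint.Disconnect();
      ServiceEndpoint.Dispose();
    }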

Other than that, I’m impressed – this was a real joy to work with and I guess the only way it could be made easier would be to allow for the visual drawing or perhaps the recording of hand gestures.

The only other thing that I noticed is that my CPU can get a bit active while using these bits and they seem to run at about 800MB of memory but then Project Prague is ‘Experimental’ right now so I’m sure that could change over time.

I’d like to also try this code on a Kinect for Windows V2 – if I do that, I’ll update this post or add another one.