Hands, Gestures and Popping back to ‘Prague’

Just a short post to follow up on this previous post;

Hands, Gestures and a Quick Trip to ‘Prague’

I said that if I ‘found time’ then I’d revisit that post and that code and see if I could make it work with the Kinect for Windows V2 sensor rather than with the Intel RealSense SR300 which I used in that post.

In all honesty, I haven’t ‘found time’ but I’m revisiting it anyway.

I dug my Kinect for Windows V2 and all of its lengthy cabling out of the drawer, plugged it into my Surface Book and … it didn’t work. Instead, I got the flashing white light which usually indicates that things aren’t going so well.

Not to be deterred, I did some deep, internal Microsoft research (ok, I searched the web) and came up with this;

Kinect Sensor is not recognized on a Surface Book

and getting rid of the text value within that registry key sorted out that problem and let me test that my Kinect for Windows V2 was working in the sense that the configuration verifier says;

image

which, after many years of experience, I have learned to interpret as “Give it a try!”. I tried out a couple of the SDK samples, they worked fine for me and so I reckoned I was in a good place to get started.

However, the Project Prague bits were not so happy – I found they were logging a bunch of errors in the ‘Geek View’ about not being able to connect to or initialise either the SR300 or the Kinect camera.

This seemed to be resolved by updating my Kinect drivers – I did an automatic update and Windows found new drivers online which took me to this version;

image

which I was surprised I didn’t already have as it’s quite old, but it seemed to make the Project Prague pieces happy and the Geek View was back in business showing output from the Kinect;

image

and from the little display window on the left there it felt like this operated at a range of approx 0.5m to 1.0m. I wondered whether I could move further away but that didn’t seem to be the case in the quick experiment that I tried.

The big question for me then was whether the code that I’d previously written and run against the SR300 would “just work” on the Kinect for Windows V2 and, of course, it does. Revisit the previous post for the source code if you’re interested, but I found my “counting on four fingers” gesture was recognised quickly and reliably here;

image

This is very cool – it’d be interesting to know exactly what ‘Prague’ relies on in terms of the camera and also in terms of system requirements (CPU, RAM, GPU, etc.) in order to make this work, but it looks like they’ve got a very decent system going for recognising hand gestures across different cameras.

Hands, Gestures and a Quick Trip to ‘Prague’

Sorry for the title – I couldn’t resist and, no, I’ve not switched to writing a travel blog just yet, although I’ll keep the idea in my back pocket for when the current ‘career’ hits the ever-looming buffers.

But, no, this post is about ‘Project Prague’ and hand gestures. I’ve written quite a bit in the past about natural gesture recognition with technologies like the Kinect for Windows V2 and the RealSense F200 and SR300 cameras.

Kinect has great capabilities for colour, depth and infra-red imaging plus a smart (i.e. cloud-trained AI) runtime which can bring all those streams together and give you (human) skeletal tracking of 25 joints on 6 bodies at 30 frames per second. It can also do some facial tracking and has an AI-based gesture recognition system which can be trained to recognise human-body-based gestures like “hands above head” or “golf swing” and so on.

That camera has a range of approx 0.5m to 4.5m and, perhaps because of this long range, it does not have a great deal of support for hand-based gestures – it can report some hand joints and a few different hand states like open/closed, but it doesn’t go much beyond that.
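
To give a flavour of what that looks like from code, here’s a rough sketch of mine (not code from any of these posts) of reading body frames and hand states with the Kinect for Windows V2 .NET SDK – I’m going from memory of the Microsoft.Kinect API here so treat it as an approximation rather than a polished sample;

// Sketch only - assumes the Kinect for Windows V2 .NET SDK (Microsoft.Kinect).
using Microsoft.Kinect;
using System;

class KinectHandStateSketch
{
  static void Main()
  {
    // Single sensor object representing the camera.
    KinectSensor sensor = KinectSensor.GetDefault();
    sensor.Open();

    // Body frames deliver up to 6 tracked bodies, each with 25 joints.
    Body[] bodies = new Body[sensor.BodyFrameSource.BodyCount];
    BodyFrameReader reader = sensor.BodyFrameSource.OpenReader();

    reader.FrameArrived += (s, e) =>
    {
      using (BodyFrame frame = e.FrameReference.AcquireFrame())
      {
        if (frame == null)
        {
          return;
        }
        frame.GetAndRefreshBodyData(bodies);

        foreach (Body body in bodies)
        {
          if (!body.IsTracked)
          {
            continue;
          }
          // Individual joints are available...
          Joint rightHand = body.Joints[JointType.HandRight];

          // ...but hand support doesn't go much beyond a coarse state.
          HandState state = body.HandRightState; // Open, Closed, Lasso, etc.

          Console.WriteLine($"Right hand at {rightHand.Position.X:F2}, state {state}");
        }
      }
    };
    Console.ReadLine();
  }
}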

I’ve also written about the RealSense F200 and SR300 cameras, although I never had a lot of success with the SR300. Those cameras have a much shorter range (< 1m) than the Kinect for Windows V2 but surface some different capabilities, such as;

  • Detailed facial detection providing feature positions etc and facial recognition.
  • Emotion detection providing states like ‘happy’, ‘sad’ etc (although this got removed from the original SDK at a later point)
  • Hand tracking features
    • The SDK has great support for tracking of hands down to the joint level with > 20 joints reported by the SDK
    • The SDK also has support for hand-based gestures such as “V sign”, “full pinch” etc. – there’s a rough sketch of what this looks like in code just after this list.
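
Here’s that sketch – it’s based on my recollection of the RealSense .NET SDK (the PXCM* types) rather than on tested code, so the exact gesture names and calls may not be quite right;

// Sketch only - assumes the RealSense SDK .NET wrapper (PXCM* types).
using System;

class RealSenseGestureSketch
{
  static void Main()
  {
    PXCMSenseManager senseManager = PXCMSenseManager.CreateInstance();

    // Switch on the hand module and ask it to fire a couple of gestures.
    senseManager.EnableHand();
    PXCMHandModule handModule = senseManager.QueryHand();

    using (PXCMHandConfiguration config = handModule.CreateActiveConfiguration())
    {
      config.EnableGesture("v_sign");
      config.EnableGesture("full_pinch");
      config.ApplyChanges();
    }
    senseManager.Init();

    PXCMHandData handData = handModule.CreateOutput();

    // Simple, synchronous frame loop.
    while (senseManager.AcquireFrame(true) == pxcmStatus.PXCM_STATUS_NO_ERROR)
    {
      handData.Update();

      PXCMHandData.GestureData gestureData;

      if (handData.IsGestureFired("v_sign", out gestureData))
      {
        Console.WriteLine("V sign!");
      }
      senseManager.ReleaseFrame();
    }
  }
}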

With any of these cameras and their SDKs, the processing happens locally on the (high-bandwidth) data at frame rates of 15/30/60 FPS, so it’s quite different to scenarios where you might selectively capture data and send it to the cloud for processing, as you see with the Cognitive Services. Both approaches have their benefits and are open to being used in combination.

In terms of this functionality around hand tracking and gestures, I bundled some of what I knew about this into a video last year and published it to Channel9 although it’s probably quite a bit out of date at this point;

image

but it’s a topic that has interested me for a long time and so, when I saw ‘Project Prague’ announced a few weeks ago, I was naturally keen to take a look.

My first question on ‘Prague’ was whether it would make use of a local-processing or a cloud-based-processing model and, if the former, whether it would require a depth camera or would be based purely on a web cam.

It turns out that ‘Prague’ processes data locally and does require either a Kinect for Windows V2 camera or a RealSense SR300 camera, with the recommendation on the website being to use the SR300.

I dug my Intel RealSense SR300 out of the drawer where it’s been living for a few months, plugged it in to my Surface Book and set about seeing whether I could get a ‘Prague’ demo up and running on it.

Plugging in the SR300

I hadn’t plugged the SR300 into my Surface Book since I reinstalled Windows, so I wondered how the device support had progressed since the early days of the camera and since Windows moved to the Creators Update (I’m running 15063.447).

I hadn’t installed the RealSense SDK onto this machine but Windows seemed to recognise the device and install it regardless, although I did find that the initial install left some “warning triangles” in Device Manager which had to be resolved by a manual “Scan for hardware changes” from the Device Manager menu. After that, things seemed to sort themselves out and Device Manager showed;

image

which the modern devices app shows as;

image

and that seemed reasonable. I didn’t have to visit the troubleshooting page, although based on my previous experience with the SR300 I wasn’t surprised to see that it existed, and instead I went off to download ‘Project Prague’.

Installing ‘Prague’

Nothing much to report here – there’s an MSI that you download and run;

image

and “It Just Worked” so nothing to say about that.

Once installation had finished, the “Microsoft Gestures Service” app ran up and, as per the docs, I tried to make sure that the app was recognising my hand – it didn’t seem to be working, as below;

image

but then I tried with my right hand and things seemed to be working better;

image

This is actually the window view (called the ‘Geek View’!) of a system tray application (the “gestures service”) which doesn’t seem to be a true service in the NT sense but instead seems to be a regular app configured to run at startup on the system;

image

so, much like the Kinect Runtime, it seems that this is the code which sits and watches frames from the camera while applications become “clients” of this service. The “DiscoveryClient”, which is also highlighted in the screenshot as being configured to run at startup, is one such demo app – it picks up gestures from the service and (according to the docs) routes them through to the shell.

Here’s the system tray application;

image

and if I perform the “bloom” gesture (familiar from Windows Mixed Reality) then the system tray app pops up;

image

and tells me that there are other gestures already active to open the start menu and toggle the volume. The gestures animate on mouse over to show how to execute them and I had no problem with using the gesture to toggle the volume on my machine but I did struggle a little with the gesture to open the start menu.

The ‘timeline’ view in the ‘Geek View’ here is interesting because it shows gestures being detected or not in real time and you can perhaps see on the timeline below how I’m struggling to execute the ‘Shell_Start’ gesture and it’s getting recognised as a ‘Discovery_Tray’ gesture. In that screenshot the white blob indicates a “pose” whereas the green blobs represent completed “gestures”.

image

There’s also a ‘settings’ section here which shows me;

image

and then on the GestPacks section;

image

which suggests that the service has integration for various apps. At the time of writing, the “get more online” option didn’t seem to link to anything that I could spot but, by running PowerPoint, I noticed that the app monitors which app is in the foreground and switches its gestures list to relate to that foreground app.

So, when running PowerPoint, the gesture service shows;

image

and those gestures worked very well for me in PowerPoint – it was easy to start a slideshow and then advance the slides by just tapping through in the air with my finger. These details can also be seen in the settings app;

image

which suggests that these gestures are contextual within the app – for example the “Rotate Right 90” option doesn’t show up until I select an object in PowerPoint;

image

and I can see this dynamically changing in the ‘Geek View’ – here’s the view when no object is selected;

image

and I can see that there are perhaps 3 gestures registered whereas if I select an object in PowerPoint then I see;

image

and those gestures worked pretty well for me.

Other Demo Applications

I experimented with the ‘Camera Viewer’ app, which works really well. Once again, from the ‘Geek View’ I can see that this app has registered some gestures and you can perhaps see below that I am trying out the ‘peace’ gesture – the Geek View shows that it is registered and has completed, and the app displays some nice doves to show it’s seen the gesture;

image

One other interesting aspect of this app is that it displays a ‘Connecting to Gesture Service’ message as you bring it back into focus suggesting that there’s some sort of ‘connection’ to the gestures service that comes/goes over time.

These gestures worked really well for me and by this point I was wondering how these gesture apps plug into the architecture here and how they are implemented, so I wanted to see if I could write some code. I did notice that the GestPacks seem to live in a folder under the ‘Prague’ installation;

image

and a quick look at one of the DLLs (e.g. PowerPoint) shows that this is .NET code interop’ing into PowerPoint as you’d expect although the naming suggests there’s some ATL based code in the chain here somewhere;

image

Coding ‘Prague’

The API docs link leads over to this web page, which points to a Microsoft.Gestures namespace that appears to be part of .NET Core 2.0. That would seem to suggest that (right now) you’re not going to be able to reference this from a Universal Windows App project, but you can reference it from a .NET Framework project and so I just referenced it from a command-line project targeting .NET Framework 4.6.2.

The assemblies seem to live in the equivalent of;

“C:\Users\mtaulty\AppData\Roaming\Microsoft\Prague\PragueVersions\LatestVersion\SDK”

and I added a reference to 3 of them;

image

It’s also worth noting that there are a number of code samples over in this github repository;

https://github.com/Microsoft/Gestures-Samples

Although, at the time of writing, I haven’t really referred to those too much as I was trying to see what the experience was like when ‘starting from scratch’. To that end, I had a quick look at what seemed to be the main assembly in the object browser;

image

and the structure seemed to suggest that the library uses TCP sockets as an ‘RPC’ mechanism to communicate between an app and the gestures service. A quick look at the gestures service process with Process Explorer did show that it was listening for traffic;

image

So, how to get a connection? It seems fairly easy in that the docs point you to the GesturesServiceEndpoint class and there’s a GesturesServiceEndpointFactory to make those, and then IntelliSense popped up as below to reinforce the idea that there is some socket-based comms going on here;

image

From there, I wanted to define my own gesture which would allow the user to start with an open, spread hand and then tap their thumb onto their four fingers in sequence – that seemed to consist of 5 stages, so I read the docs around how gestures, poses and motion work and added some code to my console application to see if I could code up this gesture;

namespace ConsoleApp1
{
  using Microsoft.Gestures;
  using Microsoft.Gestures.Endpoint;
  using System;
  using System.Collections.Generic;
  using System.Threading.Tasks;

  class Program
  {
    static void Main(string[] args)
    {
      ConnectAsync();

      Console.WriteLine("Hit return to exit...");

      Console.ReadLine();

      ServiceEndpoint.Disconnect();
      ServiceEndpoint.Dispose();
    }
    static async Task ConnectAsync()
    {
      Console.WriteLine("Connecting...");

      try
      {
        var connected = await ServiceEndpoint.ConnectAsync();

        if (!connected)
        {
          Console.WriteLine("Failed to connect...");
        }
        else
        {
          await serviceEndpoint.RegisterGesture(CountGesture, true);
        }
      }
      catch
      {
        Console.WriteLine("Exception thrown in starting up...");
      }
    }
    static void OnTriggered(object sender, GestureSegmentTriggeredEventArgs e)
    {
      Console.WriteLine($"Gesture {e.GestureSegment.Name} triggered!");
    }
    static GesturesServiceEndpoint ServiceEndpoint
    {
      get
      {
        if (serviceEndpoint == null)
        {
          serviceEndpoint = GesturesServiceEndpointFactory.Create();
        }
        return (serviceEndpoint);
      }
    }
    static Gesture CountGesture
    {
      get
      {
        if (countGesture == null)
        {
          var poses = new List<HandPose>();

          var allFingersContext = new AllFingersContext();

          // Hand starts upright, forward and with fingers spread...
          var startPose = new HandPose(
            "start",
            new FingerPose(
              allFingersContext, FingerFlexion.Open),
            new FingertipDistanceRelation(
              allFingersContext, RelativeDistance.NotTouching));

          poses.Add(startPose);

          foreach (Finger finger in
            new[] { Finger.Index, Finger.Middle, Finger.Ring, Finger.Pinky })
          {
            poses.Add(
              new HandPose(
              $"pose{finger}",
              new FingertipDistanceRelation(
                Finger.Thumb, RelativeDistance.Touching, finger)));
          }
          countGesture = new Gesture("count", poses.ToArray());
          countGesture.Triggered += OnTriggered;
        }
        return (countGesture);
      }
    }
    static Gesture countGesture;
    static GesturesServiceEndpoint serviceEndpoint;
  }
}

I’m very unsure as to whether my code is specifying my gesture ‘completely’ or ‘accurately’ but what amazed me about this is that I really only took one stab at it and it “worked”.

That is, I can run my app and see my gesture being built up from its 5 constituent poses in the ‘Geek View’ and then my console app has its event triggered and displays the right output;

image

What I’d flag about that code is that it’s not great in that it’s using async/await in a console app, so it’s likely that thread pool threads are being used to dispatch all the “completions”, which means that lots of threads are potentially running through this code and interacting with objects which may or may not have thread affinity – I’ve not done anything to mitigate that here.
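
For what it’s worth, a minimal way of tidying that up (a sketch of my own, not anything from the ‘Prague’ docs) would be to block Main on the connection work and to serialise anything the Triggered handler touches – the snippet below just swaps out the Main and OnTriggered methods from the listing above and assumes the rest of it is unchanged;

    // Sketch only - replacements for Main/OnTriggered in the listing above.
    static readonly object sync = new object();

    static void Main(string[] args)
    {
      // Wait for connection/registration to complete (and surface any
      // exceptions) rather than leaving ConnectAsync running unobserved.
      ConnectAsync().GetAwaiter().GetResult();

      Console.WriteLine("Hit return to exit...");
      Console.ReadLine();

      ServiceEndpoint.Disconnect();
      ServiceEndpoint.Dispose();
    }
    static void OnTriggered(object sender, GestureSegmentTriggeredEventArgs e)
    {
      // Triggered fires on a thread pool thread, so serialise access to any
      // shared state (here it's only the console, which is forgiving anyway).
      lock (sync)
      {
        Console.WriteLine($"Gesture {e.GestureSegment.Name} triggered!");
      }
    }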

Other than that, I’m impressed – this was a real joy to work with and I guess the only way it could be made easier would be to allow for the visual drawing or perhaps the recording of hand gestures.

The only other thing that I noticed is that my CPU can get a bit active while using these bits and they seem to use about 800MB of memory, but then Project Prague is ‘Experimental’ right now so I’m sure that could change over time.

I’d like to also try this code on a Kinect for Windows V2 – if I do that, I’ll update this post or add another one.

Windows 10, WPF, RealSense SR300, Person Tracking–Continued

I failed to get the new person tracking feature of the RealSense SR300 camera working in this previous post but when I wrote that post I was struggling to get the SR300 camera working at all.

In my last post I had more success in that I managed to get the camera working on my Surface Pro 3 with some UWP code which did some facial work.

That made me think that it might be time to give person tracking a try on my Surface Pro 3 and I’ve had much more success there although it’s fair to say that I don’t have everything working just yet and it hasn’t been a completely ‘free ride’ so far.

Person Tracking isn’t part of what’s supported in the UWP SDK, so I went back to the .NET SDK to try and get something up and running. The release notes for the SDK say that person tracking is in preview, so that needs to be remembered when experimenting with it.

To try things out, I made a WPF application and I made a UI that consists purely of a Canvas and an Image;

    <Grid>
        <Image
            x:Name="displayImage" />
        <Canvas
            x:Name="displayCanvas" />
    </Grid>

the idea here is that the Image can display video frames from the camera and the Canvas can overlay data picked up from the person tracking module.

I also set up my project to reference the RealSense SDK in a more ‘direct’ way than I’ve had to do previously in that I copied the source out of;

c:\program files (x86)\Intel\RSSDK\framework\common\pxcclr.cs

and I added that copied project to my solution as a referenced project;

image

I’ll come back to why I ended up doing this in a moment.

I spent quite a bit of time writing some code behind this XAML UI which I’ve included below and which I’ve tried to comment reasonably completely;

namespace WpfApplication30
{
  using System.Windows;
  using System.Windows.Media.Imaging;

  public partial class MainWindow : Window
  {
    public MainWindow()
    {
      InitializeComponent();
      this.Loaded += OnLoaded;
    }
    void OnLoaded(object sender, RoutedEventArgs e)
    {
      // The FrameRenderInfo is a class I've written which does 2 things for me
      // 1) Provides methods to 'copy/capture' per-frame data.
      // 2) Provides methods to render that data to a canvas.
      // Hence, we construct it with our canvas.
      this.frameRenderInfo = new FrameRenderInfo(this.displayCanvas);

      // Standard - create the RealSense SenseManager, main object for talking
      // to the camera 'pipeline'
      this.senseManager = PXCMSenseManager.CreateInstance();

      // Ask it to switch on person tracking.
      var status = this.senseManager.EnablePersonTracking();

      if (status == pxcmStatus.PXCM_STATUS_NO_ERROR)
      {
        this.personModule = this.senseManager.QueryPersonTracking();

        // Configure person tracking to suit our needs.
        this.ConfigureModules();

        // Initialise the sense manager giving it a callback to call
        // when it has frames of data for us.
        status = this.senseManager.Init(
          new PXCMSenseManager.Handler()
          {
            onModuleProcessedFrame = OnModuleProcessedFrameThreadPool
          }
        );

        if (status == pxcmStatus.PXCM_STATUS_NO_ERROR)
        {
          // Mirror so that L<->R are reversed.
          this.senseManager.captureManager.device.SetMirrorMode(
            PXCMCapture.Device.MirrorMode.MIRROR_MODE_HORIZONTAL);

          // Tell it to throw frames at us as soon as it has some via
          // our callback.
          this.senseManager.StreamFrames(false);
        }
      }
    }
    void ConfigureModules()
    {
      using (var config = this.personModule.QueryConfiguration())
      {
        // We ask for a maximum of one person.
        var tracking = config.QueryTracking();
        tracking.SetMaxTrackedPersons(1);
        tracking.Enable();

        // We ask for a maximum of 1 person and we ask for full body tracking although
        // I've yet to see that deliver anything that's not in the upper body.
        var skeleton = config.QuerySkeletonJoints();
        skeleton.SetMaxTrackedPersons(1);
        skeleton.SetTrackingArea(
          PXCMPersonTrackingConfiguration.SkeletonJointsConfiguration.SkeletonMode.AREA_FULL_BODY);
        skeleton.Enable();
      }
    }
    /// <summary>
    /// Handler for each frame, runs on some thread that I don't own and so I have to
    /// copy data from the frame, store it for later when we can render it from the
    /// UI thread.
    /// </summary>
    /// <param name="mid"></param>
    /// <param name="module"></param>
    /// <param name="sample"></param>
    /// <returns></returns>
    pxcmStatus OnModuleProcessedFrameThreadPool(int mid, PXCMBase module, PXCMCapture.Sample sample)
    {
      // We check to see if our 'buffer' is busy. If so, we drop the frame and carry on.
      if (this.frameRenderInfo.Acquire())
      {
        // Copy the data for the color video frame.
        this.frameRenderInfo.CaptureColorImage(sample);

        // Do we have data from the person tracking module? Hopefully...
        if (mid == PXCMPersonTrackingModule.CUID)
        {
          using (var data = this.personModule.QueryOutput())
          {
            // Copy the bounding boxes around the torso and the head.
            this.frameRenderInfo.CapturePersonData(data);

            // Copy any individual joints that we can find.
            this.frameRenderInfo.CaptureJointData(data);
          }
        }
        // Now, switch to the UI thread to draw what we have captured.
        this.Dispatcher.Invoke(this.RenderOnUIThread);
      }
      return (pxcmStatus.PXCM_STATUS_NO_ERROR);
    }
    /// <summary>
    /// Draws the data that was previously captured.
    /// </summary>
    void RenderOnUIThread()
    {
      // Create our bitmap if we haven't already - need to do this here because
      // it needs to be done on the UI thread and it can't be done before we
      // have the sizes from the video frames.
      this.InitialiseColorFrameBitmap();

      // Update with the latest video frame.
      this.frameRenderInfo.RenderBitmap(this.colorFrameBitmap);

      // Draw boxes for the head and torso.
      this.frameRenderInfo.RenderPersonData();

      // Draw ellipses for any joints we know about.
      this.frameRenderInfo.RenderJointData();

      // Allow our 'buffer' to pick up the next lot of data (we may have
      // missed frames in the meantime).
      this.frameRenderInfo.Release();
    }    
    /// <summary>
    /// Creates the WriteableBitmap that we use for displaying frames
    /// from the colour source.
    /// </summary>
    void InitialiseColorFrameBitmap()
    {
      if (this.colorFrameBitmap == null)
      {
        this.colorFrameBitmap = this.frameRenderInfo.CreateBitmap();
        this.displayImage.Source = this.colorFrameBitmap;
      }
    }
    FrameRenderInfo frameRenderInfo;
    WriteableBitmap colorFrameBitmap;
    PXCMPersonTrackingModule personModule;
    PXCMSenseManager senseManager;
  }
}

This all relies on a class called FrameRenderInfo which I use to do two things (it should almost certainly be at least two classes);

    1. Gather data from the frames of data as they arrive.
    2. Render that data on the UI thread at a later point.

That class could do with a lot of optimisation but the quick version that I have so far looks like this and I’ve tried to comment it again;

namespace WpfApplication30
{
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Threading;
  using System.Windows;
  using System.Windows.Controls;
  using System.Windows.Media;
  using System.Windows.Media.Imaging;
  using System.Windows.Shapes;
  using static PXCMPersonTrackingData.PersonJoints;
  class FrameRenderInfo
  {
    internal FrameRenderInfo(Canvas canvas)
    {
      this.canvas = canvas;
      this.jointEllipses = new Dictionary<JointType, Ellipse>();
      this.whiteBrush = new SolidColorBrush(Colors.White);
    }
    /// <summary>
    /// Attempts! to ensure that we are either handling a frame or 
    /// rendering a frame but not both at the same time.
    /// </summary>
    /// <returns></returns>
    internal bool Acquire()
    {
      return (Interlocked.CompareExchange(ref this.busyFlag, 1, 0) == 0);
    }
    /// <summary>
    /// Companion to Acquire
    /// </summary>
    internal void Release()
    {
      Interlocked.Decrement(ref this.busyFlag);
    }
    /// <summary>
    /// Copies the data from the current colour frame off the video source
    /// into a buffer so that we can later render it. Also picks up the
    /// dimensions of that frame on first use.
    /// </summary>
    /// <param name="sample"></param>
    internal void CaptureColorImage(PXCMCapture.Sample sample)
    {
      PXCMImage.ImageData colorImage;

      if (sample.color.AcquireAccess(
          PXCMImage.Access.ACCESS_READ,
          PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32,
          out colorImage) == pxcmStatus.PXCM_STATUS_NO_ERROR)
      {
        if (!this.colorFrameDimensions.HasArea)
        {
          this.colorFrameDimensions.Width = sample.color.info.width;
          this.colorFrameDimensions.Height = sample.color.info.height;
          this.xScaleMultiplierCameraToCanvas = 1.0d / this.colorFrameDimensions.Width;
          this.yScaleMultiplierCameraToCanvas = 1.0d / this.colorFrameDimensions.Height;
        }

        if (this.colorFrameBuffer == null)
        {
          this.colorFrameBuffer = new byte[
            this.colorFrameDimensions.Width * this.colorFrameDimensions.Height * 4];
        }
        colorImage.ToByteArray(0, this.colorFrameBuffer);

        sample.color.ReleaseAccess(colorImage);
      }
    }
    /// <summary>
    /// Captures the bounding boxes around the head and torso for later rendering.
    /// </summary>
    /// <param name="data"></param>
    internal void CapturePersonData(PXCMPersonTrackingData data)
    {
      this.headBox.w = this.bodyBox.w = 0;
      this.headBox.h = this.bodyBox.h = 0;

      if (data.QueryNumberOfPeople() > 0)
      {
        var personData = data.QueryPersonData(
          PXCMPersonTrackingData.AccessOrderType.ACCESS_ORDER_BY_ID, 0);

        if (personData != null)
        {
          this.CaptureHeadAndBodyBoxes(personData);
        }
      }
    }
    /// <summary>
    /// Does the work of capturing the head and torso bounding boxes.
    /// </summary>
    /// <param name="person"></param>
    void CaptureHeadAndBodyBoxes(PXCMPersonTrackingData.Person person)
    {
      var tracking = person.QueryTracking();

      var boundingBox = tracking?.Query2DBoundingBox();

      if ((boundingBox != null) &&
        (boundingBox.confidence > 50))
      {
        this.bodyBox = boundingBox.rect;
      }
      boundingBox = tracking?.QueryHeadBoundingBox();

      if ((boundingBox != null) &&
        (boundingBox.confidence > 50))
      {
        this.headBox = boundingBox.rect;
      }
    }
    /// <summary>
    /// Captures the skeletal joints - unsure exactly about the role here of StartTracking()
    /// but it seems to be needed to get the joints. 
    /// </summary>
    /// <param name="data"></param>
    internal void CaptureJointData(PXCMPersonTrackingData data)
    {
      this.currentJoints = null;

      if (data.QueryNumberOfPeople() > 0)
      {
        if (data.GetTrackingState() == PXCMPersonTrackingData.TrackingState.TRACKING_STATE_DETECTING)
        {
          data.StartTracking(0);
        }
        else
        {
          var personData = data.QueryPersonData(
            PXCMPersonTrackingData.AccessOrderType.ACCESS_ORDER_BY_ID, 0);

          var joints = personData?.QuerySkeletonJoints();
          var jointCount = joints?.QueryNumJoints();

          if (jointCount > 0)
          {
            if (!joints.QueryJoints(out this.currentJoints))
            {
              this.currentJoints = null;
            }
          }
        }
      }
    }   

    /// <summary>
    /// Renders the head and torso boxes to the Canvas by scaling them and
    /// controlling their visibility.
    /// </summary>
    internal void RenderPersonData()
    {
      if (this.headRectangle == null)
      {
        this.headRectangle = this.MakeRectangle(Colors.Red);
        this.bodyRectangle = this.MakeRectangle(Colors.Green);
      }
      this.PositionCanvasShapeForBoundingBox(this.headRectangle, 
        this.headBox.x, this.headBox.y, this.headBox.w, this.headBox.h);
      this.PositionCanvasShapeForBoundingBox(this.bodyRectangle, 
        this.bodyBox.x, this.bodyBox.y, this.bodyBox.w, this.bodyBox.h);
    }
    /// <summary>
    /// Renders the joint data to the Canvas by drawing ellipses and 
    /// trying to avoid re-creating every ellipse every time although
    /// very questionable as to whether that's better/worse than
    /// showing/hiding them as I haven't tested.
    /// </summary>
    internal void RenderJointData()
    {
      foreach (var ellipse in this.jointEllipses.Values)
      {
        ellipse.Visibility = Visibility.Hidden;
      }
      if (this.currentJoints != null)
      {
        // We shoot for the joints where there is at least some confidence.
        var confidentJoints = this.currentJoints.Where(c => c.confidenceImage > 0).ToList();

        foreach (var joint in confidentJoints)
        {
          if (!this.jointEllipses.ContainsKey(joint.jointType))
          {
            this.jointEllipses[joint.jointType] = MakeEllipse();
          }
          var ellipse = this.jointEllipses[joint.jointType];

          ellipse.Visibility = Visibility.Visible;

          this.PositionCanvasShapeTopLeftAt(
            ellipse,
            (int)joint.image.x,
            (int)joint.image.y);
        }
      }
    }   
    /// <summary>
    /// Updates the writeable bitmap with the latest frame captured.
    /// </summary>
    /// <param name="colorFrameBitmap"></param>
    internal void RenderBitmap(WriteableBitmap colorFrameBitmap)
    {
      colorFrameBitmap.WritePixels(
        this.colorFrameDimensions,
        this.colorFrameBuffer,
        this.colorFrameDimensions.Width * 4,
        0);
    }
    internal WriteableBitmap CreateBitmap()
    {
      return (new WriteableBitmap(
        this.colorFrameDimensions.Width,
        this.colorFrameDimensions.Height,
        96,
        96,
        PixelFormats.Bgra32,
        null));
    }
    void PositionCanvasShapeTopLeftAt(
      Shape shape,
      int x,
      int y)
    {
      Canvas.SetLeft(
        shape,
        this.ScaleXCameraValue(x, this.canvas.ActualWidth) - (shape.Width / 2));

      Canvas.SetTop(
        shape,
        this.ScaleYCameraValue(y, this.canvas.ActualHeight) - (shape.Height / 2));
    }
    void PositionCanvasShapeForBoundingBox(
      Shape shape,
      int x,
      int y,
      int width,
      int height)
    {
      if (width == 0)
      {
        shape.Visibility = Visibility.Hidden;
      }
      else
      {
        shape.Visibility = Visibility.Visible;

        shape.Width = this.ScaleXCameraValue(width, this.canvas.ActualWidth);

        shape.Height = this.ScaleYCameraValue(height, this.canvas.ActualHeight);

        Canvas.SetLeft(
          shape,
          this.ScaleXCameraValue(x, this.canvas.ActualWidth));

        Canvas.SetTop(
          shape,
          this.ScaleYCameraValue(y, this.canvas.ActualHeight));
      }
    }
    double ScaleXCameraValue(double value, double maximumValue)
    {
      return (value * this.xScaleMultiplierCameraToCanvas * maximumValue);
    }
    double ScaleYCameraValue(double value, double maximumValue)
    {
      return (value * this.yScaleMultiplierCameraToCanvas * maximumValue);
    }
    Ellipse MakeEllipse()
    {
      var ellipse = new Ellipse()
      {
        Fill = whiteBrush,
        Width = ELLIPSE_DIAMETER,
        Height = ELLIPSE_DIAMETER
      };
      this.canvas.Children.Add(ellipse);
      return (ellipse);
    }
    Rectangle MakeRectangle(Color rectangleColor)
    {
      var rectangle = new Rectangle()
      {
        Stroke = new SolidColorBrush(rectangleColor),
        StrokeThickness = 2
      };
      this.canvas.Children.Add(rectangle);
      return (rectangle);
    }
    SolidColorBrush whiteBrush;
    Dictionary<JointType, Ellipse> jointEllipses;
    PXCMPersonTrackingData.PersonJoints.SkeletonPoint[] currentJoints;
    double xScaleMultiplierCameraToCanvas;
    double yScaleMultiplierCameraToCanvas;
    PXCMRectI32 headBox;
    PXCMRectI32 bodyBox;
    Int32Rect colorFrameDimensions;
    byte[] colorFrameBuffer;
    int busyFlag;
    Canvas canvas;
    Rectangle headRectangle;
    Rectangle bodyRectangle;
    const int ELLIPSE_DIAMETER = 20;
  }
}

The code above won’t compile against the SDK as shipped and that’s because of a change I made after spending quite a long time running this and hitting this error;

image

and this was coming from the call into QueryJoints in the SDK as you can see below (I changed my code back to be able to repro the error);

image

This is what caused me to reference the .NET SDK as source rather than as a library so that I could debug it better and I spent some time scratching my head over these 2 functions (copied from the Intel SDK file pxcmpersontrackingdata.cs);

    [DllImport(DLLNAME)]
    [return: MarshalAs(UnmanagedType.Bool)]
    internal static extern Boolean PXCMPersonTrackingData_PersonJoints_QueryJoints(IntPtr instance, IntPtr joints);

    internal static Boolean QueryJointsINT(IntPtr instance, out SkeletonPoint[] joints)
    {
      int njoints = PXCMPersonTrackingData_PersonJoints_QueryNumJoints(instance);

      IntPtr joints2 = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(SkeletonPoint)) * njoints);

      Boolean sts = PXCMPersonTrackingData_PersonJoints_QueryJoints(instance, joints2);

      if (sts)
      {
        joints = new SkeletonPoint[njoints];

        for (int i = 0, j = 0; i < njoints; i++, j += Marshal.SizeOf(typeof(SkeletonPoint)))
        {
          joints[i] = new SkeletonPoint();
          Marshal.PtrToStructure(new IntPtr(joints2.ToInt64() + j), joints[i]);
        }
      }
      else
      {
        joints = null;
      }
      Marshal.FreeHGlobal(joints2);
      return sts;
    }

before deciding that I couldn’t see how the marshalling layer would know how to deal with this type SkeletonPoint (also copied from the same file in the SDK);

  [Serializable]
  [StructLayout(LayoutKind.Sequential)]
  public class SkeletonPoint
  {
    public JointType jointType;
    public Int32 confidenceImage;
    public Int32 confidenceWorld;
    public PXCMPoint3DF32 world;
    public PXCMPointF32 image;
    public Int32[] reserved;

    public SkeletonPoint()
    {
      reserved = new Int32[10];
    }
  };

and, specifically, how it would handle that array called reserved which doesn’t have any size on it. In the native SDK, I saw that it was defined as;

  struct SkeletonPoint
  {
    JointType jointType;
    pxcI32 confidenceImage;
    pxcI32 confidenceWorld;
    PXCPoint3DF32 world;
    PXCPointF32 image;
    pxcI32 reserved[10];
  };

and so I figured that I’d make my own change to the definition so as to add;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 10)]
    public Int32[] reserved;

and that seemed to help a lot except that I still struggled with this function which is part of the class PersonJoints (also taken from the same file in the SDK);

  public Boolean QueryJoints(SkeletonPoint[] joints)
  {
    return QueryJointsINT(instance, out joints);
  }

because it passes the parameter joints as an out parameter to the QueryJointsINT function but joints isn’t passed as an out parameter into QueryJoints itself, so it wasn’t clear to me how the newly allocated array was supposed to get back to the caller. I changed that definition to;

  public Boolean QueryJoints(out SkeletonPoint[] joints)
  {
    return QueryJointsINT(instance, out joints);
  }

and then things started to work a bit better.

Note – I’m fairly certain that work could be done in the declaration of the external function that is called via PInvoke here in order to let the marshalling layer do more of the work around this array but, for now, I’ve tried to make the minimal change to fix the problem that I was seeing and, hopefully, the SDK will get updated and this error will go away.

As an aside, this post on the RealSense forums seems to have hit the same issue.

With that issue out of the way, I did manage to spark up the code here and get it working ‘reasonably’ with the caveats of;

    1. My performance isn’t great but I suspect I could do quite a bit to improve that.
    2. I don’t seem to get too many joints reported from the SDK – usually left hand, right hand, mid spine and head.
    3. I don’t often get the bounding box for the head reported, it seems quite sporadic.

but the bounding box for the torso seems rock solid.

I should also say that I’ve played around a few times with the SkeletonMode parameter that is passed to SetTrackingArea as part of the configuration here – I’ve varied it between its 4 options (upper body, upper body rough, full body, full body rough) to try and see which works best and I haven’t worked that out just yet.

Here’s a quick screen capture of it ‘mostly working’.

I’m going to keep experimenting with this SDK as there’s a lot more person tracking data that can be obtained from it and, hopefully, more info and samples will come out around how it’s meant to work as I’m working somewhat in the dark with it at the moment. In the meantime, I thought I’d share this early experiment as an indication of what’s coming and in case it helps anyone else who’s experimenting with the preview too.