Windows 10, WPF, RealSense SR300, Person Tracking–Continued

I failed to get the new person tracking feature of the RealSense SR300 camera working in this previous post but, when I wrote that post, I was struggling to get the SR300 camera working at all.

In my last post I had more success in that I managed to get the camera working on my Surface Pro 3 with some UWP code which did some facial work.

That made me think that it might be time to give person tracking a try on my Surface Pro 3 and I've had much more success there, although it's fair to say that I don't have everything working just yet and it hasn't been a completely 'free ride' so far.

Person Tracking isn't part of what's supported in the UWP SDK, so I went back to the .NET SDK to try and get something up and running. The release notes for the SDK say that person tracking is in preview, so that needs to be kept in mind when experimenting with it.

To try things out, I made a WPF application with a UI that consists purely of a Canvas and an Image;

    <Grid>
        <Image
            x:Name="displayImage" />
        <Canvas
            x:Name="displayCanvas" />
    </Grid>

the idea here is that the Image can display video frames from the camera while the Canvas, which the Grid stacks on top of it, can overlay data picked up from the person tracking module.

I also set up my project to reference the RealSense SDK in a more 'direct' way than I've had to previously, in that I copied the source out of;

c:\program files (x86)\Intel\RSSDK\framework\common\pxcclr.cs

and I added that copied project to my solution as a referenced project;

[screenshot: the copied pxcclr.cs project referenced in the solution]

I’ll come back to why I ended up doing this in a moment.

I spent quite a bit of time writing some code behind this XAML UI, which I've included below and tried to comment reasonably completely;

namespace WpfApplication30
{
  using System.Windows;
  using System.Windows.Media.Imaging;

  public partial class MainWindow : Window
  {
    public MainWindow()
    {
      InitializeComponent();
      this.Loaded += OnLoaded;
    }
    void OnLoaded(object sender, RoutedEventArgs e)
    {
      // The FrameRenderInfo is a class I've written which does 2 things for me
      // 1) Provides methods to 'copy/capture' per-frame data.
      // 2) Provides methods to render that data to a canvas.
      // Hence, we construct it with our canvas.
      this.frameRenderInfo = new FrameRenderInfo(this.displayCanvas);

      // Standard - create the RealSense SenseManager, main object for talking
      // to the camera 'pipeline'
      this.senseManager = PXCMSenseManager.CreateInstance();

      // Ask it to switch on person tracking.
      var status = this.senseManager.EnablePersonTracking();

      if (status == pxcmStatus.PXCM_STATUS_NO_ERROR)
      {
        this.personModule = this.senseManager.QueryPersonTracking();

        // Configure person tracking to suit our needs.
        this.ConfigureModules();

        // Initialise the sense manager giving it a callback to call
        // when it has frames of data for us.
        status = this.senseManager.Init(
          new PXCMSenseManager.Handler()
          {
            onModuleProcessedFrame = OnModuleProcessedFrameThreadPool
          }
        );

        if (status == pxcmStatus.PXCM_STATUS_NO_ERROR)
        {
          // Mirror so that L<->R are reversed.
          this.senseManager.captureManager.device.SetMirrorMode(
            PXCMCapture.Device.MirrorMode.MIRROR_MODE_HORIZONTAL);

          // Tell it to throw frames at us as soon as it has some via
          // our callback.
          this.senseManager.StreamFrames(false);
        }
      }
    }
    void ConfigureModules()
    {
      using (var config = this.personModule.QueryConfiguration())
      {
        // We ask for a maximum of one person.
        var tracking = config.QueryTracking();
        tracking.SetMaxTrackedPersons(1);
        tracking.Enable();

        // For the skeleton we again limit things to a single person and ask for
        // full body tracking although I've yet to see that deliver anything
        // that's not in the upper body.
        var skeleton = config.QuerySkeletonJoints();
        skeleton.SetMaxTrackedPersons(1);
        skeleton.SetTrackingArea(
          PXCMPersonTrackingConfiguration.SkeletonJointsConfiguration.SkeletonMode.AREA_FULL_BODY);
        skeleton.Enable();
      }
    }
    /// <summary>
    /// Handler for each frame, runs on some thread that I don't own and so I have to
    /// copy data from the frame, store it for later when we can render it from the
    /// UI thread.
    /// </summary>
    /// <param name="mid"></param>
    /// <param name="module"></param>
    /// <param name="sample"></param>
    /// <returns></returns>
    pxcmStatus OnModuleProcessedFrameThreadPool(int mid, PXCMBase module, PXCMCapture.Sample sample)
    {
      // We check to see if our 'buffer' is busy. If so, we drop the frame and carry on.
      if (this.frameRenderInfo.Acquire())
      {
        // Copy the data for the color video frame.
        this.frameRenderInfo.CaptureColorImage(sample);

        // Do we have data from the person tracking module? Hopefully...
        if (mid == PXCMPersonTrackingModule.CUID)
        {
          using (var data = this.personModule.QueryOutput())
          {
            // Copy the bounding boxes around the torso and the head.
            this.frameRenderInfo.CapturePersonData(data);

            // Copy any individual joints that we can find.
            this.frameRenderInfo.CaptureJointData(data);
          }
        }
        // Now, switch to the UI thread to draw what we have captured.
        this.Dispatcher.Invoke(this.RenderOnUIThread);
      }
      return (pxcmStatus.PXCM_STATUS_NO_ERROR);
    }
    /// <summary>
    /// Draws the data that was previously captured.
    /// </summary>
    void RenderOnUIThread()
    {
      // Create our bitmap if we haven't already - need to do this here because
      // it needs to be done on the UI thread and it can't be done before we
      // have the sizes from the video frames.
      this.InitialiseColorFrameBitmap();

      // Update with the latest video frame.
      this.frameRenderInfo.RenderBitmap(this.colorFrameBitmap);

      // Draw boxes for the head and torso.
      this.frameRenderInfo.RenderPersonData();

      // Draw ellipses for any joints we know about.
      this.frameRenderInfo.RenderJointData();

      // Allow our 'buffer' to pick up the next lot of data (we may have
      // missed frames in the meantime).
      this.frameRenderInfo.Release();
    }    
    /// <summary>
    /// Creates the WriteableBitmap that we use for displaying frames
    /// from the colour source.
    /// </summary>
    void InitialiseColorFrameBitmap()
    {
      if (this.colorFrameBitmap == null)
      {
        this.colorFrameBitmap = this.frameRenderInfo.CreateBitmap();
        this.displayImage.Source = this.colorFrameBitmap;
      }
    }
    FrameRenderInfo frameRenderInfo;
    WriteableBitmap colorFrameBitmap;
    PXCMPersonTrackingModule personModule;
    PXCMSenseManager senseManager;
  }
}

This all relies on a class called FrameRenderInfo which I use to do two things (it should almost certainly be at least two classes);

    1. Gather data from the frames of data as they arrive.
    2. Render that data on the UI thread at a later point.

That class could do with a lot of optimisation but the quick version that I have so far looks like this and, again, I've tried to comment it;

namespace WpfApplication30
{
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Threading;
  using System.Windows;
  using System.Windows.Controls;
  using System.Windows.Media;
  using System.Windows.Media.Imaging;
  using System.Windows.Shapes;
  using static PXCMPersonTrackingData.PersonJoints;
  class FrameRenderInfo
  {
    internal FrameRenderInfo(Canvas canvas)
    {
      this.canvas = canvas;
      this.jointEllipses = new Dictionary<JointType, Ellipse>();
      this.whiteBrush = new SolidColorBrush(Colors.White);
    }
    /// <summary>
    /// Attempts! to ensure that we are either handling a frame or 
    /// rendering a frame but not both at the same time.
    /// </summary>
    /// <returns></returns>
    internal bool Acquire()
    {
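      // Atomically move busyFlag from 0 to 1; returns true only to the single
      // caller that wins, i.e. when nobody is mid-capture or mid-render.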
      return (Interlocked.CompareExchange(ref this.busyFlag, 1, 0) == 0);
    }
    /// <summary>
    /// Companion to Acquire
    /// </summary>
    internal void Release()
    {
      Interlocked.Decrement(ref this.busyFlag);
    }
    /// <summary>
    /// Copies the data from the current colour frame off the video source
    /// into a buffer so that we can later render it. Also picks up the
    /// dimensions of that frame on first use.
    /// </summary>
    /// <param name="sample"></param>
    internal void CaptureColorImage(PXCMCapture.Sample sample)
    {
      PXCMImage.ImageData colorImage;

      if (sample.color.AcquireAccess(
          PXCMImage.Access.ACCESS_READ,
          PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32,
          out colorImage) == pxcmStatus.PXCM_STATUS_NO_ERROR)
      {
        if (!this.colorFrameDimensions.HasArea)
        {
          this.colorFrameDimensions.Width = sample.color.info.width;
          this.colorFrameDimensions.Height = sample.color.info.height;
          this.xScaleMultiplierCameraToCanvas = 1.0d / this.colorFrameDimensions.Width;
          this.yScaleMultiplierCameraToCanvas = 1.0d / this.colorFrameDimensions.Height;
        }

        if (this.colorFrameBuffer == null)
        {
          this.colorFrameBuffer = new byte[
            this.colorFrameDimensions.Width * this.colorFrameDimensions.Height * 4];
        }
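        // Copy the frame's pixels (4 bytes per pixel) into our reusable buffer.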
        colorImage.ToByteArray(0, this.colorFrameBuffer);

        sample.color.ReleaseAccess(colorImage);
      }
    }
    /// <summary>
    /// Captures the bounding boxes around the head and torso for later rendering.
    /// </summary>
    /// <param name="data"></param>
    internal void CapturePersonData(PXCMPersonTrackingData data)
    {
      this.headBox.w = this.bodyBox.w = 0;
      this.headBox.h = this.bodyBox.h = 0;

      if (data.QueryNumberOfPeople() > 0)
      {
        var personData = data.QueryPersonData(
          PXCMPersonTrackingData.AccessOrderType.ACCESS_ORDER_BY_ID, 0);

        if (personData != null)
        {
          this.CaptureHeadAndBodyBoxes(personData);
        }
      }
    }
    /// <summary>
    /// Does the work of capturing the head and torso bounding boxes.
    /// </summary>
    /// <param name="person"></param>
    void CaptureHeadAndBodyBoxes(PXCMPersonTrackingData.Person person)
    {
      var tracking = person.QueryTracking();

      var boundingBox = tracking?.Query2DBoundingBox();

      if ((boundingBox != null) &&
        (boundingBox.confidence > 50))
      {
        this.bodyBox = boundingBox.rect;
      }
      boundingBox = tracking?.QueryHeadBoundingBox();

      if ((boundingBox != null) &&
        (boundingBox.confidence > 50))
      {
        this.headBox = boundingBox.rect;
      }
    }
    /// <summary>
    /// Captures the skeletal joints - unsure exactly about the role here of StartTracking()
    /// but it seems to be needed to get the joints. 
    /// </summary>
    /// <param name="data"></param>
    internal void CaptureJointData(PXCMPersonTrackingData data)
    {
      this.currentJoints = null;

      if (data.QueryNumberOfPeople() > 0)
      {
        if (data.GetTrackingState() == PXCMPersonTrackingData.TrackingState.TRACKING_STATE_DETECTING)
        {
          data.StartTracking(0);
        }
        else
        {
          var personData = data.QueryPersonData(
            PXCMPersonTrackingData.AccessOrderType.ACCESS_ORDER_BY_ID, 0);

          var joints = personData?.QuerySkeletonJoints();
          var jointCount = joints?.QueryNumJoints();

          if (jointCount > 0)
          {
            if (!joints.QueryJoints(out this.currentJoints))
            {
              this.currentJoints = null;
            }
          }
        }
      }
    }   

    /// <summary>
    /// Renders the head and torso boxes to the Canvas by scaling them and
    /// controlling their visibility.
    /// </summary>
    internal void RenderPersonData()
    {
      if (this.headRectangle == null)
      {
        this.headRectangle = this.MakeRectangle(Colors.Red);
        this.bodyRectangle = this.MakeRectangle(Colors.Green);
      }
      this.PositionCanvasShapeForBoundingBox(this.headRectangle, 
        this.headBox.x, this.headBox.y, this.headBox.w, this.headBox.h);
      this.PositionCanvasShapeForBoundingBox(this.bodyRectangle, 
        this.bodyBox.x, this.bodyBox.y, this.bodyBox.w, this.bodyBox.h);
    }
    /// <summary>
    /// Renders the joint data to the Canvas by drawing ellipses and 
    /// trying to avoid re-creating every ellipse every time although
    /// very questionable as to whether that's better/worse than
    /// showing/hiding them as I haven't tested.
    /// </summary>
    internal void RenderJointData()
    {
      foreach (var ellipse in this.jointEllipses.Values)
      {
        ellipse.Visibility = Visibility.Hidden;
      }
      if (this.currentJoints != null)
      {
        // We shoot for the joints where there is at least some confidence.
        var confidentJoints = this.currentJoints.Where(c => c.confidenceImage > 0).ToList();

        foreach (var joint in confidentJoints)
        {
          if (!this.jointEllipses.ContainsKey(joint.jointType))
          {
            this.jointEllipses[joint.jointType] = MakeEllipse();
          }
          var ellipse = this.jointEllipses[joint.jointType];

          ellipse.Visibility = Visibility.Visible;

          this.PositionCanvasShapeTopLeftAt(
            ellipse,
            (int)joint.image.x,
            (int)joint.image.y);
        }
      }
    }   
    /// <summary>
    /// Updates the writeable bitmap with the latest frame captured.
    /// </summary>
    /// <param name="colorFrameBitmap"></param>
    internal void RenderBitmap(WriteableBitmap colorFrameBitmap)
    {
      colorFrameBitmap.WritePixels(
        this.colorFrameDimensions,
        this.colorFrameBuffer,
        this.colorFrameDimensions.Width * 4,
        0);
    }
    internal WriteableBitmap CreateBitmap()
    {
      return (new WriteableBitmap(
        this.colorFrameDimensions.Width,
        this.colorFrameDimensions.Height,
        96,
        96,
        PixelFormats.Bgra32,
        null));
    }
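    /// <summary>
    /// Positions a shape so that its centre lands on the canvas-space
    /// equivalent of the given camera co-ordinate (hence the half
    /// width/height offsets below).
    /// </summary>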
    void PositionCanvasShapeTopLeftAt(
      Shape shape,
      int x,
      int y)
    {
      Canvas.SetLeft(
        shape,
        this.ScaleXCameraValue(x, this.canvas.ActualWidth) - (shape.Width / 2));

      Canvas.SetTop(
        shape,
        this.ScaleYCameraValue(y, this.canvas.ActualHeight) - (shape.Height / 2));
    }
    void PositionCanvasShapeForBoundingBox(
      Shape shape,
      int x,
      int y,
      int width,
      int height)
    {
      if (width == 0)
      {
        shape.Visibility = Visibility.Hidden;
      }
      else
      {
        shape.Visibility = Visibility.Visible;

        shape.Width = this.ScaleXCameraValue(width, this.canvas.ActualWidth);

        shape.Height = this.ScaleYCameraValue(height, this.canvas.ActualHeight);

        Canvas.SetLeft(
          shape,
          this.ScaleXCameraValue(x, this.canvas.ActualWidth));

        Canvas.SetTop(
          shape,
          this.ScaleYCameraValue(y, this.canvas.ActualHeight));
      }
    }
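    /// <summary>
    /// Maps a camera-space value (a position or size in colour-frame pixels)
    /// into canvas space by normalising against the frame size and then
    /// scaling by the canvas extent.
    /// </summary>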
    double ScaleXCameraValue(double value, double maximumValue)
    {
      return (value * this.xScaleMultiplierCameraToCanvas * maximumValue);
    }
    double ScaleYCameraValue(double value, double maximumValue)
    {
      return (value * this.yScaleMultiplierCameraToCanvas * maximumValue);
    }
    Ellipse MakeEllipse()
    {
      var ellipse = new Ellipse()
      {
        Fill = whiteBrush,
        Width = ELLIPSE_DIAMETER,
        Height = ELLIPSE_DIAMETER
      };
      this.canvas.Children.Add(ellipse);
      return (ellipse);
    }
    Rectangle MakeRectangle(Color rectangleColor)
    {
      var rectangle = new Rectangle()
      {
        Stroke = new SolidColorBrush(rectangleColor),
        StrokeThickness = 2
      };
      this.canvas.Children.Add(rectangle);
      return (rectangle);
    }
    SolidColorBrush whiteBrush;
    Dictionary<JointType, Ellipse> jointEllipses;
    PXCMPersonTrackingData.PersonJoints.SkeletonPoint[] currentJoints;
    double xScaleMultiplierCameraToCanvas;
    double yScaleMultiplierCameraToCanvas;
    PXCMRectI32 headBox;
    PXCMRectI32 bodyBox;
    Int32Rect colorFrameDimensions;
    byte[] colorFrameBuffer;
    int busyFlag;
    Canvas canvas;
    Rectangle headRectangle;
    Rectangle bodyRectangle;
    const int ELLIPSE_DIAMETER = 20;
  }
}

The code above won't compile against the SDK as shipped. That's because of a change that I made to the SDK source after spending quite a long time running this code and hitting this error;

[screenshot of the error raised at runtime]

and this was coming from the call into QueryJoints in the SDK as you can see below (I changed my code back to be able to repro the error);

[screenshot: the SDK's QueryJoints code where the error arises]

This is what caused me to reference the .NET SDK as source rather than as a library so that I could debug it better, and I spent some time scratching my head over these two functions (copied from the Intel SDK file pxcmpersontrackingdata.cs);

    [DllImport(DLLNAME)]
    [return: MarshalAs(UnmanagedType.Bool)]
    internal static extern Boolean PXCMPersonTrackingData_PersonJoints_QueryJoints(IntPtr instance, IntPtr joints);

    internal static Boolean QueryJointsINT(IntPtr instance, out SkeletonPoint[] joints)
    {
      int njoints = PXCMPersonTrackingData_PersonJoints_QueryNumJoints(instance);

      IntPtr joints2 = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(SkeletonPoint)) * njoints);

      Boolean sts = PXCMPersonTrackingData_PersonJoints_QueryJoints(instance, joints2);

      if (sts)
      {
        joints = new SkeletonPoint[njoints];

        for (int i = 0, j = 0; i < njoints; i++, j += Marshal.SizeOf(typeof(SkeletonPoint)))
        {
          joints[i] = new SkeletonPoint();
          Marshal.PtrToStructure(new IntPtr(joints2.ToInt64() + j), joints[i]);
        }
      }
      else
      {
        joints = null;
      }
      Marshal.FreeHGlobal(joints2);
      return sts;
    }

before deciding that I couldn’t see how the marshalling layer would know how to deal with this type SkeletonPoint (also copied from the same file in the SDK);

    [Serializable]
    [StructLayout(LayoutKind.Sequential)]
    public class SkeletonPoint
    {
      public JointType jointType;
      public Int32 confidenceImage;
      public Int32 confidenceWorld;
      public PXCMPoint3DF32 world;
      public PXCMPointF32 image;
      public Int32[] reserved;

      public SkeletonPoint()
      {
        reserved = new Int32[10];
      }
    };

and, specifically, how it would handle that array called reserved which doesn't have any size specified on it. Without a fixed size, it's hard to see how the marshalling layer could compute the unmanaged size of the type in the Marshal.SizeOf calls above. In the native SDK, I saw that it was defined as;

    struct SkeletonPoint
    {
      JointType jointType;
      pxcI32 confidenceImage;
      pxcI32 confidenceWorld;
      PXCPoint3DF32 world;
      PXCPointF32 image;
      pxcI32 reserved[10];
    };

and so I figured that I’d make my own change to the definition so as to add;

      [MarshalAs(UnmanagedType.ByValArray, SizeConst = 10)]
      public Int32[] reserved;

and that seemed to help a lot except that I still struggled with this function which is part of the class PersonJoints (also taken from the same file in the SDK);

    public Boolean QueryJoints(SkeletonPoint[] joints)
    {
      return QueryJointsINT(instance, out joints);
    }

because it passes the parameter joints as an out parameter to the QueryJointsINT function but joints isn't passed as an out parameter into QueryJoints itself. That means QueryJointsINT only ever re-assigns the method's local copy of the array reference, so it wasn't clear to me how the newly allocated array was supposed to get back to the caller.
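
To illustrate with a tiny, standalone example (entirely hypothetical names, nothing to do with the SDK) – re-assigning a regular parameter inside a method has no effect on the caller's variable;

    static void Fill(out int[] target)
    {
      target = new[] { 1, 2, 3 };
    }
    static bool Broken(int[] values)
    {
      // This re-assigns the local parameter 'values' only - the array
      // reference held by the caller is left untouched.
      Fill(out values);
      return true;
    }

and so I changed that definition to;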

    public Boolean QueryJoints(out SkeletonPoint[] joints)
    {
      return QueryJointsINT(instance, out joints);
    }

and then things started to work a bit better.

Note – I’m fairly certain that work could be done in the declaration of the external function that is called via PInvoke here in order to let the marshalling layer do more of the work around this array but, for now, I’ve tried to make the minimal change to fix the problem that I was seeing and, hopefully, the SDK will get updated and this error will go away.
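
For what it's worth, here's a rough sketch of the kind of thing I mean. This is purely illustrative and untested against the SDK – it assumes that SkeletonPoint could be redefined as a struct (rather than a class) so that the interop marshaller is able to copy the array elements back by virtue of an [Out] attribute;

    [DllImport(DLLNAME)]
    [return: MarshalAs(UnmanagedType.Bool)]
    internal static extern Boolean PXCMPersonTrackingData_PersonJoints_QueryJoints(
      IntPtr instance,
      [Out] SkeletonPoint[] joints);

    internal static Boolean QueryJointsINT(IntPtr instance, out SkeletonPoint[] joints)
    {
      int njoints = PXCMPersonTrackingData_PersonJoints_QueryNumJoints(instance);

      // The marshaller allocates the native buffer, makes the call and then
      // copies the elements back into our managed array because of [Out].
      var result = new SkeletonPoint[njoints];
      Boolean sts = PXCMPersonTrackingData_PersonJoints_QueryJoints(instance, result);

      joints = sts ? result : null;
      return (sts);
    }

but, as I say, I haven't verified that and so I've stuck with the minimal change above for now.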

As an aside, this post on the RealSense forums seems to have hit the same issue.

With that issue out of the way, I did manage to spark up the code here and get it working ‘reasonably’ with the caveats of;

    1. My performance isn’t great but I suspect I could do quite a bit to improve that.
    2. I don’t seem to get too many joints reported from the SDK – usually left hand, right hand, mid spine and head.
    3. I don't often get the bounding box for the head reported; it seems quite sporadic.

but the bounding box for the torso seems rock solid.

I should also say that I’ve played around a few times with the SkeletonMode parameter that is passed to SetTrackingArea as part of the configuration here – I’ve varied it between its 4 options (upper body, upper body rough, full body, full body rough) to try and see which works best and I haven’t worked that out just yet.
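
In terms of the code in ConfigureModules above, that just means varying the value passed in this one call. As a sketch – AREA_FULL_BODY is the value from my code while I've reproduced the other option names from memory, so treat those as approximate;

    // 'skeleton' is the SkeletonJointsConfiguration from ConfigureModules above.
    // The options (names other than AREA_FULL_BODY from memory, approximate):
    //   AREA_UPPER_BODY, AREA_UPPER_BODY_ROUGH,
    //   AREA_FULL_BODY_ROUGH, AREA_FULL_BODY
    skeleton.SetTrackingArea(
      PXCMPersonTrackingConfiguration.SkeletonJointsConfiguration.SkeletonMode.AREA_FULL_BODY);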

Here's a quick screen capture of it 'mostly working'.

I'm going to keep experimenting with this SDK as there's a lot more data that can be obtained from it in terms of person tracking and, hopefully, more info and samples will come out around how it's meant to work because I'm working somewhat in the dark with it at the moment. In the meantime, I thought I'd share this early experiment as an indication of what's coming and in the hope that it might help anyone else who's experimenting with the preview too.