Windows 10, 1607, UWP and Experimenting with the Kinect for Windows V2 Update

I was really pleased to see this blog post;

Kinect demo code and new driver for UWP now available

announcing a new driver which provides more access to the functionality of the Kinect for Windows V2 into Windows 10 including for the UWP developer.

I wrote a little about this topic in this earlier post around 10 months ago when some initial functionality became available for the UWP developer;

Kinect V2, Windows Hello and Perception APIs

and so it’s great to see that more functionality has become available and, specifically, that skeletal data is being surfaced.

I plugged my Kinect for Windows V2 into my Surface Pro 3 and had a look at the driver being used for Kinect.

image

and I attempted to do an update but didn’t seem to see one but it’s possible that the version of the driver which I have;

image

is the latest driver as it seems to be a week or two old. At the time of writing, I haven’t confirmed this driver version but I went on to download the C++ sample from GitHub;

Camera Stream Correlation Sample

and ran it up on my Surface Pro 3 where it initially displayed the output of the rear webcam;

image

and so I pressed the ‘Next Source’ button and it attempted to work with the RealSense camera on my machine;

image

and so I pressed the ‘Next Source’ button and things seemed to hang. I’m unsure of the status of my RealSense drivers on this machine and so I disabled the RealSense virtual camera driver;

image

and then re-ran the sample and, sure enough, I could use the ‘Next Source’ button to move to the Kinect for Windows V2 sensor and then I used the ‘Toggle Depth Fading’ button to turn that option off and the ‘Toggle Skeletal Overlay’ button to switch that option on and, sure enough, I’ve got a (flat) skeletal overlay on the colour frames and it’s delivering very smooth performance here;

image

and so that’s great to see working. Given that the sample seemed to be C++ code, I wondered what this might look like for a C# developer working with the UWP and so I set about seeing if I could reproduce some of the core of what the sample is doing here.

Getting Skeletal Data Into a C# UWP App

Rather than attempting to ‘port’ the C++ sample, I started by lifting pieces of the code that I’d written for that earlier blog post into a new project.

I made a blank app targeting SDK 14393, made sure that it had access to webcam and microphone and then added in win2d.uwp as a NuGet package and added a little UI;

<Page
    x:Class="KinectTestApp.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:KinectTestApp"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:w2d="using:Microsoft.Graphics.Canvas.UI.Xaml"
    mc:Ignorable="d">

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <TextBlock
            FontSize="36"
            HorizontalAlignment="Center"
            VerticalAlignment="Center"
            TextAlignment="Center"
            Text="No Cameras" />
        <w2d:CanvasControl
            x:Name="canvasControl"
            Visibility="Collapsed"
            SizeChanged="OnCanvasControlSizeChanged"
            Draw="OnDraw"/>
    </Grid>
</Page>

From there, I wanted to see if I could get a basic render of the colour frame from the camera along with an overlay of some skeletal points.

I’d spotted that the official samples include a project which builds out a WinRT component that is then used to interpret the custom data that comes from the Kinect via a MediaFrameReference and so I included a reference to this project into my solution so that I could use it in my C# code. That project is here and looks to stand independent of the surrounding sample. I made my project reference as below;

image

and then set about trying to see if I could write some code that got colour data and skeletal data onto the screen.

I wrote a few, small, supporting classes and named them all with an mt* prefix to try and make it more obvious which code here is mine rather than in the framework or the sample. This simple class delivers a SoftwareBitmap containing the contents of the colour frame to be fired as an event;

namespace KinectTestApp
{
  using System;
  using Windows.Graphics.Imaging;

  class mtSoftwareBitmapEventArgs : EventArgs
  {
    public SoftwareBitmap Bitmap { get; set; }
  }
}

whereas this class delivers the data that I’ve decided I need in order to draw a subset of the skeletal data onto the screen;

namespace KinectTestApp
{
  using System;

  class mtPoseTrackingFrameEventArgs : EventArgs
  {
    public mtPoseTrackingDetails[] PoseEntries { get; set; }
  }
}

and it’s a simple array which will be populated with one of these types below for each user being tracked by the sensor;

namespace KinectTestApp
{
  using System;
  using System.Linq;
  using System.Numerics;
  using Windows.Foundation;
  using Windows.Media.Devices.Core;
  using WindowsPreview.Media.Capture.Frames;

  class mtPoseTrackingDetails
  {
    public Guid EntityId { get; set; }
    public Point[] Points { get; set; }

    public static mtPoseTrackingDetails FromPoseTrackingEntity(
      PoseTrackingEntity poseTrackingEntity,
      CameraIntrinsics colorIntrinsics,
      Matrix4x4 depthColorTransform)
    {
      mtPoseTrackingDetails details = null;

      var poses = new TrackedPose[poseTrackingEntity.PosesCount];
      poseTrackingEntity.GetPoses(poses);

      var points = new Point[poses.Length];

      colorIntrinsics.ProjectManyOntoFrame(
        poses.Select(p => Multiply(depthColorTransform, p.Position)).ToArray(),
        points);

      details = new mtPoseTrackingDetails()
      {
        EntityId = poseTrackingEntity.EntityId,
        Points = points
      };
      return (details);
    }
    static Vector3 Multiply(Matrix4x4 matrix, Vector3 position)
    {
      return (new Vector3(
        position.X * matrix.M11 + position.Y * matrix.M21 + position.Z * matrix.M31 + matrix.M41,
        position.X * matrix.M12 + position.Y * matrix.M22 + position.Z * matrix.M32 + matrix.M42,
        position.X * matrix.M13 + position.Y * matrix.M23 + position.Z * matrix.M33 + matrix.M43));
    }
  }
}

which would be a simple class containing a GUID to identify the tracked person and an array of Points representing their tracked joints except that I wanted those 2D Points to be in the colour space which means having to map them from the depth space that the sensor presents them in and so the FromPoseTrackingEntity() method takes a PoseTrackingEntity which is one of the types from the referenced C++ project and;

  1. Extracts the ‘poses’ (i.e. joints in my terminology)
  2. Uses the CameraIntrinsics from the colour camera to project them onto its frame having first transformed them using a matrix which maps from depth space to colour space.

Step 2 is code that I largely duplicated from the original C++ sample after trying a few other routes which didn’t end well for me Smile

I then wrote this class which wraps up a few areas;

namespace KinectTestApp
{
  using System;
  using System.Linq;
  using System.Threading.Tasks;
  using Windows.Media.Capture;
  using Windows.Media.Capture.Frames;

  class mtMediaSourceReader
  {
    public mtMediaSourceReader(
      MediaCapture capture, 
      MediaFrameSourceKind mediaSourceKind,
      Action<MediaFrameReader> onFrameArrived,
      Func<MediaFrameSource, bool> additionalSourceCriteria = null)
    {
      this.mediaCapture = capture;
      this.mediaSourceKind = mediaSourceKind;
      this.additionalSourceCriteria = additionalSourceCriteria;
      this.onFrameArrived = onFrameArrived;
    }
    public bool InitialiseWithMediaCapture()
    {
      this.mediaSource = this.mediaCapture.FrameSources.FirstOrDefault(
        fs =>
          (fs.Value.Info.SourceKind == this.mediaSourceKind) &&
          ((this.additionalSourceCriteria != null) ? 
            this.additionalSourceCriteria(fs.Value) : true)).Value;   

      return (this.mediaSource != null);
    }
    public async Task OpenReaderAsync()
    {
      this.frameReader =
        await this.mediaCapture.CreateFrameReaderAsync(this.mediaSource);

      this.frameReader.FrameArrived +=
        (s, e) =>
        {
          this.onFrameArrived(s);
        };

      await this.frameReader.StartAsync();
    }
    Func<MediaFrameSource, bool> additionalSourceCriteria;
    Action<MediaFrameReader> onFrameArrived;
    MediaFrameReader frameReader;
    MediaFrameSource mediaSource;
    MediaCapture mediaCapture;
    MediaFrameSourceKind mediaSourceKind;
  }
}

This type takes a MediaCapture and a MediaSourceKind and can then report via the Initialise() method whether that media source kind is available on that media capture. It can also apply some additional criteria if they are provided in the constructor. This class can also create a frame reader and redirect its FrameArrived events into the method provided to the constructor. There should be some way to stop this class as well but I haven’t written that yet.

With those classes in place, I added the following mtKinectColorPoseFrameHelper;

namespace KinectTestApp
{
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Numerics;
  using System.Threading.Tasks;
  using Windows.Media.Capture;
  using Windows.Media.Capture.Frames;
  using Windows.Media.Devices.Core;
  using Windows.Perception.Spatial;
  using WindowsPreview.Media.Capture.Frames;

  class mtKinectColorPoseFrameHelper
  {
    public event EventHandler<mtSoftwareBitmapEventArgs> ColorFrameArrived;
    public event EventHandler<mtPoseTrackingFrameEventArgs> PoseFrameArrived;

    public mtKinectColorPoseFrameHelper()
    {
      this.softwareBitmapEventArgs = new mtSoftwareBitmapEventArgs();
    }
    internal async Task<bool> InitialiseAsync()
    {
      bool necessarySourcesAvailable = false;

      // Find all possible source groups.
      var sourceGroups = await MediaFrameSourceGroup.FindAllAsync();

      // We try to find the Kinect by asking for a group that can deliver
      // color, depth, custom and infrared. 
      var allGroups = await GetGroupsSupportingSourceKindsAsync(
        MediaFrameSourceKind.Color,
        MediaFrameSourceKind.Depth,
        MediaFrameSourceKind.Custom,
        MediaFrameSourceKind.Infrared);

      // We assume the first group here is what we want which is not
      // necessarily going to be right on all systems so would need
      // more care.
      var firstSourceGroup = allGroups.FirstOrDefault();

      // Got one that supports all those types?
      if (firstSourceGroup != null)
      {
        this.mediaCapture = new MediaCapture();

        var captureSettings = new MediaCaptureInitializationSettings()
        {
          SourceGroup = firstSourceGroup,
          SharingMode = MediaCaptureSharingMode.SharedReadOnly,
          StreamingCaptureMode = StreamingCaptureMode.Video,
          MemoryPreference = MediaCaptureMemoryPreference.Cpu
        };
        await this.mediaCapture.InitializeAsync(captureSettings);

        this.mediaSourceReaders = new mtMediaSourceReader[]
        {
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Color, this.OnFrameArrived),
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Depth, this.OnFrameArrived),
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Custom, this.OnFrameArrived,
            DoesCustomSourceSupportPerceptionFormat)
        };

        necessarySourcesAvailable = 
          this.mediaSourceReaders.All(reader => reader.Initialise());

        if (necessarySourcesAvailable)
        {
          foreach (var reader in this.mediaSourceReaders)
          {
            await reader.OpenReaderAsync();
          }
        }
        else
        {
          this.mediaCapture.Dispose();
        }
      }
      return (necessarySourcesAvailable);
    }
    void OnFrameArrived(MediaFrameReader sender)
    {
      var frame = sender.TryAcquireLatestFrame();

      if (frame != null)
      {
        switch (frame.SourceKind)
        {
          case MediaFrameSourceKind.Custom:
            this.ProcessCustomFrame(frame);
            break;
          case MediaFrameSourceKind.Color:
            this.ProcessColorFrame(frame);
            break;
          case MediaFrameSourceKind.Infrared:
            break;
          case MediaFrameSourceKind.Depth:
            this.ProcessDepthFrame(frame);
            break;
          default:
            break;
        }
        frame.Dispose();
      }
    }
    void ProcessDepthFrame(MediaFrameReference frame)
    {
      if (this.colorCoordinateSystem != null)
      {
        this.depthColorTransform = frame.CoordinateSystem.TryGetTransformTo(
          this.colorCoordinateSystem);
      }     
    }
    void ProcessColorFrame(MediaFrameReference frame)
    {
      if (this.colorCoordinateSystem == null)
      {
        this.colorCoordinateSystem = frame.CoordinateSystem;
        this.colorIntrinsics = frame.VideoMediaFrame.CameraIntrinsics;
      }
      this.softwareBitmapEventArgs.Bitmap = frame.VideoMediaFrame.SoftwareBitmap;
      this.ColorFrameArrived?.Invoke(this, this.softwareBitmapEventArgs);
    }
    void ProcessCustomFrame(MediaFrameReference frame)
    {
      if ((this.PoseFrameArrived != null) &&
        (this.colorCoordinateSystem != null))
      {
        var trackingFrame = PoseTrackingFrame.Create(frame);
        var eventArgs = new mtPoseTrackingFrameEventArgs();

        if (trackingFrame.Status == PoseTrackingFrameCreationStatus.Success)
        {
          // Which of the entities here are actually tracked?
          var trackedEntities =
            trackingFrame.Frame.Entities.Where(e => e.IsTracked).ToArray();

          var trackedCount = trackedEntities.Count();

          if (trackedCount > 0)
          {
            eventArgs.PoseEntries =
              trackedEntities
              .Select(entity =>
                mtPoseTrackingDetails.FromPoseTrackingEntity(entity, this.colorIntrinsics, this.depthColorTransform.Value))
              .ToArray();
          }
          this.PoseFrameArrived(this, eventArgs);
        }
      }
    }
    async static Task<IEnumerable<MediaFrameSourceGroup>> GetGroupsSupportingSourceKindsAsync(
      params MediaFrameSourceKind[] kinds)
    {
      var sourceGroups = await MediaFrameSourceGroup.FindAllAsync();

      var groups =
        sourceGroups.Where(
          group => kinds.All(
            kind => group.SourceInfos.Any(sourceInfo => sourceInfo.SourceKind == kind)));

      return (groups);
    }
    static bool DoesCustomSourceSupportPerceptionFormat(MediaFrameSource source)
    {
      return (
        (source.Info.SourceKind == MediaFrameSourceKind.Custom) &&
        (source.CurrentFormat.MajorType == PerceptionFormat) &&
        (Guid.Parse(source.CurrentFormat.Subtype) == PoseTrackingFrame.PoseTrackingSubtype));
    }
    SpatialCoordinateSystem colorCoordinateSystem;
    mtSoftwareBitmapEventArgs softwareBitmapEventArgs;
    mtMediaSourceReader[] mediaSourceReaders;
    MediaCapture mediaCapture;
    CameraIntrinsics colorIntrinsics;
    const string PerceptionFormat = "Perception";
    private Matrix4x4? depthColorTransform;
  }
}

This is essentially doing;

  1. InitialiseAsync
    1. Using the MediaFrameSourceGroup type to try and find a source group that looks like it is Kinect by searching for Infrared+Color+Depth+Custom source kinds. This isn’t a complete test and it might be better to make it more complete. Also, there’s an assumption that the first group found is the best which isn’t likely to always hold true.
    2. Initialising a MediaCapture for the group found in step 1 above.
    3. Initialising three of my mtMediaSourceReader types for the Color/Depth/Custom source kinds and adding some extra criteria for the Custom source type to try and make sure that it supports the ‘Perception’ media format – this code is essentially lifted from the original sample.
    4. Opening frame readers on those three items and handling the events as frame arrives.
  2. OnFrameArrived simply passes the frame on to sub-functions based on type and this could have been done by deriving specific mtMediaSourceReaders.
  3. ProcessDepthFrame tries to get a transformation from depth space to colour space for later use.
  4. ProcessColorFrame fires the ColorFrameArrived event with the SoftwareBitmap that has been received.
  5. ProcessCustomFrame handles the custom frame by;
    1. Using the PoseTrackingFrame.Create() method from the referenced C++ project to interpret the raw data that comes from the custom sensor.
    2. Determining how many bodies are being tracked by the data.
    3. Converts the data types from the referenced C++ project to my own data types which include less of the data and which try to map the positions of joints given using 3D depth points to their respective 2D colour space points.

Lastly, there’s some code-behind which tries to glue this into the UI;

namespace KinectTestApp
{
  using Microsoft.Graphics.Canvas;
  using Microsoft.Graphics.Canvas.UI.Xaml;
  using System.Numerics;
  using System.Threading;
  using Windows.Foundation;
  using Windows.Graphics.Imaging;
  using Windows.UI;
  using Windows.UI.Core;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += this.OnLoaded;
    }
    void OnCanvasControlSizeChanged(object sender, SizeChangedEventArgs e)
    {
      this.canvasSize = new Rect(0, 0, e.NewSize.Width, e.NewSize.Height);
    }
    async void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.helper = new mtKinectColorPoseFrameHelper();

      this.helper.ColorFrameArrived += OnColorFrameArrived;
      this.helper.PoseFrameArrived += OnPoseFrameArrived;

      var suppported = await this.helper.InitialiseAsync();

      if (suppported)
      {
        this.canvasControl.Visibility = Visibility.Visible;
      }
    }
    void OnColorFrameArrived(object sender, mtSoftwareBitmapEventArgs e)
    {
      // Note that when this function returns to the caller, we have
      // finished with the incoming software bitmap.
      if (this.bitmapSize == null)
      {
        this.bitmapSize = new Rect(0, 0, e.Bitmap.PixelWidth, e.Bitmap.PixelHeight);
      }

      if (Interlocked.CompareExchange(ref this.isBetweenRenderingPass, 1, 0) == 0)
      {
        this.lastConvertedColorBitmap?.Dispose();

        // Sadly, the format that comes in here, isn't supported by Win2D when
        // it comes to drawing so we have to convert. The upside is that 
        // we know we can keep this bitmap around until we are done with it.
        this.lastConvertedColorBitmap = SoftwareBitmap.Convert(
          e.Bitmap,
          BitmapPixelFormat.Bgra8,
          BitmapAlphaMode.Ignore);

        // Cause the canvas control to redraw itself.
        this.InvalidateCanvasControl();
      }
    }
    void InvalidateCanvasControl()
    {
      // Fire and forget.
      this.Dispatcher.RunAsync(CoreDispatcherPriority.High, this.canvasControl.Invalidate);
    }
    void OnPoseFrameArrived(object sender, mtPoseTrackingFrameEventArgs e)
    {
      // NB: we do not invalidate the control here but, instead, just keep
      // this frame around (maybe) until the colour frame redraws which will 
      // (depending on race conditions) pick up this frame and draw it
      // too.
      this.lastPoseEventArgs = e;
    }
    void OnDraw(CanvasControl sender, CanvasDrawEventArgs args)
    {
      // Capture this here (in a race) in case it gets over-written
      // while this function is still running.
      var poseEventArgs = this.lastPoseEventArgs;

      args.DrawingSession.Clear(Colors.Black);

      // Do we have a colour frame to draw?
      if (this.lastConvertedColorBitmap != null)
      {
        using (var canvasBitmap = CanvasBitmap.CreateFromSoftwareBitmap(
          this.canvasControl,
          this.lastConvertedColorBitmap))
        {
          // Draw the colour frame
          args.DrawingSession.DrawImage(
            canvasBitmap,
            this.canvasSize,
            this.bitmapSize.Value);

          // Have we got a skeletal frame hanging around?
          if (poseEventArgs?.PoseEntries?.Length > 0)
          {
            foreach (var entry in poseEventArgs.PoseEntries)
            {
              foreach (var pose in entry.Points)
              {
                var centrePoint = ScalePosePointToDrawCanvasVector2(pose);

                args.DrawingSession.FillCircle(
                  centrePoint, circleRadius, Colors.Red);
              }
            }
          }
        }
      }
      Interlocked.Exchange(ref this.isBetweenRenderingPass, 0);
    }
    Vector2 ScalePosePointToDrawCanvasVector2(Point posePoint)
    {
      return (new Vector2(
        (float)((posePoint.X / this.bitmapSize.Value.Width) * this.canvasSize.Width),
        (float)((posePoint.Y / this.bitmapSize.Value.Height) * this.canvasSize.Height)));
    }
    Rect? bitmapSize;
    Rect canvasSize;
    int isBetweenRenderingPass;
    SoftwareBitmap lastConvertedColorBitmap;
    mtPoseTrackingFrameEventArgs lastPoseEventArgs;
    mtKinectColorPoseFrameHelper helper;
    static readonly float circleRadius = 10.0f;
  }
}

I don’t think there’s too much in there that would require explanation other than that I took a couple of arbitrary decisions;

  1. That I essentially process one colour frame at a time using a form of ‘lock’ to try and drop any colour frames that arrive while I am still in the process of drawing the last colour frame and that ‘drawing’ involves both the method OnColorFrameArrived and the async call to OnDraw it causes.
  2. That I don’t force a redraw when a ‘pose’ frame arrives. Instead, the data is held until the next OnDraw call which comes from handling the colour frames.It’s certainly possible that the various race conditions involved there might cause that frame to be dropped and another to replace it in the meantime.

Even though there’s a lot of allocations going on in that code as it stands, here’s a screenshot of it running and the performance isn’t bad at all running it from my Surface Pro 3 and I’m particularly pleased with the red nose that I end up with here Smile

image

The code is quite rough and ready as I was learning as I went along and some next steps might be to;

  1. Draw joints that are inferred in a different colour to those that are properly tracked.
  2. Draw the skeleton rather than just the joints.
  3. Do quite a lot of optimisations as the code here allocates a lot.
  4. Do more tracking around entities arriving/leaving based on their IDs and handle multiple people with different colours.
  5. Refactor to specialise the mtMediaSourceReader class to have separate types for Color/Depth/Custom and thereby tidy up the code which uses this type.

but, for now, I was just trying to get some basics working.

Here’s the code on GitHub if you want to try things out and note that you’d need that additional sample code from the official samples to make it work.

5 thoughts on “Windows 10, 1607, UWP and Experimenting with the Kinect for Windows V2 Update

  1. Do you know if it is possible to use this to scan a room and save it so that save room data can be used in the hololens emulator?

  2. Hi,
    after updating, my kinect driver is still dated 18/10/2014…
    how can i do? i have to download something?

    the sample program “CameraFrames” from Windows built but running it, it says: “no surces..”

    Thanks
    Paolo

Comments are closed.