Kinect V2, Windows Hello and Perception APIs

The Kinect team mailed their mailing list this morning to say that they have preview support for the Kinect on Windows 10, so I thought I’d give it a try. This post represents my first, tentative steps towards using those bits, so be aware that I’m just trying to figure this out myself.

I should say that this support involves installing a preview device driver for the Kinect on Windows 10. The details of how to do that are in the email that the Kinect team sent out to their list; I haven’t reproduced them here, so if you don’t have that mail and that driver then this won’t work for you.

I’ve got a couple of RealSense cameras and so I’ve been able to use Windows Hello for quite a while now, but I wanted to try out the Kinect both as a Windows Hello device and for another aspect of what’s going on here, which I’ll come back to in a moment.

Kinect as a Windows Hello Device

The details of how to enable the preview driver for the Kinect are in the email that was sent out and it’s not too hard a thing to do, but on my first attempt I found that Windows Hello wouldn’t run on the Kinect camera.

That is, I’d get to the stage of clicking this button;

[screenshot]

and then Windows would stop and say;

“Couldn’t turn on the camera”

This was even with the updated driver that’s referenced from the email that the Kinect folks sent out.

I struggled with this for quite a while, finally deciding that I’d try the old trick of;

  1. Removing the device from device manager (leaving the driver).
  2. Rebooting.
  3. Reinstalling the device.

and that sorted it out for me, such that I can now point the Kinect for Windows camera at my ugly mug and use it to log in;

and, so far, that works fine for me but I’m a bit more interested in the second part of what might be available here.

Kinect as a Perception Device

If you’ve been following along with the Kinect then you’ll know that there is an SDK out there which covers;

  • Desktop Applications
  • WinRT Applications

and that the SDK is smart in the sense that the API set across the two is pretty much identical, which makes it ‘easy’ to move code between the two environments. Specifically, I’ve got applications that use shared projects in order to build an app for both WPF and WinRT at the same time from the same code.

However, that SDK doesn’t go beyond Windows 8.1. While it’s possible to run those 8.1 apps on Windows 10, there’s no version of that SDK for Windows 10 UWP and, as always, I have no special ‘insider’ visibility of what’s going on in that area or of what the future does or does not hold.

However, for a little while now there have been APIs living in Windows.Perception and Windows.Devices.Perception that look very similar to some of the APIs in the Kinect for Windows V2 SDK and which, as far as I know, haven’t really done anything up until now.

With this new driver, it feels like those APIs are starting to light up at least for the Kinect device.

Hello World – What Providers Do I Have?

I wrote some ‘Hello World’ code looking at classes within Windows.Devices.Perception. I just made a blank UWP app that had access to microphone and camera and I wrote this simple user control;

<UserControl
  x:Class="App279.DeviceList"
  xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
  xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
  xmlns:local="using:App279"
  xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  mc:Ignorable="d"
  d:DesignHeight="300"
  d:DesignWidth="400">

  <Grid Padding="8">
    <Grid.RowDefinitions>
      <RowDefinition
        Height="Auto" />
      <RowDefinition
        Height="Auto" />
      <RowDefinition
        Height="*" />
    </Grid.RowDefinitions>
    <TextBlock
      Text="{Binding Name}" />
    <TextBox
      Grid.Row="1"
      IsReadOnly="True"
      Header="Allowed Access?"
      Text="{Binding AccessStatus}" />
    <ListView
      Grid.Row="2"
      ItemsSource="{Binding Devices}">
      <ListView.ItemTemplate>
        <DataTemplate>
          <StackPanel Orientation="Horizontal">
            <StackPanel.Resources>
                <Style
                  TargetType="TextBlock">
                  <Setter
                    Property="Margin"
                    Value="4,0,0,0" />
                </Style>
            </StackPanel.Resources>
            <TextBox
              Header="Name"
              Text="{Binding DisplayName}" />
            <TextBox
              Header="Active"
              Text="{Binding Active}" />
            <TextBox
              Header="Available"
              Text="{Binding Available}" />
            <TextBox
              Header="Width"
              Text="{Binding CameraIntrinsics.ImageWidth}" />
            <TextBox
              Header="Height"
              Text="{Binding CameraIntrinsics.ImageHeight}" />
          </StackPanel>
        </DataTemplate>
      </ListView.ItemTemplate>
    </ListView>
  </Grid>
</UserControl>

and then put 3 instances of this control onto my MainPage;

<Page
    x:Class="App279.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:App279"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <Grid.RowDefinitions>
      <RowDefinition />
      <RowDefinition />
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
      <ColumnDefinition />
      <ColumnDefinition />
    </Grid.ColumnDefinitions>
    <local:DeviceList
      x:Name="ctlColor" />
    <local:DeviceList
      x:Name="ctlDepth"
      Grid.Column="1" />
    <local:DeviceList
      x:Name="ctlIR"
      Grid.Row="1" />
  </Grid>
</Page>

and wrote some tiny code behind (this is all very hacky) to set up 3 DataContexts;

 public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += OnLoaded;
    }
    async void OnLoaded(object sender, RoutedEventArgs e)
    {
      var color = new
      {
        Name = "Color Sources",
        AccessStatus = await PerceptionColorFrameSource.RequestAccessAsync(),
        Devices = await PerceptionColorFrameSource.FindAllAsync()
      };
      this.ctlColor.DataContext = color;

      var depth = new
      {
        Name = "Depth Sources",
        AccessStatus = await PerceptionDepthFrameSource.RequestAccessAsync(),
        Devices = await PerceptionDepthFrameSource.FindAllAsync()
      };
      this.ctlDepth.DataContext = depth;

      var infraRed = new
      {
        Name = "IR Sources",
        AccessStatus = await PerceptionInfraredFrameSource.RequestAccessAsync(),
        Devices = await PerceptionInfraredFrameSource.FindAllAsync()
      };
      this.ctlIR.DataContext = infraRed;
    }
  }

Using anonymous types is just laziness on my part here. With the Kinect for Windows V2 plugged in, I can run this and I see;

[screenshot]

and so the Kinect is showing up as a colour, a depth and an infra-red camera.

As an aside, if I plug in my Intel RealSense (Front, not Rear) camera then it does show up as 2 separate colour cameras;

[screenshot]

but it does not show up as a depth or infra-red camera – I wasn’t expecting it to right now; I was just checking and wondering.

Getting Some Data off the Camera

I thought I’d try and grab some IR frames from the camera. How might that look? It feels very much like Kinect SDK programming and I got something up and running quite quickly but I need to revisit this and have a think about;

  1. The bitmap conversions.
  2. The actions that occur on the dispatcher thread versus the actions that occur on other threads.

as I suspect there are ways of doing this that are a whole lot better. I changed my UI to just include an image;

  <Grid
    Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <Image
      x:Name="myImage" />
  </Grid>

and I wrote some code behind to try and grab infra-red data from the camera and display it in that image (again, this was done in a few minutes, so be kind to it);

using System;
using System.Linq;
using System.Runtime.InteropServices;
using Windows.Devices.Perception;
using Windows.Graphics.Imaging;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media.Imaging;

namespace App279
{
  [ComImport]
  [Guid("905a0fef-bc53-11df-8c49-001e4fc686da")]
  [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  interface IBufferByteAccess
  {
    unsafe void Buffer(out byte* pByte);
  }
  [ComImport]
  [Guid("5B0D3235-4DBA-4D44-865E-8F1D0E4FD04D")]
  [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
  unsafe interface IMemoryBufferByteAccess
  {
    void GetBuffer(out byte* buffer, out uint capacity);
  }

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += OnLoaded;
    }
    async void OnLoaded(object sender, RoutedEventArgs args)
    {
      var access = await PerceptionInfraredFrameSource.RequestAccessAsync();

      if (access == PerceptionFrameSourceAccessStatus.Allowed)
      {
        var possibleSources =
          await PerceptionInfraredFrameSource.FindAllAsync();

        var firstSource =
          possibleSources.First();

        this.bitmap =
          new WriteableBitmap(
            (int)firstSource.CameraIntrinsics.ImageWidth,
            (int)firstSource.CameraIntrinsics.ImageHeight);

        this.myImage.Source = this.bitmap;

        this.reader = firstSource.OpenReader();
        this.reader.FrameArrived += HandleFrameArrivedAsync;
      }
    }
    async void HandleFrameArrivedAsync(PerceptionInfraredFrameReader sender,
      PerceptionInfraredFrameArrivedEventArgs args)
    {
      // We move the whole thing to the dispatcher thread for now because we need to
      // get back to the writeable bitmap and that's got affinity. We could probably
      // do a lot better here.
      await this.Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.High,
        () =>
        {
          this.HandleFrameArrivedDispatcherThread(args);
        }
      );
    }
    unsafe void HandleFrameArrivedDispatcherThread(PerceptionInfraredFrameArrivedEventArgs args)
    {
      using (var frame = args.TryOpenFrame())
      {
        if (frame != null)
        {
          unsafe
          {
            using (var bufferSource =
              frame.VideoFrame.SoftwareBitmap.LockBuffer(BitmapBufferAccessMode.Read))
            using (var sourceReference =
              bufferSource.CreateReference())
            {
              var sourceByteAccess = sourceReference as IMemoryBufferByteAccess;
              var pSourceBits = (byte*)null;
              uint capacity = 0;
              sourceByteAccess.GetBuffer(out pSourceBits, out capacity);

              var destByteAccess = bitmap.PixelBuffer as IBufferByteAccess;
              var pDestBits = (byte*)null;
              destByteAccess.Buffer(out pDestBits);

              for (int i = 0; i < (capacity / 2); i++)
              {
                // Scaling the IR value of 0...UInt16.MaxValue into
                // 0...byte.MaxValue, which probably isn't the best idea
                // but it's a start.
                float pct = (float)(*(UInt16*)pSourceBits) / UInt16.MaxValue;
                byte val = (byte)(pct * byte.MaxValue);

                *pDestBits++ = val;
                *pDestBits++ = val;
                *pDestBits++ = val;
                *pDestBits++ = 0xFF;

                pSourceBits += 2; // sizeof(UInt16)
              }
            }
          }
          this.bitmap.Invalidate();
        }
      }
    }
    PerceptionInfraredFrameReader reader;
    WriteableBitmap bitmap;
  }
}
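For what it’s worth, the per-pixel arithmetic in that loop is just a linear mapping from the 16-bit grey range down to 8 bits per channel, repeated across B, G and R with an opaque alpha. Here’s a minimal sketch of the same conversion outside of any Windows APIs – `grey16_to_bgra8` is a made-up name and the input is just little-endian Grey16 bytes, not a real frame;

```python
def grey16_to_bgra8(source: bytes) -> bytearray:
    """Map each little-endian 16-bit grey value to an 8-bit B, G, R, A pixel."""
    dest = bytearray()
    for i in range(0, len(source), 2):
        value = source[i] | (source[i + 1] << 8)  # little-endian UInt16
        val = int((value / 0xFFFF) * 0xFF)        # scale 0..65535 -> 0..255
        dest += bytes((val, val, val, 0xFF))      # B, G, R, opaque alpha
    return dest

# A 2-pixel 'frame': one black pixel (0x0000), one full-brightness pixel (0xFFFF).
frame = bytes((0x00, 0x00, 0xFF, 0xFF))
print(list(grey16_to_bgra8(frame)))  # → [0, 0, 0, 255, 255, 255, 255, 255]
```

Incidentally, `(value / 65535) * 255` is near enough `value / 257`, so simply taking the high byte (`value >> 8`) would give almost identical results without the float arithmetic.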

and you can probably see that I’ve tied myself in knots a little over the WriteableBitmap and the UI thread. I suspect I could do better – perhaps I’d look to Win2D to put a ‘pull model’ on top of these video frames, especially as I think Win2D is doing nicer stuff with SoftwareBitmap which would work nicely here too.
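The knot is the usual one of marshalling frame callbacks from a worker thread over to the single thread that owns the bitmap. Stripped of any real Windows API, the shape is just a queue between the producer and the thread with affinity – this is a toy illustration with invented names (frame_arrived, ui_loop), not the Dispatcher’s actual mechanics;

```python
import queue
import threading

ui_queue = queue.Queue()  # work destined for the 'UI' thread

def frame_arrived(frame):
    # Called on a worker thread; don't touch UI-affine objects here,
    # just hand the frame over to the thread that owns them.
    ui_queue.put(frame)

def ui_loop(frames_expected):
    # The single thread that owns the bitmap drains the queue and draws.
    drawn = []
    for _ in range(frames_expected):
        drawn.append(ui_queue.get())
    return drawn

worker = threading.Thread(
    target=lambda: [frame_arrived(f) for f in ("f1", "f2")])
worker.start()
worker.join()
result = ui_loop(2)
print(result)  # → ['f1', 'f2']
```

A ‘pull model’ like Win2D’s Draw loop is essentially the consumer end of this: the UI thread asks for the latest frame when it’s ready, rather than having every frame pushed at it.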

Regardless, I get IR data off the camera and it displays ok;

I’d like to dig into these APIs quite a bit more now that they are coming to life and I’ll do that in follow-on posts. I’d be particularly interested in how Kinect functionality like skeletal tracking and gesture recognition might be surfaced here – that’s a TBD for me right now.

Update – Drawing with Win2D

Again, this is code that was quickly thrown together, but I thought I would add in the NuGet package Win2D.uwp and then replace my UI with;

<Page
  x:Class="App279.MainPage"
  xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
  xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
  xmlns:local="using:App279"
  xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  xmlns:w2="using:Microsoft.Graphics.Canvas.UI.Xaml"
  mc:Ignorable="d">

  <Grid
    Background="Black">
    <w2:CanvasControl
      x:Name="canvasControl"
      Draw="OnCanvasControlDraw" />
  </Grid>
</Page>

and then my code behind becomes cleaner in my view;

namespace App279
{
  using Microsoft.Graphics.Canvas;
  using Microsoft.Graphics.Canvas.UI.Xaml;
  using System;
  using System.Linq;
  using Windows.Devices.Perception;
  using Windows.Foundation;
  using Windows.Graphics.Imaging;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += OnLoaded;
    }
    async void OnLoaded(object sender, RoutedEventArgs args)
    {
      var access = await PerceptionInfraredFrameSource.RequestAccessAsync();

      if (access == PerceptionFrameSourceAccessStatus.Allowed)
      {
        var possibleSources =
          await PerceptionInfraredFrameSource.FindAllAsync();

        var firstSource =
          possibleSources.First();

        this.drawRect = new Rect(0, 0, firstSource.CameraIntrinsics.ImageWidth,
          firstSource.CameraIntrinsics.ImageHeight);

        this.irFrameReader = firstSource.OpenReader();
      }
    }
    void OnCanvasControlDraw(
      CanvasControl sender,
      CanvasDrawEventArgs args)
    {
      if (this.irFrameReader != null)
      {
        using (var frame = this.irFrameReader.TryReadLatestFrame())
        {
          if (frame != null)
          {
            // TODO: very unsure if this is how I'm supposed to spot
            // unique frames.
            if ((this.lastFrameTime == null) ||
              (this.lastFrameTime != frame.VideoFrame.RelativeTime))
            {
              // According to
              // https://github.com/Microsoft/Win2D/commit/9f03d9be78e05d4a77e34c2426163182cbd6d078
              // CanvasBitmap.CreateFromSoftwareBitmap does not support Grey16.
              // So, we convert to BGRA8.
              var convertedBitmap = SoftwareBitmap.Convert(
                frame.VideoFrame.SoftwareBitmap, BitmapPixelFormat.Bgra8,
                BitmapAlphaMode.Ignore);

              // Then we can make a CanvasBitmap. Not sure whether it makes sense to
              // keep creating new ones and whether the underlying bits do smart
              // things for us or whether we should do smart things for ourselves.

              this.bitmap?.Dispose();

              this.bitmap = CanvasBitmap.CreateFromSoftwareBitmap(
                sender.Device,
                convertedBitmap);

              this.lastFrameTime = frame.VideoFrame.RelativeTime;              
            }
          }
        }
      }
      if (this.bitmap != null)
      {
        args.DrawingSession.DrawImage(this.bitmap, this.drawRect);
      }
      this.canvasControl.Invalidate();
    }
    Rect drawRect;
    CanvasBitmap bitmap;
    PerceptionInfraredFrameReader irFrameReader;
    TimeSpan? lastFrameTime;
  }
}

but I’m still having to convert the software bitmap delivered by the perception APIs in Grey16 format into a format that Win2D is prepared to accept (I went with Bgra8), and I wish that I didn’t have to do that.

I’m also unsure as to whether I should really be re-creating the CanvasBitmap each time around the draw loop here, but it performed pretty reasonably in a quick smoke test.
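The timestamp comparison that the draw handler uses to spot unique frames is easy to state on its own: remember the last RelativeTime seen and only treat a frame as new when the timestamp differs. A minimal sketch, with a made-up FrameCache type standing in for the page and plain numbers standing in for RelativeTime;

```python
class FrameCache:
    """Only accept a frame when its timestamp differs from the last one seen."""

    def __init__(self):
        self.last_time = None

    def is_new(self, relative_time) -> bool:
        # First frame, or a frame with a different timestamp, counts as new.
        if self.last_time is None or self.last_time != relative_time:
            self.last_time = relative_time
            return True
        return False

cache = FrameCache()
print(cache.is_new(0.033))  # → True  (first frame)
print(cache.is_new(0.033))  # → False (same frame delivered again)
print(cache.is_new(0.066))  # → True  (new frame)
```

Whether comparing timestamps is actually the supported way to detect unique frames from these readers is exactly the TODO in the code above – this just pins down what that check does.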