Kinect for Windows V2 SDK: Hello (Color) World

I’ve watched a couple of videos from the Kinect for Windows v2 development series on Channel 9;

Programming-Kinect-for-Windows-v2

and as a newcomer to Kinect I’m impressed by a few things;

  1. I like the design of the APIs. I’ve always been a big fan of APIs that emphasise a high level of consistency and I like the approach that the APIs seem to take to;
    1. grabbing hold of a sensor on the Kinect (or possibly multiple ones)
    2. receiving/polling for frames of data from that sensor
    3. acquiring/releasing those frames of data
  2. I like that the APIs have been made as consistent as possible across Native, Managed and WinRT layers
  3. I like that WinRT means that potentially you can code this stuff up for Windows Store apps in C++, C# and JavaScript.

but even after a couple of videos I felt that I wanted to make some kind of “Hello World” just so that it seemed like I was dipping a bit of a toe into the water and trying things out.

First off, I thought I’d play with a WPF application and see if I could make something that used the Kinect as a web cam. I figured that it wasn’t wise to attempt to be too ambitious and, in hindsight, I’m glad that I didn’t try to go too far too soon as I needed to figure some things out.

I made a little WPF application and added the reference to the Kinect SDK;

[Image: adding a reference to Microsoft.Kinect in Visual Studio]

and then I made a little UI to display some buttons and an Image that I could poke the bits into from the Kinect camera;

[Image: the app’s UI – an Image element with a row of buttons along the bottom]

which is just a bit of XAML;

<Window x:Class="WpfApplication6.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow"
        Height="350"
        Width="525">
  <Grid>
    <Image x:Name="imgColour"
           HorizontalAlignment="Stretch"
           VerticalAlignment="Stretch" />
    <StackPanel VerticalAlignment="Bottom"
                Orientation="Horizontal"
                HorizontalAlignment="Center">
      <StackPanel.Resources>
        <Style TargetType="Button">
          <Setter Property="Margin"
                  Value="5" />
        </Style>
      </StackPanel.Resources>
      <Button Content="Get Sensor"
              Click="OnGetSensor" />
      <Button Content="Open Reader"
              Click="OnOpenReader" />
      <Button Content="Close Reader"
              Click="OnCloseReader" />
      <Button Content="Close Sensor"
              Click="OnReleaseSensor" />
    </StackPanel>
  </Grid>
</Window>

Initial Attempt

Then I wrote a little class to try and provide a tiny ‘abstraction’ on top of what I wanted to do with the Kinect – i.e. open the sensor, open a reader on the ColorFrameSource and then grab video frames from that source. Version 1 of that class is as below;

  // (needs 'using Microsoft.Kinect;' and 'using System;' at the top of the file)
  class KinectControl
  {
    KinectSensor sensor;
    ColorFrameReader reader;
    Action<byte[], int, int> frameHandler;
    byte[] frameArray;

    public KinectControl(Action<byte[], int, int> frameHandler)
    {
      this.frameHandler = frameHandler;

      this.frameArray = new byte[
        Constants.Width * Constants.Height * Constants.BytesPerPixel];
    }
    public void GetSensor()
    {
      this.sensor = KinectSensor.GetDefault();
      this.sensor.Open();
    }
    public void OpenReader()
    {
      this.reader = this.sensor.ColorFrameSource.OpenReader();
      this.reader.FrameArrived += OnFrameArrived;
    }
    public void CloseReader()
    {
      this.reader.FrameArrived -= OnFrameArrived;
      this.reader.Dispose();
      this.reader = null;
    }
    void OnFrameArrived(object sender, ColorFrameArrivedEventArgs e)
    {
      using (var frame = e.FrameReference.AcquireFrame())
      {
        if (frame != null)
        {
          frame.CopyConvertedFrameDataToArray(this.frameArray, ColorImageFormat.Bgra);

          this.frameHandler(
            this.frameArray,
            frame.FrameDescription.Width,
            frame.FrameDescription.Height);
        }
      }
    }
    public void ReleaseSensor()
    {
      this.sensor.Close();
      this.sensor = null;
    }
  }

You’ll notice that it essentially takes a “callback” function in its constructor and, every time it acquires a frame successfully, it attempts to call that callback function with the frame. I then wrote a little code behind my XAML file to use this class to grab the frames and update a WriteableBitmap with them as per below;

using System.Threading.Tasks;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;

namespace WpfApplication6
{
  public partial class MainWindow : Window
  {
    KinectControl controller;

    public MainWindow()
    {
      InitializeComponent();
      this.Loaded += OnLoaded;
    }

    void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.bitmapSource = new WriteableBitmap(
        Constants.Width,
        Constants.Height,
        Constants.Dpi,
        Constants.Dpi,
        PixelFormats.Bgra32,
        null);

      this.imgColour.Source = this.bitmapSource;

      this.controller = new KinectControl(this.OnFrame);
    }
    void OnFrame(byte[] frame, int width, int height)
    {
      this.bitmapSource.Lock();

      this.bitmapSource.WritePixels(INVALID_RECT, frame,
        width * Constants.BytesPerPixel, 0);

      this.bitmapSource.AddDirtyRect(INVALID_RECT);

      this.bitmapSource.Unlock();
    }
    void OnGetSensor(object sender, RoutedEventArgs e)
    {
      this.controller.GetSensor();
    }
    void OnOpenReader(object sender, RoutedEventArgs e)
    {
      this.controller.OpenReader();
    }
    void OnCloseReader(object sender, RoutedEventArgs e)
    {
      this.controller.CloseReader();
    }
    void OnReleaseSensor(object sender, RoutedEventArgs e)
    {
      this.controller.ReleaseSensor();
    }
    static readonly Int32Rect INVALID_RECT =
      new Int32Rect(0, 0, Constants.Width, Constants.Height);

    WriteableBitmap bitmapSource;
  }
}

The basic idea that I was trying to implement here was just to have a single frame buffer that is re-used for every frame passed to me by the Kinect so that I don’t get into lots of allocations and lots of fancy tricks trying to play around with frames.

The constants here are just defined in a simple class and I’m not sure whether they’d be right on every system but they work for me for now;

  static class Constants
  {
    public static readonly int Width = 1920;
    public static readonly int Height = 1080;
    public static readonly int Dpi = 96;
    public static readonly int BytesPerPixel = 4;
  }

Now, running this code does sort of “work” but I wasn’t surprised to find that my code isn’t dealing with the frames thrown at it fast enough – the video displayed gets bogged down very quickly and it all goes a bit wrong.

I wasn’t surprised because I’m dealing with frames of 1920×1080 and so I’d have been surprised if the video was delivered ok with that sort of code.
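That instinct is easy to sanity-check with a back-of-envelope calculation – a full-HD BGRA frame is around 8MB, and at 30fps that’s roughly 237MB/s being copied and pushed through the UI thread;

```csharp
// Rough data rate for 1920x1080 BGRA colour frames at 30fps.
long bytesPerFrame = 1920L * 1080 * 4;                            // 8,294,400 bytes per frame
double mbPerSec = bytesPerFrame * 30 / (1024.0 * 1024.0);         // ~237 MB every second
```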

If I tweak my code to drop a few frames (e.g. only sampling 50% of the frames) by changing my KinectControl.OnFrameArrived handler;

    long frameCount;
    void OnFrameArrived(object sender, ColorFrameArrivedEventArgs e)
    {
      using (var frame = e.FrameReference.AcquireFrame())
      {
        if (frame != null)
        {
          // ignore half of these frames...
          if (++frameCount % 2 == 0)
          {
            frame.CopyConvertedFrameDataToArray(this.frameArray, ColorImageFormat.Bgra);

            this.frameHandler(
              this.frameArray,
              frame.FrameDescription.Width,
              frame.FrameDescription.Height);
          }
        }
      }
    }

then I do get usable video; it’s quite smooth and works quite nicely – the Kinect here is positioned to my right and so has a view of me scowling into my PC with my Mac idly sitting by, watching;

[Image: colour video frame from the Kinect showing the author at his desk]

The thing is that dropping 1 in 2 frames didn’t really feel like the right thing to do so I thought I’d try a different approach.

Second Attempt

As far as I can tell – all of the code that I’m running above is running on my UI thread. That is – I have UI code that calls into my KinectControl.OpenReader code which opens up a reader;

[Image: KinectControl.OpenReader being called from the UI thread]

and then my event handlers (KinectControl.OnFrameArrived) for the FrameArrived event all seem to have affinity in that they also arrive on my UI thread. So I have this loop of activities running on my UI thread;

  • OnFrameArrived handler
    • Acquires the frame
    • Copies it to an array
    • Calls back to the UI to update the image
      • locks the WriteableBitmap
      • Copies the pixels to it
      • Unlocks the WriteableBitmap
      • Updates the UI
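
A quick way to confirm that thread affinity (a diagnostic sketch, not part of the project as posted) is to ask the dispatcher inside the handler whether the current thread is the UI thread;

```csharp
// Hypothetical diagnostic: Dispatcher.CheckAccess() returns true only when
// called from the dispatcher's (i.e. the UI) thread, so logging it from the
// FrameArrived handler shows which thread the event is delivered on.
void OnFrameArrived(object sender, ColorFrameArrivedEventArgs e)
{
  System.Diagnostics.Debug.WriteLine(
    "FrameArrived on UI thread: {0}",
    System.Windows.Application.Current.Dispatcher.CheckAccess());
}
```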

and I figure that I might either try to resize the images before trying to put them on the screen or at least attempt to de-couple the process of acquiring them, converting them and then putting them onto the screen.

However, I didn’t want to end up with multiple buffers so I tried to keep this new scheme relatively simple and re-worked my KinectControl class;

using Microsoft.Kinect;
using System;
using System.Diagnostics;
using System.Threading.Tasks;

namespace WpfApplication6
{
  class KinectControl
  {
    KinectSensor sensor;
    ColorFrameReader reader;
    Func<byte[], int, int, Task> frameHandler;
    byte[] frameArray;

    public KinectControl(Func<byte[], int, int, Task> frameHandler)
    {
      this.frameHandler = frameHandler;

      this.frameArray = new byte[
        Constants.Width * Constants.Height * Constants.BytesPerPixel];
    }
    public void GetSensor()
    {
      this.sensor = KinectSensor.GetDefault();
      this.sensor.Open();
    }
    public void OpenReader()
    {
      this.reader = this.sensor.ColorFrameSource.OpenReader();
      this.reader.FrameArrived += OnFrameArrived;
    }
    public void CloseReader()
    {
      this.reader.FrameArrived -= OnFrameArrived;
      this.reader.Dispose();
      this.reader = null;
    }
    void OnFrameArrived(object sender, ColorFrameArrivedEventArgs e)
    {
      // I don't *think* this event is re-entrant in the sense that I
      // don't think we'll get it fired while we're handling it. Once
      // we return to the caller it can call us again though and that
      // can happen after I've called Task.Run and returned below.
      if (this.bufferIdle)
      {
        this.bufferIdle = false;

        Task.Run(
          async () =>
          {
            var frame = e.FrameReference.AcquireFrame();

            if (frame != null)
            {
              frame.CopyConvertedFrameDataToArray(this.frameArray, ColorImageFormat.Bgra);
              int w = frame.FrameDescription.Width;
              int h = frame.FrameDescription.Height;

              await this.frameHandler(
                this.frameArray,
                w,
                h);

              frame.Dispose();

              this.bufferIdle = true;
            }
            else
            {
              this.bufferIdle = true;
            }
          });
      }
    }
    public void ReleaseSensor()
    {
      this.sensor.Close();
      this.sensor = null;
    }
    volatile bool bufferIdle = true;
  }
}

The only real change here is that in the OnFrameArrived handler, I throw the work off into the threadpool via Task.Run. However, that means that I’ll return back to the caller quite quickly and if I want to maintain just a single buffer then I need some notion of “buffer busy” which I have a simple boolean flag doing for me. If that flag is set when a frame arrives, I simply drop that frame as I’m still busy on the last one.
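
The volatile bool works here largely because only one task resets it at a time but, as a sketch (my own names, not from the project), the same busy/idle gate can be built with Interlocked so that the test and the set happen as one atomic operation;

```csharp
using System.Threading;

// A small atomic busy/idle gate: TryEnter succeeds for exactly one caller
// at a time; Exit makes the gate available again.
class FrameGate
{
  int busy; // 0 = idle, 1 = busy

  // Atomically: if busy == 0, set it to 1 and return true, else return false.
  public bool TryEnter() =>
    Interlocked.CompareExchange(ref this.busy, 1, 0) == 0;

  public void Exit() =>
    Interlocked.Exchange(ref this.busy, 0);
}
```

The handler would then call TryEnter() where it currently tests bufferIdle, and Exit() where it currently sets the flag back to true.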

Moving this work to a threadpool thread has the knock-on effect that the callback handler which updates the UI would now be called from a non-UI thread, and that wouldn’t work, so I have made that callback handler an “async” handler such that it can marshal its work back to the dispatcher thread and ‘await’ its completion. That means that in my calling MainWindow code I now have this slightly different implementation;

    async Task OnFrame(byte[] frame, int width, int height)
    {
      await Dispatcher.InvokeAsync(() =>
        {
          this.bitmapSource.Lock();

          this.bitmapSource.WritePixels(INVALID_RECT, frame,
            width * Constants.BytesPerPixel, 0);

          this.bitmapSource.AddDirtyRect(INVALID_RECT);

          this.bitmapSource.Unlock();

          this.txtFps.Text = ((++this.frameCount) /
            ((DateTime.Now.Ticks - this.startTicks) / 10e6)).ToString();
        });
    }
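
One thing to note – this version of OnFrame also updates a txtFps TextBlock that isn’t in the XAML listed at the top of the post; something along these lines (the name matches the code-behind, the placement is my own guess) needs adding inside the Grid;

```xml
<TextBlock x:Name="txtFps"
           HorizontalAlignment="Right"
           VerticalAlignment="Top"
           Margin="5" />
```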

Clearly, I now have quite a lot more infrastructure going on inside of the application and I’m taking the hit for context-switching between the threads, but this delivers smooth video at around 15 frames per second. I’m not 100% sure whether the Kinect would have liked to deliver 30fps to my code or not: when I commented out all of my code apart from the piece measuring frame rate, I still only saw 15fps, which made me wonder whether I was measuring things incorrectly or whether the Kinect was working in its “low light mode” (the v2 colour camera drops from 30fps to 15fps in low light).

Either way, I’m reasonably happy that I can get video off the sensor.

Coming back to my earlier point about consistency of the API design, I really like that I can just change a few pieces of code and switch from a colour view to a depth view. That is – I change my KinectControl class to;

using Microsoft.Kinect;
using System;
using System.Diagnostics;
using System.Threading.Tasks;

namespace WpfApplication6
{
  class KinectControl
  {
    public KinectControl(Func<byte[], int, int, Task> frameHandler)
    {
      this.frameHandler = frameHandler;

      this.frameArray = new byte[
        Constants.Width * Constants.Height * Constants.BytesPerPixel];

      this.depthArray = new ushort[Constants.Width * Constants.Height];
    }
    public void GetSensor()
    {
      this.sensor = KinectSensor.GetDefault();
      this.sensor.Open();
    }
    public void OpenReader()
    {
      this.reader = this.sensor.DepthFrameSource.OpenReader();
      this.reader.FrameArrived += OnFrameArrived;
    }
    public void CloseReader()
    {
      this.reader.FrameArrived -= OnFrameArrived;
      this.reader.Dispose();
      this.reader = null;
    }
    void OnFrameArrived(object sender, DepthFrameArrivedEventArgs e)
    {
      // I don't *think* this event is re-entrant in the sense that I
      // don't think we'll get it fired while we're handling it. Once
      // we return to the caller it can call us again though and that
      // can happen after I've called Task.Run and returned below.
      if (this.bufferIdle)
      {
        this.bufferIdle = false;

        var frame = e.FrameReference.AcquireFrame();

        if (frame != null)
        {
          frame.CopyFrameDataToArray(this.depthArray);

          Task.Run(async () =>
            {
              for (int i = 0; 
                i < Constants.Width * Constants.Height * Constants.BytesPerPixel; 
                i += 4)
              {
                byte val = (byte)(255 - (this.depthArray[i / 4] / (double)Constants.MaxDepth * 255)); // divide as double - integer division would truncate the ratio to 0
                this.frameArray[i + 2] = val;
                this.frameArray[i + 3] = 0xFF;
              }
              await this.frameHandler(
                this.frameArray,
                Constants.Width,
                Constants.Height);
            });

          frame.Dispose();

          this.bufferIdle = true;
        }
        else
        {
          this.bufferIdle = true;
        }
      }
    }
    public void ReleaseSensor()
    {
      this.sensor.Close();
      this.sensor = null;
    }
    KinectSensor sensor;
    DepthFrameReader reader;
    Func<byte[], int, int, Task> frameHandler;
    byte[] frameArray;
    ushort[] depthArray;
    volatile bool bufferIdle = true;
  }
}
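
Note that the depth version references a Constants.MaxDepth that wasn’t in the Constants class I listed earlier, and the Width/Height need changing too because Kinect v2 depth frames are 512×424 rather than 1920×1080. This is roughly what’s assumed (the 4500 is the sensor’s approximate maximum reliable depth in millimetres, and making it a double keeps the per-pixel division from truncating);

```csharp
// Assumed Constants for the depth build - not shown in the original post.
static class Constants
{
  public static readonly int Width = 512;          // Kinect v2 depth frame width
  public static readonly int Height = 424;         // Kinect v2 depth frame height
  public static readonly int Dpi = 96;
  public static readonly int BytesPerPixel = 4;    // BGRA output buffer
  public static readonly double MaxDepth = 4500.0; // ~max reliable depth in mm
}
```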

and I change the code behind my XAML to;

using System;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Threading;

namespace WpfApplication6
{
  public partial class MainWindow : Window
  {
    KinectControl controller;

    public MainWindow()
    {
      InitializeComponent();
      this.Loaded += OnLoaded;
    }

    void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.bitmapSource = new WriteableBitmap(
        Constants.Width,
        Constants.Height,
        Constants.Dpi,
        Constants.Dpi,
        PixelFormats.Bgra32,
        null);

      this.imgColour.Source = this.bitmapSource;

      this.controller = new KinectControl(this.OnFrame);
    }
    async Task OnFrame(byte[] frame, int width, int height)
    {
      await Dispatcher.InvokeAsync(() =>
        {
          this.bitmapSource.Lock();

          this.bitmapSource.WritePixels(INVALID_RECT, frame,
            width * Constants.BytesPerPixel, 0);

          this.bitmapSource.AddDirtyRect(INVALID_RECT);

          this.bitmapSource.Unlock();

          this.txtFps.Text = ((++this.frameCount) /
            ((DateTime.Now.Ticks - this.startTicks) / 10e6)).ToString();
        });
    }
    void OnGetSensor(object sender, RoutedEventArgs e)
    {
      this.controller.GetSensor();
    }
    void OnOpenReader(object sender, RoutedEventArgs e)
    {
      this.startTicks = DateTime.Now.Ticks;
      this.frameCount = 0;
      this.controller.OpenReader();
    }
    void OnCloseReader(object sender, RoutedEventArgs e)
    {
      this.controller.CloseReader();
    }
    void OnReleaseSensor(object sender, RoutedEventArgs e)
    {
      this.controller.ReleaseSensor();
    }
    static readonly Int32Rect INVALID_RECT =
      new Int32Rect(0, 0, Constants.Width, Constants.Height);

    WriteableBitmap bitmapSource;
    long startTicks;
    long frameCount;
  }
}

and now I’m “viewing” depth frames with brighter red tones indicating something that’s close to the sensor versus being distant from it;

[Image: depth view rendered in red tones, brighter for nearer objects]
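
The per-pixel conversion boils down to a tiny mapping that’s easy to pull out and test on its own – a refactoring sketch, not code from the project as posted;

```csharp
using System;

// Maps a raw depth sample (millimetres) to a red intensity:
// 0mm -> 255 (bright red, close); maxDepthMm -> 0 (black, distant).
static byte DepthToRed(ushort depthMm, double maxDepthMm)
{
  double clamped = Math.Min(depthMm, maxDepthMm);
  return (byte)(255 - (clamped / maxDepthMm * 255));
}
```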

I could tidy up that code quite a bit in order to genericise out the different frame types and so on but for the moment I’ve left it as it is.

I really like that consistent model – clearly, the SDK ships with samples that already do these kinds of things but I find it helpful to start from scratch and see how easy/hard it is to get going with this stuff and across colour/depth I’m pretty impressed. I guess next I need to try something else out like skeletal tracking or starting to combine some of these inputs.

Here’s a zip file of the project containing the colour frame data and here’s the one with the depth frame data.

Update 1 – 8th August, 2014

I realised that what I was doing in terms of defining constants to represent the size of the colour/depth frames above wasn’t at all necessary. It’s contained in the various frame descriptions and so I reworked my code to take advantage of that. I won’t dump out all the changes here in the post (it’s not very many) but I basically ended up doing something like this when I open the sensor;

    public void GetSensor()
    {
      this.sensor = KinectSensor.GetDefault();
      this.sensor.Open();

      // New bit - ask the sensor for the sizes rather than define constants.
      var description = this.sensor.ColorFrameSource.FrameDescription;

      this.frameArray = new byte[
        description.Width * description.Height * description.BytesPerPixel * 2]; // TBD on the 2

      this.FrameSize = new Size(
        this.sensor.ColorFrameSource.FrameDescription.Width,
        this.sensor.ColorFrameSource.FrameDescription.Height);
    }
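
On that “TBD on the 2” – the colour source’s default FrameDescription describes the camera’s raw format (YUY2, 2 bytes per pixel), so converting to BGRA needs twice the bytes. Rather than multiplying by a magic 2, I believe the source can be asked for a description of the converted format directly – a sketch of what I mean;

```csharp
// Ask the colour source to describe the converted (BGRA) format so the
// buffer size falls out of the description without a magic "* 2".
var bgraDescription =
  this.sensor.ColorFrameSource.CreateFrameDescription(ColorImageFormat.Bgra);

this.frameArray = new byte[
  bgraDescription.Width *
  bgraDescription.Height *
  bgraDescription.BytesPerPixel];
```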

and I dropped the modified source for the colour frame example here. Once again, this is mostly “just for fun” as there are samples in the SDK that do all this stuff but, in writing something from scratch, I find that I learn more about it.