Kinect for Windows V2 SDK: Hello (BodyIndex, Infra Red) World for the .NET WPF Developer

I’ve been writing these posts about my beginner’s experiments with the Kinect for Windows V2 SDK;

with reference to the official materials up on Channel 9;

Programming-Kinect-for-Windows-v2

I realised that I hadn’t done anything with;

  • The Infra Red data source that comes from the sensor – this one probably speaks for itself and is documented here.
  • The Body Index data source that comes from the sensor – for the (up to) 6 bodies that a sensor is tracking, this source gives you frames that contain a 2D grid where each co-ordinate holds a small integer: a value of 0-5 says which body the sensor associates with that co-ordinate and any other value means that no tracked body is there.
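
As a rough illustration of what body index data looks like once it has been copied into an array, here’s a minimal sketch that counts how many pixels belong to each body. The BodyIndexDemo class, the CountPixelsPerBody method and the tiny made-up frame are illustrative names of my own rather than anything from the SDK, and the sketch assumes that any value outside 0-5 (255 in the sample data) marks a pixel with no tracked body.

```csharp
using System;

static class BodyIndexDemo
{
    // Assumption: values 0..5 identify a tracked body, anything else is "no body".
    const int BodyCount = 6;

    // Counts how many pixels in a body index frame belong to each body.
    public static int[] CountPixelsPerBody(byte[] bodyIndexData)
    {
        var counts = new int[BodyCount];

        foreach (byte index in bodyIndexData)
        {
            if (index < BodyCount)
            {
                counts[index]++;
            }
        }
        return counts;
    }

    static void Main()
    {
        // A made-up 4x2 "frame" where 255 stands in for background pixels.
        byte[] frame = { 255, 0, 0, 255, 255, 3, 255, 255 };

        int[] counts = CountPixelsPerBody(frame);

        Console.WriteLine(string.Join(",", counts)); // prints 2,0,0,1,0,0
    }
}
```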

I also realised that I hadn’t tried to do anything where I link together multiple data sources to produce some kind of combined view.

I figured that what I could do is take the IR data and display it but correlate it with the body index data to remove any values that didn’t relate to a tracked body (this is far from an original idea – there are samples like this everywhere although they sometimes use the colour video frames rather than the infra-red frames).

I stuck with using a bit of the reactive extensions as per my previous post and started off with a little WPF UI to display 2 images;

image

Simple enough stuff – a wallpaper image underneath another image that I have named myImage. From there, I wrote some code to run when the UI has finished loading;

    void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.OpenSensor();
      this.CaptureFrameDimensions();
      this.CreateBitmap();
      this.CreateObservable();

      this.obsFrameData
        .SubscribeOn(TaskPoolScheduler.Default)
        .ObserveOn(SynchronizationContext.Current)
        .Subscribe(
          fd =>
          {
            this.CopyFrameDataToBitmap(fd);
          }
      );
    } 

I’m hoping that this is a fairly “logical” structure. First off, I open up the sensor which I keep around in a member variable;

    void OpenSensor()
    {
      // get the sensor
      this.sensor = KinectSensor.GetDefault();
      sensor.Open();
    }

and then I try to figure out the dimensions of the frames that I’m planning to deal with (the body index frames and the infra-red frames), again keeping a few things around in member variables;

    void CaptureFrameDimensions()
    {
      this.irFrameDesc = this.sensor.InfraredFrameSource.FrameDescription;
      this.biFrameDesc = this.sensor.BodyIndexFrameSource.FrameDescription;

      this.frameRect = new Int32Rect(
        0,
        0,
        this.irFrameDesc.Width,
        this.irFrameDesc.Height);
    }

and then I create a WriteableBitmap which can be the source for the Image named myImage in my UI;

    void CreateBitmap()
    {
      this.bitmap = new WriteableBitmap(
        this.irFrameDesc.Width,
        this.irFrameDesc.Height,
        96,
        96,
        PixelFormats.Bgra32,
        null);

      this.myImage.Source = this.bitmap;
    }

and then finally I attempt to create an observable sequence of frames of data which contain both the infra-red frame and the body index frame. To do that, I made a simple class to hold both arrays of data;

    class FrameData
    {
      public ushort[] IrBits { get; set; }
      public byte[] BiBits { get; set; }

      public bool IsValid
      {
        get
        {
          // valid only when both arrays were captured for this frame
          return (this.IrBits != null && this.BiBits != null);
        }
      }
    }

and then I attempted to make an observable sequence of these;

    void CreateObservable()
    {
      this.indexFrameReader = sensor.OpenMultiSourceFrameReader(
        FrameSourceTypes.Infrared | FrameSourceTypes.BodyIndex);

      var events = Observable.FromEventPattern<MultiSourceFrameArrivedEventArgs>(
        handler => this.indexFrameReader.MultiSourceFrameArrived += handler,
        handler => this.indexFrameReader.MultiSourceFrameArrived -= handler);

      // lots of allocations here, going for a simple approach and hoping that 
      // the GC digs me out of the hole I'm making for myself 🙂
      this.obsFrameData = events
         .Select(
           ev => ev.EventArgs.FrameReference.AcquireFrame())
         .Where(
           frame => frame != null)
         .Select(
           frame =>
           {
             ushort[] irBits = null;

             byte[] biBits = null;
             using (InfraredFrame ir = frame.InfraredFrameReference.AcquireFrame())
             {
               using (BodyIndexFrame bi = frame.BodyIndexFrameReference.AcquireFrame())
               {
                 irBits = ir == null ? null : new ushort[this.irFrameDesc.LengthInPixels];
                 biBits = ((bi == null) || (irBits == null)) ? null : new byte[this.biFrameDesc.LengthInPixels];

                 if ((irBits != null) && (biBits != null))
                 {
                   ir.CopyFrameDataToArray(irBits);
                   bi.CopyFrameDataToArray(biBits);
                 }
               }
             }
             return (
               new FrameData
               {
                 IrBits = irBits,
                 BiBits = biBits
               }
             );
           }
         )
         .Where(
          fd => fd.IsValid);
    }

That’s quite a long bit of code. As in my previous post, I’ve taken the decision to allocate arrays for each frame that I get off the sensor and “live with the consequences” which (so far) has worked out fine running on my meaty i7 laptop with lots of RAM. I’m actually hoping that the GC mostly deals with it for me given how short the lifetime of these arrays is going to be.
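
To put a rough number on those allocations, here’s a back-of-the-envelope sketch. It assumes 512x424 frames for both sources arriving at 30 frames per second (my assumptions rather than values read from the FrameDescription objects in the code above), and the AllocationEstimate class and BytesPerFrame method are hypothetical names for illustration only.

```csharp
using System;

static class AllocationEstimate
{
    // Bytes allocated per captured frame pair: one ushort per infra-red
    // pixel plus one byte per body index pixel.
    public static long BytesPerFrame(int width, int height)
    {
        long pixels = (long)width * height;
        return pixels * sizeof(ushort) + pixels * sizeof(byte);
    }

    static void Main()
    {
        // Assumed values: 512x424 frames at 30 frames per second.
        long perFrame = BytesPerFrame(512, 424);
        long perSecond = perFrame * 30;

        Console.WriteLine(perFrame);  // prints 651264
        Console.WriteLine(perSecond); // prints 19537920
    }
}
```

Under those assumptions that’s a little under 20MB of arrays per second. It’s also worth knowing that arrays of this size are big enough for .NET to place them on the large object heap, which is only collected along with generation 2, so they may not be cleaned up quite as cheaply as typical short-lived objects. If that ever became a problem then reusing buffers across frames would be one option to explore.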

So, effectively, this is just trying to use a MultiSourceFrameReader to bring back frames from both the Infrared and BodyIndex data sources at the same time. Where both of those frames can be acquired, this code produces a new instance of my FrameData class with the members IrBits and BiBits containing copies of the data that was present in those frames.

It looks a bit “wordy” but most of the code is really there just to make sure that both of the frames that I’m asking the sensor for have actually been acquired, so it comes down to a couple of AcquireFrame() calls plus some null reference checks.

Once that observable sequence is set up, it gets stored into a member variable and I consume it from the OnLoaded() method (in the first code sample above), which routes the captured frame data through to a method called CopyFrameDataToBitmap which looks like;

    void CopyFrameDataToBitmap(FrameData frameData)
    {
      this.bitmap.Lock();

      unsafe
      {
        var pBackBuffer = this.bitmap.BackBuffer;

        for (int i = 0; i < frameData.IrBits.Length; i++)
        {
          // pixels not related to a body disappear...
          UInt32 colourValue = 0x00000000;

          int bodyIndex = frameData.BiBits[i];
          
          if (bodyIndex < BODY_COUNT)
          {
            // throwing away the lower 8 bits to give a 0..FF 'magnitude'
            UInt32 irTopByte = (UInt32)(frameData.IrBits[i] >> 8);

            // copy that value into red, green, blue slots with FF alpha
            colourValue = 0xFF000000 + irTopByte;
            colourValue |= irTopByte << 8;
            colourValue |= irTopByte << 16;

            // apply a mask to pickup the colour for the particular body.
            colourValue &= colourMasks[bodyIndex];
          }
          // assumes 4 bytes per pixel with no padding at the end of each row
          UInt32* pBufferValue = (UInt32*)(pBackBuffer + i * 4);
          *pBufferValue = colourValue;
        }
      }
      this.bitmap.AddDirtyRect(this.frameRect);

      this.bitmap.Unlock();
    }

this is making reference to a little static array of colour-based bitmasks;

    static UInt32[] colourMasks = 
    {
      0xFFFF0000, // red
      0xFF00FF00, // green
      0xFF0000FF, // blue
      0xFFFFFF00, // yellow
      0xFF00FFFF, // cyan
      0xFFFF00FF  // purple
    };
    const int BODY_COUNT = 6;

and, essentially, this code is building up the 2D image from the captured frame data on a per-pixel basis by;

  • checking the value from the body index frame to see whether the “pixel” should be included in the image – i.e. does it correspond to a body being tracked by the sensor?
  • taking the high byte of the Infra Red data and using it as a value 0..255 and then turning that value into a shade of a colour such that each body gets a different colour and such that the ‘depth’ of the colour represents the value from the IR sensor.

and that’s pretty much it. What I really like with respect to this code is that the SDK makes it easy to acquire data from multiple sources using the same pattern as acquiring data from one source – i.e. via the MultiSourceFrameReader and I don’t have to manually do the work myself.
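
To make the per-pixel logic described above a bit more concrete, here’s a minimal sketch that pulls the colour calculation out into a standalone function that can be run without a sensor or a bitmap. The PixelDemo class and the ComposePixel method are hypothetical names of mine; the masks and the bit-twiddling follow the code earlier in the post.

```csharp
using System;

static class PixelDemo
{
    const int BodyCount = 6;

    // One mask per body, laid out as 0xAARRGGBB.
    static readonly uint[] ColourMasks =
    {
        0xFFFF0000, // red
        0xFF00FF00, // green
        0xFF0000FF, // blue
        0xFFFFFF00, // yellow
        0xFF00FFFF, // cyan
        0xFFFF00FF  // purple
    };

    // Turns one infra-red value plus one body index value into a pixel value.
    public static uint ComposePixel(ushort irValue, byte bodyIndex)
    {
        // Pixels that don't belong to a body become fully transparent.
        if (bodyIndex >= BodyCount)
        {
            return 0x00000000;
        }

        // Keep the top 8 bits of the infra-red value as a 0..FF magnitude.
        uint irTopByte = (uint)(irValue >> 8);

        // Opaque grey built from that magnitude, then masked to the body's colour.
        uint grey = 0xFF000000 | irTopByte | (irTopByte << 8) | (irTopByte << 16);
        return grey & ColourMasks[bodyIndex];
    }

    static void Main()
    {
        Console.WriteLine(ComposePixel(0x8000, 0).ToString("X8"));   // prints FF800000
        Console.WriteLine(ComposePixel(0xFFFF, 3).ToString("X8"));   // prints FFFFFF00
        Console.WriteLine(ComposePixel(0xFFFF, 255).ToString("X8")); // prints 00000000
    }
}
```

So a mid-strength infra-red reading for body 0 comes out as a half-intensity red, a full-strength reading for body 3 comes out as bright yellow and anything not associated with a body stays transparent so that the wallpaper image shows through.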

The sort of output that I get from this code looks something like;

image

when I’m a little more distant from the sensor and perhaps a bit more like;

image

for a case where I’m nearer to the sensor and it looks like in this particular case the infrared data is being drawn in yellow.

Again, I’m impressed with the SDK in terms of how it makes this kind of capture relatively easy – if you want the code from this post then it’s here for download. For me, I’ve taken brief looks at getting hold of;

  • video
  • depth
  • infra-red
  • body index
  • skeletal

data from the sensor. I also want to dig into some other bits and pieces and my next step would be to look into some of the controls that the Kinect SDK comes with and how they can be used to “Kinectify” a user interface…