Kinect for Windows V2 SDK: Treating Skeletal Data as a Sequence of Sequences

In experimenting with the Kinect for Windows V2 SDK, something that I keep returning to is trying to come up with a decent way of handling the scenarios where bodies enter/leave the sensor’s tracking.

The SDK team has thought about this and provided all the right bits to deal with it, but I’ve found myself more than once trying to figure out a way to handle scenarios like:

  1. Body A enters the sensor’s view
  2. Body B enters the sensor’s view
  3. Body A leaves the sensor’s view
  4. Body A returns

and so on – the general idea is that while the sensor is up and running it’s going to see a lot of bodies entering/leaving its field of vision and it’d be nice to come up with a good way of handling that.

When tracking skeletal data, the SDK delivers frames into a developer’s code at 30 fps, with each frame containing an array of Body instances for 6 bodies. If there are fewer than 6 bodies being watched by the sensor, the remaining array entries will flag themselves as “not being tracked”.

What’s a bit of a struggle there is that at step 1 above the body may show up in any array element – e.g. slot index 2. Then at step 2 that second body could show up in e.g. slot index 4. Then in step 4 the original body could come back in slot index 0. That means that, as a developer, you have to watch all these array entries to see if there’s body data present in them and then act appropriately.

I wondered whether it would be good to think of all the frames that are acquired from when a body arrives in front of the sensor (step 1) through to when it departs (step 3) as a sequence of Body entries. That is – as an IObservable<Body>.

That seems like a small leap and I made that leap before in an earlier post.

What I hadn’t done in that post though was to take that one level “up” and to think of the sensor as providing a sequence of sequences of body entries. That is – the sensor can be viewed as producing an IObservable<IObservable<Body>>.
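To make that shape concrete, here is a minimal sketch that has nothing to do with the Kinect itself. It assumes the System.Reactive package is referenced, uses plain strings as stand-ins for Body frames, and drives the outer sequence by hand to mimic the arrive/leave scenario above. The class and method names are made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Subjects;

static class SequenceOfSequencesSketch
{
  // Returns a log of what the subscribers observed, in order.
  public static List<string> Run()
  {
    var log = new List<string>();

    // Outer sequence: produces one inner sequence per "body".
    // A string stands in for a Body frame here.
    var outer = new Subject<IObservable<string>>();

    outer.Subscribe(inner =>
    {
      log.Add("gained");
      inner.Subscribe(
        frame => log.Add("frame " + frame),
        () => log.Add("lost"));
    });

    var bodyA = new Subject<string>();
    outer.OnNext(bodyA);     // body A enters
    bodyA.OnNext("A1");

    var bodyB = new Subject<string>();
    outer.OnNext(bodyB);     // body B enters
    bodyB.OnNext("B1");

    bodyA.OnNext("A2");
    bodyA.OnCompleted();     // body A leaves, its sequence completes
    bodyB.OnNext("B2");      // body B carries on independently

    return log;
  }
}
```

Calling SequenceOfSequencesSketch.Run() should give back gained, frame A1, gained, frame B1, frame A2, lost, frame B2. Each body gets its own beginning and end regardless of what the other bodies are doing.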

I played with this a little with the class below which is essentially managing a Dictionary<ulong, Subject<Body>> where the ulong is the tracking ID given to a body by the SDK.

The idea of this is that when the SDK reports a new uniquely tracked body, this code creates a new Subject<Body> for it, slots it into the dictionary and then “publishes” all subsequent frames for that tracking ID via that Subject<Body>.

Should the tracking ID in question disappear from a BodyFrame then the code will mark that particular sequence as being completed and remove it from the dictionary.
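Stripped of the Kinect and Rx types, the bookkeeping described above comes down to comparing two sets of tracking IDs on every frame: the ones already known and the ones present in the current frame. The following is only a sketch of that comparison, with a hypothetical helper name and plain collections of ulong in place of the dictionary and the body array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TrackingDiffSketch
{
  // Given the IDs known so far and the IDs seen in the current frame,
  // work out which bodies have just arrived and which have just departed.
  public static void Diff(
    ICollection<ulong> knownIds,
    ICollection<ulong> frameIds,
    out List<ulong> arrived,
    out List<ulong> departed)
  {
    // In the frame but not yet known: a new sequence needs creating.
    arrived = frameIds.Except(knownIds).ToList();

    // Known but missing from the frame: that sequence should complete.
    departed = knownIds.Except(frameIds).ToList();
  }
}
```

For example, with known IDs { 1, 2 } and a frame containing { 2, 3 }, arrived comes back as { 3 } and departed as { 1 }. ID 2 is in both sets, so it just gets its next frame published.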

namespace RecognitionK
{
  using Microsoft.Kinect;
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Reactive.Subjects;
  using System.Threading;

  public class ObservableBodySource : IDisposable
  {
    public ObservableBodySource(bool trackEntireBodies = true)
    {
      this.trackEntireBodies = trackEntireBodies;
      this.lockObject = new object();
    }
    public IObservable<IObservable<Body>> Open()
    {
      if (this.sensor != null)
      {
        throw new InvalidOperationException("DataSource is already open");
      }
      this.subBodies = new Subject<IObservable<Body>>();
      this.observables = new Dictionary<ulong, Subject<Body>>();

      this.sensor = KinectSensor.GetDefault();
      this.sensor.Open();

      this.bodies = new Body[this.sensor.BodyFrameSource.BodyCount];
      this.bodyReader = this.sensor.BodyFrameSource.OpenReader();
      this.bodyReader.FrameArrived += this.OnFrameArrived;

      return (this.subBodies);
    }
    void OnFrameArrived(object sender, BodyFrameArrivedEventArgs e)
    {
      // If a previous frame is still being processed, skip this one rather
      // than wait behind it.
      if ((e.FrameReference != null) &&
        Monitor.TryEnter(this.lockObject))
      {
        try
        {
          using (var frame = e.FrameReference.AcquireFrame())
          {
            // AcquireFrame() returns null if the frame is no longer available.
            if (frame == null)
            {
              return;
            }
            frame.GetAndRefreshBodyData(this.bodies);

            // we need bodies that are tracked and, depending on the flag, we
            // might want every single joint to report itself as tracked.
            // ToList() so that the filter is evaluated once per frame.
            var trackedBodies = this.bodies.Where(
              b =>
                (b.IsTracked) &&
                ((!this.trackEntireBodies) ||
                (b.Joints.All(j => j.Value.TrackingState == TrackingState.Tracked))))
              .ToList();

            this.ProcessNewlyArrivedBodies(trackedBodies);

            this.ProcessNewlyDepartedBodies(trackedBodies);

            this.PublishFramesForTrackedBodies(trackedBodies);
          }
        }
        finally
        {
          // Release the lock even if a subscriber throws.
          Monitor.Exit(this.lockObject);
        }
      }
    }
    void PublishFramesForTrackedBodies(IEnumerable<Body> trackedBodies)
    {
      foreach (var body in trackedBodies)
      {
        this.observables[body.TrackingId].OnNext(body);
      }
    }
    void ProcessNewlyArrivedBodies(IEnumerable<Body> trackedBodies)
    {
      var newBodies = trackedBodies.Where(
        b => !this.observables.ContainsKey(b.TrackingId));

      foreach (var newBody in newBodies)
      {
        var newSubject = new Subject<Body>();
        this.observables[newBody.TrackingId] = newSubject;
        this.subBodies.OnNext(newSubject);
      }
    }
    void ProcessNewlyDepartedBodies(IEnumerable<Body> trackedBodies)
    {
      var oldBodies = this.observables.Keys
        .Where(
          trackingId => !trackedBodies.Any(b => b.TrackingId == trackingId))
        .ToList();

      foreach (var oldBody in oldBodies)
      {
        this.observables[oldBody].OnCompleted();
        this.observables.Remove(oldBody);
      }
    }
    public void Dispose()
    {
      this.Dispose(true);
      GC.SuppressFinalize(this);
    }
    ~ObservableBodySource()
    {
      this.Dispose(false);
    }
    void Dispose(bool disposing)
    {
      if (disposing)
      {
        lock (this.lockObject)
        {
          // NB: not attempting to dispose the sequence that was returned
          // from Open() nor any sequence that has been returned as part
          // of it - ownership of that sequence has to live with calling
          // code.
          if (this.bodyReader != null)
          {
            this.bodyReader.FrameArrived -= this.OnFrameArrived;
            this.bodyReader.Dispose();
            this.bodyReader = null;
          }
          if (this.sensor != null)
          {
            this.sensor.Close();
            this.sensor = null;
          }
        }
      }
    }
    Subject<IObservable<Body>> subBodies;
    Dictionary<ulong, Subject<Body>> observables;
    Body[] bodies;
    KinectSensor sensor;
    BodyFrameReader bodyReader;
    bool trackEntireBodies;
    object lockObject;
  }
}

I’m not sure how generally useful that could be or whether it just happens to suit the particular purpose that I’m currently trying to put it to but I thought I’d share it either way.

I can then consume this code from something like a console app by doing:

namespace TestApp
{
  using RecognitionK;
  using System;
  using System.Collections.Generic;

  class Program
  {
    static void Main(string[] args)
    {
      ObservableBodySource source = new ObservableBodySource(true);

      var bodiesSequence = source.Open();
      var bodySubscriptions = new List<IDisposable>();

      var bodiesSubscription = bodiesSequence.Subscribe(
        bodySequence =>
        {
          Console.WriteLine("Gained body");
          IDisposable innerSub = null;

          innerSub = bodySequence.Subscribe(
            body =>
            {
              Console.WriteLine("Frame from body {0}", body.TrackingId);
            },
            () =>
            {
              bodySubscriptions.Remove(innerSub);
              innerSub.Dispose();
              Console.WriteLine("Lost body");
            });

          bodySubscriptions.Add(innerSub);
        });

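      Console.ReadLine();

      // Stop the source first so that the frame handler is no longer adding
      // to or removing from bodySubscriptions while it is enumerated here.
      source.Dispose();
      bodiesSubscription.Dispose();

      // ToArray() takes a snapshot of whatever subscriptions remain.
      foreach (var sub in bodySubscriptions.ToArray())
      {
        sub.Dispose();
      }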
    }
  }
}

and that seems to work nicely enough although, naturally, there might be a few gremlins lurking in there somewhere.
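Where the per-body bookkeeping isn’t needed and all that’s wanted is every frame from every tracked body, Rx’s Merge operator can flatten a sequence of sequences into a single sequence, which removes the need to manage the inner subscriptions by hand. The sketch below assumes the System.Reactive package and, so that it runs without a sensor, again uses strings in place of Body frames with hand-driven subjects. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

static class MergeSketch
{
  // Returns every frame seen across all inner sequences, in arrival order.
  public static List<string> Run()
  {
    var frames = new List<string>();
    var outer = new Subject<IObservable<string>>();

    // Merge() subscribes to each inner sequence as it arrives and forwards
    // every element into one flat sequence.
    using (outer.Merge().Subscribe(frame => frames.Add(frame)))
    {
      var bodyA = new Subject<string>();
      var bodyB = new Subject<string>();

      outer.OnNext(bodyA);
      outer.OnNext(bodyB);

      bodyA.OnNext("A1");
      bodyB.OnNext("B1");
      bodyA.OnCompleted();   // body A leaves, the merged sequence carries on
      bodyB.OnNext("B2");
    }

    return frames;
  }
}
```

Against the class above, the equivalent would be something along the lines of source.Open().Merge().Subscribe(...), at the cost of no longer seeing the “gained” and “lost” moments for each body.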