Mike Taulty's Blog
Bits and Bytes from Microsoft UK
Kinect for Windows SDK V2: Mixing with some Reactive Extensions


I’ve been writing these posts about my beginner’s experiments with the Kinect for Windows V2 SDK;

with reference to the official materials up on Channel 9;

Programming-Kinect-for-Windows-v2

and it prompted Grahame to give me a shout on Twitter to ask whether there was support in the Kinect SDK for the Reactive Extensions, which made me think “what a great idea”.

The rest of this post is just a bit of fun/sketches/observations from playing with the Kinect SDK’s skeletal tracking data and a bit of Rx. I don’t think there is a set of Observables built into the Kinect SDK for the developer to consume, but Rx is very flexible about consuming events from many different origins, so I figured it’d be interesting (to me) to experiment with it a little in the context of observing Kinect events.

That said, I doubt that Grahame or I are the first to think of this and, indeed, I noticed around about the same time that my colleague Pete, who’s more of a Kinect guy than me, has been writing some bits over on his blog where he’s looking at face tracking;

Face Tracking – Kinect 4 Windows v2

and where he uses Rx to get that going so that’s worth a read.

I played with a few examples just to see if I could get things going and it certainly prompted me to think about how to generate observable sequences from the Kinect SDK. You might have to forgive some of my use of Rx here; I’m a bit rusty.

Example 1 – Anybody There?

Let’s say that I want to build a simple UI that displays different text depending on whether the sensor is on its own or whether the sensor has friends.

I made a blank WPF application with a single line of text to display, referenced the Kinect SDK and the Reactive Extensions, and then started to try and write some code after the UI had loaded.

The first thing to do seemed to be to get hold of the sensor and open it up;

      // get the sensor
      KinectSensor sensor = KinectSensor.GetDefault();
      sensor.Open();

and then to try and get hold of a reader for the frames of body data that come from the sensor;

      // get a reader
      BodyFrameReader reader = sensor.BodyFrameSource.OpenReader();

and then to see if I could turn the events being delivered into an observable sequence via Rx’s methods for doing just that, which seemed ok;

      // make the event into an observable
      var obsFrameEvents = Observable.FromEventPattern<BodyFrameArrivedEventArgs>(
        handler => reader.FrameArrived += handler,
        handler => reader.FrameArrived -= handler);

and then I started to get a bit “itchy”.

The “problem” is that this delivers a set of BodyFrameArrivedEventArgs but, in order to make use of them, I need to get hold of the BodyFrame itself by calling AcquireFrame() on the FrameReference, and that sometimes returns NULL.

How to deal with that from the point of view of a sequence? It didn’t seem too bad;

      // acquire the frames that we can
      var obsRealFrames =
        obsFrameEvents
          .Select(frame => frame.EventArgs.FrameReference.AcquireFrame())
          .Where(frame => frame != null);

but then I need to do two more things here;

  1. Get a populated array of Body[] data from the BodyFrame.
  2. Dispose() of the BodyFrame.

and I got into a bit of a “debate” with myself about when that Dispose call needed to happen, and also about whether to use one shared Body[] array for all frames (with the complexity that might introduce around concurrent access) or to allocate a new Body[] array with the arrival of each frame (with the problems that might introduce around memory pressure).

I decided to try and keep things as simple as I could and so I went with this;

      var obsBodies =
        obsRealFrames
          .SelectMany(frame =>
            {
              Body[] bodies = new Body[frame.BodyCount];
              frame.GetAndRefreshBodyData(bodies);
              frame.Dispose();
              return (bodies);
            });

and so this returns to me a “flattened” sequence of Body instances. If I then want to use this to answer the original question of whether there are 0 or more bodies being tracked by the sensor, I need to group these back into sets of 6 (the sensor always reports 6 body slots, tracked or not) and count how many of the Body instances report IsTracked;

      var obsBufferedBodies =
        obsBodies
          .Buffer(6)
          .Select(bodies => bodies.Count(b => b.IsTracked));

and I can then track changing values in that sequence and use it to set something in my UI to note people being present/absent in front of the sensor;

      obsBufferedBodies
        .DistinctUntilChanged()
        .Subscribe(
          c =>
          {
            this.txtStatus.Text =
              c == 0 ? "only the lonely" : "everybody's talkin' at me";
          }
      );

In doing that, the app didn’t seem to gobble up too much in the way of CPU time or memory, so I decided it would be unwise to try and “improve” things by being clever with shared buffers and so on.
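As an aside on that Dispose “debate”: since BodyFrame is IDisposable, a using block would make the cleanup robust even if GetAndRefreshBodyData were to throw. A sketch of how the SelectMany might look with that change, otherwise identical to the code above:

```csharp
var obsBodies =
  obsRealFrames
    .SelectMany(frame =>
      {
        // 'using' guarantees frame.Dispose() runs even if the
        // refresh call throws, rather than relying on reaching
        // an explicit Dispose() line
        using (frame)
        {
          Body[] bodies = new Body[frame.BodyCount];
          frame.GetAndRefreshBodyData(bodies);
          return (bodies);
        }
      });
```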

Example 2 – Left/Right Hand Positions

I thought I’d try and observe the positions of left and right hands from the sensor. I took the same obsBodies observable that I built up above as the starting point and played with it a little;

      var obsHands = obsBodies
        .Where(
          b => b.IsTracked &&
          b.Joints[JointType.HandLeft].TrackingState != TrackingState.NotTracked &&
          b.Joints[JointType.HandRight].TrackingState != TrackingState.NotTracked)
        .Select(
          b => new
          {
            Id = b.TrackingId,
            Left = b.Joints[JointType.HandLeft].Position,
            Right = b.Joints[JointType.HandRight].Position
          }
        );

Now, to be clear, this is returning data for every left/right hand pair being tracked by the sensor, so my display of it below with this subscription;

      obsHands.Subscribe(
        t =>
        {
          this.txtStatus.Text = string.Format("Left {0}, Right {1}",
            CameraSpacePointToString(t.Left), CameraSpacePointToString(t.Right));
        }
      );

would be pretty confusing if I had more than one person tracked by the sensor but it works ok if it’s just me testing it out.

CameraSpacePointToString is just a little method;

    static string CameraSpacePointToString(CameraSpacePoint point)
    {
      return (string.Format("[{0:G3},{1:G3},{2:G3}]", point.X, point.Y, point.Z));
    }

Example 3 – Distance Between Hands

I wondered what it’d be like to track the “distance” between left and right hands so I changed the previous example to the code below. Again, this “kind of” works with a single body being tracked. With more, the output would be ‘confusing’ to say the least;

      var obsHands = obsBodies
        .Where(
          b => b.IsTracked &&
          b.Joints[JointType.HandLeft].TrackingState != TrackingState.NotTracked &&
          b.Joints[JointType.HandRight].TrackingState != TrackingState.NotTracked)
        .Select(
          b => new
          {
            Id = b.TrackingId,
            Distance = CameraSpaceDistance(
              b.Joints[JointType.HandLeft].Position,
              b.Joints[JointType.HandRight].Position)
          }
        );

      obsHands.Subscribe(
        hands =>
        {
          this.txtStatus.Text = string.Format("apart [{0:G3}]", hands.Distance);
        }
      );

and another little supporting method;

    static double CameraSpaceDistance(CameraSpacePoint p1, CameraSpacePoint p2)
    {
      return (
        Math.Sqrt(
          Math.Pow(p1.X - p2.X, 2) + 
          Math.Pow(p1.Y - p2.Y, 2) +
          Math.Pow(p1.Z - p2.Z, 2)));
    }

Example 4 – A Clap?

Trying to do something a bit more involved, I wondered if I could try and pick up the basics of a gesture such as a clap of the hands. I thought it might be “useful” to compare the current positions of the hands with previous positions continuing on from that obsHands sequence that I produced above;

      var obsHandsPrevious = obsHands
        .Skip(TimeSpan.FromSeconds(2));

      var obsHandsZipped = obsHands
        .Zip(
          obsHandsPrevious,
          (then, now) => new { Then = then, Now = now }
      );

I think this will give me a sequence of { Then, Now } pairs with the distance between the hands around 2 seconds ago and the distance now. Again, this will break completely with more than one person in front of the sensor because of the way that I’ve flattened all 6 Body entries into one sequence.
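To convince myself of how that Skip/Zip pairing lines up, here’s the same shape over a plain list with LINQ, using a count-based skip in place of the time-based one and raw distance values in place of the { Id, Distance } items, purely for illustration:

```csharp
var distances = new[] { 0.9, 0.8, 0.5, 0.2, 0.04 };

// skipping 2 items stands in for the 2 second Skip above
var pairs = distances.Zip(
  distances.Skip(2),
  (then, now) => new { Then = then, Now = now });

// yields { 0.9, 0.5 }, { 0.8, 0.2 }, { 0.5, 0.04 } - 'Then' always
// leads 'Now' by two positions, just as the time-shifted observable
// sequence leads by roughly 2 seconds
```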

But, in the case of a single person, I could then try and see whether the distance has gone from some notion of “wide” to “narrow” over that two second period;

      var obsHandsWideToNarrow = obsHandsZipped
        .Where(
          p => (p.Then.Distance > 0.75) && (p.Now.Distance < 0.05));

      obsHandsWideToNarrow.Subscribe(
        _ =>
        {
          this.txtStatus.Text = string.Format(
            "clap at {0:hh:mm:ss}", DateTime.Now);
        }
      ); 

and that does seem to work reasonably well (although I can get one or two false positives if I ‘clap’ too quickly). Note that the 0.75 and 0.05 are just arbitrary values I got by playing around with the sensor in the configuration that I’m using it in.

Example 5 – A Swipe?

I wondered whether I could take a similar approach to a single hand moving left to right across the body. Keeping my obsBodies sequence from above, I got rid of my obsHands sequence and replaced it with one relating to right hands;

      var obsRightHand = obsBodies
        .Where(
          b => b.IsTracked &&
          b.Joints[JointType.HandRight].TrackingState != TrackingState.NotTracked)
        .Select(b => b.Joints[JointType.HandRight].Position.X);

which is then a sequence of right hand X positions which I can zip with a delayed version of itself to get { then, now } pairings of right hand positions;

      var obsRightHandPrevious = obsRightHand
        .Skip(TimeSpan.FromSeconds(2));

      var obsRightHandZipped = Observable.Zip(
        obsRightHand,
        obsRightHandPrevious,
        (then, now) => new { Then = then, Now = now }
      );

and then applying some arbitrary values I get just by playing with the sensor in my setup to define “reasonably on the left/right side” I can see if the right hand has moved left->right in a 2 second period;

      var obsHandCrossBody = obsRightHandZipped
        .Where(
          p => (p.Then < -0.3) && (p.Now > 0.3));

      obsHandCrossBody.Subscribe(
        _ =>
        {
          this.txtStatus.Text = string.Format(
            "swipe at {0:hh:mm:ss}", DateTime.Now);
        }
      );  

and that seems to work reasonably well although, again, I can get some false positives after a swipe has been detected so it could do with improving and, again, it would still suffer from the multiple-body problem. Also, it takes no account of variations on the Y and Z axes, so some slightly odd movements would still qualify as a “swipe” here.
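One thing I might try in order to damp those false positives is Rx’s Throttle operator, which only lets a value through once the sequence has been quiet for a given window, so a burst of detections would collapse to one. A sketch, with an arbitrary one-second window (and ObserveOnDispatcher because Throttle delivers on a timer thread rather than the UI thread):

```csharp
obsHandCrossBody
  .Throttle(TimeSpan.FromSeconds(1))  // collapse bursts of detections
  .ObserveOnDispatcher()              // marshal back to the UI thread
  .Subscribe(
    _ =>
    {
      this.txtStatus.Text = string.Format(
        "swipe at {0:hh:mm:ss}", DateTime.Now);
    }
  );
```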

What About Handling Multiple Bodies?

I’ve struggled a little with this one. Most of the code I’ve played with above would blow up when multiple bodies were being tracked by the sensor because I’ve been producing one, flattened list of all tracked body data coming from the sensor and so I’d be mixing up data from one body with data from another which, clearly, won’t work.

I was doing this because I hadn’t managed to get my head around a good way of thinking about the data. One of the things that I kept debating was scenarios where I might want to know;

  • When something interesting happens to data coming from a specific body (e.g. user1 moves left)
  • When something interesting happens to data coming from any body (e.g. any user moves left or all users move left).

and also the problem of scenarios where I might want to track the first 1 or 2 bodies in front of the sensor as their positions change – e.g.

  • Jim is in front of the sensor and is tracked as body 0.
  • Jane comes along and is tracked as body 1.
  • Jim leaves the frame.
  • Is Jane now body 0 or body 1?
  • And what happens when Jim comes back?

and so on. So far, what I ended up with was using similar techniques as above to get a sequence of Body[] arrays from the sensor (with lots of allocations on my part);

      // get the sensor
      KinectSensor sensor = KinectSensor.GetDefault();
      sensor.Open();
      
      // get a reader
      BodyFrameReader reader = sensor.BodyFrameSource.OpenReader();

      // make the event into an observable
      var obsFrameEvents = Observable.FromEventPattern<BodyFrameArrivedEventArgs>(
        handler => reader.FrameArrived += handler,
        handler => reader.FrameArrived -= handler);

      // acquire the frames that we can
      var obsBodies = obsFrameEvents
        .Select(frame => frame.EventArgs.FrameReference.AcquireFrame())
        .Where(frame => frame != null)
        .Select(frame =>
          {
            Body[] bodies = new Body[frame.BodyCount];
            frame.GetAndRefreshBodyData(bodies);
            frame.Dispose();
            return (bodies);
          }
        );

and then trying to figure out which of the bodies in that array are actually being tracked by the sensor. In building that picture, I make an assumption that a body first tracked at some point T in time will be given a larger TrackingId than one first tracked at some point earlier than T. That’s just my assumption and it could be wrong. That gives me;

      var obsTrackedBodies = obsBodies
          .Select(
            bodies =>
            {
              return (bodies
                .Where(b => b.IsTracked)
                .OrderBy(b => b.TrackingId) // assumption that this increases as bodies arrive (TBD!)
                .Select((b, i) => new { BodyNo = i, Body = b }));
            }
        );

I kept each frame’s bodies together as one collection to make it easy to (e.g.) figure out how many bodies are currently being tracked. I added a text block to my UI to display this;

      var obsBodyCount = obsTrackedBodies
        .Select(bodies => bodies.Count());

      obsBodyCount
        .Subscribe(
          c =>
          {
            this.txtCount.Text = c.ToString();
          }
        );

I could equally have stamped some “frame number” style ID onto each element to associate them with each other, and there are probably countless other (and no doubt better) ways of doing it.
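That frame-number idea could be as simple as the indexed overload of Select, which hands each element its zero-based position in the sequence as it arrives – a sketch:

```csharp
var obsStampedBodies = obsTrackedBodies
  .Select((bodies, frameNo) => new
  {
    FrameNo = frameNo,  // increments with each frame that comes through
    Bodies = bodies
  });
```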

If I want to pick up the data for a particular body then one way might be to flatten the sequence of arrays into a sequence of Body instances as I did previously but then pick out those that identify themselves as being for a particular body – for example;

      var obsFirstBody = obsTrackedBodies
        .SelectMany(b => b)
        .Where(b => b.BodyNo == 0);

      // e.g. now extract the left hand position
      var obsLeftHandXPosition = obsFirstBody
        .Where(b => b.Body.Joints[JointType.HandLeft].TrackingState == TrackingState.Tracked)
        .Select(b => b.Body.Joints[JointType.HandLeft].Position.X);

      obsLeftHandXPosition
        .Subscribe(
          x =>
          {
            this.txtXPosition.Text = x.ToString();
          }
        );

(I added another text block to display the X position on screen).

I find this sequence to be an “interesting” one in that it’s trying to pick up the X co-ordinate of the left hand position of the “first” body that the sensor knows about, but which body is “first” can change over time through something as simple as a person wandering into and then out of the range of the sensor, as per my Jim/Jane scenario above.
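One avenue I haven’t properly explored yet, which might sidestep that “slot” instability altogether, is Rx’s GroupBy operator keyed on TrackingId rather than on position in the Body[] array. A sketch of the idea (untested against the sensor, and it doesn’t yet deal with a body’s group ending when that body leaves the frame):

```csharp
// flatten each frame to its tracked Body instances, then partition
// the resulting sequence into one sub-sequence per TrackingId
var obsBodiesById = obsBodies
  .SelectMany(bodies => bodies.Where(b => b.IsTracked))
  .GroupBy(b => b.TrackingId);

obsBodiesById.Subscribe(
  bodyGroup =>
  {
    // each group follows one physical body via its TrackingId,
    // regardless of which slot it occupies in the Body[] array
    bodyGroup
      .Select(b => b.Joints[JointType.HandLeft].Position.X)
      .Subscribe(x => Debug.WriteLine("{0}: {1:G3}", bodyGroup.Key, x));
  });
```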

At the time of writing, I’m not sure whether it’s a good/bad way to think about the data. I’m going to ponder it more over time and see whether it proves to be useful or whether it needs revisiting…definitely fun to play with though.


Posted Fri, Aug 22 2014 12:09 PM by mtaulty
