Kinect for Windows V2 SDK: Playing with Faces

In looking across the previous posts that I’d made on Kinect for Windows V2, one of the areas that I hadn’t touched upon was how the Kinect surfaces data about the user’s face and so I thought I’d experiment with that a little here.

There are two functional areas when it comes to working with faces – there’s the regular facial data as represented by the FaceFrameResult class and the various properties hanging off it which can tell you about things like;

  • whether the user is happy (this seems pretty deep to me :-))
  • whether the eyes are open/closed
  • whether the mouth is open/closed
  • etc.

and that’s introduced on this page in the MSDN documentation.

Then there’s the “high definition” face tracking functionality (introduced here on MSDN) which I’d categorise as more advanced and relates to determining the shape of a user’s face and the way in which the face is moving.

For this post, I’m going to duck the “high definition” functionality and work with the simpler face tracking.

Getting Started – Showing Video

In order to get started, I thought I’d make a Windows Store application and have it display video (or “color”) frames from the sensor and, as I’ve done in some previous posts, I thought I’d use Win2D in order to do my drawing.

Step one then is to create a blank Windows Store app;

image

and then to add in a reference to the Win2D library;

image

and then I can set up a Win2D CanvasControl to draw for me;

image

and handle the initial request from the CanvasControl to draw as below. Subsequent requests will be issued when the control needs to redraw itself or if I manually tell it to invalidate itself;

namespace App13
{
  using Microsoft.Graphics.Canvas;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
    }
    void OnDraw(CanvasControl sender, CanvasDrawEventArgs args)
    {

    }
  }
}

Win2D is an immediate-mode drawing API and so it perhaps makes sense to attempt to pull data from the Kinect sensor rather than wait for it to arrive in an event-based manner. I’ll go through the usual steps of;

  • Getting the default sensor
  • Getting a data-source from the sensor
  • Getting a data-reader from the data-source
  • Getting a set of frames of data from the data-reader

and I sometimes shorten this down to “Sensor->Source->Reader->Frame” as shorthand for the standard way in which this pattern applies to getting data out of the SDK bits.
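
As a minimal sketch of that pattern applied to the color source (illustrative only – error handling and disposal are omitted and the real code follows further down);

// Sensor -> Source -> Reader -> Frame, in "pull" form.
KinectSensor sensor = KinectSensor.GetDefault();
sensor.Open();

// Source -> Reader.
ColorFrameReader reader = sensor.ColorFrameSource.OpenReader();

// Reader -> Frame: pull the most recent frame rather than waiting on the
// FrameArrived event.
using (ColorFrame frame = reader.AcquireLatestFrame())
{
  if (frame != null)
  {
    // ...copy/convert the frame's pixel data here...
  }
}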

Aside: if you happened to be reading this post and hadn’t been through any of the previous ones and wanted a quick boost on programming with the Kinect for Windows V2 then I recently spoke at NDC London on the topic and you can find that video below;

A Lap Around the Kinect for Windows V2 SDK (Pete Daukintis, Mike Taulty)

The next step is to add a reference to the Kinect for Windows V2 SDK bits;

image

and then set about writing some code which does something in the CanvasControl::OnDraw so as to paint some data from the Kinect sensor.

To provide a little bit of structure, I came up with the interface below to represent an object which goes through an Initialise() phase followed by repeated Update/Render phases to draw via Win2D;

namespace App13
{
  using Microsoft.Graphics.Canvas;
  using WindowsPreview.Kinect;

  interface IRenderKinectFrames
  {
    void Initialise(KinectSensor sensor);
    void Update(ICanvasResourceCreator resourceCreator);
    void Render(CanvasDrawingSession session);
  }
}

and then I implemented this interface for the ColorFrameSource of the KinectSensor so as to attempt to draw video frames as images to the CanvasControl. This will be familiar code if you’ve looked through any of the previous blog posts I’ve written covering the ColorFrameSource, the one difference here being that I’m taking a “pull” approach to getting data from the Kinect sensor rather than an event-driven approach;

namespace App13
{
  using Microsoft.Graphics.Canvas;
  using Microsoft.Graphics.Canvas.DirectX;
  using System;
  using WindowsPreview.Kinect;

  class ColorFrameSourceRenderer : IRenderKinectFrames
  {
    public void Initialise(KinectSensor sensor)
    {
      this.sensor = sensor;

      this.colorReader = this.sensor.ColorFrameSource.OpenReader();

      this.colorFrameDescription = this.sensor.ColorFrameSource.CreateFrameDescription(
        ColorImageFormat.Bgra);

      this.colorBytes = new byte[
        this.colorFrameDescription.BytesPerPixel *
        this.colorFrameDescription.LengthInPixels];

      this.previousColorFrameTimeSpan = TimeSpan.FromSeconds(0);
    }
    public void Update(ICanvasResourceCreator resourceCreator)
    {
      using (ColorFrame colorFrame = this.colorReader.AcquireLatestFrame())
      {
        if ((colorFrame != null) &&
          (colorFrame.RelativeTime != this.previousColorFrameTimeSpan))
        {
          colorFrame.CopyConvertedFrameDataToArray(this.colorBytes, ColorImageFormat.Bgra);

          if (this.canvasBitmap != null)
          {
            this.canvasBitmap.Dispose();
          }
          this.canvasBitmap = CanvasBitmap.CreateFromBytes(
            resourceCreator,
            this.colorBytes,
            this.colorFrameDescription.Width,
            this.colorFrameDescription.Height,
            DirectXPixelFormat.B8G8R8A8UIntNormalized,
            CanvasAlphaBehavior.Premultiplied);

          this.previousColorFrameTimeSpan = colorFrame.RelativeTime;
        }
      }
    }
    public void Render(CanvasDrawingSession session)
    {
      if (this.canvasBitmap != null)
      {
        session.DrawImage(this.canvasBitmap);
      }
    }
    KinectSensor sensor;
    FrameDescription colorFrameDescription;
    CanvasBitmap canvasBitmap;
    TimeSpan previousColorFrameTimeSpan;
    byte[] colorBytes;
    ColorFrameReader colorReader;
  }
}

With that written, I made a basic “framework” in my code behind that wires any number of these IRenderKinectFrames implementations into the Draw event of the CanvasControl as below.

namespace App13
{
  using Microsoft.Graphics.Canvas;
  using System.Collections.Generic;
  using Windows.UI.Xaml.Controls;
  using WindowsPreview.Kinect;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();

      this.renderList = new List<IRenderKinectFrames>()
      {
        new ColorFrameSourceRenderer()
      };
    }
    void Initialise()
    {
      if (!this.initialised)
      {
        this.sensor = KinectSensor.GetDefault();
        this.sensor.Open();

        foreach (var renderer in this.renderList)
        {
          renderer.Initialise(this.sensor);
        }

        this.initialised = true;
      }
    }
    void OnDraw(CanvasControl sender, CanvasDrawEventArgs args)
    {
      this.Initialise();

      using (CanvasDrawingSession drawSession = args.DrawingSession)
      {
        foreach (var renderer in this.renderList)
        {
          renderer.Update(sender.Device);
          renderer.Render(drawSession);
        }
      }
      sender.Invalidate();
    }
    List<IRenderKinectFrames> renderList;
    KinectSensor sensor;
    bool initialised;
  }
}

and that seems relatively clean, simple and something that I can hopefully build on to add in facial data in some manner.

A quick note on that code – the idea is that the Win2D drawing control will constantly invalidate itself and in its Draw event handler, it has a list of renderers (only one at present) that it calls to Update their data from the KinectSensor and then to Render themselves to the CanvasControl.

The ColorFrameSourceRenderer here attempts to draw whichever “most recent” color frame it has managed to retrieve from the Kinect sensor, so there’s a possibility that it will “freeze” on a particular frame if (e.g.) the sensor were to stop producing frames for some reason (perhaps because it got disconnected).
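
If that mattered in a real application then one way to cope – and this is a sketch of my own here rather than anything the SDK provides – would be to note the time of the last successful update and have Render treat anything older than some threshold as stale;

    // Sketch: extra fields on ColorFrameSourceRenderer. The one second threshold
    // is an arbitrary choice.
    DateTimeOffset lastFrameTime = DateTimeOffset.MinValue;
    static readonly TimeSpan staleThreshold = TimeSpan.FromSeconds(1);

    // Update() would set this.lastFrameTime = DateTimeOffset.Now after a
    // successful CopyConvertedFrameDataToArray call and Render() then becomes;
    public void Render(CanvasDrawingSession session)
    {
      if ((this.canvasBitmap != null) &&
        ((DateTimeOffset.Now - this.lastFrameTime) < staleThreshold))
      {
        session.DrawImage(this.canvasBitmap);
      }
    }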

With that in place, I can now start to add something specifically around facial data.

Adding Faces

The KinectSensor class does not have a property containing a data source for facial data, unlike the properties that it has for color, infra-red, depth, skeletal data and so on.

For me, that makes the discoverability of the FaceFrameSource class a little bit more tricky in that it’s off in a separate component that you need to reference;

image

The FaceFrameSource then works in a similar way to the VisualGestureBuilderFrameSource (which I talked about in this previous post).

That is, in order to use the FaceFrameSource you first need code that handles skeletal tracking so as to obtain skeletal tracking IDs; you can then use those tracking IDs with the FaceFrameSource in order to get frames of facial data that you can code against.

If you’re tracking 6 bodies then you’d need 6 instances of FaceFrameSource to keep track of the facial data for those 6 bodies and, naturally, you need to try and cope with the idea that bodies/faces can dynamically enter/leave the sensor’s field of vision as your app is running.
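
Before the full renderer below, the core of that idea in isolation looks something like the sketch here – it takes just the first tracked body and ignores the lifetime management that the real code has to deal with (it assumes a sensor and a bodies array set up as in the class that follows);

      // Sketch: body tracking feeds the FaceFrameSource.
      bodyFrame.GetAndRefreshBodyData(this.bodies);

      Body body = this.bodies.FirstOrDefault(b => b.IsTracked);

      if (body != null)
      {
        // One FaceFrameSource/FaceFrameReader per tracked body.
        var faceSource = new FaceFrameSource(this.sensor, body.TrackingId,
          FaceFrameFeatures.BoundingBoxInColorSpace);

        var faceReader = faceSource.OpenReader();

        // ...and then on each update cycle...
        using (FaceFrame faceFrame = faceReader.AcquireLatestFrame())
        {
          if ((faceFrame != null) && (faceFrame.FaceFrameResult != null))
          {
            RectI faceBox = faceFrame.FaceFrameResult.FaceBoundingBoxInColorSpace;
          }
        }
      }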

It’s not too hard though and I can fairly quickly implement my IRenderKinectFrames interface in order to capture the position of faces in the scene and (as an example) draw an ellipse to blank them out of the video;

namespace App13
{
  using Microsoft.Graphics.Canvas;
  using Microsoft.Kinect.Face;
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Numerics;
  using Windows.Foundation;
  using Windows.UI;
  using WindowsPreview.Kinect;
  using TrackingIdLostEventHandler =
          Windows.Foundation.TypedEventHandler<Microsoft.Kinect.Face.FaceFrameSource, Microsoft.Kinect.Face.TrackingIdLostEventArgs>;

  class FaceFrameSourceRenderer : IRenderKinectFrames
  {
    class FacialState
    {
      static readonly FaceFrameFeatures FACE_FEATURES =
        FaceFrameFeatures.BoundingBoxInColorSpace;

      public FaceFrameSource Source;
      public FaceFrameReader Reader;
      public RectI? Rectangle;

      public FacialState(KinectSensor sensor, ulong trackingId, TrackingIdLostEventHandler handler)
      {
        this.Source = new FaceFrameSource(sensor, trackingId, FACE_FEATURES);
        this.Source.TrackingIdLost += handler;
        this.Reader = this.Source.OpenReader();
      }
    }

    public void Initialise(KinectSensor sensor)
    {
      this.sensor = sensor;
      this.bodies = new Body[this.sensor.BodyFrameSource.BodyCount];
      this.bodyReader = this.sensor.BodyFrameSource.OpenReader();
      this.previousBodyFrameTimeSpan = TimeSpan.FromSeconds(0);

      this.facialState = new Dictionary<ulong, FacialState>();
    }
    public void Update(ICanvasResourceCreator resourceCreator)
    {
      using (BodyFrame bodyFrame = this.bodyReader.AcquireLatestFrame())
      {
        if ((bodyFrame != null) &&
          (bodyFrame.RelativeTime != this.previousBodyFrameTimeSpan))
        {
          bodyFrame.GetAndRefreshBodyData(this.bodies);

          var trackedBodies = this.bodies.Where(b => b.IsTracked);

          foreach (var body in trackedBodies)
          {
            if (!this.facialState.ContainsKey(body.TrackingId))
            {
              FacialState newStateEntry = new FacialState(this.sensor, body.TrackingId,
                this.OnTrackingIdLost);

              this.facialState[body.TrackingId] = newStateEntry;
            }
          }
          this.previousBodyFrameTimeSpan = bodyFrame.RelativeTime;
        }
      }
      // NB: the code above is attempting to avoid bothering if it sees the same frame
      // twice whereas this code is not doing that. it could be processing the same
      // frames twice and so the facial frames here could be "stale" with respect
      // to the body frame.
      foreach (var stateEntry in this.facialState.Values)
      {
        using (var faceFrame = stateEntry.Reader.AcquireLatestFrame())
        {
          if ((faceFrame != null) && (faceFrame.FaceFrameResult != null))
          {
            stateEntry.Rectangle = faceFrame.FaceFrameResult.FaceBoundingBoxInColorSpace;
          }
        }
      }
    }
    void OnTrackingIdLost(FaceFrameSource sender, TrackingIdLostEventArgs args)
    {
      if (this.facialState.ContainsKey(args.TrackingId))
      {
        this.facialState[args.TrackingId].Reader.Dispose();
        this.facialState.Remove(args.TrackingId);
      }
    }
    public void Render(CanvasDrawingSession session)
    {
      foreach (var stateEntry in this.facialState.Values.Where(v => v.Rectangle.HasValue))
      {
        var recti = stateEntry.Rectangle.Value;

        session.FillEllipse(
          new Vector2(
            recti.Left + ((recti.Right - recti.Left) / 2),
            recti.Top + ((recti.Bottom - recti.Top) / 2)),
            (recti.Right - recti.Left) / 2,
            (recti.Bottom - recti.Top) / 2,
            Colors.Red);
      }
    }
    TimeSpan previousBodyFrameTimeSpan;
    Dictionary<ulong, FacialState> facialState;
    BodyFrameReader bodyReader;
    Body[] bodies;
    KinectSensor sensor;
  }
}

and so this code uses its Update implementation to request body data from the sensor and, whenever a newly tracked body arrives, the code;

  • creates a new FaceFrameSource for the tracked body
  • opens a new FaceFrameReader to read facial data for the body
  • adds this state into a dictionary (facialState) keyed off the tracking ID for the body
  • handles the OnTrackingIdLost event in order to remove the entry from the dictionary if the body is lost

The code then queries each maintained FaceFrameReader for the position of the box surrounding the face that it represents – via the FaceBoundingBoxInColorSpace property used in the Update method above.

With that work all done as part of the Update cycle, by the time the code gets to Render it’s a simple matter of drawing whatever rectangles have been built up at the Update phase.

To bring this new renderer into use just required a one line change to my MainPage.xaml.cs;

    public MainPage()
    {
      this.InitializeComponent();

      this.renderList = new List<IRenderKinectFrames>()
      {
        new ColorFrameSourceRenderer(),
        new FaceFrameSourceRenderer()
      };
    }

Here’s a little screen capture of that code running;

but that’s relying purely on one piece of information about the face – the bounding box in the color (video) space. There’s more information to be gained here and I wanted to see if I could display some more info.

Replacing the Head with an Image – the Frank-O-Matic…

Rather than drawing an ellipse, I thought it might be fun to see if I could draw an image that replaced the user’s head and the most obvious head that sprung to mind was Frank Sidebottom.

If you’re not familiar with Frank then Wikipedia can help you out;

Chris Sievey/Frank Sidebottom

and with a quick addition of an image to the app and a slight modification of the code that is doing the rendering;

namespace App13
{
  using Microsoft.Graphics.Canvas;
  using Microsoft.Kinect.Face;
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Numerics;
  using Windows.ApplicationModel;
  using Windows.Foundation;
  using Windows.Storage;
  using Windows.UI;
  using WindowsPreview.Kinect;
  using TrackingIdLostEventHandler =
              Windows.Foundation.TypedEventHandler<Microsoft.Kinect.Face.FaceFrameSource, Microsoft.Kinect.Face.TrackingIdLostEventArgs>;

  class FaceFrameSourceRenderer : IRenderKinectFrames
  {
    class FacialState
    {
      static readonly FaceFrameFeatures FACE_FEATURES =
        FaceFrameFeatures.BoundingBoxInColorSpace;

      public FaceFrameSource Source;
      public FaceFrameReader Reader;
      public RectI? Rectangle;

      public FacialState(KinectSensor sensor, ulong trackingId, TrackingIdLostEventHandler handler)
      {
        this.Source = new FaceFrameSource(sensor, trackingId, FACE_FEATURES);
        this.Source.TrackingIdLost += handler;
        this.Reader = this.Source.OpenReader();
      }
    }

    public void Initialise(KinectSensor sensor)
    {
      this.sensor = sensor;
      this.bodies = new Body[this.sensor.BodyFrameSource.BodyCount];
      this.bodyReader = this.sensor.BodyFrameSource.OpenReader();
      this.previousBodyFrameTimeSpan = TimeSpan.FromSeconds(0);

      this.facialState = new Dictionary<ulong, FacialState>();
    }
    public void Update(ICanvasResourceCreator resourceCreator)
    {
      using (BodyFrame bodyFrame = this.bodyReader.AcquireLatestFrame())
      {
        if ((bodyFrame != null) &&
          (bodyFrame.RelativeTime != this.previousBodyFrameTimeSpan))
        {
          bodyFrame.GetAndRefreshBodyData(this.bodies);

          var trackedBodies = this.bodies.Where(b => b.IsTracked);

          foreach (var body in trackedBodies)
          {
            if (!this.facialState.ContainsKey(body.TrackingId))
            {
              FacialState newStateEntry = new FacialState(this.sensor, body.TrackingId,
                this.OnTrackingIdLost);

              this.facialState[body.TrackingId] = newStateEntry;
            }
          }
          this.previousBodyFrameTimeSpan = bodyFrame.RelativeTime;
        }
      }
      // NB: the code above is attempting to avoid bothering if it sees the same frame
      // twice whereas this code is not doing that. it could be processing the same
      // frames twice and so the facial frames here could be "stale" with respect
      // to the body frame.
      foreach (var stateEntry in this.facialState.Values)
      {
        using (var faceFrame = stateEntry.Reader.AcquireLatestFrame())
        {
          if ((faceFrame != null) && (faceFrame.FaceFrameResult != null))
          {
            stateEntry.Rectangle = faceFrame.FaceFrameResult.FaceBoundingBoxInColorSpace;
          }
        }
      }
    }
    void OnTrackingIdLost(FaceFrameSource sender, TrackingIdLostEventArgs args)
    {
      if (this.facialState.ContainsKey(args.TrackingId))
      {
        this.facialState[args.TrackingId].Reader.Dispose();
        this.facialState.Remove(args.TrackingId);
      }
    }
    public void Render(CanvasDrawingSession session)
    {
      foreach (var stateEntry in this.facialState.Values.Where(v => v.Rectangle.HasValue))
      {
        var recti = stateEntry.Rectangle.Value;

        session.DrawImage(this.frankBitmap,
          recti.Left + ((recti.Right - recti.Left) / 2) - (int)(this.frankBitmap.SizeInPixels.Width / 2),
          recti.Top + ((recti.Bottom - recti.Top) / 2) - (int)(this.frankBitmap.SizeInPixels.Height / 2));
      }
    }
    public async void CreateResources(CanvasControl control)
    {
      var frankFile = await StorageFile.GetFileFromApplicationUriAsync(
        new Uri("ms-appx:///Assets/Frank.png"));

      var frankStream = await frankFile.OpenReadAsync();

      this.frankBitmap = await CanvasBitmap.LoadAsync(
        control.Device, frankStream);
    }
    TimeSpan previousBodyFrameTimeSpan;
    Dictionary<ulong, FacialState> facialState;
    BodyFrameReader bodyReader;
    Body[] bodies;
    KinectSensor sensor;
    CanvasBitmap frankBitmap;
  }
}

and in order to get that working I also modified the interface that I’m using to represent a renderer, adding a “CreateResources” phase;

namespace App13
{
  using Microsoft.Graphics.Canvas;
  using WindowsPreview.Kinect;

  interface IRenderKinectFrames
  {
    void Initialise(KinectSensor sensor);
    void CreateResources(CanvasControl canvasControl);
    void Update(ICanvasResourceCreator resourceCreator);
    void Render(CanvasDrawingSession session);
  }
}

which I call on all renderers when the CanvasControl fires its own CreateResources event so that each renderer gets a chance to initialise bitmaps etc if it needs to.
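
I haven’t reproduced that code-behind change above but the wiring is along the lines of the sketch below – the handler for the CanvasControl’s CreateResources event simply hands the control to each renderer (strictly speaking, because my CreateResources implementations are async void the control can’t wait on them via args.TrackAsyncAction, but it’s good enough for this experiment);

    void OnCreateResources(CanvasControl sender, CanvasCreateResourcesEventArgs args)
    {
      // Give each renderer a chance to load bitmaps and other device resources.
      foreach (var renderer in this.renderList)
      {
        renderer.CreateResources(sender);
      }
    }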

That gives me a “Frank-sized” head that follows the user around as they stand in front of the sensor as per the video below;

Getting a Few More Facial Features

So far, I’ve relied purely on obtaining the bounding box of the face represented in color space co-ordinates but there’s a lot more information that facial tracking gives me. Specifically;

  • The bounding box of the face in infra-red space
  • The positions of the facial features in either color or infra-red space
    • Left eye, right eye, nose, mouth left corner, mouth right corner.
  • Whether the eyes are closed or open
  • Whether the mouth is closed/open
  • Whether the face is wearing glasses
  • Whether the face is engaged
  • Whether the face is looking away
  • The rotation of the face

and so I could do a lot more here.

It’s probably easier to do something with all those features using some kind of XAML-based user control, where visual states could be used to toggle various pieces of a face on/off, but given that I’d used Win2D up to this point I wondered whether I could just add a few more images to my project to act as overlays on the face above. So, I made some images in Paint.NET;

image

and added them in to my project;

image

and then modified my code such that it was bringing back more facial features and overlaying the right images based on what it was being told. The modified class in full is below;

namespace App13
{
  using Microsoft.Graphics.Canvas;
  using Microsoft.Kinect.Face;
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Numerics;
  using System.Threading.Tasks;
  using Windows.ApplicationModel;
  using Windows.Foundation;
  using Windows.Storage;
  using Windows.UI;
  using WindowsPreview.Kinect;
  using TrackingIdLostEventHandler =
    Windows.Foundation.TypedEventHandler<Microsoft.Kinect.Face.FaceFrameSource, Microsoft.Kinect.Face.TrackingIdLostEventArgs>;

  class FaceFrameSourceRenderer : IRenderKinectFrames
  {
    class FacialState
    {
      static readonly FaceFrameFeatures FACE_FEATURES =
        FaceFrameFeatures.BoundingBoxInColorSpace |
        FaceFrameFeatures.Glasses |
        FaceFrameFeatures.LeftEyeClosed |
        FaceFrameFeatures.MouthOpen |
        FaceFrameFeatures.RightEyeClosed;

      public FaceFrameSource Source;
      public FaceFrameReader Reader;
      public FaceFrameResult FaceResult;

      public FacialState(KinectSensor sensor, ulong trackingId, TrackingIdLostEventHandler handler)
      {
        this.Source = new FaceFrameSource(sensor, trackingId, FACE_FEATURES);
        this.Source.TrackingIdLost += handler;
        this.Reader = this.Source.OpenReader();
      }
    }
    static FaceFrameSourceRenderer()
    {
      // Being lazy using tuples here when I should write a class but essentially the data-structure is
      // Item1: FACIAL PROPERTY TO CHECK
      // Item2: RESULT TO CHECK AGAINST
      // Item3: IMAGE TO DISPLAY IF IT MATCHES
      // Item4: IMAGE TO DISPLAY IF IT DOESN'T
      // and the first entry is a hack to make sure the background is always displayed
      imageOverlays = new Tuple<FaceProperty,DetectionResult,string,string>[] 
      {
        Tuple.Create(FaceProperty.Engaged, DetectionResult.Yes, "Background.png", "Background.png"),
        Tuple.Create(FaceProperty.LeftEyeClosed, DetectionResult.Yes, "LeftEyeClosed.png", "LeftEyeOpen.png"),
        Tuple.Create(FaceProperty.RightEyeClosed, DetectionResult.Yes, "RightEyeClosed.png", "RightEyeOpen.png"),
        Tuple.Create(FaceProperty.MouthOpen, DetectionResult.Yes, "MouthOpen.png", "MouthClosed.png"),
        Tuple.Create(FaceProperty.WearingGlasses, DetectionResult.Yes, "Glasses.png", (string)null)
      };
      bitmaps = new Dictionary<string, CanvasBitmap>();
    }
    public void Initialise(KinectSensor sensor)
    {
      this.sensor = sensor;
      this.bodies = new Body[this.sensor.BodyFrameSource.BodyCount];
      this.bodyReader = this.sensor.BodyFrameSource.OpenReader();
      this.previousBodyFrameTimeSpan = TimeSpan.FromSeconds(0);

      this.facialState = new Dictionary<ulong, FacialState>();
    }
    public void Update(ICanvasResourceCreator resourceCreator)
    {
      using (BodyFrame bodyFrame = this.bodyReader.AcquireLatestFrame())
      {
        if ((bodyFrame != null) &&
          (bodyFrame.RelativeTime != this.previousBodyFrameTimeSpan))
        {
          bodyFrame.GetAndRefreshBodyData(this.bodies);

          var trackedBodies = this.bodies.Where(b => b.IsTracked);

          foreach (var body in trackedBodies)
          {
            if (!this.facialState.ContainsKey(body.TrackingId))
            {
              FacialState newStateEntry = new FacialState(this.sensor, body.TrackingId,
                this.OnTrackingIdLost);

              this.facialState[body.TrackingId] = newStateEntry;
            }
          }
          this.previousBodyFrameTimeSpan = bodyFrame.RelativeTime;
        }
      }
      // NB: the code above is attempting to avoid bothering if it sees the same frame
      // twice whereas this code is not doing that. it could be processing the same
      // frames twice and so the facial frames here could be "stale" with respect
      // to the body frame.
      foreach (var stateEntry in this.facialState.Values)
      {
        using (var faceFrame = stateEntry.Reader.AcquireLatestFrame())
        {
          if ((faceFrame != null) && (faceFrame.FaceFrameResult != null))
          {
            stateEntry.FaceResult = faceFrame.FaceFrameResult;
          }
        }
      }
    }
    void OnTrackingIdLost(FaceFrameSource sender, TrackingIdLostEventArgs args)
    {
      if (this.facialState.ContainsKey(args.TrackingId))
      {
        this.facialState[args.TrackingId].Reader.Dispose();
        this.facialState.Remove(args.TrackingId);
      }
    }
    public void Render(CanvasDrawingSession session)
    {
      foreach (var stateEntry in this.facialState.Values.Where(v => v.FaceResult != null))
      {
        foreach (var item in imageOverlays)
        {
          var result = stateEntry.FaceResult.FaceProperties[item.Item1];
          string key = result == item.Item2 ? item.Item3 : item.Item4;

          if (!string.IsNullOrEmpty(key) && bitmaps.ContainsKey(key))
          {
            CanvasBitmap image = bitmaps[key];
            RectI recti = stateEntry.FaceResult.FaceBoundingBoxInColorSpace;

            session.DrawImage(image,
              recti.Left + ((recti.Right - recti.Left) / 2) - (int)(image.SizeInPixels.Width / 2),
              recti.Top + ((recti.Bottom - recti.Top) / 2) - (int)(image.SizeInPixels.Height / 2));
          }
        }
      }
    }
    public async void CreateResources(CanvasControl control)
    {
      await LoadBitmapsAsync(control.Device);
    }
    static async Task LoadBitmapAsync(CanvasDevice device, string fileName)
    {
      CanvasBitmap bitmap = null;

      var file = await StorageFile.GetFileFromApplicationUriAsync(
        new Uri(string.Format(IMAGE_URI_FORMAT_STRING, fileName)));

      var stream = await file.OpenReadAsync();

      bitmap = await CanvasBitmap.LoadAsync(device, stream);

      bitmaps[fileName] = bitmap;
    }
    static async Task LoadBitmapsAsync(CanvasDevice device)
    {
      foreach (var item in imageOverlays)
      {
        await LoadBitmapAsync(device, item.Item3);

        if (!string.IsNullOrEmpty(item.Item4))
        {
          await LoadBitmapAsync(device, item.Item4);
        }
      }
    }
    TimeSpan previousBodyFrameTimeSpan;
    Dictionary<ulong, FacialState> facialState;
    BodyFrameReader bodyReader;
    Body[] bodies;
    KinectSensor sensor;

    static readonly string IMAGE_URI_FORMAT_STRING = "ms-appx:///Assets/{0}";
    static Dictionary<string, CanvasBitmap> bitmaps;
    static Tuple<FaceProperty, DetectionResult, string, string>[] imageOverlays;
  }
}

and now I have a few image overlays to better represent what the face is actually doing as in the little screen capture below;

 

A Little More Facial Detail

There’s still facial information that I’m not making use of so I thought I’d add markers to indicate where the sensor thinks the eyes, nose and mouth are positioned, simply drawing them as red circles.

This is a matter of making sure that I request the feature FaceFrameFeatures.PointsInColorSpace which in my current code means just changing this FACE_FEATURES variable to include it;

      static readonly FaceFrameFeatures FACE_FEATURES =
        FaceFrameFeatures.BoundingBoxInColorSpace |
        FaceFrameFeatures.Glasses |
        FaceFrameFeatures.LeftEyeClosed |
        FaceFrameFeatures.MouthOpen |
        FaceFrameFeatures.RightEyeClosed |
        FaceFrameFeatures.PointsInColorSpace;

and then I can modify the Render code to draw some red circles for each of these features with a hard-coded radius of 10.0;

        foreach (var item in stateEntry.FaceResult.FacePointsInColorSpace.Keys)
        {
          var featurePosition = stateEntry.FaceResult.FacePointsInColorSpace[item];
          session.FillCircle(
            new Vector2((float)featurePosition.X, (float)featurePosition.Y),
            10.0f,
            Colors.Red);
        }

and then I can see the underlying eyes, nose, mouth as in the video below;

That’s All :-)

I’m still not using all the facial capabilities here – e.g. I’m not doing anything with the “happiness” value that the SDK gives back to me but, for the moment, I’m going to leave the post at this.
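
For reference, reading that value works in just the same way as the properties above – request FaceFrameFeatures.Happy when creating the FaceFrameSource and then check the FaceFrameResult. As a sketch (this isn’t in the downloadable code);

        // Sketch: requires FaceFrameFeatures.Happy to be included in FACE_FEATURES.
        DetectionResult happy = stateEntry.FaceResult.FaceProperties[FaceProperty.Happy];

        if (happy == DetectionResult.Yes)
        {
          // the sensor reckons this is a happy face :-)
        }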

If you want the code to play around with then it’s here to download.

I’d note that the image of Frank that I messed around with was not mine – it was taken from here so be aware of that if you plan to re-use it.

Enjoy!