Intel RealSense Camera (F200): Face–Alerts

Adding to this set of posts, I thought I’d retreat from any complexity that I’d introduced in the last post and build out another console application that used the SDK’s ability to fire ‘alerts’ around facial events.

From the point of view of the SDK, an ‘alert’ is an event of type;

    public enum AlertType
    {
      ALERT_NEW_FACE_DETECTED = 1,
      ALERT_FACE_OUT_OF_FOV = 2,
      ALERT_FACE_BACK_TO_FOV = 3,
      ALERT_FACE_OCCLUDED = 4,
      ALERT_FACE_NO_LONGER_OCCLUDED = 5,
      ALERT_FACE_LOST = 6,
    }

and so the facial detection module (PXCMFaceModule) can be asked via its configuration type (PXCMFaceConfiguration) to either EnableAlert() with a specific alert or to EnableAllAlerts() to switch them all on.
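
If I only wanted a subset of those alerts then, rather than switching everything on, I could presumably enable them one at a time – a minimal sketch (I’m assuming here that the AlertType enum nests under PXCMFaceData.AlertData, which is where the SDK metadata seems to put it);

  // sketch: enable just the 'new face' and 'face lost' alerts on an existing
  // PXCMFaceConfiguration instance, rather than calling EnableAllAlerts()
  config.EnableAlert(PXCMFaceData.AlertData.AlertType.ALERT_NEW_FACE_DETECTED);
  config.EnableAlert(PXCMFaceData.AlertData.AlertType.ALERT_FACE_LOST);
  config.ApplyChanges().ThrowOnFail();

For this post, though, I’ll just switch them all on.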

That allows me to write simple console code like this;

namespace ConsoleApplication1
{
  using System;
  using System.Collections.Generic;
  using System.Linq;

  class PXCMStatusException : Exception
  {
    public PXCMStatusException(pxcmStatus status)
    {
      this.Status = status;
    }
    public pxcmStatus Status { get; private set; }
  }
  static class PXCMStatusExtensions
  {
    public static void ThrowOnFail(this pxcmStatus status)
    {
      if (!status.Succeeded())
      {
        throw new PXCMStatusException(status);
      }
    }
    public static bool Succeeded(this pxcmStatus status)
    {
      return (status == pxcmStatus.PXCM_STATUS_NO_ERROR);
    }
  }
  class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Hit a key to end...");

      using (PXCMSenseManager senseManager = PXCMSenseManager.CreateInstance())
      {
        senseManager.EnableFace();

        var face = senseManager.QueryFace();

        using (var config = face.CreateActiveConfiguration())
        {
          config.EnableAllAlerts();

          config.SubscribeAlert(alert =>
          {
            Console.WriteLine("An alert has arrived from face {0} with detail {1}", alert.faceId, alert.label);
          });

          config.ApplyChanges().ThrowOnFail();
        }
        senseManager.Init();

        senseManager.StreamFrames(false);

        Console.ReadKey();

        face.Dispose();
      }
     }
  }
}

and then swing my ugly mug back and forth in front of the screen to drive the output like this;

[screenshot: console output showing the face alerts firing]

and so it’s pretty easy to build some kind of embedded kiosk with the RealSense that can tell you whether there are one (or more) human faces in front of it.
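
For instance, one rough way of keeping a running count of the faces in view would be to track face ids from the alerts themselves. Here’s a sketch using only the alert fields shown above (the handler runs off the main thread, so a real version would want some synchronisation around the set, and presentFaces is a hypothetical field rather than anything from the code above);

  // rough sketch: track which face ids are currently 'present' using the alerts
  var presentFaces = new HashSet<int>();

  config.SubscribeAlert(alert =>
  {
    if (alert.label == PXCMFaceData.AlertData.AlertType.ALERT_NEW_FACE_DETECTED)
    {
      presentFaces.Add(alert.faceId);
    }
    else if (alert.label == PXCMFaceData.AlertData.AlertType.ALERT_FACE_LOST)
    {
      presentFaces.Remove(alert.faceId);
    }
    Console.WriteLine("Faces currently in view: {0}", presentFaces.Count);
  });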

Naturally, it can also go much further as in my previous post where I was displaying the location data from those faces. In the next post, I’ll look into whether we can recognise those faces which seems to me to open/widen the door to a tonne of scenarios;

  • the automatic photo booth that automatically frames the subjects and knows when they are smiling without having to ask 🙂
  • the automatic security system that lets you through those locked doors at work purely based on your face
  • the automatic car park system which knows how long you’ve stayed because it recognises you from when you left (and perhaps can even tell you where the heck you left your car)
  • the automatic computer logon system which stops asking you for a password (oh, wait…. 😉)
  • etc.

Intel RealSense Camera (F200): Face–Landmarks, Expressions, Emotions & Pulse

Adding to this set of posts, I thought that I’d see if I could do something with the facial capabilities of the F200 camera, although I’m not sure that I’m ready yet to explore the recognition aspects of that – I’ll leave those to a later post.

In order to do this, I reworked a WPF UI so that it became a ‘container’ for a number of controls. That UI is as below;

<Window x:Class="WpfApplication2.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:controls="clr-namespace:WpfApplication2.Controls"
        Title="MainWindow"
        Height="350"
        Width="525">
  <Grid x:Name="parentGrid">
    <Grid.ColumnDefinitions>
      <ColumnDefinition />
      <ColumnDefinition Width="Auto" />
      <ColumnDefinition />
    </Grid.ColumnDefinitions>
    <Grid.RowDefinitions>
      <RowDefinition />
      <RowDefinition Height="Auto"/>
      <RowDefinition />
    </Grid.RowDefinitions>
    <controls:ColorVideoControl Grid.Row="1" Grid.Column="1"/>
    <controls:EmotionControl Grid.Row="1" Grid.Column="1"/>
    <controls:FaceControl Grid.Row="1"
                          Grid.Column="1" />
  </Grid>
</Window>

I’ve taken an admittedly simple and not too performant approach here of ‘layering’ data from the RealSense camera such that I have 3 controls that pick up and display that data without much awareness of each other (hence the possibly poor performance);

  • ColorVideoControl
  • EmotionControl
  • FaceControl

and those are just simple UserControls in WPF that implement an interface;

namespace WpfApplication2
{
  interface ISampleRenderer
  {
    void Initialise(PXCMSenseManager senseManager);
    void ProcessSampleWorkerThread(PXCMCapture.Sample sample);
    void RenderUI(PXCMCapture.Sample sample);
    int ModuleId { get;  }
  }
}

with the idea being that the grid in my MainWindow can contain any number of these controls and can talk to them via this interface in order to feed them data from a single PXCMSenseManager instance and get them to display various aspects of that data.

The code behind the main window then becomes quite generic;

namespace WpfApplication2
{
  using System;
  using System.Collections.Generic;
  using System.Windows;
  using System.Linq;

  public partial class MainWindow : Window
  {
    public MainWindow()
    {
      InitializeComponent();
      this.Loaded += OnLoaded;
    }
    void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.senseManager = PXCMSenseManager.CreateInstance();

      this.senseManager.captureManager.SetRealtime(false);

      this.senseManager.captureManager.FilterByStreamProfiles(
        PXCMCapture.StreamType.STREAM_TYPE_COLOR, 1280, 720, 0);

      this.InitialiseRenderers();

      // this will fail unless we have at least one control
      // in the renderer list which switches on some kind
      // of modular data.
      this.senseManager.Init(
        new PXCMSenseManager.Handler()
        {
          onModuleProcessedFrame = this.OnModuleProcessedFrame
        }).ThrowOnFail();

      this.senseManager.StreamFrames(false);
    }
    void InitialiseRenderers()
    {
      this.renderers = this.BuildRenderers();

      foreach (var renderer in this.renderers)
      {
        renderer.Initialise(this.senseManager);
      }
    }
    void ForAllRenderers(int moduleId, Action<ISampleRenderer> action)
    {
      foreach (var renderer in this.renderers.Where(
        r => (r.ModuleId == -1) || (r.ModuleId == moduleId)))
      {
        action(renderer);
      }
    }
    List<ISampleRenderer> BuildRenderers()
    {
      List<ISampleRenderer> list = new List<ISampleRenderer>();

      foreach (var control in this.parentGrid.Children)
      {
        ISampleRenderer renderer = control as ISampleRenderer;
        if (renderer != null)
        {
          list.Add(renderer);
        }
      }
      return (list);
    }
    pxcmStatus OnModuleProcessedFrame(int mid, PXCMBase module, PXCMCapture.Sample sample)
    {
      ForAllRenderers(
        mid,
        r => r.ProcessSampleWorkerThread(sample));

      Dispatcher.InvokeAsync(() =>
        {
          ForAllRenderers(mid, r => r.RenderUI(sample));
        }
      );
      return (pxcmStatus.PXCM_STATUS_NO_ERROR);
    }
    IEnumerable<ISampleRenderer> renderers;
    PXCMSenseManager senseManager;
  }
}

and so this code follows a fairly simple pattern;

  • Initialise PXCMSenseManager
  • Ask the capture manager to filter the color stream profiles to 1280×720.
  • Take an event based approach to the data by handling the OnModuleProcessedFrame ‘event’.
    • If you’ve looked at any of my previous posts, you’d know that this one is new to me and comes from me using modules to process the data rather than just gathering the raw streams which arrive via the OnNewSample event. What I liked about this approach is that (it seems) the module data also carries the color data, so these frames are sync’d.
  • Build a list of the controls that are parented by my parentGrid Grid that implement my ISampleRenderer interface and ask them to initialise themselves.
  • As data arrives into my OnModuleProcessedFrame method, find the renderers whose ModuleId matches the passed module id (or is -1, meaning ‘any module’) and pass the data to them in two ways;
    • Once on the calling thread by using ProcessSampleWorkerThread
    • Once on the UI thread by using RenderUI

I’d have to say that, at the time of writing, I’m not at all sure that I have the ‘re-entrancy/threading’ aspects of this 2-phase approach inside my OnModuleProcessedFrame method right. It’s a work in progress because, clearly, I’m taking a ‘fire and forget’ approach to the call to RenderUI and it’s more than possible that a second frame arrives while I’m still processing the first one, so I’ll likely need to revisit that code. There’s also the question of the various modules delivering frames at different frequencies, so this all probably needs more work.
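
One possible (if crude) mitigation would be to simply drop any module frames that arrive while a previous RenderUI pass is still pending on the dispatcher, rather than letting them queue up. This is just a sketch of the kind of guard I have in mind – the renderPending field is hypothetical and isn’t in the code above;

    // sketch: skip frames while a previous RenderUI call is still queued/running
    int renderPending; // 0 = idle, 1 = a render is pending

    pxcmStatus OnModuleProcessedFrame(int mid, PXCMBase module, PXCMCapture.Sample sample)
    {
      if (System.Threading.Interlocked.CompareExchange(ref this.renderPending, 1, 0) == 0)
      {
        ForAllRenderers(mid, r => r.ProcessSampleWorkerThread(sample));

        Dispatcher.InvokeAsync(() =>
        {
          ForAllRenderers(mid, r => r.RenderUI(sample));
          System.Threading.Interlocked.Exchange(ref this.renderPending, 0);
        });
      }
      return (pxcmStatus.PXCM_STATUS_NO_ERROR);
    }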

Layered on top of this I have my ColorVideoControl which simply contains an Image;

<UserControl x:Class="WpfApplication2.Controls.ColorVideoControl"
             xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
             xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
             xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
             xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
             mc:Ignorable="d"
             d:DesignHeight="300"
             d:DesignWidth="300">
  <Grid>
    <Image x:Name="displayImage" />
  </Grid>
</UserControl>

and its implementation of ISampleRenderer which is really just a re-working of code I’ve used in the previous posts;

namespace WpfApplication2.Controls
{
  using System.Windows;
  using System.Windows.Controls;
  using System.Windows.Media;
  using System.Windows.Media.Imaging;

  public partial class ColorVideoControl : UserControl, ISampleRenderer
  {
    public ColorVideoControl()
    {
      InitializeComponent();
    }
    public int ModuleId
    {
      get
      {
        return (-1);
      }
    }
    public void Initialise(PXCMSenseManager senseManager)
    {
    }
    public void ProcessSampleWorkerThread(PXCMCapture.Sample sample)
    {
      this.currentColorImage = null;

      PXCMImage.ImageData colorImage;

      if (sample.color.AcquireAccess(PXCMImage.Access.ACCESS_READ,
        PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32, out colorImage).Succeeded())
      {
        this.InitialiseImageDimensions(sample.color);

        this.currentColorImage = colorImage;
      }
    }
    public void RenderUI(PXCMCapture.Sample sample)
    {
      if (this.currentColorImage != null)
      {
        this.InitialiseImage();

        this.writeableBitmap.WritePixels(
          this.imageDimensions,
          this.currentColorImage.planes[0],
          this.imageDimensions.Width * this.imageDimensions.Height * 4,
          this.imageDimensions.Width * 4);

        sample.color.ReleaseAccess(this.currentColorImage);
        this.currentColorImage = null;
      }
    }
    void InitialiseImageDimensions(PXCMImage image)
    {
      if (!this.imageDimensions.HasArea)
      {
        this.imageDimensions.Width = image.info.width;
        this.imageDimensions.Height = image.info.height;
      }
    }
    void InitialiseImage()
    {
      if (this.writeableBitmap == null)
      {
        this.writeableBitmap = new WriteableBitmap(
          this.imageDimensions.Width,
          this.imageDimensions.Height,
          96,
          96,
          PixelFormats.Bgra32,
          null);

        this.displayImage.Source = this.writeableBitmap;
      }
    }
    PXCMImage.ImageData currentColorImage;
    Int32Rect imageDimensions;
    WriteableBitmap writeableBitmap;
  }
}

so there’s nothing new there in this post that wasn’t in the previous posts, except that I’m now receiving this data as part of an OnModuleProcessedFrame handler rather than an OnNewSample handler.

Capturing Emotion

What is new is what my EmotionControl does for me. Its UI is just a big TextBlock plus a Canvas to draw on;

<UserControl x:Class="WpfApplication2.Controls.EmotionControl"
             xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
             xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
             xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
             xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
             mc:Ignorable="d"
             d:DesignHeight="300"
             d:DesignWidth="300">
  <Grid>
    <TextBlock Foreground="White"
               HorizontalAlignment="Left"
               VerticalAlignment="Top"
               FontSize="48"
               x:Name="txtEmotion" 
               FontFamily="Segoe UI"
               Margin="10"/>
    <Canvas x:Name="canvas" />
  </Grid>
</UserControl>

and then the implementation of ISampleRenderer that lives with that ‘UI’;

namespace WpfApplication2.Controls
{
  using System.Collections.Generic;
  using System.Windows.Controls;
  using System.Linq;
  using System.Windows.Shapes;
  using System.Windows.Media;
  using WpfApplication2.Utility;

  public partial class EmotionControl : UserControl, ISampleRenderer
  {
    public EmotionControl()
    {
      InitializeComponent();
    }
    public int ModuleId
    {
      get
      {
        return (PXCMEmotion.CUID);
      }
    }
    public void Initialise(PXCMSenseManager senseManager)
    {
      this.senseManager = senseManager;
      this.senseManager.EnableEmotion();
    }
    public void ProcessSampleWorkerThread(PXCMCapture.Sample sample)
    {
      this.emotions = new Dictionary<int, PXCMEmotion.EmotionData>();

      using (var emotion = this.senseManager.QueryEmotion())
      {
        if (emotion != null)
        {
          var faceCount = emotion.QueryNumFaces();

          for (int face = 0; face < faceCount; face++)
          {
            PXCMEmotion.EmotionData[] emotionData;

            if (emotion.QueryAllEmotionData(face, out emotionData).Succeeded())
            {
              // pick the primary emotion with the strongest evidence for this face
              var candidate = 
                emotionData
                .Where(
                  e => ((e.eid <= PXCMEmotion.Emotion.EMOTION_PRIMARY_SURPRISE) &&
                        (e.evidence > MIN_EVIDENCE_VALUE))
                )
                .OrderByDescending(e => e.evidence)
                .FirstOrDefault();
              
              if (candidate != null)
              {
                this.emotions[face] = candidate;
              }              
            }
          }
        }
      }
    }
    public void RenderUI(PXCMCapture.Sample sample)
    {
      List<string> displayItems = new List<string>();
      this.txtEmotion.Text = string.Empty;

      // Only going to do something with the first face for the moment.
      if (this.emotions.ContainsKey(0))
      {
        var emotion = this.emotions[0];

        this.txtEmotion.Text = string.Format("{0}, intensity ({1:N2})",
          emotion.GetName(),
          emotion.intensity);

        Rectangle rectangle;
        if (this.canvas.Children.Count > 0)
        {
          rectangle = (Rectangle)this.canvas.Children[0];
        }
        else
        {
          rectangle = new Rectangle()
            {
              StrokeThickness = 1,
              Stroke = Brushes.White
            };
          this.canvas.Children.Add(rectangle);
        }
        rectangle.Width = emotion.rectangle.w;
        rectangle.Height = emotion.rectangle.h;
        Canvas.SetLeft(rectangle, emotion.rectangle.x);
        Canvas.SetTop(rectangle, emotion.rectangle.y);
      }
      else
      {
        this.canvas.Children.Clear();
      }
    }
    const int MIN_EVIDENCE_VALUE = 0;
    PXCMSenseManager senseManager;
    Dictionary<int, PXCMEmotion.EmotionData> emotions;
  }
}

and so this code queries the PXCMEmotion module from the PXCMSenseManager and then calls QueryAllEmotionData() which returns a set of PXCMEmotion.EmotionData;

  public class EmotionData
  {
    public PXCMEmotion.Emotion eid;
    public PXCMEmotion.Emotion emotion;
    public int evidence;
    public int fid;
    public float intensity;
    public PXCMRectI32 rectangle;
    public long timeStamp;

    public EmotionData();
  }

and the PXCMEmotion.Emotion is a bit mask;

  public enum Emotion
  {
    EMOTION_PRIMARY_ANGER = 1,
    EMOTION_PRIMARY_CONTEMPT = 2,
    EMOTION_PRIMARY_DISGUST = 4,
    EMOTION_PRIMARY_FEAR = 8,
    EMOTION_PRIMARY_JOY = 16,
    EMOTION_PRIMARY_SADNESS = 32,
    EMOTION_PRIMARY_SURPRISE = 64,
    EMOTION_SENTIMENT_POSITIVE = 65536,
    EMOTION_SENTIMENT_NEGATIVE = 131072,
    EMOTION_SENTIMENT_NEUTRAL = 262144,
  }

where I think (as the names suggest) the values up to SURPRISE are the primary emotions and the top 3 values can be combined with those in order to give some sense of ‘sentiment’. I’m not entirely sure how you end up with combinations like NEUTRAL/FEAR and so on, but there you go.
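
Given those flags-style values, I’d expect that testing whether a particular value carries a sentiment or primary component is just a bitwise AND against it – that’s my assumption from the numbers above rather than anything the documentation promises, but as a sketch;

  // sketch: bitwise tests against the flags-style Emotion values
  static bool HasPositiveSentiment(PXCMEmotion.Emotion e)
  {
    return ((e & PXCMEmotion.Emotion.EMOTION_SENTIMENT_POSITIVE) != 0);
  }
  static bool IsPrimaryEmotion(PXCMEmotion.Emotion e)
  {
    // all of the primary values sit at or below EMOTION_PRIMARY_SURPRISE (64)
    return ((e & (PXCMEmotion.Emotion)0x7F) != 0);
  }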

The RenderUI function simply chooses the emotion associated with the first face and displays it in a text block. It also uses the rectangle that’s provided (which seems to be in 2D co-ords to match the image) to draw a white outline around the face.

So, a tiny bit of code with the emotion module and I can tell whether someone is expressing ‘disgust’ at my software, and I can find their face in the video frame in order to direct my remote-controlled robot arm to squirt them with a water pistol 🙂

Capturing Face

I also wrote this little FaceControl user control which introduces yet another XAML Canvas to draw on (hey, why use one Canvas when you can use many? 😉).

Here’s the UI portion;

<UserControl x:Class="WpfApplication2.Controls.FaceControl"
             xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
             xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
             xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
             xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
             mc:Ignorable="d"
             d:DesignHeight="300"
             d:DesignWidth="300">
  <Grid>
    <Canvas x:Name="canvas" />
    <TextBlock x:Name="txtPulse"
               FontSize="48"
               HorizontalAlignment="Left"
               VerticalAlignment="Bottom"
               Foreground="Red"
               FontFamily="Segoe UI"
               Margin="10"/>
    <TextBlock x:Name="txtExpressions"
               FontSize="24"
               HorizontalAlignment="Right"
               VerticalAlignment="Top"
               Foreground="Red"
               FontFamily="Segoe UI"
               Margin="10" 
               TextAlignment="Right"/>
  </Grid>
</UserControl>

and so it’s just a Canvas and a couple of text blocks and here’s the code that deals with the data;

namespace WpfApplication2.Controls
{
  using System;
  using System.Linq;
  using System.Text;
  using System.Windows.Controls;
  using System.Windows.Media;
  using System.Windows.Shapes;
  using WpfApplication2.Utility;

  public partial class FaceControl : UserControl, ISampleRenderer
  {
    public FaceControl()
    {
      InitializeComponent();
    }
    public int ModuleId
    {
      get { return (PXCMFaceModule.CUID); }
    }
    public void Initialise(PXCMSenseManager senseManager)
    {
      this.senseManager = senseManager;

      // TODO: I'm not so sure about which objects I'm meant to keep around here
      // From experimentation it seemed that I needed to keep a handle on the
      // PXCMFaceData otherwise I don't seem to get any data when I query later
      // on (i.e. I can't keep creating/disposing it). I'm unsure about the
      // other objects but I found that I seem to be able to dispose the
      // config once I'm done with it.
      this.senseManager.EnableFace();
      this.faceModule = this.senseManager.QueryFace();

      using (var config = faceModule.CreateActiveConfiguration())
      {
        config.detection.isEnabled = true;
        config.detection.maxTrackedFaces = 1;
        config.landmarks.isEnabled = true;
        config.landmarks.maxTrackedFaces = 1;

        var pulseConfig = config.QueryPulse();
        pulseConfig.properties.maxTrackedFaces = 1;
        pulseConfig.Enable();

        var expressionConfig = config.QueryExpressions();
        expressionConfig.EnableAllExpressions();
        expressionConfig.Enable();

        config.ApplyChanges().ThrowOnFail();
      }
      this.faceData = this.faceModule.CreateOutput();
    }
    public void ProcessSampleWorkerThread(PXCMCapture.Sample sample)
    {
      this.heartRate = 0.0f;
      this.landmarks = null;
      this.expressionDescription = string.Empty;

      if (this.faceData.Update().Succeeded())
      {
        var faces = this.faceData.QueryFaces();
        var first = faces.FirstOrDefault();

        if (first != null)
        {
          // pulse (QueryPulse seems able to return null, so guard it like the
          // landmark/expression queries below)
          var pulse = first.QueryPulse();

          if (pulse != null)
          {
            this.heartRate = pulse.QueryHeartRate();
          }

          // facial landmarks
          PXCMFaceData.LandmarkPoint[] localLandmarks;          
          
          var landmarks = first.QueryLandmarks();

          if ((landmarks != null) && landmarks.QueryPoints(out localLandmarks))
          {
            this.landmarks = localLandmarks;
          }

          // facial expressions
          var expressions = first.QueryExpressions();
          
          if (expressions != null)
          {
            PXCMFaceData.ExpressionsData.FaceExpressionResult result;
            StringBuilder builder = new StringBuilder();

            foreach (PXCMFaceData.ExpressionsData.FaceExpression value in 
              Enum.GetValues(typeof(PXCMFaceData.ExpressionsData.FaceExpression)))
            {
              if (expressions.QueryExpression ( value, out result ) &&
                (result.intensity > MIN_INTENSITY))
              {
                builder.AppendFormat(
                  "{0}{1} ({1:G2})", 
                  builder.Length == 0 ? string.Empty : Environment.NewLine,
                  value.GetName(), 
                  result.intensity);
              }
            }
            this.expressionDescription = builder.ToString();
          }
        }
      }
    }
    public void RenderUI(PXCMCapture.Sample sample)
    {
      this.canvas.Children.Clear();

      this.txtPulse.Text =
        string.Format("{0} bpm", this.heartRate > 0 ? this.heartRate.ToString() : "N/A");

      if (this.landmarks != null)
      {
        foreach (var landmark in this.landmarks.Where(l => l.confidenceImage > MIN_CONFIDENCE))
        {
          this.canvas.Children.Add(this.MakeEllipseAtImagePoint(landmark.image));
        }
      }

      this.txtExpressions.Text = this.expressionDescription;
    }
    Ellipse MakeEllipseAtImagePoint(PXCMPointF32 point)
    {
      Ellipse ellipse = new Ellipse()
      {
        Width = LANDMARK_ELLIPSE_WIDTH,
        Height = LANDMARK_ELLIPSE_WIDTH,
        Fill = Brushes.Red
      };
      Canvas.SetLeft(ellipse, point.x);
      Canvas.SetTop(ellipse, point.y);
      return (ellipse);
    }
    const int MIN_INTENSITY = 80;
    const int MIN_CONFIDENCE = 50;
    const int LANDMARK_ELLIPSE_WIDTH = 3;
    string expressionDescription;
    PXCMFaceData.LandmarkPoint[] landmarks;
    float heartRate;
    PXCMFaceData faceData;
    PXCMFaceModule faceModule;
    PXCMSenseManager senseManager;
  }
}

I felt a lot shakier on the object model here because it seems that I have to deal with the PXCMFaceModule, but then there’s also a PXCMFaceConfiguration which I attempt to set up to;

  • ask for face detection of 1 face
  • ask for facial ‘landmarks’ (i.e. interesting facial points) to be captured
  • ask for a pulse/heartRate estimate to be captured
  • ask for all facial expressions to be captured

Once that is set up, it seems that I need to get hold of a PXCMFaceData instance by calling the CreateOutput() member on the PXCMFaceModule and then, as frames arrive, it seems the order of the day is to call an Update() method on that object.
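
In skeletal form, then, the pattern I seem to have ended up with looks something like this (just a summary of the FaceControl code above rather than anything new);

  // skeleton of the face-data lifecycle as I currently understand it
  senseManager.EnableFace();
  var faceModule = senseManager.QueryFace();

  using (var config = faceModule.CreateActiveConfiguration())
  {
    // enable detection/landmarks/pulse/expressions here...
    config.ApplyChanges().ThrowOnFail();
  }
  var faceData = faceModule.CreateOutput();   // keep hold of this

  // ...then, as each frame arrives;
  if (faceData.Update().Succeeded())
  {
    var face = faceData.QueryFaces().FirstOrDefault();
    // query pulse/landmarks/expressions from 'face' (if it isn't null)
  }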

This wasn’t very intuitive to me and I’m not at all sure that I have it right beyond “it seems to work”. It didn’t seem quite in step with the way I’ve gone about getting data so far but, nonetheless, you can see that in my ProcessSampleWorkerThread method I essentially;

  • Use the PXCMFaceData.QueryPulse method to see if I can get an estimate of the pulse rate (of the first face)
  • Use QueryLandmarks to return what seem to be ~80 points of interest on the face
  • Use QueryExpressions to return which of the following expressions are visible and with what intensity;
    public enum FaceExpression
    {
      EXPRESSION_BROW_RAISER_LEFT = 0,
      EXPRESSION_BROW_RAISER_RIGHT = 1,
      EXPRESSION_BROW_LOWERER_LEFT = 2,
      EXPRESSION_BROW_LOWERER_RIGHT = 3,
      EXPRESSION_SMILE = 4,
      EXPRESSION_KISS = 5,
      EXPRESSION_MOUTH_OPEN = 6,
      EXPRESSION_EYES_CLOSED_LEFT = 7,
      EXPRESSION_EYES_CLOSED_RIGHT = 8,
      EXPRESSION_HEAD_TURN_LEFT = 9,
      EXPRESSION_HEAD_TURN_RIGHT = 10,
      EXPRESSION_HEAD_UP = 11,
      EXPRESSION_HEAD_DOWN = 12,
      EXPRESSION_HEAD_TILT_LEFT = 13,
      EXPRESSION_HEAD_TILT_RIGHT = 14,
      EXPRESSION_EYES_TURN_LEFT = 15,
      EXPRESSION_EYES_TURN_RIGHT = 16,
      EXPRESSION_EYES_UP = 17,
      EXPRESSION_EYES_DOWN = 18,
      EXPRESSION_TONGUE_OUT = 19,
      EXPRESSION_PUFF_RIGHT = 20,
      EXPRESSION_PUFF_LEFT = 21,
    }

The RenderUI function then draws ellipses on the screen for each of the facial landmarks and updates the text blocks with the details of the captured pulse rate and expressions.

Bringing that Together

Pulling together that sketchy code then gives me a screen that displays colour video, emotional data, expression data, a bounding rectangle for a single recognised face along with ~80 landmarks around that face.

It looks like this, where the white items are coming from the EmotionControl and the red items are coming from the FaceControl. This was my attempt to do a ‘FEAR face’ which seemed to turn into more of a ‘STUPID face’ 😉

[screenshot: colour video overlaid with the emotion rectangle/label in white and the facial landmarks, pulse and expressions in red]

and you’ll spot that the SDK thinks that I have my left and right brows raised and so on.

The code here is pretty rough and ready, but I’m impressed that I can get quite a lot of data from a pretty small amount of code and, naturally, it’d be possible to tidy this up.

In the meantime, the code’s here for download

Intel RealSense Camera (F200): A Bit of Depth

I’ve been playing around with the RealSense SDK in these previous four posts;

but I haven’t really done anything with the data as of yet other than to just ask the SDK for it as RGB data and then to hand it over to some image in WPF to get it rendered.

I thought I’d see what the depth data looks like and, as such, I reverted to the approach that I took in the ‘Part 2’ post above in that I went back to working with the PXCMSenseManager object, as it seems to bring many things together in a lot fewer lines of code than the approach that I was taking in Parts 3/4.

From the docs, the natural form of the depth data is 16-bit unsigned integers where each value is a distance in millimetres from the camera.
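
So, in principle, reading an individual depth value is just a matter of treating the image plane as an array of 16-bit values. Here’s a managed sketch of that idea (it assumes the row pitch is simply width * 2 bytes with no padding, which is the same assumption the unsafe code further down makes);

  // sketch: copy the 16-bit depth plane into a managed array
  short[] CopyDepthPixels(PXCMImage.ImageData imageData, int width, int height)
  {
    var pixels = new short[width * height];

    System.Runtime.InteropServices.Marshal.Copy(
      imageData.planes[0], pixels, 0, pixels.Length);

    return (pixels);
  }

  // the depth (in mm) of the pixel at (x, y) is then;
  // ushort depthMm = (ushort)pixels[(y * width) + x];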

I thought I’d experiment with a simple console application that could try to tell me whether something was within a certain distance (say, 300mm) of the sensor and came up with;

namespace ConsoleApplication1
{
  using System;
  using System.Collections.Generic;
  using System.Linq;

  class PXCMStatusException : Exception
  {
    public PXCMStatusException(pxcmStatus status)
    {
      this.Status = status;
    }
    public pxcmStatus Status { get; private set; }
  }
  static class PXCMStatusExtensions
  {
    public static void ThrowOnFail(this pxcmStatus status)
    {
      if (!status.Succeeded())
      {
        throw new PXCMStatusException(status);
      }
    }
    public static bool Succeeded(this pxcmStatus status)
    {
      return (status == pxcmStatus.PXCM_STATUS_NO_ERROR);
    }
  }
  class Program
  {
    static bool UnsafeScanForMinimumDistanceMillimetres(
      PXCMImage.ImageData imageData,
      UInt16 minimumDistanceMm,
      ulong length)
    {
      bool found = false;

      unsafe
      {
        UInt16* ptr = (UInt16*)imageData.planes[0].ToPointer();

        for (ulong i = 0; ((i < length) && !found); i++, ptr++)
        {
          found = (*ptr > 0) && (*ptr < minimumDistanceMm);
        }
      }
      return (found);
    }
    static void Main(string[] args)
    {
      const int minimumDistance = 300; // mm

      Console.WriteLine("Hit a key to end...");

      using (PXCMSenseManager senseManager = PXCMSenseManager.CreateInstance())
      {
        // I don't mind dropping frames that arrive while I'm processing the current frame -
        // that is, the system can drop frames that arrive in between my AquireFrame() ->
        // ReleaseFrame() calls, I'll live with it. Only learnt about this 'realtime'
        // mode which I should have perhaps known about previously! 🙂
        senseManager.captureManager.SetRealtime(false);

        senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_DEPTH, 0, 0).ThrowOnFail();

        senseManager.Init().ThrowOnFail();

        while (!Console.KeyAvailable)
        {
          if (senseManager.AcquireFrame().Succeeded())
          {
            PXCMCapture.Sample sample = senseManager.QuerySample();
            PXCMImage.ImageData imageData;

            sample.depth.AcquireAccess(
              PXCMImage.Access.ACCESS_READ, 
              PXCMImage.PixelFormat.PIXEL_FORMAT_DEPTH, 
              out imageData).ThrowOnFail();

            if (UnsafeScanForMinimumDistanceMillimetres(
              imageData, 
              minimumDistance,
              (ulong)(sample.depth.info.width * sample.depth.info.height)))
            {
              Console.WriteLine("{0:HH:mm:ss:ff}, Saw something within 100mm of the camera, PANIC!",
                DateTime.Now);
            }

            sample.depth.ReleaseAccess(imageData);

            senseManager.ReleaseFrame();
          }
        }
      }
    }
  }
}

and this is going through the steps of;

  1. Make a PXCMSenseManager instance
  2. Ask it to enable the depth stream
  3. Initialise it
  4. Wait for a frame of data via AcquireFrame (not 100% sure on this step)
  5. Access the depth image data
  6. Attempt to run a quick filter over it looking for anything that claims to be within the threshold distance (300mm in the code above) of the camera.

and I can wave my hand around in front of the camera to see this in action;

[screenshot: console output logging whenever something comes within the threshold distance]

and it all seems to work quite nicely. I wanted to then make this graphical again and so I conjured up another WPF application with a simple UI of an image and a slider;

<Window x:Class="WpfApplication2.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow"
        Height="350"
        Width="525">
  <Grid>
    <Grid.RowDefinitions>
      <RowDefinition />
      <RowDefinition Height="Auto" />
    </Grid.RowDefinitions>
    <Image x:Name="displayImage" />
    <StackPanel Grid.Row="1"
                Margin="10">
      <Slider x:Name="slider"
              Minimum="100"
              Maximum="1000"
              Margin="10" 
              SmallChange="10"
              LargeChange="50"
              ValueChanged="OnSliderValueChanged"/>
      <TextBlock HorizontalAlignment="Center"
                 TextAlignment="Center">
        <Run>Maximum Distance (mm) - currently </Run>
        <Run Text="{Binding ElementName=slider,Path=Value}" />
        <Run>mm</Run>
      </TextBlock>
    </StackPanel>
  </Grid>
</Window>

and then added a couple of little helper classes again so that I could deal with the pxcmStatus values that the SDK loves so much 🙂

namespace WpfApplication2
{
  using System;

  class PXCMStatusException : Exception
  {
    public PXCMStatusException(pxcmStatus status)
    {
      this.Status = status;
    }
    public pxcmStatus Status { get; private set; }
  }
  static class PXCMStatusExtensions
  {
    public static void ThrowOnFail(this pxcmStatus status)
    {
      if (!status.Succeeded())
      {
        throw new PXCMStatusException(status);
      }
    }
    public static bool Succeeded(this pxcmStatus status)
    {
      return (status == pxcmStatus.PXCM_STATUS_NO_ERROR);
    }
  }
}

and then put some code behind my UI such that it displays the data coming from the depth camera, filtered to only include data within a certain range (defined in mm) of the camera. I ended up with;

namespace WpfApplication2
{
  using System;
  using System.Windows;
  using System.Windows.Media;
  using System.Windows.Media.Imaging;

  public partial class MainWindow : Window
  {
    public MainWindow()
    {
      InitializeComponent();
      this.writeableBitmap = new Lazy<WriteableBitmap>(OnCreateWriteableBitmap);

      this.Loaded += OnLoaded;
    }
    void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.senseManager = PXCMSenseManager.CreateInstance();

      senseManager.captureManager.SetRealtime(false);

      senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_DEPTH, 0, 0).ThrowOnFail();

      senseManager.Init(
        new PXCMSenseManager.Handler()
        {
          onNewSample = this.OnNewSample
        }).ThrowOnFail();

      senseManager.StreamFrames(false);
    }
    pxcmStatus OnNewSample(int mid, PXCMCapture.Sample sample)
    {
      // this is not the UI thread.
      PXCMImage.ImageData imageData;

      if (sample.depth.AcquireAccess(
        PXCMImage.Access.ACCESS_READ_WRITE, 
        PXCMImage.PixelFormat.PIXEL_FORMAT_DEPTH, 
        out imageData)
        .Succeeded())
      {
        if (!this.imageDimensions.HasArea)
        {
          this.imageDimensions.Width = sample.depth.info.width;
          this.imageDimensions.Height = sample.depth.info.height;
        }

        this.FilterAndScale(
          imageData, 
          this.minimumDistanceMm,
          (ulong)(this.imageDimensions.Width * this.imageDimensions.Height));

        Dispatcher.InvokeAsync(() =>
          {
            this.writeableBitmap.Value.WritePixels(
              this.imageDimensions,
              imageData.planes[0],
              this.imageDimensions.Width * this.imageDimensions.Height * 2,
              this.imageDimensions.Width * 2);   

            // tbh - ok to release this from dispatcher thread when I acquired it
            // on a different thread?
            sample.depth.ReleaseAccess(imageData);
          }
        );
      }
      return (pxcmStatus.PXCM_STATUS_NO_ERROR);
    }
    void FilterAndScale(PXCMImage.ImageData imageData, 
      UInt16 filterMinimumValueMm,
      ulong length)
    {
      unsafe
      {
        UInt16* ptr = (UInt16*)imageData.planes[0].ToPointer();

        for (ulong i = 0; (i < length); i++, ptr++)
        {
          if (*ptr >= filterMinimumValueMm)
          {
            *ptr = 0;
          }
          else if (*ptr != 0)
          {
            *ptr = Math.Min(MAX_DISTANCE_CLAMP_MM, *ptr);
            *ptr = (UInt16)((double)*ptr / (double)MAX_DISTANCE_CLAMP_MM * UInt16.MaxValue);
          }
        }
      }
    }
    void OnSliderValueChanged(object sender, RoutedPropertyChangedEventArgs<double> e)
    {
      this.minimumDistanceMm = (UInt16)e.NewValue;
    }
    WriteableBitmap OnCreateWriteableBitmap()
    {
      var bitmap = new WriteableBitmap(
          this.imageDimensions.Width,
          this.imageDimensions.Height,
          96,
          96,
          PixelFormats.Gray16,
          null);

      this.displayImage.Source = bitmap;

      return (bitmap);
    }
    const UInt16 MAX_DISTANCE_CLAMP_MM = 1000;
    UInt16 minimumDistanceMm;
    Int32Rect imageDimensions;
    Lazy<WriteableBitmap> writeableBitmap;
    PXCMSenseManager senseManager;
  }
}

and this is (attempting) to do more or less the same thing as the console application except that;

  1. it receives data in an event-driven manner via the PXCMSenseManager.Init call rather than trying to call AcquireFrame.
  2. in order to line up with WPF’s image drawing capabilities (and to avoid asking for the depth data in two formats), the code attempts to scale the depth values (which it assumes range from 0 to 1000mm) to fit into a grey-scale spectrum of 0…UInt16.MaxValue while also filtering out any values that exceed the maximum distance threshold as defined by the slider in the UI (there’s a worked example of this scaling just after the list).
  3. the code has to deal with the hassle of handling UI threads and dispatchers and I made the cheap decision here to transition to the UI thread for every new image frame that I process which might not be the best plan.
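
To make that scaling concrete: with MAX_DISTANCE_CLAMP_MM at 1000, a pixel measured at 250mm becomes (250 / 1000) × 65535 ≈ 16384 – roughly a quarter-brightness grey – whereas anything at or beyond the slider’s threshold is zeroed to black, so nearer objects render darker and the furthest visible objects render brighter.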

I can then run this code and slide the slider from a maximum distance of 100mm to 500mm+ and gradually ‘reveal’ myself as being in front of the camera (presumably, nose first!);

[screenshots: the filtered depth image at increasing maximum distances, gradually revealing more of the scene]

and that seems to work quite nicely. I’ve dropped the code for the WPF application here in case anyone wants it.