Intel RealSense Camera (F200): ‘Hello World’ Part 2

Picking up from my previous post, I thought I'd see if I could get a simple stream of video data from the RealSense camera.

The SDK docs walk you through the architecture of coding against the camera which is done via a series of configured modules;

[Figure: the SDK's module architecture]

with the I/O modules pushing data to/from a device and the algorithm modules doing work like facial recognition and so on.

All the modules derive from this PXCMBase class;

public class PXCMBase : IDisposable
{
  public const int CUID = 0;
  public const int WORKING_PROFILE = -1;
  public IntPtr instance;
  protected int refcount;
  protected static Dictionary<Type, int> Type2CUID;
  public virtual void Dispose();
  public TT QueryInstance<TT>() where TT : PXCMBase;
  public PXCMBase QueryInstance(int cuid);
}

and a module has some kind of unique identifier (a CUID, although I'm not sure what the 'C' stands for), modules are disposable, and there's a way to navigate from one module to another via the QueryInstance method, which feels a little like a specific version of COM's QueryInterface function.
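
As an illustration only (the specific module type here is my choice rather than something from the docs, and whether a particular query succeeds depends on what the underlying instance actually supports), my reading is that navigation looks something like this;

// Illustrative sketch - QueryInstance<TT>() asks the same underlying instance
// for a different module interface, much like QueryInterface in COM; my
// assumption is that null comes back if the requested module isn't supported.
PXCMSenseManager senseManager = PXCMSenseManager.CreateInstance();

PXCMCaptureManager captureManager =
  senseManager.QueryInstance<PXCMCaptureManager>();

if (captureManager != null)
{
  // work with the capture manager here...
}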

The programming model's a bit bewildering at first because it feels like there are lots of ways of doing the same thing with more or less control and granularity. I think the simplest model is to use the PXCMSenseManager, as it seems to provide a higher-level programming model where a number of pieces are pre-configured for you, and so I started there.

I figured I'd write a little bit of 'UI';

<Window x:Class="HelloRealSense.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow"
        Height="350"
        Width="525">
  <Grid>
    <Image x:Name="screenImage" />
    <StackPanel VerticalAlignment="Bottom"
                HorizontalAlignment="Center"
                Orientation="Horizontal">
      <StackPanel.Resources>
        <Style TargetType="Button">
          <Setter Property="Margin"
                  Value="5" />
          <Setter Property="Padding"
                  Value="5" />
        </Style>
        <Style TargetType="RadioButton">
          <Setter Property="Margin"
                  Value="5" />
        </Style>
      </StackPanel.Resources>
      <Button Content="Start"
              Click="OnStartButton" />
      <Button Content="Stop"
              Click="OnStopButton" />
      <StackPanel Orientation="Horizontal" VerticalAlignment="Center">
        <RadioButton Content="Color" x:Name="radioColor" IsChecked="True" />
        <RadioButton Content="IR" x:Name="radioIR" />
        <RadioButton Content="Depth" x:Name="radioDepth"/>
      </StackPanel>
    </StackPanel>
  </Grid>
</Window>

and that then gives me Start/Stop buttons, an image to display things and some radio buttons via which I can choose to display color/depth/infra-red data.

I then sketched out a bit of code. Here's the start of my class with the implementation of the start button handler;

namespace HelloRealSense
{
  using System;
  using System.Collections.Concurrent;
  using System.Windows;
  using System.Windows.Media;
  using System.Windows.Media.Imaging;
  using System.Linq;

  public partial class MainWindow : Window
  {
    Int32Rect sampleImageDimensions;
    ConcurrentQueue<PXCMCapture.Sample> sampleQueue;
    PXCMSenseManager senseManager;
    WriteableBitmap writeableBitmap;
    bool stopped;

    public MainWindow()
    {
      InitializeComponent();
      this.sampleQueue = new ConcurrentQueue<PXCMCapture.Sample>();
    }
    void OnStartButton(object sender, RoutedEventArgs e)
    {
      this.stopped = false;

      this.senseManager = PXCMSenseManager.CreateInstance();

      var radioButtonsAndStreamTypes = new[]
      {
        Tuple.Create(this.radioColor.IsChecked, PXCMCapture.StreamType.STREAM_TYPE_COLOR),
        Tuple.Create(this.radioDepth.IsChecked, PXCMCapture.StreamType.STREAM_TYPE_DEPTH),
        Tuple.Create(this.radioIR.IsChecked, PXCMCapture.StreamType.STREAM_TYPE_IR)
      };
      var streamType = radioButtonsAndStreamTypes.Single(r => (bool)r.Item1).Item2;

      this.senseManager.EnableStream(streamType, 0, 0);

      this.senseManager.Init(
        new PXCMSenseManager.Handler()
        {
          onNewSample = this.OnNewSample
        }
      );
      this.senseManager.StreamFrames(false);
    } 

The startup code essentially just creates an instance of PXCMSenseManager and then asks it to enable a STREAM_TYPE_COLOR, DEPTH or IR stream based on the currently checked radio button. It doesn't make any requests around the required resolution or the frame rate (both of which are options in the call to EnableStream above).
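
If I did want to be specific, my understanding is that EnableStream has overloads which take a width, height and frame rate, so asking for (say) 640x480 colour at 30 frames per second would look something along these lines (the values are purely for illustration);

      // Assumed overload taking width, height and frames-per-second - the
      // particular values here are just illustrative.
      this.senseManager.EnableStream(
        PXCMCapture.StreamType.STREAM_TYPE_COLOR,
        640,      // width in pixels
        480,      // height in pixels
        30.0f);   // frame rate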

It then initialises the PXCMSenseManager and passes it a handler which contains a delegate pointing to a method to be called whenever a new sample arrives (OnNewSample). It’s a bit odd that this isn’t defined as a .NET event, but there you go.

With that set up, the sense manager is told to StreamFrames (without blocking – the false parameter – as I need WPF to go back to its dispatcher loop).
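
The XAML above also wires up an OnStopButton handler which I haven't shown; a minimal sketch of it (assuming that setting my stopped flag and then calling Close/Dispose on the PXCMSenseManager is enough to shut the pipeline down) might look like;

    void OnStopButton(object sender, RoutedEventArgs e)
    {
      // flag checked by OnNewSample below so that no more frames get queued
      this.stopped = true;

      if (this.senseManager != null)
      {
        // assumption - Close() stops the streaming pipeline and Dispose() comes
        // from PXCMBase being IDisposable
        this.senseManager.Close();
        this.senseManager.Dispose();
        this.senseManager = null;
      }
    }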

The OnNewSample function is a simple one in that it queues up items onto a concurrent queue once it's checked that things are still running;

    pxcmStatus OnNewSample(int mid, PXCMCapture.Sample sample)
    {
      pxcmStatus status = pxcmStatus.PXCM_STATUS_PARAM_UNSUPPORTED;

      if (this.stopped)
      {
        status = pxcmStatus.PXCM_STATUS_NO_ERROR;
      }
      else if (mid == PXCMCapture.CUID)
      {
        status = pxcmStatus.PXCM_STATUS_NO_ERROR;

        this.sampleQueue.Enqueue(sample);

        this.Dispatcher.InvokeAsync(this.DrainQueueUIThread);
      }
      return (status);
    }

the function also kicks the dispatcher to call the DrainQueueUIThread function to make sure that items that have been queued get processed on the UI thread. I took this route because OnNewSample seemed to have no notion of capturing the synchronization context. There's definitely an optimisation to be made here, which would be to not queue up frames if they can't be consumed in a timely fashion and perhaps just process the last frame that's arrived.
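
As a rough sketch of what that optimisation might look like (this is hypothetical and not part of the code above - the latestSample field and the DrawLatestSampleUIThread method are my invention), the queue could be swapped for a single 'latest sample' slot;

    // Hypothetical alternative - keep only the most recent sample rather than a
    // queue, dropping any frame that the UI thread hasn't yet drawn.
    PXCMCapture.Sample latestSample;

    pxcmStatus OnNewSampleLatestOnly(int mid, PXCMCapture.Sample sample)
    {
      if (!this.stopped && (mid == PXCMCapture.CUID))
      {
        // overwrite whatever was there before - an older, undrawn frame is simply lost
        System.Threading.Interlocked.Exchange(ref this.latestSample, sample);

        this.Dispatcher.InvokeAsync(this.DrawLatestSampleUIThread);
      }
      return (pxcmStatus.PXCM_STATUS_NO_ERROR);
    }

    void DrawLatestSampleUIThread()
    {
      // take the latest sample (if any) and render it in the same way as DrainQueueUIThread below
      var sample = System.Threading.Interlocked.Exchange(ref this.latestSample, null);

      if (sample != null)
      {
        // ...AcquireAccess / WritePixels / ReleaseAccess as in DrainQueueUIThread...
      }
    }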

The function that drains that queue on the UI thread looks like this;

    void DrainQueueUIThread()
    {
      PXCMCapture.Sample sample;

      while (this.sampleQueue.TryDequeue(out sample))
      {
        PXCMImage.ImageData imageData;
        PXCMImage image = PickFirstImageAvailableInSample(sample);

        pxcmStatus status =
          image.AcquireAccess(PXCMImage.Access.ACCESS_READ, PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32, out imageData);

        if (Succeeded(status))
        {
          this.EnsureWriteableBitmapCreated(image.info.width, image.info.height);

          this.writeableBitmap.WritePixels(
            this.sampleImageDimensions,
            imageData.planes[0],
            this.writeableBitmap.PixelWidth * this.writeableBitmap.PixelHeight * 4,
            this.writeableBitmap.PixelWidth * 4);

          image.ReleaseAccess(imageData);
        }
      }
    }

and so it simply tries to drain the queue, acquires access to the image data from the SDK in (convenient) RGB32 format and then copies it across into a WriteableBitmap, which the next routine ensures has been created to match the size of whatever image is being displayed;

    void EnsureWriteableBitmapCreated(int width, int height)
    {
      if (this.writeableBitmap == null)
      {
        this.writeableBitmap = new WriteableBitmap(
          width,
          height,
          96,
          96,
          PixelFormats.Bgra32,
          null);

        this.screenImage.Source = this.writeableBitmap;

        this.sampleImageDimensions = new Int32Rect(0, 0, width, height);
      }
    }

Finally, I have another couple of little functions – this one to try and pick out whichever part of a sample is not null whether it be color, depth, IR, etc;

    static PXCMImage PickFirstImageAvailableInSample(PXCMCapture.Sample sample)
    {
      PXCMImage image = sample.color;
      image = image == null ? sample.depth : image;
      image = image == null ? sample.ir : image;
      image = image == null ? sample.left : image;
      image = image == null ? sample.right : image;
      return (image);
    }

and this one simply checks a status code;

    static bool Succeeded(pxcmStatus status)
    {
      return (status == pxcmStatus.PXCM_STATUS_NO_ERROR);
    }

and that's it – naturally, the code's a bit hacky and (e.g.) doesn't enable/disable UI when it needs to but, fundamentally, it grabs colour/depth/IR frames from the camera and displays them – here's the app running displaying depth data;

[Figure: the app running, displaying depth data]

In taking this approach though, I feel like I let a lot of 'higher level' pieces do work on my behalf and so perhaps didn't understand quite what they were doing for me. I'd like to revisit this in a similar post using the lower-level pieces to see if I can figure that out…