Intel RealSense Camera (F200): Segmenting Video for ‘Green Screen’ Scenarios

Continuing the posts that I’ve written recently on the RealSense camera, I noticed a video-related feature in the SDK that I hadn’t experimented with yet: segmenting the user or users from their background.

Like a number of areas of the RealSense SDK, this has parallels with what I see in the Kinect v2 for Windows SDK in that both offer this idea of capturing imagery (whether that be from a video, depth or IR source) and then make it pretty easy for the developer to identify which pixels in the image make up one or more of the users and which pixels belong to the background.

In the Kinect v2 SDK, the bits deliver an image to your code alongside a synchronised ‘body index’ pixel map. The map enables you to look up any pixel (x,y) in the image and determine which of the 6 bodies that the sensor can capture might be located at that pixel.
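As an aside, a rough sketch of what that per-pixel lookup might look like against a Kinect v2 body index frame (this is just for contrast with the RealSense approach, error handling omitted);

using Microsoft.Kinect;

static class BodyIndexSketch
{
  // returns the body index (0..5) of the tracked body at pixel (x,y)
  // or 255 where no body is present.
  public static byte BodyIndexAtPixel(BodyIndexFrame frame, int x, int y)
  {
    var description = frame.FrameDescription;

    byte[] pixels = new byte[description.Width * description.Height];
    frame.CopyFrameDataToArray(pixels);

    return (pixels[(y * description.Width) + x]);
  }
}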

I’m not sure whether the RealSense SDK is quite as flexible as that; its approach seems to be to hand your code an image where the alpha channel of any pixel that makes up the background (rather than the user) is blanked out. That makes it easier to compute/display the image than it is with the Kinect v2 SDK, but it doesn’t seem to give you quite the same level of detail.
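That said, with the RealSense approach the ‘user or background?’ question for any pixel reduces to a check on its alpha byte. A minimal sketch, assuming a byte[] of BGRA data copied out of a segmented image (the names here are mine rather than the SDK’s);

static class SegmentationSketch
{
  // 'bgra' is assumed to hold (width * height * 4) bytes of BGRA pixels.
  public static bool IsBackgroundPixel(byte[] bgra, int width, int x, int y)
  {
    // the SDK blanks the alpha byte (to 0) for background pixels.
    return (bgra[(((y * width) + x) * 4) + 3] == 0);
  }
}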

Either way, I figured I’d try it out with the RealSense SDK and quickly (i.e. < 5 minutes) managed to take the code that I’d worked with in previous posts like this one, strip it back so that it was just displaying video and then change it so that it displayed segmented video rather than raw video.

In order to highlight the effect, I put the Big Buck Bunny video behind my user control so that it would shine through the blanked-out alpha channel. Here’s what that looked like: me with the bunny;

It’s pretty easy to do. I re-used the ‘framework’ that I’ve used in previous posts and just altered my UI to contain a MediaElement and a new control that follows my pattern for dealing with RealSense SDK data in order to display segmented video;

    <MediaElement Source="bunny.avi"
                  Stretch="Uniform"
                  Grid.Row="1"
                  Grid.Column="1"
                  x:Name="mediaElement"
                  LoadedBehavior="Play" />
    <controls:SegmentedVideoControl Grid.Row="1"
                                    Grid.Column="1" />

and that SegmentedVideoControl is just an Image;

<UserControl x:Class="WpfApplication2.Controls.SegmentedVideoControl"
             xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
             xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
             xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
             xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
             mc:Ignorable="d"
             d:DesignHeight="300"
             d:DesignWidth="300">
  <Grid>
    <Image x:Name="displayImage" />
  </Grid>
</UserControl>

with some code behind it to configure the PXCMSenseManager and, specifically here, to ask it to Enable3DSeg, which switches on the segmented video that I want to display. The only difference between this code and my previous video display code is that, rather than getting hold of the image data from the PXCMCapture.Sample instance, there’s a need to call PXCMSenseManager.Query3DSeg() to get hold of a PXCM3DSeg instance and then obtain the image from that by calling PXCM3DSeg.AcquireSegmentedImage().
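In outline, and paraphrasing the full control code that follows, the difference looks something like this (a sketch only; null checks and the surrounding frame acquire/release omitted);

// regular video - the colour image comes straight off the sample;
PXCMImage colourImage = sample.color;

// segmented video - ask the sense manager for the segmentation module
// and then ask that module for the image;
using (PXCM3DSeg seg = senseManager.Query3DSeg())
{
  PXCMImage segmentedImage = seg.AcquireSegmentedImage();
}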

So…getting hold of the image is a little more involved than in the regular video case, but it’s just a couple of additional calls and, once the image has been grabbed, handling it is the same as before. Here’s the code for that control;

namespace WpfApplication2.Controls
{
  using System.Windows;
  using System.Windows.Controls;
  using System.Windows.Media;
  using System.Windows.Media.Imaging;

  public partial class SegmentedVideoControl : UserControl, ISampleRenderer
  {
    public SegmentedVideoControl()
    {
      InitializeComponent();
    }
    public void Initialise(PXCMSenseManager senseManager)
    {
      this.senseManager = senseManager;

      this.senseManager.captureManager.SetRealtime(false);

      // ask for 1280x720 colour frames and switch on the segmentation module.
      this.senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_COLOR, 1280, 720).ThrowOnFail();

      this.senseManager.Enable3DSeg().ThrowOnFail();
    }
    public void ProcessSampleWorkerThread(PXCMCapture.Sample sample)
    {
      this.currentSegment = null;
      this.currentSegmentedData = null;

      // between calls to AcquireFrame/ReleaseFrame (as this code should be)
      // the docs say that this call will return a valid module instance
      // that can be used to grab the data.
      using (var seg = this.senseManager.Query3DSeg())
      {
        this.currentSegment = seg.AcquireSegmentedImage();

        if (this.currentSegment != null)
        {
          PXCMImage.ImageData imageData;

          // read access to the pixels as RGB32 (i.e. BGRA) - the alpha
          // channel is what carries the segmentation.
          if (this.currentSegment.AcquireAccess(
            PXCMImage.Access.ACCESS_READ,
            PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32,
            out imageData).Succeeded())
          {
            this.InitialiseImageDimensions();

            this.currentSegmentedData = imageData;
          }
        }
      }
    }
    public void RenderUI(PXCMCapture.Sample sample)
    {
      if (this.currentSegment != null)
      {
        // only draw if AcquireAccess succeeded on the worker thread.
        if (this.currentSegmentedData != null)
        {
          this.InitialiseImage();

          // copy the BGRA data into the bitmap - note, this assumes that
          // the pitch of the image is (width * 4) bytes.
          this.writeableBitmap.WritePixels(
            this.imageDimensions,
            this.currentSegmentedData.planes[0],
            this.imageDimensions.Width * this.imageDimensions.Height * 4,
            this.imageDimensions.Width * 4);

          this.currentSegment.ReleaseAccess(this.currentSegmentedData);
          this.currentSegmentedData = null;
        }
        this.currentSegment.Dispose();
        this.currentSegment = null;
      }
    }
    void InitialiseImageDimensions()
    {
      if (!this.imageDimensions.HasArea)
      {
        this.imageDimensions.Width = this.currentSegment.info.width;
        this.imageDimensions.Height = this.currentSegment.info.height;

        // flip the video horizontally so that it behaves like a mirror.
        this.senseManager.captureManager.device.SetMirrorMode(PXCMCapture.Device.MirrorMode.MIRROR_MODE_HORIZONTAL);
      }
    }
    void InitialiseImage()
    {
      if (this.writeableBitmap == null)
      {
        this.writeableBitmap = new WriteableBitmap(
          this.imageDimensions.Width,
          this.imageDimensions.Height,
          96,
          96,
          PixelFormats.Bgra32,
          null);

        this.displayImage.Source = this.writeableBitmap;
      }
    }
    PXCMSenseManager senseManager;
    PXCMImage currentSegment;
    PXCMImage.ImageData currentSegmentedData;
    Int32Rect imageDimensions;
    WriteableBitmap writeableBitmap;
  }
}
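
For completeness, the control above leans on a couple of pieces from my previous posts that aren’t listed here: the ISampleRenderer interface and some small extension methods around pxcmStatus. Purely as a sketch of roughly what those look like (the real versions are in the download below);

using System;

interface ISampleRenderer
{
  void Initialise(PXCMSenseManager senseManager);
  void ProcessSampleWorkerThread(PXCMCapture.Sample sample);
  void RenderUI(PXCMCapture.Sample sample);
}

static class PxcmStatusExtensions
{
  public static bool Succeeded(this pxcmStatus status)
  {
    // in the RealSense SDK, a zero or positive status indicates success.
    return (status >= pxcmStatus.PXCM_STATUS_NO_ERROR);
  }
  public static void ThrowOnFail(this pxcmStatus status)
  {
    if (!status.Succeeded())
    {
      throw new InvalidOperationException("RealSense call failed");
    }
  }
}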

and here’s the code for the whole thing (minus the Big Buck Bunny video) in case you want to play around with it.