Intel RealSense Camera (F200): Baby Steps with Object Tracking

Continuing with this set of posts around the RealSense camera, I thought that I’d dip my toe into the area of object tracking.

The RealSense SDK integrates technology from metaio which enables it to track objects as they pass by the camera and offers different approaches;

  • 2D object tracking based on a reference image of the object to be recognised in the scene
  • 3D object tracking based on a CAD model of the object in question leading to edge-based tracking
  • 3D object tracking based on a ‘point cloud’ scan of the object in question leading to feature-based tracking
  • instant 3D object tracking based on a point cloud that’s created from the scene in real time

There’s more on this in the SDK docs.

Not feeling overly brave, I thought I’d start off my experiments with what sounded simplest and have a look at 2D object tracking based on a reference image.

There’s a module in the framework called PXCMTracker which you can use to enable tracking and to which you can feed reference images, either by loading them from a file or by passing existing PXCMImage instances.
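As a sketch of what that looks like (this assumes the SDK’s C# wrapper, elides all error handling, and the file name "myReferenceImage.png" is just a made-up example);

```csharp
// Sketch only - assumes the RealSense SDK's C# wrapper is referenced.
PXCMSenseManager senseManager = PXCMSenseManager.CreateInstance();

// Enable a colour stream plus the tracking module.
senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_COLOR, 1280, 720);
senseManager.EnableTracker();

PXCMTracker tracker = senseManager.QueryTracker();

int cosID;

// Either load the reference image from a file (hypothetical file name)...
tracker.Set2DTrackFromFile("myReferenceImage.png", out cosID);

// ...or pass an existing PXCMImage instance via Set2DTrackFromImage,
// which is the route the code later in this post takes.
```

The cosID handed back is the identifier that later query calls use to tell you which of your reference images has been spotted in the scene.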

I figured that I was in a ‘reasonable’ position to be able to grab images from the live video stream based on the code that I’d written for previous posts like this one and so I thought I’d try and write code which;

  1. displayed the video feed from the camera
  2. allowed the user to ‘swipe’ areas of the feed with the mouse and then click a button in order to pass that to the SDK as an image to be tracked
  3. displayed bounding boxes around objects that the SDK then identified as ‘tracked’ based on the images that it’s been told to look out for

You can see that this worked out fairly well in the video below;

Update – Apologies, on migrating this blog I realized that Vimeo ate the video above and it’s now lost to history, sorry!

As I suggest in the video, I’ve had mixed success in trying to get this to work but, when it does work, it seems to work pretty well in that the object remains tracked. I think it’s all about the initial ‘quality’ of the image that you feed to the algorithm, along with light, reflections, noise and so on – there’s a reasonable section in the SDK docs about ‘Known Limitations’.

In terms of making this work, it’s not too complex – it runs something like;

  • Ask the PXCMSenseManager to EnableTracking()
  • Grab the PXCMTracker module and (in my case) feed it images to be tracked via its Set2DTrackFromImage method
  • Query the PXCMTracker module as frames come in, calling QueryAllTrackingValues (or one of the other query methods) to get hold of TrackingValues which tell you where any of the images you are tracking have been picked up in the scene; for each image, there’s information including rotation, translation and so on.
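Condensed right down (and with error handling elided), that per-frame querying step runs along these lines – assuming trackerModule came back from QueryTracker() as above;

```csharp
// Sketch only - called as each frame arrives.
PXCMTracker.TrackingValues[] trackingValues;

if (trackerModule.QueryAllTrackingValues(out trackingValues).Succeeded())
{
  foreach (var value in trackingValues)
  {
    // Each entry carries the cosID handed back when the reference image
    // was registered, plus details like the translation of the tracked
    // object within the image.
    System.Diagnostics.Debug.WriteLine(
      "{0}: tracked at image position ({1},{2})",
      value.cosID,
      value.translationImage.x,
      value.translationImage.y);
  }
}
```

The full code later in the post does essentially this, stashing the translations into a dictionary keyed off the cosID.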

I followed the same sort of approach to structuring the code for this as I’ve used in the previous posts (for good/bad) and so my main UI became just this one new control;

<UserControl x:Class="WpfApplication2.Controls.ObjectTrackingVideoControl"
             xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
             xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
             xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
             xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
             mc:Ignorable="d"
             d:DesignHeight="300"
             d:DesignWidth="300">
  <Grid>
    <Image x:Name="displayImage" />

    <Canvas x:Name="canvas" Background="Transparent" />
    <Button x:Name="addButton"
            Content="add"
            HorizontalContentAlignment="Center"
            VerticalContentAlignment="Center"
            FontSize="24"
            HorizontalAlignment="Left"
            VerticalAlignment="Top"
            Margin="20"/>

    <ItemsControl HorizontalAlignment="Stretch"
                  VerticalAlignment="Bottom"
                  Height="96"
                  Opacity="0.5"
                  x:Name="listImages">
      <ItemsControl.ItemsPanel>
        <ItemsPanelTemplate>
          <StackPanel Orientation="Horizontal" />
        </ItemsPanelTemplate>
      </ItemsControl.ItemsPanel>

    </ItemsControl>
  </Grid>
</UserControl>

where the video is displayed on the Image called displayImage and any other content, like tracking rectangles, is drawn on the Canvas. Additionally, any images that the user has asked to track are added to the ItemsControl at the bottom of the screen just so that I can be confident of which images my own code has passed to the SDK.

The code that lives behind that control is a bit sketchy – it needs refactoring and it’s probably maintaining way too many lists of things – but it’s below as I have it at the time of writing; I was really just prototyping;

namespace WpfApplication2.Controls
{
  using System.Collections.Generic;
  using System.Linq;
  using System.Windows;
  using System.Windows.Controls;
  using System.Windows.Media;
  using System.Windows.Media.Imaging;
  using System.Windows.Shapes;

  public partial class ObjectTrackingVideoControl : UserControl, ISampleRenderer
  {
    public ObjectTrackingVideoControl()
    {
      InitializeComponent();

      this.mouseTracker = new MouseTracking(this.canvas, this.addButton);

      this.addedTrackingRectangles = new Dictionary<int, Rect>();
      this.drawnTrackingRectangles = new List<Rectangle>();
    }
    public void Initialise(PXCMSenseManager senseManager)
    {
      this.senseManager = senseManager;

      this.senseManager.captureManager.SetRealtime(false);

      this.senseManager.EnableStream(
        PXCMCapture.StreamType.STREAM_TYPE_COLOR, 1280, 720).ThrowOnFail();

      // switch on tracking and then keep a handle to the module for our
      // duration. somewhat surprised to not have to choose what *type*
      // of tracking I'm looking for.
      this.senseManager.EnableTracker().ThrowOnFail();

      this.trackerModule = this.senseManager.QueryTracker();
    }
    public void ProcessSampleWorkerThread(PXCMCapture.Sample sample)
    {
      this.currentImageData = null;

      PXCMImage.ImageData imageData;

      if (sample.color != null)
      {
        if (sample.color.AcquireAccess(
          PXCMImage.Access.ACCESS_READ,
          PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32,
          out imageData).Succeeded())
        {
          this.InitialiseImageDimensions(sample);

          this.currentImageData = imageData;
        }
      }
      this.currentTrackingTranslations = null;

      PXCMTracker.TrackingValues[] trackingValues;

      if (this.trackerModule.QueryAllTrackingValues(out trackingValues).Succeeded() &&
        (trackingValues.Length > 0))
      {
        this.currentTrackingTranslations = trackingValues.ToDictionary(
          tv => tv.cosID, tv => tv.translationImage);
      }
    }
    public void RenderUI(PXCMCapture.Sample sample)
    {
      if (this.currentImageData != null)
      {
        this.InitialiseImage();

        this.writeableBitmap.WritePixels(
          this.imageDimensions,
          this.currentImageData.planes[0],
          this.imageDimensions.Width * this.imageDimensions.Height * 4,
          this.imageDimensions.Width * 4);

        Rect rectangleOfUserAddedImage;

        if (this.mouseTracker.GetUserAddedRectangle(out rectangleOfUserAddedImage))
        {
          this.AddNewTrackingImageByRectangle(rectangleOfUserAddedImage);
        }

        // remove any rectangles drawn for the previous frame and forget them,
        // otherwise this list grows on every frame.
        foreach (var rectangle in this.drawnTrackingRectangles)
        {
          this.canvas.Children.Remove(rectangle);
        }
        this.drawnTrackingRectangles.Clear();
        if (
          (this.addedTrackingRectangles.Count() > 0) &&
          (this.currentTrackingTranslations != null))
        {
          foreach (var trackedTranslation in this.currentTrackingTranslations)
          {
            Rectangle rectangle = new Rectangle()
            {
              Stroke = Brushes.Yellow,
              StrokeThickness = 2,
              Width = this.addedTrackingRectangles[trackedTranslation.Key].Width,
              Height = this.addedTrackingRectangles[trackedTranslation.Key].Height
            };
            Canvas.SetLeft(rectangle, trackedTranslation.Value.x -
              rectangle.Width / 2);
            Canvas.SetTop(rectangle, trackedTranslation.Value.y -
              rectangle.Height / 2);

            this.canvas.Children.Add(rectangle);
            this.drawnTrackingRectangles.Add(rectangle);
          }
        }
        sample.color.ReleaseAccess(this.currentImageData);

        this.currentImageData = null;
      }
    }
    void AddNewTrackingImageByRectangle(Rect trackingRectangle)
    {
      PXCMImage newImage = this.senseManager.session.CreateImage(
        new PXCMImage.ImageInfo()
        {
          format = PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32,
          width = (int)trackingRectangle.Width,
          height = (int)trackingRectangle.Height
        });

      PXCMImage.ImageData newImageData;

      newImage.AcquireAccess(PXCMImage.Access.ACCESS_WRITE, out newImageData);

      WriteableBitmap cropped = this.writeableBitmap.Crop(
        (int)trackingRectangle.Left,
        (int)trackingRectangle.Top,
        (int)trackingRectangle.Width,
        (int)trackingRectangle.Height);

      this.listImages.Items.Add(new Image() { Source = cropped });

      var bits = cropped.ToByteArray();

      newImageData.FromByteArray(0, bits);

      newImage.ReleaseAccess(newImageData);

      int trackingId;

      this.trackerModule.Set2DTrackFromImage(newImage, out trackingId).ThrowOnFail();

      this.addedTrackingRectangles[trackingId] = trackingRectangle;
    }
    void InitialiseImageDimensions(PXCMCapture.Sample sample)
    {
      if (!this.imageDimensions.HasArea)
      {
        this.imageDimensions.Width = sample.color.info.width;
        this.imageDimensions.Height = sample.color.info.height;

        this.senseManager.captureManager.device.SetMirrorMode(
          PXCMCapture.Device.MirrorMode.MIRROR_MODE_HORIZONTAL);
      }
    }
    void InitialiseImage()
    {
      if (this.writeableBitmap == null)
      {
        this.writeableBitmap = new WriteableBitmap(
          this.imageDimensions.Width,
          this.imageDimensions.Height,
          96,
          96,
          PixelFormats.Bgra32,
          null);

        this.displayImage.Source = this.writeableBitmap;
      }
    }
    List<Rectangle> drawnTrackingRectangles;
    Dictionary<int, Rect> addedTrackingRectangles;
    Dictionary<int, PXCMPointF32> currentTrackingTranslations;
    MouseTracking mouseTracker;
    PXCMSenseManager senseManager;
    PXCMImage.ImageData currentImageData;
    Int32Rect imageDimensions;
    WriteableBitmap writeableBitmap;
    PXCMTracker trackerModule;
  }
}

that makes use of a little class, which is also fairly hacky, to track the mouse movements;

namespace WpfApplication2
{
  using System.Windows;
  using System.Windows.Controls;
  using System.Windows.Media;
  using System.Windows.Shapes;

  class MouseTracking
  {
    public MouseTracking(Canvas canvas, Button addButton)
    {
      this.canvas = canvas;
      this.canvas.MouseDown += this.OnMouseDown;
      this.canvas.MouseUp += this.OnMouseUp;
      this.canvas.MouseMove += this.OnMouseMove;
      addButton.Click += this.OnUserAddsRectangle;
    }
    void OnUserAddsRectangle(object sender, RoutedEventArgs e)
    {
      if ((this.rectangle != null) && (this.rectangle.Width > 0) &&
        (this.rectangle.Height > 0))
      {
        this.rectangleReady = true;
      }
    }
    void RemoveRectangle()
    {
      if (this.rectangle != null)
      {
        this.canvas.Children.Remove(this.rectangle);
        this.rectangle = null;
      }
    }
    void OnMouseMove(object sender, System.Windows.Input.MouseEventArgs e)
    {
      if ((e.LeftButton == System.Windows.Input.MouseButtonState.Pressed) &&
        this.mouseTracking)
      {
        // guard against a leftwards/upwards drag producing a negative
        // width/height, which WPF would throw on.
        this.rectangle.Width = System.Math.Max(0,
          e.GetPosition(this.canvas).X - Canvas.GetLeft(this.rectangle));
        this.rectangle.Height = System.Math.Max(0,
          e.GetPosition(this.canvas).Y - Canvas.GetTop(this.rectangle));
      }
    }
    void OnMouseUp(object sender, System.Windows.Input.MouseButtonEventArgs e)
    {
      this.mouseTracking = false;
    }
    void OnMouseDown(object sender, System.Windows.Input.MouseButtonEventArgs e)
    {
      if (e.LeftButton == System.Windows.Input.MouseButtonState.Pressed)
      {
        this.RemoveRectangle();

        this.rectangle = new Rectangle()
        {
          Stroke = RECTANGLE_STROKE,
          Fill = RECTANGLE_FILL,
          StrokeThickness = RECTANGLE_STROKE_THICKNESS
        };
        Canvas.SetLeft(this.rectangle, e.GetPosition(this.canvas).X);
        Canvas.SetTop(this.rectangle, e.GetPosition(this.canvas).Y);
        this.canvas.Children.Add(this.rectangle);
        this.mouseTracking = true;
      }
    }
    public bool GetUserAddedRectangle(out Rect rect)
    {
      bool returned = (this.rectangle != null) && this.rectangleReady;
      rect = new Rect();

      if (returned)
      {
        rect = new Rect(
          Canvas.GetLeft(this.rectangle),
          Canvas.GetTop(this.rectangle),
          this.rectangle.Width,
          this.rectangle.Height
        );
        this.rectangleReady = false;
        this.RemoveRectangle();
      }
      return (returned);
    }
    Canvas canvas;
    Rectangle rectangle;
    bool mouseTracking;
    bool rectangleReady;
    static readonly Brush RECTANGLE_STROKE = Brushes.White;
    static readonly int RECTANGLE_STROKE_THICKNESS = 2;
    static readonly Brush RECTANGLE_FILL = new SolidColorBrush(Color.FromArgb(0x55, 0x00, 0x55, 0x55));
  }
}

which could also do with some rework (e.g. passing the Canvas and the Button into the c’tor here just feels plain evil but it’s a lot quicker than coming up with viewmodels and so on when you’re just experimenting).

If you want the code for download then it’s here. I’d like to see if I can do something with the 3D tracking modes if I can find/make suitable 3D models to try out with…