Adding to this set of posts around the RealSense camera and SDK, I wanted to see what it was like to combine streaming data from multiple sources and so I thought I’d experiment with displaying depth and colour data at the same time.
This scenario feels very much like programming with the Kinect v2 SDK, where you use a MultiSourceFrameReader in order to bring back synchronised data frames from the sensor across sources like colour, IR, depth, body and so on.
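For anyone who hasn’t seen the Kinect v2 side of that, the shape of it is roughly as below (a sketch from memory of the Kinect for Windows SDK 2.0, purely for comparison and not something this post’s code depends on);

using System;
using Microsoft.Kinect;

// a comparison sketch only - Kinect for Windows SDK 2.0 names from memory,
// not part of the RealSense sample in this post.
class KinectComparisonSketch
{
    static void Main()
    {
        var sensor = KinectSensor.GetDefault();
        sensor.Open();

        // one reader hands back synchronised frames across whichever sources we ask for
        var reader = sensor.OpenMultiSourceFrameReader(
            FrameSourceTypes.Color | FrameSourceTypes.Depth);

        reader.MultiSourceFrameArrived += (s, e) =>
        {
            var multiFrame = e.FrameReference.AcquireFrame();

            // (real code would check the frame references for null)
            using (var colorFrame = multiFrame.ColorFrameReference.AcquireFrame())
            using (var depthFrame = multiFrame.DepthFrameReference.AcquireFrame())
            {
                // colour and depth arrive here together, already correlated
            }
        };

        Console.ReadLine(); // keep the sketch alive while frames arrive
    }
}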
Building on what I’d been doing in the previous post, I marked out the simplest of XAML ‘UIs’ in WPF;
<Window x:Class="WpfApplication2.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow"
        Height="350"
        Width="525">
  <Grid>
    <Image x:Name="displayImage" />
  </Grid>
</Window>
just in order to give myself somewhere to render an image into, and then I wrote some code behind that window (no data-binding this time around). This code uses the same small set of extension methods that I listed in the previous post, so I won’t post them again here;
namespace WpfApplication2
{
    using System;
    using System.Windows;
    using System.Windows.Media;
    using System.Windows.Media.Imaging;

    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
            this.Loaded += OnLoaded;
        }
        void OnLoaded(object sender, RoutedEventArgs e)
        {
            this.senseManager = PXCMSenseManager.CreateInstance();
            this.senseManager.captureManager.SetRealtime(false);

            // asking for the same image size and frame rate range from both
            // color and depth streams.
            var frameRate = new PXCMRangeF32()
            {
                min = 30,
                max = 60
            };
            var size = new PXCMSizeI32()
            {
                width = 640,
                height = 480
            };
            var streamDesc = new PXCMVideoModule.StreamDesc()
            {
                frameRate = frameRate,
                sizeMin = size,
                sizeMax = size
            };
            this.senseManager.EnableStreams(
                new PXCMVideoModule.DataDesc()
                {
                    streams = new PXCMVideoModule.StreamDescSet()
                    {
                        color = streamDesc,
                        depth = streamDesc
                    }
                }
            ).ThrowOnFail();

            this.senseManager.Init(
                new PXCMSenseManager.Handler()
                {
                    onNewSample = this.OnNewSample
                }).ThrowOnFail();

            this.projection = this.senseManager.captureManager.device.CreateProjection();

            this.senseManager.StreamFrames(false);
        }
        pxcmStatus OnNewSample(int mid, PXCMCapture.Sample sample)
        {
            // this is not the UI thread.
            PXCMImage.ImageData depthImage;
            PXCMImage.ImageData colorImage;

            var gotDepth = sample.depth.AcquireAccess(
                PXCMImage.Access.ACCESS_READ,
                PXCMImage.PixelFormat.PIXEL_FORMAT_DEPTH,
                out depthImage);

            if (gotDepth.Succeeded())
            {
                var gotColor = sample.color.AcquireAccess(
                    PXCMImage.Access.ACCESS_READ_WRITE,
                    PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32,
                    out colorImage);

                if (gotColor.Succeeded())
                {
                    this.InitialiseImageDimensions(sample.depth);
                    this.QueryUvMapAndConfidenceValue(sample.depth);
                    this.ApplyDepthToColor(depthImage, colorImage);

                    Dispatcher.InvokeAsync(() =>
                        {
                            this.InitialiseImage();

                            this.writeableBitmap.WritePixels(
                                this.imageDimensions,
                                colorImage.planes[0],
                                this.imageDimensions.Width * this.imageDimensions.Height * 4,
                                this.imageDimensions.Width * 4);

                            sample.color.ReleaseAccess(colorImage);
                            sample.depth.ReleaseAccess(depthImage);
                        }
                    );
                }
                else
                {
                    sample.depth.ReleaseAccess(depthImage);
                }
            }
            return (pxcmStatus.PXCM_STATUS_NO_ERROR);
        }
        void QueryUvMapAndConfidenceValue(PXCMImage depthImage)
        {
            if (this.uvMap == null)
            {
                this.uvMap = new PXCMPointF32[
                    this.imageDimensions.Width * this.imageDimensions.Height];

                this.depthLowConfidenceValue =
                    this.senseManager.captureManager.device.QueryDepthLowConfidenceValue();
            }
            this.projection.QueryUVMap(depthImage, this.uvMap).ThrowOnFail();
        }
        void InitialiseImageDimensions(PXCMImage image)
        {
            if (!this.imageDimensions.HasArea)
            {
                this.imageDimensions.Width = image.info.width;
                this.imageDimensions.Height = image.info.height;
            }
        }
        void InitialiseImage()
        {
            if (this.writeableBitmap == null)
            {
                this.writeableBitmap = new WriteableBitmap(
                    this.imageDimensions.Width,
                    this.imageDimensions.Height,
                    96,
                    96,
                    PixelFormats.Bgra32,
                    null);

                this.displayImage.Source = this.writeableBitmap;
            }
        }
        void ApplyDepthToColor(
            PXCMImage.ImageData depthData,
            PXCMImage.ImageData colorData)
        {
            unsafe
            {
                UInt16* depthPtr = (UInt16*)depthData.planes[0].ToPointer();
                UInt32* colorPtr = (UInt32*)colorData.planes[0].ToPointer();

                var width = this.imageDimensions.Width;
                var height = this.imageDimensions.Height;
                var length = width * height;

                for (int i = 0; i < length; i++)
                {
                    colorPtr[i] &= 0x22FFFFFF;
                }
                for (int y = 0; y < height; y++)
                {
                    for (int x = 0; x < width; x++)
                    {
                        int pixelIndex = (y * width) + x;

                        if (depthPtr[pixelIndex] != this.depthLowConfidenceValue)
                        {
                            // tbd - not sure I understand this yet but this map takes a
                            // u,v in the depth image and gives a 0..1.0,0..1.0 value
                            // to index into the width,height of the colour image.
                            // sometimes it comes back as > 1.0 though and sometimes
                            // it seems to come back as -1.0?
                            var mapped = this.uvMap[pixelIndex];

                            mapped.x = Math.Min(1.0f, mapped.x);
                            mapped.y = Math.Min(1.0f, mapped.y);

                            if ((mapped.x >= 0) && (mapped.y >= 0))
                            {
                                // clamp so that a mapped value of exactly 1.0 can't index
                                // one pixel past the edge of the colour image
                                var cx = Math.Min(width - 1, (int)(mapped.x * width));
                                var cy = Math.Min(height - 1, (int)(mapped.y * height));

                                colorPtr[(cy * width) + cx] |= 0xCC000000;
                            }
                        }
                    }
                }
            }
        }
        UInt16 depthLowConfidenceValue;
        PXCMPointF32[] uvMap;
        Int32Rect imageDimensions;
        WriteableBitmap writeableBitmap;
        PXCMSenseManager senseManager;
        PXCMProjection projection;
    }
}
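One practical note: ApplyDepthToColor uses raw pointers, so the project needs ‘allow unsafe code’ switched on to compile. The code also leans on the ThrowOnFail/Succeeded extension methods from the previous post; if you don’t have that to hand, they amount to something like the sketch below (my reconstruction of them, assuming the SDK reports success as PXCM_STATUS_NO_ERROR or above, rather than the exact listing);

using System;

// a reconstruction of the pxcmStatus helpers that the code above calls - assumes
// pxcmStatus values of PXCM_STATUS_NO_ERROR (0) or above mean 'not an error'.
static class StatusExtensions
{
    public static bool Succeeded(this pxcmStatus status)
    {
        return (status >= pxcmStatus.PXCM_STATUS_NO_ERROR);
    }
    public static pxcmStatus ThrowOnFail(this pxcmStatus status)
    {
        if (!status.Succeeded())
        {
            throw new InvalidOperationException(
                string.Format("RealSense call failed with status {0}", status));
        }
        return (status);
    }
}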
In short, what the code’s trying to do should be (hopefully) fairly transparent;
1. Create a PXCMSenseManager.
2. Configure it to capture both color and depth streams, asking for 640×480 at 30-60Hz, and to deliver both frames together rather than separately (the SDK seems to call that ‘aligned’).
3. Handle these frames in the OnNewSample function which;
    3.1 Grabs both images, depth in the PIXEL_FORMAT_DEPTH format and color in RGB32.
    3.2 Builds a map from the depth co-ordinate system to the color one.
    3.3 Filters the colour image by taking all depth pixels with a reasonable value, mapping them to their location in the colour image and then altering the alpha channel for those pixels to make them stand out (there’s a small sketch of that alpha trick just below the list).
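That alpha trick is just bit-masking of the 32-bit Bgra32 pixels; pulled out of ApplyDepthToColor into a standalone, purely illustrative snippet it amounts to this;

using System;

// a standalone illustration of the alpha masking that ApplyDepthToColor does above -
// each 32-bit colour pixel is laid out as 0xAARRGGBB (Bgra32 in the WriteableBitmap).
static class AlphaMaskSketch
{
    static UInt32 Whitewash(UInt32 pixel)
    {
        // keeps the RGB bytes and ANDs the alpha byte down to (alpha & 0x22)
        return (pixel & 0x22FFFFFF);
    }
    static UInt32 Highlight(UInt32 pixel)
    {
        // ORs the alpha byte back up to at least 0xCC so the pixel stands out
        return (pixel | 0xCC000000);
    }
    static void Main()
    {
        UInt32 opaqueRed = 0xFFFF0000;

        Console.WriteLine("{0:X8}", Whitewash(opaqueRed));            // 22FF0000
        Console.WriteLine("{0:X8}", Highlight(Whitewash(opaqueRed))); // EEFF0000
    }
}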
The only bit that I spent a little time on here was step 3.2 above because, in the first instance, I hadn’t quite realised that I’d need to map the depth->colour co-ordinates in this way. Much like programming with the Kinect v2 SDK, though, there are multiple cameras in play with multiple co-ordinate systems, and the naive approach of not mapping pixel values across left me with a depth image that was very clearly offset from the colour image.
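To make that mapping concrete: the uvMap that PXCMProjection.QueryUVMap fills in holds one normalised (u,v) value per depth pixel, and turning that into a colour pixel index is just a scale and a clamp. Here’s a standalone sketch of the same arithmetic that the code above does (the 640×480 size and the sample (u,v) value are purely illustrative);

using System;

// a standalone sketch of the depth->colour lookup inside ApplyDepthToColor above;
// the dimensions and the sample (u,v) value are illustrative only.
static class UvMapSketch
{
    static void Main()
    {
        int colourWidth = 640;
        int colourHeight = 480;

        // imagine this pair came back from uvMap[(y * depthWidth) + x] for some depth pixel
        float u = 0.5f;
        float v = 0.25f;

        // the code above treats negative values as 'no colour counterpart for this depth pixel'
        if ((u >= 0) && (v >= 0))
        {
            // scale the normalised 0..1 values into colour co-ordinates, clamping so that
            // a value of exactly 1.0 can't index one pixel past the edge of the image
            int cx = Math.Min(colourWidth - 1, (int)(u * colourWidth));
            int cy = Math.Min(colourHeight - 1, (int)(v * colourHeight));

            int colourIndex = (cy * colourWidth) + cx;

            Console.WriteLine("({0},{1}) -> pixel index {2}", cx, cy, colourIndex); // (320,120) -> pixel index 77120
        }
    }
}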
Running the app produces an effect like this – objects near to the sensor are highlighted, others are whitewashed;
I’ve put the code here for download if anyone wants it in the future; in the meantime, I want to try some other types of data like facial data or hand data in future posts…