I’ve got quite an old compact camera. It’s probably around 5 years old at this point and I often think that I should update it but, frankly, it does the job and so I continue to make use of it when I’m not just photographing things on my phone.
I remember that, at the time I bought the camera, it was one of the first I’d had which did face detection in the sense that it would draw a little box around the people it saw in the photos and would even try to identify them if you did some setup and gave it some names for familiar faces.
I’ve written quite a lot in the past around doing face detection with different technologies. For instance, there was this post which made use of the Kinect for Windows V2 in order to locate a face within video frames and monitor it for facial features such as eyes open/closed and so on. I also wrote this post around working with Intel’s RealSense SDK and analysing facial positions and features there.
But both of those approaches require special hardware and, in the RealSense case, can do quite high-fidelity facial recognition in the sense that depth data can be used to differentiate between a real face and a photograph of a face.
What if you don’t have that hardware? What if you just have a web cam?
I was looking into this in the light of two sets of APIs.
One set of APIs operates ‘on device’ and lives in the UWP namespace Windows.Media.FaceAnalysis where there are both a FaceDetector and a FaceTracker class.
The other set of APIs operates ‘in the cloud’ and is offered by Project Oxford with its Face APIs which offer a range of facial functionality.
I wanted to experiment and so I thought that I’d combine both. I started on the device.
On Device
There are official samples around these APIs (on GitHub) but I wanted to start from scratch and so in the first instance I tried to build something that displayed video from the camera on my device in a little UI.
Step 1 – Displaying a Video Stream from a Camera
I created a project, made sure that it had access to the webcam and then wrote a little UI which is essentially a Play/Stop button, a CaptureElement and then a couple of visual states to represent “Playing”/”Stopped”;
<Page x:Class="App183.MainPage"
      xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
      xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
      xmlns:local="using:App183"
      xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
      xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
      mc:Ignorable="d">
  <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <VisualStateManager.VisualStateGroups>
      <VisualStateGroup x:Name="FaceProcessingState">
        <VisualState x:Name="Stopped" />
        <VisualState x:Name="Playing">
          <VisualState.Setters>
            <Setter Target="btnPlay.(UIElement.Visibility)" Value="Collapsed" />
            <Setter Target="btnStop.(UIElement.Visibility)" Value="Visible" />
          </VisualState.Setters>
        </VisualState>
      </VisualStateGroup>
    </VisualStateManager.VisualStateGroups>
    <Grid.RowDefinitions>
      <RowDefinition Height="5*" />
      <RowDefinition Height="*" />
      <RowDefinition Height="*" />
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
      <ColumnDefinition Width="*" />
      <ColumnDefinition Width="*" />
      <ColumnDefinition Width="*" />
    </Grid.ColumnDefinitions>
    <CaptureElement x:Name="captureElement" Stretch="Fill" Grid.ColumnSpan="3" Grid.RowSpan="3" />
    <Viewbox Grid.Row="1" Grid.Column="1">
      <StackPanel Orientation="Horizontal" Grid.Column="1" Grid.Row="1">
        <Button x:Name="btnPlay" Click="OnStart">
          <SymbolIcon Symbol="Play" />
        </Button>
        <Button x:Name="btnStop" Click="OnStopAsync" Visibility="Collapsed">
          <SymbolIcon Symbol="Stop" />
        </Button>
      </StackPanel>
    </Viewbox>
  </Grid>
</Page>
and I wrote a little class which I called CameraPreviewManager which takes a CaptureElement and attempts to take the necessary steps to start previewing video from a camera into that element;
using System;
using System.Linq;
using System.Threading.Tasks;
using Windows.Devices.Enumeration;
using Windows.Media.Capture;
using Windows.Media.MediaProperties;
using Windows.UI.Xaml.Controls;

class CameraPreviewManager
{
  public CameraPreviewManager(CaptureElement captureElement)
  {
    this.captureElement = captureElement;
  }
  public async Task<VideoEncodingProperties> StartPreviewToCaptureElementAsync(
    Func<DeviceInformation, bool> deviceFilter)
  {
    var preferredCamera = await this.GetFilteredCameraOrDefaultAsync(deviceFilter);

    MediaCaptureInitializationSettings initialisationSettings =
      new MediaCaptureInitializationSettings()
      {
        StreamingCaptureMode = StreamingCaptureMode.Video,
        VideoDeviceId = preferredCamera.Id
      };

    this.mediaCapture = new MediaCapture();

    await this.mediaCapture.InitializeAsync(initialisationSettings);

    this.captureElement.Source = this.mediaCapture;

    await this.mediaCapture.StartPreviewAsync();

    return (this.mediaCapture.VideoDeviceController.GetMediaStreamProperties(
      MediaStreamType.VideoPreview) as VideoEncodingProperties);
  }
  public async Task StopPreviewAsync()
  {
    await this.mediaCapture.StopPreviewAsync();
    this.captureElement.Source = null;
  }
  async Task<DeviceInformation> GetFilteredCameraOrDefaultAsync(
    Func<DeviceInformation, bool> deviceFilter)
  {
    var videoCaptureDevices = await DeviceInformation.FindAllAsync(
      DeviceClass.VideoCapture);

    var selectedCamera = videoCaptureDevices.SingleOrDefault(deviceFilter);

    if (selectedCamera == null)
    {
      // we fall back to the first camera that we can find.
      selectedCamera = videoCaptureDevices.FirstOrDefault();
    }
    return (selectedCamera);
  }
  public MediaCapture MediaCapture
  {
    get
    {
      return (this.mediaCapture);
    }
  }
  public VideoEncodingProperties VideoProperties
  {
    get
    {
      return (this.mediaCapture.VideoDeviceController.GetMediaStreamProperties(
        MediaStreamType.VideoPreview) as VideoEncodingProperties);
    }
  }
  CaptureElement captureElement;
  MediaCapture mediaCapture;
}
and I wrote a little code-behind to set that up;
using System;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;

public sealed partial class MainPage : Page
{
  public MainPage()
  {
    this.InitializeComponent();
  }
  async void OnStart(object sender, RoutedEventArgs e)
  {
    VisualStateManager.GoToState(this, "Playing", false);

    this.cameraPreviewManager = new CameraPreviewManager(this.captureElement);

    var videoProperties =
      await this.cameraPreviewManager.StartPreviewToCaptureElementAsync(
        vcd => vcd.EnclosureLocation.Panel == Windows.Devices.Enumeration.Panel.Front);
  }
  async void OnStopAsync(object sender, RoutedEventArgs e)
  {
    await this.cameraPreviewManager.StopPreviewAsync();

    VisualStateManager.GoToState(this, "Stopped", false);
  }
  CameraPreviewManager cameraPreviewManager;
}
and that all works nicely in that I get a play/stop button and it displays the video from my front-facing camera (if I have one – otherwise, it’ll default to any camera) and things are good. Here’s a snapshot of a silly face presented to the camera;
Ok, so I can set up some previewing from the camera – how do I recognise faces?
Step 2 – Finding Faces in the Video Frames
Now comes an interesting step – I’ve managed to preview video from a camera without ever going anywhere near video frames directly but I’m going to need those video frames if I want to pass them over into the face detection APIs for analysis.
How to do that?
One approach might be to attempt to process every single video frame by somehow plugging into the video pipeline, taking every frame and running facial detection on it – i.e. let the video pipeline push video frames into the facial detection code.
One challenge with that would be that the facial detection has to keep up with the rate at which frames arrive from the video pipeline, and that may or may not be possible for the facial detection code.
Another approach might be to let the facial detection processing dictate how many video frames can be processed per second – i.e. let the facial detection process pull video frames as and when it can handle them.
I went with the latter and wrote myself a little class to manage that which I called PreviewFrameProcessor;
using System;
using System.Threading;
using System.Threading.Tasks;
using Windows.Foundation;
using Windows.Graphics.Imaging;
using Windows.Media;
using Windows.Media.Capture;
using Windows.Media.MediaProperties;

abstract class PreviewFrameProcessor<T>
{
  public event EventHandler<PreviewFrameProcessedEventArgs<T>> FrameProcessed;

  public PreviewFrameProcessor(MediaCapture mediaCapture,
    VideoEncodingProperties videoEncodingProperties)
  {
    this.mediaCapture = mediaCapture;
    this.videoSize = new Rect(0, 0,
      videoEncodingProperties.Width, videoEncodingProperties.Height);
    this.eventArgs = new PreviewFrameProcessedEventArgs<T>();
  }
  public async Task RunFrameProcessingLoopAsync(CancellationToken token)
  {
    await Task.Run(async () =>
      {
        await this.InitialiseForProcessingLoopAsync();

        VideoFrame frame = new VideoFrame(this.BitmapFormat,
          (int)this.videoSize.Width, (int)this.videoSize.Height);

        TimeSpan? lastFrameTime = null;

        try
        {
          while (true)
          {
            token.ThrowIfCancellationRequested();

            await this.mediaCapture.GetPreviewFrameAsync(frame);

            if ((!lastFrameTime.HasValue) || (lastFrameTime != frame.RelativeTime))
            {
              T results = await this.ProcessBitmapAsync(frame.SoftwareBitmap);

              this.eventArgs.Frame = frame;
              this.eventArgs.Results = results;

              // This is going to fire on our thread here. Up to the caller to
              // 'do the right thing' which is a bit risky really.
              this.FireFrameProcessedEvent();
            }
            lastFrameTime = frame.RelativeTime;
          }
        }
        finally
        {
          frame.Dispose();
        }
      },
      token);
  }
  protected abstract Task InitialiseForProcessingLoopAsync();
  protected abstract Task<T> ProcessBitmapAsync(SoftwareBitmap bitmap);
  protected abstract BitmapPixelFormat BitmapFormat { get; }

  void FireFrameProcessedEvent()
  {
    var handlers = this.FrameProcessed;

    if (handlers != null)
    {
      handlers(this, this.eventArgs);
    }
  }
  PreviewFrameProcessedEventArgs<T> eventArgs;
  MediaCapture mediaCapture;
  Rect videoSize;
}
and so this class assumes that it’s ok to share the MediaCapture object onto another thread and it then offers RunFrameProcessingLoopAsync which uses a separate task to;
- Wait for cancellation.
- Pull video frames from the MediaCapture via GetPreviewFrameAsync().
- Pass those video frames into some abstract function ProcessBitmapAsync().
- Publish the results of whatever processing has been done via the FrameProcessed event along with the video frame that they existed in.
The event args type in use here is a simple thing;
using System;
using Windows.Media;

class PreviewFrameProcessedEventArgs<T> : EventArgs
{
  public PreviewFrameProcessedEventArgs()
  {
  }
  public PreviewFrameProcessedEventArgs(T processingResults, VideoFrame frame)
  {
    this.Results = processingResults;
    this.Frame = frame;
  }
  public T Results { get; set; }
  public VideoFrame Frame { get; set; }
}
With those types in play, I can derive a specific PreviewFrameProcessor to run facial detection code and I wrote that as below;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Media.Capture;
using Windows.Media.FaceAnalysis;
using Windows.Media.MediaProperties;

class FaceDetectionFrameProcessor : PreviewFrameProcessor<IReadOnlyList<BitmapBounds>>
{
  FaceDetector detector;

  public FaceDetectionFrameProcessor(MediaCapture capture,
    VideoEncodingProperties videoProperties)
    : base(capture, videoProperties)
  {
  }
  protected override async Task InitialiseForProcessingLoopAsync()
  {
    this.detector = await FaceDetector.CreateAsync();
  }
  protected override BitmapPixelFormat BitmapFormat
  {
    get
    {
      var supportedBitmapFormats = FaceDetector.GetSupportedBitmapPixelFormats();
      return (supportedBitmapFormats.First());
    }
  }
  protected async override Task<IReadOnlyList<BitmapBounds>> ProcessBitmapAsync(
    SoftwareBitmap bitmap)
  {
    var faces = await this.detector.DetectFacesAsync(bitmap);
    return (faces.Select(f => f.FaceBox).ToList().AsReadOnly());
  }
}
and so the specifics here of facial detection are minimised into these few lines of code.
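As an aside, the FaceTracker class that I mentioned earlier could be slotted into much the same shape of code. I haven’t wired it into the app – the fragment below is just a rough, illustrative sketch of how a tracker might be driven, assuming that the VideoFrame passed in is backed by a SoftwareBitmap in one of the formats reported by FaceTracker.GetSupportedBitmapPixelFormats();

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Media;
using Windows.Media.FaceAnalysis;

// Rough sketch only - not code from the app. FaceTracker is intended to be fed
// successive video frames rather than doing one-off detection on a single bitmap.
class FaceTrackerSketch
{
  FaceTracker tracker;

  public async Task InitialiseAsync()
  {
    this.tracker = await FaceTracker.CreateAsync();
  }
  // The frame is assumed to contain a SoftwareBitmap in a pixel format returned
  // by FaceTracker.GetSupportedBitmapPixelFormats().
  public async Task<IList<DetectedFace>> TrackAsync(VideoFrame frame)
  {
    return (await this.tracker.ProcessNextFrameAsync(frame));
  }
}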
Consuming this from my main page requires a few changes because I now have the notion of starting this processing loop and then asynchronously requesting that it stop at a later point. My OnStop handler becomes a matter of cancelling the CancellationToken that has been passed into that RunFrameProcessingLoopAsync method, as per below;
namespace App183
{
  using Microsoft.ProjectOxford.Face;
  using Microsoft.ProjectOxford.Face.Contract;
  using System;
  using System.Diagnostics;
  using System.IO;
  using System.Threading;
  using System.Threading.Tasks;
  using Windows.Graphics.Imaging;
  using Windows.Storage.Streams;
  using Windows.UI;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
    }
    async void OnStart(object sender, RoutedEventArgs args)
    {
      this.requestStopCancellationToken = new CancellationTokenSource();

      this.cameraPreviewManager = new CameraPreviewManager(this.captureElement);

      var videoProperties =
        await this.cameraPreviewManager.StartPreviewToCaptureElementAsync(
          vcd => vcd.EnclosureLocation.Panel == Windows.Devices.Enumeration.Panel.Front);

      this.faceDetectionProcessor = new FaceDetectionFrameProcessor(
        this.cameraPreviewManager.MediaCapture,
        this.cameraPreviewManager.VideoProperties);

      this.faceDetectionProcessor.FrameProcessed += (s, e) =>
      {
        Debug.WriteLine($"Saw {e.Results.Count} faces");
      };

      try
      {
        await this.faceDetectionProcessor.RunFrameProcessingLoopAsync(
          this.requestStopCancellationToken.Token);
      }
      catch (OperationCanceledException)
      {
      }
      await this.cameraPreviewManager.StopPreviewAsync();

      this.requestStopCancellationToken.Dispose();
    }
    void OnStop(object sender, RoutedEventArgs e)
    {
      this.requestStopCancellationToken.Cancel();
    }
    FaceDetectionFrameProcessor faceDetectionProcessor;
    CancellationTokenSource requestStopCancellationToken;
    CameraPreviewManager cameraPreviewManager;
  }
}
and so this leaves me with a nice debug spew in Visual Studio as I show/hide my face to the camera;
What about drawing rectangles around the faces in question?
Step 3 – Drawing Rectangles Around Faces
There are many ways in which this might be done (e.g. I could use a simple Canvas overlaid over the CaptureElement) but in the end I decided to add Win2D into my solution which, at this point, is as simple as going to NuGet and bringing in Win2D;
Then I can layer a Win2D CanvasControl over the top of my existing CaptureElement;
The trick to drawing with Win2D is to handle its Draw event. This is immediate mode drawing rather than retained mode drawing, so every time the control gets invalidated you have to draw the whole thing over again.
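Just to make the ‘immediate mode’ point concrete before the real code, here’s a minimal, illustrative sketch (it isn’t code from the app) of a Draw handler paired with a call to Invalidate;

using Microsoft.Graphics.Canvas.UI.Xaml;
using Windows.Foundation;
using Windows.UI;

// Illustrative sketch only. Every Draw call has to redraw the entire surface
// because nothing drawn previously is retained by the control.
class ImmediateModeSketch
{
  public ImmediateModeSketch(CanvasControl canvas)
  {
    this.canvas = canvas;
    this.canvas.Draw += this.OnDraw;
  }
  void OnDraw(CanvasControl sender, CanvasDrawEventArgs args)
  {
    args.DrawingSession.DrawRectangle(this.rectangle, Colors.White);
  }
  // Callers hand over a new rectangle and we ask the control to redraw itself.
  public void MoveRectangle(Rect newRectangle)
  {
    this.rectangle = newRectangle;
    this.canvas.Invalidate();
  }
  Rect rectangle = new Rect(10, 10, 100, 100);
  CanvasControl canvas;
}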
I did wonder whether I might be wise to get rid of the CaptureElement at this point and attempt to hand over the drawing of each video frame and any rectangles highlighting recognised faces entirely to Win2D but I decided that this was probably overkill and that the CaptureElement was already doing a nice job for me so why disturb it?
With that in mind, I wrote a little class to take control of the CanvasControl for me and to Invalidate it whenever it receives new facial data from a caller;
namespace App183
{
  using Microsoft.Graphics.Canvas.UI.Xaml;
  using System;
  using System.Collections.Generic;
  using System.Threading;
  using Windows.Foundation;
  using Windows.Graphics.Imaging;
  using Windows.Media.MediaProperties;
  using Windows.UI;

  static class RectExtensions
  {
    public static Rect Inflate(this Rect startRect, Rect containingRect, double inflation)
    {
      double newWidth = startRect.Width * inflation;
      double newHeight = startRect.Height * inflation;
      double newLeft = Math.Max(
        containingRect.Left,
        startRect.Left - (newWidth - startRect.Width) / 2.0d);
      double newTop = Math.Max(
        containingRect.Top,
        startRect.Top - (newHeight - startRect.Height) / 2.0d);

      return (new Rect(
        newLeft,
        newTop,
        Math.Min(newWidth, (containingRect.Right - newLeft)),
        Math.Min(newHeight, (containingRect.Bottom - newTop))));
    }
  }
  class FacialDrawingHandler
  {
    public FacialDrawingHandler(
      CanvasControl drawCanvas,
      VideoEncodingProperties videoEncodingProperties,
      Color strokeColour)
    {
      this.strokeColour = strokeColour;
      this.videoSize = new Size(
        videoEncodingProperties.Width, videoEncodingProperties.Height);
      this.drawCanvas = drawCanvas;
      this.drawCanvas.Draw += this.OnDraw;
      this.syncContext = SynchronizationContext.Current;
    }
    void OnDraw(CanvasControl sender, CanvasDrawEventArgs args)
    {
      var faces = this.latestFaceLocations;

      if (faces != null)
      {
        foreach (var face in faces)
        {
          var scaledBox = this.ScaleVideoBitmapBoundsToDrawCanvasRect(face);
          args.DrawingSession.DrawRectangle(scaledBox, this.strokeColour);
        }
      }
    }
    Rect ScaleVideoBitmapBoundsToDrawCanvasRect(BitmapBounds bounds)
    {
      Rect rect = new Rect(
        (((float)bounds.X / this.videoSize.Width) * this.drawCanvas.ActualWidth),
        (((float)bounds.Y / this.videoSize.Height) * this.drawCanvas.ActualHeight),
        (((float)bounds.Width) / this.videoSize.Width * this.drawCanvas.ActualWidth),
        (((float)bounds.Height / this.videoSize.Height) * this.drawCanvas.ActualHeight));

      rect = rect.Inflate(
        new Rect(0, 0, this.drawCanvas.ActualWidth, this.drawCanvas.ActualHeight),
        INFLATION_FACTOR);

      return (rect);
    }
    internal void SetLatestFrameReceived(IReadOnlyList<BitmapBounds> faceLocations)
    {
      this.latestFaceLocations = faceLocations;

      this.syncContext.Post(
        _ =>
        {
          this.drawCanvas.Invalidate();
        },
        null);
    }
    SynchronizationContext syncContext;
    IReadOnlyList<BitmapBounds> latestFaceLocations;
    Size videoSize;
    CanvasControl drawCanvas;
    Color strokeColour;
    const double INFLATION_FACTOR = 1.5d;
  }
}
and I can then plug that into my MainPage as follows;
namespace App183
{
  using Microsoft.ProjectOxford.Face;
  using Microsoft.ProjectOxford.Face.Contract;
  using System;
  using System.Diagnostics;
  using System.IO;
  using System.Threading;
  using System.Threading.Tasks;
  using Windows.Graphics.Imaging;
  using Windows.Storage.Streams;
  using Windows.UI;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
    }
    string CurrentVisualState
    {
      get
      {
        return (this.currentVisualState);
      }
      set
      {
        if (this.currentVisualState != value)
        {
          this.currentVisualState = value;
          this.ChangeStateAsync();
        }
      }
    }
    async Task ChangeStateAsync()
    {
      await Dispatcher.RunAsync(
        Windows.UI.Core.CoreDispatcherPriority.Normal,
        () =>
        {
          VisualStateManager.GoToState(this, this.currentVisualState, false);
        }
      );
    }
    async void OnStart(object sender, RoutedEventArgs args)
    {
      this.CurrentVisualState = "Playing";

      this.requestStopCancellationToken = new CancellationTokenSource();

      this.cameraPreviewManager = new CameraPreviewManager(this.captureElement);

      var videoProperties =
        await this.cameraPreviewManager.StartPreviewToCaptureElementAsync(
          vcd => vcd.EnclosureLocation.Panel == Windows.Devices.Enumeration.Panel.Front);

      this.faceDetectionProcessor = new FaceDetectionFrameProcessor(
        this.cameraPreviewManager.MediaCapture,
        this.cameraPreviewManager.VideoProperties);

      this.drawingHandler = new FacialDrawingHandler(
        this.drawCanvas, videoProperties, Colors.White);

      this.faceDetectionProcessor.FrameProcessed += (s, e) =>
      {
        // This event will fire on the task thread that the face
        // detection processor is running on.
        this.drawingHandler.SetLatestFrameReceived(e.Results);
      };

      try
      {
        await this.faceDetectionProcessor.RunFrameProcessingLoopAsync(
          this.requestStopCancellationToken.Token);
      }
      catch (OperationCanceledException)
      {
      }
      await this.cameraPreviewManager.StopPreviewAsync();

      this.requestStopCancellationToken.Dispose();

      this.CurrentVisualState = "Stopped";
    }
    void OnStop(object sender, RoutedEventArgs e)
    {
      this.requestStopCancellationToken.Cancel();
    }
    string currentVisualState;
    FaceDetectionFrameProcessor faceDetectionProcessor;
    CancellationTokenSource requestStopCancellationToken;
    CameraPreviewManager cameraPreviewManager;
    FacialDrawingHandler drawingHandler;
  }
}
and that all seems to work out relatively well in that I’ve now got a bounding box drawn around my silly face;
and, so far, we haven’t left the local machine. Given that crossing the network is both an expensive and risky business, the local machine can now at least be confident that there actually is a face in front of it before deciding that the journey across the internet is worthwhile.
Off Device – Asking Oxford
Now that the app ‘knows’ whether it is looking at one or more faces, maybe it could talk to Project Oxford to learn something more about those faces.
Step 1 – Extra UI for Detected Faces
In the first instance, I’m going to add another button and another visual state to my UI for the situation where we are displaying video and we know that there is a face present in those frames. I’m also going to add some controls to display some of the basic results that I might get back from the Project Oxford APIs.
The UI expands to include;
<Page x:Class="App183.MainPage"
      xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
      xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
      xmlns:local="using:App183"
      xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
      xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
      mc:Ignorable="d">
  <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <VisualStateManager.VisualStateGroups>
      <VisualStateGroup x:Name="FaceProcessingState">
        <VisualState x:Name="Stopped" />
        <VisualState x:Name="Playing">
          <VisualState.Setters>
            <Setter Target="btnPlay.(UIElement.Visibility)" Value="Collapsed" />
            <Setter Target="btnStop.(UIElement.Visibility)" Value="Visible" />
          </VisualState.Setters>
        </VisualState>
        <VisualState x:Name="PlayingWithFace">
          <VisualState.Setters>
            <Setter Target="btnOxford.(UIElement.Visibility)" Value="Visible" />
          </VisualState.Setters>
        </VisualState>
      </VisualStateGroup>
    </VisualStateManager.VisualStateGroups>
    <Grid.RowDefinitions>
      <RowDefinition Height="5*" />
      <RowDefinition Height="*" />
      <RowDefinition Height="*" />
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
      <ColumnDefinition Width="*" />
      <ColumnDefinition Width="*" />
      <ColumnDefinition Width="*" />
    </Grid.ColumnDefinitions>
    <CaptureElement x:Name="captureElement" Stretch="Fill" Grid.ColumnSpan="3" Grid.RowSpan="3" />
    <wtwod:CanvasControl xmlns:wtwod="using:Microsoft.Graphics.Canvas.UI.Xaml"
                         x:Name="drawCanvas"
                         HorizontalAlignment="Stretch"
                         VerticalAlignment="Stretch"
                         Grid.ColumnSpan="3"
                         Grid.RowSpan="3" />
    <Viewbox Grid.Row="1" Grid.Column="1">
      <StackPanel Orientation="Horizontal" Grid.Column="1" Grid.Row="1">
        <Button x:Name="btnPlay" Click="OnStart">
          <SymbolIcon Symbol="Play" />
        </Button>
        <Button x:Name="btnStop" Click="OnStop" Visibility="Collapsed">
          <SymbolIcon Symbol="Stop" />
        </Button>
        <Button x:Name="btnOxford" Click="OnSubmitToOxfordAsync" Visibility="Collapsed">
          <SymbolIcon Symbol="Camera" />
        </Button>
      </StackPanel>
    </Viewbox>
    <Grid Background="#CC000000" Grid.ColumnSpan="3" Grid.RowSpan="3" x:Name="progressIndicator" Visibility="Collapsed">
      <ProgressRing VerticalAlignment="Center" HorizontalAlignment="Center" Width="50" Height="50" Foreground="White" IsActive="True" />
    </Grid>
    <StackPanel x:Name="stackpanel" Grid.RowSpan="3" Grid.ColumnSpan="3" HorizontalAlignment="Stretch" Background="Black" VerticalAlignment="Bottom">
      <TextBlock FontSize="24" TextAlignment="Center" x:Name="txtGender" />
      <TextBlock FontSize="24" TextAlignment="Center" x:Name="txtAge" />
    </StackPanel>
  </Grid>
</Page>
and so all I’ve done is add a new button (btnOxford), a new visual state (PlayingWithFace) which shows that button, and some TextBlocks which can display Age/Gender.
I can enable that additional button to show up just by modifying my code-behind slightly so that the event handler for the FrameProcessed event changes the visual state depending on whether or not it sees any faces;
this.faceDetectionProcessor.FrameProcessed += (s, e) =>
{
  // This event will fire on the task thread that the face
  // detection processor is running on.
  this.drawingHandler.SetLatestFrameReceived(e.Results);

  this.CurrentVisualState = e.Results.Count > 0 ? "PlayingWithFace" : "Playing";
};
and so now I’ve got a UI where an extra button pops up at the point where the app has recognised that there’s definitely at least one face in front of the camera.
Step 2 – Enabling Oxford
The next step for me was to enable the Project Oxford facial APIs on the Azure marketplace in order to get an access key. The image below jumps to the page with the ‘sign up’ button;
and then, for me, it was a matter of going to the Azure Marketplace on the Azure management portal and searching for, then registering for, the currently free access to the APIs;
With that in place, I’d got some access keys but I needed to know how to make the connection from my app and so I searched NuGet and installed the package called;
MatVelloso.Microsoft.ProjectOxford.Face
which I’m guessing comes from http://www.matvelloso.com/
The next steps largely involve manipulating bitmaps and streams and buffers.
I have my FaceDetectionFrameProcessor firing FrameProcessed events frequently from a background thread and those events carry the VideoFrame containing a SoftwareBitmap.
That bitmap though is in a format that the local face detection algorithms can support – it turns out to be Nv12. I have a few problems;
- The code I’ve written expects that once the event handlers which consume the VideoFrame have completed, the VideoFrame can be overwritten with the next frame. That is – there is no opportunity to do async work (like calling REST APIs over the web) in my FrameProcessed event handlers without changing things dramatically.
- The Project Oxford APIs aren’t going to support Nv12 as a bitmap format – I’m going to need to use a different format in talking to Project Oxford.
- Translating one bitmap format to another is going to involve async work so can’t be done in my FrameProcessed handler.
My solution to this (so far) works something like the steps below (there’s also a stripped-down sketch of the handshake just after this list);
- When the ‘submit to Oxford’ button gets pressed, we set up a task completion source ‘flag’.
- The processing loop will spot that task completion source and will copy the next video frame into a new SoftwareBitmap (synchronously) whenever that flag is set.
- The ‘submit to Oxford’ button handler will then wait for that asynchronous work to complete before transcoding the bitmap into a format that Project Oxford can handle, submitting it to the APIs (via Matt’s code) and displaying the results.
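Before the real code, here’s an illustrative, stripped-down sketch of that task completion source handshake. The names here are made up purely for the sketch – the actual implementation follows below;

using System.Threading.Tasks;
using Windows.Graphics.Imaging;

// Illustrative sketch only - the real code lives in MainPage below.
class FrameGrabSketch
{
  volatile TaskCompletionSource<SoftwareBitmap> pendingCopy;

  // Called on the frame-processing loop's thread for every frame; only
  // synchronous work is allowed here so the loop doesn't get blocked.
  public void OnFrameAvailable(SoftwareBitmap frameBitmap)
  {
    var pending = this.pendingCopy;

    if ((pending != null) && (!pending.Task.IsCompleted))
    {
      // SoftwareBitmap.Convert is synchronous, so it's safe to call here.
      pending.SetResult(SoftwareBitmap.Convert(frameBitmap, BitmapPixelFormat.Rgba16));
    }
  }
  // Called on the UI thread when the 'submit' button is pressed - sets the
  // 'flag' and then waits for the loop to hand back a copied frame.
  public async Task<SoftwareBitmap> GrabNextFrameAsync()
  {
    this.pendingCopy = new TaskCompletionSource<SoftwareBitmap>();

    var copy = await this.pendingCopy.Task;

    this.pendingCopy = null;

    return (copy);
  }
}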
The only minor snag with that last step is that I found I couldn’t go straight from Nv12 to (e.g.) PNG or JPEG and had to go through an intermediate format but, regardless, the code worked out as;
using Microsoft.ProjectOxford.Face;
using Microsoft.ProjectOxford.Face.Contract;
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Storage.Streams;
using Windows.UI;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;

public sealed partial class MainPage : Page
{
  public MainPage()
  {
    this.InitializeComponent();
  }
  string CurrentVisualState
  {
    get
    {
      return (this.currentVisualState);
    }
    set
    {
      if (this.currentVisualState != value)
      {
        this.currentVisualState = value;
        this.ChangeStateAsync();
      }
    }
  }
  async Task ChangeStateAsync()
  {
    await Dispatcher.RunAsync(
      Windows.UI.Core.CoreDispatcherPriority.Normal,
      () =>
      {
        VisualStateManager.GoToState(this, this.currentVisualState, false);
      }
    );
  }
  async void OnStart(object sender, RoutedEventArgs args)
  {
    this.CurrentVisualState = "Playing";

    this.requestStopCancellationToken = new CancellationTokenSource();

    this.cameraPreviewManager = new CameraPreviewManager(this.captureElement);

    var videoProperties =
      await this.cameraPreviewManager.StartPreviewToCaptureElementAsync(
        vcd => vcd.EnclosureLocation.Panel == Windows.Devices.Enumeration.Panel.Front);

    this.faceDetectionProcessor = new FaceDetectionFrameProcessor(
      this.cameraPreviewManager.MediaCapture,
      this.cameraPreviewManager.VideoProperties);

    this.drawingHandler = new FacialDrawingHandler(
      this.drawCanvas, videoProperties, Colors.White);

    this.faceDetectionProcessor.FrameProcessed += (s, e) =>
    {
      // This event will fire on the task thread that the face
      // detection processor is running on.
      this.drawingHandler.SetLatestFrameReceived(e.Results);

      this.CurrentVisualState = e.Results.Count > 0 ? "PlayingWithFace" : "Playing";

      this.CopyBitmapForOxfordIfRequestPending(e.Frame.SoftwareBitmap);
    };

    try
    {
      await this.faceDetectionProcessor.RunFrameProcessingLoopAsync(
        this.requestStopCancellationToken.Token);
    }
    catch (OperationCanceledException)
    {
    }
    await this.cameraPreviewManager.StopPreviewAsync();

    this.requestStopCancellationToken.Dispose();

    this.CurrentVisualState = "Stopped";
  }
  void CopyBitmapForOxfordIfRequestPending(SoftwareBitmap bitmap)
  {
    if ((this.copiedVideoFrameComplete != null) &&
        (!this.copiedVideoFrameComplete.Task.IsCompleted))
    {
      // We move to RGBA16 because that is a format that we will then be able
      // to use a BitmapEncoder on and we *cannot* do async
      // work here because it'll break our processing loop.
      var convertedRgba16Bitmap = SoftwareBitmap.Convert(bitmap, BitmapPixelFormat.Rgba16);

      this.copiedVideoFrameComplete.SetResult(convertedRgba16Bitmap);
    }
  }
  void OnStop(object sender, RoutedEventArgs e)
  {
    this.requestStopCancellationToken.Cancel();
  }
  async void OnSubmitToOxfordAsync(object sender, RoutedEventArgs e)
  {
    // Because I constantly change visual states in the processing loop, I'm just doing
    // this with some code rather than with visual state changes because those would
    // get quickly overwritten while this work is ongoing.
    this.progressIndicator.Visibility = Visibility.Visible;

    // We create this task completion source which flags our main loop
    // to create a copy of the next frame that comes through and then
    // we pick that up here when it's done...
    this.copiedVideoFrameComplete = new TaskCompletionSource<SoftwareBitmap>();

    var rgba16CopiedFrame = await this.copiedVideoFrameComplete.Task;

    this.copiedVideoFrameComplete = null;

    InMemoryRandomAccessStream destStream = new InMemoryRandomAccessStream();

    // Now going to JPEG because Project Oxford can accept those.
    BitmapEncoder encoder = await BitmapEncoder.CreateAsync(
      BitmapEncoder.JpegEncoderId, destStream);

    encoder.SetSoftwareBitmap(rgba16CopiedFrame);

    await encoder.FlushAsync();

    FaceServiceClient faceService = new FaceServiceClient(OxfordApiKey);

    Face[] faces = await faceService.DetectAsync(
      destStream.AsStream(), true, true, true, true);

    // We now get a bunch of face data for each face but I'm ignoring most of it
    // (like facial landmarks etc) and simply displaying the guess of the age
    // and the gender for the moment.
    if ((faces != null) && (faces.Length > 0))
    {
      txtAge.Text = faces[0].Attributes.Age.ToString();
      txtGender.Text = faces[0].Attributes.Gender.ToString();
    }
    else
    {
      txtAge.Text = "no age";
      txtGender.Text = "no gender";
    }
    this.progressIndicator.Visibility = Visibility.Collapsed;
  }
  string currentVisualState;
  FaceDetectionFrameProcessor faceDetectionProcessor;
  CancellationTokenSource requestStopCancellationToken;
  CameraPreviewManager cameraPreviewManager;
  FacialDrawingHandler drawingHandler;
  static readonly string OxfordApiKey = "sorry you need to get your own key for Oxford";
  volatile TaskCompletionSource<SoftwareBitmap> copiedVideoFrameComplete;
}
and we can then, finally, see that in action in this little screen capture below;
If you want the code for this one, it’s in a ZIP file here for download – you’ll need your own Project Oxford API key though!