UWP Facial Detection Library

A short post. I took some of the code that I once had in this post around doing on-device face detection with the UWP APIs and built it into a library, which I've dropped here;

https://github.com/mtaulty/Facial

The idea is to have a simple class named FaceWatcher which gets hold of a camera stream, runs face detection on it and fires an event (InFrameFaceCountChanged) whenever its view of the number of faces in front of the camera changes.

I built a simple test app to wrap around that and included it in the repo, where the code to create a FaceWatcher on the front-facing camera becomes as simple as;

this.faceWatcher = new FaceWatcher(
  new CameraDeviceFinder(
    deviceInformation =>
    {
        return (deviceInformation.EnclosureLocation.Panel == Windows.Devices.Enumeration.Panel.Front);
    }
  )
);

and the code to hook up the event handler, run the capture and tidy up when it is cancelled becomes;

this.cancelTokenSource = new CancellationTokenSource();
this.faceWatcher.InFrameFaceCountChanged += this.OnFaceCountChanged;

try
{
  // Runs until the cancellation token is signalled (e.g. by the Stop button).
  await this.faceWatcher.CaptureAsync(this.cancelTokenSource.Token);
}
catch (TaskCanceledException)
{
  // Tidy up the event subscription once capture has been cancelled.
  this.faceWatcher.InFrameFaceCountChanged -= this.OnFaceCountChanged;
}

and the test app supplied simply has buttons for Start/Stop and a TextBlock to display the number of faces on screen.
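For completeness, this is the rough shape of handler I have in mind for that TextBlock update. Note that the event-args type name, its FaceCount property and the faceCountTextBlock name are assumptions on my part rather than something I'm quoting from the library, so check the repo for the real signature;

void OnFaceCountChanged(object sender, InFrameFaceCountChangedEventArgs e)
{
  // The event may well fire away from the UI thread so marshal the update
  // back to the Dispatcher before touching the TextBlock.
  var ignored = this.Dispatcher.RunAsync(
    Windows.UI.Core.CoreDispatcherPriority.Normal,
    () => this.faceCountTextBlock.Text = e.FaceCount.ToString());
}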

That’s it. My hope is to couple this with a library which sits on top of it and uses the Cognitive Services facial APIs to run identification so that, rather than simply flagging how many faces are currently in front of the camera, I can tag when new faces enter or leave the frame by building a short-term memory of the faces that have already been seen.

I think that’s all quite achievable. Perhaps the main challenge is figuring out the right time and frequency at which to call that API, given that each invocation involves an HTTP request with a cost both in round-trip time and in real money, which means that I can only realistically call that type of API so many times per minute or hour.
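As a rough illustration of the sort of gate I have in mind (this is purely my own sketch and nothing from the library), something as simple as a minimum interval between calls might be enough to start with;

// Hypothetical, minimal throttle (needs System and System.Threading.Tasks) -
// it refuses to make the expensive cloud call more often than once per interval.
class ThrottledCloudCaller
{
  public ThrottledCloudCaller(TimeSpan minimumInterval)
  {
    this.minimumInterval = minimumInterval;
  }
  public async Task<bool> TryCallAsync(Func<Task> cloudCallAsync)
  {
    var now = DateTimeOffset.UtcNow;

    if ((now - this.lastCallTime) < this.minimumInterval)
    {
      // Too soon since the last call, skip this one.
      return (false);
    }
    this.lastCallTime = now;
    await cloudCallAsync();
    return (true);
  }
  TimeSpan minimumInterval;
  DateTimeOffset lastCallTime = DateTimeOffset.MinValue;
}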

But that’s for another post…:-)

Grabbing a Photo & Calling Azure Vision API from ‘Pure UWP’ Code

Just using a blog post as a pastie – I had cause to write a function today to take a photo from a camera on a UWP device and send it to the Azure Cognitive Service for Vision to ask for ‘tags’ from the image.

The intention was to call the function from a UWP-specific Unity app running on HoloLens (the code does work on HoloLens). It would need the webcam, microphone and internet client capabilities declared in the app manifest to work.

It’s very specific to one task and, clearly, it could be wrapped up into some class that made it a lot more general-purpose and exercised more pieces of that API, but I just wanted somewhere to put the code in case I want it again in the future. This is what I had…

static async Task<Dictionary<string, double>> TakePhotoAnalyzeAzureForTagsAsync(
    string azureVisionKey,
    string azureVisionBaseEndpoint = "https://westeurope.api.cognitive.microsoft.com")
{
    var azureVisionApi = "/vision/v1.0/analyze?visualFeatures=Tags";
    Dictionary<string, double> resultDictionary = null;

    // Capture an image from the camera.
    var capture = new MediaCapture();

    await capture.InitializeAsync();

    var stream = new InMemoryRandomAccessStream();

    await capture.CapturePhotoToStreamAsync(
        ImageEncodingProperties.CreateJpeg(), stream);

    // Rewind the stream before handing it over as the HTTP content.
    stream.Seek(0);

    // Now send that off to Azure for processing.
    var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", azureVisionKey);

    var streamContent = new HttpStreamContent(stream);
    streamContent.Headers["Content-Type"] = "application/octet-stream";

    var response = await httpClient.PostAsync(
        new Uri(azureVisionBaseEndpoint + azureVisionApi),
        streamContent);

    if (response.IsSuccessStatusCode)
    {
        // Pull the 'tags' array out of the JSON response into name->confidence pairs.
        var responseString = await response.Content.ReadAsStringAsync();
        JsonObject jsonObject;

        if (JsonObject.TryParse(responseString, out jsonObject))
        {
            resultDictionary =
                jsonObject.GetNamedArray("tags").ToDictionary(
                    tag => tag.GetObject().GetNamedString("name"),
                    tag => tag.GetObject().GetNamedNumber("confidence"));
        }
    }
    return (resultDictionary);
}
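Purely as an illustration of calling it, something like the snippet below works; the subscription key is obviously a placeholder and resultsTextBlock is just a TextBlock I'm assuming exists in the page's XAML rather than anything in the code above;

// Hypothetical call site - replace the key with a real Azure Vision subscription key.
var tags = await TakePhotoAnalyzeAzureForTagsAsync("YOUR-AZURE-VISION-KEY");

if (tags != null)
{
  this.resultsTextBlock.Text = string.Join(
    Environment.NewLine,
    tags.Select(entry => $"{entry.Key}: {entry.Value:F2}"));
}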

As a related aside, I also revisited this blog post recently as part of looking back at Cognitive Services and its facial APIs. I realised that the code needed changing a bit to make it work, so I did that work and dropped it onto GitHub over here;

Example of using both local UWP face detection and Cognitive Service face detection in a UWP app

Hands, Gestures and Popping back to ‘Prague’

Just a short post to follow up on this previous post;

Hands, Gestures and a Quick Trip to ‘Prague’

I said that if I ‘found time’ then I’d revisit that post and that code and see if I could make it work with the Kinect for Windows V2 sensor rather than with the Intel RealSense SR300 which I used in that post.

In all honesty, I haven’t ‘found time’ but I’m revisiting it anyway :-)

I dug my Kinect for Windows V2 and all of its lengthy cabling out of the drawer, plugged it into my Surface Book and … it didn’t work. Instead, I got the flashing white light which usually indicates that things aren’t going so well.

Not to be deterred, I did some deep, internal Microsoft research (ok, I searched the web) and came up with this;

Kinect Sensor is not recognized on a Surface Book

and getting rid of the text value within that registry key sorted out that problem and let me confirm that my Kinect for Windows V2 was working, in the sense that the Kinect Configuration Verifier says;

[Image: Kinect Configuration Verifier results]

which, after many years of experience, I have learned to interpret as “Give it a try!”, so I tried out a couple of the SDK samples, they worked fine for me and I reckoned I was in a good place to get started.

However, the Project Prague bits were not so happy and I found they were logging a bunch of errors in the ‘Geek View’ about not being able to connect to or initialise either the SR300 or the Kinect camera.

This seemed to be resolved by updating my Kinect drivers – I did an automatic update and Windows found new drivers online, which took me to this version;

[Image: updated Kinect driver version details]

which I was surprised I didn’t have already as it’s quite old, but it seemed to make the Project Prague pieces happy and the Geek View was back in business showing output from the Kinect;

[Image: the Project Prague ‘Geek View’ showing output from the Kinect camera]

and, from the little display window on the left there, it felt like this operated at a range of approximately 0.5m to 1.0m. I wondered whether I could move further away, but that didn’t seem to be the case in the quick experiment that I tried.

The big question for me then was whether the code that I’d previously written and run against the SR300 would “just work” on the Kinect for Windows V2 and, of course, it does. Revisit the previous post for the source code if you’re interested, but I found that my “counting on four fingers” gesture was recognised quickly and reliably here;

[Image: the ‘counting on four fingers’ gesture being detected]

This is very cool – it’d be interesting to know exactly what ‘Prague’ relies on from the camera and also in terms of system requirements (CPU, RAM, GPU, etc.) to make this work, but it looks like they’ve got a very decent system going for recognising hand gestures across different cameras.