This post is just to share some code that I wrote quite a while ago for fun.
At Microsoft’s UK campus there are some corridors that have gates which allow people to freely walk in one direction but which only allow people with the right ID cards to go in the other.
This set me wondering whether it might be possible to develop code with the Kinect for Windows V2 which monitored people walking towards the camera and;
- Took skeletal measurements as they approached the camera to build up average ‘limb lengths’ based on some set of limbs and measurement counts that can be configured.
- Grabbed a head and shoulders photo of each person based on knowledge of where their head and shoulders are within the camera’s view.
- Stored both the measurements and the photos in the cloud using some configurable table and blob storage.
- Scanned previously stored measurements in the cloud to determine whether a person of very similar ‘shape and size’ has been previously seen based on some configurable tolerance value.
- Ultimately, opened the door based on recognising the user’s skeleton.
Now, in truth, I’ve never really got this to work. I’m not sure whether the idea of a ‘skeletal fingerprint’ is flawed in the first place, but I got far too many false positives to consider asking Microsoft to use my new system for their security. Still, I had some fun in putting it together.
However, this was prior to the arrival of Microsoft’s Cognitive Services and I may revisit the code in the coming weeks/months to see whether I can make a better attempt at it in the light of those new services. Perhaps I can combine my not-so-successful idea of a ‘skeletal fingerprint’ with an additional call to Cognitive Services to do facial identification and produce a better result than I previously did.
As an aside, this seems like a reasonable use of Cognitive Services’ facial identification service. I wouldn’t want a security camera constantly sending data to the cloud but, instead, I’d want to pair it with some smart camera which knew when new people entered/exited the video frames and could capture a photo of those people and send it off to the cloud for identification. That avoids lots of (costly) extraneous calls to the cloud.
That’s in the future. For now, I’m just sharing this code in case anyone wants to play with it and develop it.
Where’s the Code?
The code I’m talking about is in this GitHub repository;
What’s with the Weird Code Structure?
The structure of this code is a bit unusual in that I wanted to be able to build both a WPF application and a Windows application from the same code. The Windows application is a Windows 8.1 app rather than a UWP app because there isn’t a Kinect for Windows V2 SDK for UWP, so this code targets Windows 8.1 but will, naturally, run on Windows 10.
The Kinect for Windows V2 SDK is structured such that, with one or two conditional compilation statements around namespaces, it’s possible to write the same code for WinRT and WPF, and so that’s what I set out to do. It does mean that I haven’t (e.g.) used something like Win2D for some of the drawing, because Win2D only exists for WinRT, not for WPF.
If you look at the code structure, you’ll see that there is a project called WpfApp and another called WinRTApp, and neither one seems to contain much code because they both share a lot of code by referencing a shared project named SharedApp. Most of the code for these 2 apps is in that shared project and is largely identical across WPF and WinRT;
Within that code you’ll find quite a bit of conditional compilation;
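something along the lines of the sketch below. The exact symbols depend on how the projects are set up (NETFX_CORE is what Windows Store 8.1 builds define by default), but the essential point is choosing between the WinRT and .NET Framework flavours of the Kinect namespaces;

```csharp
// Sketch only: pick up the WinRT or .NET Framework flavour of the
// Kinect for Windows V2 namespaces depending on the build target.
#if NETFX_CORE
using WindowsPreview.Kinect;    // Windows 8.1 (WinRT) SDK namespace
#else
using Microsoft.Kinect;         // .NET Framework (WPF) SDK namespace
#endif
```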
but, hopefully, not too much.
Each app project also references 3 library projects named BodyReaders, Measurements and Storage, and each of these libraries needs to be built both for WPF and for WinRT in order to pick up the different dependencies across those two platforms.
The same ‘shared project’ approach is taken again such that there are 3 shared library projects of code. There is then a NetFramework folder which contains 3 library projects, each referencing the source from the corresponding shared project, and a WinRT folder which does the same.
In this way, we have a WinRT app and a WPF app sharing a lot of source, and both taking dependencies on 3 libraries which also share common source that is then built for WinRT and .NET Framework respectively.
How Is It Configured?
There are elements of configuration needed here whether running the WinRT app or the WPF app. The default configuration lives in 2 files;
The global.json file configures items as below;
{ "CloudAccount": "your cloud storage account goes here", "CloudKey": "your cloud storage account key goes here", "CloudTable": "skeletalData", "MeasurementTolerance": "0.05", "LeastSquaresTolerance" : "10.0", "MeasurementFrameCount": "50", "IrScaleRange": "200", "IrScaleLow": "30", "CloudBlobContainerName": "skeletalphotos", "CloudRowScanSize" : "5" }
The idea of the values CloudAccount and CloudKey is that they will be fed into the constructor of a StorageCredentials instance in order to talk to Azure storage.
The CloudTable value is the name of a table within the storage account which will be used to store key/value pairs relating to skeletal data captured.
The CloudBlobContainerName value is the name of a blob container within which the app will store photos captured.
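To give an idea of how those values hang together, below is a minimal sketch of feeding them into the Azure storage SDK (WindowsAzure.Storage); the class and method names here are just for illustration rather than being lifted from the repo;

```csharp
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Table;

static class StorageSetupSketch
{
    public static CloudTable Table { get; private set; }
    public static CloudBlobContainer Photos { get; private set; }

    // Sketch only: wire the global.json values into the Azure storage SDK.
    public static async Task OpenAsync(
        string cloudAccount, string cloudKey, string cloudTable, string cloudBlobContainerName)
    {
        var credentials = new StorageCredentials(cloudAccount, cloudKey);
        var account = new CloudStorageAccount(credentials, useHttps: true);

        // Table holding the skeletal measurement key/value pairs ("skeletalData").
        Table = account.CreateCloudTableClient().GetTableReference(cloudTable);
        await Table.CreateIfNotExistsAsync();

        // Blob container holding the captured photos ("skeletalphotos").
        Photos = account.CreateCloudBlobClient().GetContainerReference(cloudBlobContainerName);
        await Photos.CreateIfNotExistsAsync();
    }
}
```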
The measurements.json file configures items as below;
{ "Name": "Spine", "Start": "SpineBase", "End": "SpineShoulder", "IsPartitionKey": "true" }, { "Name": "Chest", "Start": "ShoulderLeft", "End" : "ShoulderRight" },
This is defining the set of measurements that we want to capture for each user in terms of ‘limbs’. The names of the Start/End values need to correspond to a value from the JointType enumeration from the Kinect SDK.
The values of the Name properties will be used as keys within Azure table storage.
Each of these ‘limb’ measurements will be computed MeasurementFrameCount times before an average value is produced and used.
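For reference, measuring a ‘limb’ really just boils down to the 3D distance between the two configured joints in camera space, averaged over those frames. A rough sketch follows; the helper names here are purely illustrative and the Microsoft.Kinect namespace becomes WindowsPreview.Kinect on the WinRT build;

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Kinect; // WindowsPreview.Kinect on the WinRT build

static class LimbMeasurementSketch
{
    // Sketch only: length of a single 'limb' as the 3D distance (in metres)
    // between its configured Start and End joints for one frame's body data.
    public static double LimbLength(Body body, JointType start, JointType end)
    {
        CameraSpacePoint s = body.Joints[start].Position;
        CameraSpacePoint e = body.Joints[end].Position;

        return Math.Sqrt(
            (s.X - e.X) * (s.X - e.X) +
            (s.Y - e.Y) * (s.Y - e.Y) +
            (s.Z - e.Z) * (s.Z - e.Z));
    }

    // Sketch only: average the per-frame samples once MeasurementFrameCount
    // of them have been gathered; returns null until then.
    public static double? Average(List<double> samples, int measurementFrameCount)
    {
        if (samples.Count < measurementFrameCount)
        {
            return null; // not enough frames seen yet
        }
        double sum = 0;
        foreach (var sample in samples) { sum += sample; }
        return sum / samples.Count;
    }
}
```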
Note that one limb is defined as IsPartitionKey and the average value of that measurement will be used to partition data within the Azure table. However, the ‘raw’ average value of that measurement is not used. Instead, the value is multiplied by 10 and then its integral part is taken such that a person with a 58cm spine length and another with a 51cm spine length would both live in a partition keyed off the value ‘5’.
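In code terms, that partitioning amounts to something like the sketch below (camera space values being in metres, so a 0.58m and a 0.51m spine both land in partition ‘5’);

```csharp
static class PartitionKeySketch
{
    // Sketch only: Kinect camera-space lengths are in metres, so a 0.58m
    // spine and a 0.51m spine both end up in partition "5".
    public static string For(double averageLimbLengthMetres) =>
        ((int)(averageLimbLengthMetres * 10.0)).ToString();
}
```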
When a person is encountered and average measurements have been taken, the app attempts to ‘find’ them in Azure table storage by;
- Matching on the partition key.
- Retrieving up to CloudRowScanSize rows from table storage where the values for each limb measurement that has been computed are within MeasurementTolerance of the measurement captured. It may be that the cloud table storage has more/fewer limb measurements than those currently configured for the app, but that shouldn’t matter too much.
Once the app has a candidate set of rows from Azure table storage which may or may not match the individual being matched, it then computes a “sum of least squares” difference across all the average measurements and selects the row with the lowest difference that is also less than the LeastSquaresTolerance value.
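To make that selection step concrete, it’s essentially a ‘sum of squared differences’ over whichever measurements both sides have in common. A rough sketch below, with illustrative type/method names rather than the exact ones from the repo;

```csharp
using System.Collections.Generic;

static class MatchingSketch
{
    // Sketch only: given the freshly captured averages and a candidate set of
    // rows pulled back from table storage (each row as measurement-name ->
    // value), pick the row with the smallest sum of squared differences that
    // also falls under LeastSquaresTolerance. Returns null for 'no match'.
    public static IDictionary<string, double> BestMatch(
        IDictionary<string, double> captured,
        IEnumerable<IDictionary<string, double>> candidateRows,
        double leastSquaresTolerance)
    {
        IDictionary<string, double> best = null;
        double bestScore = double.MaxValue;

        foreach (var row in candidateRows)
        {
            double score = 0;
            foreach (var pair in captured)
            {
                // Only compare measurements that both sides actually have -
                // the stored rows may have more or fewer limbs configured.
                if (row.TryGetValue(pair.Key, out double storedValue))
                {
                    double difference = pair.Value - storedValue;
                    score += difference * difference;
                }
            }
            if ((score < bestScore) && (score < leastSquaresTolerance))
            {
                bestScore = score;
                best = row;
            }
        }
        return best;
    }
}
```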
The IrScaleLow and IrScaleRange values are used to scale data from the infrared frames off the camera to sit between the low value and the high value. Note that this only comes into play if the UseInfrared property of the VideoControl user control is set to true, which it is not in the code base at the time of writing.
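I won’t claim this is exactly the arithmetic in the repo, but the intent is roughly to map each 16-bit infrared intensity into a narrow display range governed by those two values, along the lines of;

```csharp
static class InfraredScalingSketch
{
    // Sketch only: map a 16-bit infrared intensity into a byte in the range
    // [IrScaleLow, IrScaleLow + IrScaleRange] for display (e.g. 30..230 with
    // the default configuration values).
    public static byte Scale(ushort infraredValue, int irScaleLow, int irScaleRange)
    {
        double normalised = infraredValue / (double)ushort.MaxValue; // 0..1
        return (byte)(irScaleLow + (normalised * irScaleRange));
    }
}
```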
That’s all of the configuration.
What Does Running the App Do?
I can run either the WPF application or the WinRT application and (as long as I’ve got my Kinect for Windows V2 camera plugged in) I’ll get a video (or IR) display like the one below.
Once the app has recognised that a person has arrived in the frame, it will start to do calculations;
Once it has done calculations, it will display a floating panel showing the results;
and it will then visit the cloud to see if it can make a match against an existing table record. If not, the circle will fill orange;
and the person will be given a new GUID and an entry will be added to Azure table storage with the measurements captured as shown below;
and a new blob dropped into the configured photos container with the same GUID;
containing the photo;
and that’s pretty much it.
If the system does recognise the user based on the measurements then it will display a green-filled circle;
which indicates that it will not create a new GUID and a new entry in the Azure table and blob storage as the person’s been recognised.
Wrapping Up
I’ve had this code for a long time but have never written it up so I found a few spare cycles today to do that.
It was built 100% ‘just for fun’ but I thought I’d share it in case anyone else was interested or wanted to take pieces of it and improve it.
What I’d perhaps like to do next is to extend this by adding in some use of facial identification via Cognitive Services, to see if I could build a system that worked a little better by combining skeletal measurements with facial identity such that both mechanisms were used to determine identity.
That would take a little bit of work – I’ll post an update as/when I get there.