Experiments with Shared Holographic Experiences and AllJoyn (Spoiler Alert: this one does not end well!)

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

This post is an interesting one in that it represents two sides of a day or so of technical failure! I tried to get something up and running and it didn't quite work out, but I thought it was worth sharing regardless; apply a large pinch of salt, as the result isn't anything that works too well Smile

Background – Catching up with AllJoyn

It’s been a little while since I looked at AllJoyn and the AllSeen Alliance.

In fact, long enough that the landscape has changed with the merger between the OCF (IoTivity) and the AllSeen Alliance (AllJoyn) with the result that (from the website);

Both projects will collaborate to support future versions of the OCF specification in a single IoTivity implementation that combines the best of both technologies into a unified solution. Current devices running on either AllJoyn or IoTivity solutions will be interoperable and backward-compatible. Companies already developing IoT solutions based on either technology can proceed with the confidence that their products will be compatible with the unified IoT standard that the industry has been asking for.

and so I guess that any use of AllJoyn at this point has to be seen against that backdrop.

Scenario – Using AllJoyn to Share Holograms Across Devices

With that said, I still have AllJoyn APIs in Windows 10 and I wanted to see whether I could use those as a basis for the sharing of Holograms across devices.

The idea would be that a HoloLens app becomes both an AllJoyn consumer and producer, so each device on a network could find all the other devices and then share data: a common co-ordinate system could be established via world anchors, and hologram creation, removal and positioning could also be shared.

If you’re not so familiar with this idea of shared holographic experiences then the official documentation lives here;

https://developer.microsoft.com/en-us/windows/mixed-reality/development 

and I’ve written a number of posts around this area of which this one would be the most recent;

Hitchhiking the HoloToolkit-Unity, Leg 13–Continuing with Shared Experiences

My idea for AllJoyn was to avoid any central server and let it do the work of having devices discover each other and connect so that they could form a network where they consume each others’ services as below;

image

Now, I don’t think that I’d tried this previously but there was something nagging at the back of my mind about the AllJoyn APIs not being something that I could use in the way that I wanted to on HoloLens and I should really have taken heed.

Instead, I ploughed ahead…

Making a “Hello World” AllJoyn Interface

I figured that the “Hello World” here might involve a HoloLens app offering an interface whereby a remote caller could create some object (e.g. a cube) at a particular location in world space.

That seemed like a simple enough thing to figure out and so I sketched out a quick interface for AllJoyn to do something like that;

<interface name="com.taulty.RemoteHolograms">
   <method name="CreateCube">
     <arg name="x" type="d" direction="in" />
     <arg name="y" type="d" direction="in" />
     <arg name="z" type="d" direction="in" />
   </method>
</interface>

I then struggled a little with respect to knowing where to go next in that the tool alljoyncodegen.exe which I used to use to generate UWP code from an AllJoyn interface seemed to have disappeared from the SDKs.

I found this doc page which suggested that the tool was deprecated and, for a while, I landed on this page which sounded similar but turned out to have only been tested on Linux and not to generate UWP code, so it was actually very different Smile

I gave up on the command line tool and went off to try and find AllJoyn Studio which does seem to still exist but only for Visual Studio 2015 which was ‘kind of ok’ because I still have VS2015 on my system alongside VS2017.

Whether all these changes are because of the AllJoyn/IoTivity merging, I’ve no idea but it certainly foxed me for a little while.

Regardless, I fed AllJoyn Studio in Visual Studio 2015 my little XML interface definition;

image

and it spat out some C++/CX code providing me with a bunch of boilerplate AllJoyn implementation code, which I retargeted to platform version 14393 as the tool seemed to favour 10586;

image

Writing Some Code for Unity to Consume

At this point, I was a little over-zealous in that I thought that the next step would be to try and write a library which would make it easy for Unity to make use of the code that I’d just had generated.

So, I went off and made a C# library project that referenced the newly generated code and I wrote this code to sit on top of it, although I never really got around to figuring out whether this code worked or not, as we'll see in a moment;

namespace AllJoynHoloServer
{
  using com.taulty.RemoteHolograms;
  using System;
  using System.Threading.Tasks;
  using Windows.Devices.AllJoyn;
  using Windows.Foundation;

  public class AllJoynCreateCubeEventArgs : EventArgs
  {
    internal AllJoynCreateCubeEventArgs()
    {
        
    }
    public double X { get; internal set; }
    public double Y { get; internal set; }
    public double Z { get; internal set; }
  }

  public class ServiceDispatcher : IRemoteHologramsService
  {
    public ServiceDispatcher(Action<double, double, double> handler)
    {
      this.handler = handler;
    }
    public IAsyncOperation<RemoteHologramsCreateCubeResult> CreateCubeAsync(
      AllJoynMessageInfo info,
      double x,
      double y,
      double z)
    {
      return (this.CreateCubeAsyncInternal(x, y, z).AsAsyncOperation());
    }
    async Task<RemoteHologramsCreateCubeResult> CreateCubeAsyncInternal(
      double x, double y, double z)
    {
      // Likelihood that the thread we're calling from here is going to
      // break Unity's threading model.
      this.handler?.Invoke(x, y, z);

      return (RemoteHologramsCreateCubeResult.CreateSuccessResult());
    }
    Action<double, double, double> handler;
  }
  public static class AllJoynEventAdvertiser
  {
    public static event EventHandler<AllJoynCreateCubeEventArgs> CubeCreated;

    public static void Start()
    {
      if (busAttachment == null)
      {
        busAttachment = new AllJoynBusAttachment();
        busAttachment.AboutData.DateOfManufacture = DateTime.Now;
        busAttachment.AboutData.DefaultAppName = "Remote Holograms";
        busAttachment.AboutData.DefaultDescription = "Creation and Manipulation of Holograms";
        busAttachment.AboutData.DefaultManufacturer = "Mike Taulty";
        busAttachment.AboutData.ModelNumber = "Number One";
        busAttachment.AboutData.SoftwareVersion = "1.0";
        busAttachment.AboutData.SupportUrl = new Uri("http://www.mtaulty.com");

        producer = new RemoteHologramsProducer(busAttachment);
        producer.Service = new ServiceDispatcher(OnCreateCube);
        producer.Start();
      }
    }
    public static void Stop()
    {
      producer?.Stop();
      producer?.Dispose();
      busAttachment?.Disconnect();
      producer = null;
      busAttachment = null;
    }
    static void OnCreateCube(double x, double y, double z)
    {
      // Raise the static event; there's no obvious sender here so pass null.
      CubeCreated?.Invoke(
        null,
        new AllJoynCreateCubeEventArgs()
        {
          X = x,
          Y = y,
          Z = z
        });
    }
    static AllJoynBusAttachment busAttachment;
    static RemoteHologramsProducer producer;
  }
}

There’s nothing particularly exciting or new here, it’s just a static class which hides the underlying AllJoyn code from anything above it and it offers the IRemoteHologramsService over the network and fires a static event whenever some caller remotely invokes the one method on that interface.

I thought that this would be pretty easy for Unity to consume and so I dragged the DLLs into Unity (as per this post) and then added the script below to a blank GameObject to run when the app started up;

#if UNITY_UWP && !UNITY_EDITOR
using AllJoynHoloServer;
#endif
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Startup : MonoBehaviour
{
  // Use this for initialization
  void Start()
  {
#if UNITY_UWP && !UNITY_EDITOR
    AllJoynEventAdvertiser.CubeCreated += this.OnCubeCreated;
    AllJoynEventAdvertiser.Start();
#endif
  }
#if UNITY_UWP && !UNITY_EDITOR
  void OnCubeCreated(object sender, AllJoynCreateCubeEventArgs args)
  {

  }
#endif

}

Clearly, this isn’t fully formed or entirely thought through but I wanted to just see if I could get something up and running and so I tried to debug this code having, first, made sure that my Unity app was asking for the AllJoyn security capability.

Debugging the AllJoyn Experience

I used the IoT AllJoyn Explorer to try and see if my Unity app from the HoloLens was advertising itself correctly on the local network.

That app still comes from the Windows Store and I’ve used it before and it’s always been pretty good for me.

It took me a little while to remember that I needed to think about loopback exemption when it comes to this app, so that's worth flagging here.

I found that when I ran my Unity code on the HoloLens, I didn’t seem to see the service being advertised on the AllJoyn network as displayed by the IoT Explorer. I only ended up seeing a blank screen;

image

In order to sanity check this, I ended up lifting the code that I had built into Unity out of that environment and into a 2D XAML experience to run on my local PC, where things lit up as I'd expect – i.e. IoT Explorer then shows;

image

and so the code seemed to be ok and, at this point, I realised that I could have tested out the whole concept in 2D XAML and never dropped into Unity at all – there’s a lesson in there somewhere! Smile
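
For completeness, that 2D XAML test harness needs very little code; the sketch below gives the flavour of it, although it's illustrative rather than the exact code-behind I used and the page/namespace names here are invented;

// Illustrative sketch of consuming AllJoynEventAdvertiser from the code-behind
// of a blank 2D XAML app (names here are invented, not from the real test app).
namespace AllJoynTestApp
{
  using AllJoynHoloServer;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();

      // Advertise the RemoteHolograms service on the AllJoyn bus and listen
      // for remote CreateCube calls.
      AllJoynEventAdvertiser.CubeCreated += this.OnCubeCreated;
      AllJoynEventAdvertiser.Start();
    }
    void OnCubeCreated(object sender, AllJoynCreateCubeEventArgs args)
    {
      // The 2D app just logs the call; on HoloLens this is where a cube
      // would be created at (X, Y, Z).
      System.Diagnostics.Debug.WriteLine($"CreateCube at {args.X}, {args.Y}, {args.Z}");
    }
  }
}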

Having proved the point on my PC, I also ran the same code on my phone and saw a similar result – the service showed up on the network.

However, no matter how I went about it I couldn’t get HoloLens to advertise this AllJoyn service and so I have to think that perhaps that part of AllJoyn isn’t present on HoloLens today.

That doesn’t surprise me too much and I’ve tried to confirm it and will update this post if I get a confirmation either way.

If this is the case though, what might be done to achieve my original aim, which was to use AllJoyn as the basis of a shared holographic experience across devices?

I decided that there was more than one way to achieve this…

HoloLens as AllJoyn Consumer, not Producer

It wasn’t what I’d originally had in mind but I figured I could change my scenario such that the HoloLens did not offer an AllJoyn service to be consumed but, instead, consumed an AllJoyn service offered by a device (like a PC or Phone) which could offer a service onto the network. The diagram then becomes…

image

and so there’s a need for an extra player here (the PC) and also a need for using events or signals in AllJoyn to inform devices when something has happened on ‘the server’ that they need to know about such as a world anchor being created and uploaded or a new hologram being created.

A New Interface with Signals

I figured that I was onto a winner and so set about implementing this. Firstly, I conjured up a basic interface to try and model the core scenarios of;

  1. Devices join/leave the network and other devices might want to know how many devices they are in a shared experience with.
  2. World anchors get uploaded to the network and other devices might want to import them in order to add a common co-ordinate system.
  3. Holograms get added by a user (I’m assuming that they would be added as a child of a world anchor and so there would be an association there).
  4. Holograms get removed by a user.

In addition, I’d also want to add the ability for the transforms (scale, rotate, translate) of a hologram to be changed but I left that to one side while trying to see if I could get these bits to work.

With this in mind, I created a new AllJoyn interface as below;

AllJoyn Interface on Github

and once again I should perhaps have realised that things weren’t going too well here.
Everything about that interface is probably self-explanatory except a few small things;
  1. I used GUIDs to identify world anchors and holograms.
  2. I’m assuming that types of holograms can be represented by simple strings – e.g. device 1 may create a hologram of an Xmas tree and I’m assuming that device 2 will get a string “Xmas tree” and know what to do with it.
  3. I wanted to have a single method which returns the [Id/Name/Type/Position] of a hologram but I found that this broke the code generation tool hence my methods GetHologramIdsAndNames and GetHologramTransforms – these should really be one method.

and a larger thing;

  • My original method for AddWorldAnchor simply took a byte array, but I later discovered that AllJoyn seems to have a maximum message size of 128K whereas world anchors can be megabytes in size, so I added a capability here to 'chunk' the world anchor into pieces; that affected this method and also the GetWorldAnchor method.

I should have stopped at this point as this limitation of 128K was flashing a warning light but I ignored it and pressed ahead.
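
To give an idea of what that chunking looks like from the calling side, here's a small sketch which slices an exported anchor into pieces that stay under the message size limit and sends them one at a time. The addChunkAsync delegate stands in for whatever the generated consumer method for AddWorldAnchor ends up being called, so this is a sketch of the approach rather than the real client code;

namespace AllJoynHoloClient
{
  using System;
  using System.Threading.Tasks;

  public static class WorldAnchorUploader
  {
    // Keep each AllJoyn message comfortably below the ~128K limit.
    const int ChunkSize = 64 * 1024;

    // 'addChunkAsync' stands in for the generated consumer method which, in my
    // interface, takes the anchor id, the byte offset and the chunk of bytes.
    public static async Task UploadAsync(
      string anchorId,
      byte[] anchorBits,
      Func<string, uint, byte[], Task> addChunkAsync)
    {
      for (int offset = 0; offset < anchorBits.Length; offset += ChunkSize)
      {
        var length = Math.Min(ChunkSize, anchorBits.Length - offset);
        var chunk = new byte[length];
        Array.Copy(anchorBits, offset, chunk, 0, length);

        await addChunkAsync(anchorId, (uint)offset, chunk);
      }
    }
  }
}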

A ‘Server’ App to Implement the Interface

Having generated the code from that new interface (highlighted blue below) I went ahead and generated a UWP app which could act as the ‘server’ (or producer) on the PC (highlighted orange below);

image

That involved implementing the generated IAJHoloServerService interface which I did as below;

AJHoloService implementation on Github

and then I can build this up (with the UI XAML which isn’t really worth listing) and have it run on my desktop;

image

waiting for connections to come in.

A Client Library to make it ‘Easier’ from Unity

I also wanted to have a small client library to make my life easier in the Unity environment in terms of consuming this service, and so I added a 3rd UWP project to my solution;

image

and added a static class which gathered up the various bits of code needed to get the AllJoyn pieces working;

AJHoloServerConnection Implementation on Github

and that relied on this simple callback interface in order to call back into any user of the class which would be my Unity code;

Callback interface on Github
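
To give a sense of its shape without reproducing the repo here, that callback interface is essentially one notification per scenario from the list earlier; something along the lines of the sketch below, where the interface and member names are illustrative rather than copied from the real code;

namespace AllJoynHoloClient
{
  // Illustrative only; these member names are reconstructed from the
  // scenarios listed earlier rather than copied from the repo.
  public interface IHoloServerCallbacks
  {
    // A device joined or left the shared session.
    void OnDeviceCountChanged(uint deviceCount);

    // A world anchor was uploaded and can now be downloaded and imported.
    void OnWorldAnchorAdded(string anchorId);

    // A hologram was created as a child of a world anchor.
    void OnHologramAdded(string anchorId, string hologramId, string hologramType);

    // A hologram was removed.
    void OnHologramRemoved(string hologramId);
  }
}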

Consuming from Unity

At this point, I should have really tested what it was like to consume this code from a 2D app as I’d have learned a lot while expending little effort but I didn’t do that. Instead, I went and built a largely empty scene in Unity;

image

where the Parent object is an empty GameObject and the UITextPrefab is straight from HoloToolkit-Unity, with a couple of voice keywords attached to it via the toolkit's Keyword Manager;

image

I made sure my 2 DLLs (the AllJoyn generated DLL plus my client library) were present in a Plugins folder;

image

with appropriate settings on them;

image

and made sure that my project was requesting the AllJoyn capability. At this point, I realised that I needed some way to relatively easily import/export world anchors in Unity without getting into a lot of code with Update() loops and state machines.

Towards that end, I wrote a couple of simple classes which I think function ok outside of the Unity editor in the UWP world but which wouldn’t work in the editor. I wrote a class to help with exporting world anchors;

WorldAnchorExporter implementation on Github

and a class to help with importing world anchors;

WorldAnchorImporter implementation on Github

and they rely on a class which tries to provide a simplistic bridge between the Update() oriented loop world of Unity and the async world of UWP, as I wanted to be able to write async/await code which 'waited' for the isLocated flag on a WorldAnchor to be set to true;

GameObjectUpdateWatcher implementation on Github
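
To give a flavour of that bridging without reproducing those classes here, the sketch below shows the general approach: a MonoBehaviour checks a predicate on every Update() and completes a Task when it first returns true, which then lets higher-level code await a WorldAnchor becoming located before exporting it via WorldAnchorTransferBatch. This is a simplified sketch rather than the exact repo code, and the UnityEngine.VR.WSA namespaces may differ depending on the Unity version in use;

// Simplified sketch: bridge Unity's Update() loop to async/await and use it
// to wait for a WorldAnchor to be located before exporting it to bytes.
// Not the exact repo code; namespaces vary with the Unity version.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.VR.WSA;
using UnityEngine.VR.WSA.Sharing;

public class PredicateWaiter : MonoBehaviour
{
  class Entry
  {
    public Func<bool> Predicate;
    public TaskCompletionSource<bool> Completion;
  }
  // Returns a Task which completes the first time the predicate returns true.
  public Task WaitForAsync(Func<bool> predicate)
  {
    var entry = new Entry()
    {
      Predicate = predicate,
      Completion = new TaskCompletionSource<bool>()
    };
    this.pending.Add(entry);
    return entry.Completion.Task;
  }
  void Update()
  {
    for (int i = this.pending.Count - 1; i >= 0; i--)
    {
      if (this.pending[i].Predicate())
      {
        this.pending[i].Completion.SetResult(true);
        this.pending.RemoveAt(i);
      }
    }
  }
  readonly List<Entry> pending = new List<Entry>();
}

public static class WorldAnchorExport
{
  // Waits for the anchor to be located, then serialises it to a byte array
  // (returning null if the export fails).
  public static async Task<byte[]> ExportAsync(PredicateWaiter waiter, WorldAnchor anchor)
  {
    await waiter.WaitForAsync(() => anchor.isLocated);

    var bits = new List<byte>();
    var completed = new TaskCompletionSource<bool>();

    var batch = new WorldAnchorTransferBatch();
    batch.AddWorldAnchor("anchor", anchor);

    WorldAnchorTransferBatch.ExportAsync(
      batch,
      data => bits.AddRange(data),
      reason => completed.SetResult(reason == SerializationCompletionReason.Succeeded));

    return await completed.Task ? bits.ToArray() : null;
  }
}

The importing side goes the other way, using WorldAnchorTransferBatch.ImportAsync on the received bytes and then locking a GameObject to the imported anchor.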

With that all done, I could attach a script to my empty GameObject to form the basis of my logic stringing together the handling of the voice keywords with the import/export of the world anchors, creation of a single, simple type of hologram (a cube) and calls to the AllJoyn service on the PC;

Startup script on Github

That took quite a bit of effort, but was it worth it? Possibly not…

Testing – AllJoyn and Large Buffers

In testing, things initially looked good. I'd run the server app on the PC and the client app on the HoloLens and I'd see it connect;

image

and disconnect when I shut it down. My voice command of "lock" seemed to go through the motions of creating a world anchor, which was quickly exported from the device and then started to transfer over the network.

And it transferred…

And it transferred…

And it took a long time…minutes…

I spent quite some time debugging this. My first investigation was to look at the network performance graph from the HoloLens and it looked like this while transferring a 2-3MB world anchor across the network in chunks (64KB) using the AllJoyn interface that I’d written;

image

It’s the classic sawtooth and I moved my same client code onto both a Phone and a PC to see if it showed the same behaviour there and, to some extent, it did although it wasn’t as pronounced.

I played quite a bit with the size of the chunks that were being sent over the network and could see that the gaps between the bursts of traffic seemed to be directly related to the size of the buffer. Some native debugging, combined with a quick experiment with the Visual Studio profiler, pointed to this function in the generated code as being the cause of all my troubles;

image

and, insofar as I can tell, this runs some relatively expensive conversion function on every element in the array, which burns up a bunch of CPU cycles on the HoloLens prior to the transmission of the data over the network. This seemed to slow down the world anchor transfers dramatically – I found that it might take minutes for a world anchor to transfer which, clearly, isn't very practical.

Putting the terrible performance that I’ve created to one side, I hit another problem…

Testing – Unexpected Method Signatures

Once I’d managed to be patient enough to upload a world anchor to my server, I found a particular problem testing out the method which downloads it back to another device. It’s AllJoyn signature in the interface looks like this;

<method name="GetWorldAnchor">
    <arg name="anchorId" type="s" direction="in"/>
    <arg name="byteIndex" type="u" direction="in"/>
    <arg name="byteLength" type="u" directon="in"/>
    <arg name="anchorData" type="ay" direction="out"/>
</method>

and I found that when I called this at runtime from the client, my method's implementation code on the server side wasn't getting hit. All I was seeing was a client-side AllJoyn error;

ER_BUS_UNEXPECTED_SIGNATURE = 0x9015

Now, my method has 3 incoming parameters and a return value, but it was curious to me that if I used the IoT Explorer on that interface it showed up differently;

image

and so IoT Explorer was showing my method as having 2 inward parameters and 2 outgoing parameters or return values which isn’t what the interface specification actually says Confused smile

I wondered whether this was a bug in IoT Explorer or a bug in the runtime pieces. Through debugging the native code, I could use the debugger to make the code send 2 parameters rather than 3 and, if I did this, the method call would cross the network and fail on the server side. So it seemed that IoT Explorer was right and my function really did expect 2 inward parameters and 2 outward parameters.

What wasn’t clear was…why and how to fix?

I spent about an hour before I realised that this problem came down to a typo in the XML where the word "direction" had been turned into "directon" – there's a lesson in there somewhere too, as the code generation tool didn't seem to tell me anything about it and just defaulted the parameter to be outgoing.

With that simple typo fixed, I could get my code up and running properly for the first time.

Wrapping Up

Once I’d got past these stumbling blocks, the code as I have it actually seems to work in that I can run through the steps of;

  1. Run the server.
  2. Run the app on my HoloLens.
  3. See the server display that the app has connected.
  4. Use the voice command “lock” to create a world anchor, export it and upload it to the server taking a very long time.
  5. Use the voice command “cube” in a few places to create cubes.
  6. Shut down the app and repeat steps 2 and 3 above to see the device connect, download the anchor (taking a very long time), import it and recreate the cubes from step 5 where they were previously.

and I’ve placed the source for what I built up here;

https://github.com/mtaulty/AllJoynHoloSharing

but, because of the bad performance on copying the world anchor buffers around, I don’t think this would be a great starting point for implementing a shared holographic server and I’d go back to using the regular pieces from the HoloToolkit rather than trying to take this forward.

That said, I learned some things in playing with it and there's probably some code in here that I'll re-use in the future, so it was worthwhile.

Windows 10, 1607, UWP and Experimenting with the Kinect for Windows V2 Update

I was really pleased to see this blog post;

Kinect demo code and new driver for UWP now available

announcing a new driver which provides more access to the functionality of the Kinect for Windows V2 into Windows 10 including for the UWP developer.

I wrote a little about this topic in this earlier post around 10 months ago when some initial functionality became available for the UWP developer;

Kinect V2, Windows Hello and Perception APIs

and so it’s great to see that more functionality has become available and, specifically, that skeletal data is being surfaced.

I plugged my Kinect for Windows V2 into my Surface Pro 3 and had a look at the driver being used for Kinect.

image

and I attempted to do an update but didn't seem to find one, though it's possible that the version of the driver which I have;

image

is the latest driver as it seems to be a week or two old. At the time of writing, I haven’t confirmed this driver version but I went on to download the C++ sample from GitHub;

Camera Stream Correlation Sample

and ran it up on my Surface Pro 3 where it initially displayed the output of the rear webcam;

image

and so I pressed the ‘Next Source’ button and it attempted to work with the RealSense camera on my machine;

image

and so I pressed the ‘Next Source’ button and things seemed to hang. I’m unsure of the status of my RealSense drivers on this machine and so I disabled the RealSense virtual camera driver;

image

and then re-ran the sample and, sure enough, I could use the 'Next Source' button to move to the Kinect for Windows V2 sensor. I then used the 'Toggle Depth Fading' button to turn that option off and the 'Toggle Skeletal Overlay' button to switch that option on and I've got a (flat) skeletal overlay on the colour frames, delivering very smooth performance here;

image

and so that’s great to see working. Given that the sample seemed to be C++ code, I wondered what this might look like for a C# developer working with the UWP and so I set about seeing if I could reproduce some of the core of what the sample is doing here.

Getting Skeletal Data Into a C# UWP App

Rather than attempting to ‘port’ the C++ sample, I started by lifting pieces of the code that I’d written for that earlier blog post into a new project.

I made a blank app targeting SDK 14393, made sure that it had access to webcam and microphone and then added in win2d.uwp as a NuGet package and added a little UI;

<Page
    x:Class="KinectTestApp.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:KinectTestApp"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:w2d="using:Microsoft.Graphics.Canvas.UI.Xaml"
    mc:Ignorable="d">

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <TextBlock
            FontSize="36"
            HorizontalAlignment="Center"
            VerticalAlignment="Center"
            TextAlignment="Center"
            Text="No Cameras" />
        <w2d:CanvasControl
            x:Name="canvasControl"
            Visibility="Collapsed"
            SizeChanged="OnCanvasControlSizeChanged"
            Draw="OnDraw"/>
    </Grid>
</Page>

From there, I wanted to see if I could get a basic render of the colour frame from the camera along with an overlay of some skeletal points.

I’d spotted that the official samples include a project which builds out a WinRT component that is then used to interpret the custom data that comes from the Kinect via a MediaFrameReference and so I included a reference to this project into my solution so that I could use it in my C# code. That project is here and looks to stand independent of the surrounding sample. I made my project reference as below;

image

and then set about trying to see if I could write some code that got colour data and skeletal data onto the screen.

I wrote a few small supporting classes and named them all with an mt* prefix to try and make it more obvious which code here is mine rather than from the framework or the sample. This simple class delivers a SoftwareBitmap containing the contents of the colour frame to be fired as an event;

namespace KinectTestApp
{
  using System;
  using Windows.Graphics.Imaging;

  class mtSoftwareBitmapEventArgs : EventArgs
  {
    public SoftwareBitmap Bitmap { get; set; }
  }
}

whereas this class delivers the data that I’ve decided I need in order to draw a subset of the skeletal data onto the screen;

namespace KinectTestApp
{
  using System;

  class mtPoseTrackingFrameEventArgs : EventArgs
  {
    public mtPoseTrackingDetails[] PoseEntries { get; set; }
  }
}

and it’s a simple array which will be populated with one of these types below for each user being tracked by the sensor;

namespace KinectTestApp
{
  using System;
  using System.Linq;
  using System.Numerics;
  using Windows.Foundation;
  using Windows.Media.Devices.Core;
  using WindowsPreview.Media.Capture.Frames;

  class mtPoseTrackingDetails
  {
    public Guid EntityId { get; set; }
    public Point[] Points { get; set; }

    public static mtPoseTrackingDetails FromPoseTrackingEntity(
      PoseTrackingEntity poseTrackingEntity,
      CameraIntrinsics colorIntrinsics,
      Matrix4x4 depthColorTransform)
    {
      mtPoseTrackingDetails details = null;

      var poses = new TrackedPose[poseTrackingEntity.PosesCount];
      poseTrackingEntity.GetPoses(poses);

      var points = new Point[poses.Length];

      colorIntrinsics.ProjectManyOntoFrame(
        poses.Select(p => Multiply(depthColorTransform, p.Position)).ToArray(),
        points);

      details = new mtPoseTrackingDetails()
      {
        EntityId = poseTrackingEntity.EntityId,
        Points = points
      };
      return (details);
    }
    static Vector3 Multiply(Matrix4x4 matrix, Vector3 position)
    {
      return (new Vector3(
        position.X * matrix.M11 + position.Y * matrix.M21 + position.Z * matrix.M31 + matrix.M41,
        position.X * matrix.M12 + position.Y * matrix.M22 + position.Z * matrix.M32 + matrix.M42,
        position.X * matrix.M13 + position.Y * matrix.M23 + position.Z * matrix.M33 + matrix.M43));
    }
  }
}

This would be a simple class containing a GUID to identify the tracked person and an array of Points representing their tracked joints, except that I wanted those 2D Points to be in colour space, which means mapping them from the depth space that the sensor presents them in. The FromPoseTrackingEntity() method therefore takes a PoseTrackingEntity (one of the types from the referenced C++ project) and;

  1. Extracts the ‘poses’ (i.e. joints in my terminology)
  2. Uses the CameraIntrinsics from the colour camera to project them onto its frame having first transformed them using a matrix which maps from depth space to colour space.

Step 2 is code that I largely duplicated from the original C++ sample after trying a few other routes which didn’t end well for me Smile

I then wrote this class which wraps up a few areas;

namespace KinectTestApp
{
  using System;
  using System.Linq;
  using System.Threading.Tasks;
  using Windows.Media.Capture;
  using Windows.Media.Capture.Frames;

  class mtMediaSourceReader
  {
    public mtMediaSourceReader(
      MediaCapture capture, 
      MediaFrameSourceKind mediaSourceKind,
      Action<MediaFrameReader> onFrameArrived,
      Func<MediaFrameSource, bool> additionalSourceCriteria = null)
    {
      this.mediaCapture = capture;
      this.mediaSourceKind = mediaSourceKind;
      this.additionalSourceCriteria = additionalSourceCriteria;
      this.onFrameArrived = onFrameArrived;
    }
    public bool Initialise()
    {
      this.mediaSource = this.mediaCapture.FrameSources.FirstOrDefault(
        fs =>
          (fs.Value.Info.SourceKind == this.mediaSourceKind) &&
          ((this.additionalSourceCriteria != null) ? 
            this.additionalSourceCriteria(fs.Value) : true)).Value;   

      return (this.mediaSource != null);
    }
    public async Task OpenReaderAsync()
    {
      this.frameReader =
        await this.mediaCapture.CreateFrameReaderAsync(this.mediaSource);

      this.frameReader.FrameArrived +=
        (s, e) =>
        {
          this.onFrameArrived(s);
        };

      await this.frameReader.StartAsync();
    }
    Func<MediaFrameSource, bool> additionalSourceCriteria;
    Action<MediaFrameReader> onFrameArrived;
    MediaFrameReader frameReader;
    MediaFrameSource mediaSource;
    MediaCapture mediaCapture;
    MediaFrameSourceKind mediaSourceKind;
  }
}

This type takes a MediaCapture and a MediaFrameSourceKind and can then report via the Initialise() method whether that media source kind is available on that media capture. It can also apply some additional criteria if they are provided in the constructor. This class can also create a frame reader and redirect its FrameArrived events into the method provided to the constructor. There should be some way to stop this class as well but I haven't written that yet.
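
If I did add that, I'd expect it to be a small method along these lines which would slot into the class above, stopping the underlying reader and releasing it (an untested sketch; ideally the lambda wired to FrameArrived would be unhooked too);

    public async Task CloseReaderAsync()
    {
      if (this.frameReader != null)
      {
        // Stop delivering frames and release the underlying MediaFrameReader.
        await this.frameReader.StopAsync();
        this.frameReader.Dispose();
        this.frameReader = null;
      }
    }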

With those classes in place, I added the following mtKinectColorPoseFrameHelper;

namespace KinectTestApp
{
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Numerics;
  using System.Threading.Tasks;
  using Windows.Media.Capture;
  using Windows.Media.Capture.Frames;
  using Windows.Media.Devices.Core;
  using Windows.Perception.Spatial;
  using WindowsPreview.Media.Capture.Frames;

  class mtKinectColorPoseFrameHelper
  {
    public event EventHandler<mtSoftwareBitmapEventArgs> ColorFrameArrived;
    public event EventHandler<mtPoseTrackingFrameEventArgs> PoseFrameArrived;

    public mtKinectColorPoseFrameHelper()
    {
      this.softwareBitmapEventArgs = new mtSoftwareBitmapEventArgs();
    }
    internal async Task<bool> InitialiseAsync()
    {
      bool necessarySourcesAvailable = false;

      // We try to find the Kinect by asking for a group that can deliver
      // color, depth, custom and infrared. 
      var allGroups = await GetGroupsSupportingSourceKindsAsync(
        MediaFrameSourceKind.Color,
        MediaFrameSourceKind.Depth,
        MediaFrameSourceKind.Custom,
        MediaFrameSourceKind.Infrared);

      // We assume the first group here is what we want which is not
      // necessarily going to be right on all systems so would need
      // more care.
      var firstSourceGroup = allGroups.FirstOrDefault();

      // Got one that supports all those types?
      if (firstSourceGroup != null)
      {
        this.mediaCapture = new MediaCapture();

        var captureSettings = new MediaCaptureInitializationSettings()
        {
          SourceGroup = firstSourceGroup,
          SharingMode = MediaCaptureSharingMode.SharedReadOnly,
          StreamingCaptureMode = StreamingCaptureMode.Video,
          MemoryPreference = MediaCaptureMemoryPreference.Cpu
        };
        await this.mediaCapture.InitializeAsync(captureSettings);

        this.mediaSourceReaders = new mtMediaSourceReader[]
        {
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Color, this.OnFrameArrived),
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Depth, this.OnFrameArrived),
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Custom, this.OnFrameArrived,
            DoesCustomSourceSupportPerceptionFormat)
        };

        necessarySourcesAvailable = 
          this.mediaSourceReaders.All(reader => reader.Initialise());

        if (necessarySourcesAvailable)
        {
          foreach (var reader in this.mediaSourceReaders)
          {
            await reader.OpenReaderAsync();
          }
        }
        else
        {
          this.mediaCapture.Dispose();
        }
      }
      return (necessarySourcesAvailable);
    }
    void OnFrameArrived(MediaFrameReader sender)
    {
      var frame = sender.TryAcquireLatestFrame();

      if (frame != null)
      {
        switch (frame.SourceKind)
        {
          case MediaFrameSourceKind.Custom:
            this.ProcessCustomFrame(frame);
            break;
          case MediaFrameSourceKind.Color:
            this.ProcessColorFrame(frame);
            break;
          case MediaFrameSourceKind.Infrared:
            break;
          case MediaFrameSourceKind.Depth:
            this.ProcessDepthFrame(frame);
            break;
          default:
            break;
        }
        frame.Dispose();
      }
    }
    void ProcessDepthFrame(MediaFrameReference frame)
    {
      if (this.colorCoordinateSystem != null)
      {
        this.depthColorTransform = frame.CoordinateSystem.TryGetTransformTo(
          this.colorCoordinateSystem);
      }     
    }
    void ProcessColorFrame(MediaFrameReference frame)
    {
      if (this.colorCoordinateSystem == null)
      {
        this.colorCoordinateSystem = frame.CoordinateSystem;
        this.colorIntrinsics = frame.VideoMediaFrame.CameraIntrinsics;
      }
      this.softwareBitmapEventArgs.Bitmap = frame.VideoMediaFrame.SoftwareBitmap;
      this.ColorFrameArrived?.Invoke(this, this.softwareBitmapEventArgs);
    }
    void ProcessCustomFrame(MediaFrameReference frame)
    {
      if ((this.PoseFrameArrived != null) &&
        (this.colorCoordinateSystem != null) &&
        (this.depthColorTransform != null))
      {
        var trackingFrame = PoseTrackingFrame.Create(frame);
        var eventArgs = new mtPoseTrackingFrameEventArgs();

        if (trackingFrame.Status == PoseTrackingFrameCreationStatus.Success)
        {
          // Which of the entities here are actually tracked?
          var trackedEntities =
            trackingFrame.Frame.Entities.Where(e => e.IsTracked).ToArray();

          var trackedCount = trackedEntities.Count();

          if (trackedCount > 0)
          {
            eventArgs.PoseEntries =
              trackedEntities
              .Select(entity =>
                mtPoseTrackingDetails.FromPoseTrackingEntity(entity, this.colorIntrinsics, this.depthColorTransform.Value))
              .ToArray();
          }
          this.PoseFrameArrived(this, eventArgs);
        }
      }
    }
    async static Task<IEnumerable<MediaFrameSourceGroup>> GetGroupsSupportingSourceKindsAsync(
      params MediaFrameSourceKind[] kinds)
    {
      var sourceGroups = await MediaFrameSourceGroup.FindAllAsync();

      var groups =
        sourceGroups.Where(
          group => kinds.All(
            kind => group.SourceInfos.Any(sourceInfo => sourceInfo.SourceKind == kind)));

      return (groups);
    }
    static bool DoesCustomSourceSupportPerceptionFormat(MediaFrameSource source)
    {
      return (
        (source.Info.SourceKind == MediaFrameSourceKind.Custom) &&
        (source.CurrentFormat.MajorType == PerceptionFormat) &&
        (Guid.Parse(source.CurrentFormat.Subtype) == PoseTrackingFrame.PoseTrackingSubtype));
    }
    SpatialCoordinateSystem colorCoordinateSystem;
    mtSoftwareBitmapEventArgs softwareBitmapEventArgs;
    mtMediaSourceReader[] mediaSourceReaders;
    MediaCapture mediaCapture;
    CameraIntrinsics colorIntrinsics;
    const string PerceptionFormat = "Perception";
    private Matrix4x4? depthColorTransform;
  }
}

This is essentially doing;

  1. InitialiseAsync
    1. Using the MediaFrameSourceGroup type to try and find a source group that looks like it is the Kinect by searching for Infrared+Color+Depth+Custom source kinds. This isn't a complete test and could be made more robust. Also, there's an assumption that the first group found is the best one, which isn't likely to always hold true.
    2. Initialising a MediaCapture for the group found in step 1 above.
    3. Initialising three of my mtMediaSourceReader types for the Color/Depth/Custom source kinds and adding some extra criteria for the Custom source type to try and make sure that it supports the ‘Perception’ media format – this code is essentially lifted from the original sample.
    4. Opening frame readers on those three items and handling the events as frame arrives.
  2. OnFrameArrived simply passes the frame on to sub-functions based on type and this could have been done by deriving specific mtMediaSourceReaders.
  3. ProcessDepthFrame tries to get a transformation from depth space to colour space for later use.
  4. ProcessColorFrame fires the ColorFrameArrived event with the SoftwareBitmap that has been received.
  5. ProcessCustomFrame handles the custom frame by;
    1. Using the PoseTrackingFrame.Create() method from the referenced C++ project to interpret the raw data that comes from the custom sensor.
    2. Determining how many bodies are being tracked by the data.
    3. Converting the data types from the referenced C++ project to my own data types, which include less of the data and which try to map the positions of joints given as 3D depth points to their respective 2D colour space points.

Lastly, there’s some code-behind which tries to glue this into the UI;

namespace KinectTestApp
{
  using Microsoft.Graphics.Canvas;
  using Microsoft.Graphics.Canvas.UI.Xaml;
  using System.Numerics;
  using System.Threading;
  using Windows.Foundation;
  using Windows.Graphics.Imaging;
  using Windows.UI;
  using Windows.UI.Core;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += this.OnLoaded;
    }
    void OnCanvasControlSizeChanged(object sender, SizeChangedEventArgs e)
    {
      this.canvasSize = new Rect(0, 0, e.NewSize.Width, e.NewSize.Height);
    }
    async void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.helper = new mtKinectColorPoseFrameHelper();

      this.helper.ColorFrameArrived += OnColorFrameArrived;
      this.helper.PoseFrameArrived += OnPoseFrameArrived;

      var supported = await this.helper.InitialiseAsync();

      if (supported)
      {
        this.canvasControl.Visibility = Visibility.Visible;
      }
    }
    void OnColorFrameArrived(object sender, mtSoftwareBitmapEventArgs e)
    {
      // Note that when this function returns to the caller, we have
      // finished with the incoming software bitmap.
      if (this.bitmapSize == null)
      {
        this.bitmapSize = new Rect(0, 0, e.Bitmap.PixelWidth, e.Bitmap.PixelHeight);
      }

      if (Interlocked.CompareExchange(ref this.isBetweenRenderingPass, 1, 0) == 0)
      {
        this.lastConvertedColorBitmap?.Dispose();

        // Sadly, the format that comes in here, isn't supported by Win2D when
        // it comes to drawing so we have to convert. The upside is that 
        // we know we can keep this bitmap around until we are done with it.
        this.lastConvertedColorBitmap = SoftwareBitmap.Convert(
          e.Bitmap,
          BitmapPixelFormat.Bgra8,
          BitmapAlphaMode.Ignore);

        // Cause the canvas control to redraw itself.
        this.InvalidateCanvasControl();
      }
    }
    void InvalidateCanvasControl()
    {
      // Fire and forget.
      this.Dispatcher.RunAsync(CoreDispatcherPriority.High, this.canvasControl.Invalidate);
    }
    void OnPoseFrameArrived(object sender, mtPoseTrackingFrameEventArgs e)
    {
      // NB: we do not invalidate the control here but, instead, just keep
      // this frame around (maybe) until the colour frame redraws which will 
      // (depending on race conditions) pick up this frame and draw it
      // too.
      this.lastPoseEventArgs = e;
    }
    void OnDraw(CanvasControl sender, CanvasDrawEventArgs args)
    {
      // Capture this here (in a race) in case it gets over-written
      // while this function is still running.
      var poseEventArgs = this.lastPoseEventArgs;

      args.DrawingSession.Clear(Colors.Black);

      // Do we have a colour frame to draw?
      if (this.lastConvertedColorBitmap != null)
      {
        using (var canvasBitmap = CanvasBitmap.CreateFromSoftwareBitmap(
          this.canvasControl,
          this.lastConvertedColorBitmap))
        {
          // Draw the colour frame
          args.DrawingSession.DrawImage(
            canvasBitmap,
            this.canvasSize,
            this.bitmapSize.Value);

          // Have we got a skeletal frame hanging around?
          if (poseEventArgs?.PoseEntries?.Length > 0)
          {
            foreach (var entry in poseEventArgs.PoseEntries)
            {
              foreach (var pose in entry.Points)
              {
                var centrePoint = ScalePosePointToDrawCanvasVector2(pose);

                args.DrawingSession.FillCircle(
                  centrePoint, circleRadius, Colors.Red);
              }
            }
          }
        }
      }
      Interlocked.Exchange(ref this.isBetweenRenderingPass, 0);
    }
    Vector2 ScalePosePointToDrawCanvasVector2(Point posePoint)
    {
      return (new Vector2(
        (float)((posePoint.X / this.bitmapSize.Value.Width) * this.canvasSize.Width),
        (float)((posePoint.Y / this.bitmapSize.Value.Height) * this.canvasSize.Height)));
    }
    Rect? bitmapSize;
    Rect canvasSize;
    int isBetweenRenderingPass;
    SoftwareBitmap lastConvertedColorBitmap;
    mtPoseTrackingFrameEventArgs lastPoseEventArgs;
    mtKinectColorPoseFrameHelper helper;
    static readonly float circleRadius = 10.0f;
  }
}

I don’t think there’s too much in there that would require explanation other than that I took a couple of arbitrary decisions;

  1. That I essentially process one colour frame at a time, using a form of 'lock' to try and drop any colour frames that arrive while I am still in the process of drawing the last colour frame; that 'drawing' involves both the method OnColorFrameArrived and the async call to OnDraw that it causes.
  2. That I don't force a redraw when a 'pose' frame arrives. Instead, the data is held until the next OnDraw call which comes from handling the colour frames. It's certainly possible that the various race conditions involved there might cause that frame to be dropped and another to replace it in the meantime.

Even though there’s a lot of allocations going on in that code as it stands, here’s a screenshot of it running and the performance isn’t bad at all running it from my Surface Pro 3 and I’m particularly pleased with the red nose that I end up with here Smile

image

The code is quite rough and ready as I was learning as I went along and some next steps might be to;

  1. Draw joints that are inferred in a different colour to those that are properly tracked.
  2. Draw the skeleton rather than just the joints.
  3. Do quite a lot of optimisations as the code here allocates a lot.
  4. Do more tracking around entities arriving/leaving based on their IDs and handle multiple people with different colours.
  5. Refactor to specialise the mtMediaSourceReader class to have separate types for Color/Depth/Custom and thereby tidy up the code which uses this type.

but, for now, I was just trying to get some basics working.

Here’s the code on GitHub if you want to try things out and note that you’d need that additional sample code from the official samples to make it work.

Windows 10 1607, UWP, Composition APIs–Walked Through Demo Code

I’ve written a few posts about the Windows 10 composition APIs for beautiful, fluid, animated UX gathered under this URL;

Composition Posts

and today I was putting together some demo code for other purposes and thought I'd screen-capture what I had as a walk-through of some of the capabilities of those composition APIs, starting from a blank slate and working through it;

That’s just one of my own, unofficial walk-throughs. For the official bits, visit the team site at;

http://aka.ms/winuilabs

Enjoy Smile