Hands, Gestures and Popping back to ‘Prague’

Just a short post to follow up on this previous post;

Hands, Gestures and a Quick Trip to ‘Prague’

I said that if I ‘found time’ then I’d revisit that post and that code and see if I could make it work with the Kinect for Windows V2 sensor rather than with the Intel RealSense SR300 which I used in that post.

In all honesty, I haven’t ‘found time’ but I’m revisiting it anyway.

I dug my Kinect for Windows V2 and all of its lengthy cabling out of the drawer, plugged it into my Surface Book and … it didn’t work. Instead, I got the flashing white light which usually indicates that things aren’t going so well.

Not to be deterred, I did some deep, internal Microsoft research (ok, I searched the web) and came up with this;

Kinect Sensor is not recognized on a Surface Book

and getting rid of the text value within that registry key sorted out that problem and let me test that my Kinect for Windows V2 was working in the sense that the configuration verifier says;


which, after many years of experience, I have learned to interpret as “Give it a try!”. I tried out a couple of the SDK samples, they worked fine for me, and so I reckoned I was in a good place to get started.

However, the Project Prague bits were not so happy and I found they were logging a bunch of errors in the ‘Geek View’ about not being able to connect/initialise to either the SR300 or the Kinect camera.

This seemed to get resolved by me updating my Kinect drivers – I did an automatic update and Windows found new drivers online which took me to this version;


which I was surprised I didn’t have already, as it’s quite old, but it seemed to make the Project Prague pieces happy and the Geek View was back in business showing output from Kinect;


and from the little display window on the left there it felt like this operated at a range of approx 0.5m to 1.0m. I wondered whether I could move further away but that didn’t seem to be the case in the quick experiment that I tried.

The big question for me then was whether the code that I’d previously written and run against the SR300 would “just work” on the Kinect for Windows V2 and, of course, it does. Revisit the previous post for the source code if you’re interested but I found my “counting on four fingers” gesture was recognised quickly and reliably here;


This is very cool – it’d be interesting to know exactly what ‘Prague’ relies on from the perspective of the camera and also from the POV of system requirements (CPU, RAM, GPU, etc) in order to make this work but it looks like they’ve got a very decent system going for recognising hand gestures across different cameras.

Hands, Gestures and a Quick Trip to ‘Prague’

Sorry for the title – I couldn’t resist and, no, I’ve not switched to writing a travel blog just yet, although I’ll keep the idea in my back pocket for the time when the current ‘career’ hits the ever-looming buffers.

But, no, this post is about ‘Project Prague’ and hand gestures and I’ve written quite a bit in the past about natural gesture recognition with technologies like the Kinect for Windows V2 and with the RealSense F200 and SR300 cameras.

The Kinect has great capabilities for colour, depth and infra-red imaging plus a smart (i.e. cloud-trained AI) runtime which can bring all those streams together and give you (human) skeletal tracking of 25 joints on 6 bodies at 30 frames per second. It can also do some facial tracking and has an AI-based gesture recognition system which can be trained to recognise human-body-based gestures like “hands above head” or “golf swing” and so on.

That camera has a range of approx 0.5m to 4.5m and, perhaps because of this long range, it does not have a great deal of support for hand-based gestures – it can report some hand joints and a few hand states like open/closed but it doesn’t go much beyond that.

I’ve also written about the RealSense F200 and SR300 cameras, although I never had a lot of success with the SR300. Those cameras have a much shorter range (< 1m) than the Kinect for Windows V2 but have/had some different capabilities in that they surfaced functionality like;

  • Detailed facial detection providing feature positions etc and facial recognition.
  • Emotion detection providing states like ‘happy’, ‘sad’ etc (although this got removed from the original SDK at a later point)
  • Hand tracking features
    • The SDK has great support for tracking of hands down to the joint level with > 20 joints reported by the SDK
    • The SDK also has support for hand-based gestures such as “V sign”, “full pinch” etc.

With any of these cameras and their SDKs, the processing happens locally on the (high bandwidth) data at frame rates of 15/30/60 FPS. That’s quite different to those scenarios where you might selectively capture data and send it to the cloud for processing, as you see with the Cognitive Services, but both approaches have their benefits and are open to being used in combination.

In terms of this functionality around hand tracking and gestures, I bundled some of what I knew about this into a video last year and published it to Channel9 although it’s probably quite a bit out of date at this point;


but it’s been a topic that interested me for a long time and so when I saw ‘Project Prague’ announced a few weeks ago I was naturally interested.

My first question on ‘Prague’ was whether it would make use of a local-processing or a cloud-based-processing model and, if the former, whether it would require a depth camera or would be based purely on a web cam.

It turns out that ‘Prague’ is locally processing data and does require either a Kinect for Windows V2 camera or a RealSense SR300 camera with the recommendation on the website being to use the SR300.

I dug my Intel RealSense SR300 out of the drawer where it’s been living for a few months, plugged it in to my Surface Book and set about seeing whether I could get a ‘Prague’ demo up and running on it.

Plugging in the SR300

I hadn’t plugged the SR300 into my Surface Book since I reinstalled Windows and so I wondered how that had progressed since the early days of the camera and since Windows has moved to Creators Update (I’m running 15063.447).

I hadn’t installed the RealSense SDK onto this machine but Windows seemed to recognise the device and install it regardless although I did find that the initial install left some “warning triangles” in device manager that had to be resolved by a manual “Scan for hardware changes” from the Device Manager menu but then things seemed to sort themselves out and Device Manager showed;


which the modern devices app shows as;


and that seemed reasonable. I didn’t have to visit the troubleshooting page, although I wasn’t surprised to see that it existed based on my previous experience with the SR300. Instead, I went off to download ‘Project Prague’.

Installing ‘Prague’

Nothing much to report here – there’s an MSI that you download and run;


and “It Just Worked” so nothing to say about that.

Once installation had completed, as per the docs, the “Microsoft Gestures Service” app ran up and I tried to do as the documentation advised and make sure that the app was recognising my hand – it didn’t seem to be working as below;


but then I tried with my right hand and things seemed to be working better;


This is actually the window view (called the ‘Geek View’!) of a system tray application (the “gestures service”) which doesn’t seem to be a true service in the NT sense but instead seems to be a regular app configured to run at startup on the system;


so, much like the Kinect Runtime, it seems that this is the code which sits and watches frames from the camera; applications then become “clients” of this service. The “DiscoveryClient”, which is also highlighted in the screenshot as being configured to run at startup, is one such demo app which picks up gestures from the service and (according to the docs) routes the gestures through to the shell.

Here’s the system tray application;


and if I perform the “bloom” gesture (familiar from Windows Mixed Reality) then the system tray app pops up;


and tells me that there are other gestures already active to open the start menu and toggle the volume. The gestures animate on mouse over to show how to execute them and I had no problem with using the gesture to toggle the volume on my machine but I did struggle a little with the gesture to open the start menu.

The ‘timeline’ view in the ‘Geek View’ here is interesting because it shows gestures being detected or not in real time and you can perhaps see on the timeline below how I’m struggling to execute the ‘Shell_Start’ gesture and it’s getting recognised as a ‘Discovery_Tray’ gesture. In that screenshot the white blob indicates a “pose” whereas the green blobs represent completed “gestures”.


There’s also a ‘settings’ section here which shows me;


and then on the GestPacks section;


suggests that the service has integration for various apps. At the time of writing, the “get more online” option didn’t seem to link to anything that I could spot but I noticed by running PowerPoint that the app is monitoring which app is in the foreground and is switching its gestures list to relate to that contextual app.

So, when running PowerPoint, the gesture service shows;


and those gestures worked very well for me in PowerPoint – it was easy to start a slideshow and then advance the slides by just tapping through in the air with my finger. These details can also be seen in the settings app;


which suggests that these gestures are contextual within the app – for example the “Rotate Right 90” option doesn’t show up until I select an object in PowerPoint;


and I can see this dynamically changing in the ‘Geek View’ – here’s the view when no object is selected;


and I can see that there are perhaps 3 gestures registered whereas if I select an object in PowerPoint then I see;


and those gestures worked pretty well for me.

Other Demo Applications

I experimented with the ‘Camera Viewer’ app which works really well. Once again, from the ‘Geek View’ I can see that this app has registered some gestures and you can perhaps see below that I am trying out the ‘peace’ gesture and the geek view is showing that this is registered, that it has completed and the app is displaying some nice doves to show it’s seen the gesture;


One other interesting aspect of this app is that it displays a ‘Connecting to Gesture Service’ message as you bring it back into focus suggesting that there’s some sort of ‘connection’ to the gestures service that comes/goes over time.

These gestures worked really well for me and by this point I was wondering how these gestures apps were plugging into the architecture here, how they were implemented and so I wanted to see if I could write some code. I did notice that the GestPacks seem to live in a folder under the ‘Prague’ installation;


and a quick look at one of the DLLs (e.g. PowerPoint) shows that this is .NET code interop’ing into PowerPoint as you’d expect although the naming suggests there’s some ATL based code in the chain here somewhere;


Coding ‘Prague’

The API docs link leads over to this web page which points to a Microsoft.Gestures namespace that seems to be part of .NET Core 2.0. That would seem to suggest that (right now) you’re not going to be able to reference this from a Universal Windows App project, but you can reference it from a .NET Framework project and so I just referenced it from a command line project targeting .NET Framework 4.6.2.

The assemblies seem to live in the equivalent of;


and I added a reference to 3 of them;


It’s also worth noting that there are a number of code samples over in this github repository;


Although, at the time of writing, I haven’t really referred to those too much as I was trying to see what my experience was like in ‘starting from scratch’ and to that end I had a quick look at what seemed to be the main assembly in the object browser;


and the structure seemed to suggest that the library is using TCP sockets as an ‘RPC’ mechanism to communicate between an app and the gestures service and a quick look at the gestures service process with Process Explorer did show that it was listening for traffic;


So, how to get a connection? It seems fairly easy in that the docs point you to the GesturesServiceEndpoint class and there’s a GesturesServiceEndpointFactory to make those, and then IntelliSense popped up as below to reinforce the idea that there is some socket based comms going on here;


From there, I wanted to define my own gesture which would allow the user to start with an open spread hand and then tap their thumb onto their four fingers in sequence which seemed to consist of 5 stages and so I read the docs around how gestures, poses and motion work and added some code to my console application to see if I could code up this gesture;

namespace ConsoleApp1
{
  using Microsoft.Gestures;
  using Microsoft.Gestures.Endpoint;
  using System;
  using System.Collections.Generic;
  using System.Threading.Tasks;

  class Program
  {
    static void Main(string[] args)
    {
      ConnectAsync();

      Console.WriteLine("Hit return to exit...");
      Console.ReadLine();
    }
    static async Task ConnectAsync()
    {
      try
      {
        var connected = await ServiceEndpoint.ConnectAsync();

        if (!connected)
        {
          Console.WriteLine("Failed to connect...");
          return;
        }
        await ServiceEndpoint.RegisterGesture(CountGesture, true);
      }
      catch
      {
        Console.WriteLine("Exception thrown in starting up...");
      }
    }
    static void OnTriggered(object sender, GestureSegmentTriggeredEventArgs e)
    {
      Console.WriteLine($"Gesture {e.GestureSegment.Name} triggered!");
    }
    static GesturesServiceEndpoint ServiceEndpoint
    {
      get
      {
        if (serviceEndpoint == null)
        {
          serviceEndpoint = GesturesServiceEndpointFactory.Create();
        }
        return (serviceEndpoint);
      }
    }
    static Gesture CountGesture
    {
      get
      {
        if (countGesture == null)
        {
          var poses = new List<HandPose>();

          var allFingersContext = new AllFingersContext();

          // Hand starts upright, forward and with fingers spread...
          var startPose = new HandPose(
            "start",
            new FingerPose(
              allFingersContext, FingerFlexion.Open),
            new FingertipDistanceRelation(
              allFingersContext, RelativeDistance.NotTouching));

          poses.Add(startPose);

          // ...then the thumb touches each of the four fingers in turn.
          foreach (Finger finger in
            new[] { Finger.Index, Finger.Middle, Finger.Ring, Finger.Pinky })
          {
            poses.Add(
              new HandPose(
                $"touch{finger}",
                new FingertipDistanceRelation(
                  Finger.Thumb, RelativeDistance.Touching, finger)));
          }
          countGesture = new Gesture("count", poses.ToArray());
          countGesture.Triggered += OnTriggered;
        }
        return (countGesture);
      }
    }
    static Gesture countGesture;
    static GesturesServiceEndpoint serviceEndpoint;
  }
}

I’m very unsure as to whether my code is specifying my gesture ‘completely’ or ‘accurately’ but what amazed me about this is that I really only took one stab at it and it “worked”.

That is, I can run my app and see my gesture being built up from its 5 constituent poses in the ‘Geek View’ and then my console app has its event triggered and displays the right output;


What I’d flag about that code is that it’s using async/await in a console app without a synchronization context, so it’s likely that thread pool threads are being used to dispatch all the “completions”. That means lots of threads are potentially running through this code and interacting with objects which may/may not have thread affinity – I’ve not done anything to mitigate that here.
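If I did want to mitigate it, one cheap option would be to serialise access to any shared state that the callbacks touch. Here’s a sketch of that idea with hypothetical type names – it’s not anything from the code above, just an illustration of the pattern:

```csharp
using System;

// Callbacks from the gestures endpoint may arrive on arbitrary thread
// pool threads, so any shared state they touch needs synchronising.
// SerializedHandler is a hypothetical name for illustration only.
public class SerializedHandler
{
    readonly object sync = new object();
    int count;

    public void OnTriggered(object sender, EventArgs e)
    {
        // Take the lock so concurrent completions can't corrupt state.
        lock (this.sync)
        {
            this.count++;
        }
    }

    public int Count
    {
        get { lock (this.sync) { return this.count; } }
    }
}
```

A fuller solution might install a single-threaded SynchronizationContext instead, but a lock around the handler’s state is enough for a console demo like this one.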

Other than that, I’m impressed – this was a real joy to work with and I guess the only way it could be made easier would be to allow for the visual drawing or perhaps the recording of hand gestures.

The only other thing that I noticed is that my CPU can get a bit active while using these bits and they seem to use about 800MB of memory, but then Project Prague is ‘Experimental’ right now so I’m sure that could change over time.

I’d like to also try this code on a Kinect for Windows V2 – if I do that, I’ll update this post or add another one.

Experiments with Shared Holographic Experiences and Photon Unity Networking

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

Backdrop – Shared Holographic Experiences (or “Previously….”)

Recently, I seem to have adopted this topic of shared holographic experiences and I’ve written quite a few posts that relate to it and I keep returning to it as I find it really interesting although most of what I’ve posted has definitely been experimental rather than any kind of finished/polished solution.

One set of posts began quite a while ago with this post;

Windows 10, UWP, HoloLens & A Simple Two-Way Socket Library

where I experimented with writing my own comms library between two HoloLens devices on a local network with the initial network discovery being handled by Bluetooth and with no server or cloud involved.

That had limits though and I moved on to using the sharing service from the HoloToolkit-Unity culminating (so far) in this post;

Hitchhiking the HoloToolkit-Unity, Leg 13–Continuing with Shared Experiences

although I did recently go off on another journey to see if I could build a shared holographic experience on top of the AllJoyn protocol in this post;

Experiments with Shared Holographic Experiences and AllJoyn (Spoiler Alert- this one does not end well)

I should really have got this out of my system by now but I’m returning to it again in this post for another basic experiment.

That recent AllJoyn experiment had a couple of advantages including;

  • Performing automatic device discovery (i.e. letting AllJoyn handle the discovery)
  • Not requiring a cloud connection
  • Easy programming model (using the UWP tooling)

but the disadvantages came in that I ended up having to introduce some kind of ‘server’ app when I didn’t really intend to plus there was pretty bad performance when it came to passing around what are often large world anchor buffers.

That left me wanting to try out a few other options. I spent a bit of time looking at Unity networking (or UNET) but didn’t progress it too far because I couldn’t get the discovery mechanisms (based on UDP multicasting) to work nicely for me across a single HoloLens device and the HoloLens emulator, so I let that drop – although, again, it looks to offer a server-less solution with a single device being able to operate as both ‘client’ and ‘host’, and the programming model seemed pretty easy.

Photon Unity Networking

Putting that to one side for the moment, I turned my attention to “Photon Unity Networking” (or PUN) to see if I could make use of that to build out the basics of a shared holographic experience and this post is a write up of my first experiment there.

PUN seems to involve a server which can either be run locally or in the cloud, and Photon provide a hosted version of it. I figured that had to be the easiest starting point and so I went with that although, as you’ll see later, it brought with it a limitation that I could have avoided if I’d decided to host the server myself.

Getting started with cloud-hosted PUN is easy. I went for the free version of this cloud hosted model which seems to offer me up to 20 concurrent users and it was very easy to;

  1. Sign up for the service
  2. Use the portal to create an application as my first app and get an ID that can be fed into the SDK
  3. Download the SDK pieces from the Unity asset store and bring them into a Unity project

and so from there I thought it would be fun to see if I could get some basic experiment with shared holograms up and running on PUN and that’s what the rest of this post is about.

The Code

The code that I’m referring to here is all hosted on Github and it’s very basic in that all that it does (or tries to do) is to let the user use 3 voice commands;

  • “create”
  • “open debug log”
  • “close debug log”

and the keyword “create” creates a cube which should be visible across all the devices that are running the app and in the same place in the same physical location.

That’s it. I haven’t yet added the ability to move, manipulate holograms or show the user’s head positions as I’ve done in earlier posts. Perhaps I’ll return to that later.
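For a flavour of how those three keywords might get wired up, here’s a sketch using Unity’s KeywordRecognizer – the repo may plumb this differently (e.g. via the toolkit), so treat the class and handler names as mine:

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Illustrative sketch - listens for the three voice commands that the
// post describes and reacts when one is recognised.
public class VoiceCommands : MonoBehaviour
{
    KeywordRecognizer recognizer;

    void Start()
    {
        this.recognizer = new KeywordRecognizer(
            new[] { "create", "open debug log", "close debug log" });

        this.recognizer.OnPhraseRecognized += this.OnPhraseRecognized;
        this.recognizer.Start();
    }

    void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        if (args.text == "create")
        {
            // Create the shared cube here...
        }
    }
}
```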

But the code is hosted here;

Code on Github

and I’m going to refer to classes from it through the rest of the post.

It’s important to realise that the code is supplied without the Photon Application ID (you’d need to get your own) and without the storage access keys for my Azure storage account (you’d need to get your own).

The Blank Project

I think it’s fair to say that Photon has quite a lot of functionality that I’m not even going to attempt to make use of around lobbies and matchmaking – I really just wanted the simplest solution that I could make use of and so I started a new Unity project and added 4 sets of code to it straight off the bat as shown below;


Those pieces are;

  1. The HoloToolkit-Unity
  2. The Mixed Reality Design Labs
  3. The Photon Unity Networking Scripts
  4. A StorageServices library

I’ll return to the 4th one later in the post but I’m hoping that the other 3 are well understood and, if not, you can find reference to them on this blog site in many places;

Posts about Mixed Reality

I made sure that my Unity project was set up for Holographic development using the HoloToolkit menu options to set up the basic scene settings, project settings;


and specifically that my app had the capability to access both the microphone (for voice commands) and spatial perception (for world anchoring).

From there, I created a scene with very little in it other than a single empty Root object along with the HoloLens prefab from the Mixed Reality Design Labs (highlighted orange below) which provides the basics of getting that library into my project;


and I’m now “ready to go” in the sense of trying to make use of PUN to get a hologram shared across devices. Here’s the steps I undertook.

Configuring PUN

PUN makes it pretty easy to specify the details of your networking setup including your app key in that they have an option to use a configuration file which can be edited in the Unity editor and so I went via that route.

I didn’t change too much of the setup here other than to add my application id, specify TCP (more on that later) and a region of EU and then specify that I didn’t want to auto-join a lobby or enable stats as I’m hoping to avoid lobbies.


Making a Server Connection

I needed to make a connection to the server and PUN makes that pretty simple.

There’s a model in PUN of deriving your class from a PunBehaviour which then has a set of overrides that you can use to run code as/when certain networking events happen like a server connection or a player joining the game. I wrapped up the tiny bit of code needed to make a server connection based on a configuration file into a simple component that I called PhotonConnector which essentially takes the override-model of PUN and turns it into an event based model that suited me better. Here’s that class;

The PhotonConnector Class

and so the idea here is that I just use the OnConnectedToMaster override to wait for a connection and then I fire an event (FirstConnection) that some other piece of my code can pick up.
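As a rough illustration of that override-to-event pattern, a connector along these lines would do it – this is a sketch against the PUN-classic API, not the repo’s actual PhotonConnector code:

```csharp
using System;
using UnityEngine;

// Sketch: surface PunBehaviour's OnConnectedToMaster override as a
// plain C# event that other components can subscribe to.
public class PhotonConnector : Photon.PunBehaviour
{
    public event EventHandler FirstConnection;

    void Start()
    {
        // Reads the PhotonServerSettings configuration file that was
        // set up in the Unity editor.
        PhotonNetwork.ConnectUsingSettings("1.0");
    }

    public override void OnConnectedToMaster()
    {
        base.OnConnectedToMaster();
        this.FirstConnection?.Invoke(this, EventArgs.Empty);
    }
}
```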

I dropped an instance of this component onto my Root object;


So, that’s hopefully my code connected to the PUN cloud server.

Making/Joining a Room

Like many multiplayer game libraries, PUN deals with the notion of a bounded set of users inside of a “room” (joined from a “lobby”) and I wanted to keep this as simple as possible for my experiment here and so I tried to bypass lobbies in as much as possible and tried to avoid building UI for the user to select a room.

Instead, I just wanted to hard-wire my app such that it would attempt to join (or create if necessary) a room given a room name and so I wrote a simple component which would attempt to either create or join a room given the room name;

The PhotonRoomJoiner Class

and so this component is prepared to look for the PhotonConnector, wait for it to connect to the network before then attempting to join/create a room on the server. Once done, like the PhotonConnector it fires an event to signify that it has completed.
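A minimal sketch of that join-or-create behaviour might look like the following – again PUN-classic API with illustrative names, rather than the repo’s code:

```csharp
using System;
using UnityEngine;

// Sketch: join the named room, creating it on the server if it does
// not already exist, and raise an event once we're in.
public class PhotonRoomJoiner : Photon.PunBehaviour
{
    [SerializeField]
    string roomName = "Default Room";

    public event EventHandler RoomJoined;

    public void Join()
    {
        // JoinOrCreateRoom avoids needing any lobby UI for the user.
        PhotonNetwork.JoinOrCreateRoom(
            this.roomName, new RoomOptions(), TypedLobby.Default);
    }

    public override void OnJoinedRoom()
    {
        base.OnJoinedRoom();
        this.RoomJoined?.Invoke(this, EventArgs.Empty);
    }
}
```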

I dropped an instance of this component onto my Root object leaving the room name setting as “Default Room”;


and by this point I was starting to realise that I was lacking any way of visualising Debug.Log calls on my device and that was starting to be a limiting factor…

Visualising Debug Output

I’ve written a few ugly solutions to displaying debug output on the HoloLens and I wanted to avoid writing yet another one and so I finally woke up and realised that I could make use of the DebugLog prefab from the Mixed Reality Design Labs;


and I left its configuration entirely alone but now I can see all my Debug.Log output by simply saying “open debug log” inside of my application which is a “very useful thing indeed” given how little I paid for it!


One World Anchor Per App or Per Hologram?

In order to have holograms appear in a consistent position across devices, those devices are going to have to agree on a common coordinate system and that’s done by;

  • Creating an object at some position on one device
  • Applying a world anchor to that object to lock it in position in the real world
  • Obtaining (‘exporting’) the blob representing that world anchor
  • Sending the blob over the network to other devices
  • On those additional devices
    • Receiving the blob over the network
    • Creating the same type of object
    • Importing the world anchor blob onto the device
    • Applying (‘locking’) the newly created object with the imported world anchor blob so as to position it in the same position in the physical world as the original

It’s a multi-step process and, naturally, there’s many things that can go wrong along the way.

One of the first decisions to make is whether to apply a world anchor to every hologram shared or to perhaps apply one world anchor across the whole scene and parent all holograms from it. The former is likely to have great accuracy but the latter is a lot less expensive in terms of how many bytes need to be shipped around the network.

For this experiment, I decided to go with a halfway house. The guidance suggests that;

“A good rule of thumb is to ensure that anything you render based on a distant spatial anchor’s coordinate system is within about 3 meters of its origin”

and so I decided to go with that and to essentially create and share a new world anchor any time a hologram is created more than 3m from an existing world anchor.

In order to do that, I need to track where world anchors have been placed and I do that locally on the device.

Rather than use a hologram itself as a world anchor, I create an empty object as the world anchor and then any hologram within 3m of that anchor would be parented from that anchor.

Tracking World Anchor Positions

In order to keep track of the world anchors that a device has created or which it has received from other devices I have each device maintain a simple list of world anchors with a GUID-based naming scheme to ensure that I can refer to these world-anchors across devices. It’s a fairly simple thing and it’s listed here;

The AnchorPositionList Class
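For illustration, the bookkeeping might look something like this pure-C# sketch – it uses System.Numerics rather than Unity’s Vector3 so that it stands alone, and the real class in the repo will differ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

// Sketch: each world anchor gets a GUID-based name so that devices can
// refer to it consistently, and we can ask for the nearest anchor
// within some radius (e.g. the 3m rule of thumb from the post).
public class AnchorPositionList
{
    readonly Dictionary<string, Vector3> anchors =
        new Dictionary<string, Vector3>();

    public string Add(Vector3 position)
    {
        var name = Guid.NewGuid().ToString();
        this.anchors[name] = position;
        return name;
    }

    // Returns the name of the nearest anchor within maxDistance of the
    // given position, or null if there isn't one.
    public string FindNearest(Vector3 position, float maxDistance)
    {
        return this.anchors
            .Where(a => Vector3.Distance(a.Value, position) <= maxDistance)
            .OrderBy(a => Vector3.Distance(a.Value, position))
            .Select(a => a.Key)
            .FirstOrDefault();
    }
}
```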

Importing/Exporting World Anchors

The business of importing or exporting world anchors takes quite a few steps and I’ve previously written code which wraps this up into a (relatively) simple single method call where I can hand a GameObject over to a method which will;

  • For export
    • Add a WorldAnchor component to the GameObject
    • Wait for that WorldAnchor component to flag that it isLocated in the world
    • Export the data for that WorldAnchor using the WorldAnchorTransferBatch
    • Return the byte[] array exported
  • For import
    • Take a byte[] array and import it using the WorldAnchorTransferBatch
    • Apply the LockObject call to the GameObject

That code is all wrapped up in a class I called SpatialAnchorHelpers

The SpatialAnchorHelpers class
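To give a flavour of the export half of that process, here’s a hedged sketch against the Unity 2017.2-era WorldAnchorTransferBatch API – not the post’s actual SpatialAnchorHelpers code, and it assumes the WorldAnchor has already been added and located:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.XR.WSA;
using UnityEngine.XR.WSA.Sharing;

// Sketch: wrap the callback-based export of a located WorldAnchor into
// a single awaitable method returning the serialised blob.
public static class SpatialAnchorSketch
{
    public static Task<byte[]> ExportAnchorAsync(
        GameObject gameObject, string anchorId)
    {
        var completion = new TaskCompletionSource<byte[]>();
        var bits = new List<byte>();

        var batch = new WorldAnchorTransferBatch();
        batch.AddWorldAnchor(anchorId, gameObject.GetComponent<WorldAnchor>());

        WorldAnchorTransferBatch.ExportAsync(
            batch,
            data => bits.AddRange(data),          // data arrives in chunks
            reason =>
            {
                if (reason == SerializationCompletionReason.Succeeded)
                    completion.SetResult(bits.ToArray());
                else
                    completion.SetException(new Exception(reason.ToString()));
            });

        return completion.Task;
    }
}
```

The import side is the mirror image – ImportAsync hands back a batch from which LockObject applies the named anchor to a GameObject.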

One thing I’d add about this class is that it is very much “UWP” specific in that I made no attempt to make this code particularly usable from the Unity Editor and to avoid getting tied up in lots of asynchronous callbacks I just wrote code with async/await which Unity can’t make sense of but, for me, makes for much more readable code.

This code also needs to “wait” for the isLocated flag on a WorldAnchor component to signal ‘true’ and so I needed to make an awaitable version of this and I used this pretty ugly class that I’ve used before;

The PredicateLoopWatcher class

I’m not too proud of that and it perhaps needs a rethink but it’s “kind of working” for me for now although if you look at it you’ll realise that there’s a strong chance that it might loop forever and so some kind of timeout might be a good idea!
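A sketch of what such a watcher might look like with that timeout added – the names here are mine, not the repo’s PredicateLoopWatcher:

```csharp
using System;
using System.Threading.Tasks;

// Sketch: poll a predicate (e.g. WorldAnchor.isLocated) until it
// becomes true, giving up after a timeout rather than looping forever.
public static class PredicatePoller
{
    public static async Task<bool> WaitForAsync(
        Func<bool> predicate, TimeSpan timeout, int pollMilliseconds = 100)
    {
        var deadline = DateTime.UtcNow + timeout;

        while (!predicate())
        {
            if (DateTime.UtcNow >= deadline)
            {
                return false; // timed out
            }
            await Task.Delay(pollMilliseconds);
        }
        return true;
    }
}
```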

Using async/await without a suitable SynchronizationContext can mean that code can easily end up on the wrong thread for interacting with Unity’s UI objects and so I added a Dispatcher component which I try to use to help with marshalling code back onto Unity’s UI thread;

The Dispatcher Class

and so that’s part of the scripts I wrote here too and I just added an instance of it to my root script so that I’d be able to get hold of it;
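The shape of such a dispatcher might be along these lines – a sketch rather than the repo’s class:

```csharp
using System;
using System.Collections.Concurrent;
using UnityEngine;

// Sketch: callbacks on arbitrary threads queue actions, and Update()
// drains the queue on Unity's main thread where UI objects are safe
// to touch.
public class Dispatcher : MonoBehaviour
{
    readonly ConcurrentQueue<Action> queue = new ConcurrentQueue<Action>();

    public void Invoke(Action action) => this.queue.Enqueue(action);

    void Update()
    {
        Action action;
        while (this.queue.TryDequeue(out action))
        {
            action();
        }
    }
}
```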


Passing World Anchor Blobs Around the Network

For even the simplest, most basic solution like this one there comes a time when one device needs to ‘notify’ another device that either;

  • a new world anchor has been created
  • a new hologram has been created relative to an existing world anchor

and so there’s a need for some kind of ‘network notification’ which carries some data with it. The major decision though is how much data and initially what I was hoping to achieve here was for the notification to carry all of the data.

To put that into plainer English, I was hoping to use PUN’s RPC feature to enable me to send out an RPC from one device to another saying

“Hey, there’s a new world anchor called {GUID} and here’s the 1-10MB of data representing it”

Now, I must admit that I suspected that this would cause me problems (like it did when I tried it with AllJoyn) and it did.

Firstly, the default protocol for PUN is UDP and, naturally, it’s not a great idea to try and send multi-megabyte payloads over UDP this way and so I switched the protocol for my app to be TCP via the configuration screen that I screenshotted earlier.

Making an RPC method in PUN is simple – I just need to make sure that there’s a PhotonView component on my GameObject and then I can add a [PunRPC] attribute and make sure that the parameters can be serialized by PUN or by my custom code if necessary.

Invoking the RPC method is also simple – you grab hold of the PhotonView component and use the RPC() method on it and there’s a target parameter on there which was really interesting to me.

In my scenario, I only really need two RPCs, something like;

  • NewWorldAnchorAdded( anchor identifier, anchor byte array )
  • NewHologramAdded( anchor identifier, hologram position relative to anchor )

Given that I was hoping to pass the entire world anchor blob over the RPC call, I didn’t want that mirrored back to the originating client by the server because that client already had that blob and so it would be wasteful.

Consequently, I used the Targets.OthersBuffered option to try and send the RPC to all the other devices in the room.

The other nice aspect of this option is the Buffered part, in the sense that the server will keep the RPC details around and deliver them (and others) to new clients as they join the room.
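On the sending side, that dispatch might look something like this (a sketch assuming PUN 1.x naming, where the targets enum is PhotonTargets; anchorId and anchorBits are stand-ins for my actual variables):

```csharp
// Announce a new anchor to every *other* client in the room, buffered
// on the server so that late joiners receive it too.
PhotonView view = PhotonView.Get(this);

view.RPC(
    "NewWorldAnchorAdded",
    PhotonTargets.OthersBuffered,
    anchorId,       // string identifier for the anchor
    anchorBits);    // byte[] exported for the world anchor
```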


It didn’t work for me though because, although PUN itself doesn’t place size limits on parameters to an RPC call, the cloud-hosted version of PUN does, and the server bounced my RPCs straight back at me. After a little online discussion I was pointed to this article, which flags that the server limit is 0.5MB for a parameter.

So, using RPCs for these large blobs wasn’t going to work – much as it didn’t work very nicely for me when I looked at doing something similar over AllJoyn.

What next? Use a blob store…

Putting Blobs in…a Blob Store!

I decided that I’d stick with the RPC mechanism for signalling the details of new world anchors and new holograms but I wouldn’t try and pass all of the bytes of the blob representing the world anchor across that boundary.

Instead, given that I’d already assumed a cloud connection to the PUN server I’d use the Azure cloud to store the blobs for my world anchors.

The next question, then, is how best to make use of Azure blob storage from Unity without having to hand-crank a bunch of code and set up HTTP headers etc. myself.

Fortunately, my colleague Dave has done some work around calling into Azure app services and blob storage from Unity and he has a blog post around it here;

Unity 3D and Azure Blob Storage

which points to a github repo over here;

Unity3DAzure on Github

and so I lifted this code into my project and wrote my own little BlobStorageHelper class around it so as to make it relatively easy to use in my scenario;

The AzureBlobStorageHelper class

There’s not a lot to it on top of what Dave already wrote – I just wrap it up for my use and add a little bit of code to download a blob directly from blob storage.

Naturally, to set this up I needed an Azure storage account (I already had one). I made a container within it (named ‘sharedholograms’), made sure that it allowed public reads and authenticated writes, and copied out the access key so that the code would be able to make use of it.
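Because the container allows public reads, the download side needs no request signing at all – a plain HTTP GET against the blob’s URL is enough. A minimal sketch (with a made-up storage account name, and assuming a Unity version with UnityWebRequest.SendWebRequest) might look like this; the authenticated upload side is where Dave’s code and the access key come in:

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class BlobDownloader : MonoBehaviour
{
    // Hypothetical value - substitute your own storage account + container.
    const string BaseUri =
        "https://mystorageaccount.blob.core.windows.net/sharedholograms/";

    // Downloads the blob named '{anchorId}' and hands the bytes to a callback.
    public IEnumerator DownloadAnchorBlob(
        string anchorId, System.Action<byte[]> callback)
    {
        using (var request = UnityWebRequest.Get(BaseUri + anchorId))
        {
            yield return request.SendWebRequest();

            if (!request.isNetworkError && !request.isHttpError)
            {
                callback(request.downloadHandler.data);
            }
        }
    }
}
```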

I can then set up an instance of this component on my root game object;


so it’s available any time I want it from that script.

Back to RPCs

With my issue around what to do with large byte array parameters out of the way, I could return to my RPCs, whose final signatures ended up being as simple as;

  void WorldAnchorCreatedRemotely(string sessionId, string anchorId)
  void CubeCreatedRemotely(string sessionId, string anchorId, Vector3 relativePosition)


because the name of the blob in blob storage can be derived from the anchorId and so it’s enough just to distribute that id.

However, what’s this sessionId parameter? This goes back to the earlier idea that I would dispatch my RPC calls using the Targets.OthersBuffered flag to notify all devices apart from the current one that something had changed.

However, what I seemed to find was that if DeviceA created one world anchor and three holograms and then quit and rejoined the server, it didn’t seem to receive those four buffered RPCs from the server which would tell it to recreate those objects.

I’m unsure how PUN makes the distinction of “Others” but I decided that perhaps the best idea was to switch OthersBuffered to AllBuffered and then use my own mechanism to ignore RPCs which originated on the current device. Because I’m no longer sending large byte arrays over the network, this didn’t feel like a particularly wasteful thing to do and so I stuck with it, but it could do with a little more investigation on my part.
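That filtering mechanism can be as simple as a GUID generated once per app session and passed with every RPC – a sketch (my naming here is illustrative, not lifted from the project):

```csharp
using System;
using UnityEngine;

public class SessionFilteredReceiver : Photon.MonoBehaviour
{
    // Generated once per run of the app; sent along with every RPC we originate.
    static readonly string localSessionId = Guid.NewGuid().ToString();

    [PunRPC]
    void WorldAnchorCreatedRemotely(string sessionId, string anchorId)
    {
        // AllBuffered means we receive our own RPCs back - ignore those.
        if (sessionId == localSessionId)
        {
            return;
        }
        // ...otherwise, go and download the anchor blob named anchorId...
    }
}
```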

The other thing that I played with here was the way in which the room is originally created by my PhotonRoomJoiner component in that, initially, I wasn’t setting RoomOptions.CleanupCacheOnLeave, which I think means that the buffered RPCs left by a player would disappear when they left the room.

However, I still seemed to find that even when I asked the room to keep RPCs around for a player that left, the OthersBuffered option didn’t seem to deliver those RPCs back to that player when they connected again – hence my sticking with the AllBuffered option for the moment. Again, it needs more investigation.
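For reference, that room-creation tweak is a one-liner on the RoomOptions passed when joining/creating the room (PUN 1.x naming; the room name here is just an example):

```csharp
// Keep buffered RPCs and cached events around even after their owner
// leaves the room, rather than cleaning them up on departure.
RoomOptions options = new RoomOptions()
{
    CleanupCacheOnLeave = false
};

PhotonNetwork.JoinOrCreateRoom("sharedHolograms", options, TypedLobby.Default);
```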

Those big blob buffers though still cause me another problem…

Ordering of RPCs

I saw this one coming Smile Now that the upload/download of the blob representing a world anchor is done asynchronously through the cloud, outside the ordering of the RPCs being delivered by Photon, it’s fairly easy to see a sequence of events where an RPC to create a hologram relative to a world anchor arrives before that anchor has been downloaded to the device. It’s a race and it’s pretty much certain to happen, especially if a device connects to a room with buffered RPCs describing a sequence of anchors and holograms.

Consequently, I simply keep a little lookaside list of the holograms that a client has been asked to create whose parent world anchor has not yet been created. The assumption is that the world anchor will show up at some point in the future, at which point this list can be consulted for all the pending holograms that then need to be created.
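A minimal shape for that lookaside list (the names here are mine, not the post’s actual class) could be a dictionary keyed on the anchor id:

```csharp
using System.Collections.Generic;
using UnityEngine;

public class PendingHologramList
{
    // anchorId -> positions of holograms waiting for that anchor to arrive.
    readonly Dictionary<string, List<Vector3>> pending =
        new Dictionary<string, List<Vector3>>();

    // Called when a hologram RPC arrives before its anchor has downloaded.
    public void Add(string anchorId, Vector3 relativePosition)
    {
        List<Vector3> list;
        if (!pending.TryGetValue(anchorId, out list))
        {
            list = new List<Vector3>();
            pending[anchorId] = list;
        }
        list.Add(relativePosition);
    }

    // Called when an anchor finally downloads and imports - drains the queue
    // of holograms that were waiting on it.
    public List<Vector3> Take(string anchorId)
    {
        List<Vector3> list;
        if (pending.TryGetValue(anchorId, out list))
        {
            pending.Remove(anchorId);
            return list;
        }
        return new List<Vector3>();
    }
}
```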

The AnchorCubeList Class

Bringing it All Together

All of these components are ultimately brought together by a simple “co-ordinating” script on my (almost) empty GameObject named Root that has been in the scene all along;


The only component that I haven’t mentioned there is the KeywordManager from the HoloToolkit-Unity, which sends the voice keyword “create” through to a function on my Root script that kicks off the whole process of creating a world anchor (if necessary) before creating a hologram (a cube) 3m along the user’s gaze vector.

That Root script is longer than I’d like it to be at the moment, so I could tidy it up a little, but here it is in its entirety;

The Root Class

Testing and Carrying On…

I’ve left it to the end of the blog post to admit that I haven’t tested this much at the time of writing – it’s a bit of an experiment and so don’t expect too much from it Smile

One of the reasons for that is that I’m currently working with one HoloLens and the emulator, and so importing/exporting of world anchors can be a bit of a challenge – it’s hard to know in the emulator whether things are working correctly or not, and it’s much easier to test this with multiple devices.

I’ll try that out in the coming days/weeks and will update the post or add to another post. I’d also like to add a little more into the code to make it possible to manipulate the holograms, show the user’s position as an avatar and so on as I’ve done in other posts around this topic so I’ll create a branch and keep working on that.

Beyond that, it might be “nice” to take away the dependency on PUN here and build out a solution using nothing but standard pieces from Azure like Service Bus + blob storage, as I don’t think that’d be a long way from what I’ve got here – that might be another avenue for a future post…