Baby Steps with the Azure Spatial Anchors Service

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens or Azure Mixed Reality other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

One of the many, many strands of the exciting, recent announcements around Mixed Reality (see the video here) was the announcement of a set of Azure Mixed Reality Services.

You can find the home page for these services on the web here and they encompass;

  • Azure Spatial Anchors
  • Azure Remote Rendering

Both of these are, to my mind, vital foundational services that Mixed Reality application builders have needed for quite some time so it’s great to see them surface at Azure.

At the time of writing, the Azure Remote Rendering service is in a private preview so I’m not looking at that right now but the Azure Spatial Anchors service is in a public preview and I wanted to experiment with it a little and thought I would write up some notes here as I went along.

Before I do that though…

Stop – Read the Official Docs

There’s nothing that I’m going to say in this post that isn’t covered by the official docs so I’d recommend that you read those before reading anything here and I’m providing some pointers below;

  1. Check out the overview page here if you’re not familiar with spatial anchors.
  2. Have a look at the Quick Start here to see how you can quickly get started in creating a service & making use of it from Unity.
  3. Check out the samples here so that you can quickly get up and running rather than fumbling through adding library references etc. (note that the Quick Start will lead you to the samples anyway).

With that said, here’s some rough notes that I made while getting going with the Azure Spatial Anchors service from scratch.

Please keep in mind that this service is new to me so I’m really writing up my experiments & I may well make some mistakes.

A Spatial WHAT?!

If you’re not coming from a HoloLens background or from some other type of device background where you’re doing MR/AR and creating ‘spatial anchors’ then you might wonder what these things are.

To my mind, it’s a simple concept that’s no doubt fiendishly difficult to implement. Here’s my best attempt;

A spatial anchor is a BLOB of data providing a durable representation of a 3D point and orientation in a space.

That’s how I think of it. You might have other definitions. These BLOBs of data usually involve recognising ‘feature points’ that are captured from various camera frames taken from different poses in a space.

If you’re interested in more of the mechanics of this, I found this video from Apple’s 2018 WWDC conference to be one of the better references that I’ve seen;

Understanding ARKit Tracking and Detection

So, a ‘spatial anchor’ is a BLOB of data that allows a device to capture a 3D point and orientation in space & potentially to identify that point again in the future (often known as ‘re-localising the anchor’). It’s key to note that devices and spaces aren’t perfect and so it’s always possible that a stored anchor can’t be brought back to life at some future date.

I find it useful sometimes to make a human analogy around spatial anchors. I can easily make a ‘spatial anchor’ to give to a human being and it might contain very imprecise notions of positioning in space which can nonetheless yield accurate results.

As an example, I could give this description of a ‘spatial anchor’ to someone;

Place this bottle 2cm in on each side from the corner of the red table which is nearest to the window. Lay the bottle down pointing away from the window.

You can imagine being able to walk into a room with a red table & a window and position the bottle fairly accurately based on that.

You can also imagine that this might work in many rooms with windows and red tables & that humans might even adapt and put the bottle onto a purple table if there wasn’t a red one.

Equally, you can imagine finding yourself in a room with no table and saying “sorry, I can’t figure this out”.

I think it’s worth saying that having this set of ‘instructions’ does not tell the person how to find the room nor whether they are in the right room, that is outside of the scope and the same is true for spatial anchors – you have to be vaguely in the right place to start with or use some other mechanism (e.g. GPS, beacons, markers, etc) to get to that place before trying to re-localise the anchor.

Why Anchor?

Having been involved in building applications for HoloLens for a little while now, I’ve become very used to the ideas of applying anchors and, to my mind, there are 3 main reasons why you would apply an anchor to a point/object and the docs are very good on this;

  • For stability of a hologram or a group of holograms that are positioned near to an anchor.
    • This is essentially about preserving the relationship between a hologram and a real point in the world as the device alters its impression of the structure of the space around it. As humans, we expect a hologram placed on the edge of a table to stay on the edge of that table even if a device is constantly refining its idea of the mesh that makes up that table and the rest of the space around it.
  • For persistence.
    • One of the magical aspects of mixed reality enabled by spatial anchors is the ability for a device to remember the positions of holograms in a space. The HoloLens can put the hologram back on the edge of the table potentially weeks or months after it was originally placed there.
  • For sharing.
    • The second magical aspect of mixed reality enabled by spatial anchors is the ability for a device to read a spatial anchor created by another device in a space and thereby construct a transform from the co-ordinate system of the first device to that of the second. This forms the basis for those magical shared holographic experiences.

Can’t I Already Anchor?

At this point, it’s key to note that for HoloLens developers the notion of ‘spatial anchors’ isn’t new. The platform has supported anchors since day 1 and they work really well.

Specifically, if you’re working in Unity then you can fairly easily do the following;

  • Add the WorldAnchor component to your GameObject in order to apply a spatial anchor to that object.
    • It's fairly common to use an empty GameObject which then acts as a parent to a number of other game objects.
    • The isLocated property is fairly key here, as is the OnTrackingChanged event, and note also that there is an opaque form of reference to the underlying BLOB via GetNativeSpatialAnchorPtr and SetNativeSpatialAnchorPtr.
  • Use the WorldAnchorStore class in order to maintain a persistent set of anchors on a device indexed by a simple string identifier (there's a rough sketch of this after the list).
  • Use the WorldAnchorTransferBatch class in order to;
    • Export the blob representing the anchor
    • Import a blob representing an anchor that has previously been exported
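
To make the WorldAnchor/WorldAnchorStore pieces a little more concrete, here's a minimal sketch of saving and re-loading an anchor locally on a device – the class name, ids and structure here are purely my own invention rather than code from any sample;

using UnityEngine;
using UnityEngine.XR.WSA;
using UnityEngine.XR.WSA.Persistence;

public class LocalAnchorPersistence : MonoBehaviour
{
    WorldAnchorStore store;

    void Start()
    {
        // The store is handed back asynchronously via a callback.
        WorldAnchorStore.GetAsync(s => this.store = s);
    }
    public void SaveAnchor(GameObject anchoredObject, string id)
    {
        var anchor = anchoredObject.GetComponent<WorldAnchor>() ??
            anchoredObject.AddComponent<WorldAnchor>();

        if ((this.store != null) && anchor.isLocated)
        {
            // Overwrite any anchor previously stored under the same id.
            this.store.Delete(id);
            this.store.Save(id, anchor);
        }
    }
    public bool LoadAnchor(GameObject anchoredObject, string id)
    {
        // Re-attaches the persisted anchor to the GameObject if it is in the store.
        return ((this.store != null) && (this.store.Load(id, anchoredObject) != null));
    }
}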

With this set of tools you can quite happily build HoloLens applications that;

  • Anchor holograms for stability.
  • Persist anchors over time such that holograms can be recreated in their original locations.
  • Share anchors between devices such that they can agree on a common co-ordinate system and present shared holographic experiences.

and, of course, you can do this using whatever transfer or networking techniques you like including, naturally, passing these anchors through the cloud via means such as Azure Blob Storage or ASP.NET SignalR or whatever you want. It’s all up for grabs and has been for the past 3 years or so.

Why A New Spatial Anchor Service?

With all that said, why would you look to the new Azure Spatial Anchors service if you already have the ability to create anchors and push them through the cloud? For me, I think there are at least 3 things;

  1. The Azure Spatial Anchor service is already built and you can get an instance with a few clicks of the mouse.
    1. You don’t have to go roll your own service and wonder about all the “abilities” of scalability, reliability, availability, authentication, authorisation, logging, monitoring, etc.
    2. There’s already a set of client-side libraries to make this easy to use in your environment.
  2. The Azure Spatial Anchor service/SDK gives you x-platform capabilities for anchors.
    1. The Azure Spatial Anchor service gives you the ability to transfer spatial anchors between applications running on HoloLens, ARKit devices and ARCore devices.
  3. The Azure Spatial Anchor service lets you define metadata with your anchors.
    1. The SDK supports the notion of ‘nearby’ anchors – the SDK lets you capture a group of anchors that are located physically near to each other & then query in the future to find those anchors again.
    2. The SDK also supports adding property sets to anchors to use for your own purposes (there's a small sketch of this just after the list).
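
To make that last point a little more concrete (and jumping ahead to the SDK usage later in this post), my understanding is that the property sets boil down to the AppProperties dictionary on a CloudSpatialAnchor – the sketch below assumes a started CloudSpatialAnchorSession and a located WorldAnchor, and the keys/values are made up;

    var cloudAnchor = new CloudSpatialAnchor(
        worldAnchor.GetNativeSpatialAnchorPtr(), false);

    // Made-up metadata keys/values, stored in the cloud alongside the anchor.
    cloudAnchor.AppProperties["model-type"] = "house";
    cloudAnchor.AppProperties["placed-by"] = "mike";

    await this.cloudAnchorSession.CreateAnchorAsync(cloudAnchor);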

Point #2 above is perhaps the most technically exciting feature here – i.e. I’ve never before seen anchors shared across HoloLens, iOS and Android devices so this opens up new x-device scenarios for developers.

That said, point #1 shouldn’t be underestimated – having a service that’s already ready to run is usually a lot better than trying to roll your own.

So, how do you go about using the service? Having checked out the samples, I then wanted to do a walkthrough on my own and that’s what follows here but keep a couple of things in mind;

  • I’m experimenting here, I can get things wrong Smile
  • The service is in preview.
  • I’m going to take a HoloLens/Unity centric approach as that’s the device that I have to hand.
  • There are going to be places where I’ll overlap with the Quick Start and I’ll just refer to it at that point.

Using the Service Step 1 – Creating an Instance of the Service

Getting to the point where you have a service up and running using (e.g.) the Azure Portal is pretty easy.

I just followed this Quick Start step labelled “Create a Spatial Anchors Resource” and I had my service visible inside the portal inside of 2-3 minutes.

Using the Service Step 2 – Making a Blank Project in Unity

Once I had a service up and running, I wanted to be able to get to it from Unity and so I went and made a blank project suitable for holographic development.

I’m using Unity 2018.3.2f1 at the time of writing (there are newer 2018.3 versions).

I’ve gone through the basics of setting up a project for HoloLens development many times on this blog site before so I won’t cover them here but if you’re new to this then there’s a great reference over here that will walk you through getting the camera, build settings, project settings etc. all ok for HoloLens development.

Using the Service Step 3 – Getting the Unity SDK

Ok, this is the first point at which I got stuck. When I click on this page on the docs site;

image

then the link to the SDK takes me to the detailed doc pages but it doesn’t seem to tell me where I get the actual SDK from – I was thinking of maybe getting a Unity package or similar but I’ve not found that link yet.

This caused me to unpick the sample a little and I learned a few things by doing that. In the official Unity sample you’ll see that the plugins folder (for HoloLens) contains these pieces;

image

and if you examine the post-build step here in this script you’ll see that there’s a function which essentially adds the nuget package Microsoft.Azure.SpatialAnchors.WinCPP into the project when it’s built;

image

and you can see that the script can cope with .NET projects and C++ projects (for IL2CPP) although I’d flag a note in the readme right now which suggests that this doesn’t work for IL2CPP anyway today;

### Known issues for HoloLens

For the il2cpp scripting backend, see this [issue]( https://forum.unity.com/threads/httpclient.460748/ ).

The short answer to the workaround is to:

1. First make a mcs.rsp with the single line `-r:System.Net.Http.dll`. Place this file in the root of your assets folder.
2. Copy the `System.net.http.dll` from `<unityInstallDir>\Editor\Data\MonoBleedingEdge\lib\mono\4.5\System.net.http.dll` into your assets folder.

There is an additional issue on the il2cpp scripting backend case that renders the library unusable in this release.

so please keep that in mind given that IL2CPP is the new default backend for these applications.

I haven’t poked into the iOS/Android build steps at the time of writing so can’t quite say what happens there just yet.

This all means that when I build from Unity I end up with a project which includes a reference to Microsoft.Azure.SpatialAnchors.WinCPP as a Nuget package as below (this is taken from a .NET backend project);

image

so, what’s in that thing? Is it a WinRT component?

I don’t think it is. I had to go and visit the actual Nuget package to try and figure it out but when I took a look I couldn’t find any .winmd file or similar. All I found in that package was a .DLL;

image

and as far as I can tell this is just a flat DLL with a bunch of exported flat functions like these;

image

I can only guess but I suspect then that the SDK is built in C/C++ so as to be portable across iOS, Android & UWP/Unity and then packaged up in slightly different ways to hit those different target environments.

Within Unity, this is made more palatable by having a bridge script which is included in the sample project called AzureSpatialAnchorsBridge.cs;

image

which then does a bunch of PInvokes into that flat DLL like this one;

image

so that’s how that seems to work.

If I then want to take this across to a new project, it feels like I need to package up a few things and I tried to package;

image

hoping to come away with the minimal set of pieces that I need to make this work for HoloLens and that seemed to work when I imported this package into my new, blank project.

I made sure that the project had both the InternetClient and Microphone capabilities and, importantly, SpatialPerception. With that in place, I'm now in my blank project and ready to write some code.
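
As an aside, those capabilities can also be switched on from an editor script rather than by ticking boxes in the Player Settings UI – the sketch below assumes Unity's PlayerSettings.WSA API and the menu item name is invented;

#if UNITY_EDITOR
using UnityEditor;

public static class CapabilitySetup
{
    [MenuItem("Tools/Enable HoloLens Capabilities")]
    public static void EnableCapabilities()
    {
        // The same three capabilities mentioned above.
        PlayerSettings.WSA.SetCapability(PlayerSettings.WSACapability.InternetClient, true);
        PlayerSettings.WSA.SetCapability(PlayerSettings.WSACapability.Microphone, true);
        PlayerSettings.WSA.SetCapability(PlayerSettings.WSACapability.SpatialPerception, true);
    }
}
#endif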

Using the Service Step 4 – Getting a Cloud Session

In true Unity tradition, I made an empty GameObject and threw a script onto it called ‘TestScript’ and then I edited in a small amount of infrastructure code;

using Microsoft.Azure.SpatialAnchors;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.Windows.Speech;
using UnityEngine.XR.WSA;
#if ENABLE_WINMD_SUPPORT
using Windows.Media.Core;
using Windows.Media.Playback;
using Windows.Media.SpeechSynthesis;
#endif // ENABLE_WINMD_SUPPORT

public class TestScript : MonoBehaviour
{
    public Material cubeMaterial;

    void Start()
    {
        this.cubes = new List<GameObject>();

        var speechActions = new Dictionary<string, Func<Task>>()
        {
            ["session"] = this.OnCreateSessionAsync,
            ["cube"] = this.OnCreateCubeAsync,
            ["clear"] = this.OnClearCubesAsync
        };
        this.recognizer = new KeywordRecognizer(speechActions.Keys.ToArray());

        this.recognizer.OnPhraseRecognized += async (s) =>
        {
            if ((s.confidence == ConfidenceLevel.Medium) || 
                (s.confidence == ConfidenceLevel.High))
            {
                Func<Task> value = null;

                if (speechActions.TryGetValue(s.text.ToLower(), out value))
                {
                    await value();
                }
            }
        };
        this.recognizer.Start();
    }
    async Task OnCreateSessionAsync()
    {
        // TODO: Create a cloud anchor session here.
    }
    async Task OnCreateCubeAsync()
    {
        var cube = GameObject.CreatePrimitive(PrimitiveType.Cube);

        cube.transform.localScale = new Vector3(0.2f, 0.2f, 0.2f);

        cube.transform.position = 
            Camera.main.transform.position + 2.0f * Camera.main.transform.forward;

        cube.GetComponent<Renderer>().material = this.cubeMaterial;

        this.cubes.Add(cube);

        var worldAnchor = cube.AddComponent<WorldAnchor>();
    }
    async Task OnClearCubesAsync()
    {
        foreach (var cube in this.cubes)
        {
            Destroy(cube);
        }
        this.cubes.Clear();
    }
    public async Task SayAsync(string text)
    {
        // Ok, this is probably a fairly nasty way of playing a media stream in
        // Unity but it sort of works so I've gone with it for now 🙂
#if ENABLE_WINMD_SUPPORT
        if (this.synthesizer == null)
        {
            this.synthesizer = new SpeechSynthesizer();
        }
        using (var stream = await this.synthesizer.SynthesizeTextToStreamAsync(text))
        {
            using (var player = new MediaPlayer())
            {
                var taskCompletionSource = new TaskCompletionSource<bool>();

                player.Source = MediaSource.CreateFromStream(stream, stream.ContentType);

                player.MediaEnded += (s, e) =>
                {
                    taskCompletionSource.SetResult(true);
                };
                player.Play();
                await taskCompletionSource.Task;
            }
        }

#endif // ENABLE_WINMD_SUPPORT
    }
#if ENABLE_WINMD_SUPPORT
    SpeechSynthesizer synthesizer;
#endif // ENABLE_WINMD_SUPPORT

    KeywordRecognizer recognizer;
    List<GameObject> cubes;
}

and so this gives me the ability to say “session” to create a session, “cube” to create a cube with a world anchor and “clear” to get rid of all my cubes.

Into that, it’s fairly easy to add an instance of CloudSpatialAnchorSession and create it but note that I’m using the easy path at the moment of configuring it with the ID and Key for my service. In the real world, I’d want to configure it to do auth properly and the service is integrated with AAD auth to make that easier for me if I want to go that way.

I added a member variable of type CloudSpatialAnchorSession and then just added in a little code into my OnCreateSessionAsync method;

    async Task OnCreateSessionAsync()
    {
        if (this.cloudAnchorSession == null)
        {
            this.cloudAnchorSession = new CloudSpatialAnchorSession();
            this.cloudAnchorSession.Configuration.AccountId = ACCOUNT_ID;
            this.cloudAnchorSession.Configuration.AccountKey = ACCOUNT_KEY;
            this.cloudAnchorSession.Error += async (s, e) => await this.SayAsync("Error");
            this.cloudAnchorSession.Start();
        }
    }

and that’s that. Clearly, I’m using speech here to avoid having to make “UI”.

Using the Service Step 5 – Creating a Cloud Anchor

Ok, I’ve already got a local WorldAnchor on any and all cubes that get created here so how do I turn these into cloud anchors?

The first thing of note is that the CloudSpatialAnchorSession has 2 floating point values (0-1) which tell you whether it is ready or not to create a cloud anchor. You call GetSessionStatusAsync and it returns a SessionStatus which reports these values as ReadyForCreateProgress and RecommendedForCreateProgress.

If it’s not ready then you need to get your user to walk around a bit until it is ready with some nice UX and so on and you can even query the UserFeedback to see what you might suggest to the user to get them to improve on the situation.

It looks like you can also get notified of changes to these values by handling the SessionUpdated event as well.

Consequently, I wrote a little method to poll these values, checking for ReadyForCreateProgress to reach 1.0f;

    async Task WaitForSessionReadyToCreateAsync()
    {
        while (true)
        {
            var status = await this.cloudAnchorSession.GetSessionStatusAsync();

            if (status.ReadyForCreateProgress >= 1.0f)
            {
                break;
            }
            await Task.Delay(250);
        }
    }

and that seemed to work reasonably although, naturally, the hard-coded 250ms delay might not be the smartest thing to do.
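
As an alternative to polling, my reading of the SessionUpdated event is that something like the sketch below would also work – it assumes that the event follows the same (sender, args) pattern as the others and that its arguments carry the SessionStatus via a Status property;

    Task WaitForSessionReadyViaEventAsync()
    {
        var completion = new TaskCompletionSource<bool>();

        void Handler(object sender, SessionUpdatedEventArgs args)
        {
            // The same check as the polling loop, driven by the SDK's notifications.
            if (args.Status.ReadyForCreateProgress >= 1.0f)
            {
                this.cloudAnchorSession.SessionUpdated -= Handler;
                completion.TrySetResult(true);
            }
        }
        this.cloudAnchorSession.SessionUpdated += Handler;

        return (completion.Task);
    }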

With that in place, I can then add this little piece of code to my OnCreateCubeAsync method just after it attaches the WorldAnchor to the cube;

        var cloudSpatialAnchor = new CloudSpatialAnchor(
            worldAnchor.GetNativeSpatialAnchorPtr(), false);

        await this.WaitForSessionReadyToCreateAsync();

        await this.cloudAnchorSession.CreateAnchorAsync(cloudSpatialAnchor);

        this.SayAsync("cloud anchor created");

and sure enough I see the portal reflecting that I have created an anchor in the cloud;

image

Ok – anchor creation is working! Let’s move on and see if I can get an anchor re-localised.

Using the Service Step 6 – Localising an Anchor

As far as I can work out so far, the process of 'finding' one or more anchors comes down to using a CloudSpatialAnchorWatcher and asking it to look for some anchors for you in one of two ways via an AnchorLocateCriteria;

  • I can give the watcher one or more identifiers for anchors that I have previously uploaded (note that the SDK fills in the cloud anchor ID (a string GUID) in the Identifier property of the CloudSpatialAnchor after it has been saved to the cloud).
  • I can ask the watcher to look for anchors that are nearby another anchor.

I guess the former scenario works when my app has some notion of a location based on something like a WiFi network name, a marker, a GPS co-ordinate or perhaps just some setting that the user has chosen, and this can then be used to find a bunch of named anchors that are supposed to be associated with that place.

Once one or more of those anchors has been found, the nearby mode can perhaps be used to find other anchors near to that site. The way in which anchors become ‘nearby’ is documented in the “Connecting Anchors” help topic here.

It also looks like I have a choice when loading anchors as to whether I want to bypass the local cache on the device and whether I want to load the anchors themselves or purely their metadata so that I can (presumably) do some more filtering before deciding to load. That's reflected in the BypassCache and RequestedCategories properties respectively.
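
Pulling those pieces together, my guess at what a 'nearby' query looks like is sketched below – it assumes a NearAnchorCriteria type hanging off the AnchorLocateCriteria with SourceAnchor, DistanceInMeters and MaxResultCount properties, and the numbers are arbitrary;

    var watcher = this.cloudAnchorSession.CreateWatcher(
        new AnchorLocateCriteria()
        {
            // Look for anchors stored as being 'nearby' a cloud anchor that has
            // already been located on this device.
            NearAnchor = new NearAnchorCriteria()
            {
                SourceAnchor = alreadyLocatedCloudAnchor,
                DistanceInMeters = 5.0f,
                MaxResultCount = 20
            },
            BypassCache = true,
            RequestedCategories = AnchorDataCategory.Spatial
        }
    );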

In trying to keep my test code here as short as possible, I figured that I would simply store in memory any anchor Ids that have been sent off to the cloud and then I’d add another command “Reload” which attempted to go back to the cloud, get those anchors and recreate the cubes in the locations where they were previously stored.

I set the name of the cube to be the anchor ID from the cloud, i.e. after I create the cloud anchor I just do this;

        await this.cloudAnchorSession.CreateAnchorAsync(cloudSpatialAnchor);

        // NEW!
        cube.name = cloudSpatialAnchor.Identifier;

        this.SayAsync("cloud anchor created");

and so that stores the IDs for me. I also need to change the way in which I create the CloudSpatialAnchorSession in order to handle 2 new events, AnchorLocated and LocateAnchorsCompleted;

   async Task OnCreateSessionAsync()
    {
        if (this.cloudAnchorSession == null)
        {
            this.cloudAnchorSession = new CloudSpatialAnchorSession();
            this.cloudAnchorSession.Configuration.AccountId = ACCOUNT_ID;
            this.cloudAnchorSession.Configuration.AccountKey = ACCOUNT_KEY;
            this.cloudAnchorSession.Error += async (s, e) => await this.SayAsync("Error");

            // NEW
            this.cloudAnchorSession.AnchorLocated += OnAnchorLocated;

            // NEW
            this.cloudAnchorSession.LocateAnchorsCompleted += OnLocateAnchorsCompleted;

            this.cloudAnchorSession.Start();
        }
    }

and then I added a new voice command “reload” which grabs all the IDs from the cubes and attempts to create a watcher to reload them;

    async Task OnReloadCubesAsync()
    {
        if (this.cubes.Count > 0)
        {
            var identifiers = this.cubes.Select(c => c.name).ToArray();

            await this.OnClearCubesAsync();

            var watcher = this.cloudAnchorSession.CreateWatcher(
                new AnchorLocateCriteria()
                {
                    Identifiers = identifiers,
                    BypassCache = true,
                    RequestedCategories = AnchorDataCategory.Spatial,
                    Strategy = LocateStrategy.AnyStrategy
                }
            );
        }
    }

and then finally the event handler for each located anchor is as follows – I basically recreate the cube and attach the anchor;

    void OnAnchorLocated(object sender, AnchorLocatedEventArgs args)
    {
        UnityEngine.WSA.Application.InvokeOnAppThread(
            () =>
            {
                var cube = GameObject.CreatePrimitive(PrimitiveType.Cube);

                cube.transform.localScale = new Vector3(0.2f, 0.2f, 0.2f);

                cube.GetComponent<Renderer>().material = this.relocalizedCubeMaterial;

                var worldAnchor = cube.AddComponent<WorldAnchor>();

                worldAnchor.SetNativeSpatialAnchorPtr(args.Anchor.LocalAnchor);

                cube.name = args.Identifier;

                SayAsync("Anchor located");
            },
            false
        );
    }

and the handler for when all anchors have been located just tells me that the process has finished;

    void OnLocateAnchorsCompleted(object sender, LocateAnchorsCompletedEventArgs args)
    {
        SayAsync("Anchor location completed");
        args.Watcher.Stop();
    }

and that’s pretty much it – I found that my anchors reload in much the way that I’d expect them to.

Wrapping Up

As I said at the start of the post, this was just me trying out a few rough ideas and I’ve covered nothing that isn’t already present in the official samples but I found that I learned a few things along the way and I feel like I’m now a little more conversant with this service. Naturally, I need to revisit and go through the process of updating/deleting anchors and also of looking at gathering ‘nearby’ anchors and re-localising them but I think that I “get it” more than I did at the start of the post.
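
For my own notes, my understanding is that the updating/deleting piece hangs off the same session – a hedged sketch, assuming that UpdateAnchorPropertiesAsync and DeleteAnchorAsync behave as their names suggest;

    async Task UpdateThenDeleteAsync(CloudSpatialAnchor cloudAnchor)
    {
        // Change or add metadata on an anchor that's already stored in the cloud...
        cloudAnchor.AppProperties["last-seen"] = DateTimeOffset.UtcNow.ToString("o");
        await this.cloudAnchorSession.UpdateAnchorPropertiesAsync(cloudAnchor);

        // ...and, when it's no longer wanted, remove it from the service altogether.
        await this.cloudAnchorSession.DeleteAnchorAsync(cloudAnchor);
    }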

The other thing I need to do is to try this out from a different kind of device, more than likely an Android phone but that's for another post.

Grabbing a Photo & Calling Azure Vision API from ‘Pure UWP’ Code

Just using a blog post as a pastie – I had cause to write a function today to take a photo from a camera on a UWP device and send it to the Azure Cognitive Service for Vision to ask for ‘tags’ from the image.

The intention was to call the function from a UWP-specific Unity app running on HoloLens (the code does work on HoloLens). It would need camera, microphone and internet client capabilities to work.

It’s very specific to one task but, clearly, could be wrapped up into some class that made it a lot more general-purpose and exercised more pieces of that API but I just wanted somewhere to put the code in case I want it again in the future. This is what I had…

// Namespaces this method relies on (it lives inside some class in a UWP project).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Windows.Data.Json;
using Windows.Media.Capture;
using Windows.Media.MediaProperties;
using Windows.Storage.Streams;
using Windows.Web.Http;

static async Task<Dictionary<string, double>> TakePhotoAnalyzeAzureForTagsAsync(
    string azureVisionKey,
    string azureVisionBaseEndpoint = "https://westeurope.api.cognitive.microsoft.com")
{
    var azureVisionApi = "/vision/v1.0/analyze?visualFeatures=Tags";
    Dictionary<string, double> resultDictionary = null;

    // Capture an image from the camera.
    var capture = new MediaCapture();

    await capture.InitializeAsync();

    var stream = new InMemoryRandomAccessStream();

    await capture.CapturePhotoToStreamAsync(
        ImageEncodingProperties.CreateJpeg(), stream);

    stream.Seek(0);

    // Now send that off to Azure for processing
    var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", azureVisionKey);

    var streamContent = new HttpStreamContent(stream);
    streamContent.Headers["Content-Type"] = "application/octet-stream";

    var response = await httpClient.PostAsync(
        new Uri(azureVisionBaseEndpoint + azureVisionApi),
        streamContent);

    if (response.IsSuccessStatusCode)
    {
        var responseString = await response.Content.ReadAsStringAsync();
        JsonObject jsonObject;

        if (JsonObject.TryParse(responseString, out jsonObject))
        {
            resultDictionary =
                jsonObject.GetNamedArray("tags").ToDictionary(
                    tag => tag.GetObject().GetNamedString("name"),
                    tag => tag.GetObject().GetNamedNumber("confidence"));
        }
    }
    return (resultDictionary);
}
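
and a hypothetical call site for that function looks something like this (the key below is a placeholder rather than a real one);

    var tags = await TakePhotoAnalyzeAzureForTagsAsync("<my-vision-api-key>");

    if (tags != null)
    {
        foreach (var pair in tags)
        {
            // e.g. "table: 0.87"
            System.Diagnostics.Debug.WriteLine($"{pair.Key}: {pair.Value:0.00}");
        }
    }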

As a related aside, I also revisited this blog post recently as part of looking back at Cognitive Services and its facial APIs and I realised that code needed changing a bit to make it work and so I did that work and dropped it onto github over here;

Example of using both local UWP face detection and Cognitive Service face detection in a UWP app

Experiments with Shared Holograms and Azure Blob Storage/UDP Multicasting (Part 7)

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

A follow-up to my previous post around experiments with shared holograms using Azure blob storage and UDP multicasting techniques.

At the end of the previous post, I said that I might return and make a slightly better 'test scene' for the Unity project; this post is my write-up of my attempts to do that.

What’s in the New Test Scene?

I found a model of a house on Remix3D.com;

image

and I made the test scene about visualising that model in a consistent place on multiple devices with the ability to rotate, scale and move it such that the multiple devices keep a consistent view.

What I built is pretty simple and the essential steps involved in the scene are;

  • The app runs and waits for the underlying library to tell it whether there are already other devices on the same network or not. During this period, it displays a ‘waiting screen’ for up to 5 seconds if it doesn’t receive notification that there are other devices on the network.

20180110_130146_HoloLens

  • If the app determines that no other devices are on the network then it pops up a model of a house gaze-locked to the device so that the user can potentially move it around and say 'done' to place it.

20180110_125124_HoloLens

  • Once positioned, the app replaces the displayed model by using the APIs detailed in the previous posts to create a shared hologram which is exactly the same house in exactly the same position. At this point, its creation will be multicast around the network and the blob representing its world anchor will be uploaded to Azure.
  • If the app determines that there are other devices on the network at start-up time then it will inform the user of this;

20180110_125554_HoloLens

  • and it will stop the user from positioning the model while it waits to bring the position data (world anchor) down from Azure. The same thing should happen in the race condition where multiple users start the app at the same time and then one of them becomes the first to actually position the model.

20180110_125733_HoloLens

  • Once the model has been positioned on the local device (in whichever way) it enters into a mode which allows for voice commands to be used to enter ‘rotate’, ‘scale’ and ‘move’ modes to move it around;

20180110_125155_HoloLens

  • those transformations are then multicast to other devices on the network such that they all display the same model of a house in the same place.

and that’s pretty much it Smile

How’s the Test Scene Structured?

I already had a test scene within the Unity project that I’d published to github and so I just altered it rather than starting from scratch.

It’s very simple – the scene starts with the main camera parenting both a text object (to give a very poor Heads-Up-Display) and the model of the house (to give a very poor gaze-locked positioning system) as below;

image

there is then one object called ScriptHolder which has an instance of the Shared Hologram Controller component (and its dependency) that I discussed in the previous posts;

image

I’ve ommitted the details of my own Azure configuration so that would need to be filled in to specify the storage details and I’ve also told the script that I want to synchronise transforms on a fairly high frequency which, realistically, I think I could drop down a little.

Beyond that, I also have a script here called Main Script which contains the logic for the scene with the positive part of it being that there’s not too much of it;

using SharedHolograms;
using System;
using System.Linq;
using UnityEngine;
using UnityEngine.Windows.Speech;

public class MainScript : MonoBehaviour, ICreateGameObjects
{
    // Text to display output messages on
    public TextMesh StatusDisplayTextMesh;

    // GameObject to use as a marker to position the model (i.e. the house)
    public GameObject PositionalModel;

    // Implementation of ICreateGameObject - because we are not creating a Unity primitive
    // I've implemented this here and 'plugged it in' but our creation is very simple in
    // that we duplicate the object that we're using as the PositionalModel (i.e. the
    // house in my version).
    public void CreateGameObject(string gameObjectSpecifier, Action<GameObject> callback)
    {
        // Right now, we know how to create one type of thing and we do it in the most
        // obvious way but we could do it any which way we like and even get some other
        // componentry to do it for us.
        if (gameObjectSpecifier == "house")
        {
            var gameObject = GameObject.Instantiate(this.PositionalModel);
            gameObject.SetActive(true);
            callback(gameObject);
        }
        else
        {
            // Sorry, only know about "house" right now.
            callback(null);
        }
    }
    void Start()
    {
        // Set up our keyword handling. Originally, I imagined more than one keyword but
        // we ended up just with "Done" here.
        var keywords = new[]
        {
            new { Keyword = "done", Handler = (Action)this.OnDoneKeyword }
        };
        this.keywordRecognizer = new KeywordRecognizer(keywords.Select(k => k.Keyword).ToArray());

        this.keywordRecognizer.OnPhraseRecognized += (e) =>
        {
            var understood = false;

            if ((e.confidence == ConfidenceLevel.High) ||
                (e.confidence == ConfidenceLevel.Medium))
            {
                var handler = keywords.FirstOrDefault(k => k.Keyword == e.text.ToLower());

                if (handler != null)
                {
                    handler.Handler();
                    understood = true;
                }
            }
            if (!understood)
            {
                this.SetStatusDisplayText("I might have missed what you said...");
            }
        };
        // We need to know when various things happen with the shared holograms controller.
        SharedHologramsController.Instance.SceneReady += OnSceneReady;
        SharedHologramsController.Instance.Creator.BusyStatusChanged += OnBusyStatusChanged;
        SharedHologramsController.Instance.Creator.HologramCreatedRemotely += OnRemoteHologramCreated;
        SharedHologramsController.Instance.Creator.GameObjectCreator = this;

        // Wait to see whether we should make the positional model active or not.
        this.PositionalModel.SetActive(false);
        this.SetStatusDisplayText("waiting...");
    }
    void OnDoneKeyword()
    {
        if (!this.busy)
        {
            this.keywordRecognizer.Stop();

            this.SetStatusDisplayText("working, please wait...");

            if (this.PositionalModel.activeInHierarchy)
            {
                // Get rid of the placeholder.
                this.PositionalModel.SetActive(false);

                // Create the shared hologram in the same place as the placeholder.
                SharedHologramsController.Instance.Creator.Create(
                    "house",
                    this.PositionalModel.transform.position,
                    this.PositionalModel.transform.forward,
                    Vector3.one,
                    gameObject =>
                    {
                        this.SetStatusDisplayText("object created and shared");
                        this.houseGameObject = gameObject;
                        this.AddManipulations();
                    }
                );
            }
        }
    }
    void OnBusyStatusChanged(object sender, BusyStatusChangedEventArgs e)
    {
        this.busy = e.Busy;

        if (e.Busy)
        {
            this.SetStatusDisplayText("working, please wait...");
        }
    }
    void OnSceneReady(object sender, SceneReadyEventArgs e)
    {
        // Are there other devices around or are we starting alone?
        if (e.Status == SceneReadyStatus.OtherDevicesInScene)
        {
            this.SetStatusDisplayText("detected other devices, requesting sync...");
        }
        else
        {
            this.SetStatusDisplayText("detected no other devices...");

            // We need this user to position the model so switch it on
            this.PositionalModel.SetActive(true);
            this.SetStatusDisplayText("walk to position the house then say 'done'");

            // Wait for the 'done' keyword.
            this.keywordRecognizer.Start();
        }
    }
    void OnRemoteHologramCreated(object sender, HologramEventArgs e)
    {
        // Someone has beaten this user to positioning the model
        // turn off the model.
        this.PositionalModel.SetActive(false);

        this.SetStatusDisplayText("sync'd...");

        // Stop waiting for the 'done' keyword (if we are)
        this.keywordRecognizer.Stop();

        this.houseGameObject = GameObject.Find(e.ObjectId.ToString());

        // Make sure we can manipulate what the other user has placed.
        this.AddManipulations();
    }
    void AddManipulations()
    {
        this.SetStatusDisplayText("say 'move', 'rotate' or 'scale'");

        // The Manipulations script contains a keyword recognizer for 'move', 'rotate', 'scale'
        // and some basic logic to wire those to hand manipulations
        this.houseGameObject.AddComponent<Manipulations>();
    }
    void SetStatusDisplayText(string text)
    {
        if (this.StatusDisplayTextMesh != null)
        {
            this.StatusDisplayTextMesh.text = text;
        }
    }
    KeywordRecognizer keywordRecognizer;
    GameObject houseGameObject;
    bool busy;
}

if someone (anyone! please! please!) had been following the previous set of blog posts closely they might have noticed that in order to write that code I had to change my existing code to at least;

  • Fire an event when the device joins the network such that code can be notified of whether the messaging layer has seen other devices on the network or not.
  • Fire events when other devices on the network create/delete holograms causing them to be imported and created by the local device.
  • Fire an event as/when the underlying code is ‘busy’ doing some downloading or uploading or similar.

Having tried to implement this scene, it was immediately obvious to me that these pieces were needed, but it wasn't obvious beforehand and so I hadn't implemented them – that alone was a useful output of writing this test scene.

The other thing that’s used in the scene is a MonoBehaviour named Manipulations. This is a version of a script that I’ve used in a few places in the past and it’s a very cheap and cheerful way to provide rotate/scale/move behaviour on a focused object in response to voice commands and hand manipulations.

I placed this script and the other script that is specific to the test scene in the ‘Scene Specific’ folder;

image

and the Manipulations script has a dependency on the 3 materials in the Resources folder that it uses for drawing different coloured boxes around an object while it is being rotated/scaled/moved;

image

and that’s pretty much it.

One thing that I’d note is that when I’d used this Manipulations scripts before it was always in projects that were making use of the Mixed Reality Toolkit for Unity and, consequently, I had written the code to depend on some items of the toolkit – specifically around the IManipulationHandler interface and the IInputClickHandler interface.

I don’t currently have any use of the toolkit in this test project and it felt like massive overkill to add it just to enable this one script and so I reworked the script to move it away from having a dependency on the toolkit and I was very pleased to find that this was only a small piece of work – i.e. the toolkit had mostly done a bit of wrapping on the raw Unity APIs and so it wasn’t difficult to unpick that dependency here.

Wrapping Up

I don’t intend to write any more posts in this mini-series around using Azure blob storage and UDP multicasting to enable shared holograms, I think I’ve perhaps gone far enough Smile

The code is all up on github should anyone want to explore it, try it, or take some pieces for their own purposes.

I’m always open to feedback so feel free to do that if you want to drop me a line and be aware that I’ve only tested this code in a limited way as I wrote it all on a single HoloLens device using the (supplied) test programs to simulate responses from a second device but I’m ‘reasonably’ happy that it’s doing sensible things.