Rough Notes on Porting “glTF Viewer” from Mixed Reality Toolkit (MRTK) V1 to MRTK V2 (RC2.1)

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens or Azure Mixed Reality other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

Around 6 months ago, I wrote a simple application for HoloLens 1 and published it to the Windows Store.

It’s called “glTF Viewer” and it provides a way to view models stored in glTF format on the HoloLens with basic move, rotate, scale manipulations. It also provides a way via which one user can put such a model onto their HoloLens, open it up and then share it automatically to other users on the same local network such that they will also be able to see the same model and the manipulations performed on it. This includes downloading the files for the model from the originating device and caching them onto the requesting device.

You can find the application in the store here;

glTF Viewer in the Microsoft Store

and you can find the original blogpost that I wrote about the process of writing this application here;

A Simple glTF Viewer for HoloLens

and you can find the source code for the application over here;

glTF Viewer on GitHub

I’d like to keep this application up to date and so with the arrival of MRTK V2 (release candidates) I thought that it would be a good idea to port the application over to MRTK V2 such that the application was “more modern” and better suited to work on HoloLens 2 when the device becomes available.

In doing that work, I thought it might be helpful to document the steps that I have taken to port this application and that’s what this blog post is all about – it’s a set of ‘rough notes’ made as I go through the process of moving the code from V1 to V2.

Before beginning, though, I want to be honest about the way in which I have gone about this port. What I actually did was;

  1. Begin the port thinking that I would write it up as I went along.
  2. Get bogged down in some technical details.
  3. Complete the port.
  4. Realise that I had not written anything much down.

So it was a bit of a failure in terms of writing anything down.

Consequently, what I thought that I would do is to revisit the process and repeat the port from scratch but, this time, write it down as I went along.

That’s what the rest of this post is for – the step-by-step process of going from MRTK V1 to MRTK V2 on this one application having done the process once already.

Before I get started though, I’d like to point out some links.

Some Links…

There are a number of links that relate to activities and reading that you can do if you’re thinking of getting started with a mixed reality application for HoloLens 2 and/or thinking of porting an existing application across from HoloLens 1. The main sites that I find myself using are;

Armed with those docs, it’s time to get started porting my glTF Viewer to MRTK V2.

Making a New Branch, Getting Versions Right

I cloned my existing repo from https://github.com/mtaulty/GLTF-Model-Viewer using a recursive clone and made sure that it would still build.

There are quite a few steps necessary to build this project right now described in the readme at https://github.com/mtaulty/GLTF-Model-Viewer.

Specifically, the repo contains a sub-module which uses UnityGLTF from the Khronos Group. There’s nothing too unusual about that except that the original MRTK also included some pieces around GLTF which clashed with UnityGLTF, so I had to write some scripts to set a few things up and remove one or two toolkit files in order to get things to build.

I described this process in the original blog post under the section entitled ‘A Small Challenge with UnityGLTF’.

One of the expected benefits of porting to MRTK V2, with its built-in support for GLTF, is being able to get rid of the sub-module and the scripts that hack the build process, ending up with a much cleaner project all round.

I made a new branch for my work named V2WorkBlogPost as I already had the V2Work branch where I first tried to make a port and from which I intend to merge back into master at some later point.

With that branch in play, I made sure that I had the right prerequisites for what I was about to do, taking them from the ‘Getting Started’ page here;

  • Visual Studio 2017.
    • I have this although I’m actually working in 2019 at this point.
  • Unity 2018.4.x.
    • I have 2018.4.3f1 – I have a particular interest in this version because it is supposed to fix a (UWP platform) issue that I raised here, where the UWP implementations of the System.IO.File APIs were reworked in Windows SDK 16299, breaking existing code that used those file APIs. You can see more on that in the original blog post under the title “Challenge 3 – File APIs Change with .NET Standard 2.0 on UWP”. It’s nice that Unity has made the effort to fix this so I’ll be super keen to try it out.
  • Latest MRTK release.
    • I took the V2.0.0 RC2.1 release and I only took the Foundation package rather than the examples as I do not want the examples in my project here. Naturally, I have the examples in another place so that I can try things out.
  • Windows SDK 18362+.
    • I have 18362 as the latest installed SDK on this machine.

It is worth noting at this point a couple of additional things about my glTF Viewer application as it is prior to this port;

  • It has already been built in a Unity 2018.* version. It was last built with 2018.3.2f1.
  • It is already building on the IL2CPP back-end

Why is my application already building for IL2CPP?

Generally, I would much prefer to work on the .NET back-end but it has to be acknowledged that IL2CPP is inevitable given that Unity 2019 versions no longer have .NET back-end support. There is a bigger reason for my use of IL2CPP, though: my application uses classes from .NET Standard 2.0 (specifically HttpListener) and, because the .NET back-end is deprecated, Unity did not add .NET Standard 2.0 support to it. So, if I want to use HttpListener then I have to use IL2CPP. I wrote about this in gory detail at the time that I wrote the application so please refer back to the original blog post (in the section entitled ‘Challenge Number 1 – Picking up .NET Standard 2.0’) if you want the blow-by-blow.

So, armed with the right software and an application that already builds in Unity 2018 on the IL2CPP back-end, I’m ready to make some changes.

Opening the Project in Unity

I opened up my project in the 2018.4.3f1 version of Unity and allowed it to upgrade it from 2018.3.2f1.

I didn’t expect to see problems in this upgrade but it did seem to get stuck on this particular error;

image

which says;

“Project has invalid dependencies:
    com.unity.xr.windowsmr.metro: Package [com.unity.xr.windowsmr.metro@1.0.10] cannot be found”

so my best thought was to use the Package Manager which offered to upgrade this to Version 1.0.12

image

and that seemed to do the trick. I had a look at my build settings as well and switched platform over to the UWP;

image

A quick note on the debugging settings here. For IL2CPP, you can choose to debug either the C# code or the generated C++ code; Unity has all the details over here.

UWP: Debugging on IL2CPP Scripting Backend

Take extra care to ensure that you have the right capabilities set in your project for this to work as mentioned in the first paragraph of that page.

Because of this, I generally build Release code from Visual Studio and attempt to use the Unity C# debugging first. If that doesn’t help me out, I tend to debug the generated C++ code using the native debugger in Visual Studio and, sometimes, I rebuild from Visual Studio in Debug configuration to help with that debugging on native code.

I’d note also that I do toggle “Scripts Only Build” when I think it is appropriate in order to try and speed up build times. It’s “risky”, though, as it’s easy to leave it on when you should have turned it off, so beware on that one.

With that done, Unity was opening my project in version 2018.4.3f1 and it would build a Visual Studio project for me and so I committed those changes and moved on.

The commit is here.

A Word on Scenes

An important thing to note about the glTF Viewer application is that it’s really quite simple. There’s a bit of code in there for messaging and so on but there’s not much to it and, as such, it’s built as a single scene in Unity as you can see below;

image

If you have a multi-scene application then you’re going to need to take some steps to work with the MRTK V2 across those multiple scenes to ensure that;

  1. The MRTK doesn’t get unloaded when scenes change
  2. More than one MRTK doesn’t get loaded when scenes change

I’ve seen a few apps where this can be a struggle and there’s an issue raised on the MRTK V2 around this over here, with a long discussion attached, which I think leads to the approach of having a “base” scene with the MRTK embedded into it and then loading/unloading content scenes with the “additive” flag set. You might want to check out that whole discussion if this is an area of interest for you; it doesn’t impact my app here.
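If it helps to visualise that, a minimal sketch of the “base scene plus additive content scenes” idea might look something like this (the names here are my own rather than anything from the app or the MRTK);

using UnityEngine;
using UnityEngine.SceneManagement;

public class SceneSwitcher : MonoBehaviour
{
    string currentContentScene;

    public void SwitchTo(string contentScene)
    {
        if (!string.IsNullOrEmpty(this.currentContentScene))
        {
            SceneManager.UnloadSceneAsync(this.currentContentScene);
        }
        // The additive load keeps the 'base' scene (and the MRTK within it) alive.
        SceneManager.LoadSceneAsync(contentScene, LoadSceneMode.Additive);
        this.currentContentScene = contentScene;
    }
}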

Adding the New Toolkit

This is much easier than the previous 2 steps in that I just imported the Unity package that represents MRTK V2 RC 2.1.

I hit one error;

“Assembly has reference to non-existent assembly ‘Unity.TextMeshPro’ (Assets/MixedRealityToolkit.SDK/MixedRealityToolkit.SDK.asmdef)”

but that was easily fixed by going back into the Package Manager and installing the TextMesh Pro package into my project. Once again, I ensured that the project would build in Unity. It did build but it spat out this list of “errors” that I have seen many times working on these pieces, so I thought I would include a screenshot here;

image

These errors all relate to the “Reference Rewriter” and all seem to involve System.Numerics. I have seen these flagged as errors by Unity in many projects recently and yet the build is still flagged as Succeeded and seems to deploy and work fine on a device.

Consequently, I ignore them. The last error listed there, though – about a failure to copy from the Temp folder to the Library folder – is an actual problem that I have with Unity at the moment and I have to fix that one by restarting the editor and the hub until it goes away.

When it did go away, I then hit this error;

“Scripted importers UnityGLTF.GLTFImporter and Microsoft.MixedReality.Toolkit.Utilities.Gltf.Serialization.Editor.GlbAssetImporter are targeting the glb extension, rejecting both.
UnityEditor.Experimental.AssetImporters.ScriptedImporter:RegisterScriptedImporters()”

but I can fully understand why Unity is complaining here because I do have two versions of UnityGLTF in the project right now, so I’m not surprised that it is a bit puzzled. I’m hoping to address this shortly and Unity seems to be tolerating the situation for now. So, with those caveats, I do now have a project that contains both the old MRTK V1 and the new MRTK V2 as below;

image

The big question for me at this point is whether to take a dependency on the MRTK V2 as a Git sub-module or whether to just include the code from the MRTK V2 in my Unity project.

I much prefer to take a dependency on it as a sub-module but I figure that, while it is not yet finished, I will have the code in my project and then I can do the sub-module step at a later point. Consequently, I had quite a lot of folders to add to my Git repo and it leaves the repo in a slightly odd state: the MRTK V1 is in there as a sub-module and the MRTK V2 is in there as code. I’m about to remove MRTK V1 anyway, though, so it won’t be in this hybrid state for too much longer.

The commit is here.

Removing the MRTK V1 – Surgical Removal or the Doomsday Option?

I now have a project with both the MRTK V1 and the MRTK V2 within it but how do I go about removing the V1 and replacing it with the V2?

So far when I’ve worked on applications that are doing this it feels to me like there are 2 possibilities;

  1. The “Doomsday” option – i.e. delete the MRTK V1 and see what breaks.
  2. The “Surgical” option – i.e. make an inventory of what’s being used from the MRTK V1 and consider what replacement is needed.

For the blog post, I’m going to go with option 2 but I’ve seen developers try both approaches and I’m not convinced that one is any better than the other.

In my particular application, I did a survey of my scene to try and figure out what is being used from the toolkit.

Firstly, I had some objects in my scene which I think I used in their default configuration;

  • Cursor object
  • InputManager object
  • MixedRealityCameraParent object

I’m expecting all of these to be replaced by the MRTK V2 camera system and input system without too much effort on my part.

I also noticed that I had a ProgressIndicator. At the time of writing, I’ve asked for this to be brought across into the MRTK V2 but, as far as I know, it’s not there yet, so my expectation here is to simply keep these pieces from the MRTK V1 in my application for now and continue to use the progress indicator as it is.

Having taken a look at my scene, I wanted to see where I was using the MRTK V1 from my own code. My first thought was to attempt to use the “Code Map” feature of Visual Studio but I don’t think there’s enough “differentiation” between my code and the code in the toolkit to be able to make sense of what’s going on.

Abandoning that idea, I looked at the entire set of my scripts that existed in the scripting project;

image

There are around 30 or so scripts there – it’s not a huge set – so I opened them all up in the editor, searched them all for HoloToolkit and came up with a list of 8 files;

image

I then opened those files and did a strategic search to try and find types from the HoloToolkit and I found;

  • A use of the interface IFocusable in FocusWatcher.cs, a class which keeps track of which object (if any) has focus.
  • A use of the ObjectCursor in a class, CursorManager.cs, which makes the cursor active/inactive at suitable times, usually while something is loading asynchronously.
  • The ModelUpdatesManager class, which adds the type TwoHandManipulatable to a GameObject so that it can be moved, rotated and scaled; this class needs a BoundingBox prefab in order to operate.
  • A use of the ProgressIndicator type which I use in order to show/hide progress when a long running operation is going on.

Additionally, I know that I am also using UnityGLTF from the Khronos repo in order to load GLTF models from files, whether they are JSON or binary and whether the model is packaged into a single file or split across multiple files which all need loading.

The application also makes use of voice commands but I know that in the MRTK V1 I had to avoid the speech support as it caused me some issues. See the original blog post under the section entitled “Challenge 7” for the blow-by-blow on the problems I had using speech.

While it’s probably not a perfect list, this then gives me some things to think about – note that I am mostly building this list by looking back at the porting guide and finding equivalents for the functionality that I have used;

  1. Input – Replace the Cursor, InputManager, MixedRealityCameraParent in the scene with the new MRTK systems.
  2. Speech – Look into whether speech support in MRTK V2 works better in my scenario than it did in MRTK V1.
  3. GLTF – Replace the UnityGLTF usage from the Khronos repo with the new pieces built into MRTK V2.
  4. Focus – Replace the use of IFocusable with the use of IMixedRealityFocusHandler.
  5. Cursor – Come up with a new means for showing/hiding the cursor across the various pointers that are used by the MRTK V2.
  6. Manipulations – Replace the TwoHandManipulatable script with use of the new ManipulationHandler, NearInteractionGrabbable and BoundingBox scripts with suitable options set on them.
  7. Rework – Look into which pieces of the application could benefit from being reworked, re-architected based on the new service-based approach in MRTK V2.

That’s a little backlog to work on and I’ll work through them in the following sub-sections.

Input

Firstly, I removed the InputManager, Cursor and MixedRealityCameraParent from my scene and then used the Mixed Reality Toolkit –> Add to Scene and Configure menu to add the MRTK V2 into the scene. At this point, the “Mixed Reality Toolkit” menu is a little confusing as both the MRTK V1 and V2 are contributing to it but, for now, I can live with that.

I chose the DefaultHoloLens2ConfigurationProfile for my toolkit profile as below;

image

A word about “profiles”. I think it’s great that a lot of behaviour is moving into “profiles” or what an old-fashioned person like me might call “configuration by means of a serialized object”.

The implication of this, though, is that if you were to lose these profiles then your application would break. I’ve seen these profiles lost more than once when someone allowed them to be stored in the MRTK’s own folders (by default the MixedRealityToolkit.Generated folder) and then deleted one version of the MRTK in order to add another, losing the MixedRealityToolkit.Generated folder in the process.

Additionally, imagine that in one of today’s Default profiles a setting is “off”. What’s to say that a future profile won’t replace it with a value of “on” and change your application behaviour?

Maybe I’m just paranoid, but my way of managing these profiles is to create a “Profiles” folder of my own and then duplicate every single profile that is in use into that folder, giving each one a name that lines up with my app. That way, I know exactly where my profiles are coming from and I don’t run the risk of deleting them by mistake or having them overwritten by a newer toolkit.

While doing this, I noticed that the DefaultMixedRealityToolkitConfigurationProfile allows for “copy and customize”;

image

whereas the DefaultHoloLens2ConfigurationProfile doesn’t seem to;

image

but I might be missing how this is supposed to work. Regardless, I started with the DefaultMixedRealityToolkitConfigurationProfile and I cloned it to make a copy in Profiles\GLTFViewerToolkitConfigurationProfile.

I then went through that profile and;

  • Changed the Target Scale to be World.
  • Changed the Camera profile to be the DefaultHoloLens2CameraProfile before cloning that to make Profiles\GLTFViewerCameraProfile
  • Changed the Input profile to be the DefaultHoloLens2InputSystemProfile before cloning that to make Profiles\GLTFViewerInputSystemProfile
    • In doing this, I cloned all of the 8 sub-sections for Input Actions, Input Action Rules, Pointer, Gestures, Speech Commands, Controller Mapping, Controller Visualization, Hand Tracking
  • I switched off the Boundary system, leaving it configured with its default profile
  • I switched off the Teleport system, leaving it configured with its default profile
  • I switched off the Spatial Awareness system, leaving it with its default profile and removing the spatial observer (just in case!)
  • I cloned the DefaultMixedRealityDiagnosticsProfile to make my own and left it as it was.
  • I cloned the Extensions profile to make my own and left it as it was.
  • I left the editor section as it was.

With that in place, I then have all these profiles in my own folder and they feel like they are under my control.

image

At this point, I thought I’d risk pressing “Play” in the editor and I was surprised that I didn’t hear the welcome message that I had built into the app but, instead, spotted a “not implemented exception”.

Speech and Audio, Editor and UWP

I dug into this exception and realised that I had written a class AudioManager which decides whether to play voice clips or not, and that class had been built to work only on UWP devices, not in the editor – i.e. it was making use of ApplicationData.Current.LocalSettings. I quickly rewired it to use PlayerPrefs instead so that it could work both in the editor and on the device.
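As a rough illustration (this is a minimal sketch rather than the real AudioManager code, and the class/key names here are my own assumptions), the change amounts to swapping a UWP-only settings store for PlayerPrefs;

using UnityEngine;

// Hypothetical helper: previously a flag like this lived in
// ApplicationData.Current.LocalSettings, which only exists on UWP;
// PlayerPrefs works both in the editor and on the device.
public static class AudioPrefs
{
    const string MutedKey = "audioMuted"; // assumed key name

    public static bool Muted
    {
        get => PlayerPrefs.GetInt(MutedKey, 0) == 1;
        set
        {
            PlayerPrefs.SetInt(MutedKey, value ? 1 : 0);
            PlayerPrefs.Save();
        }
    }
}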

With that done, I got my audible welcome message on pressing play, I could see the framerate counter from the MRTK V2 and I seemed to be able to move around in the editor.

I couldn’t open any files though because I’d also written some more code which was editor specific.

My application uses voice commands but I had a major challenge with voice commands on the MRTK V1 in that they stopped working whenever the application lost/regained focus.

Worst of all, this included losing focus to the file dialog itself – a user of the application could use the voice command “Open” to raise the file dialog and thereby break the voice commands before their model file had even been chosen.

I wrote about this in the original blog post under the section “Challenge 7”. The upshot is that I removed anything related to MRTK V1 speech or Unity speech from my application and I fell back to purely using SpeechRecognizer from the UWP for my application and that worked out fine but, of course, not in the Unity editor.

I only have 3 speech commands – open, reset, remove – and so what I would ideally like to do is work in the MRTK V2 way by defining new input actions for these commands (along with a profiler command to toggle the profiler display) as below in my input actions profile;

image

and then I could define some speech commands in my speech settings profile;

image

and then, in my class which handles speech commands, I could add a property which maps each MixedRealityInputAction (open etc.) to a handler using my own internal class ActionHandler, because I don’t think Unity can serialize dictionaries for me;

image

and then configure them to their respective values in the editor…

image

and then I should be able to implement IMixedRealityInputActionHandler to invoke the actions here (rather than directly tie myself to those actions coming from only voice commands);

image

In doing so, I think I also need to register my GameObject as a “global” handler for these commands and so I need to add a call to do;

image

and that seemed to work really, really nicely.
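Since those screenshots don’t reproduce here as text, below is a rough sketch of how I think those pieces fit together – the class and member names (ActionHandler, actionHandlers, InvokeActionHandler) are my assumptions from the screenshots rather than the exact code in the repo;

using System;
using System.Linq;
using Microsoft.MixedReality.Toolkit;
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine;
using UnityEngine.Events;

// Maps a MixedRealityInputAction to a handler - used because Unity won't
// serialize a Dictionary into the inspector for me.
[Serializable]
public class ActionHandler
{
    public MixedRealityInputAction Action;
    public UnityEvent Handler;
}

public class SpeechCommandRouter : MonoBehaviour, IMixedRealityInputActionHandler
{
    [SerializeField]
    ActionHandler[] actionHandlers;

    void Start()
    {
        // Register as a 'global' handler so that input actions arrive here
        // regardless of which object currently has focus.
        MixedRealityToolkit.InputSystem.Register(this.gameObject);
    }
    void OnDestroy()
    {
        MixedRealityToolkit.InputSystem.Unregister(this.gameObject);
    }
    public void OnActionStarted(BaseInputEventData eventData)
    {
        this.InvokeActionHandler(eventData.MixedRealityInputAction);
    }
    public void OnActionEnded(BaseInputEventData eventData)
    {
    }
    void InvokeActionHandler(MixedRealityInputAction action)
    {
        this.actionHandlers?.FirstOrDefault(h => h.Action == action)?.Handler?.Invoke();
    }
}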

That said, I am still pretty concerned that this isn’t going to work reliably on the device itself across invocations of the file dialog – I can see that the new WindowsSpeechInputProvider implementation uses the KeywordRecognizer and I’m not sure that this type behaves well on the device when the application loses/gains focus.

Consequently, I figured that I would use all of this MRTK V2 infrastructure to deliver speech commands to me in the editor but, on the device, I would like to switch it off and rely on the mechanism that I’d previously built which I know works.

I edited my Input system profile in order to try and remove the WindowsSpeechInputProvider outside of the editor and I disabled the WindowsDictationInputProvider altogether;

image

and I then changed my startup code such that it did different things depending on whether it was in the editor or not;

image

and my own speech handling code is super, super simple and inefficient but I know that it works on a V1 device so I am trying to keep it largely intact. Here it is below – it essentially keeps creating a SpeechRecognizer (UWP, not Unity) and using it for a single recognition before throwing it away and starting again;

#if ENABLE_WINMD_SUPPORT    
    /// <summary>
    /// Why am I using my own speech handling rather than relying on SpeechInputSource and
    /// SpeechInputHandler? I started using those and they worked fine.
    /// However, I found that my speech commands would stop working across invocations of
    /// the file open dialog. They would work *before* and *stop* after.
    /// I spent a lot of time on this and I found that things would *work* under the debugger
    /// but not without it.
    /// That led me to think that this related to suspend/resume and perhaps HoloLens suspends
    /// the app when you move to the file dialog because I notice that dialog running as its
    /// own app on HoloLens.
    /// I tried hard to do work with suspend/resume but I kept hitting problems and so I wrote
    /// my own code where I try quite hard to avoid a single instance of SpeechRecognizer being
    /// used more than once - i.e. I create it, recognise with it & throw it away each time
    /// as this seems to *actually work* better than any other approach I tried.
    /// I also find that SpeechRecognizer.RecognizeAsync can get into a situation where it
    /// returns "Success" and "Rejected" at the same time & once that happens you don't get
    /// any more recognition unless you throw it away and so that's behind my approach.
    /// </summary>
    async void StartSpeechCommandHandlingAsync()
    {
        while (true)
        {            
            var command = await this.SelectSpeechCommandAsync();

            if (command.Action != MixedRealityInputAction.None)
            {
                this.InvokeActionHandler(command.Action);
            }
            else
            {
                // Just being paranoid in case we start spinning around here
                // My expectation is that this code should never/rarely
                // execute.
                await Task.Delay(250);
            }
        }
    }
    async Task<SpeechCommands> SelectSpeechCommandAsync()
    {
        var registeredCommands = MixedRealityToolkit.InputSystem.InputSystemProfile.SpeechCommandsProfile.SpeechCommands;

        SpeechCommands command = default(SpeechCommands);

        using (var recognizer = new SpeechRecognizer())
        {
            recognizer.Constraints.Add(
                new SpeechRecognitionListConstraint(registeredCommands.Select(c => c.Keyword)));

            await recognizer.CompileConstraintsAsync();

            var result = await recognizer.RecognizeAsync();

            if ((result.Status == SpeechRecognitionResultStatus.Success) &&
                ((result.Confidence == SpeechRecognitionConfidence.Medium) ||
                 (result.Confidence == SpeechRecognitionConfidence.High)))
            {
                command = registeredCommands.FirstOrDefault(c => string.Compare(c.Keyword, result.Text, true) == 0);
            }                    
        }
        return (command);
    }
#endif // ENABLE_WINMD_SUPPORT

I suspect that I’ll be revisiting this code once I try and deploy to a device but, for now, it works in the editor and moves me onto my next little challenge.

I also switched off the frame rate profiler by default in the profile;

image

and implemented my handler to toggle it on/off;

image

Opening File Dialogs

My application has, initially, a single voice command, “Open”, which raises a file dialog in order to open a glTF model.

However, I’d only written the file open code in order to support opening the file dialog on a UWP device. I hadn’t done the work to make it open in the editor and I realised that this needed addressing so I quickly amended the method that I have to add an additional piece of code for the non-UWP platform case;

    async Task<string> PickFileFrom3DObjectsFolderAsync()
    {
        var filePath = string.Empty;

#if ENABLE_WINMD_SUPPORT
        var known3DObjectsFolder = KnownFolders.Objects3D.Path.ToLower().TrimEnd('\\');

        do
        {
            filePath = await FileDialogHelper.PickGLTFFileAsync();

            if (!string.IsNullOrEmpty(filePath) &&
                !filePath.ToLower().StartsWith(known3DObjectsFolder))
            {
                filePath = string.Empty;
                this.AudioManager.PlayClipOnceOnly(AudioClipType.PickFileFrom3DObjectsFolder);
            }
        } while (filePath == string.Empty);
#else
        filePath = EditorUtility.OpenFilePanelWithFilters(
            "Select GLTF File",
            string.Empty,
            new string[] { "GLTF Files", "gltf,glb", "All Files", "*" });
#endif 

        return (filePath);
    }

but I found that even if I could raise the file dialog, I was still getting exceptions opening files…

Loading GLTF Models

The problem that I was hitting was that the GLTFParser was struggling to read the files that I was feeding it and so I decided to take the leap to stop using that code and start using the GLTF code bundled into the MRTK V2.

In the existing code, I make use of a class GLTFSceneImporter to load the one or more files that might make up a GLTF model. In my original blog post I had a few struggles using this in a deterministic way as it’s very coroutine based and I found it hard to be in control of a couple of things;

  • Knowing when it had finished
  • Knowing when it had thrown exceptions

I mentioned these challenges in the original post under the title of “A Small Challenge with Async/Await and CoRoutines” and also “Another Small Challenge with CoRoutines and Unity’s Threading Model”.

At the time, I largely worked around them by writing a base class named ExtendedMonoBehaviour which did some work for me in this regard. It’s in the repo so I won’t call it out in any detail here.

The GLTFSceneImporter delegated the responsibility for actually opening files to an implementation of an interface named ILoader which looks as below;

namespace UnityGLTF.Loader
{
	public interface ILoader
	{
		IEnumerator LoadStream(string relativeFilePath);

		void LoadStreamSync(string jsonFilePath);

		Stream LoadedStream { get; }

		bool HasSyncLoadMethod { get; }
	}
}

This was very useful for me as the user might choose to open a multi-file GLTF file with various separate material files etc. and this is the way in which my code gets to “know” which files have actually been opened. I need this list of files to be able to offer the model over HTTP to other devices that might request it in a shared experience.

In order to use this, I had a class RecordingFileLoader which implemented this ILoader interface and kept track of every file that it successfully opened on behalf of the loader and I passed this around into a couple of places that needed to know about the file list.

Looking at the new MRTK V2 support for GLTF, things seem much improved in that there is a new class GltfUtility which offers an ImportGltfObjectFromPathAsync method. The built-in support for async makes my base class ExtendedMonoBehaviour redundant, but it does leave me with the challenge of figuring out which files the code has actually loaded a model from.

That method returns a GltfObject and I wrote some code which attempts to work out which files were loaded by interrogating the buffers (and images) properties after they have been populated. I already had this class ImportedModelInfo which wrapped around my RecordingFileLoader and so I modified it to take on this extra functionality;

using Microsoft.MixedReality.Toolkit.Utilities.Gltf.Schema;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using UnityEngine;

public class ImportedModelInfo
{
    public ImportedModelInfo(
        string fullFilePath,
        GltfObject gltfObject)
    {
        // Where were these files loaded from?
        this.BaseDirectoryPath = Path.GetDirectoryName(fullFilePath);

        // What's the name of the file itself?
        this.relativeLoadedFilePaths = new List<string>();
        this.relativeLoadedFilePaths.Add(Path.GetFileName(fullFilePath));

        // Note: At the time of writing, I'm unsure about what the URI property
        // might contain here for buffers and images given that the GLTF spec
        // says that it can be file URIs or data URIs and so what does the GLTF
        // reading code return to me in these cases?

        // I'm expecting Uris like 
        //  foo.bin
        //  subfolder/foo.bin
        //  subfolder/bar/foo.bin

        // and will probably fail if I encounter something other than that.
        var definedUris =
            gltfObject.buffers
                .Where(b => !string.IsNullOrEmpty(b.uri))
                .Select(b => b.uri)
            .Concat(
                gltfObject.images
                    .Where(i => !string.IsNullOrEmpty(i.uri))
                    .Select(i => i.uri));

        this.relativeLoadedFilePaths.AddRange(definedUris);

        this.GameObject = gltfObject.GameObjectReference;
    }
    public string BaseDirectoryPath { get; private set; }
    public IReadOnlyList<string> RelativeLoadedFilePaths => this.relativeLoadedFilePaths.AsReadOnly();
    public GameObject GameObject { get; set; }

    List<string> relativeLoadedFilePaths;
}

with the reworking of one or two other pieces of code, that then allowed me to delete my classes RecordingFileLoader and ExtendedMonoBehaviour, which felt good.
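For reference, the load call itself now looks roughly like this – a sketch rather than the exact code in the repo (the ModelLoader wrapper is hypothetical but GltfUtility.ImportGltfObjectFromPathAsync is the MRTK V2 entry point);

using System.Threading.Tasks;
using Microsoft.MixedReality.Toolkit.Utilities.Gltf.Serialization;

public static class ModelLoader
{
    public static async Task<ImportedModelInfo> LoadAsync(string fullFilePath)
    {
        // Loads the .gltf/.glb (plus any referenced buffers/images) and
        // builds a GameObject for it.
        var gltfObject = await GltfUtility.ImportGltfObjectFromPathAsync(fullFilePath);

        // Record which files were (probably) involved so that they can be
        // offered over HTTP to other devices later on.
        return new ImportedModelInfo(fullFilePath, gltfObject);
    }
}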

I had to do another slight modification to code which had never been run in the editor before because it was expecting to export world anchors but, other than that, it was ok and I could now load at least one GLTF model in the editor as below;

image

What I couldn’t do was any kind of manipulation on the object, so that was perhaps where I needed to look next – although I suspect that depends on focus and I also suspect it relies on having a collider, which might not be present…

The commit for these pieces is here.

Focus

The earlier code would attach this behaviour;

using HoloToolkit.Unity.InputModule;
using UnityEngine;

public class FocusWatcher : MonoBehaviour, IFocusable
{
    public void OnFocusEnter()
    {
        focusedObject = this.gameObject;
    }
    public void OnFocusExit()
    {
        focusedObject = null;
    }
    public static bool HasFocusedObject => (focusedObject != null);
    public static GameObject FocusedObject => focusedObject;
    static GameObject focusedObject;
}

to the models that had been loaded such that when voice commands like “reset” or “remove” were used, the code could check the HasFocusedObject property, get the FocusedObject value itself and then would typically look for some other component on that GameObject and make a method call on it to reset its position or remove it from the scene.
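For illustration, the voice command handling then uses it in roughly this way (simplified here – the real code calls into the app’s own components rather than touching the transform directly);

using UnityEngine;

public class FocusCommandExample : MonoBehaviour
{
    // Wired up to the "reset" voice command.
    public void OnReset()
    {
        if (FocusWatcher.HasFocusedObject)
        {
            FocusWatcher.FocusedObject.transform.localPosition = Vector3.zero;
        }
    }
    // Wired up to the "remove" voice command.
    public void OnRemove()
    {
        if (FocusWatcher.HasFocusedObject)
        {
            Destroy(FocusWatcher.FocusedObject);
        }
    }
}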

It’s questionable as to whether this behaviour should be attached to the objects themselves or whether it should just be a global handler for the whole scene but the effect is the same either way.

I need an equivalent in the new MRTK V2 and the natural thing to do would seem to be to reach into the MixedRealityToolkit.InputSystem.FocusProvider and make a call to GetFocusedObject() but that method expects that the caller knows which pointer is in use and I’m not sure that I do.

Instead, I chose to just update the existing class so as to implement IMixedRealityFocusHandler and keep doing what it had been doing before;

using HoloToolkit.Unity.InputModule;
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine;

public class FocusWatcher : MonoBehaviour, IMixedRealityFocusHandler
{
    public void OnFocusEnter(FocusEventData eventData)
    {
        focusedObject = this.gameObject;
    }
    public void OnFocusExit(FocusEventData eventData)
    {
        focusedObject = null;
    }
    public static bool HasFocusedObject => (focusedObject != null);
    public static GameObject FocusedObject => focusedObject;
    static GameObject focusedObject;
}

but I noticed that I still wasn’t able to interact with the duck – there’s still work to be done.

The commit for this stage is here.

Cursor

My class which manipulates the cursor for me was still stubbed out and so I attempted to update that from what it had been;

using HoloToolkit.Unity.InputModule;
using UnityEngine;

public class CursorManager : MonoBehaviour
{
    [SerializeField]
    private ObjectCursor cursor;

    public void Show()
    {
        this.cursor.gameObject.SetActive(true);
    }
    public void Hide()
    {
        this.cursor.gameObject.SetActive(false);
    }
}

to this version;

using Microsoft.MixedReality.Toolkit;
using Microsoft.MixedReality.Toolkit.Input;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;

public class CursorManager : MonoBehaviour
{
    public CursorManager()
    {
        this.hiddenPointers = new List<IMixedRealityPointer>();
    }
    public void Hide()
    {
        // TODO: I need to understand how you are supposed to do this on V2, I just want
        // to switch all cursors off when the user cannot do anything useful with them.
        foreach (var inputSource in MixedRealityToolkit.InputSystem.DetectedInputSources)
        {
            foreach (var pointer in inputSource.Pointers)
            {
                if ((pointer.IsActive) && (pointer.BaseCursor != null))
                {
                    pointer.BaseCursor.SetVisibility(false);
                    this.hiddenPointers.Add(pointer);
                }
            }
        }
        MixedRealityToolkit.InputSystem.GazeProvider.Enabled = false;
    }
    public void Show()
    {
        foreach (var pointer in this.hiddenPointers)
        {
            pointer.BaseCursor.SetVisibility(true);
        }
        this.hiddenPointers.Clear();

        MixedRealityToolkit.InputSystem.GazeProvider.Enabled = true;
    }
    List<IMixedRealityPointer> hiddenPointers;
}

I’m not sure whether this is “right” or not – once again I find myself puzzling a little over all these pointers and cursors and trying to figure out which ones I’m meant to interact with. The code feels reasonably “safe”, though, in that it attempts to put back what it changed in the first place so, hopefully, I’m not breaking the toolkit with this.

That commit is here.

Manipulations

Up until now, I’ve left the code which attempts to handle manipulations as it was. That is, there is code in the application;

image

which attempts to add TwoHandManipulatable to a model which has been loaded from the disk (rather than one which has been received over the network where I don’t allow local manipulations). That TwoHandManipulatable wants a BoundingBoxPrefab and so you can see that my code here has passed such a thing through to it.

It’s probably not too surprising that this isn’t working as it’s mixing MRTK V1 classes with MRTK V2 in the scene so I wouldn’t really expect it to do anything.

Additionally, I’m not sure from looking at the objects in the editor that there is any type of collider being added by the glTF loading code so I probably need to deal with that too.

I suspect then that I’m going to need to add a few pieces here;

  • A BoxCollider to allow for interactions on the model.
  • ManipulationHandler to allow the model to be moved, rotated, etc.
  • NearInteractionGrabbable so that the manipulations cater for both near and far interactions on a HoloLens 2.
  • BoundingBox to provide some visualisation of the interactions with the model.

Additionally, I think that I’m going to want quite a bit of control over the settings of some of the materials etc. on the BoundingBox and over some of the axes of control on the other pieces, so it feels like it might be a lot easier to set this all up as a prefab that I can build in the editor and then just pass through to this code.

Previously, when loading a model my code took an approach of something like this;

  • load the GLTF model, giving a new GameObject with a collider already on it
  • create a new object to act as the model’s parent, parenting this object itself off some root parent within the scene
  • position the parent object 3m down the user’s gaze vector, facing the user
  • attach a world anchor to the parent object both for stability but also so it can be exported to other devices
  • add manipulation behaviours to the GLTF model itself so that it can be moved, rotated, scaled underneath its parent which is anchored

I decided to change this slightly for the new toolkit to;

  • load the GLTF model, giving a new GameObject ( M )
  • create a new object ( A ) to act as the anchored parent
  • create a new object to act as the model’s parent ( P ) from a prefab where BoxCollider, ManipulationHandler, NearInteractionGrabbable, BoundingBox are already present and configured on that prefab
  • parent M under P, P under A, A under R
  • add a world anchor to A

and that lets me slip this prefab into the hierarchy like adding an item into a linked-list so as to let the prefab bring a bunch of behaviour with it.
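A rough sketch of that parenting code is below – the member names (manipulationPrefab, rootParent) and the shape of the method are my assumptions rather than the exact code in the repo, and the positioning 3m down the gaze vector is omitted;

using UnityEngine;
using UnityEngine.XR.WSA;

public class ModelPlacement : MonoBehaviour
{
    // Prefab (P) carrying BoxCollider, ManipulationHandler, NearInteractionGrabbable, BoundingBox.
    [SerializeField]
    GameObject manipulationPrefab;

    // The root parent (R) that all models hang off in the scene.
    [SerializeField]
    Transform rootParent;

    public GameObject AddLoadedModel(GameObject loadedModel /* M */)
    {
        // (A) the object that will carry the world anchor, parented under R.
        var anchorParent = new GameObject("AnchorParent");
        anchorParent.transform.SetParent(this.rootParent, false);

        // (P) brings the manipulation behaviours with it via the prefab.
        var modelParent = Instantiate(this.manipulationPrefab, anchorParent.transform);

        // Parent M under P.
        loadedModel.transform.SetParent(modelParent.transform, false);

        // Anchor A for stability and so that the anchor can be exported to other devices.
        anchorParent.AddComponent<WorldAnchor>();

        return anchorParent;
    }
}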

That prefab is as below;

image

and I tweaked a few materials and settings on the BoundingBox, largely based on examples that I looked at in the example scenes from the toolkit;

image

and;

image

Changing the hierarchy of the components that are set up when a model is loaded into the scene had some impact on my scripts which create/access world anchors and on my scripts which watch for object transformations to send/receive over the network. I had to make a few related changes to patch that up and pass a few objects to the right place but I’ll keep that detail out of the post.

It also broke my simplistic FocusWatcher class because that class expected that the GameObject which had focus would be the model itself, with direct access to the various behaviours that I have added to it, whereas, now, that object is buried in a bit of hierarchy. So, I got rid of the FocusWatcher altogether at this point and tried to write this method which would hopefully return to me all focused objects which have a particular component within their hierarchy;

    IEnumerable<T> GetFocusedObjectWithChildComponent<T>() where T : MonoBehaviour
    {
        // TODO: I need to figure whether this is the right way to do things. Is it right
        // to get all the active pointers, ask them what is focused & then use that as
        // the list of focused objects?
        var pointers = MixedRealityToolkit.InputSystem.FocusProvider.GetPointers<IMixedRealityPointer>()
            .Where(p => p.IsActive);

        foreach (var pointer in pointers)
        {
            FocusDetails focusDetails;

            if (MixedRealityToolkit.InputSystem.FocusProvider.TryGetFocusDetails(
                pointer, out focusDetails))
            {
                var component = focusDetails.Object?.GetComponentInChildren<T>();

                if (component != null)
                {
                    yield return component;
                }
            }
        }
    }

Whether this is a good thing to do or not, I’m not yet sure but, for my app, it’s only called on a couple of voice commands so it shouldn’t be executing very frequently.

I tried this out in the editor and I seemed to be at a place where I could open glTF models and use near and far interactions to transform them as below;

image

The commit for this stage is here.

Removing the MRTK V1

At this point, I felt like I was done with the MRTK V1 apart from the ProgressRingIndicator which I am still using so I need to preserve it in my project for now.

I made a new folder named TookitV1 and I moved across the Progress related pieces which appeared to be;

  • Animations – the contents of the Progress folder
  • Fonts – I copied all of these
  • Materials – I copied only ButtonIconMaterial here
  • Prefabs – the contents of the Progress folder
  • Scripts – the contents of the Progress folder

I did a quick commit and then deleted the HoloToolkit folder and I also deleted the UnityGLTF folder as I should, at this point, not be using anything from those 2 places.

At this point, the ProgressIndicator blew up compiling: it told me that it was missing the HoloToolkit.Unity namespace (easily fixed) and that it wanted to derive from Singleton<T>. That was easy enough to fix by changing the base class to MonoBehaviour and adding a static Instance property which is set to the first instance that spins up in the application.
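Something along these lines – a sketch of the shape of the change rather than the real class, which has a lot more to it;

using UnityEngine;

public class ProgressIndicator : MonoBehaviour
{
    // Replaces the MRTK V1 Singleton<T> base class: the first instance
    // to wake up becomes the one exposed via this static property.
    public static ProgressIndicator Instance { get; private set; }

    void Awake()
    {
        if (Instance == null)
        {
            Instance = this;
        }
    }

    // ...the rest of the (unchanged) progress indicator behaviour...
}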

I still had problems though in that I had a couple of missing scripts in the prefab for the ProgressIndicator and I tried to replicate what had been there previously with the SolverHandler and Orbital as below

image

and I had to patch a couple of materials but, other than that, the MRTK V1 was gone and the app seemed to continue to function in the editor.

The commit is here.

Removing MRTK V1 and UnityGLTF as Submodules

I had previously included the MRTK V1 and UnityGLTF github repos as submodules of my repo and I no longer need them so removing them would make the repo a lot cleaner.

Additionally, I had a setup.bat script which attempted to move a lot of files around, do some preliminary building of Unity GLTF etc. and I no longer need that either.

I should be in a state on this branch where the project can “simply” be pulled from github and built.

With that in mind, I attempted to remove both of those submodules following the procedure described here – I’ve done this once or twice before but I can never remember how you’re meant to do it.

I also removed the setup.bat and altered the readme.md.

Now, usually, when I do so many things at once something goes wrong so the next step was to…

Make a Clean Folder, Clone the Repo, Fix Problems

I cloned the repo again recursively into a new, clean folder with git clone --recursive https://github.com/mtaulty/GLTF-Model-Viewer and then switched to the V2WorkBlogPost branch. I noticed that git struggled to remove the MixedRealityToolkit-Unity and the UnityGLTF folders which had been created/populated as part of bringing down the recursive repo, so I got rid of them manually (I’ll admit that the finer details of submodules are a bit of a mystery to me).

I reopened that project in Unity and, remarkably, all seemed to be fine – the project ran fine in the editor once I’d switched platforms and I didn’t seem to have missed any files from my commits.

The commit is here.

Deploying to a Device

At this point, it felt like it was time to build for a device and see how the application was running as I find that there are often pieces of functionality that work ok in the editor but fail on a device.

I only have a HoloLens 1 device with me at the time of writing, so I tested on HoloLens 1 – I can’t try HoloLens 2 right now.

In trying to build for the device I hit an immediate failure;

“IOException: Sharing violation on path C:\Data\temp\blogpost\GLTF-Model-Viewer\GLTFModelViewer\Temp\StagingArea\Data\Managed\tempStrip\UnityEngine.AudioModule.dll”

but I see this quite frequently with Unity at the moment so I did a quick restart (and shut down Visual Studio) but then I got hit with this one;

“Copying assembly from ‘Temp/Unity.TextMeshPro.dll’ to ‘Library/ScriptAssemblies/Unity.TextMeshPro.dll’ failed”

which is another transient error I see quite a lot so I did some more restarts (of both Unity and the Unity Hub) and managed to produce a successful VS build which seemed to deploy ok and run fine;

image

In deploying to the device, I also did some basic tests of the multi-user network sharing functionality which also seemed to be working fine.

Other Rework – Mixed Reality Extension Services

There are a few places in this code base where I make use of “services” which are really “global” across the project. As examples;

  • I have a class StorageFolderWebServer which, in a limited way, takes a UWP StorageFolder and makes some of its content available over HTTP via HttpListener
  • I have a NetworkMessageProvider which facilitates the shared experience by multicasting and receiving New Model, Transformed Model, Deleted Model messages around the local network.
    • This sits on top of a MessageService which simply knows how to Send/Receive messages having initially joined a multicast group.
  • I have a MessageDialogHelper which shows message boxes without blowing up the Unity/UWP threads.
  • I have a FileDialogHelper which shows a file dialog without blowing up the Unity/UWP threads.

Most of these are probably static classes but I feel that they are really providing services which may (or may not) have some configurable element to them and which other pieces of code just need to look up in a registry somewhere and make use of, thereby allowing them to be replaced at some point in the future.

As the MRTK V2 provides a form of service registry via the means of “extensions” to the toolkit, I thought it would make sense to try that out and see if I could refactor some code to work that way.

By way of example, I started with my MessageService class and extracted an interface from it, deriving that interface from IMixedRealityExtensionService;

using Microsoft.MixedReality.Toolkit;
using System;

namespace MulticastMessaging
{
    public interface IMessageService : IMixedRealityExtensionService
    {
        MessageRegistrar MessageRegistrar { get; set; }
        void Close();
        void Open();
        void Send<T>(T message, Action<bool> callback = null) where T : Message;
    }
}

and then I defined a profile class for my service with the sorts of properties that I might want to set on it;

using Microsoft.MixedReality.Toolkit;
using UnityEngine;

namespace MulticastMessaging
{

    [CreateAssetMenu(
        menuName = "Mixed Reality Toolkit/Message Service Profile",
        fileName = "MessageServiceProfile")]
    [MixedRealityServiceProfile(typeof(MessageService))]
    public class MessageServiceProfile : BaseMixedRealityProfile
    {
        [SerializeField]
        [Tooltip("The address to use for multicast messaging")]
        public string multicastAddress = "239.0.0.0";

        [SerializeField]
        [Tooltip("The port to use for multicast messaging")]
        public int multicastPort = 49152;
    }
}

and then implemented that on my MessageService class, deriving it from BaseExtensionService and marking it with a MixedRealityExtensionService attribute as you see below;

namespace MulticastMessaging
{
    using Microsoft.MixedReality.Toolkit;
    using Microsoft.MixedReality.Toolkit.Utilities;
    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Net;
    using System.Net.Sockets;

    [MixedRealityExtensionService(SupportedPlatforms.WindowsUniversal | SupportedPlatforms.WindowsEditor)]
    public class MessageService : BaseExtensionService, IMessageService
    {
        // Note: 239.0.0.0 is the start of the UDP multicast addresses reserved for
        // private use.
        // Note: 49152 is the result I get out of executing;
        //      netsh int ipv4 show dynamicport udp
        // on Windows 10.
        public MessageService(
            IMixedRealityServiceRegistrar registrar,
            string name,
            uint priority,
            BaseMixedRealityProfile profile) : base(registrar, name, priority, profile)
        {

        }
        MessageServiceProfile Profile => base.ConfigurationProfile as MessageServiceProfile;

        // ...the Open/Close/Send implementations are omitted here...
    }
}

Clearly, that’s not the whole code but note the use of the MixedRealityExtensionService attribute and also the reach into the base class to get the ConfigurationProfile and cast it to the concrete type of my actual profile.

With that in place, I can now use the editor to create one of those profiles;

image

and then I can add my new service to extensions of the toolkit;

image

and then change my code to grab hold of the instance via

MixedRealityToolkit.Instance.GetService<IMessageService>();

whenever I want to get hold of the instance of that service.
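For example (a hedged sketch – DeletedModelMessage here just stands in for one of the app’s Message types and isn’t necessarily its real name);

using Microsoft.MixedReality.Toolkit;
using MulticastMessaging;
using UnityEngine;

public class MessageServiceExample : MonoBehaviour
{
    void Start()
    {
        // Look the service up in the toolkit's registry rather than constructing it directly.
        var messageService = MixedRealityToolkit.Instance.GetService<IMessageService>();

        messageService.Open();

        messageService.Send(new DeletedModelMessage(),
            sent => Debug.Log($"Message sent ok: {sent}"));
    }
}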

In this branch, I only added two services this way – my networking provider and my messaging service – but in my V2Work branch I’ve made more of these services and plan to rework a few more pieces in this way;

image

The commit at this point is here.

Wrapping Up

I wanted to go around the loop again on this exercise partly to make my own notes around things that I have perhaps forgotten and partly in case there were some pieces that others might pick up on and share.

I’m not planning to take this V2WorkBlogPost branch any further or release anything from it because I’ve already done the port in my V2Work branch and I want to move that forward and, ultimately, merge back into master from there. That said, I did learn a few things by repeating the exercise, namely;

  1. I can do a better job of making speech work in the editor and at runtime.
  2. I should make more extension services for some of the other pieces of my app.
  3. I did a better job of leaving the MRTK V1 in the code base until I really no longer needed it whereas first time around I removed it too early and got in a bit of a mess.
  4. I realised that more of the app functionality needs to work in the editor and I can improve that but there’s still a way to go as I haven’t made attempts to have all of it work in the editor.

I hope that there was something useful in here for readers (if any get this far to the end of the post) and good luck in porting your own apps across to MRTK V2.

Rough Notes on Experiments with UWP APIs in the Unity Editor with C++/WinRT

This post is a bunch of rough notes around a discussion that I’ve been having with myself about working with UWP code in Unity when building mixed reality applications for HoloLens. To date, I’ve generally written the code which calls UWP APIs in .NET and followed the usual practice around it but, in recent times, I’ve seen folks implementing their UWP API calls in native code and so I wanted to experiment with that a little myself.

These notes are very rough so please apply a pinch of salt as I may well have got things wrong (it happens frequently) and I’m really just writing down some experiments rather than drawing a particular conclusion.

With those caveats in place…

Background – .NET and .NET Native

I remember coming to .NET in around 2000/2001.

At the time, I had been working as a C/C++ developer for around 10 years and I was deeply sceptical of .NET and the idea of C# as a new programming language and that I might end up running code that was Just-In-Time compiled.

That said, I was also coming off the back of 10 years of shipping code in C/C++ and the various problems around crashes, hangs, leaks, heap fragmentation, mismatched header files, etc. etc. etc. that afflicted the productivity of C/C++.

So, I was sceptical on the one hand but open to new ideas on the other and, over time, my C++ (more than my C) became rusty as I transitioned to C# and the CLR and its CIL.

There’s a bunch of advantages that come from having binaries made up of a ‘Common Intermediate Language’ underpinned by the CLR rather than native code. Off the top of my head, that might include things like;

  • Re-use of components and tooling across programming languages.
  • CIL was typically a lot smaller than native code representation of the same functionality.
  • One binary could support (and potentially optimize for) any end processor architecture by being compiled just-in-time on the device in question rather than ahead-of-time which requires per-architecture binaries and potentially per-processor variant optimisations.

and there are, no doubt, many more.

Like many things, there are also downsides, one being the potential impact on start-up times and, potentially, memory usage (and the ability to share code across processes) as CIL code is loaded for the first time and the methods are JITted into a specific process’ memory in order to have runnable code on the target machine.

Consequently, for the longest time there have been a number of attempts to overcome that JITting overhead by doing ahead-of-time compilation, including the fairly early NGEN tool (which brought with it some of its own challenges) and, ultimately, the development of the .NET Native set of technologies.

.NET Native and the UWP Developer

.NET Native had a big impact on developers targeting the Universal Windows Platform (UWP) because all applications delivered from the Windows Store are ultimately built with the .NET Native tool-chain and so developers need to build and test with that tool-chain before submitting their app to the Store.

Developers who had got used to the speed with which a .NET application could be built, run and debugged inside of Visual Studio soon learned that building with .NET Native could introduce a longer build time and also that there were rare occasions where the native code didn’t match the .NET code, so one tool-chain could have bugs that another did not exhibit. That could also happen because of the .NET Native compiler’s feature of removing ‘unused’ code/metadata, which can have an impact on code – e.g. where reflection is involved.

However, here in 2019 those issues are few and far between & .NET Native is just “accepted” as the tool-chain that’s ultimately used to build a developer’s app when it goes to the Windows Store.

I don’t think that developers’ workload has been affected hugely because I suspect that most UWP developers probably still follow the Visual Studio project structure: using the Debug configuration (the regular, JITted .NET compiler) for their builds during development and reserving the Release configuration (the .NET Native compiler) for their final testing. Either way, your code is being compiled by a Microsoft compiler to CIL and by a Microsoft compiler from CIL to x86/x64/ARM.

It’s worth remembering that whether you write C# code or C++ code the debugger is always doing a nice piece of work to translate between the actual code that runs on the processor and the source that you (or someone else) wrote and want to step through getting stack frames, variable evaluation etc. The compiler/linker/debugger work together to make sure that via symbols (or program databases (PDBs)) this process works so seamlessly that, at times, it’s easy to forget how complicated a process it is and we take it for granted across both ‘regular .NET’ and ‘.NET Native’.

So, this workflow is well baked and understood and, personally, I’d got pretty used to it as a .NET/UWP developer and it didn’t really change whether developing for PC or other devices like HoloLens with the possible exception that deployment/debugging is naturally going to take a little more time on a mobile-powered device than on a huge PC.

Unity and the UWP

But then I came to Unity 😊

In Unity, things initially seem the same for a UWP developer. You write your .NET code in the editor, the editor compiles it “on the fly” as you save those code changes and then you can run and debug that code in the editor.

As an aside, the fact that you can attach the .NET debugger to the Unity Editor is (to me) always technically impressive and a huge productivity gain.

When you want to build and deploy, you press the right keystrokes and Unity generates a C# project for you with/without all your C# code in it (based on the “C# Projects” setting) and you are then back into the regular world of UWP development. You have some C# code, you have your debugger and you can build debug (.NET) or release (.NET Native) just like any other UWP app written with .NET.

Unity and .NET Scripting/IL2CPP

That’s true if you’re using the “.NET Scripting backend” in Unity. However, that backend is deprecated as stated in the article that I just linked to and so, really, a modern developer should be using the IL2CPP backend.

That deprecation has implications. For example, if you want to move to using types from .NET Standard 2.0 in your app then you’ll find that Unity’s support for .NET Standard 2.0 lives only in the IL2CPP backend and hasn’t been implemented in the .NET Scripting backend (because it’s deprecated).

2018.2.16f1, UWP, .NET Scripting Backend, .NET Standard 2.0 Build Errors

With the IL2CPP backend, life in the editor continues as before. Unity builds your .NET code, you attach your .NET debugger and you can step through your code. Again, very productive.

However, life outside of the editor changes in that any code compiled to CIL (i.e. scripts plus dependencies) is translated into C++ code by the compiler. The process of how this works is documented here and I think it’s well worth 5m of your time to read through that documentation if you haven’t already.

This has an impact on build times although I’ve found that if you carefully follow the recommendations that Unity makes on this here then you can get some cycles back; it’s still a longer process than it was previously, though.

Naturally, what now drops out when Unity builds is not a C#/.NET Visual Studio project but, instead, a C++ Visual Studio project. You can then choose the processor architecture and debug/release etc. but you’re compiling C++ code into native code and that C++ represents all the things you wrote along with translations of lots of things that you didn’t write (e.g. lists, dictionaries, etc. etc.). Those compilation times, again, can get a bit long and you get used to watching the C++ compiler churn its way through implementations of things like generics, synchronisation primitives, etc.

Just as with .NET Native, Unity’s C#->C++ translation has the advantage of stripping out things which aren’t used which can impact technologies like reflection and, just like .NET Native, Unity has a way of dealing with that as detailed here.

When it comes to debugging that code, you have two choices. You can either;

  • Debug it at the C# level.
  • Debug the generated C++.
  • Ok, ok, if you’re hardcore you can just debug the assembly but I’ll assume you don’t want to be doing this all the time (although I’ll admit that I did single step some pieces while trying to fix things for this post but it’s more by necessity than choice).

C# debugging involves setting the “Development Build” and “Script Debugging” options as described here; you essentially run up the app on the target device with this debugging support switched on and then ask the Unity debugger to attach itself to that app, similarly to the way in which you ask the Unity debugger to attach to the editor. Because this is done over the network, you also have to ensure that you set certain capabilities in your UWP app manifest (InternetClient, InternetClientServer, PrivateNetworkClientServer).

For the UWP/HoloLens developer, this isn’t without its challenges at the time of writing and I mentioned some of those challenges in this post;

A Simple glTF Viewer for HoloLens

and my friend Joost just wrote a long post about how to get this working;

Debugging C# code with Unity IL2CPP projects running on HoloLens or immersive headsets

and that includes screenshots and provides a great guide. I certainly struggled to get this working when I tried it the first time around, as you can see from the forum thread I started below;

Unity 2018.2.16f1, UWP, IL2CPP, HoloLens RS5 and Managed Debugging Problems.

so a guide is very timely and welcome.

As was the case with .NET Native, of course it’s possible that the code generated by IL2CPP differs in its behavior from the .NET code that now runs inside the editor and so it’s possible to get into “IL2CPP bugs” which can seriously impact your productivity.

C# debugging kind of feels a little “weird” at this point as you stare into the internals of the sausage machine. The process makes it very obvious that what you are debugging is code compiled from a C++ project but you point a debugger at it and step through as though it was a direct compilation of your C# code. It just feels a little odd to me although I think it’s mainly perception as I have long since got over the same feeling around .NET Native and it’s a very similar situation.

Clearly, Unity is doing the right thing in making the symbols line up here, which is clever in itself, but I feel like there are visible signs of the work going on when it comes to the performance of debugging and also some of the capabilities (e.g. variable evaluation etc.). However, it works and that’s the main thing 😊

In these situations I’ve often found myself with 2 instances of Visual Studio – one debugging the C# code using the Unity debugger support while the other is attached as a native debugger to see if I can catch exceptions etc. in the real code. It’s a change to the workflow but it’s do-able.

IL2CPP and the UWP Developer

That said, there’s still a bit of an elephant in the room here. For the UWP developer there’s an additional challenge to throw into this mix: the Unity editor always uses Mono, which means that it doesn’t understand calls to the UWP API set (or WinRT APIs if you prefer) as described here.

This means that it’s likely that a UWP developer (making UWP API calls) takes more pain here than the average Unity developer because, to execute the “UWP specific” parts of their code, they need to set aside the editor, hit build to turn .NET into C++, hit build in Visual Studio to compile that C++ code and then possibly deploy to a device before being able to debug (either the generated C++ or the original .NET code) the pieces that call into the UWP.

The usual pattern for working with UWP code is detailed on this doc page and involves taking code like that below;
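The original snippet here was pasted in as an image so, as a stand-in, here’s a minimal sketch of the sort of thing it showed – a script that calls straight into a UWP API (MessageDialog, as per the note below);

using UnityEngine;
using Windows.UI.Popups;

public class DialogScript : MonoBehaviour
{
    void Start()
    {
        // Calling a UWP API directly from a script. This is fine when built for
        // the UWP player but the editor has no idea what Windows.UI.Popups is.
        var dialog = new MessageDialog("Hello from a UWP API");
        dialog.ShowAsync();
    }
}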

which causes the Unity editor some serious concern because it doesn’t understand Windows.* namespaces;

And so we have to take steps to keep this code away from the editor;
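Again, the original was an image but the usual shape of it is to wrap the UWP pieces in conditional compilation (ENABLE_WINMD_SUPPORT is the define that also shows up in my later code) – something like this sketch;

using UnityEngine;
#if ENABLE_WINMD_SUPPORT
using Windows.UI.Popups;
#endif // ENABLE_WINMD_SUPPORT

public class DialogScript : MonoBehaviour
{
    void Start()
    {
#if ENABLE_WINMD_SUPPORT
        // Only compiled when building for the UWP player, so the editor
        // (running on Mono) never sees the Windows.* types.
        var dialog = new MessageDialog("Hello from a UWP API");
        dialog.ShowAsync();
#endif // ENABLE_WINMD_SUPPORT
    }
}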

And then this will “work” both in the editor and if we compile it out for UWP through the 2-stage compilation process. Note that the use of MessageDialog here is just an example and probably not a great one because there’s no doubt some built-in support in Unity for displaying a dialog without having to resort to a UWP API.

Calling UWP APIs from the Editor

I’ve been thinking about this situation a lot lately and, again, with sympathy for the level of complexity of what’s going on inside that Unity editor – it does some amazing things in making all of this work cross-platform.

I’d assume that trying to bring WinRT/UWP code directly into that editor environment is a “tall order” and I think it stems from the editor running on Mono and there not being underlying support there for COM interop, although I could be wrong. Either way, part of me understands why the editor can’t run my UWP code.

On the other hand, the UWP APIs aren’t .NET APIs. They are native code APIs in Windows itself and the Unity editor can happily load up native plugins and execute custom native code, so there’s a part of me that wonders whether the editor couldn’t get closer to letting me call UWP APIs.

When I first came to look at this a few years ago, I figured that I might be able to “work around it” by trying to “hide” my UWP code inside some .NET assembly and then try to add that assembly to Unity as a plugin but the docs say that managed plugins can’t consume Windows Runtime APIs.

As far as I know, you can’t have a plugin which is;

  • a WinRT component implemented in .NET or in C++.
  • a .NET component that references WinRT APIs or components.

But you can have a native plugin which makes calls out to WinRT APIs so what does it look like to go down that road?

Unity calling Native code calling UWP APIs

I wondered whether this might be a viable option for a .NET developer given the (fairly) recent arrival of C++/WinRT which seems to make programming the UWP APIs much more accessible than it was in the earlier worlds of WRL and/or C++/CX.

To experiment with that, I continued my earlier example and made a new project in Visual C++ as a plain old “Windows Desktop DLL”.

NB: Much later in this post, I will regret thinking that a “plain old Windows Desktop DLL” is all I’m going to need here but, for a while, I thought I would get away with it.

To that project, I can add includes for C++/WinRT to my stdafx.h as described here;


And I can alter my link options to link with WindowsApp.lib;

And then I can maybe write a little function that’s exported from my DLL;

And the implementation there is C++/WinRT – note that I just use a Uri by declaring one rather than leaping through some weird ceremony to make use of it.

If I drag the DLL that I’ve built into Unity as a plugin then my hope is that I can tell Unity to use the 64-bit version purely for the editor and the 32-bit version purely for the UWP player;

I can then P/Invoke from my Unity script into that exported DLL function as below;
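The original call was a screenshot; a sketch of the shape of it is below. The export name and signature here are my guesses (the original was built around Uri) although the NativePlugin.dll name does match the projects described later in the post;

using System.Runtime.InteropServices;
using UnityEngine;

public class UriTestScript : MonoBehaviour
{
    // Hypothetical export - something in the DLL that news up a Windows.Foundation.Uri
    // via C++/WinRT and hands a value back. The calling convention depends on how
    // the export is actually declared.
    [DllImport("NativePlugin", CallingConvention = CallingConvention.Cdecl)]
    static extern int GetPortFromUri([MarshalAs(UnmanagedType.LPWStr)] string uri);

    void Start()
    {
        Debug.Log("Port was " + GetPortFromUri("http://www.example.com:8080/"));
    }
}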

And then I can attach my 2 debuggers to Unity and debug both the managed code and the native code and I’m making calls into the UWP from the editor! Life is good & I don’t have to go through a long build cycle.

Here’s my managed debugger attached to the Unity editor;

And here’s the call being returned from the native debugger also attached to the Unity editor;

and it’s all good.

Now, if only life were quite so simple 😊

Can I do that for every UWP API?

It doesn’t take much to break this. If I go back, for example, to my original example of displaying a message box then it’s not too hard to add an additional header file;

And then I can write some exported function that uses MessageDialog;

and I can import it and call it from a script in Unity;
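Something like this hedged sketch – the export name is my own stand-in for whatever was in the original screenshot;

using System.Runtime.InteropServices;
using UnityEngine;

public class DialogInteropScript : MonoBehaviour
{
    // Hypothetical export wrapping Windows.UI.Popups.MessageDialog inside the native DLL.
    [DllImport("NativePlugin", CallingConvention = CallingConvention.Cdecl)]
    static extern void ShowDialog([MarshalAs(UnmanagedType.LPWStr)] string message);

    void Start()
    {
        ShowDialog("Hello from the editor?");
    }
}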

but it doesn’t work. I get a nasty exception and I think that’s because I chose MessageDialog as my API to try out; MessageDialog relies on a CoreWindow and I don’t think I have one in the Unity editor. Choosing a windowing API was probably a bad idea but it’s a good illustration that I’m not likely to magically just get everything working here.

There’s commentary in this blog post around challenges with APIs that depend on a CoreWindow.

What about Package Identity?

What about some other APIs? How about this one? If I add the include for Windows.Storage.h;

And then add an exported function (I added a DuplicateString function to take that pain away) to get the name of the local application data folder;

and then interop to it from Unity script;
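The interop snippet was another bitmap; a sketch of its shape is below. The export name is mine and I’m assuming the DuplicateString helper hands back a wide-character string that the marshaller can read (how it should then be freed depends on how that helper allocates);

using System;
using System.Runtime.InteropServices;
using UnityEngine;

public class StorageInteropScript : MonoBehaviour
{
    // Hypothetical export returning ApplicationData.Current.LocalFolder.Path
    // as a duplicated, wide-character string.
    [DllImport("NativePlugin", CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr GetLocalApplicationDataFolderPath();

    void Start()
    {
        var path = Marshal.PtrToStringUni(GetLocalApplicationDataFolderPath());

        Debug.Log("Local folder is " + path);
    }
}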

and then this blows up;

Now, this didn’t exactly surprise me. In fact, the whole reason for calling that API was to cause this problem as I knew it was coming: part of that “UWP context” includes having a package identity and Unity (as a desktop app) doesn’t have one, so it’s not really fair to ask for the app data folder when the application doesn’t have one.

There’s a docs page here about this notion of APIs requiring package identity.

Can the Unity editor have a package identity?

I wondered whether there might be some way to give Unity an identity such that these API calls might work in the editor. I could think of 2 ways;

  1. Package Unity as a UWP application using the desktop bridge technologies.
  2. Somehow ‘fake’ an identity such that from the perspective of the UWP APIs the Unity editor seems to have a package identity.

I didn’t really want to attempt to package up Unity itself and so I thought I’d try (2). I ended up having to ask around and came up with a form of a hack, although I don’t know how far I can go with it.

Via the Invoke-CommandInDesktopPackage PowerShell command it seems it’s possible to execute an application in the “context” of another desktop bridge application.

So, I went ahead and made a new, blank WPF project and then I used the Visual Studio Packaging Project to package it as a UWP application using the bridge and that means that it had “FullTrust” as a capability and I also gave it “broadFileSystemAccess” (just in case).

I built an app package from this and installed it onto my system and then I experimented with running Unity within that app’s context as seen below – Unity here has been invoked inside the package identity of my fake WPF desktop bridge app;

I don’t really know to what extent this might break Unity but, so far, it seems to survive and work ok, although I haven’t exactly pushed it.

With Unity running in this UWP context, does my code run any better than before?

Well, firstly, I noticed that Unity no longer seemed to like loading my interop DLL. I tried to narrow this down and haven’t figured it out yet but I found that;

  1. First time, Unity wouldn’t find my interop DLL.
  2. I changed the name to something invalid, forcing Unity to look for that and fail.
  3. I changed the name back to the original name, Unity found it.

I’m unsure of the exact thing that’s going wrong there so I need to return to that, but I can still get Unity to load my DLL; I just have to play with the script a little first. But, yes, with a little bit of convincing I can get Unity to make that call;


And what didn’t work without an identity now works when I have one so that’s nice!

The next, natural thing to do might be to read/write some data from/to a file. I thought I’d try a read and to do that I used the co_await syntax to do the async pieces and then used the .get() method to ultimately make it a synchronous process as I wasn’t quite ready to think about calling back across the PInvoke boundary.

And that causes a problem depending on how you invoke it. If I invoke it as below;


Then I get an assertion from somewhere in the C++/WinRT headers telling me (I think) that I have called the get() method on an STA thread. I probably shouldn’t call this method directly from my own thread anyway because the way in which I have written it (with the .get() call) blocks the calling thread so, regardless of STA/MTA, it’s perhaps a bad idea.

However, if I ignore that assertion, the call does seem to actually work and I get the contents of the file back into the Unity editor as below;

But I suspect that I’m not really meant to ignore the assertion and so I can switch the call to something like;

and the assertion goes away and I can read the file contents 😊
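I haven’t reproduced that change here (it was another bitmap) but, to give a flavour, one way of keeping a blocking .get() call off the editor’s main (STA) thread is to marshal the P/Invoke onto a thread-pool (MTA) thread from the C# side. This is an illustrative sketch with made-up names rather than necessarily what my original code did;

using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using UnityEngine;

public class FileInteropScript : MonoBehaviour
{
    // Hypothetical export that blocks (via .get()) until the file read completes
    // and then returns the file contents as a duplicated wide string.
    [DllImport("NativePlugin", CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr ReadTextFileFromLocalFolder([MarshalAs(UnmanagedType.LPWStr)] string fileName);

    async void Start()
    {
        // Thread-pool threads are MTA so the blocking wait inside the DLL doesn't
        // trip the C++/WinRT assertion about waiting on an STA thread.
        var contents = await Task.Run(
            () => Marshal.PtrToStringUni(ReadTextFileFromLocalFolder("test.txt")));

        Debug.Log(contents);
    }
}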

It’s worth stating at this point that I’ve not even thought about how I might try to actually pass some notion of an async operation across the PInvoke boundary here; that needs more thought on my part.

Ok, Call some more APIs…

So far, I’ve called dialog.show() and file.read() so I felt like I should try a longer piece of code with a few more API calls in it.

I’ve written a few pieces of code in the past which try to do face detection on frames coming from the camera and I wondered whether I might be able to reproduce that here – maybe write a method which runs until it detects a face in the frames coming from the camera?

I scribbled out some rough code in my DLL;

// Sorry, this shouldn't really be one massive function...
IAsyncOperation<int> InternalFindFaceInDefaultCameraAsync()
{
	auto facesFound(0);

	auto devices = co_await DeviceInformation::FindAllAsync(DeviceClass::VideoCapture);

	if (devices.Size())
	{
		DeviceInformation deviceInfo(nullptr);

		// We could do better here around choosing a device, we just take
		// the front one or the first one.
		for (auto const& device : devices)
		{
			if ((device.EnclosureLocation().Panel() == Panel::Front))
			{
				deviceInfo = device;
				break;
			}
		}
		if ((deviceInfo == nullptr) && devices.Size())
		{
			deviceInfo = *devices.First();
		}
		if (deviceInfo != nullptr)
		{
			MediaCaptureInitializationSettings initSettings;
			initSettings.StreamingCaptureMode(StreamingCaptureMode::Video);
			initSettings.VideoDeviceId(deviceInfo.Id());
			initSettings.MemoryPreference(MediaCaptureMemoryPreference::Cpu);

			MediaCapture capture;
			co_await capture.InitializeAsync(initSettings);

			auto faceDetector = co_await FaceDetector::CreateAsync();
			auto faceDetectorFormat = FaceDetector::GetSupportedBitmapPixelFormats().GetAt(0);

			// We could do better here, we will just take the first frame source and
			// we assume that there will be at least one. 
			auto frameSource = (*capture.FrameSources().First()).Value();
			auto frameReader = co_await capture.CreateFrameReaderAsync(frameSource);

			winrt::slim_mutex mutex;

			handle signal{ CreateEvent(nullptr, true, false, nullptr) };
			auto realSignal = signal.get();

			frameReader.FrameArrived(
				[&mutex, faceDetector, &facesFound, faceDetectorFormat, realSignal]
			(IMediaFrameReader reader, MediaFrameArrivedEventArgs args) -> IAsyncAction
			{
				// Not sure I need this?
				if (mutex.try_lock())
				{
					auto frame = reader.TryAcquireLatestFrame();

					if (frame != nullptr)
					{
						auto bitmap = frame.VideoMediaFrame().SoftwareBitmap();

						if (bitmap != nullptr)
						{
							if (!FaceDetector::IsBitmapPixelFormatSupported(bitmap.BitmapPixelFormat()))
							{
								bitmap = SoftwareBitmap::Convert(bitmap, faceDetectorFormat);
							}
							auto faceResults = co_await faceDetector.DetectFacesAsync(bitmap);

							if (faceResults.Size())
							{
								// We are done, we found a face.
								facesFound = faceResults.Size();
								SetEvent(realSignal);
							}
						}
					}
					mutex.unlock();
				}
			}
			);
			co_await frameReader.StartAsync();

			co_await resume_on_signal(signal.get());

			// Q - do I need to remove the event handler or will the destructor do the
			// right thing for me?
			co_await frameReader.StopAsync();
		}
	}
	co_return facesFound;
}

That code is very rough and ready but with an export from the DLL that looks like this;

	__declspec(dllexport) int FindFaceInDefaultCamera()
	{
		int faceCount = InternalFindFaceInDefaultCameraAsync().get();

		return(faceCount);
	}

then I found that I can call it from the editor and, sure enough, the camera lights up on the machine and the code returns that it has detected my face from the camera so that’s using a few UWP classes together to produce a result.
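For completeness, the Unity side of that is just another P/Invoke and might look something like the sketch below (the DLL and function names are as above; the MonoBehaviour wrapping and calling convention are my own);

using System.Runtime.InteropServices;
using UnityEngine;

public class FaceTestScript : MonoBehaviour
{
    [DllImport("NativePlugin", CallingConvention = CallingConvention.Cdecl)]
    static extern int FindFaceInDefaultCamera();

    void Start()
    {
        // NB: this blocks the calling thread until the native code has seen a face.
        var faceCount = FindFaceInDefaultCamera();

        Debug.Log("Detected " + faceCount + " face(s)");
    }
}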

So, I can call into basic APIs (e.g. Uri), I can call into APIs that require package identity (e.g. StorageFile) and I can put together slightly more complex scenarios involving cameras, bitmaps, face detection etc.

It feels like I might largely be able to take this approach to writing some of my UWP code in C++/WinRT, have the same code run both inside of the editor and on the device, debug it in both places and not have to sit through longer build times while working it up in the editor.

Back to the device…

I spent a few hours in the Unity editor playing around to get to this point in the post and then I went, built and deployed my code to an actual device and it did not work. Heartbreak 😉

I was getting failures to load my DLL on the device and I quickly put them down to my DLL having dependencies on VC runtime DLLs that didn’t seem to be present. I spent a little bit of time doing a blow-by-blow comparison on the build settings of a ‘UWP DLL’ versus a ‘Windows DLL’ but, in the end, decided I could just build my code once in the context of each.

So, I changed my C++ project such that it contained the original “Windows Desktop DLL” along with a “UWP DLL” and the source code is shared between the two as below;

With that in place, I use the 64-bit “Windows Desktop DLL” in the editor and the 32-bit “UWP DLL” on the device (the ‘player’) and that seems to sort things out for me. Note that both projects build a DLL named NativePlugin.dll.

That said, I’d really wanted to avoid this step and thought I was going to get away with it but I fell at the last hurdle. I’d like to revisit and see if I can take away the ‘double build’; someone will no doubt tell me what’s going on there.

Wrapping Up

As I said at the start of the post, this is just some rough notes but in making calls out to the few APIs that I’ve tried here I’m left feeling that the next time I have to write some Unity/UWP specific code I might try it out in C++/WinRT first with this PInvoke method that I’ve done here & see how it shapes up as the productivity gain of being able to press ‘Play’ in the editor is huge. Naturally, if that then leads to problems that I haven’t encountered in this post then I can flip back, translate the code back to C# and use the regular “conditional compilation” mechanism.

Code

I’m conscious that I pasted quite a lot of code into this post as bitmaps and that’s not very helpful so I just packaged up my projects onto github over here.

Inside of the code, 2 of the scenarios from this post are included – the code for running facial detection on frames from the camera and the code which writes a file into the UWP app’s local data folder.

I’ve tried that code both in the Unity editor and on a HoloLens device & it seems to work fine in both places.

All mistakes are mine, feel free to feed back and tell me what I’ve done wrong! 🙂

Baby Steps with the Azure Spatial Anchors Service

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens or Azure Mixed Reality other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

One of the many, many strands of the exciting, recent announcements around Mixed Reality (see the video here) was the announcement of a set of Azure Mixed Reality Services.

You can find the home page for these services on the web here and they encompass;

  • Azure Spatial Anchors
  • Azure Remote Rendering

Both of these are, to my mind, vital foundational services that Mixed Reality application builders have needed for quite some time so it’s great to see them surface at Azure.

At the time of writing, the Azure Remote Rendering service is in a private preview so I’m not looking at that right now but the Azure Spatial Anchors service is in a public preview and I wanted to experiment with it a little and thought I would write up some notes here as I went along.

Before I do that though…

Stop – Read the Official Docs

There’s nothing that I’m going to say in this post that isn’t covered by the official docs so I’d recommend that you read those before reading anything here and I’m providing some pointers below;

  1. Check out the overview page here if you’re not familiar with spatial anchors.
  2. Have a look at the Quick Start here to see how you can quickly get started in creating a service & making use of it from Unity.
  3. Check out the samples here so that you can quickly get up and running rather than fumbling through adding library references etc. (note that the Quick Start will lead you to the samples anyway).

With that said, here’s some rough notes that I made while getting going with the Azure Spatial Anchors service from scratch.

Please keep in mind that this service is new to me so I’m really writing up my experiments & I may well make some mistakes.

A Spatial WHAT?!

If you’re not coming from a HoloLens background or from some other type of device background where you’re doing MR/AR and creating ‘spatial anchors’ then you might wonder what these things are.

To my mind, it’s a simple concept that’s no doubt fiendishly difficult to implement. Here’s my best attempt;

A spatial anchor is a BLOB of data providing a durable representation of a 3D point and orientation in a space.

That’s how I think of it. You might have other definitions. These BLOBs of data usually involve recognising ‘feature points’ that are captured from various camera frames taken from different poses in a space.

If you’re interested in more of the mechanics of this, I found this video from Apple’s 2018 WWDC conference to be one of the better references that I’ve seen;

Understanding ARKit Tracking and Detection

So, a ‘spatial anchor’ is a BLOB of data that allows a device to capture a 3D point and orientation in space & potentially to identify that point again in the future (often known as ‘re-localising the anchor’). It’s key to note that devices and spaces aren’t perfect and so it’s always possible that a stored anchor can’t be brought back to life at some future date.

I find it useful sometimes to make a human analogy around spatial anchors. I can easily make a ‘spatial anchor’ to give to a human being and it might contain very imprecise notions of positioning in space which can nonetheless yield accurate results.

As an example, I could give this description of a ‘spatial anchor’ to someone;

Place this bottle 2cm in on each side from the corner of the red table which is nearest to the window. Lay the bottle down pointing away from the window.

You can imagine being able to walk into a room with a red table & a window and position the bottle fairly accurately based on that.

You can also imagine that this might work in many rooms with windows and red tables & that humans might even adapt and put the bottle onto a purple table if there wasn’t a red one.

Equally, you can imagine finding yourself in a room with no table and saying “sorry, I can’t figure this out”.

I think it’s worth saying that having this set of ‘instructions’ does not tell the person how to find the room nor whether they are in the right room, that is outside of the scope and the same is true for spatial anchors – you have to be vaguely in the right place to start with or use some other mechanism (e.g. GPS, beacons, markers, etc) to get to that place before trying to re-localise the anchor.

Why Anchor?

Having been involved in building applications for HoloLens for a little while now, I’ve become very used to the ideas of applying anchors and, to my mind, there are 3 main reasons why you would apply an anchor to a point/object and the docs are very good on this;

  • For stability of a hologram or a group of holograms that are positioned near to an anchor.
    • This is essentially about preserving the relationship between a hologram and a real point in the world as the device alters its impression of the structure of the space around it. As humans, we expect a hologram placed on the edge of a table to stay on the edge of that table even if a device is constantly refining its idea of the mesh that makes up that table and the rest of the space around it.
  • For persistence.
    • One of the magical aspects of mixed reality enabled by spatial anchors is the ability for a device to remember the positions of holograms in a space. The HoloLens can put the hologram back on the edge of the table potentially weeks or months after it was originally placed there.
  • For sharing.
    • The second magical aspect of mixed reality enabled by spatial anchors is the ability for a device to read a spatial anchor created by another device in a space and thereby construct a transform from the co-ordinate system of the first device to that of the second. This forms the basis for those magical shared holographic experiences.

Can’t I Already Anchor?

At this point, it’s key to note that for HoloLens developers the notion of ‘spatial anchors’ isn’t new. The platform has supported anchors since day 1 and they work really well.

Specifically, if you’re working in Unity then you can fairly easily do the following;

  • Add the WorldAnchor component to your GameObject in order to apply a spatial anchor to that component.
    • It’s fairly common to use an empty GameObject which then acts as a parent to a number of other game objects.
    • The isLocated property is fairly key here as is the OnTrackingChanged event and note also that there is an opaque form of reference to the underlying BLOB via GetNativeSpatialAnchorPtr and SetNativeSpatialAnchorPtr.
  • Use the WorldAnchorStore class in order to maintain a persistent set of anchors on a device indexed by a simple string identifier (there’s a small sketch of this below, after these lists).
  • Use the WorldAnchorTransferBatch class in order to;
    • Export the blob representing the anchor
    • Import a blob representing an anchor that has previously been exported

With this set of tools you can quite happily build HoloLens applications that;

  • Anchor holograms for stability.
  • Persist anchors over time such that holograms can be recreated in their original locations.
  • Share anchors between devices such that they can agree on a common co-ordinate system and present shared holographic experiences.

and, of course, you can do this using whatever transfer or networking techniques you like including, naturally, passing these anchors through the cloud via means such as Azure Blob Storage or ASP.NET SignalR or whatever you want. It’s all up for grabs and has been for the past 3 years or so.
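As a small, hedged sketch of the WorldAnchor and WorldAnchorStore pieces from the list above, anchoring a GameObject and persisting/reloading it looks roughly like this – the class and identifiers are mine, the APIs are the UnityEngine.XR.WSA ones;

using UnityEngine;
using UnityEngine.XR.WSA;
using UnityEngine.XR.WSA.Persistence;

public class AnchorPersistenceExample : MonoBehaviour
{
    WorldAnchorStore store;

    void Start()
    {
        // The store is handed back asynchronously via a callback.
        WorldAnchorStore.GetAsync(s => this.store = s);
    }

    public void AnchorAndPersist(GameObject objectToAnchor, string anchorId)
    {
        // Anchor for stability...
        var anchor = objectToAnchor.AddComponent<WorldAnchor>();

        // ...and persist it so that it can be re-created on the next run.
        if ((this.store != null) && !this.store.Save(anchorId, anchor))
        {
            Debug.Log("Failed to save anchor " + anchorId);
        }
    }

    public void Restore(GameObject objectToAnchor, string anchorId)
    {
        // Re-attach a previously persisted anchor to a GameObject.
        this.store?.Load(anchorId, objectToAnchor);
    }
}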

Why A New Spatial Anchor Service?

With all that said, why would you look to the new Azure Spatial Anchor service if you already have the ability to create anchors and push them through the cloud? For me, I think there are at least 3 things;

  1. The Azure Spatial Anchor service is already built and you can get an instance with a few clicks of the mouse.
    1. You don’t have to go roll your own service and wonder about all the “abilities” of scalability, reliability, availability, authentication, authorisation, logging, monitoring, etc.
    2. There’s already a set of client-side libraries to make this easy to use in your environment.
  2. The Azure Spatial Anchor service/SDK gives you x-platform capabilities for anchors.
    1. The Azure Spatial Anchor service gives you the ability to transfer spatial anchors between applications running on HoloLens, ARKit devices and ARCore devices.
  3. The Azure Spatial Anchor service lets you define metadata with your anchors.
    1. The SDK supports the notion of ‘nearby’ anchors – the SDK lets you capture a group of anchors that are located physically near to each other & then query in the future to find those anchors again.
    2. The SDK also supports adding property sets to anchors to use for your own purposes.

Point #2 above is perhaps the most technically exciting feature here – i.e. I’ve never before seen anchors shared across HoloLens, iOS and Android devices so this opens up new x-device scenarios for developers.

That said, point #1 shouldn’t be underestimated – having a service that’s already ready to run is usually a lot better than trying to roll your own.

So, how do you go about using the service? Having checked out the samples, I then wanted to do a walkthrough on my own and that’s what follows here but keep a couple of things in mind;

  • I’m experimenting here, I can get things wrong 🙂
  • The service is in preview.
  • I’m going to take a HoloLens/Unity centric approach as that’s the device that I have to hand.
  • There are going to be places where I’ll overlap with the Quick Start and I’ll just refer to it at that point.

Using the Service Step 1 – Creating an Instance of the Service

Getting to the point where you have a service up and running using (e.g.) the Azure Portal is pretty easy.

I just followed this Quick Start step labelled “Create a Spatial Anchors Resource” and I had my service visible inside the portal inside of 2-3 minutes.

Using the Service Step 2 – Making a Blank Project in Unity

Once I had a service up and running, I wanted to be able to get to it from Unity and so I went and made a blank project suitable for holographic development.

I’m using Unity 2018.3.2f1 at the time of writing (there are newer 2018.3 versions).

I’ve gone through the basics of setting up a project for HoloLens development many times on this blog site before so I won’t cover them here but if you’re new to this then there’s a great reference over here that will walk you through getting the camera, build settings, project settings etc. all ok for HoloLens development.

Using the Service Step 3 – Getting the Unity SDK

Ok, this is the first point at which I got stuck. When I click on this page on the docs site;

[screenshot of the docs page]

then the link to the SDK takes me to the detailed doc pages but it doesn’t seem to tell me where I get the actual SDK from – I was thinking of maybe getting a Unity package or similar but I’ve not found that link yet.

This caused me to unpick the sample a little and I learned a few things by doing that. In the official Unity sample you’ll see that the plugins folder (for HoloLens) contains these pieces;

[screenshot of the sample’s Plugins folder]

and if you examine the post-build step here in this script you’ll see that there’s a function which essentially adds the nuget package Microsoft.Azure.SpatialAnchors.WinCPP into the project when it’s built;

[screenshot of the post-build script adding the NuGet package]

and you can see that the script can cope with .NET projects and C++ projects (for IL2CPP) although I’d flag a note in the readme right now which suggests that this doesn’t work for IL2CPP anyway today;

### Known issues for HoloLens

For the il2cpp scripting backend, see this [issue]( https://forum.unity.com/threads/httpclient.460748/ ).

The short answer to the workaround is to:

1. First make a mcs.rsp with the single line `-r:System.Net.Http.dll`. Place this file in the root of your assets folder.
2. Copy the `System.net.http.dll` from `<unityInstallDir>\Editor\Data\MonoBleedingEdge\lib\mono\4.5\System.net.http.dll` into your assets folder.

There is an additional issue on the il2cpp scripting backend case that renders the library unusable in this release.

so please keep that in mind given that IL2CPP is the new default backend for these applications.

I haven’t poked into the iOS/Android build steps at the time of writing so can’t quite say what happens there just yet.

This all means that when I build from Unity I end up with a project which includes a reference to Microsoft.Azure.SpatialAnchors.WinCPP as a Nuget package as below (this is taken from a .NET backend project);

[screenshot of the generated project referencing the Microsoft.Azure.SpatialAnchors.WinCPP NuGet package]

so, what’s in that thing? Is it a WinRT component?

I don’t think it is. I had to go and visit the actual Nuget package to try and figure it out but when I took a look I couldn’t find any .winmd file or similar. All I found in that package was a .DLL;

[screenshot of the NuGet package contents showing a single .DLL]

and as far as I can tell this is just a flat DLL with a bunch of exported flat functions like these;

[screenshot of the DLL’s exported flat functions]

I can only guess but I suspect then that the SDK is built in C/C++ so as to be portable across iOS, Android & UWP/Unity and then packaged up in slightly different ways to hit those different target environments.

Within Unity, this is made more palatable by having a bridge script which is included in the sample project called AzureSpatialAnchorsBridge.cs;

[screenshot of AzureSpatialAnchorsBridge.cs in the sample project]

which then does a bunch of PInvokes into that flat DLL like this one;

[screenshot of one of the PInvoke declarations in the bridge script]

so that’s how that seems to work.

If I then want to take this across to a new project, it feels like I need to package up a few things and I tried to package;

[screenshot of the pieces selected for export into a Unity package]

hoping to come away with the minimal set of pieces that I need to make this work for HoloLens and that seemed to work when I imported this package into my new, blank project.

I made sure that project had both the InternetClient and Microphone capabilities and, importantly, SpatialPerception. With that in place, I’m now in my blank project and ready to write some code.

Using the Service Step 4 – Getting a Cloud Session

In true Unity tradition, I made an empty GameObject and threw a script onto it called ‘TestScript’ and then I edited in a small amount of infrastructure code;

using Microsoft.Azure.SpatialAnchors;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.Windows.Speech;
using UnityEngine.XR.WSA;
#if ENABLE_WINMD_SUPPORT
using Windows.Media.Core;
using Windows.Media.Playback;
using Windows.Media.SpeechSynthesis;
#endif // ENABLE_WINMD_SUPPORT

public class TestScript : MonoBehaviour
{
    public Material cubeMaterial;

    void Start()
    {
        this.cubes = new List<GameObject>();

        var speechActions = new Dictionary<string, Func<Task>>()
        {
            ["session"] = this.OnCreateSessionAsync,
            ["cube"] = this.OnCreateCubeAsync,
            ["clear"] = this.OnClearCubesAsync
        };
        this.recognizer = new KeywordRecognizer(speechActions.Keys.ToArray());

        this.recognizer.OnPhraseRecognized += async (s) =>
        {
            if ((s.confidence == ConfidenceLevel.Medium) || 
                (s.confidence == ConfidenceLevel.High))
            {
                Func<Task> value = null;

                if (speechActions.TryGetValue(s.text.ToLower(), out value))
                {
                    await value();
                }
            }
        };
        this.recognizer.Start();
    }
    async Task OnCreateSessionAsync()
    {
        // TODO: Create a cloud anchor session here.
    }
    async Task OnCreateCubeAsync()
    {
        var cube = GameObject.CreatePrimitive(PrimitiveType.Cube);

        cube.transform.localScale = new Vector3(0.2f, 0.2f, 0.2f);

        cube.transform.position = 
            Camera.main.transform.position + 2.0f * Camera.main.transform.forward;

        cube.GetComponent<Renderer>().material = this.cubeMaterial;

        this.cubes.Add(cube);

        var worldAnchor = cube.AddComponent<WorldAnchor>();
    }
    async Task OnClearCubesAsync()
    {
        foreach (var cube in this.cubes)
        {
            Destroy(cube);
        }
        this.cubes.Clear();
    }
    public async Task SayAsync(string text)
    {
        // Ok, this is probably a fairly nasty way of playing a media stream in
        // Unity but it sort of works so I've gone with it for now 🙂
#if ENABLE_WINMD_SUPPORT
        if (this.synthesizer == null)
        {
            this.synthesizer = new SpeechSynthesizer();
        }
        using (var stream = await this.synthesizer.SynthesizeTextToStreamAsync(text))
        {
            using (var player = new MediaPlayer())
            {
                var taskCompletionSource = new TaskCompletionSource<bool>();

                player.Source = MediaSource.CreateFromStream(stream, stream.ContentType);

                player.MediaEnded += (s, e) =>
                {
                    taskCompletionSource.SetResult(true);
                };
                player.Play();
                await taskCompletionSource.Task;
            }
        }

#endif // ENABLE_WINMD_SUPPORT
    }
#if ENABLE_WINMD_SUPPORT
    SpeechSynthesizer synthesizer;
#endif // ENABLE_WINMD_SUPPORT

    KeywordRecognizer recognizer;
    List<GameObject> cubes;
}

and so this gives me the ability to say “session” to create a session, “cube” to create a cube with a world anchor and “clear” to get rid of all my cubes.

Into that, it’s fairly easy to add an instance of CloudSpatialAnchorSession and create it but note that I’m using the easy path at the moment of configuring it with the ID and Key for my service. In the real world, I’d want to configure it to do auth properly and the service is integrated with AAD auth to make that easier for me if I want to go that way.

I added a member variable of type CloudSpatialAnchorSession and then just added in a little code into my OnCreateSessionAsync method;

    async Task OnCreateSessionAsync()
    {
        if (this.cloudAnchorSession == null)
        {
            this.cloudAnchorSession = new CloudSpatialAnchorSession();
            this.cloudAnchorSession.Configuration.AccountId = ACCOUNT_ID;
            this.cloudAnchorSession.Configuration.AccountKey = ACCOUNT_KEY;
            this.cloudAnchorSession.Error += async (s, e) => await this.SayAsync("Error");
            this.cloudAnchorSession.Start();
        }
    }

and that’s that. Clearly, I’m using speech here to avoid having to make “UI”.

Using the Service Step 5 – Creating a Cloud Anchor

Ok, I’ve already got a local WorldAnchor on any and all cubes that get created here so how do I turn these into cloud anchors?

The first thing of note is that the CloudSpatialAnchorSession has a couple of floating point values (0-1) which tell you whether it is ready or not to create a cloud anchor. You call GetSessionStatusAsync and it returns a SessionStatus which reports those readiness values (the code below uses ReadyForCreateProgress).

If it’s not ready then you need to get your user to walk around a bit until it is, with some nice UX and so on, and you can even query the UserFeedback to see what you might suggest to the user to get them to improve on the situation.

It looks like you can also get notified of changes to these values by handling the SessionUpdated event as well.
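I haven’t leaned on that event yet but, as a sketch, handling it (assuming that the event args surface the same SessionStatus that GetSessionStatusAsync returns) would look something like;

    void SubscribeToSessionUpdates()
    {
        // Assumption: SessionUpdatedEventArgs exposes the same SessionStatus values
        // that GetSessionStatusAsync() hands back.
        this.cloudAnchorSession.SessionUpdated += (sender, args) =>
        {
            if (args.Status.ReadyForCreateProgress >= 1.0f)
            {
                Debug.Log("Enough of the environment has been captured to create a cloud anchor");
            }
        };
    }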

Consequently, I wrote a little method to try and poll these values, checking for the ready-to-create value reaching 1.0f;

    async Task WaitForSessionReadyToCreateAsync()
    {
        while (true)
        {
            var status = await this.cloudAnchorSession.GetSessionStatusAsync();

            if (status.ReadyForCreateProgress >= 1.0f)
            {
                break;
            }
            await Task.Delay(250);
        }
    }

and that seemed to work reasonably although, naturally, the hard-coded 250ms delay might not be the smartest thing to do.

With that in place though I can then add this little piece of code to my OnCreateCubeAsync method just after it attaches the WorldAnchor to the cube;

        var cloudSpatialAnchor = new CloudSpatialAnchor(
            worldAnchor.GetNativeSpatialAnchorPtr(), false);

        await this.WaitForSessionReadyToCreateAsync();

        await this.cloudAnchorSession.CreateAnchorAsync(cloudSpatialAnchor);

        this.SayAsync("cloud anchor created");

and sure enough I see the portal reflecting that I have created an anchor in the cloud;

[screenshot of the Azure portal showing the newly created anchor]

Ok – anchor creation is working! Let’s move on and see if I can get an anchor re-localised.

Using the Service Step 6 – Localising an Anchor

As far as I can work out so far, the process of ‘finding’ one or more anchors comes down to using a CloudSpatialAnchorWatcher and asking it to look for some anchors for you in one of two ways by using this AnchorLocateCriteria;

  • I can give the watcher one or more identifiers for anchors that I have previously uploaded (note that the SDK fills in the cloud anchor ID (string (guid)) in the Identifier property of the CloudSpatialAnchor after it has been saved to the cloud).
  • I can ask the watcher to look for anchors that are nearby another anchor.

I guess the former scenario works when my app has some notion of a location based on something like a WiFi network name, a marker, a GPS co-ordinate or perhaps just some setting that the user has chosen and this can then be used to find a bunch of named anchors that are supposed to be associated with that place.

Once one or more of those anchors has been found, the nearby mode can perhaps be used to find other anchors near to that site. The way in which anchors become ‘nearby’ is documented in the “Connecting Anchors” help topic here.

It also looks like I have a choice when loading anchors as to whether I want to include the local cache on the device and whether I want to load anchors themselves or purely their metadata so that I can (presumably) do some more filtering before deciding to load. That’s reflected in the properties BypassCache and RequestedCategories respectively.

In trying to keep my test code here as short as possible, I figured that I would simply store in memory any anchor Ids that have been sent off to the cloud and then I’d add another command “Reload” which attempted to go back to the cloud, get those anchors and recreate the cubes in the locations where they were previously stored.

I set the name of the cube to be the anchor ID from the cloud, i.e. after I create the cloud anchor I just do this;

        await this.cloudAnchorSession.CreateAnchorAsync(cloudSpatialAnchor);

        // NEW!
        cube.name = cloudSpatialAnchor.Identifier;

        this.SayAsync("cloud anchor created");

and so that stores the IDs for me. I also need to change the way in which I create the CloudSpatialAnchorSession in order to handle 2 new events, AnchorLocated and LocateAnchorsCompleted;

    async Task OnCreateSessionAsync()
    {
        if (this.cloudAnchorSession == null)
        {
            this.cloudAnchorSession = new CloudSpatialAnchorSession();
            this.cloudAnchorSession.Configuration.AccountId = ACCOUNT_ID;
            this.cloudAnchorSession.Configuration.AccountKey = ACCOUNT_KEY;
            this.cloudAnchorSession.Error += async (s, e) => await this.SayAsync("Error");

            // NEW
            this.cloudAnchorSession.AnchorLocated += OnAnchorLocated;

            // NEW
            this.cloudAnchorSession.LocateAnchorsCompleted += OnLocateAnchorsCompleted;

            this.cloudAnchorSession.Start();
        }
    }

and then I added a new voice command “reload” which grabs all the IDs from the cubes and attempts to create a watcher to reload them;

    async Task OnReloadCubesAsync()
    {
        if (this.cubes.Count > 0)
        {
            var identifiers = this.cubes.Select(c => c.name).ToArray();

            await this.OnClearCubesAsync();

            var watcher = this.cloudAnchorSession.CreateWatcher(
                new AnchorLocateCriteria()
                {
                    Identifiers = identifiers,
                    BypassCache = true,
                    RequestedCategories = AnchorDataCategory.Spatial,
                    Strategy = LocateStrategy.AnyStrategy
                }
            );
        }
    }

and then finally the event handler for each located anchor is as follows – I basically recreate the cube and attach the anchor;

    void OnAnchorLocated(object sender, AnchorLocatedEventArgs args)
    {
        UnityEngine.WSA.Application.InvokeOnAppThread(
            () =>
            {
                var cube = GameObject.CreatePrimitive(PrimitiveType.Cube);

                cube.transform.localScale = new Vector3(0.2f, 0.2f, 0.2f);

                cube.GetComponent<Renderer>().material = this.relocalizedCubeMaterial;

                var worldAnchor = cube.AddComponent<WorldAnchor>();

                worldAnchor.SetNativeSpatialAnchorPtr(args.Anchor.LocalAnchor);

                cube.name = args.Identifier;

                SayAsync("Anchor located");
            },
            false
        );
    }

and the handler for when all anchors have been located just tells me that the process has finished;

    void OnLocateAnchorsCompleted(object sender, LocateAnchorsCompletedEventArgs args)
    {
        SayAsync("Anchor location completed");
        args.Watcher.Stop();
    }

and that’s pretty much it – I found that my anchors reload in much the way that I’d expect them to.

Wrapping Up

As I said at the start of the post, this was just me trying out a few rough ideas and I’ve covered nothing that isn’t already present in the official samples but I found that I learned a few things along the way and I feel like I’m now a little more conversant with this service. Naturally, I need to revisit and go through the process of updating/deleting anchors and also of looking at gathering ‘nearby’ anchors and re-localising them but I think that I “get it” more than I did at the start of the post.

The other thing I need to do is to try this out from a different kind of device, more than likely an Android phone but that’s for another post 🙂