Rough Notes on Experiments with UWP APIs in the Unity Editor with C++/WinRT

This post is a bunch of rough notes from a discussion that I’ve been having with myself about working with UWP code in Unity when building mixed reality applications for HoloLens. To date, I’ve generally written the code which calls UWP APIs in .NET code and followed the usual practice around it, but in recent times I’ve seen folks doing more around implementing their UWP API calls in native code and so I wanted to experiment with that a little myself.

These notes are very rough so please apply a pinch of salt as I may well have got things wrong (it happens frequently) and I’m really just writing down some experiments rather than drawing a particular conclusion.

With those caveats in place…

Background – .NET and .NET Native

I remember coming to .NET in around 2000/2001.

At the time, I had been working as a C/C++ developer for around 10 years and I was deeply sceptical of .NET and the idea of C# as a new programming language and that I might end up running code that was Just-In-Time compiled.

That said, I was also coming off the back of 10 years of shipping code in C/C++ and the various problems around crashes, hangs, leaks, heap fragmentation, mismatched header files, etc. etc. etc. that afflicted the productivity of C/C++.

So, I was sceptical on the one hand but open to new ideas on the other and, over time, my C++ (more than my C) became rusty as I transitioned to C# and the CLR and its CIL.

There’s a bunch of advantages that come from having binaries made up of a ‘Common Intermediate Language’ underpinned by the CLR rather than native code. Off the top of my head, that might include things like;

  • Re-use of components and tooling across programming languages.
  • CIL was typically a lot smaller than a native code representation of the same functionality.
  • One binary could support (and potentially optimize for) any end processor architecture by being compiled just-in-time on the device in question rather than ahead-of-time which requires per-architecture binaries and potentially per-processor variant optimisations.

and there are, no doubt, many more.

Like many things, there are also downsides. One is the potential impact on start-up times and memory usage (and on the ability to share code across processes) as CIL code is loaded for the first time and its methods are JITted into a specific process’ memory in order to have runnable code on the target machine.

Consequently, for the longest time there have been a number of attempts to overcome that JITting overhead by doing ahead-of-time compilation, including the fairly early NGEN tool (which brought with it some of its own challenges) and, ultimately, the development of the .NET Native set of technologies.

.NET Native and the UWP Developer

.NET Native had a big impact on the developer targeting the Universal Windows Platform (UWP) because all applications delivered from the Windows Store are ultimately built with the .NET Native tool-chain and so developers need to build and test with that tool-chain before submitting their app to the Store.

Developers who had got used to the speed with which a .NET application could be built, run and debugged inside of Visual Studio soon learned that building with .NET Native could introduce a longer build time and also that there were rare occasions where the native code didn’t match the .NET code, so one tool-chain could have bugs that the other did not exhibit. That could also happen because of the .NET Native compiler’s feature of removing ‘unused’ code/metadata, which can have an impact on code – e.g. where reflection is involved.

However, here in 2019 those issues are few and far between & .NET Native is just “accepted” as the tool-chain that’s ultimately used to build a developer’s app when it goes to the Windows Store.

I don’t think that developers’ workload has been affected hugely because I suspect that most UWP developers probably still follow the Visual Studio project structure, using the Debug configuration (the regular, JITted .NET compiler) for their builds during development and reserving the Release configuration (.NET Native compiler) for their final testing. Either way, your code is being compiled by a Microsoft compiler to CIL and by a Microsoft compiler from CIL to x86/x64/ARM.

It’s worth remembering that whether you write C# code or C++ code the debugger is always doing a nice piece of work to translate between the actual code that runs on the processor and the source that you (or someone else) wrote and want to step through getting stack frames, variable evaluation etc. The compiler/linker/debugger work together to make sure that via symbols (or program databases (PDBs)) this process works so seamlessly that, at times, it’s easy to forget how complicated a process it is and we take it for granted across both ‘regular .NET’ and ‘.NET Native’.

So, this workflow is well baked and understood and, personally, I’d got pretty used to it as a .NET/UWP developer and it didn’t really change whether developing for PC or other devices like HoloLens with the possible exception that deployment/debugging is naturally going to take a little more time on a mobile-powered device than on a huge PC.

Unity and the UWP

But then I came to Unity 😊

In Unity, things initially seem the same for a UWP developer. You write your .NET code in the editor, the editor compiles it “on the fly” as you save those code changes and then you can run and debug that code in the editor.

As an aside, the fact that you can attach the .NET debugger to the Unity Editor is (to me) always technically impressive and a huge productivity gain.

When you want to build and deploy, you press the right keystrokes and Unity generates a C# project for you with/without all your C# code in it (based on the “C# Projects” setting) and you are then back into the regular world of UWP development. You have some C# code, you have your debugger and you can build debug (.NET) or release (.NET Native) just like any other UWP app written with .NET.

Unity and .NET Scripting/IL2CPP

That’s true if you’re using the “.NET Scripting backend” in Unity. However, that backend is deprecated as stated in the article that I just linked to and so, really, a modern developer should be using the IL2CPP backend.

That deprecation has implications. For example, if you want to move to using types from .NET Standard 2.0 in your app then you’ll find that Unity’s support for .NET Standard 2.0 lives only in the IL2CPP backend and hasn’t been implemented in the .NET Scripting backend (because it’s deprecated).

2018.2.16f1, UWP, .NET Scripting Backend, .NET Standard 2.0 Build Errors

With the IL2CPP backend, life in the editor continues as before. Unity builds your .NET code, you attach your .NET debugger and you can step through your code. Again, very productive.

However, life outside of the editor changes in that any code compiled to CIL (i.e. scripts plus dependencies) is translated into C++ code by the compiler. The process of how this works is documented here and I think it’s well worth 5m of your time to read through that documentation if you haven’t already.

This has an impact on build times although I’ve found that if you carefully follow the recommendations that Unity makes on this here then you can get some cycles back but it’s still a longer process than it was previously.

Naturally, when Unity now builds, what drops out is not a C#/.NET Visual Studio project but, instead, a C++ Visual Studio project. You can then choose the processor architecture and debug/release etc. but you’re compiling C++ code into native code and that C++ represents all the things you wrote along with translations of lots of things that you didn’t write (e.g. lists, dictionaries, etc. etc.). Those compilation times, again, can get a bit long and you get used to watching the C++ compiler churn its way through implementations of things like generics, synchronisation primitives, etc.

Just as with .NET Native, Unity’s C#->C++ translation has the advantage of stripping out things which aren’t used which can impact technologies like reflection and, just like .NET Native, Unity has a way of dealing with that as detailed here.

When it comes to debugging that code, you have two choices. You can either;

  • Debug it at the C# level.
  • Debug the generated C++.
  • Ok, ok, if you’re hardcore you can just debug the assembly but I’ll assume you don’t want to be doing this all the time (although I’ll admit that I did single step some pieces while trying to fix things for this post but it’s more by necessity than choice).

C# debugging involves setting the “Development Build” and “Script Debugging” options as described here and you essentially run up the app on the target device with this debugging support switched on and then ask the Unity debugger to attach itself to that app similarly to the way in which you ask the Unity debugger to attach to the editor. Because this is done over the network, you also have to ensure that you set certain capabilities in your UWP app manifest (InternetClient, InternetClientServer, PrivateNetworkClientServer).

For the UWP/HoloLens developer, this isn’t without its challenges at the time of writing and I mentioned some of those challenges in this post;

A Simple glTF Viewer for HoloLens

and my friend Joost just wrote a long post about how to get this working;

Debugging C# code with Unity IL2CPP projects running on HoloLens or immersive headsets

and that includes screenshots and provides a great guide. I certainly struggled to get this working when I tried it the first time around, as you can see from the forum thread I started below;

Unity 2018.2.16f1, UWP, IL2CPP, HoloLens RS5 and Managed Debugging Problems.

so a guide is very timely and welcome.

As was the case with .NET Native, of course it’s possible that the code generated by IL2CPP differs in its behavior from the .NET code that now runs inside the editor and so it’s possible to get into “IL2CPP bugs” which can seriously impact your productivity.

C# debugging kind of feels a little “weird” at this point as you stare into the internals of the sausage machine. The process makes it very obvious that what you are debugging is code compiled from a C++ project but you point a debugger at it and step through as though it was a direct compilation of your C# code. It just feels a little odd to me although I think it’s mainly perception as I have long since got over the same feeling around .NET Native and it’s a very similar situation.

Clearly, Unity are doing the right thing in making the symbols line up here, which is clever in itself, but I feel like there are visible signs of the work going on when it comes to the performance of debugging and also some of the capabilities (e.g. variable evaluation etc.). However, it works and that’s the main thing 😊

In these situations I’ve often found myself with 2 instances of Visual Studio with one debugging the C# code using the Unity debugger support while the other attached as a native debugger to see if I catch exceptions etc. in the real code. It’s a change to the workflow but it’s do-able.

IL2CPP and the UWP Developer

That said, there’s still a bit of an elephant in the room here for the UWP developer. The additional challenge in this mix is that the Unity editor always uses Mono, which means that it doesn’t understand calls to the UWP API set (or WinRT APIs if you prefer) as described here.

This means that it’s likely that a UWP developer (making UWP API calls) takes more pain here than the average Unity developer because, to execute the “UWP specific” parts of their code, they need to set aside the editor, hit build to turn .NET into C++, then hit build in Visual Studio to build that C++ code, and then they might need to deploy to a device before being able to debug the code (either the generated C++ or the original .NET) that calls into the UWP.

The usual pattern for working with UWP code is detailed on this doc page and involves taking code like that below;

which causes the Unity editor some serious concern because it doesn’t understand Windows.* namespaces;

And so we have to take steps to keep this code away from the editor;

And then this will “work” both in the editor and if we compile it out for UWP through the 2-stage compilation process. Note that the use of MessageDialog here is just an example and probably not a great one because there’s no doubt some built-in support in Unity for displaying a dialog without having to resort to a UWP API.
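Since the code in the original post was pasted as screenshots, here’s a minimal sketch of that guard pattern – it assumes the ENABLE_WINMD_SUPPORT define that Unity sets for Windows Runtime builds, and the class/method names are just illustrative;

using UnityEngine;

public class DialogScript : MonoBehaviour
{
    public void ShowDialog(string message)
    {
#if ENABLE_WINMD_SUPPORT
        // Only compiled for the UWP player build - the editor (running on Mono)
        // never sees this and so never has to resolve Windows.* namespaces.
        var dialog = new Windows.UI.Popups.MessageDialog(message);
        dialog.ShowAsync();
#else
        // In the editor, fall back to something harmless.
        Debug.Log(message);
#endif
    }
}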

Calling UWP APIs from the Editor

I’ve been thinking about this situation a lot lately and, again, with sympathy for the level of complexity of what’s going on inside that Unity editor – it does some amazing things in making all of this work cross-platform.

I’d assume that trying to bring WinRT/UWP code directly into that editor environment is a tall order and I think it stems from the editor running on Mono and there not being underlying support there for COM interop, although I could be wrong. Either way, part of me understands why the editor can’t run my UWP code.

On the other hand, the UWP APIs aren’t .NET APIs. They are native code APIs in Windows itself and the Unity editor can happily load up native plugins and execute custom native code and so there’s a part of me that wonders whether the editor couldn’t get closer to letting me call UWP APIs.

When I first came to look at this a few years ago, I figured that I might be able to “work around it” by trying to “hide” my UWP code inside some .NET assembly and then try to add that assembly to Unity as a plugin but the docs say that managed plugins can’t consume Windows Runtime APIs.

As far as I know, you can’t have a plugin which is;

  • a WinRT component implemented in .NET or in C++.
  • a .NET component that references WinRT APIs or components.

But you can have a native plugin which makes calls out to WinRT APIs so what does it look like to go down that road?

Unity calling Native code calling UWP APIs

I wondered whether this might be a viable option for a .NET developer given the (fairly) recent arrival of C++/WinRT which seems to make programming the UWP APIs much more accessible than it was in the earlier worlds of WRL and/or C++/CX.

To experiment with that, I continued my earlier example and made a new project in Visual C++ as a plain old “Windows Desktop DLL”.

NB: Much later in this post, I will regret thinking that a “plain old Windows Desktop DLL” is all I’m going to need here but, for a while, I thought I would get away with it.

To that project, I can add includes for C++/WinRT to my stdafx.h as described here;


And I can alter my link options to link with WindowsApp.lib;

And then I can maybe write a little function that’s exported from my DLL;

And the implementation there is C++/WinRT – note that I just use a Uri by declaring one rather than leaping through some weird ceremony to make use of it.
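The code here was originally a set of screenshots, so here’s a rough, consolidated sketch of those steps – the export name is mine rather than the post’s;

// stdafx.h - bring in the C++/WinRT projection for Windows.Foundation
#include <winrt/Windows.Foundation.h>

// An alternative to editing the linker settings by hand is to ask for
// WindowsApp.lib via a pragma.
#pragma comment(lib, "windowsapp")

// An exported function which simply declares and uses a Uri - no explicit
// factory/COM ceremony needed with C++/WinRT. This assumes the calling thread
// already has the Windows Runtime/COM initialised.
extern "C" __declspec(dllexport) int GetUriPort(const wchar_t* uriText)
{
    winrt::Windows::Foundation::Uri uri(uriText);

    return uri.Port();
}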

If I drag the DLL that I’ve built into Unity as a plugin then my hope is that I can tell Unity to use the 64-bit version purely for the editor and the 32-bit version purely for the UWP player;

I can then P/Invoke from my Unity script into that exported DLL function as below;
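Again, the original screenshot isn’t reproduced here, but the Unity-side code is a regular P/Invoke declaration along these lines (the DLL name NativePlugin matches the name used later in the post; the function name matches my sketch above);

using System.Runtime.InteropServices;
using UnityEngine;

public class NativeCallScript : MonoBehaviour
{
    [DllImport("NativePlugin", CharSet = CharSet.Unicode)]
    static extern int GetUriPort(string uriText);

    void Start()
    {
        // Calls through the native plugin into the UWP Uri class.
        Debug.Log("Port is " + GetUriPort("http://www.bing.com:8080"));
    }
}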

And then I can attach my 2 debuggers to Unity and debug both the managed code and the native code and I’m making calls into the UWP from the editor – life is good and I don’t have to go through a long build cycle.

Here’s my managed debugger attached to the Unity editor;

And here’s the call being returned from the native debugger also attached to the Unity editor;

and it’s all good.

Now, if only life were quite so simple 😊

Can I do that for every UWP API?

It doesn’t take much to break this in that (e.g.) if I go back to my original example of displaying a message box then it’s not too hard to add an additional header file;

And then I can write some exported function that uses MessageDialog;
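That function was also a screenshot in the original; it would have a shape something like this (the name is mine);

// needs the Windows.UI.Popups projection header adding to stdafx.h
#include <winrt/Windows.UI.Popups.h>

extern "C" __declspec(dllexport) void ShowTestDialog(const wchar_t* message)
{
    // Deliberately simplistic - fire off the dialog and don't wait for a result.
    // MessageDialog expects a CoreWindow to attach itself to, which is exactly
    // what's missing when this runs inside the Unity editor.
    winrt::Windows::UI::Popups::MessageDialog dialog(message);

    dialog.ShowAsync();
}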

and I can import it and call it from a script in Unity;

but it doesn’t work. I get a nasty exception here and I think that’s because MessageDialog relies on a CoreWindow and I don’t think I have one in the Unity editor. Choosing a windowing API was probably a bad idea but it’s a good illustration that I’m not likely to magically just get everything working here.

There’s commentary in this blog post around challenges with APIs that depend on a CoreWindow.

What about Package Identity?

What about some other APIs? How about this? If I add the include for Windows.Storage.h;

And then add an exported function (I added a DuplicateString function to take that pain away) to get the name of the local application data folder;
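A sketch of that export might look something like the following – note that DuplicateString here is my guess at the helper mentioned above (copying into CoTaskMem so that the C# marshaller can free the buffer) and this assumes <windows.h> is already in the precompiled header, as it is for a default desktop DLL project;

#include <winrt/Windows.Storage.h>
#include <cwchar>
#include <cstring>

// My guess at the shape of the post's DuplicateString helper.
wchar_t* DuplicateString(const wchar_t* input)
{
    auto byteCount = (wcslen(input) + 1) * sizeof(wchar_t);
    auto copy = static_cast<wchar_t*>(::CoTaskMemAlloc(byteCount));

    if (copy != nullptr)
    {
        memcpy(copy, input, byteCount);
    }
    return copy;
}

extern "C" __declspec(dllexport) wchar_t* GetLocalAppDataFolderPath()
{
    // ApplicationData requires the calling process to have package identity,
    // which the Unity editor (a plain desktop app) doesn't have - hence the
    // failure described below.
    auto path = winrt::Windows::Storage::ApplicationData::Current().LocalFolder().Path();

    return DuplicateString(path.c_str());
}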

and then interop to it from Unity script;
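On the Unity side, the interesting part is getting the string back across the P/Invoke boundary – something like this works if the native side allocated the returned buffer with CoTaskMemAlloc, because the marshaller copies it into a .NET string and then frees it;

using System.Runtime.InteropServices;
using UnityEngine;

public class StorageTestScript : MonoBehaviour
{
    [DllImport("NativePlugin", CharSet = CharSet.Unicode)]
    [return: MarshalAs(UnmanagedType.LPWStr)]
    static extern string GetLocalAppDataFolderPath();

    void Start()
    {
        Debug.Log(GetLocalAppDataFolderPath());
    }
}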

and then this blows up;

Now, this didn’t exactly surprise me. In fact, the whole reason for calling that API was to cause this problem as I knew it was coming – part of that “UWP context” includes having a package identity and Unity (as a desktop app) doesn’t have a package identity, so it’s not really fair to ask for the app data folder when the application doesn’t have one.

There’s a docs page here about this notion of APIs requiring package identity.

Can the Unity editor have a package identity?

I wondered whether there might be some way to give Unity an identity such that these API calls might work in the editor. I could think of 2 ways.

  1. Package Unity as a UWP application using the desktop bridge technologies.
  2. Somehow ‘fake’ an identity such that from the perspective of the UWP APIs the Unity editor seems to have a package identity.

I didn’t really want to attempt to package up Unity and so I thought I’d try (2) and ended up having to ask around and came up with a form of a hack although I don’t know how far I can go with it.

Via the Invoke-CommandInDesktopPackage PowerShell command it seems it’s possible to execute an application in the “context” of another desktop bridge application.

So, I went ahead and made a new, blank WPF project and then I used the Visual Studio Packaging Project to package it as a UWP application using the bridge and that means that it had “FullTrust” as a capability and I also gave it “broadFileSystemAccess” (just in case).

I built an app package from this and installed it onto my system and then I experimented with running Unity within that app’s context as seen below – Unity here has been invoked inside the package identity of my fake WPF desktop bridge app;

I don’t really know to what extent this might break Unity but, so far, it seems to survive ok and work but I haven’t exactly pushed it.

With Unity running in this UWP context, does my code run any better than before?

Well, firstly, I noticed that Unity no longer seemed to like loading my interop DLL. I tried to narrow this down and haven’t figured it out yet but I found that;

  1. First time, Unity wouldn’t find my interop DLL.
  2. I changed the name to something invalid, forcing Unity to look for that and fail.
  3. I changed the name back to the original name, Unity found it.

I’m unsure on the exact thing that’s going wrong there so I need to return to that but I can still get Unity to load my DLL, I just have to play with the script a little first. But, yes, with a little bit of convincing I can get Unity to make that call;


And what didn’t work without an identity now works when I have one so that’s nice!

The next, natural thing to do might be to read/write some data from/to a file. I thought I’d try a read and to do that I used the co_await syntax to do the async pieces and then used the .get() method to ultimately make it a synchronous process as I wasn’t quite ready to think about calling back across the PInvoke boundary.
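I haven’t reproduced the original screenshot, but the shape of what’s being described is a C++/WinRT coroutine doing the async file work with an exported wrapper that blocks on it via .get() – something along these lines, reusing the DuplicateString idea and the headers from the earlier sketches (names are mine);

#include <winrt/Windows.Storage.h>

winrt::Windows::Foundation::IAsyncOperation<winrt::hstring> InternalReadFileAsync(
    winrt::hstring fileName)
{
    using namespace winrt::Windows::Storage;

    // co_await each async step and hand the file contents back to the caller.
    auto folder = ApplicationData::Current().LocalFolder();
    auto file = co_await folder.GetFileAsync(fileName);
    auto text = co_await FileIO::ReadTextAsync(file);

    co_return text;
}

extern "C" __declspec(dllexport) wchar_t* ReadFileFromLocalFolder(const wchar_t* fileName)
{
    // Blocking with .get() makes the export synchronous - and is also what
    // triggers the "don't do this on an STA thread" assertion described below.
    auto text = InternalReadFileAsync(fileName).get();

    return DuplicateString(text.c_str());
}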

And that causes a problem depending on how you invoke it. If I invoke it as below;


Then I get an assertion from somewhere in the C++/WinRT headers telling me (I think) that I have called the get() method on an STA thread. I probably shouldn’t call this method directly from my own thread anyway because the way in which I have written it (with the .get() call) blocks the calling thread, so regardless of STA/MTA it’s perhaps a bad idea.

However, if I ignore that assertion, the call does seem to actually work and I get the contents of the file back into the Unity editor as below;

But I suspect that I’m not really meant to ignore the assertion and so I can switch the call to something like;

and the assertion goes away and I can read the file contents 😊

It’s worth stating at this point that I’ve not even thought about how I might try to actually pass some notion of an async operation across the PInvoke boundary here, that needs more thought on my part.

Ok, Call some more APIs…

So far, I’ve called dialog.show() and file.read() so I felt like I should try a longer piece of code with a few more API calls in it.

I’ve written a few pieces of code in the past which try to do face detection on frames coming from the camera and I wondered whether I might be able to reproduce that here – maybe write a method which runs until it detects a face in the frames coming from the camera?

I scribbled out some rough code in my DLL;

// Sorry, this shouldn't really be one massive function...
IAsyncOperation<int> InternalFindFaceInDefaultCameraAsync()
{
	auto facesFound(0);

	auto devices = co_await DeviceInformation::FindAllAsync(DeviceClass::VideoCapture);

	if (devices.Size())
	{
		DeviceInformation deviceInfo(nullptr);

		// We could do better here around choosing a device, we just take
		// the front one or the first one.
		for (auto const& device : devices)
		{
			if ((device.EnclosureLocation().Panel() == Panel::Front))
			{
				deviceInfo = device;
				break;
			}
		}
		if ((deviceInfo == nullptr) && devices.Size())
		{
			deviceInfo = *devices.First();
		}
		if (deviceInfo != nullptr)
		{
			MediaCaptureInitializationSettings initSettings;
			initSettings.StreamingCaptureMode(StreamingCaptureMode::Video);
			initSettings.VideoDeviceId(deviceInfo.Id());
			initSettings.MemoryPreference(MediaCaptureMemoryPreference::Cpu);

			MediaCapture capture;
			co_await capture.InitializeAsync(initSettings);

			auto faceDetector = co_await FaceDetector::CreateAsync();
			auto faceDetectorFormat = FaceDetector::GetSupportedBitmapPixelFormats().GetAt(0);

			// We could do better here, we will just take the first frame source and
			// we assume that there will be at least one. 
			auto frameSource = (*capture.FrameSources().First()).Value();
			auto frameReader = co_await capture.CreateFrameReaderAsync(frameSource);

			winrt::slim_mutex mutex;

			handle signal{ CreateEvent(nullptr, true, false, nullptr) };
			auto realSignal = signal.get();

			frameReader.FrameArrived(
				[&mutex, faceDetector, &facesFound, faceDetectorFormat, realSignal]
			(IMediaFrameReader reader, MediaFrameArrivedEventArgs args) -> IAsyncAction
			{
				// Not sure I need this?
				if (mutex.try_lock())
				{
					auto frame = reader.TryAcquireLatestFrame();

					if (frame != nullptr)
					{
						auto bitmap = frame.VideoMediaFrame().SoftwareBitmap();

						if (bitmap != nullptr)
						{
							if (!FaceDetector::IsBitmapPixelFormatSupported(bitmap.BitmapPixelFormat()))
							{
								bitmap = SoftwareBitmap::Convert(bitmap, faceDetectorFormat);
							}
							auto faceResults = co_await faceDetector.DetectFacesAsync(bitmap);

							if (faceResults.Size())
							{
								// We are done, we found a face.
								facesFound = faceResults.Size();
								SetEvent(realSignal);
							}
						}
					}
					mutex.unlock();
				}
			}
			);
			co_await frameReader.StartAsync();

			co_await resume_on_signal(signal.get());

			// Q - do I need to remove the event handler or will the destructor do the
			// right thing for me?
			co_await frameReader.StopAsync();
		}
	}
	co_return facesFound;
}

That code is very rough and ready but with an export from the DLL that looks like this;

	__declspec(dllexport) int FindFaceInDefaultCamera()
	{
		int faceCount = InternalFindFaceInDefaultCameraAsync().get();

		return(faceCount);
	}

then I found that I can call it from the editor and, sure enough, the camera lights up on the machine and the code returns that it has detected my face from the camera so that’s using a few UWP classes together to produce a result.

So, I can call into basic APIs (e.g. Uri), I can call into APIs that require package identity (e.g. StorageFile) and I can put together slightly more complex scenarios involving cameras, bitmaps, face detection etc.

It feels like I might largely be able to take this approach to writing some of my UWP code in C++/WinRT and have the same code run both inside of the editor and on the device and debug it in both places and not have to go around longer build times while working it up in the editor.

Back to the device…

I spent a few hours in the Unity editor playing around to get to this point in the post and then I went, built and deployed my code to an actual device and it did not work. Heartbreak 😉

I was getting failures to load my DLL on the device and I quickly put them down to my DLL having dependencies on VC runtime DLLs that didn’t seem to be present. I spent a little bit of time doing a blow-by-blow comparison on the build settings of a ‘UWP DLL’ versus a ‘Windows DLL’ but, in the end, decided I could just build my code once in the context of each.

So, I changed my C++ project such that it contained the original “Windows Desktop DLL” along with a “UWP DLL” and the source code is shared between the two as below;

With that in place, I use the 64-bit “Windows Desktop DLL” in the editor and the 32-bit “UWP DLL” on the device (the ‘player’) and that seems to sort things out for me. Note that both projects build a DLL named NativePlugin.dll.

That said, I’d really wanted to avoid this step and thought I was going to get away with it, but I fell at the last hurdle. I’d like to revisit and see if I can take away the ‘double build’ – someone will no doubt tell me what’s going on there.

Wrapping Up

As I said at the start of the post, this is just some rough notes but, in making calls out to the few APIs that I’ve tried here, I’m left feeling that the next time I have to write some Unity/UWP specific code I might try it out in C++/WinRT first with this PInvoke method & see how it shapes up, as the productivity gain of being able to press ‘Play’ in the editor is huge. Naturally, if that then leads to problems that I haven’t encountered in this post then I can flip back, translate the code back to C# and use the regular “conditional compilation” mechanism.

Code

I’m conscious that I pasted quite a lot of code into this post as bitmaps and that’s not very helpful so I just packaged up my projects onto github over here.

Inside of the code, 2 of the scenarios from this post are included – the code for running facial detection on frames from the camera and the code which writes a file into the UWP app’s local data folder.

I’ve tried that code both in the Unity editor and on a HoloLens device & it seems to work fine in both places.

All mistakes are mine, feel free to feed back and tell me what I’ve done wrong! 🙂

Baby Steps with the Azure Spatial Anchors Service

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens or Azure Mixed Reality other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

One of the many, many strands of the exciting, recent announcements around Mixed Reality (see the video here) was the announcement of a set of Azure Mixed Reality Services.

You can find the home page for these services on the web here and they encompass;

  • Azure Spatial Anchors
  • Azure Remote Rendering

Both of these are, to my mind, vital foundational services that Mixed Reality application builders have needed for quite some time so it’s great to see them surface at Azure.

At the time of writing, the Azure Remote Rendering service is in a private preview so I’m not looking at that right now but the Azure Spatial Anchors service is in a public preview and I wanted to experiment with it a little and thought I would write up some notes here as I went along.

Before I do that though…

Stop – Read the Official Docs

There’s nothing that I’m going to say in this post that isn’t covered by the official docs so I’d recommend that you read those before reading anything here and I’m providing some pointers below;

  1. Check out the overview page here if you’re not familiar with spatial anchors.
  2. Have a look at the Quick Start here to see how you can quickly get started in creating a service & making use of it from Unity.
  3. Check out the samples here so that you can quickly get up and running rather than fumbling through adding library references etc. (note that the Quick Start will lead you to the samples anyway).

With that said, here’s some rough notes that I made while getting going with the Azure Spatial Anchors service from scratch.

Please keep in mind that this service is new to me so I’m really writing up my experiments & I may well make some mistakes.

A Spatial WHAT?!

If you’re not coming from a HoloLens background or from some other type of device background where you’re doing MR/AR and creating ‘spatial anchors’ then you might wonder what these things are.

To my mind, it’s a simple concept that’s no doubt fiendishly difficult to implement. Here’s my best attempt;

A spatial anchor is a BLOB of data providing a durable representation of a 3D point and orientation in a space.

That’s how I think of it. You might have other definitions. These BLOBs of data usually involve recognising ‘feature points’ that are captured from various camera frames taken from different poses in a space.

If you’re interested in more of the mechanics of this, I found this video from Apple’s 2018 WWDC conference to be one of the better references that I’ve seen;

Understanding ARKit Tracking and Detection

So, a ‘spatial anchor’ is a BLOB of data that allows a device to capture a 3D point and orientation in space & potentially to identify that point again in the future (often known as ‘re-localising the anchor’). It’s key to note that devices and spaces aren’t perfect and so it’s always possible that a stored anchor can’t be brought back to life at some future date.

I find it useful sometimes to make a human analogy around spatial anchors. I can easily make a ‘spatial anchor’ to give to a human being and it might contain very imprecise notions of positioning in space which can nonetheless yield accurate results.

As an example, I could give this description of a ‘spatial anchor’ to someone;

Place this bottle 2cm in on each side from the corner of the red table which is nearest to the window. Lay the bottle down pointing away from the window.

You can imagine being able to walk into a room with a red table & a window and position the bottle fairly accurately based on that.

You can also imagine that this might work in many rooms with windows and red tables & that humans might even adapt and put the bottle onto a purple table if there wasn’t a red one.

Equally, you can imagine finding yourself in a room with no table and saying “sorry, I can’t figure this out”.

I think it’s worth saying that having this set of ‘instructions’ does not tell the person how to find the room nor whether they are in the right room, that is outside of the scope and the same is true for spatial anchors – you have to be vaguely in the right place to start with or use some other mechanism (e.g. GPS, beacons, markers, etc) to get to that place before trying to re-localise the anchor.

Why Anchor?

Having been involved in building applications for HoloLens for a little while now, I’ve become very used to the ideas of applying anchors and, to my mind, there are 3 main reasons why you would apply an anchor to a point/object and the docs are very good on this;

  • For stability of a hologram or a group of holograms that are positioned near to an anchor.
    • This is essentially about preserving the relationship between a hologram and a real point in the world as the device alters its impression of the structure of the space around it. As humans, we expect a hologram placed on the edge of a table to stay on the edge of that table even if a device is constantly refining its idea of the mesh that makes up that table and the rest of the space around it.
  • For persistence.
    • One of the magical aspects of mixed reality enabled by spatial anchors is the ability for a device to remember the positions of holograms in a space. The HoloLens can put the hologram back on the edge of the table potentially weeks or months after it was originally placed there.
  • For sharing.
    • The second magical aspect of mixed reality enabled by spatial anchors is the ability for a device to read a spatial anchor created by another device in a space and thereby construct a transform from the co-ordinate system of the first device to that of the second. This forms the basis for those magical shared holographic experiences.

Can’t I Already Anchor?

At this point, it’s key to note that for HoloLens developers the notion of ‘spatial anchors’ isn’t new. The platform has supported anchors since day 1 and they work really well.

Specifically, if you’re working in Unity then you can fairly easily do the following;

  • Add the WorldAnchor component to your GameObject in order to apply a spatial anchor to that component.
    • It’s fairly common to use an empty GameObject which then acts as a parent to a number of other game objects.
    • The isLocated property is fairly key here as is the OnTrackingChanged event and note also that there is an opaque form of reference to the  underlying BLOB via GetNativeSpatialAnchorPtr and SetNativeSpatialAnchorPtr.
  • Use the WorldAnchorStore class in order to maintain a persistent set of anchors on a device indexed by a simple string identifier (there’s a small sketch of this after the list).
  • Use the WorldAnchorTransferBatch class in order to;
    • Export the blob representing the anchor
    • Import a blob representing an anchor that has previously been exported
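
As a reminder of how little code the existing, on-device story needs, here’s a minimal sketch (mine, not from the post) of persisting and reloading an anchor with the WorldAnchorStore – the id and the GameObject are illustrative;

using UnityEngine;
using UnityEngine.XR.WSA;
using UnityEngine.XR.WSA.Persistence;

public class AnchorPersistence : MonoBehaviour
{
    void Start()
    {
        WorldAnchorStore.GetAsync(store =>
        {
            // Try to reload a previously saved anchor onto this object...
            var anchor = store.Load("myAnchorId", this.gameObject);

            if (anchor == null)
            {
                // ...and if there wasn't one, anchor the object and persist it.
                anchor = this.gameObject.AddComponent<WorldAnchor>();
                store.Save("myAnchorId", anchor);
            }
        });
    }
}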

With this set of tools you can quite happily build HoloLens applications that;

  • Anchor holograms for stability.
  • Persist anchors over time such that holograms can be recreated in their original locations.
  • Share anchors between devices such that they can agree on a common co-ordinate system and present shared holographic experiences.

and, of course, you can do this using whatever transfer or networking techniques you like including, naturally, passing these anchors through the cloud via means such as Azure Blob Storage or ASP.NET SignalR or whatever you want. It’s all up for grabs and has been for the past 3 years or so.

Why A New Spatial Anchor Service?

With all that said, why would you look to the new Azure Spatial Anchor service if you already have the ability to create anchors and push them through the cloud? For me, I think there are at least 3 things;

  1. The Azure Spatial Anchor service is already built and you can get an instance with a few clicks of the mouse.
    1. You don’t have to go roll your own service and wonder about all the “abilities” of scalability, reliability, availability, authentication, authorisation, logging, monitoring, etc.
    2. There’s already a set of client-side libraries to make this easy to use in your environment.
  2. The Azure Spatial Anchor service/SDK gives you x-platform capabilities for anchors.
    1. The Azure Spatial Anchor service gives you the ability to transfer spatial anchors between applications running on HoloLens, ARKit devices and ARCore devices.
  3. The Azure Spatial Anchor service lets you define metadata with your anchors.
    1. The SDK supports the notion of ‘nearby’ anchors – the SDK lets you capture a group of anchors that are located physically near to each other & then query in the future to find those anchors again.
    2. The SDK also supports adding property sets to anchors to use for your own purposes.

Point #2 above is perhaps the most technically exciting feature here – i.e. I’ve never before seen anchors shared across HoloLens, iOS and Android devices so this opens up new x-device scenarios for developers.

That said, point #1 shouldn’t be underestimated – having a service that’s already ready to run is usually a lot better than trying to roll your own.

So, how do you go about using the service? Having checked out the samples, I then wanted to do a walkthrough on my own and that’s what follows here but keep a couple of things in mind;

  • I’m experimenting here, I can get things wrong 🙂
  • The service is in preview.
  • I’m going to take a HoloLens/Unity centric approach as that’s the device that I have to hand.
  • There are going to be places where I’ll overlap with the Quick Start and I’ll just refer to it at that point.

Using the Service Step 1 – Creating an Instance of the Service

Getting to the point where you have a service up and running using (e.g.) the Azure Portal is pretty easy.

I just followed this Quick Start step labelled “Create a Spatial Anchors Resource” and I had my service visible inside the portal inside of 2-3 minutes.

Using the Service Step 2 – Making a Blank Project in Unity

Once I had a service up and running, I wanted to be able to get to it from Unity and so I went and made a blank project suitable for holographic development.

I’m using Unity 2018.3.2f1 at the time of writing (there are newer 2018.3 versions).

I’ve gone through the basics of setting up a project for HoloLens development many times on this blog site before so I won’t cover them here but if you’re new to this then there’s a great reference over here that will walk you through getting the camera, build settings, project settings etc. all ok for HoloLens development.

Using the Service Step 3 – Getting the Unity SDK

Ok, this is the first point at which I got stuck. When I click on this page on the docs site;

[screenshot: the docs page with the link to the SDK]

then the link to the SDK takes me to the detailed doc pages but it doesn’t seem to tell me where I get the actual SDK from – I was thinking of maybe getting a Unity package or similar but I’ve not found that link yet.

This caused me to unpick the sample a little and I learned a few things by doing that. In the official Unity sample you’ll see that the plugins folder (for HoloLens) contains these pieces;

[screenshot: the contents of the sample’s Plugins folder for HoloLens]

and if you examine the post-build step here in this script you’ll see that there’s a function which essentially adds the nuget package Microsoft.Azure.SpatialAnchors.WinCPP into the project when it’s built;

[screenshot: the post-build script adding the Microsoft.Azure.SpatialAnchors.WinCPP NuGet package]

and you can see that the script can cope with .NET projects and C++ projects (for IL2CPP) although I’d flag a note in the readme right now which suggests that this doesn’t work for IL2CPP anyway today;

### Known issues for HoloLens

For the il2cpp scripting backend, see this [issue]( https://forum.unity.com/threads/httpclient.460748/ ).

The short answer to the workaround is to:

1. First make a mcs.rsp with the single line `-r:System.Net.Http.dll`. Place this file in the root of your assets folder.
2. Copy the `System.net.http.dll` from `<unityInstallDir>\Editor\Data\MonoBleedingEdge\lib\mono\4.5\System.net.http.dll` into your assets folder.

There is an additional issue on the il2cpp scripting backend case that renders the library unusable in this release.

so please keep that in mind given that IL2CPP is the new default backend for these applications.

I haven’t poked into the iOS/Android build steps at the time of writing so can’t quite say what happens there just yet.

This all means that when I build from Unity I end up with a project which includes a reference to Microsoft.Azure.SpatialAnchors.WinCPP as a Nuget package as below (this is taken from a .NET backend project);

[screenshot: the generated project’s NuGet reference to Microsoft.Azure.SpatialAnchors.WinCPP]

so, what’s in that thing? Is it a WinRT component?

I don’t think it is. I had to go and visit the actual Nuget package to try and figure it out but when I took a look I couldn’t find any .winmd file or similar. All I found in that package was a .DLL;

[screenshot: the contents of the NuGet package – a single DLL]

and as far as I can tell this is just a flat DLL with a bunch of exported flat functions like these;

[screenshot: some of the DLL’s flat exported functions]

I can only guess but I suspect then that the SDK is built in C/C++ so as to be portable across iOS, Android & UWP/Unity and then packaged up in slightly different ways to hit those different target environments.

Within Unity, this is made more palatable by having a bridge script which is included in the sample project called AzureSpatialAnchorsBridge.cs;

[screenshot: AzureSpatialAnchorsBridge.cs in the sample project]

which then does a bunch of PInvokes into that flat DLL like this one;

[screenshot: one of the PInvoke declarations in the bridge script]

so that’s how that seems to work.

If I then want to take this across to a new project, it feels like I need to package up a few things and I tried to package;

[screenshot: the assets selected for export as a Unity package]

hoping to come away with the minimal set of pieces that I need to make this work for HoloLens and that seemed to work when I imported this package into my new, blank project.

I made sure that project had both the InternetClient and Microphone capabilities and, importantly, SpatialPerception. With that in place, I’m now in my blank project and ready to write some code.

Using the Service Step 4 – Getting a Cloud Session

In true Unity tradition, I made an empty GameObject and threw a script onto it called ‘TestScript’ and then I edited in a small amount of infrastructure code;

using Microsoft.Azure.SpatialAnchors;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.Windows.Speech;
using UnityEngine.XR.WSA;
#if ENABLE_WINMD_SUPPORT
using Windows.Media.Core;
using Windows.Media.Playback;
using Windows.Media.SpeechSynthesis;
#endif // ENABLE_WINMD_SUPPORT

public class TestScript : MonoBehaviour
{
    public Material cubeMaterial;

    void Start()
    {
        this.cubes = new List<GameObject>();

        var speechActions = new Dictionary<string, Func<Task>>()
        {
            ["session"] = this.OnCreateSessionAsync,
            ["cube"] = this.OnCreateCubeAsync,
            ["clear"] = this.OnClearCubesAsync
        };
        this.recognizer = new KeywordRecognizer(speechActions.Keys.ToArray());

        this.recognizer.OnPhraseRecognized += async (s) =>
        {
            if ((s.confidence == ConfidenceLevel.Medium) || 
                (s.confidence == ConfidenceLevel.High))
            {
                Func<Task> value = null;

                if (speechActions.TryGetValue(s.text.ToLower(), out value))
                {
                    await value();
                }
            }
        };
        this.recognizer.Start();
    }
    async Task OnCreateSessionAsync()
    {
        // TODO: Create a cloud anchor session here.
    }
    async Task OnCreateCubeAsync()
    {
        var cube = GameObject.CreatePrimitive(PrimitiveType.Cube);

        cube.transform.localScale = new Vector3(0.2f, 0.2f, 0.2f);

        cube.transform.position = 
            Camera.main.transform.position + 2.0f * Camera.main.transform.forward;

        cube.GetComponent<Renderer>().material = this.cubeMaterial;

        this.cubes.Add(cube);

        var worldAnchor = cube.AddComponent<WorldAnchor>();
    }
    async Task OnClearCubesAsync()
    {
        foreach (var cube in this.cubes)
        {
            Destroy(cube);
        }
        this.cubes.Clear();
    }
    public async Task SayAsync(string text)
    {
        // Ok, this is probably a fairly nasty way of playing a media stream in
        // Unity but it sort of works so I've gone with it for now 🙂
#if ENABLE_WINMD_SUPPORT
        if (this.synthesizer == null)
        {
            this.synthesizer = new SpeechSynthesizer();
        }
        using (var stream = await this.synthesizer.SynthesizeTextToStreamAsync(text))
        {
            using (var player = new MediaPlayer())
            {
                var taskCompletionSource = new TaskCompletionSource<bool>();

                player.Source = MediaSource.CreateFromStream(stream, stream.ContentType);

                player.MediaEnded += (s, e) =>
                {
                    taskCompletionSource.SetResult(true);
                };
                player.Play();
                await taskCompletionSource.Task;
            }
        }

#endif // ENABLE_WINMD_SUPPORT
    }
#if ENABLE_WINMD_SUPPORT
    SpeechSynthesizer synthesizer;
#endif // ENABLE_WINMD_SUPPORT

    KeywordRecognizer recognizer;
    List<GameObject> cubes;
}

and so this gives me the ability to say “session” to create a session, “cube” to create a cube with a world anchor and “clear” to get rid of all my cubes.

Into that, it’s fairly easy to add an instance of CloudSpatialAnchorSession and create it but note that I’m using the easy path at the moment of configuring it with the ID and Key for my service. In the real world, I’d want to configure it to do auth properly and the service is integrated with AAD auth to make that easier for me if I want to go that way.

I added a member variable of type CloudSpatialAnchorSession and then just added in a little code into my OnCreateSessionAsync method;

    async Task OnCreateSessionAsync()
    {
        if (this.cloudAnchorSession == null)
        {
            this.cloudAnchorSession = new CloudSpatialAnchorSession();
            this.cloudAnchorSession.Configuration.AccountId = ACCOUNT_ID;
            this.cloudAnchorSession.Configuration.AccountKey = ACCOUNT_KEY;
            this.cloudAnchorSession.Error += async (s, e) => await this.SayAsync("Error");
            this.cloudAnchorSession.Start();
        }
    }

and that’s that. Clearly, I’m using speech here to avoid having to make “UI”.

Using the Service Step 5 – Creating a Cloud Anchor

Ok, I’ve already got a local WorldAnchor on any and all cubes that get created here so how do I turn these into cloud anchors?

The first thing of note is that the CloudSpatialAnchorSession has 2 floating point values (0-1) which tell you whether it is ready or not to create a cloud anchor. You call GetSessionStatusAsync and it returns a SessionStatus which reports its ReadyForCreateProgress and RecommendedForCreateProgress values;

If it’s not ready then you need to get your user to walk around a bit (with some nice UX and so on) until it is ready, and you can even query the UserFeedback to see what you might suggest to the user to improve the situation.

It looks like you can also get notified of changes to these values by handling the SessionUpdated event as well.

Consequently, I wrote a little method to try and poll these values, checking for something that was over 1.0f;

    async Task WaitForSessionReadyToCreateAsync()
    {
        while (true)
        {
            var status = await this.cloudAnchorSession.GetSessionStatusAsync();

            if (status.ReadyForCreateProgress >= 1.0f)
            {
                break;
            }
            await Task.Delay(250);
        }
    }

and that seemed to work reasonably although, naturally, the hard-coded 250ms delay might not be the smartest thing to do.

With that in place though I can then add this little piece of code to my OnCreateCubeAsync method just after it attaches the WorldAnchor to the cube;

        var cloudSpatialAnchor = new CloudSpatialAnchor(
            worldAnchor.GetNativeSpatialAnchorPtr(), false);

        await this.WaitForSessionReadyToCreateAsync();

        await this.cloudAnchorSession.CreateAnchorAsync(cloudSpatialAnchor);

        this.SayAsync("cloud anchor created");

and sure enough I see the portal reflecting that I have created an anchor in the cloud;

[screenshot: the Azure portal showing the newly created anchor]

Ok – anchor creation is working! Let’s move on and see if I can get an anchor re-localised.

Using the Service Step 6 – Localising an Anchor

In so far as I can work out so far, the process of ‘finding’ one or more anchors comes down to using a CloudSpatialAnchorWatcher and asking it to look for some anchors for you in one of two ways via the AnchorLocateCriteria;

  • I can give the watcher one or more identifiers for anchors that I have previously uploaded (note that the SDK fills in the cloud anchor ID (string (guid)) in the Identifier property of the CloudSpatialAnchor after it has been saved to the cloud).
  • I can ask the watcher to look for anchors that are nearby another anchor.

I guess the former scenario works when my app has some notion of a location based on something like a WiFi network name, a marker, a GPS co-ordinate or perhaps just some setting that the user has chosen, and this can then be used to find a bunch of named anchors that are supposed to be associated with that place.

Once one or more of those anchors has been found, the nearby mode can perhaps be used to find other anchors near to that site. The way in which anchors become ‘nearby’ is documented in the “Connecting Anchors” help topic here.

It also looks like I have a choice when loading anchors as to whether I want to include the local cache on the device and whether I want to load anchors themselves or purely their metadata so that I can (presumably) do some more filtering before deciding to load. That’s reflected in the properties BypassCache and RequestedCategories respectively.

In trying to keep my test code here as short as possible, I figured that I would simply store in memory any anchor Ids that have been sent off to the cloud and then I’d add another command “Reload” which attempted to go back to the cloud, get those anchors and recreate the cubes in the locations where they were previously stored.

I set the name of the cube to be the anchor ID from the cloud, i.e. after I create the cloud anchor I just do this;

        await this.cloudAnchorSession.CreateAnchorAsync(cloudSpatialAnchor);

        // NEW!
        cube.name = cloudSpatialAnchor.Identifier;

        this.SayAsync("cloud anchor created");

and so that stores the IDs for me. I also need to change the way in which I create the CloudSpatialAnchorSession in order to handle 2 new events, AnchorLocated and LocateAnchorsCompleted;

   async Task OnCreateSessionAsync()
    {
        if (this.cloudAnchorSession == null)
        {
            this.cloudAnchorSession = new CloudSpatialAnchorSession();
            this.cloudAnchorSession.Configuration.AccountId = ACCOUNT_ID;
            this.cloudAnchorSession.Configuration.AccountKey = ACCOUNT_KEY;
            this.cloudAnchorSession.Error += async (s, e) => await this.SayAsync("Error");

            // NEW
            this.cloudAnchorSession.AnchorLocated += OnAnchorLocated;

            // NEW
            this.cloudAnchorSession.LocateAnchorsCompleted += OnLocateAnchorsCompleted;

            this.cloudAnchorSession.Start();
        }
    }

and then I added a new voice command “reload” which grabs all the IDs from the cubes and attempts to create a watcher to reload them;

    async Task OnReloadCubesAsync()
    {
        if (this.cubes.Count > 0)
        {
            var identifiers = this.cubes.Select(c => c.name).ToArray();

            await this.OnClearCubesAsync();

            var watcher = this.cloudAnchorSession.CreateWatcher(
                new AnchorLocateCriteria()
                {
                    Identifiers = identifiers,
                    BypassCache = true,
                    RequestedCategories = AnchorDataCategory.Spatial,
                    Strategy = LocateStrategy.AnyStrategy
                }
            );
        }
    }

and then finally the event handler for each located anchor is as follows – I basically recreate the cube and attach the anchor;

    void OnAnchorLocated(object sender, AnchorLocatedEventArgs args)
    {
        UnityEngine.WSA.Application.InvokeOnAppThread(
            () =>
            {
                var cube = GameObject.CreatePrimitive(PrimitiveType.Cube);

                cube.transform.localScale = new Vector3(0.2f, 0.2f, 0.2f);

                cube.GetComponent<Renderer>().material = this.relocalizedCubeMaterial;

                var worldAnchor = cube.AddComponent<WorldAnchor>();

                worldAnchor.SetNativeSpatialAnchorPtr(args.Anchor.LocalAnchor);

                cube.name = args.Identifier;

                SayAsync("Anchor located");
            },
            false
        );
    }

and the handler for when all anchors have been located just tells me that the process has finished;

    void OnLocateAnchorsCompleted(object sender, LocateAnchorsCompletedEventArgs args)
    {
        SayAsync("Anchor location completed");
        args.Watcher.Stop();
    }

and that’s pretty much it – I found that my anchors reload in much the way that I’d expect them to.

Wrapping Up

As I said at the start of the post, this was just me trying out a few rough ideas and I’ve covered nothing that isn’t already present in the official samples but I found that I learned a few things along the way and I feel like I’m now a little more conversant with this service. Naturally, I need to revisit and go through the process of updating/deleting anchors and also of looking at gathering ‘nearby’ anchors and re-localising them but I think that I “get it” more than I did at the start of the post.

The other thing I need to do is to try this out from a different kind of device, more than likely an Android phone but that’s for another post 🙂

A Simple glTF Viewer for HoloLens

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

A quick note – I’m still not sure whether I should bring this blog back to life having paused it but I had a number of things running around in my head that are easier if written down and so that’s what I’ve done 🙂

Viewing 3D Files on a HoloLens

A few weeks ago, a colleague came to me with two 3D models packaged in files and said “I just want to show these 2 models to a customer on a HoloLens”.

I said to him;

“No problem, open up the files in 3D Viewer on the PC, have a look at them and then transfer them over to HoloLens and view them in 3D Viewer there”

Having passed on this great advice, I thought I’d better try it out myself and, like much of my best advice, it didn’t actually work 😉

Here’s why it doesn’t work. I won’t use the actual models in this blog post so let’s assume that it was this model from Remix3D;

[screenshot: a Surface Book 2 model on Remix3D]

Now, I can open that model in Paint3D or 3D Viewer, both of which are free and built-in on Windows 10 and I can get a view something like this one;

[screenshot: the model open in 3D Viewer on the PC]

which tells me that this model is 68,000 polygons so it’s not a tiny model but it’s not a particularly big one either and I’d expect that it would display fine on a mobile device, which might not be the case if it was 10x or 100x as big.

Now, knowing that there’s an application on my PC called “3D Viewer” and knowing that there’s one on my HoloLens called “3D Viewer” might lead me to believe that they are the same application with the same set of capabilities and so I might just expect to be able to move to the HoloLens, run the 3D Viewer application there and open the same model.

But I can’t.

3D Viewer on PC

If you run up the 3D Viewer on a PC then you get an app which runs in a Window and which displays a 3D model with a whole range of options including being able to control how the model is rendered, interacting with animations, changing the lighting and so on;

[screenshot: 3D Viewer on the PC with its rendering options]

The application lets you easily load up model files from the file system or from the Remix3D site;

[screenshot: 3D Viewer’s options for loading models from the file system or Remix3D]

You can also use this application to “insert” the model into the real-world via a “Mixed Reality” mode as below;

[screenshot: the model inserted into the real world via the Mixed Reality mode]

I’d say that (for me) this is very much at the “Augmented Reality” end of the spectrum in that while the model here might look like it’s sitting on my monitor, I can actually place it in mid-air so I’m not sure that it’s really identifying planes for the model to sit on. I can pick up my laptop and wander around the model and that works to some extent although I find it fairly easy to confuse the app.

One other thing that I’d say in passing is that I have no knowledge around how this application offers this experience or how a developer would build a similar experience – I’m unaware of any platform APIs that help you build this type of thing for a PC using a regular webcam in this way.

3D Viewer on HoloLens

3D Viewer on HoloLens also runs in a window as you can see here;

image

and you can also open up files from the file system or from the Remix3D site or from a pre-selected list of “Holograms” which line up with the content that used to be available in the original “Holograms” app going all the way back to when the device was first made available.

The (understandable) difference here is that when you open a model, it is not displayed as a 3D object inside of the application’s Window as that would be a bit lame on a HoloLens device.

Instead, the model is added to the HoloLens shell as shown below;

image

This makes sense and it’s very cool but on the one hand it’s not really an immersive viewing application – it’s a 2D application which is invoking the shell to display a 3D object.

As an aside, it’s easy to ask the Shell to display a 3D object using a URI scheme and I wrote about that here a while ago and I suspect (i.e. I don’t know) that this is what the 3D Viewer application is doing here;

Placing 3D Models in the Mixed Reality Home

The other aspect of this is that 3D models displayed by the Shell have requirements;

Create 3D models for use in the home

and so you can’t just display an arbitrary model here and I tend to find that most models that I try and use in this way don’t work.

For example, if we go back to the model of a Surface Book 2 that I displayed in 3D Viewer on my PC then I can easily copy that model across to my HoloLens using the built-in “Media Transfer Protocol” support which lets me just see the device’s storage in Explorer once I’ve connected it via USB and then open it up in 3D Viewer where I see;

image

and so I find that, regardless of their polygon count, most general models don’t open within the 3D Viewer on HoloLens – they tend to display this message instead and that’s understandable given that the application is trying to;

  • do the right thing by not having the user open up huge models that won’t then render well
  • integrate the models into the Shell experience which has requirements that presumably can’t just be ignored.

So, if you want a simple viewer which just displays an arbitrary model in an immersive setting then 3D Viewer isn’t quite so general purpose.

This left me stuck with my colleague who wanted something simple to display his models and so I made the classic mistake.

I said “I’ll write one for you” 😉

This Does Not Solve the Large/Complex Models Problem

I should stress that me setting off to write a simple, custom viewer is never going to solve the problem of displaying large, complex 3D models on a mobile device like a HoloLens and, typically, you need to think about some strategy for dealing with that type of complexity on a mobile device. There are guidelines around this type of thing here;

Performance recommendations for HoloLens apps

and there are various tools/services out there to help with this type of thing.

My colleague originally provided me with a 40K polygon model and a 500K polygon model.

I left the 40K model alone and used 3DS Max to optimise the 500K poly model down to around 100K which rendered fine for me on HoloLens through the application that I ended up building.

It took a bit of experimentation in the different tools to find the right way to go about it as some tools failed to load the models, others produced results that didn’t look great, etc. but it didn’t take too long to decimate the larger one.

Building glTF Viewer Version 1.0

So, to help out with the promise I’d made to my colleague, I built a simple app. It’s in the Windows Store over here and the source for it is on Github over here.

It’s currently heading towards Version 2.0 when I merge the branches back together and get the Store submission done.

For version 1.0, what I wanted was something that would allow a user to;

  • open a 3D file in .GLB/.GLTF format from their HoloLens storage.
  • display the 3D model from it.
  • manipulate it by scaling, rotating and translating.
  • have as little UI as possible and drive any needed interactions through speech.

and that was pretty much all that I wanted – I wanted to keep it very simple and as part of that I decided to deliberately avoid;

  • anything to do with other 3D model file formats but was, instead, quite happy to assume that people would find conversion tools (e.g. Paint3D, 3D Builder, etc) that could generate single file (.GLB) or multi-file (.GLTF) model files for them to import.
  • any attempt to open up files from cloud locations via OneDrive etc.

With that in mind, I set about trying to build out a new Unity-based application and I made a couple of quick choices;

  • that I would use the Mixed Reality Toolkit for Unity and I chose to use the current version of the Toolkit rather than the vNext toolkit as that’s still “work in progress” although I plan to port at a later point.
    • this meant that I could follow guidance and use the LTS release of Unity – i.e. a 2017.4.* version which is meant to work nicely with the toolkit.
  • that I would use UnityGLTF as a way of reading GLTF files inside of Unity.
  • that I would use sub-modules in git as a way of bringing those two repos into my project as described by my friend Martin over here.

I also made a choice that I would use standard file dialogs for opening up files within my application. This might seem like an obvious choice but those dialogs only really work nicely once your HoloLens is running on the “Redstone 5” version of Windows 10 as documented here;

Mixed Reality Release Notes – Current Release Notes

and so I was limiting myself to only running on devices that are up-to-date but I don’t think that’s a big deal for HoloLens users.

In terms of how the application is put together, it’s a fairly simple Unity application using only a couple of features from the Mixed Reality Toolkit beyond the base support for cameras, input etc.

Generally, beyond a few small snags with Unity when it came to generating the right set of assets for the Windows Store I got that application built pretty quickly & submitted it to the Store.

However, I did hit a few small challenges…

A Small Challenge with UnityGLTF

I did hit a bit of a snag because the Mixed Reality Toolkit makes some use of pieces from a specific version of UnityGLTF to provide functionality which loads the Windows Mixed Reality controller models when running on an immersive headset.

UnityGLTF (scripts and binaries) in the Mixed Reality Toolkit

I wanted to be able to bring all of UnityGLTF (a later version) into my project alongside the Mixed Reality Toolkit and so that caused problems because both scripts & binaries would be duplicated and Unity wasn’t very happy about that 🙂

I wrote a little ‘setup’ script to remove the GLTF folder from the Mixed Reality Toolkit which was ok except it left me with a single script named MotionControllerVisualizer.cs which wouldn’t build because it had a dependency on UnityGLTF methods that were no longer part of the Unity GLTF code-base (i.e. I happened to have the piece of code which seemed to have an out-of-date dependency).

That was a little tricky for me to fix so I got rid of that script too and fixed up the scripts that took a dependency on it by adding my own, mock implementation of that class into my project knowing that nothing in my project was ever going to display a motion controller anyway.

It’s all a bit “hacky” but it got me to the point where I could combine the MRTK and UnityGLTF in one place and build out what I wanted.

A Small Challenge with Async/Await and CoRoutines

One other small challenge that I hit while putting together my version 1.0 application is the mixing of the C# async/await model with Unity’s CoRoutines.

I’ve hit this before and I fully understand where Unity has come from in terms of using CoRoutines but it still bites me in places and, specifically, it bit me a little here in that I had code which was using routines within the UnityGLTF which are CoRoutine based and I needed to get more information around;

  • when that code completed
  • what exceptions (if any) got thrown by that code

There are a lot of posts out there on the web around this area and, in my specific case, I had to write some extra code to glue together running a CoRoutine, catching exceptions from it and tying it into async/await. It wasn’t too challenging, it just felt like “extra work” that I’m sure in later years won’t have to be done as these two models get better aligned. Ironically, this situation was possibly more clear-cut when async/await weren’t really available to use inside of Unity’s scripts.
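
A sketch of the kind of glue I mean is below – not the code from the app, just an illustration of the idea with made-up names; it runs a CoRoutine and surfaces its completion (and any exception it throws) as a Task;

```csharp
using System;
using System.Collections;
using System.Threading.Tasks;
using UnityEngine;

public static class CoroutineTaskExtensions
{
    // Starts an IEnumerator-based coroutine on the given MonoBehaviour and returns a Task
    // which completes when the coroutine finishes, or faults if the coroutine throws.
    public static Task AsTask(this MonoBehaviour host, IEnumerator coroutine)
    {
        var completion = new TaskCompletionSource<bool>();
        host.StartCoroutine(Wrap(coroutine, completion));
        return completion.Task;
    }

    static IEnumerator Wrap(IEnumerator inner, TaskCompletionSource<bool> completion)
    {
        var moved = true;

        while (moved)
        {
            try
            {
                // MoveNext is where the coroutine's own code runs, so this is where
                // any exception that it throws will surface.
                moved = inner.MoveNext();
            }
            catch (Exception ex)
            {
                completion.TrySetException(ex);
                moved = false;
            }
            if (moved)
            {
                // Hand whatever the inner coroutine yielded back to Unity's scheduler.
                yield return inner.Current;
            }
        }
        completion.TrySetResult(true);
    }
}
```

With something like that in place, an async method running on the UI thread can simply await the CoRoutine-based routine rather than starting it and hoping for the best.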

Another Small Challenge with CoRoutines & Unity’s Threading Model

Another small challenge here is that the UnityGLTF code which loads a model needs to, naturally, create GameObjects and other UI constructs inside of Unity which aren’t thread-safe and have affinity to the UI thread. So, there’s no real opportunity to run this potentially expensive CoRoutine on some background thread but, rather, it hogs the UI thread a bit while it’s loading and creating GameObjects.

I don’t think that’s radically different from other UI frameworks but I did contemplate trying to abstract out the creation of the UI objects so as to defer it until some later point when it could all be done in one go but I haven’t attempted to do that and so, currently, while the GLTF loading is happening my UI is displaying a progress wheel which can miss a few updates 🙁

Building glTF Viewer Version 2.0

Having produced my little Version 1.0 app and submitted it to the Store, the one thing that I really wanted to add was the support for a “shared holographic experience” such that multiple users could see the same model in the same physical place. It’s a common thing to want to do with HoloLens and it seems to be found more in large, complex, enterprise apps than in just simple, free tools from the Store and so I thought I would try and rectify that a little.

In doing so, I wanted to try and keep any network “infrastructure” as minimal as possible and so I went with the following assumptions.

  • that the devices that wanted to share a hologram were in the same space on the same network and that network would allow multicast packets.
  • sharing is assumed in the sense that the experience would automatically share holograms rather than the user having to take some extra steps.
  • that not all the devices would necessarily have the files for the models that are loaded on the other devices.
  • that there would be no server or cloud connectivity required.

The way in which I implemented this centres around a HoloLens running the glTF Viewer app acting as a basic web server which serves content out of its 3D Objects folder such that other devices can request that content and copy it into their own 3D Objects folder.

The app then operates as below to enable sharing;

  • When a model is opened on a device
    • The model is given a unique ID.
    • A list of all the files involved in the model is collected (as GLTF models can be packaged as many files) as the model is opened.
    • A file is written to the 3D Objects folder storing a relative URI for each of these files to be obtained remotely by another device.
    • A spatial anchor for the model is exported into another file stored in the 3D Objects folder.
    • A UDP message is multi-casted to announce that a new model (with an ID) is now available from a device (with an IP address).
    • The model is made so that it can be manipulated (scale, rotate, translate) and those manipulations (relative to the parent) are multi-cast over the network with the model identifier attached to them.
  • When a UDP message announcing a new model is received on a device
    • The device asks the user whether they want to access that model.
    • The device does web requests to the originating device asking for the URIs for all the files involved in that model.
    • The device downloads (if necessary) each model file to the same location in its 3D Objects folder.
    • The device downloads the spatial anchor file.
    • The device displays the model from its own local storage & attaches the spatial anchor to place it in the same position in the real world.
    • The model is made so that it cannot be manipulated but, instead, picks up any UDP multicasts with update transformations and applies them to the model (relative to its parent which is anchored).

and that’s pretty much it.
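
For reference, the spatial-anchor step in that list is roughly the shape below in Unity terms – a sketch assuming WorldAnchorTransferBatch as the export mechanism, with placeholder identifiers, and in reality you’d wait for the anchor to report itself as located before exporting;

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.WSA;
using UnityEngine.XR.WSA.Sharing;

public static class AnchorExport
{
    // Serializes a WorldAnchor attached to the model's root object so that the resulting
    // bytes can be written out to a file (e.g. into the 3D Objects folder) for other devices.
    public static void ExportAnchor(GameObject modelRoot, Action<byte[]> saveExportedBytes)
    {
        var anchor = modelRoot.GetComponent<WorldAnchor>();

        if (anchor == null)
        {
            anchor = modelRoot.AddComponent<WorldAnchor>();
        }

        var batch = new WorldAnchorTransferBatch();
        batch.AddWorldAnchor("model", anchor);

        var buffer = new List<byte>();

        WorldAnchorTransferBatch.ExportAsync(
            batch,
            data => buffer.AddRange(data),      // called repeatedly with chunks of serialized data
            reason =>
            {
                if (reason == SerializationCompletionReason.Succeeded)
                {
                    saveExportedBytes(buffer.ToArray());
                }
            });
    }
}
```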

This is all predicated on the idea that I can have a HoloLens application which is acting as a web server and I had in mind that this should be fairly easy because UWP applications (from 16299+) now support .NET Standard 2.0 and HttpListener is part of .NET Standard 2.0 and so I could see no real challenge with using that type inside of my application as I’d written about here;

UWP and .NET Standard 2.0–Remembering the ‘Forgotten’ APIs 🙂

but there were a few challenges that I met with along the way.
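
Before getting to those, here’s a rough sketch of the web-server idea itself – not the app’s actual code, with my own naming and URL layout, and the Windows.Storage pieces mean it only compiles for the UWP player build (i.e. it sits behind ENABLE_WINMD_SUPPORT);

```csharp
#if ENABLE_WINMD_SUPPORT
using System.IO;
using System.Net;
using System.Threading.Tasks;
using Windows.Storage;

// Serves files out of the 3D Objects folder over HTTP - a sketch, not the real thing.
public class ModelFileServer
{
    readonly HttpListener listener = new HttpListener();

    public ModelFileServer(int port)
    {
        listener.Prefixes.Add($"http://+:{port}/");
    }

    public async Task RunAsync()
    {
        listener.Start();

        while (listener.IsListening)
        {
            var context = await listener.GetContextAsync();
            await ServeFileAsync(context);
        }
    }

    static async Task ServeFileAsync(HttpListenerContext context)
    {
        try
        {
            // Treat the request path as a path relative to the 3D Objects folder
            // (which needs the 'Objects 3D' capability declared in the app manifest).
            var relativePath = context.Request.Url.AbsolutePath.TrimStart('/').Replace('/', '\\');
            var file = await KnownFolders.Objects3D.GetFileAsync(relativePath);

            using (var source = await file.OpenStreamForReadAsync())
            {
                context.Response.StatusCode = 200;
                await source.CopyToAsync(context.Response.OutputStream);
            }
        }
        catch
        {
            context.Response.StatusCode = 404;
        }
        finally
        {
            context.Response.Close();
        }
    }
}
#endif
```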

Challenge Number 1 – Picking up .NET Standard 2.0

I should say that I’m long past the point of being worried about being seen to not understand something and am more at the point of realising that I don’t really understand anything 🙂

I absolutely did not understand the ramifications of wanting to modify my existing Unity project to start making use of HttpListener 🙂

Fairly early on, I came to a conclusion that I wasn’t going to be able to use HttpListener inside of a Unity 2017.4.* project.

Generally, the way in which I’ve been developing in Unity for HoloLens runs something like this;

  • I am building for the UWP so that’s my platform.
  • I use the .NET scripting backend.
  • I write code in the editor and I hide quite a lot of code from the editor behind ENABLE_WINMD_SUPPORT conditional compilation because the editor runs on Mono and it doesn’t understand the UWP API surface.
  • I press the build button in Unity to generate a C#/.NET project in Visual Studio.
  • I build that project and can then use it to deploy, debug my C#/UWP application and generate store packages and so on.

It’s fairly simple and, while it takes longer than just working in Visual Studio, you get used to it over time.
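
As an aside, the ENABLE_WINMD_SUPPORT point in that list typically looks something like this in a script – the class below is just an illustration of the pattern, not code from the app;

```csharp
using UnityEngine;

public class ObjectsFolderLogger : MonoBehaviour
{
    void Start()
    {
#if ENABLE_WINMD_SUPPORT
        // Only compiled into the UWP player build - the editor (Mono) never sees this code.
        // (KnownFolders.Objects3D also needs the 'Objects 3D' capability in the manifest.)
        var folder = Windows.Storage.KnownFolders.Objects3D;
        Debug.Log($"3D Objects folder: {folder.Path}");
#else
        // Editor / non-UWP path so that the script still compiles and runs inside the editor.
        Debug.Log("UWP storage APIs not available in this environment");
#endif
    }
}
```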

One thing that I haven’t really paid attention to as part of that process is that even if I select the very latest Windows SDK in Unity as below;

image

then the Visual Studio project that Unity generates doesn’t pick up the latest .NET packages but, instead, seems to downgrade my .NET version as below;

image

I’d struggled with this before (in this post under the “Package Downgrade Issue”) without really understanding it but I think I came to a better understanding of this as part of trying to get HttpListener into my project here.

In bringing in HttpListener, I hit build problems and I instantly assumed that I needed to upgrade Unity because Unity 2017.* does not offer .NET Standard 2.0 as an API Compatibility Level as below;

image

and I’d assumed that I’d need to move to a Unity 2018.* version in order to pick up .NET Standard 2.0 as I’d seen that Unity 2018.* had started to support .NET Standard 2.0.

Updated scripting runtime in Unity 2018.1: What does the future hold?

and so needing to pick up a Unity 2018.* version in order to use .NET Standard 2.0 didn’t surprise me. I got hold of version 2018.2.16f1, opened up my project in there and switched to .NET Standard 2.0, which seemed like a fine thing to do;

image

but it left me with esoteric build failures as I hadn’t realised that Unity’s deprecation of the .NET Scripting Backend as per this post;

Deprecation of support for the .Net Scripting backend used by the Universal Windows Platform

had a specific impact in that newer things, like SDK 16299 with its support for .NET Standard 2.0, didn’t get implemented in the .NET Scripting Backend for Unity.

They are only present in the IL2CPP backend and I presume that’s why my generated .NET projects have been downgrading the .NET package used.

So, if you want .NET Standard 2.0 then you need SDK 16299+ and that dictates Unity 2018.+ and that dictates moving to the IL2CPP backend rather than the .NET backend.

I verified this over here by asking Unity about it;

2018.2.16f1, UWP, .NET Scripting Backend, .NET Standard 2.0 Build Errors

and that confirms that the .NET Standard 2.0 APIs are usable from the editor and from the IL2CPP back-end but they aren’t going to work if you’re using .NET Scripting Backend.

I did try. I hid my .NET code in libraries and referenced them but, just as the helpful person on the forum told me – “that didn’t work”.

Challenge Number 2 – Building and Debugging with IL2CPP on UWP/HoloLens

Switching to the IL2CPP back-end really changed my workflow around Unity. Specifically, it emphasised that I need to spend as much time as possible in the editor because I find that the two phases of;

  • building inside of the Unity editor
  • building the C++ project generated by the Unity editor

make for a much lengthier process than doing the same thing on the .NET backend and Unity has an article about trying to improve this;

Optimizing IL2CPP build times

but I didn’t really find that I could get my build times to come down much and I’d find that maybe a one-line change could take me into a 20+ minute build cycle.

The other switch in my workflow was around debugging. There are a couple of options here. It’s possible to debug the generated C++ code and Unity has an article on it here;

Universal Windows Platform: Debugging on IL2CPP Scripting Backend

but I’d have to say that it’s pretty unproductive trying to find the right piece of code and then step your way through the generated C++, which bears very little resemblance to the original C# scripts.

That said, you can do it and I’ve had some success with it and one aspect of it is “easy” in that you just open the project, point it at a HoloLens/emulator for deployment & then press F5 and it works.

The other approach is to debug the .NET code because Unity does have support for this as per this thread;

About IL2CPP Managed Debugger

and the details are given again in this article;

Universal Windows Platform: Debugging on IL2CPP Scripting Backend

although I would pay very close attention to the settings that control this as below;

image

and I’d also pay very close attention to the capabilities that your application must have in order to operate as a debuggee. I had to ask on the Unity Forums how to get this working;

Unity 2018.2.16f1, UWP, IL2CPP, HoloLens RS5 and Managed Debugging Problems

but I did get it to work pretty reliably on HoloLens in the end and I’d flag a few things that I found;

  • sometimes the debugger wouldn’t attach to my app & I’d have to restart the app. It would be listed as a target in Unity’s “Attach To” dialog in Visual Studio but attaching just did nothing.
  • that the debugger can be very slow – sometimes I’d wait a long time for breakpoints to become active.
  • that the debugger quite often seems to step into places where it can’t figure out the stack frame. Pressing F10 seemed to fix that.
  • that the debugger’s step-over/step-into sometimes didn’t seem to work.
  • that the debugger’s handling of async/await code could be a bit odd – the instruction pointer would jump around in Visual Studio as though it had got lost but the code seemed to be working.
  • that hovering over variables and putting them into the watch windows was quite hit-and-miss.
  • that evaluating arbitrary .NET code in the debugger doesn’t seem to work (I’m not really surprised).
  • breaking on exceptions isn’t a feature as far as I can tell – I think the debugger tells you so as you attach but I’m quite a fan of stopping on first-chance exceptions as a way of seeing what code is doing.

I think that Unity is working on all of this and I’ve found them to be great in responding on their forums and on Twitter – it’s very impressive.

In my workflow, I tended to use both the native debugger & the managed debugger to try and diagnose problems.

One other thing that I did find – I had some differences in behaviour between my app when I built it with “script debugging” and when I didn’t. It didn’t affect me too much but it did lower my overall level of confidence in the process.

Putting that to one side, I’d found that I could move my existing V1.0 project into Unity 2018.* and change the backend from .NET to IL2CPP and I could then make use of types like HttpListener and build and debug.

However, I found that the code stopped working 🙂

Challenge 3 – File APIs Change with .NET Standard 2.0 on UWP

I hadn’t quite seen this one coming. There’s a piece of code within UnityGLTF which loads files;

FileLoader.cs

In my app, I open a file dialog, have the user select a file (which might result in loading 1 or many files depending on whether this is a single-file or multi-file model) and it runs through a variant of this FileLoader code.

That code uses File.Exists() and File.OpenRead() and, suddenly, I found that the code was no longer working for files which did exist and which my UWP app did have access to.

It’s important to note that the file in question would be a brokered file for the UWP app (i.e. one which it accesses via a broker to ensure it has the right permissions) rather than just, say, a file within the app’s own package or its own dedicated storage. In particular, my file would reside within the 3D Objects folder.

How could that break? It comes back to .NET Standard 2.0 because these types of File.* functions work differently for UWP brokered files depending on whether you are on SDK 16299+ with .NET Standard 2.0 or on an earlier SDK before .NET Standard 2.0 came along.

The thorny details of that are covered in this forum post;

File IO operations not working when broadFileSystemAccess capability declared

which gives some of the detail but, essentially, for my use case File.Exists and File.OpenRead were now causing me problems and so I had to replace some of that code which brings me back to…

Challenge 4 – Back to CoRoutines, Enumerators and Async

As I flagged earlier, mixing and matching an async model based around CoRoutines in Unity (which is mostly AFAIK about asynchronous rather than concurrent code) with one based around Tasks can be a bit of a challenge.

With the breaking change to File.OpenRead(), I had to revisit the FileLoader code and modify it such that it still presented an IEnumerator-based pattern to the rest of the UnityGLTF code while, internally, it needed to move from using the synchronous File.OpenRead() to the asynchronous StorageFile.OpenReadAsync().

It’s not code that I’m particularly proud of and I wouldn’t like to highlight it here, but it felt like one of those situations where I got boxed into a corner and had to make the best of what I had to work with 🙂
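
That said, the general shape of the change is something like the sketch below – my own naming rather than the actual FileLoader code, with the StorageFile work hidden behind ENABLE_WINMD_SUPPORT so that the editor still builds;

```csharp
using System.Collections;
using System.IO;
using System.Threading.Tasks;

// Presents the IEnumerator pattern that UnityGLTF expects while, on UWP, doing the real
// work via the asynchronous, broker-friendly StorageFile APIs.
public class BrokeredFileStreamLoader
{
    public Stream LoadedStream { get; private set; }

    public IEnumerator LoadStream(string fullPath)
    {
#if ENABLE_WINMD_SUPPORT
        var task = OpenViaStorageFileAsync(fullPath);

        while (!task.IsCompleted)
        {
            yield return null;
        }
        if (task.IsFaulted)
        {
            throw task.Exception.InnerException;
        }
        LoadedStream = task.Result;
#else
        // In the editor, File.OpenRead is fine because we're outside the UWP sandbox.
        LoadedStream = File.OpenRead(fullPath);
        yield break;
#endif
    }

#if ENABLE_WINMD_SUPPORT
    static async Task<Stream> OpenViaStorageFileAsync(string fullPath)
    {
        var file = await Windows.Storage.StorageFile.GetFileFromPathAsync(fullPath);
        var randomAccessStream = await file.OpenReadAsync();

        return randomAccessStream.AsStreamForRead();
    }
#endif
}
```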

Challenge 5 – ProgressRings in the Mixed Reality Toolkit

I’m embarrassed to admit that I spent a lot longer trying to get a ProgressRing from the Mixed Reality Toolkit to work than I should have.

I’ve used it before, there’s an example over here;

Progress Example

but could I get it to show up? No.

In the end, I decided that there was something broken in the prefab that makes up the progress ring and I switched from using the Solver Radial View to using the Solver Orbital script to manage how the progress ring moves around in front of the user & that seemed to largely get rid of my problems.

Partially, this was a challenge because I hit it at the time when I was struggling to get used to my new mode of debugging and I just couldn’t get this ring to show up.

In the end, I solved it by just making a test scene and watching how that behaved in the editor at runtime before applying that back to my real scene which is quite often how I seem to solve these types of problems in Unity.

Challenge 6 – UDP Multicasting on Emulators and HoloLens

I chose to use UDP multicasting as a way for one device to notify others on the same network that it had a new model for them to potentially share.

This seemed like a reasonable choice but it can make it challenging to debug as I have a single HoloLens and have never been sure whether a HoloLens emulator can/can’t participate in UDP multicasting or whether there are any settings that can be applied to the virtual machine to make that work.

I know that when I wrote this post I’d failed to get multicasting working on the emulator and, this time around, I tried a few combinations before giving up and writing a test-harness for my PC to act as a ‘mock’ HoloLens from the point of view of being able to generate/record/playback messages it received from the real HoloLens.

I’ve noticed a number of forum posts over time asking whether a HoloLens can receive UDP traffic at all, and there are more of those out there than you might expect.

I can certainly verify that a UWP app on HoloLens can send/receive UDP multicast traffic but I’d flag that I have seen situations where my current device (running RS5) has got into a situation where UDP traffic seems to fail to be delivered into my application until I reboot the device. I’ve seen it very occasionally but more than once so I’d flag that this can happen on the current bits & might be worth bearing in mind for anyone trying to debug similar code on similar versions.
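
For what it’s worth, the kind of multicast send/receive that I’m talking about is along the lines below – a sketch rather than the app’s code, with a made-up group address, port and message format, and it assumes the relevant networking capabilities are declared;

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class ModelAnnouncements
{
    // Assumed group/port - purely illustrative values.
    static readonly IPAddress MulticastGroup = IPAddress.Parse("239.0.0.1");
    const int Port = 49152;

    // Announce that a model with the given id is now available from this device.
    public static async Task AnnounceAsync(Guid modelId)
    {
        using (var client = new UdpClient())
        {
            var payload = Encoding.UTF8.GetBytes($"NEWMODEL:{modelId}");
            await client.SendAsync(payload, payload.Length, new IPEndPoint(MulticastGroup, Port));
        }
    }

    // Listen for announcements from other devices, handing back the model id + sender address.
    public static async Task ListenAsync(Action<Guid, IPAddress> onNewModel)
    {
        using (var client = new UdpClient(Port))
        {
            client.JoinMulticastGroup(MulticastGroup);

            while (true)
            {
                var result = await client.ReceiveAsync();
                var message = Encoding.UTF8.GetString(result.Buffer);

                if (message.StartsWith("NEWMODEL:"))
                {
                    onNewModel(
                        Guid.Parse(message.Substring("NEWMODEL:".Length)),
                        result.RemoteEndPoint.Address);
                }
            }
        }
    }
}
```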

Closing Off

I learned quite a lot in putting this little test application together – enough to think it was worth opening up my blog and writing down some of the links so that I (or others) can find them again in the future.

If you’ve landed here via search or have read the whole thing ( ! ) then I hope you found something useful.

I’m not sure yet whether this one-off-post is the start of me increasing the frequency of posting here so don’t be too surprised if this blog goes quiet again for a while but do feel very free to reach out if I can help around these types of topics and, of course, feel equally free to point out where I’ve made mistakes & I’ll attempt to fix them 🙂

Update – One Last Thing (Challenge 7), FileOpenPicker, Suspend/Resume and SpeechRecognizer

Finding a Suspend/Resume Problem with Speech

I’d closed off this blog post and published it to my blog and I’d shipped version 2.0 of my app to the Store when I came across an extra “challenge” in that I noticed that my voice commands seemed to be working only part of the time and, given that the app is driven by voice commands, that seemed like a bit of a problem.

It took me a little while to figure out what was going on because I took the app from the Store and installed it and opened up a model using the “open” command and all was fine but then I noticed that I couldn’t then use the “open” command for a second time or the “reset” command for a first time.

Naturally, I dusted the code back off and rebuilt it in debug mode and tried it out and it worked fine.

So, I rebuilt in release mode and I got mixed results in finding that sometimes things worked and other times they didn’t and it took me a while to realise that it was the debugger which was making the difference. With the debugger attached, everything worked as I expected but when running outside of the debugger, I would find that the voice commands would only work until the FileOpenPicker had been on the screen for the first time. Once that dialog had been on the screen the voice commands no longer worked and that was true whether a file had been selected or whether the dialog had simply been cancelled.

So, what’s going on? Why would putting a file dialog onto the screen cause the application’s voice commands to break and only when the application was not running under a debugger?

The assumption that I made was that the application was suffering from a suspend/resume problem – that opening the file dialog was causing my application to suspend while a file was being chosen and that, somewhere in that suspend/resume cycle, its voice commands were getting broken.

Why would my app suspend/resume just to display a file picker? I’d noticed previously that there is a file dialog process running on HoloLens so perhaps it’s fair to assume/guess that opening a file involves switching to another app altogether and, naturally, that might mean that my application suspends during that process.

I remember that this was also how things worked on the phone and (if I remember correctly) the separate-process model on phones was the reason why the UWP ended up with AndContinue() style APIs in the early days when the phone and PC platforms were being unified.

Taking that assumption further – it’s well known that when you are debugging a UWP app in Visual Studio the “Process Lifecycle Management” (PLM) events are disabled by the debugger. That’s covered in the docs here and so I could understand why my app might be working in the debugger and not working outside of the debugger.

That said, I did find that my app still worked when I manually used the debugger’s capability to suspend/resume (via the toolbar) which was a bit of a surprise as I expected it to break but I was fairly convinced by now that my problem was due to suspend/resume.

So, it seems like I have a suspend/resume problem. What to do about it?

Resolving the Suspend/Resume Problem with Speech

My original code was using speech services provided by the Mixed Reality Toolkit’s SpeechInputSource.cs and SpeechInputHandler.cs utilities and I tried quite a few experiments around enabling/disabling these around suspend/resume events from the system but I didn’t find a recipe that made them work.
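
For reference, hooking those suspend/resume events from a Unity script looks something like the sketch below – purely illustrative, and on its own it wasn’t enough to give me a working recipe;

```csharp
using UnityEngine;

public class SuspendResumeWatcher : MonoBehaviour
{
    void Start()
    {
#if ENABLE_WINMD_SUPPORT
        // These are the UWP process lifecycle events - the idea being to tear down and
        // rebuild the speech handling around them.
        Windows.ApplicationModel.Core.CoreApplication.Suspending += (s, e) =>
            Debug.Log("Suspending - a good point to stop/dispose speech handling");

        Windows.ApplicationModel.Core.CoreApplication.Resuming += (s, e) =>
            Debug.Log("Resuming - a good point to recreate speech handling");
#endif
    }
}
```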

I took away my use of that part of the MRTK and started directly using SpeechRecognizer myself so that I had more control of the code & I kept that code as minimal as possible.

I still hit problems. My code was organised around spinning up a single SpeechRecognizer instance, keeping hold of it and repeatedly asking it via the RecognizeAsync() method to recognise voice commands.

I would find that this code would work fine until the process had suspended/resumed and then it would break. Specifically, the RecognizeAsync() code would return Status values of Success and Confidence values of Rejected.

So, it seemed that having a SpeechRecognizer kicking around across suspend/resume cycles wasn’t the best strategy and I moved to an implementation which takes the following approach;

  • instantiate SpeechRecognizer
  • add to its Constraints collection an instance of SpeechRecognitionListConstraint
  • compile the constraints via CompileConstraintsAsync
  • call RecognizeAsync making a note of the Text result if the API returns Success and confidence is Medium/High
  • Dispose of the SpeechRecognizer and repeat regardless of whether RecognizeAsync returns a relevant value or not

and the key point seemed to be to avoid keeping a SpeechRecognizer instance around in memory and repeatedly calling RecognizeAsync on it expecting that it would continue to work across suspend/resume cycles.
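
As a rough sketch of that loop – my own naming, UWP-only code that would sit behind ENABLE_WINMD_SUPPORT, and it assumes the microphone capability is declared – it looks something like this;

```csharp
#if ENABLE_WINMD_SUPPORT
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

public static class SpeechCommandLoop
{
    // Repeatedly: create a recognizer, listen for one command, dispose, go round again.
    public static async Task RunAsync(string[] commands, Action<string> onCommand)
    {
        while (true)    // real code would have some way of stopping the loop
        {
            using (var recognizer = new SpeechRecognizer())
            {
                recognizer.Constraints.Add(new SpeechRecognitionListConstraint(commands));

                var compilation = await recognizer.CompileConstraintsAsync();

                if (compilation.Status != SpeechRecognitionResultStatus.Success)
                {
                    break;
                }

                var result = await recognizer.RecognizeAsync();

                if ((result.Status == SpeechRecognitionResultStatus.Success) &&
                    ((result.Confidence == SpeechRecognitionConfidence.High) ||
                     (result.Confidence == SpeechRecognitionConfidence.Medium)))
                {
                    onCommand(result.Text);
                }
            }
        }
    }
}
#endif
```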

I tried that out, it seems to work & I shipped it off to the Store as a V3.0.

I have to admit that it didn’t feel like a very scientific approach to getting something to work – it was more trial and error so if someone has more detail here I’d welcome it but, for the moment, it’s what I settled on.

One last point…

Debugging this Scenario

One interesting part of trying to diagnose this problem was that I found the Unity debugger to be quite helpful.

I found that I could do a “script debugging” build from within Unity and then run that up on my device. I could then use my first speech command to open/cancel the file picker dialog before attaching Unity’s script debugger to that running instance in order to take a look around the C# code and see how opening/cancelling the file dialog had impacted my code that was trying to handle speech.

In some fashion, I felt like I was then debugging the app (via Unity) without really debugging the app (via Visual Studio). It could be a false impression but, ultimately, I think I got it working via this route 🙂