More Scene Understanding – The Largest Table in the Room…

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens or Azure Mixed Reality other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

This is purely experimental but, after writing yesterday’s post, I kept thinking about how this notion of scene understanding could do all kinds of things for an application, so I wanted to experiment a little more.

Returning to ‘Spatial Understanding’

As an example – one of the earliest pieces of ‘magic’ that I saw on the original HoloLens device was in the application ‘Fragments’ where there are scenes that make use of the furniture within a room. For example, in the scene below the application ‘magically’ seats a character on a real-world sofa or chair;

Fragments by Asobo with characters using in-room furniture

Now, a few points about this;

  • The ‘magic’ part here is that the user of the application does not have to tell the application that there is a sofa or anything like that. The application works it out on its own.
  • ‘Working this out’ means operating on the spatial mesh from the HoloLens to determine ‘surfaces’ or ‘planes’.
  • That sort of functionality (‘spatial understanding’) was open sourced in a library and many developers used it in their applications.

but, most importantly, the library in question was working in software on the device & required a ‘scanning’ phase where you first showed it the space that you were going to work in. Any time you see this library in use you’ll see a ‘setup phase’ in order to bootstrap the library’s algorithms.

So, while the device had an innate ability to understand the space and create a mesh from it, the ability to work at the higher level abstraction of ‘surfaces’, ‘planes’, ‘walls’ etc. was left to the application developer and this library was often used to do the heavy lifting.

From the user’s point of view, this can mean spending a minute or two walking the space, following the app’s guidance, to ensure that enough of it is scanned for the application to operate.

But, nonetheless, that ‘magic’ moment is very much there when you realise that the device has figured out your space to the extent that it can put holograms onto your walls, floor and maybe table & chairs 🙂

Scene Understanding

Going back to the ‘Scene Understanding SDK’ that I experimented with in yesterday’s post, things are different.

As the docs clearly state, there’s a ‘scene understanding runtime’ already on the device;

Scene Understanding Runtime on the device

and there is a cost to invoking this and having it try to turn the spatial mesh into ‘scene objects’;

The process of converting the raw sensor data into a Scene is a potentially expensive operation that could take seconds for medium spaces (~10x10m) to minutes for very large spaces (~50x50m) and therefore it is not something that is being computed by the device without application request

A fundamental difference to me is the statement that;

On the left hand side is a diagram of the mixed reality runtime which is always on and running in its own process. This runtime is responsible for performing device tracking, surface reconstruction, and other operations that Scene Understanding uses to understand and reason about the world around you

So, there’s a runtime which is always running and which has access to the sensor data and so there is no need to have the user bootstrap these scene understanding algorithms by artificially asking them to map out the room as they would have done with the older ‘spatial understanding’ mechanism.

Thus, this understanding of space in terms of ‘surfaces’, ‘planes’, ‘walls’, ‘platforms’ becomes an innate ability of the device & the developer consumes the data rather than having to rely on their own or 3rd party algorithms.

Having said that, it’s worth considering that the device can’t reason about things that it hasn’t ever seen – e.g. if you take a device and turn it on for the first time in a room where all surfaces are 10m away then the device isn’t going to be able to do much about finding walls and tables until it has taken a look around.

That’s an extreme case though – for regular use, you’d imagine that where the user has entered a room, put on a device and then got to the point of running an app, it’s likely that the device has already seen a reasonable portion of that room.

The Largest Table…

I wanted to try out a little of this ‘magic’ for myself and the simple scenario that I conjured up was the idea that the device could initially place a piece of content for the user in the centre of the largest table in the room.

I’ve seen lots of Mixed Reality applications where a piece of holographic content (e.g. interactive map, architectural model, car engine model, etc) is being viewed in a business setting like a meeting room and the first step in that application is to;

Place the content!

and it would be ‘nice’ if the application could perhaps make a highly educated guess by automatically placing the content on the big, obvious meeting room table that’s in the centre of the room;

The content goes in the middle of the table, right?

Naturally, an application would get additional bonus points for allowing the user to re-position the content if they didn’t want it in the middle of the table, double-bonus-points for anchoring the content for stability, and super-double-bonus-points if that anchor was persisted such that the application could remember where the content was placed for next time around (perhaps cross-device using Azure Spatial Anchors).
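
As a concrete example of that ‘anchoring for stability’ idea, here’s a minimal sketch (my own, not something this project does) which assumes the legacy UnityEngine.XR.WSA.WorldAnchor component that this project already touches elsewhere via WorldManager – attach an anchor once the content has been positioned and remove it again before letting the user move the content;

using UnityEngine;
using UnityEngine.XR.WSA;

// Hypothetical helper (not in the sample project) - anchor the content once it has been
// positioned and release the anchor again before allowing the user to re-position it.
public static class PlacementAnchoring
{
    public static void AnchorPlacedContent(GameObject content)
    {
        // A WorldAnchor locks the transform, so only add it after positioning is complete.
        if (content.GetComponent<WorldAnchor>() == null)
        {
            content.AddComponent<WorldAnchor>();
        }
    }
    public static void ReleaseAnchor(GameObject content)
    {
        // The anchor has to go before the object can be moved again.
        var anchor = content.GetComponent<WorldAnchor>();

        if (anchor != null)
        {
            Object.DestroyImmediate(anchor);
        }
    }
}

Persisting that anchor across sessions (e.g. via the WorldAnchorStore) or across devices (Azure Spatial Anchors) would be further steps beyond this sketch.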

So, what would it look like to sketch out the basics of placing a hologram in the centre of the largest table of the room? I went back to my project from yesterday to add another scene and try it out.

Adding Another Scene

I made a quick branch, added another scene to my project and renamed the original scene so that anyone who looked at these 2 blog posts in the future might have a clue what I was doing;

Scenes added to project

and then, since I’m not worried about multiple scenes here as I’m only ever going to build one scene at a time, I moved to that second scene, added in the MixedRealityToolkit and fed it the DefaultHoloLens2ConfigurationProfile for now;

Adding the Mixed Reality Toolkit

I wanted some ‘content’ so I went off and found a 3D model of an office from the soon-to-be-closing-down remix3D, brought it into Paint3D, saved without a canvas as FBX and imported it into Unity where I scaled it, applied the legacy material option & then made a prefab out of it at the origin as below;

Office prefab in Unity scene

I then applied the toolkit scripts ManipulationHandler, NearInteractionGrabbable, BoundingBox to my prefab instance;

Manipulation scripts on office block

and gave that a quick try out in the editor to ensure that I could translate and rotate the office block (I could). I didn’t try to tune any settings here; this is all defaults.

Testing out interactions with the office block in the editor
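
For completeness, the same setup could be done from code rather than in the inspector – this is just a sketch on my part (the helper is hypothetical and it assumes the usual MRTK v2 namespaces for the components named above, with everything left at its defaults);

using Microsoft.MixedReality.Toolkit.Input;
using Microsoft.MixedReality.Toolkit.UI;
using UnityEngine;

// Hypothetical helper - adds the same MRTK components that I applied in the inspector.
public static class ManipulationSetup
{
    public static void MakeManipulable(GameObject content)
    {
        // Near interactions and the bounding box need a collider to work against.
        if (content.GetComponent<Collider>() == null)
        {
            content.AddComponent<BoxCollider>();
        }
        content.AddComponent<NearInteractionGrabbable>();
        content.AddComponent<ManipulationHandler>();
        content.AddComponent<BoundingBox>();
    }
}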

Ok, I’ve got a scene with an office block positioned at the origin and I can then move that office block. How about positioning it at the middle of the largest table in the room by default?

I have no idea how to do that…but I don’t see why that should get in the way of experimenting…

Adding a Simple ‘Largest Platform’ Behaviour

I cooked up a quick MonoBehaviour and added it to my office block object as below;

Adding a positioning behaviour to my office building

Naturally, this could offer a tonne of configurable parameters rather than just being ‘largest platform’ but I’m just experimenting here. That behaviour looks like this;

using Microsoft.MixedReality.Toolkit.Utilities;
using UnityEngine;

public class LargePlatformPositioningBehaviour : MonoBehaviour
{
    // Naturally, this could have parameters for things like;
    // 1) the type of object to look for (wall, platform, etc)
    // 2) the search radius
    // 3) UI to display while positioning
    // 4) UI to display if positioning can't be done
    // 5) the minimum size of the object to look for
    // etc. etc.
    // All I've done so far is to assume no UI and 'position on the largest platform'
    // and that implementation is sketchy.
    async void Update()
    {        
        if (!this.positionAttempted)
        {
            this.positionAttempted = true;

#if ENABLE_WINMD_SUPPORT

            var canCompute = await SceneUnderstandingHelper.CanComputeAsync();

            if (canCompute)
            {
                var parent = await SceneUnderstandingHelper.ParentGameObjectOnLargestPlatformAsync(this.gameObject);

                // Not yet sure whether I should be checking the orientation of the platform
                // that we have found here and then rotating based upon it but, for the moment
                // I'm going to say that this model should face the user and should not be rotated
                // around x,z so that it (hopefully) sits flat on the platform in question.
                var lookPos = CameraCache.Main.transform.position;
                lookPos.y = this.gameObject.transform.position.y;

                this.gameObject.transform.LookAt(lookPos);
            }
#endif
        }
    }
    bool positionAttempted = false;
}

and that then relies on a SceneUnderstandingHelper class which I wrote as below to bring together a few of the fragments of code that I had in the previous blog post;

using NumericsConversion;
using System;
using System.Threading.Tasks;
using UnityEngine;

#if ENABLE_WINMD_SUPPORT
using Microsoft.MixedReality.SceneUnderstanding;


internal static class SceneUnderstandingHelper 
{
    internal async static Task<bool> CanComputeAsync()
    {
        if (!canCompute.HasValue)
        {
            canCompute = SceneObserver.IsSupported();

            if ((bool)canCompute)
            {
                var access = await SceneObserver.RequestAccessAsync();

                canCompute = access == SceneObserverAccessStatus.Allowed;
            }
        }
        return ((bool)canCompute);
    }
    internal async static Task<GameObject> ParentGameObjectOnLargestPlatformAsync(GameObject gameObject,
        float searchRadius = 3.0f)
    {
        GameObject parent = null;

        var querySettings = new SceneQuerySettings()
        {
            EnableWorldMesh = false,
            EnableSceneObjectQuads = true,
            EnableSceneObjectMeshes = false,
            EnableOnlyObservedSceneObjects = false
        };
        var scene = await SceneObserver.ComputeAsync(querySettings, searchRadius);

        if (scene != null)
        {
            // Note - we are taking the position of the 'largest' (by area) scene object
            // of type platform here by looking at the quads that make it up.
            // We might need to, instead, query those quads & find their centre position
            // via the FindCentermostPlacement() method and then somehow coalesce those
            // positions to come up with a position instead. i.e. not sure this is at all
            // the 'right' thing to do in terms of coming up with a position.
            var largestPlatform = scene.LargestSceneObjectOfType(SceneObjectKind.Platform);

            if (largestPlatform != null)
            {
                // Where would this platform be in Unity space?
                var unityTransform = scene.GetUnityTransform();

                if (unityTransform.HasValue)
                {
                    parent = new GameObject();

                    parent.transform.SetPositionAndRotation(
                        unityTransform.Value.GetColumn(3), unityTransform.Value.rotation);

                    gameObject.transform.SetParent(parent.transform, false);

                    gameObject.transform.localPosition = largestPlatform.Position.ToUnity();
                    gameObject.transform.localRotation = largestPlatform.Orientation.ToUnity();
                }
            }
        }
        return (parent);
    }
    static bool? canCompute = null;
}
#endif

and, in turn, that relies on a few extension methods that I added to some of the Scene Understanding SDK classes;

using System.Collections.Generic;
using UnityEngine;
using System.Linq;
using System.Runtime.InteropServices;
using NumericsConversion;

#if ENABLE_WINMD_SUPPORT

using Microsoft.MixedReality.SceneUnderstanding;
using Windows.Perception.Spatial.Preview;
using Windows.Perception.Spatial;
using UnityEngine.XR.WSA;

internal static class SceneUnderstandingExtensions
{
    internal static IEnumerable<SceneObject> SceneObjectsOfType(this Scene scene, SceneObjectKind kind)
    {
        return (scene.SceneObjects.Where(so => so.Kind == kind));
    }
    internal static float Area(this SceneQuad sceneQuad)
    {
        return (sceneQuad.Extents.X * sceneQuad.Extents.Y);
    }
    internal static float QuadArea(this SceneObject sceneObject)
    {
        return (sceneObject.Quads.Sum(q => q.Area()));
    }
    internal static SceneObject LargestSceneObjectOfType(this Scene scene, SceneObjectKind kind)
    {
        var objectsOfKind = scene.SceneObjectsOfType(kind);

        // MaxBy is what I want really. 
        // See https://stackoverflow.com/questions/1101841/how-to-perform-max-on-a-property-of-all-objects-in-a-collection-and-return-th
        return (objectsOfKind.OrderByDescending(s => s.QuadArea()).FirstOrDefault());
    }
    internal static Matrix4x4? GetUnityTransform(this Scene scene)
    {
        Matrix4x4? transform = null;

        var sceneCoordSystem = SpatialGraphInteropPreview.CreateCoordinateSystemForNode(scene.OriginSpatialGraphNodeId);

        var unityCoordSystem =
            (SpatialCoordinateSystem)System.Runtime.InteropServices.Marshal.GetObjectForIUnknown(
                WorldManager.GetNativeISpatialCoordinateSystemPtr());

        var unityTransform = sceneCoordSystem.TryGetTransformTo(unityCoordSystem);

        if (unityTransform.HasValue)
        {
            transform = unityTransform.Value.ToUnity();
        }
        // TODO: Am I supposed to Release() this or not?
        Marshal.ReleaseComObject(unityCoordSystem);

        return (transform);
    }
}
#endif

and that’s pretty much it.

Trying it out in a Small Space

If I try this on a device then, as you’d expect, what initially happens is that the application runs up and loads the model of the office at the origin (0,0,0) – i.e. right on top of my head, which is never pleasant 😉 – but then it quickly jumps over to place itself on the largest platform in my room.

Here’s a screenshot from how that looks in my small home office;

Office model naturally positioned on largest platform in the space

It’s kind of a small piece of ‘magic’ 🙂 but it’s far quicker to have the device position this content in a ‘sensible place’ and then make minor adjustments than to have the content just appear (e.g.) 2m in front of my head and then have to adjust it from there.

It still needs a lot of consideration though – e.g. the app will position the model here on this desk even if I’m looking in the opposite direction so there’d need to be some kind of wayfinding UX to tell me where the model had just gone (“it’s behind you!”) and, equally, UX to handle the scenario where the model can’t be found.

It’s perhaps also worth noting that my code is currently running some very basic (and probably wrong) tests to find the ‘largest platform’ and then assuming that it’s a table. It might not be, and perhaps some more tests (e.g. on size, height, etc.) could be added to try to determine whether what we’re choosing is actually likely to be a table.
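
To make that slightly more concrete, below is a sketch of the kind of extra test I have in mind – it reuses the extension methods listed earlier, the thresholds are numbers I’ve plucked out of the air, and it assumes that the Y axis of the scene’s coordinate system is the ‘up’ direction, which would need verifying against the docs;

#if ENABLE_WINMD_SUPPORT
using System.Linq;
using Microsoft.MixedReality.SceneUnderstanding;

internal static class TableHeuristics
{
    // Hypothetical thresholds - a 'table-like' platform is assumed to sit roughly
    // 0.5m to 1.2m above the floor and to cover at least 0.25 square metres.
    const float MinHeightAboveFloor = 0.5f;
    const float MaxHeightAboveFloor = 1.2f;
    const float MinArea = 0.25f;

    internal static SceneObject LargestLikelyTable(this Scene scene)
    {
        // Use the largest floor object as the height reference (assuming Y is 'up'
        // in the scene's coordinate system).
        var floor = scene.LargestSceneObjectOfType(SceneObjectKind.Floor);

        return scene.SceneObjectsOfType(SceneObjectKind.Platform)
            .Where(p => p.QuadArea() >= MinArea)
            .Where(p => floor == null || IsTableHeight(p.Position.Y - floor.Position.Y))
            .OrderByDescending(p => p.QuadArea())
            .FirstOrDefault();
    }
    static bool IsTableHeight(float height) =>
        height >= MinHeightAboveFloor && height <= MaxHeightAboveFloor;
}
#endif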

A Form of Spatial Anchor?

At the moment, my positioning logic involves trying to find the ‘largest’ SceneObject of type ‘platform’ within a radius (hard-coded to 3.0m).

In doing that, I try to determine ‘largest’ by figuring out the sums of the areas of the SceneQuad objects that make up any SceneObjects of type ‘platform’ and then choosing the largest.

Once I’ve found that SceneObject I take its position and that’s where I place my hologram, ignoring the orientation that the SceneQuad objects within the SceneObject report. That might not be the right thing to do but I don’t take too much account of the orientation of the SceneObject itself either because I have code which treats it as a flat table-top and rotates the hologram to face the user with no rotation around X, Z.

But, if I simply took that transform returned by the SceneObject, could I use it as a form of anchor that would be stable across application invocations?

Purely from experimentation, that does seem to be the case and it could mean that I could write code which stores hologram positions relative to the position of that ‘anchor’ and have the application be able to restore those holograms just like a native spatial anchor or an Azure Spatial Anchor.
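
As a sketch of what that might look like (again, my assumption rather than anything that’s in the repo) – given the parent GameObject that ParentGameObjectOnLargestPlatformAsync creates at the platform’s transform, hologram poses could be captured and restored relative to that parent, leaving aside how the poses would actually be serialised between runs;

using UnityEngine;

[System.Serializable]
public struct RelativePose
{
    public Vector3 localPosition;
    public Quaternion localRotation;
}

// Hypothetical helper - expresses a hologram's pose in the 'platform parent' object's
// local space so that it can be stored and later re-applied once the platform has been
// found again on the next run.
public static class PlatformRelativeStorage
{
    public static RelativePose Capture(Transform platformParent, Transform hologram)
    {
        return new RelativePose()
        {
            localPosition = platformParent.InverseTransformPoint(hologram.position),
            localRotation = Quaternion.Inverse(platformParent.rotation) * hologram.rotation
        };
    }
    public static void Restore(Transform platformParent, Transform hologram, RelativePose pose)
    {
        hologram.SetPositionAndRotation(
            platformParent.TransformPoint(pose.localPosition),
            platformParent.rotation * pose.localRotation);
    }
}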

What would be interesting to me in that case would be to see what happens if I moved the table in question because I’d kind of expect the device to still see it as the ‘largest platform’ and so be able to find it again & potentially still restore the holograms relative to the new position of the table.

Whether that’s likely useful as an experience for a user is, of course, another question 🙂

The other question I’d have is whether this form of anchor could be stable across different devices – i.e. if 2 devices look for the ‘largest platform in the room’ and then swap its transform between them then can they use that to establish a common coordinate system & offer a shared holographic experience based on it? I’ve not even experimented with this, it’s just a thought right now and I’ve no code to test it out in any way.

Speaking of ‘code’…

Code

As per the previous project, the code for this is all here on github and it’s all experimental so please apply a pinch of salt to what you see and, of course, always refer to the official documentation for the definitive view on these things.

Baby Steps with the Scene Understanding SDK

For a while, I’ve had my eye on this documentation page which explains the ‘Scene Understanding SDK’ for HoloLens 2.

Scene Understanding SDK Overview

While a HoloLens device has a rich understanding of the space around it, building a representation of that space as a 3D mesh which the developer can query, sometimes you’d prefer a higher-level abstraction.

For example, I’ve seen many applications where the first step is to locate and place some content (map, building, car, etc.) onto a floor, a table or a wall. Having access to the 3D mesh is certainly essential to this type of functionality but there’s likely more work to be done in processing that mesh in order to reason about ‘surfaces’ rather than just polygons.

This is where ‘Scene Understanding’ comes in, enabling you to unlock more capabilities of the device so as to be able to reason in terms of ‘scene objects’ which, at the time of writing, include (as the docs say!) objects of type;

Background, Wall, Floor, Ceiling, Platform, World and Unknown

Interestingly, working at this level doesn’t lose you the lower-level detail, in that it’s possible to use the APIs in the SDK to dig below the ‘scene objects’ into the quads/meshes that they are composed of.
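
As a sketch of what that digging looks like (based on my reading of the docs rather than anything I exercise in the experiment below), each SceneObject exposes its quads and – if EnableSceneObjectMeshes was set in the query settings – its meshes, whose index/vertex buffers can be copied out;

#if ENABLE_WINMD_SUPPORT
using Microsoft.MixedReality.SceneUnderstanding;

internal static class SceneObjectDetail
{
    internal static void Dump(SceneObject sceneObject)
    {
        foreach (var quad in sceneObject.Quads)
        {
            // Each quad is an oriented rectangle with extents in metres.
            UnityEngine.Debug.Log($"Quad of {quad.Extents.X}m x {quad.Extents.Y}m");
        }
        foreach (var mesh in sceneObject.Meshes)
        {
            // Raw geometry can be copied out into index/vertex buffers.
            var indices = new uint[mesh.TriangleIndexCount];
            var vertices = new System.Numerics.Vector3[mesh.VertexCount];

            mesh.GetTriangleIndices(indices);
            mesh.GetVertexPositions(vertices);

            UnityEngine.Debug.Log(
                $"Mesh of {vertices.Length} vertices, {indices.Length / 3} triangles");
        }
    }
}
#endif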

Having read the documentation a couple of times, I wanted to make a simple Unity application to try things out and so I took the Mixed Reality Toolkit for Unity and brought it into a blank project in Unity.

In that project, I simply set up the Toolkit as per the docs and then cloned a few of the input system profiles so as to set up some custom speech commands which correspond to a subset of the ‘scene object’ types as you can see below;

I added a ‘marker’ prefab to show the X,Y,Z axes and dropped one at the origin of my scene so that it will sit right on the user’s head when they run the app (not usually the best idea);

I also added a ‘parent’ or Root object that I can use to parent any other content that gets created in the scene;

This object becomes important because I will align its coordinate system with that of the ‘scene’ that is returned by the ‘Scene Understanding’ APIs such that objects placed at positions & orientations under this parent can take their placement directly from the Scene Understanding APIs.

I made sure that my application had the capabilities to use the microphone and spatial perception and then I added a GameObject with a couple of scripts attached to it.

The first of those is just the standard toolkit Speech Input Handler which, as you can see below, routes the 4 speech commands that I have onto handlers in my script named MyScript;

The second script is MyScript itself, which provides handlers for those 4 calls. It handles them by calling into the ‘Scene Understanding’ APIs to find ‘scene objects’ of the type in question (e.g. walls, platforms) and then it tries to display;

  • An instance of the ‘marker’ prefab at the position and orientation of the ‘scene object’
  • An instance of a (flattened) cube at the position and size of each of the quads that make up that scene object.

and that’s pretty much it. My script then needs access to this parent object, the prefab to create and also a material to texture the cube with and so these are parameters to the script as below;

That’s pretty much it – I built this out and ran it in my small home office and tried out saying the ‘platform’ voice command and the results were really nice 😉 as below;

and, similarly, trying out the keyword ‘ceiling’ seemed to work reasonably too;

The script that’s sitting behind this is very experimental but it’s largely just packaging up the pieces from the documentation and it’s listed below. It seemed to work quite well quite quickly so it’s more than possible that I am benefiting from some kind of ‘false positive’ and perhaps it’s working better in this one location than it does in others but, still, it’s a starting point 🙂

using System.Collections.Generic;
using UnityEngine;
using System;
using System.Threading.Tasks;
using NumericsConversion;
#if ENABLE_WINMD_SUPPORT
using Microsoft.MixedReality.SceneUnderstanding;
using Windows.Perception.Spatial;
using Windows.Perception.Spatial.Preview;
using UnityEngine.XR.WSA;
#endif

public class MyScript : MonoBehaviour
{
    public GameObject parentObject;
    public GameObject markerPrefab;
    public Material quadMaterial;

    public MyScript()
    {
        this.markers = new List<GameObject>();
        this.quads = new List<GameObject>();
        this.initialised = false;
    }
    void Update()
    {
#if ENABLE_WINMD_SUPPORT
        // TODO: Doing all of this every update right now based on what I saw in the doc 
        // here https://docs.microsoft.com/en-us/windows/mixed-reality/scene-understanding-sdk#dealing-with-transforms
        // but that might be overkill.
        // Additionally, wondering to what extent I should be releasing these COM objects
        // as I've been lazy to date.
        // Hence - apply a pinch of salt to this...
        if (this.lastScene != null)
        {
            var node = this.lastScene.OriginSpatialGraphNodeId;

            var sceneCoordSystem = SpatialGraphInteropPreview.CreateCoordinateSystemForNode(node);

            var unityCoordSystem =
                (SpatialCoordinateSystem)System.Runtime.InteropServices.Marshal.GetObjectForIUnknown(
                    WorldManager.GetNativeISpatialCoordinateSystemPtr());

            var transform = sceneCoordSystem.TryGetTransformTo(unityCoordSystem);

            if (transform.HasValue)
            {
                var sceneToWorldUnity = transform.Value.ToUnity();

                this.parentObject.transform.SetPositionAndRotation(
                    sceneToWorldUnity.GetColumn(3), sceneToWorldUnity.rotation);
            }
        }
#endif
    }
    // These 4 methods are wired to voice commands via MRTK...
    public async void OnWalls()
    {
#if ENABLE_WINMD_SUPPORT
        await this.ComputeAsync(SceneObjectKind.Wall);
#endif
    }
    public async void OnFloor()
    {
#if ENABLE_WINMD_SUPPORT
        await this.ComputeAsync(SceneObjectKind.Floor);
#endif
    }
    public async void OnCeiling()
    {
#if ENABLE_WINMD_SUPPORT
        await this.ComputeAsync(SceneObjectKind.Ceiling);
#endif
    }
    public async void OnPlatform()
    {
#if ENABLE_WINMD_SUPPORT
        await this.ComputeAsync(SceneObjectKind.Platform);
#endif
    }
    void ClearChildren()
    {
        foreach (var child in this.markers)
        {
            Destroy(child);
        }
        foreach (var child in this.quads)
        {
            Destroy(child);
        }
        this.markers.Clear();
        this.quads.Clear();
    }
#if ENABLE_WINMD_SUPPORT
    async Task InitialiseAsync()
    {
        if (!this.initialised)
        {
            if (SceneObserver.IsSupported())
            {
                var access = await SceneObserver.RequestAccessAsync();

                if (access == SceneObserverAccessStatus.Allowed)
                {
                    this.initialised = true;
                }
            }
        }
    }
    async Task ComputeAsync(SceneObjectKind sceneObjectKind)
    {
        this.ClearChildren();

        await this.InitialiseAsync();

        if (this.initialised)
        {
            var querySettings = new SceneQuerySettings()
            {
                EnableWorldMesh = false,
                EnableSceneObjectQuads = true,
                EnableSceneObjectMeshes = false,
                EnableOnlyObservedSceneObjects = false
            };
            this.lastScene = await SceneObserver.ComputeAsync(querySettings, searchRadius);

            if (this.lastScene != null)
            {
                foreach (var sceneObject in this.lastScene.SceneObjects)
                {
                    if (sceneObject.Kind == sceneObjectKind)
                    {
                        var marker = GameObject.Instantiate(this.markerPrefab);

                        marker.transform.SetParent(this.parentObject.transform);

                        marker.transform.localPosition = sceneObject.Position.ToUnity();
                        marker.transform.localRotation = sceneObject.Orientation.ToUnity();

                        this.markers.Add(marker);

                        foreach (var sceneQuad in sceneObject.Quads)
                        {
                            var quad = GameObject.CreatePrimitive(PrimitiveType.Cube);

                            quad.transform.SetParent(marker.transform, false);

                            quad.transform.localScale = new Vector3(
                                sceneQuad.Extents.X, sceneQuad.Extents.Y, 0.025f);

                            quad.GetComponent<Renderer>().material = this.quadMaterial;
                        }
                    }
                }
            }
        }
    }
#endif

#if ENABLE_WINMD_SUPPORT
    Scene lastScene;
#endif

    List<GameObject> markers;
    List<GameObject> quads;
    bool initialised;
    static readonly float searchRadius = 5.0f;
}

I’ve also shared the whole project up here on github if you want to have a play with it yourself – I’m pleased to have got started with this & I’m looking forward to doing some more exploration of what’s possible here.

Simple Shared Holograms with Photon Networking (Part 3)

Following on from the previous post, where I’d got to the point of having an app using Photon which could;

  • Connect to the Photon network
  • Connect to a (hard-coded) Photon room by name
  • Check a property of the room to see if an anchorId had been stored there
    • If so, talk to the Azure Spatial Anchors service, download that anchor and locate a Root game object in my scene with the anchor
    • If not, create an anchor for the game object Root in my scene and store it to the Azure Spatial Anchors service, getting an ID which can then be added as a property of the room
  • Give all users a voice command “cube” to create a cube which is then synchronised with all participants in the room
  • Let all users manipulate cubes so as to translate, scale and rotate them and keep those transforms synchronised across users
  • Ensure that if users leave/join the room then, within a timeout, the state of the cubes is preserved

it made sense to allow users to remove cubes from the room and it didn’t seem like it would be too difficult to achieve so…

With that in mind, I added a new voice command “Remove” to the profile in the mixed reality toolkit;

and then I wanted this to only be relevant when the user is focused on a cube, so I added the handler for it onto my cube prefab itself, making sure that focus was required;

and I wired that through to a method which simply called PhotonNetwork.Destroy;

    public void OnRemove()
    {
        this.SetViewIdCustomRoomProperty(null);
        PhotonNetwork.Destroy(this.gameObject);
    }

Because I have this set of custom properties (see the previous post for details) which store a Key:Value pair of ViewID:LastRecordedTransform, I also need to clear out the key for my PhotonView at this point if I am destroying the object. I couldn’t see a method on a Photon Room for deleting or clearing a custom property and so I set the value to null, as you can see above, where SetViewIdCustomRoomProperty is just a little function that does;

    void SetViewIdCustomRoomProperty(string value)
    {
        PhotonNetwork.CurrentRoom.SetCustomProperties(
            new Hashtable()
            {
                    {  this.ViewIDAsString, value }
            }
        );
    }

and, with that I can delete cubes on one device and see them disappear on another one.

Input Simulation in the Editor & HoloLens 1

A small tip that I’ll pass along at this point is that when working with MRTK V2 but targeting HoloLens V1 I find it useful to switch the input simulation mode in the Unity editor away from ‘articulated hands’ mode to ‘gestures’ mode. You can find that setting here;

without that setting, I find that when targeting my HoloLens 1 device the editor gets ahead of itself and behaves a little too much like a HoloLens 2 😉

Representing the User

It’s pretty common in these shared experiences to have a representation of the other users who are a part of the experience. If they are in the same physical room, it’s common to float something above their heads (e.g. a halo or a model of a HoloLens) whereas if they are in a different physical place then it’s common to bring in some level of avatar. I’ll call it a ‘halo’ for both cases.

Technically, that comes down to displaying the ‘halo’ at the position and orientation of the other user’s camera perhaps with an offset to avoid occluding the user.

This feels very much like the same scenario as synchronising the cubes but with perhaps a couple of differences;

  • the transform of the ‘halo’ does not need to survive the user leaving the room – it leaves the room with the user it represents & so I don’t need to take any steps to preserve it as I did with the cubes.
  • a user may not expect to have their own ‘halo’ visible, although you can argue how important that is if (e.g.) it’s floating above their head such that they can’t easily see it 🙂

The easiest way to do this would seem to be to create a ‘halo’ object, make it a child of the main camera in the scene (with an offset) and then synchronise its transform over the network. The only slight challenge is that I would need to take care to synchronise its position relative to the anchored Root object which (unlike my cubes) would not be its parent. That’s because the Root object represents a known place and orientation in the real world.

I found a 3D model of a HoloLens somewhere and made a prefab out of it as below;

I have it configured such that it lives in the Resources folder and I have added both a PhotonView script and a PhotonRelativeTransformView script to it, as you can see in the screenshot above.

What’s a PhotonRelativeTransformView? It’s another ‘copy’ of the PhotonTransformView script which I modified to be much simpler in that it takes the name of a GameObject (the ‘relative transform’ object) and then attempts to synchronise just the position and rotation of the ‘halo’ object relative to that object, as below;

namespace Photon.Pun
{
    using UnityEngine;

    [RequireComponent(typeof(PhotonView))]
    public class PhotonRelativeTransformView : MonoBehaviour, IPunObservable
    {
        [SerializeField]
        string relativeTransformGameObjectName;

        GameObject relativeGameObject;


        Vector3 RelativePosition
        {
            get
            {
                return (this.gameObject.transform.position - this.RelativeGameObject.transform.position);
            }
            set
            {
                this.gameObject.transform.position = this.RelativeGameObject.transform.position + value;
            }
        }
        Quaternion RelativeRotation
        {
            get
            {
                return (Quaternion.Inverse(this.RelativeGameObject.transform.rotation) * this.transform.rotation);
            }
            set
            {
                this.gameObject.transform.rotation = this.RelativeGameObject.transform.rotation;
                this.gameObject.transform.rotation *= value;
            }
        }

        GameObject RelativeGameObject
        {
            get
            {
                if (this.relativeGameObject == null)
                {
                    this.relativeGameObject = GameObject.Find(this.relativeTransformGameObjectName);
                }
                return (this.relativeGameObject);
            }
        }

        public void OnPhotonSerializeView(PhotonStream stream, PhotonMessageInfo info)
        {
            if (stream.IsWriting)
            {
                stream.SendNext(this.RelativePosition);
                stream.SendNext(this.RelativeRotation);
            }
            else
            {
                this.RelativePosition = (Vector3)stream.ReceiveNext();
                this.RelativeRotation = (Quaternion)stream.ReceiveNext();
            }
        }
    }
}

With that in play, I added a slot onto my main script (the PhotonScript) to store this Halo Prefab;

and then just used PhotonNetwork.Instantiate to create an instance of that prefab whenever the script first starts up and joins the network. My hope is that if the player leaves the room then Photon will take it away again. I parent that instance off the camera;

    public async override void OnJoinedRoom()
    {
        base.OnJoinedRoom();

        // Note that the creator of the room also joins the room...
        if (this.roomStatus == RoomStatus.None)
        {
            this.roomStatus = RoomStatus.JoinedRoom;
        }
        await this.PopulateAnchorAsync();

        var halo = PhotonNetwork.Instantiate(this.haloPrefab.name, Vector3.zero, Quaternion.identity);
        halo.transform.SetParent(CameraCache.Main.transform);
    }
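
One tweak I’d probably make (an assumption on my part, it’s not in the repo) is to pass false for worldPositionStays when parenting and then give the halo a small local offset so that it floats just above the wearer’s head rather than sitting right at the camera’s origin;

        // Hypothetical variant of the instantiation above - offset the halo a little
        // above the camera rather than leaving it at the camera's position.
        var halo = PhotonNetwork.Instantiate(this.haloPrefab.name, Vector3.zero, Quaternion.identity);
        halo.transform.SetParent(CameraCache.Main.transform, false);
        halo.transform.localPosition = new Vector3(0.0f, 0.15f, 0.0f);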

I gave that a very quick test and it seems like (across the HoloLens and the editor at least) it was doing the right thing, as you can see from the capture below taken from the HoloLens;

where the large circle is the HoloLens displaying the position of the other user represented by the Unity editor and the small circle is the editor displaying the position of the HoloLens.

That all seems to work out quite nicely. I’ve updated the repo here. At the time of writing, I’m not sure whether I’ll revisit this again and add anything more but I’ll post it here if I do…