Rough Notes on UWP and webRTC (Part 4–Adding some Unity and a little HoloLens)

Following up on my previous post, I wanted to take the very basic test code that I’d got working ‘reasonably’ on UWP on my desktop PC and see whether I could move it into a Unity application running on HoloLens.

The intention would be to preserve the very limited functionality that I have which goes something like;

  • The app runs up, is given the details of the signalling service (from the PeerCC sample) to connect to and it then connects to it
  • The app finds a peer on the signalling service and tries to get a two-way audio/video call going with that peer displaying local/remote video and capturing local audio while playing remote audio.

That’s what I currently have in the signalling branch here and the previous blog post was about abstracting some of that out such that I could use it in a different environment like Unity.

Now it’s time to see if that works out…

Getting Some Early Bits to Work With

In order to even think about this I needed to pick up a version of UWP webRTC that works in that environment and which has some extra pieces to help me out. As far as I know, at the time of writing that means using the bits mentioned in this particular issue over on github by the UWP webRTC team;

Expose “raw” MediaCapture Object #11

and there are instructions in that post around how to get hold of some bits;

Instructions for Getting Bits

and so I followed those instructions and built the code from that branch of that repo.

From there, I’ve been working with my colleague Pete to put together some of those pieces with the pieces that I already had from the previous blog posts.

First, a quick look around the bits that the repo gives us…

Exploring the WebRtcUnity PeerCC Sample Solution

As is often the case, this process looks like it is going to involve standing on the shoulders of some other giants because there’s already code in the UWP webRTC repo that I pointed to above which shows how to put this type of app together.

The code in question is surfaced through this solution in the repo;

image

Inside of that solution, there’s a project which builds out the equivalent of the original XAML+MediaElement PeerCC sample, but modified so that it doesn’t have to use MediaElement to render. That shift in the code is represented by its additional Unity dependencies;

image

This confused me for a little while – I was wondering why this XAML-based application suddenly had a big dependency on Unity until I realised that, in order to show that media can be rendered by Unity, the original sample code has been modified so that (dependent on the conditional compilation constant UNITY) the app can render media streams either;

  1. Using MediaElement as it did previously
  2. Using Unity rendering pieces which are then hosted inside of a SwapChainPanel inside of the XAML UI.

Now, I’ve failed to get this sample to run on my machine – which I think is down to the versions of Unity that I’m running – so I had to pick through the code a little ‘cold’ but, in so far as I can see, there are a couple of sub-projects involved in making this work…

The Org.WebRtc.Uwp Project

This project was already present in the original XAML-based solution and in my mind this is involved with wrapping some C++/CX code around the webrtc.lib library in order to bring types into a UWP environment. I haven’t done a delta to try and see how much/little is different in this branch of this project over the original sample so there may be differences.

image

The MediaEngineUWP and WebRtcScheme Projects

Then there are two projects within the Unity sample’s MediaEngine folder which I don’t think were present in the original, purely XAML-based PeerCC sample;

image

The MediaEngineUWP and WebRtcScheme projects build out DLLs which seem to take on a couple of roles. I’m more than willing to admit that I don’t have this all worked out in my head at the time of writing, but I think they are about bridging between the Unity platform, the Windows Media platform and webRTC, and I think they do this via;

  • The existing work in the Org.WebRtc.Uwp project which integrates webRTC pieces into the Windows UWP media pipeline. I think this is done by adding a webRTC VideoSinkInterface which then surfaces the webRTC pieces as the UWP IMediaSource and IMediaStreamSource types.
  • The MediaEngineUWP.dll having an exported UnityPluginLoad function which grabs an IUnityGraphics and offering a number of other exports that can be called via PInvoke from Unity to set up the textures into which code inside this DLL renders the local/remote video frames.
    • There’s a class in this project named MediaEnginePlayer which is instanced per video stream and which seems to do the work of grabbing frames from the incoming Windows media pipeline and transferring them into Unity textures.
    • The same class looks to use the IMFMediaEngineNotify callback interface to be notified of state changes for the media stream and responds by playing/stopping etc.

The wiring together of this MediaEnginePlayer into the media pipeline is a little opaque to me but I think that it follows what is documented here and under the topic Source Resolver here. This seems to involve the code associating a URL (of form webrtc:GUID) with each IMediaStream and having an activatable class which the media pipeline then invokes with the URL to be linked up to the right instance of the player.

That may be a ‘much less than perfect’ description of what goes on in these projects as I haven’t stepped through all of that code.

What I think it does mean, though, is that the code inside of the WebRtcScheme project requires the .appxmanifest of any app that consumes it to include a section that looks like;

  <Extensions>
    <Extension Category="windows.activatableClass.inProcessServer">
      <InProcessServer>
        <Path>WebRtcScheme.dll</Path>
        <ActivatableClass ActivatableClassId="WebRtcScheme.SchemeHandler" ThreadingModel="both" />
      </InProcessServer>
    </Extension>
  </Extensions>

I don’t know of a way of setting this up inside of a Unity project so I ended up just letting Unity build the Visual Studio solution and then manually hacking the manifest to include this section.
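
Rather than re-hacking the generated manifest after every build, this could probably be scripted from the Unity editor. Below is a rough, untested sketch of that idea – the class name, the assumption that the generated UWP project lives in a sub-folder named after the product, and the decision to append the Extensions element at the package level are all mine rather than anything taken from the sample, so the element ordering in the manifest is something to verify;

#if UNITY_EDITOR
using System.IO;
using System.Linq;
using System.Xml.Linq;
using UnityEditor;
using UnityEditor.Callbacks;

public static class WebRtcManifestPatcher
{
    // Hypothetical post-build step which injects the WebRtcScheme activatable class
    // registration into the Package.appxmanifest that Unity generates for a UWP build.
    [PostProcessBuild]
    public static void OnPostprocessBuild(BuildTarget target, string pathToBuiltProject)
    {
        if (target != BuildTarget.WSAPlayer)
        {
            return;
        }
        // Assumption: the generated UWP project sits in a sub-folder named after the product.
        var manifestPath = Path.Combine(
            pathToBuiltProject, PlayerSettings.productName, "Package.appxmanifest");

        if (!File.Exists(manifestPath))
        {
            return;
        }
        var document = XDocument.Load(manifestPath);
        XNamespace ns = document.Root.Name.Namespace;

        // Don't patch the manifest twice.
        var alreadyPatched = document
            .Descendants(ns + "ActivatableClass")
            .Any(e => (string)e.Attribute("ActivatableClassId") == "WebRtcScheme.SchemeHandler");

        if (!alreadyPatched)
        {
            // Assumption: appending at the end of the Package element is acceptable here;
            // the manifest schema may require a specific position for Extensions.
            document.Root.Add(
                new XElement(ns + "Extensions",
                    new XElement(ns + "Extension",
                        new XAttribute("Category", "windows.activatableClass.inProcessServer"),
                        new XElement(ns + "InProcessServer",
                            new XElement(ns + "Path", "WebRtcScheme.dll"),
                            new XElement(ns + "ActivatableClass",
                                new XAttribute("ActivatableClassId", "WebRtcScheme.SchemeHandler"),
                                new XAttribute("ThreadingModel", "both"))))));

            document.Save(manifestPath);
        }
    }
}
#endif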

Exploring the Video Control Solution

I looked into another project within that github repo which is a Unity project contained within this folder;

image

There’s a Unity scene which has a (UI) Canvas and a couple of Unity Raw Image objects which can be used to render to;

image

and a Control script which is set up to PInvoke into the MediaEngineUWP to pass the pieces from the Unity environment into the DLL. That script looks like this;

using System;
using System.Runtime.InteropServices;
using UnityEngine;
using UnityEngine.UI;

#if !UNITY_EDITOR
using Org.WebRtc;
using Windows.Media.Core;
#endif

public class ControlScript : MonoBehaviour
{
    public uint LocalTextureWidth = 160;
    public uint LocalTextureHeight = 120;
    public uint RemoteTextureWidth = 640;
    public uint RemoteTextureHeight = 480;
    
    public RawImage LocalVideoImage;
    public RawImage RemoteVideoImage;

    void Awake()
    {
    }

    void Start()
    {
    }

    private void OnInitialized()
    {
    }

    private void OnEnable()
    {
    }

    private void OnDisable()
    {
    }

    void Update()
    {
    }

    public void CreateLocalMediaStreamSource(object track, string type, string id)
    {
        Plugin.CreateLocalMediaPlayback();
        IntPtr nativeTex = IntPtr.Zero;
        Plugin.GetLocalPrimaryTexture(LocalTextureWidth, LocalTextureHeight, out nativeTex);
        var primaryPlaybackTexture = Texture2D.CreateExternalTexture((int)LocalTextureWidth, (int)LocalTextureHeight, TextureFormat.BGRA32, false, false, nativeTex);
        LocalVideoImage.texture = primaryPlaybackTexture;
#if !UNITY_EDITOR
        MediaVideoTrack videoTrack = (MediaVideoTrack)track;
        var source = Media.CreateMedia().CreateMediaStreamSource(videoTrack, type, id);
        Plugin.LoadLocalMediaStreamSource((MediaStreamSource)source);
        Plugin.LocalPlay();
#endif
    }

    public void DestroyLocalMediaStreamSource()
    {
        LocalVideoImage.texture = null;
        Plugin.ReleaseLocalMediaPlayback();
    }

    public void CreateRemoteMediaStreamSource(object track, string type, string id)
    {
        Plugin.CreateRemoteMediaPlayback();
        IntPtr nativeTex = IntPtr.Zero;
        Plugin.GetRemotePrimaryTexture(RemoteTextureWidth, RemoteTextureHeight, out nativeTex);
        var primaryPlaybackTexture = Texture2D.CreateExternalTexture((int)RemoteTextureWidth, (int)RemoteTextureHeight, TextureFormat.BGRA32, false, false, nativeTex);
        RemoteVideoImage.texture = primaryPlaybackTexture;
#if !UNITY_EDITOR
        MediaVideoTrack videoTrack = (MediaVideoTrack)track;
        var source = Media.CreateMedia().CreateMediaStreamSource(videoTrack, type, id);
        Plugin.LoadRemoteMediaStreamSource((MediaStreamSource)source);
        Plugin.RemotePlay();
#endif
    }

    public void DestroyRemoteMediaStreamSource()
    {
        RemoteVideoImage.texture = null;
        Plugin.ReleaseRemoteMediaPlayback();
    }

    private static class Plugin
    {
        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "CreateLocalMediaPlayback")]
        internal static extern void CreateLocalMediaPlayback();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "CreateRemoteMediaPlayback")]
        internal static extern void CreateRemoteMediaPlayback();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "ReleaseLocalMediaPlayback")]
        internal static extern void ReleaseLocalMediaPlayback();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "ReleaseRemoteMediaPlayback")]
        internal static extern void ReleaseRemoteMediaPlayback();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "GetLocalPrimaryTexture")]
        internal static extern void GetLocalPrimaryTexture(UInt32 width, UInt32 height, out System.IntPtr playbackTexture);

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "GetRemotePrimaryTexture")]
        internal static extern void GetRemotePrimaryTexture(UInt32 width, UInt32 height, out System.IntPtr playbackTexture);

#if !UNITY_EDITOR
        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "LoadLocalMediaStreamSource")]
        internal static extern void LoadLocalMediaStreamSource(MediaStreamSource IMediaSourceHandler);

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "LoadRemoteMediaStreamSource")]
        internal static extern void LoadRemoteMediaStreamSource(MediaStreamSource IMediaSourceHandler);
#endif

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "LocalPlay")]
        internal static extern void LocalPlay();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "RemotePlay")]
        internal static extern void RemotePlay();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "LocalPause")]
        internal static extern void LocalPause();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "RemotePause")]
        internal static extern void RemotePause();
    }
}

and so it’s essentially giving me the pieces that I need to wire up local/remote media streams coming from webRTC into the pieces that can render them in Unity.

It feels like these projects provide the pieces that I need to plug together with my basic library project in order to rebuild the app that I had in the previous blog post and have it run inside of a 3D Unity app rather than a 2D XAML app…

Plugging Together the Pieces

Pete put together a regular Unity project targeting UWP for HoloLens and, in the scene at the moment, we have only two quads onto which we try to render the local and remote video.

image

and then there’s an empty GameObject named Control with a script on it configured as below;

image

and you can see that this configuration is being used to do a couple of things (there’s a sketch of the corresponding fields after this list);

  • Set up the properties that my conversation library code from the previous blog post needed to try and start a conversation over webRTC
    • The signalling server IP address, port number, whether to initiate a conversation or not and, if so, whether there’s a particular peer name to initiate that conversation with.
  • Set up some properties that will facilitate rendering of the video into the materials texturing the 2 quads in the scene.
    • Widths, heights to use.
    • The GameObjects that we want to render our video streams to.
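
For a rough idea of what that inspector configuration corresponds to in code, here’s a sketch of the kind of serialized fields the script on the Control object exposes – the field names and defaults here are illustrative guesses based on the settings above rather than the actual script in the repo;

using UnityEngine;

// Illustrative sketch only - the field names and defaults are guesses based on the
// inspector settings described above, not the actual script in the repo.
public class ControlConfiguration : MonoBehaviour
{
    [Header("Signalling")]
    public string SignallingServerIpAddress = "192.168.0.1";
    public int SignallingServerPort = 8888;
    public bool IsInitiator = true;
    public string PeerName = "";        // empty => initiate with the first peer that turns up

    [Header("Video rendering")]
    public uint LocalTextureWidth = 160;
    public uint LocalTextureHeight = 120;
    public uint RemoteTextureWidth = 640;
    public uint RemoteTextureHeight = 480;

    public GameObject LocalVideoQuad;   // quad whose material texture shows the local stream
    public GameObject RemoteVideoQuad;  // quad whose material texture shows the remote stream
}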

Pete re-worked the original sample code to render to a texture on a material applied to a quad rather than to a 2D RawImage as the original did.

Now, it’s fairly easy to then add my conversation library into this Unity project so that we can make use of that code. We simply drop it into the Assets of the project and configure the appropriate build settings for Unity;

image

and also drop in the MediaEngineUWP, Org.WebRtc and WebRtcScheme DLLs;

image

and the job then becomes one of adapting the code that I wrote in the previous blog post to suit the Unity environment, which means implementing the IMediaManager interface that I came up with for Unity rather than for XAML.

How to go about that? Firstly, we took those PInvoke signatures from the VideoControlSample and put them into a separate static class named Plugin.

Secondly, we implemented that IMediaManager interface on top of the pieces that originated in the sample;

#if ENABLE_WINMD_SUPPORT

using ConversationLibrary.Interfaces;
using ConversationLibrary.Utility;
using Org.WebRtc;
using System;
using System.Linq;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.WSA;
using Windows.Media.Core;

public class MediaManager : IMediaManager
{
    // This constructor will be used by the cheap IoC container
    public MediaManager()
    {
        this.textureDetails = CheapContainer.Resolve<ITextureDetailsProvider>();
    }
    // The idea is that this constructor would be used by a real IoC container.
    public MediaManager(ITextureDetailsProvider textureDetails)
    {
        this.textureDetails = textureDetails;
    }
    public Media Media => this.media;

    public MediaStream UserMedia => this.userMedia;

    public MediaVideoTrack RemoteVideoTrack { get => remoteVideoTrack; set => remoteVideoTrack = value; }

    public async Task AddLocalStreamAsync(MediaStream stream)
    {
        var track = stream?.GetVideoTracks()?.FirstOrDefault();

        if (track != null)
        {
            // TODO: stop hardcoding I420?.
            this.InvokeOnUnityMainThread(
                () => this.CreateLocalMediaStreamSource(track, LOCAL_VIDEO_FRAME_FORMAT, "SELF"));
        }
    }

    public async Task AddRemoteStreamAsync(MediaStream stream)
    {
        var track = stream?.GetVideoTracks()?.FirstOrDefault();

        if (track != null)
        {
            // TODO: stop hardcoding I420?.
            this.InvokeOnUnityMainThread(
                () => this.CreateRemoteMediaStreamSource(track, REMOTE_VIDEO_FRAME_FORMAT, "PEER"));
        }
    }
    void InvokeOnUnityMainThread(AppCallbackItem callback)
    {
        UnityEngine.WSA.Application.InvokeOnAppThread(callback,false);
    }
    void InvokeOnUnityUIThread(AppCallbackItem callback)
    {
        UnityEngine.WSA.Application.InvokeOnUIThread(callback, false);
    }
    public async Task CreateAsync(bool audioEnabled = true, bool videoEnabled = true)
    {
        this.media = Media.CreateMedia();

        // TODO: for the moment, turning audio off as I get an access violation in
        // some piece of code that'll take some debugging.
        RTCMediaStreamConstraints constraints = new RTCMediaStreamConstraints()
        {
            // TODO: switch audio back on, fix the crash.
            audioEnabled = false,
            videoEnabled = true
        };
        this.userMedia = await media.GetUserMedia(constraints);
    }

    public void RemoveLocalStream()
    {
        // TODO: is this ever getting called?
        this.InvokeOnUnityMainThread(
            () => this.DestroyLocalMediaStreamSource());
    }

    public void RemoveRemoteStream()
    {
        this.DestroyRemoteMediaStreamSource();
    }

    public void Shutdown()
    {
        if (this.media != null)
        {
            if (this.localVideoTrack != null)
            {
                this.localVideoTrack.Dispose();
                this.localVideoTrack = null;
            }
            if (this.RemoteVideoTrack != null)
            {
                this.RemoteVideoTrack.Dispose();
                this.RemoteVideoTrack = null;
            }
            this.userMedia = null;
            this.media.Dispose();
            this.media = null;
        }
    }
    void CreateLocalMediaStreamSource(object track, string type, string id)
    {
        Plugin.CreateLocalMediaPlayback();
        IntPtr playbackTexture = IntPtr.Zero;
        Plugin.GetLocalPrimaryTexture(
            this.textureDetails.Details.LocalTextureWidth, 
            this.textureDetails.Details.LocalTextureHeight, 
            out playbackTexture);

        this.textureDetails.Details.LocalTexture.GetComponent<Renderer>().sharedMaterial.mainTexture = 
            (Texture)Texture2D.CreateExternalTexture(
                (int)this.textureDetails.Details.LocalTextureWidth, 
                (int)this.textureDetails.Details.LocalTextureHeight, 
                (TextureFormat)14, false, false, playbackTexture); // 14 == TextureFormat.BGRA32

#if ENABLE_WINMD_SUPPORT
        Plugin.LoadLocalMediaStreamSource(
            (MediaStreamSource)Org.WebRtc.Media.CreateMedia().CreateMediaStreamSource((MediaVideoTrack)track, type, id));
#endif
        Plugin.LocalPlay();
    }

    void DestroyLocalMediaStreamSource()
    {
        this.textureDetails.Details.LocalTexture.GetComponent<Renderer>().sharedMaterial.mainTexture = null;
        Plugin.ReleaseLocalMediaPlayback();
    }

    void CreateRemoteMediaStreamSource(object track, string type, string id)
    {
        Plugin.CreateRemoteMediaPlayback();

        IntPtr playbackTexture = IntPtr.Zero;

        Plugin.GetRemotePrimaryTexture(
            this.textureDetails.Details.RemoteTextureWidth, 
            this.textureDetails.Details.RemoteTextureHeight, 
            out playbackTexture);

        // NB: creating textures and calling GetComponent<> has thread affinity for Unity
        // in so far as I can tell.
        var texture = (Texture)Texture2D.CreateExternalTexture(
           (int)this.textureDetails.Details.RemoteTextureWidth,
           (int)this.textureDetails.Details.RemoteTextureHeight,
           (TextureFormat)14, false, false, playbackTexture); // 14 == TextureFormat.BGRA32

        this.textureDetails.Details.RemoteTexture.GetComponent<Renderer>().sharedMaterial.mainTexture = texture;

#if ENABLE_WINMD_SUPPORT
        Plugin.LoadRemoteMediaStreamSource(
            (MediaStreamSource)Org.WebRtc.Media.CreateMedia().CreateMediaStreamSource((MediaVideoTrack)track, type, id));
#endif
        Plugin.RemotePlay();
    }

    void DestroyRemoteMediaStreamSource()
    {
        this.textureDetails.Details.RemoteTexture.GetComponent<Renderer>().sharedMaterial.mainTexture = null;
        Plugin.ReleaseRemoteMediaPlayback();
    }
    Media media;
    MediaStream userMedia;
    MediaVideoTrack remoteVideoTrack;
    MediaVideoTrack localVideoTrack;
    ITextureDetailsProvider textureDetails;

    // TODO: temporary hard coding...
    static readonly string LOCAL_VIDEO_FRAME_FORMAT = "I420";
    static readonly string REMOTE_VIDEO_FRAME_FORMAT = "H264";
}
#endif

Naturally, this is very “rough” code right now and there’s some hard-coding going on in there, but it didn’t take too much effort to plug these pieces in underneath the interface that I’d brought across from my original, minimal XAML-based project.

So…with all of that said…

Does It Work?

Sort of :) Firstly, you might notice in the code above that audio is hard-coded to be switched off because we currently have a crash if we switch audio on – it’s something in the release of a smart pointer in the webRTC pieces that we haven’t yet tracked down.

Minus audio, it’s possible to run the Unity app here on HoloLens and have it connect via the sample-provided signalling service to the original XAML-based PeerCC sample running (e.g.) on my Surface Book, with video streams flowing and visible in both directions.

Here’s a screenshot of that “in action” from the point of view of the desktop app receiving the video stream from the HoloLens;

image

and that screenshot shows four things;

  • Bottom right is the local PC’s video stream off its webcam – me wearing a HoloLens.
  • The upper-left 75% is the remote stream coming from the webcam on the HoloLens including its holographic content, which here comprises;
    • Upper left mid section is the remote video stream from the PC replayed on the HoloLens.
    • Upper right mid section is the local HoloLens video stream replayed on the HoloLens, which seems to have disappeared while I was taking this screenshot.

You might see some numbers in there that suggest 30fps but I think that was a temporary thing – at the time of writing the performance so far is fairly bad, but we haven’t yet taken a look at what’s going on there; this ‘play’ sample needs some more investigation.

Where’s the Code?

If you’re interested in following these experiments along as we go forward then the code is in a different location to the previous repo as it’s over here on Pete’s github account;

https://github.com/peted70/web-rtc-test

Feel free to feed back but, of course, apply the massive caveat that this is very rough experimentation at the moment – there’s a long way to go :)

Experiments with Shared Holograms and Azure Blob Storage/UDP Multicasting (Part 7)

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

A follow-up to my previous post around experiments with shared holograms using Azure blob storage and UDP multicasting techniques.

At the end of the previous post, I said that I might return and make a slightly better ‘test scene’ for the Unity project – this post is my write-up of my attempt to do that.

What’s in the New Test Scene?

I found a model of a house on Remix3D.com;

image

and I made the test scene about visualising that model in a consistent place on multiple devices with the ability to rotate, scale and move it such that the multiple devices keep a consistent view.

What I built is pretty simple and the essential steps involved in the scene are;

  • The app runs and waits for the underlying library to tell it whether there are already other devices on the same network or not. During this period, it displays a ‘waiting screen’ for up to 5 seconds if it doesn’t receive notification that there are other devices on the network.

20180110_130146_HoloLens

  • If the app determines that no other devices are on the network then it pops up a model of a house gaze-locked to the device so that the user can potentially move it around and say ‘done’ to place it.

20180110_125124_HoloLens

  • Once positioned, the app replaces the displayed model by using the APIs detailed in the previous posts to create a shared hologram which is exactly the same house in the same position etc. At this point, its creation will be multicast around the network and the blob representing its world anchor will be uploaded to Azure.
  • If the app determines that there are other devices on the network at start-up time then it will inform the user of this;

20180110_125554_HoloLens

  • and it will stop the user from positioning the model while it waits to bring the position data (world anchor) down from Azure. The same thing should happen in the race condition where multiple users start the app at the same time and one of them becomes the first to actually position the model.

20180110_125733_HoloLens

  • Once the model has been positioned on the local device (in whichever way), the app enters a mode which allows the voice commands ‘rotate’, ‘scale’ and ‘move’ to be used to manipulate it;

20180110_125155_HoloLens

  • those transformations are then multicast to other devices on the network such that they all display the same model of a house in the same place.

and that’s pretty much it :)

How’s the Test Scene Structured?

I already had a test scene within the Unity project that I’d published to github and so I just altered it rather than starting from scratch.

It’s very simple – the scene starts with the main camera parenting both a text object (to give a very poor Heads-Up-Display) and the model of the house (to give a very poor gaze-locked positioning system) as below;

image

there is then one object called ScriptHolder which has an instance of the Shared Hologram Controller component (and its dependency) that I discussed in the previous posts;

image

I’ve omitted the details of my own Azure configuration, so that would need to be filled in to specify the storage details, and I’ve also told the script that I want to synchronise transforms at a fairly high frequency which, realistically, I think I could drop down a little.

Beyond that, I also have a script here called Main Script which contains the logic for the scene – the positive part being that there’s not too much of it;

using SharedHolograms;
using System;
using System.Linq;
using UnityEngine;
using UnityEngine.Windows.Speech;

public class MainScript : MonoBehaviour, ICreateGameObjects
{
    // Text to display output messages on
    public TextMesh StatusDisplayTextMesh;

    // GameObject to use as a marker to position the model (i.e. the house)
    public GameObject PositionalModel;

    // Implementation of ICreateGameObject - because we are not creating a Unity primitive
    // I've implemented this here and 'plugged it in' but our creation is very simple in
    // that we duplicate the object that we're using as the PositionalModel (i.e. the
    // house in my version).
    public void CreateGameObject(string gameObjectSpecifier, Action<GameObject> callback)
    {
        // Right now, we know how to create one type of thing and we do it in the most
        // obvious way but we could do it any which way we like and even get some other
        // componentry to do it for us.
        if (gameObjectSpecifier == "house")
        {
            var gameObject = GameObject.Instantiate(this.PositionalModel);
            gameObject.SetActive(true);
            callback(gameObject);
        }
        else
        {
            // Sorry, only know about "house" right now.
            callback(null);
        }
    }
    void Start()
    {
        // Set up our keyword handling. Originally, I imagined more than one keyword but
        // we ended up just with "Done" here.
        var keywords = new[]
        {
            new { Keyword = "done", Handler = (Action)this.OnDoneKeyword }
        };
        this.keywordRecognizer = new KeywordRecognizer(keywords.Select(k => k.Keyword).ToArray());

        this.keywordRecognizer.OnPhraseRecognized += (e) =>
        {
            var understood = false;

            if ((e.confidence == ConfidenceLevel.High) ||
                (e.confidence == ConfidenceLevel.Medium))
            {
                var handler = keywords.FirstOrDefault(k => k.Keyword == e.text.ToLower());

                if (handler != null)
                {
                    handler.Handler();
                    understood = true;
                }
            }
            if (!understood)
            {
                this.SetStatusDisplayText("I might have missed what you said...");
            }
        };
        // We need to know when various things happen with the shared holograms controller.
        SharedHologramsController.Instance.SceneReady += OnSceneReady;
        SharedHologramsController.Instance.Creator.BusyStatusChanged += OnBusyStatusChanged;
        SharedHologramsController.Instance.Creator.HologramCreatedRemotely += OnRemoteHologramCreated;
        SharedHologramsController.Instance.Creator.GameObjectCreator = this;

        // Wait to see whether we should make the positional model active or not.
        this.PositionalModel.SetActive(false);
        this.SetStatusDisplayText("waiting...");
    }
    void OnDoneKeyword()
    {
        if (!this.busy)
        {
            this.keywordRecognizer.Stop();

            this.SetStatusDisplayText("working, please wait...");

            if (this.PositionalModel.activeInHierarchy)
            {
                // Get rid of the placeholder.
                this.PositionalModel.SetActive(false);

                // Create the shared hologram in the same place as the placeholder.
                SharedHologramsController.Instance.Creator.Create(
                    "house",
                    this.PositionalModel.transform.position,
                    this.PositionalModel.transform.forward,
                    Vector3.one,
                    gameObject =>
                    {
                        this.SetStatusDisplayText("object created and shared");
                        this.houseGameObject = gameObject;
                        this.AddManipulations();
                    }
                );
            }
        }
    }
    void OnBusyStatusChanged(object sender, BusyStatusChangedEventArgs e)
    {
        this.busy = e.Busy;

        if (e.Busy)
        {
            this.SetStatusDisplayText("working, please wait...");
        }
    }
    void OnSceneReady(object sender, SceneReadyEventArgs e)
    {
        // Are there other devices around or are we starting alone?
        if (e.Status == SceneReadyStatus.OtherDevicesInScene)
        {
            this.SetStatusDisplayText("detected other devices, requesting sync...");
        }
        else
        {
            this.SetStatusDisplayText("detected no other devices...");

            // We need this user to position the model so switch it on
            this.PositionalModel.SetActive(true);
            this.SetStatusDisplayText("walk to position the house then say 'done'");

            // Wait for the 'done' keyword.
            this.keywordRecognizer.Start();
        }
    }
    void OnRemoteHologramCreated(object sender, HologramEventArgs e)
    {
        // Someone has beaten this user to positioning the model
        // turn off the model.
        this.PositionalModel.SetActive(false);

        this.SetStatusDisplayText("sync'd...");

        // Stop waiting for the 'done' keyword (if we are)
        this.keywordRecognizer.Stop();

        this.houseGameObject = GameObject.Find(e.ObjectId.ToString());

        // Make sure we can manipulate what the other user has placed.
        this.AddManipulations();
    }
    void AddManipulations()
    {
        this.SetStatusDisplayText("say 'move', 'rotate' or 'scale'");

        // The Manipulations script contains a keyword recognizer for 'move', 'rotate', 'scale'
        // and some basic logic to wire those to hand manipulations
        this.houseGameObject.AddComponent<Manipulations>();
    }
    void SetStatusDisplayText(string text)
    {
        if (this.StatusDisplayTextMesh != null)
        {
            this.StatusDisplayTextMesh.text = text;
        }
    }
    KeywordRecognizer keywordRecognizer;
    GameObject houseGameObject;
    bool busy;
}

If someone (anyone! please! please! ;)) had been following the previous set of blog posts closely, they might have noticed that in order to write that code I had to change my existing code to at least (there’s a sketch of the shape of these events after this list);

  • Fire an event when the device joins the network such that code can be notified of whether the messaging layer has seen other devices on the network or not.
  • Fire events when other devices on the network create/delete holograms causing them to be imported and created by the local device.
  • Fire an event as/when the underlying code is ‘busy’ doing some downloading or uploading or similar.
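
For reference, here’s roughly the shape of those events as MainScript consumes them above – the argument types are inferred from the handlers rather than copied from the library, so treat the details (particularly the enum member names and the type of ObjectId) as assumptions;

using System;

// Inferred shapes only - reconstructed from the way MainScript subscribes to and
// handles these events, not copied from the SharedHolograms library itself.
public enum SceneReadyStatus
{
    OtherDevicesInScene,
    NoOtherDevicesInScene   // assumed name - the test scene only checks for OtherDevicesInScene
}

public class SceneReadyEventArgs : EventArgs
{
    public SceneReadyStatus Status { get; set; }
}

public class BusyStatusChangedEventArgs : EventArgs
{
    public bool Busy { get; set; }
}

public class HologramEventArgs : EventArgs
{
    public Guid ObjectId { get; set; }  // assumed to be a Guid - MainScript only ever calls ToString() on it
}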

Having tried to implement this scene, it was immediately obvious to me that these pieces were needed, but it hadn’t been obvious enough beforehand for me to implement them up front, and so that was a useful output of writing this test scene.

The other thing that’s used in the scene is a MonoBehaviour named Manipulations. This is a version of a script that I’ve used in a few places in the past and it’s a very cheap and cheerful way to provide rotate/scale/move behaviour on a focused object in response to voice commands and hand manipulations.

I placed this script and the other script that is specific to the test scene in the ‘Scene Specific’ folder;

image

and the Manipulations script has a dependency on the 3 materials in the Resources folder that it uses for drawing different coloured boxes around an object while it is being rotated/scaled/moved;

image

and that’s pretty much it.

One thing that I’d note is that when I’d used this Manipulations script before it was always in projects that were making use of the Mixed Reality Toolkit for Unity and, consequently, I had written the code to depend on some items of the toolkit – specifically around the IManipulationHandler interface and the IInputClickHandler interface.

I don’t currently make any use of the toolkit in this test project and it felt like massive overkill to add it just to enable this one script, so I reworked the script to remove its dependency on the toolkit. I was very pleased to find that this was only a small piece of work – the toolkit had mostly done a bit of wrapping over the raw Unity APIs and so it wasn’t difficult to unpick that dependency here.

Wrapping Up

I don’t intend to write any more posts in this mini-series around using Azure blob storage and UDP multicasting to enable shared holograms – I think I’ve perhaps gone far enough :)

The code is all up on github should anyone want to explore it, try it, or take some pieces for their own purposes.

I’m always open to feedback so feel free to drop me a line. Be aware, though, that I’ve only tested this code in a limited way – I wrote it all on a single HoloLens device using the (supplied) test programs to simulate responses from a second device – but I’m ‘reasonably’ happy that it’s doing sensible things.

Experiments with Shared Holograms and Azure Blob Storage/UDP Multicasting (Part 6)

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

A follow-up to my previous post around experiments with shared holograms using Azure blob storage and UDP multicasting techniques.

I doubt that anyone’s following along in great detail :) but I’d ended “Part 2” in this little series of posts with a bit of a “to do” list for my experiments, which was as below;

  • Create objects other than primitives – I added something around this in Part 3.
  • Transform objects after they are created – I added something around this in Part 4 and Part 5.
  • Have some ‘memory’ of messages that a client has missed such that not all clients have to join a scene at the same time.

I wanted to return and make some notes on that last point around a ‘memory’.

Prior to this post, I’d set up some scripts and a library such that code based on my scripts running on one HoloLens device can be used to dynamically instantiate holograms in various places around the physical world, and the scripts make it relatively easy to;

  • Create a shared hologram
    • A simple Create() API which takes the type of the hologram plus its position and scale and which…
      • Creates the hologram
      • Automatically parents it to a world-anchored object such that no hologram is more than 3m from its world anchor, dynamically creating and anchoring the parent if necessary.
      • Exports the details of any newly created anchor to Azure blob storage.
      • Multicasts a message around the network to let other devices respond and create their own replica of the hologram using the world anchor downloaded from Azure etc.
      • Optionally attaches a ‘behaviour’ which will multicast changes to the local position, rotation, scale of the hologram around the network on some frequency so that changes made to those values will reflect across all the devices.
  • Delete a shared hologram
    • A simple Delete() API which…
      • Removes the object from the scene
      • Multicasts a message around the network to let other devices remove the object locally.

and that all seems to work reasonably well.
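
For concreteness, here’s a minimal usage sketch of those APIs – the Create() call mirrors the one made by the test scene in the Part 7 post above, while the Delete() signature is my assumption since that scene doesn’t currently exercise deletion;

using SharedHolograms;
using UnityEngine;

// Minimal usage sketch - Create() mirrors the call made by MainScript in the Part 7
// post; the Delete() call at the end is an assumed signature.
public class SharedHouseExample : MonoBehaviour
{
    public GameObject PositionalModel;   // placeholder used to choose where the house goes
    GameObject houseGameObject;

    public void PlaceHouse()
    {
        SharedHologramsController.Instance.Creator.Create(
            "house",
            this.PositionalModel.transform.position,
            this.PositionalModel.transform.forward,
            Vector3.one,
            createdObject =>
            {
                // Invoked once the hologram exists locally, its world anchor has been
                // exported to blob storage and its creation multicast to other devices.
                this.houseGameObject = createdObject;
            });
    }

    public void RemoveHouse()
    {
        // Assumed signature - see the caveat above.
        SharedHologramsController.Instance.Creator.Delete(this.houseGameObject);
    }
}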

However, there’s a lack of ‘memory’ in the sense that if an app based on this code were to run on one device and take actions such as creating, transforming or deleting holograms before the app runs on a second device, then there’s no mechanism via which that second device can join the scene and catch up with what’s been happening on the first device.

There’s no way to sync beyond having all the apps running at the same time which isn’t very realistic.

I wanted to try and address this – there are no doubt lots of different ways of doing it, but I considered;

  • Adding some centralised state such that some blob/table in the cloud records the current state of play and any device can update/query it at any time
  • Adding some centralised state such that one ‘master’ device maintains a list that can be queried by other devices
  • Making minimal changes such that the de-centralised state already present on each device can be used to reconstruct the scene on a newly-arrived device

I went with the 3rd option as it felt like a relatively small change to what I already had in my code.

With that in mind, I didn’t make any changes to my MessagingLibrary project but I added new messages into the Unity project;

image

With the essential scheme being something along the lines of;

  • When a device first runs up it creates a GUID to identify itself and multicasts a NewDeviceAnnouncementMessage
  • Other devices respond to those messages by replying with an ExistingDeviceMessage which contains their own ID
  • A new device that receives such responses within the first few seconds of start-up can choose one of the replies and construct a SceneRequestMessage and multicast it (it contains both the destination device ID and the source device ID)
  • The device that receives the SceneRequestMessage multicasts back a sequence of SceneResponseObjectMessage messages, one for each shared hologram in the scene. These messages also contain the intended recipient device ID so that other devices can ignore them.
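
To make that a little more concrete, the messages might look something along the lines of the sketch below – this just illustrates the information each one carries per the scheme above; the member names and exact shapes are my guesses rather than the actual classes in the Unity project;

using System;

// Guessed shapes only - these sketch the data each message carries in the scheme
// described above; the real classes in the Unity project may well differ.
public class NewDeviceAnnouncementMessage
{
    public Guid DeviceId { get; set; }              // the ID the new device invents for itself
}

public class ExistingDeviceMessage
{
    public Guid DeviceId { get; set; }              // the responding device's own ID
}

public class SceneRequestMessage
{
    public Guid SourceDeviceId { get; set; }        // the new device asking for the scene
    public Guid DestinationDeviceId { get; set; }   // the existing device chosen to supply it
}

public class SceneResponseObjectMessage
{
    public Guid DestinationDeviceId { get; set; }   // so that other devices can ignore it

    // ...plus, as the text below says, essentially the same payload as the original
    // CreatedObjectMessage (object type, transform, world anchor details, etc.) so
    // that the existing creation-handling code can process it unchanged.
}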

The SceneResponseObjectMessage is essentially the same as the initial CreatedObjectMessage which is multicast when the Create() API creates a shared hologram and so the handling of those messages doesn’t require lots of new code – it’s the same code that would handle the creation messages if the receiving app had been alive at the time that the holograms were created.

The changes to send/receive/process these messages then become relatively minor and the code’s up to date on github.

I also updated the console-based test application that I’ve been using to test out the code when running with only one HoloLens, although I must admit that the code in that application is perhaps only really usable by me – it’d need some detailed explanation for someone else to pick it up and figure out what the heck I had in mind for it – but it has helped a lot along the way.

I’m not planning to add more code into this series of posts. The only addition that I’d like to make (beyond testing properly on multiple devices :)) is to add a better test scene.

The one that I have in the Unity project really is only there for me to test out my code; I’d like to replace it with one that someone coming new to this code could easily run, understand and use to get a basic shared hologram app up and running on multiple devices in a short time. If I get a chance to look into this then I’ll add one more post to this series when I’ve got that new test scene put in place…