Hitchhiking the HoloToolkit-Unity, Leg 4 – Text To Speech

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

Voice is really important as both an input and output mechanism on the HoloLens and there’s a great section in the documentation over here on working with it;

Voice input

and it’s fairly easy to have the device speak: I can add the “Text To Speech Manager” script onto a game object in Unity;

[image: the Text To Speech Manager script added to a game object in the Unity inspector]

and that script comes from the Utilities section of the HoloToolkit-Unity;

[image: the Text To Speech Manager script in the Utilities section of the HoloToolkit-Unity]

and once I’ve got that in place I can easily ask it to speak on my behalf with code such as this, running within a method on a script attached to my GameObject;

    var textToSpeech = this.GetComponent<TextToSpeechManager>();
    textToSpeech.SpeakText("One");
    textToSpeech.SpeakText("Two");
    textToSpeech.SpeakText("Three");

And you can see the code for the TextToSpeechManager over here on GitHub.

There’s one caveat with the code above though, and that’s that the model employed by the Text to Speech manager here is one of ‘fire and forget’. My code above, which is trying to have 3 distinct pieces of speech spoken, doesn’t really work and what I usually hear when I run it is a spoken output of;

“Three”

because the third call overruns the second call which overruns the first call.

I wanted to see if I could tweak that a little and so I wrote some exploratory code based on what I saw in a slightly earlier check-in of the Text to Speech manager. My main change was to try and alter the implementation such that calls to SpeakText or SpeakSsml were effectively queued, with a second call executing only after the first one had completed playing.

I initially thought that this would be pretty easy but it turned out that the Unity AudioSource object which underpins this code doesn’t really seem to have a great way of letting you know when the audio has stopped playing. It seems that the options are to either;

  • Call the Play() or PlayScheduled() methods with some kind of delay so that a particular piece of speech doesn’t start until the pieces of speech that have gone before it have finished playing.
  • Poll the isPlaying property to see when playback has ended.

There may be other/better mechanisms but that’s all I found by doing a search around the web.
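
To make those two options a little more concrete, here’s a rough sketch of my own (so very much an assumption on my part rather than anything from the toolkit) of what each might look like against an AudioSource;

using System;
using UnityEngine;

public class AudioCompletionSketch : MonoBehaviour
{
  public AudioSource audioSource;

  // Option 1 - schedule each clip so that it starts only once the previous
  // one should have finished, tracking the expected end time myself.
  double pendingEndTime;

  void PlayAfterPrevious(AudioClip clip)
  {
    double startTime = Math.Max(AudioSettings.dspTime, pendingEndTime);
    audioSource.clip = clip;
    audioSource.PlayScheduled(startTime);
    pendingEndTime = startTime + clip.length;
  }

  // Option 2 - just start playing and then poll this 'every so often' to
  // spot when playback has ended.
  bool PlaybackHasEnded()
  {
    return (!audioSource.isPlaying);
  }
}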

The existing TextToSpeechManager code already does work (from line 239 onwards of the PlaySpeech function) to move most of the work of generating speech from text via the UWP’s SpeechSynthesizer APIs onto a separate task but (as the comments around line 297 of PlaySpeech say) the actual playback of the audio has to happen on the main Unity thread, although I don’t believe that the call to Play is a blocking call that would halt that thread.

I wanted to leave as much of this code in place as possible while adding in my extra pieces that queued up speech rather than always trying to play it even if the AudioSource was already busy (which is what the existing code seems to do). It felt like, to do that, I would have to implement some kind of queuing mechanism which took into account;

  • Needing to be able to deal with the idea that the production of the speech itself is done asynchronously.
  • Needing to poll the isPlaying flag at some frequency once speech playback had started, in order to determine completion.

Update? Coroutines? Tasks?

At this point, I could see a few different ways in which I might be able to implement this functionality with Unity and I figured that I could maybe;

  • Do some work from a call to Update() to poll to see if any current speech had finished playing.
  • Use Unity’s Coroutines in order to see if I could poll the speech status from there.
  • Try and wrap up something that used a TaskCompletionSource and which presented the polling as something that could be awaited in C#.

The last one is perhaps the most elegant but in the end I went with using the InvokeRepeating method to schedule some work to be checked ‘every so often’. There are probably better ways of doing this but it’s all part of learning 🙂
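
For what it’s worth, I think the coroutine route would have looked something like the rough sketch below (again, this is my own guess at it rather than anything from the toolkit);

using System.Collections;
using UnityEngine;

public class CoroutineSpeechSketch : MonoBehaviour
{
  public AudioSource audioSource;

  // Play a clip and only move on once the AudioSource reports that it has
  // stopped playing, with the coroutine doing the polling for us.
  IEnumerator PlayAndWaitForCompletion(AudioClip clip)
  {
    audioSource.clip = clip;
    audioSource.Play();
    yield return new WaitWhile(() => audioSource.isPlaying);
  }
}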

In order to get something going, I took the existing code and did a few things.

1 – Refactoring into a UnityAudioHelper

I took some of the code from the existing TextToSpeechManager and refactored it into the ‘audio helper’ class below. Largely, this is just the original code moved into its own static class;

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in the project root for license information.
using System;
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

#if WINDOWS_UWP
using System.Threading.Tasks;
using Windows.Foundation;
using Windows.Media.SpeechSynthesis;
using System.Runtime.InteropServices.WindowsRuntime;
using Windows.Storage.Streams;
#endif

namespace HoloToolkit.Unity
{
  public static class UnityAudioHelper
  {
    /// <summary>
    /// Converts two bytes to one float in the range -1 to 1 
    /// </summary>
    /// <param name="firstByte">
    /// The first byte.
    /// </param>
    /// <param name="secondByte">
    /// The second byte.
    /// </param>
    /// <returns>
    /// The converted float.
    /// </returns>
    private static float BytesToFloat(byte firstByte, byte secondByte)
    {
      // Convert two bytes to one short (little endian)
      short s = (short)((secondByte << 8) | firstByte);

      // Convert to range from -1 to (just below) 1
      return s / 32768.0F;
    }

    /// <summary>
    /// Dynamically creates an <see cref="AudioClip"/> that represents raw Unity audio data.
    /// </summary>
    /// <param name="name">
    /// The name of the dynamically generated clip.
    /// </param>
    /// <param name="audioData">
    /// Raw Unity audio data.
    /// </param>
    /// <param name="sampleCount">
    /// The number of samples in the audio data.
    /// </param>
    /// <param name="frequency">
    /// The frequency of the audio data.
    /// </param>
    /// <returns>
    /// The <see cref="AudioClip"/>.
    /// </returns>
    internal static AudioClip ToClip(string name, float[] audioData, int sampleCount, int frequency)
    {
      // Create the audio clip
      var clip = AudioClip.Create(name, sampleCount, 1, frequency, false);

      // Set the data
      clip.SetData(audioData, 0);

      // Done
      return clip;
    }

    /// <summary>
    /// Converts raw WAV data into Unity formatted audio data.
    /// </summary>
    /// <param name="wavAudio">
    /// The raw WAV data.
    /// </param>
    /// <param name="sampleCount">
    /// The number of samples in the audio data.
    /// </param>
    /// <param name="frequency">
    /// The frequency of the audio data.
    /// </param>
    /// <returns>
    /// The Unity formatted audio data.
    /// </returns>
    internal static float[] ToUnityAudio(byte[] wavAudio, out int sampleCount, out int frequency)
    {
      // Determine if mono or stereo
      int channelCount = wavAudio[22];     // Speech audio data is always mono but read actual header value for processing

      // Get the frequency
      frequency = BitConverter.ToInt32(wavAudio, 24);

      // Get past all the other sub chunks to get to the data subchunk:
      int pos = 12;   // First subchunk ID from 12 to 16

      // Keep iterating until we find the data chunk (i.e. 64 61 74 61 ...... (i.e. 100 97 116 97 in decimal))
      while (!(wavAudio[pos] == 100 && wavAudio[pos + 1] == 97 && wavAudio[pos + 2] == 116 && wavAudio[pos + 3] == 97))
      {
        pos += 4;
        int chunkSize = wavAudio[pos] + wavAudio[pos + 1] * 256 + wavAudio[pos + 2] * 65536 + wavAudio[pos + 3] * 16777216;
        pos += 4 + chunkSize;
      }
      pos += 8;

      // Pos is now positioned to start of actual sound data.
      sampleCount = (wavAudio.Length - pos) / 2;     // 2 bytes per sample (16 bit sound mono)
      if (channelCount == 2) sampleCount /= 2;      // 4 bytes per sample (16 bit stereo)

      // Allocate memory (supporting left channel only)
      float[] unityData = new float[sampleCount];

      // Write to double array/s:
      int i = 0;
      while (pos < wavAudio.Length)
      {
        unityData[i] = BytesToFloat(wavAudio[pos], wavAudio[pos + 1]);

        pos += 2;
        if (channelCount == 2)
        {
          pos += 2;
        }
        i++;
      }

      // Done
      return unityData;
    }
#if WINDOWS_UWP
    internal static async Task<byte[]> SynthesizeToUnityDataAsync(
      string text,
      Func<string, IAsyncOperation<SpeechSynthesisStream>> speakFunc)
    {
      byte[] buffer = null;

      // Speak and get stream
      using (var speechStream = await speakFunc(text))
      {
        // Create buffer
        buffer = new byte[speechStream.Size];

        // Get input stream and the size of the original stream
        using (var inputStream = speechStream.GetInputStreamAt(0))
        {
          await inputStream.ReadAsync(buffer.AsBuffer(),
            (uint)buffer.Length, InputStreamOptions.None);
        }
      }
      return (buffer);
    }
#endif
  }
}

2 – Add a ‘Queue Worker’

I added my own abstract base class which tries to encapsulate the idea of a queue of work to be processed, where items are taken off the queue and worked upon in sequence. This is quite a generic problem to solve and you can get into aspects of multi-threading and so on which I’ve avoided here, and this little class isn’t as generic as it could be because I’m bending it to my specific requirements in that;

  • A queue is polled periodically rather than (e.g.) signalling some kind of synchronization object when work is available. This isn’t how I’d perhaps usually write this sort of class but I have a specific requirement to poll the AudioSource plus Unity’s model is quite amenable to polling.
  • An item of work is de-queued.
  • The item of work is executed and it is assumed that the item will take steps to avoid blocking.
  • The completion of the item of work is determined by polling some method to check a ‘completed’ status.

and my little queue class ended up looking like this;

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in the project root for license information.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using System;

public abstract class IntervalWorkQueue : MonoBehaviour
{
  [Tooltip("The interval (sec) on which to check queued speech.")]
  [SerializeField]
  private float queueInterval = 0.25f;

  public enum WorkState
  {
    Idle,
    Starting,
    PollingForCompletion
  }
  public IntervalWorkQueue()
  {
    this.workState = WorkState.Idle;
    this.queueEntries = new Queue<object>();
  }
  public void AddWorkItem(object workItem)
  {
    this.queueEntries.Enqueue(workItem);
  }
  public void Start()
  {
    base.InvokeRepeating("ProcessQueue", queueInterval, queueInterval);
  }
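  // Runs every 'queueInterval' seconds via InvokeRepeating and moves the
  // queue through a simple state machine; Idle -> Starting (an item has been
  // dequeued and handed to DoWorkItem) -> PollingForCompletion (once
  // WorkIsInProgress has gone true) -> back to Idle when WorkIsInProgress
  // goes false again.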
  void ProcessQueue()
  {
    if ((this.workState == WorkState.Starting) &&
      (this.WorkIsInProgress))
    {
      this.workState = WorkState.PollingForCompletion;
    }

    if ((this.workState == WorkState.PollingForCompletion) &&
      (!this.WorkIsInProgress))
    {
      this.workState = WorkState.Idle;
    }

    if ((this.workState == WorkState.Idle) &&
      (this.WorkIsQueued))
    {
      this.workState = WorkState.Starting;
      object workEntry = this.queueEntries.Dequeue();
      this.DoWorkItem(workEntry);
    }
  }
  protected bool WorkIsQueued
  {
    get
    {
      return (this.queueEntries.Count > 0);
    }
  }
  protected abstract void DoWorkItem(object item);
  protected abstract bool WorkIsInProgress { get; }
  WorkState workState;
  Queue<object> queueEntries;
}

3 – Derive a Text to Speech Manager

I derive a new variant of the original TextToSpeechManager from my new IntervalWorkQueue, as in the code snippet below, and this class makes calls out to the refactored UnityAudioHelper which I listed earlier. The main ‘features’ here are that the Start() method calls base.Start() in order to get the interval work queue up and running, that the DoWorkItem method and the WorkIsInProgress property have been overridden to call into the original code, and that the original PlaySpeech method has been reworked to simply call base.AddWorkItem to add an entry onto the queue.

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in the project root for license information.
using System;
using UnityEngine;
using System.Collections;

#if WINDOWS_UWP
using Windows.Foundation;
using Windows.Media.SpeechSynthesis;
using Windows.Storage.Streams;
using System.Linq;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.Runtime.InteropServices.WindowsRuntime;
#endif // WINDOWS_UWP

namespace HoloToolkit.Unity
{
  /// <summary>
  /// The well-known voices that can be used by <see cref="TextToSpeechManager"/>.
  /// </summary>
  public enum TextToSpeechVoice
  {
    /// <summary>
    /// The default system voice.
    /// </summary>
    Default,

    /// <summary>
    /// Microsoft David Mobile
    /// </summary>
    David,

    /// <summary>
    /// Microsoft Mark Mobile
    /// </summary>
    Mark,

    /// <summary>
    /// Microsoft Zira Mobile
    /// </summary>
    Zira
  }

  public class TextToSpeechManager : IntervalWorkQueue
  {
    [Tooltip("The audio source where speech will be played.")]
    [SerializeField]
    private AudioSource audioSource;

    [Tooltip("The voice that will be used to generate speech.")]
    [SerializeField]
    private TextToSpeechVoice voice;

    public AudioSource AudioSource
    {
      get
      {
        return (this.audioSource);
      }
      set
      {
        this.audioSource = value;
      }
    }
    public TextToSpeechVoice Voice
    {
      get
      {
        return (this.voice);
      }
      set
      {
        this.voice = value;
      }
    }

    /// <summary>
    /// Speaks the specified SSML markup using text-to-speech.
    /// </summary>
    /// <param name="ssml">
    /// The SSML markup to speak.
    /// </param>
    public void SpeakSsml(string ssml)
    {
      // Make sure there's something to speak
      if (string.IsNullOrEmpty(ssml)) { return; }

      // Pass to helper method
#if WINDOWS_UWP
      PlaySpeech(ssml, this.voice, synthesizer.SynthesizeSsmlToStreamAsync);
#else
      LogSpeech(ssml);
#endif
    }

    /// <summary>
    /// Speaks the specified text using text-to-speech.
    /// </summary>
    /// <param name="text">
    /// The text to speak.
    /// </param>
    public void SpeakText(string text)
    {
      // Make sure there's something to speak
      if (string.IsNullOrEmpty(text)) { return; }

      // Pass to helper method
#if WINDOWS_UWP
      PlaySpeech(text, this.voice, synthesizer.SynthesizeTextToStreamAsync);
#else
      LogSpeech(text);
#endif
    }
    /// <summary>
    /// Logs speech text that normally would have been played.
    /// </summary>
    /// <param name="text">
    /// The speech text.
    /// </param>
    void LogSpeech(string text)
    {
      Debug.LogFormat("Speech not supported in editor. \"{0}\"", text);
    }
    public new void Start()
    {
      base.Start();

      try
      {
        if (audioSource == null)
        {
          Debug.LogError("An AudioSource is required and should be assigned to 'Audio Source' in the inspector.");
        }
        else
        {
#if WINDOWS_UWP
          this.synthesizer = new SpeechSynthesizer();
#endif
        }
      }
      catch (Exception ex)
      {
        Debug.LogError("Could not start Speech Synthesis");
        Debug.LogException(ex);
      }
    }
    protected override void DoWorkItem(object item)
    {
#if WINDOWS_UWP

      try
      {
        SpeechEntry speechEntry = item as SpeechEntry;

        // Need await, so most of this will be run as a new Task in its own thread.
        // This is good since it frees up Unity to keep running anyway.
        Task.Run(async () =>
        {
          // Use the voice that was captured when this entry was queued.
          this.ChangeVoice(speechEntry.Voice);

          var buffer = await UnityAudioHelper.SynthesizeToUnityDataAsync(
            speechEntry.Text,
            speechEntry.SpeechGenerator);

          // Convert raw WAV data into Unity audio data
          int sampleCount = 0;
          int frequency = 0;
          float[] unityData = null;

          unityData = UnityAudioHelper.ToUnityAudio(
            buffer, out sampleCount, out frequency);

          // The remainder must be done back on Unity's main thread
          UnityEngine.WSA.Application.InvokeOnAppThread(
            () =>
            {
                // Convert to an audio clip
                var clip = UnityAudioHelper.ToClip(
                  "Speech", unityData, sampleCount, frequency);

                // Set the source on the audio clip
                audioSource.clip = clip;

                // Play audio
                audioSource.Play();
            },
            false);
        });
      }
      catch (Exception ex)
      {
        Debug.LogErrorFormat("Speech generation problem: \"{0}\"", ex.Message);
      }
#endif 
    }
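    // The work queue polls this to decide when the current speech item has
    // completed - i.e. completion here simply means that the AudioSource
    // has stopped playing.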
    protected override bool WorkIsInProgress
    {
      get
      {
#if WINDOWS_UWP
        return (this.audioSource.isPlaying);
#else
        return (false);
#endif
      }
    }

#if WINDOWS_UWP
    class SpeechEntry
    {
      public string Text { get; set; }
      public TextToSpeechVoice Voice { get; set; }
      public Func<string, IAsyncOperation<SpeechSynthesisStream>> SpeechGenerator { get; set; }
    }
    private SpeechSynthesizer synthesizer;
    private VoiceInformation voiceInfo;

    /// <summary>
    /// Executes a function that generates a speech stream and then converts and plays it in Unity.
    /// </summary>
    /// <param name="text">
    /// A raw text version of what's being spoken for use in debug messages when speech isn't supported.
    /// </param>
    /// <param name="speakFunc">
    /// The actual function that will be executed to generate speech.
    /// </param>
    void PlaySpeech(
      string text,
      TextToSpeechVoice voice,
      Func<string, IAsyncOperation<SpeechSynthesisStream>> speakFunc)
    {
      // Make sure there's something to speak
      if (speakFunc == null)
      {
        throw new ArgumentNullException(nameof(speakFunc));
      }

      if (synthesizer != null)
      {
        base.AddWorkItem(
          new SpeechEntry()
          {
            Text = text,
            Voice = voice,
            SpeechGenerator = speakFunc
          }
        );
      }
      else
      {
        Debug.LogErrorFormat("Speech not initialized. \"{0}\"", text);
      }
    }
    void ChangeVoice(TextToSpeechVoice voice)
    {
      // Change voice?
      if (voice != TextToSpeechVoice.Default)
      {
        // Get name
        var voiceName = Enum.GetName(typeof(TextToSpeechVoice), voice);

        // See if it's never been found or is changing
        if ((voiceInfo == null) || (!voiceInfo.DisplayName.Contains(voiceName)))
        {
          // Search for voice info
          voiceInfo = SpeechSynthesizer.AllVoices.Where(v => v.DisplayName.Contains(voiceName)).FirstOrDefault();

          // If found, select
          if (voiceInfo != null)
          {
            synthesizer.Voice = voiceInfo;
          }
          else
          {
            Debug.LogErrorFormat("TTS voice {0} could not be found.", voiceName);
          }
        }
      }
    }
#endif // WINDOWS_UWP
  }
}

Wrapping Up

That’s pretty much it for this post. If I now add an instance of my TextToSpeechManager to a game component as in the picture below;

[image: the TextToSpeechManager component on a game object with its queue interval set in the inspector]

where the interval has been set to poll the queue at an (aggressive!) 250ms, I find that code like this;

    var textToSpeech = this.GetComponent<TextToSpeechManager>();
    textToSpeech.SpeakText("One");
    textToSpeech.SpeakText("Two");
    textToSpeech.SpeakText("Three");

now plays 3 distinct lines of speech rather than one, which is what I had been hoping for when I started the post.
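
And because SpeakSsml feeds into the same queue, mixing text and SSML should (in theory, I haven’t exercised this much) queue up in just the same way;

    var textToSpeech = this.GetComponent<TextToSpeechManager>();
    textToSpeech.SpeakText("Some plain text first");
    textToSpeech.SpeakSsml(
      "<speak version='1.0' " +
      "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>" +
      "And then a little SSML" +
      "</speak>");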

Hitchhiking the HoloToolkit-Unity, Leg 3 – Spatial Understanding (& Mapping)

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

This post follows on from these two previous posts;

One of the fundamental and magical abilities of the HoloLens is to understand the user’s surroundings and offer those surroundings as a ‘mixed reality canvas’ onto which a developer can paint their holograms for the user to interact with.

There are some great examples of this in action and I’d highlight experiences like ‘Fragments’;

where the story-telling and characters within it take advantage of the room in which you are using the app and also ‘Young Conker’;

where the character takes detailed routes around your space.

Both examples show that the device and the apps have an understanding of the topography of the room, and that’s provided by the ‘Spatial Mapping’ capabilities of the device and the platform that are described here;

HoloLens Spatial Mapping

and from the Unity perspective here;

Spatial Mapping in Unity

and there’s great information in this response on the HoloLens Forums that adds detail to both of those articles;

Brain Dump on Spatial Mapping

Spatial mapping brings the data that the device has gathered into the Unity developer’s realm as a mesh so that it can be rendered and so that it can be used as the basis for collisions in order for Holograms to interact realistically with the world.

Also within those document pages, the topic of ‘Spatial Understanding’ is introduced;

“When placing holograms in the physical world it is often desirable to go beyond spatial mapping’s mesh and surface planes. When placement is done procedurally, a higher level of environmental understanding is desirable. This usually requires making decisions about what is floor, ceiling, and walls. In addition, the ability to optimize against a set of placement constraints to determining the most desirable physical locations for holographic objects.”

and the doc pages describe how some of the core spatial understanding that has been used in ‘Young Conker’ and ‘Fragments’ has been shared so as to make this process easier for a developer.

The HoloToolkit-Unity contains functionality both for spatial mapping and for spatial understanding and there are (great) samples of both in the toolkit itself;

and the spatial understanding sample is particularly good, offering a showcase of lots of what the module can do. If you have a device, it’s really worth trying out that sample and seeing what it can do within some of your own spaces.

I’ve played with that sample a lot and it inspired me to try and pick it apart and see if I could come up with my own (smaller, lesser) sample, just so I could figure out for myself some of the things that it was doing, and I’ve made a screencast below of putting that smaller sample together from scratch;

Enjoy 🙂

Hitchhiking the HoloToolkit-Unity, Leg 2 – Input Scripts

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

Following on from this previous post;

Hitchhiking the HoloToolkit-Unity, Leg 1

I took some exploratory steps into some of the scripts that live in the Input section of the toolkit, which handle things like gaze, gestures, cursors, keyword-based speech and a few other pieces, and I noted down what came out of that exploration in the screen capture below;

I learned a few things while exploring, hence sharing here in case others find some of those ‘notes’ useful.

Hitchhiking the HoloToolkit-Unity, Leg 1 – Getting Set Up

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

If you’re interested in building for HoloLens in Unity then one of the first and most important things that you’ll come across is the HoloToolkit-Unity, which is referenced from the official ‘Getting Started’ docs here;

and is part of the 3 open source projects referenced over here;

and ultimately lives on GitHub here;

There’s quite a lot of functionality in the toolkit and I’ve experimented with some of it, so I wanted to start making some notes about what’s present there, mostly so that I can return to them myself in the future but also for others who might come across these pages.

There’s a good page in the toolkit’s own documentation around how to get started;

and I thought it might be useful to have a screen capture of working through those steps in Unity and so that’s in the video below;

I’ll follow up with something a little more involved in a later post.

Windows 10, 1607, UWP and Experimenting with the Kinect for Windows V2 Update

I was really pleased to see this blog post;

Kinect demo code and new driver for UWP now available

announcing a new driver which provides more access to the functionality of the Kinect for Windows V2 into Windows 10 including for the UWP developer.

I wrote a little about this topic in this earlier post around 10 months ago when some initial functionality became available for the UWP developer;

Kinect V2, Windows Hello and Perception APIs

and so it’s great to see that more functionality has become available and, specifically, that skeletal data is being surfaced.

I plugged my Kinect for Windows V2 into my Surface Pro 3 and had a look at the driver being used for Kinect.

[image: the driver currently in use for the Kinect sensor]

and I attempted to do an update but didn’t seem to see one, although it’s possible that the version of the driver which I have;

[image: the installed Kinect driver version details]

is the latest driver as it seems to be a week or two old. At the time of writing, I haven’t confirmed this driver version but I went on to download the C++ sample from GitHub;

Camera Stream Correlation Sample

and ran it up on my Surface Pro 3 where it initially displayed the output of the rear webcam;

[image: the sample displaying the output of the rear webcam]

and so I pressed the ‘Next Source’ button and it attempted to work with the RealSense camera on my machine;

[image: the sample attempting to work with the RealSense camera]

and so I pressed the ‘Next Source’ button and things seemed to hang. I’m unsure of the status of my RealSense drivers on this machine and so I disabled the RealSense virtual camera driver;

[image: the RealSense virtual camera driver disabled]

and then re-ran the sample. Sure enough, I could use the ‘Next Source’ button to move to the Kinect for Windows V2 sensor, and I then used the ‘Toggle Depth Fading’ button to turn that option off and the ‘Toggle Skeletal Overlay’ button to switch that option on. With that, I’ve got a (flat) skeletal overlay on the colour frames and it’s delivering very smooth performance here;

[image: the sample showing a skeletal overlay on the colour frames from the Kinect]

and so that’s great to see working. Given that the sample seemed to be C++ code, I wondered what this might look like for a C# developer working with the UWP and so I set about seeing if I could reproduce some of the core of what the sample is doing here.

Getting Skeletal Data Into a C# UWP App

Rather than attempting to ‘port’ the C++ sample, I started by lifting pieces of the code that I’d written for that earlier blog post into a new project.

I made a blank app targeting SDK 14393, made sure that it had access to webcam and microphone and then added in win2d.uwp as a NuGet package and added a little UI;

<Page
    x:Class="KinectTestApp.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:KinectTestApp"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:w2d="using:Microsoft.Graphics.Canvas.UI.Xaml"
    mc:Ignorable="d">

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <TextBlock
            FontSize="36"
            HorizontalAlignment="Center"
            VerticalAlignment="Center"
            TextAlignment="Center"
            Text="No Cameras" />
        <w2d:CanvasControl
            x:Name="canvasControl"
            Visibility="Collapsed"
            SizeChanged="OnCanvasControlSizeChanged"
            Draw="OnDraw"/>
    </Grid>
</Page>

From there, I wanted to see if I could get a basic render of the colour frame from the camera along with an overlay of some skeletal points.

I’d spotted that the official samples include a project which builds out a WinRT component that is then used to interpret the custom data that comes from the Kinect via a MediaFrameReference, and so I included a reference to that project in my solution so that I could use it from my C# code. That project is here and looks to stand independently of the surrounding sample. I made my project reference as below;

[image: the project reference to the WinRT component project]

and then set about trying to see if I could write some code that got colour data and skeletal data onto the screen.

I wrote a few small supporting classes and named them all with an mt* prefix to try and make it more obvious which code here is mine rather than from the framework or the sample. This simple class delivers a SoftwareBitmap containing the contents of the colour frame, to be fired as an event;

namespace KinectTestApp
{
  using System;
  using Windows.Graphics.Imaging;

  class mtSoftwareBitmapEventArgs : EventArgs
  {
    public SoftwareBitmap Bitmap { get; set; }
  }
}

whereas this class delivers the data that I’ve decided I need in order to draw a subset of the skeletal data onto the screen;

namespace KinectTestApp
{
  using System;

  class mtPoseTrackingFrameEventArgs : EventArgs
  {
    public mtPoseTrackingDetails[] PoseEntries { get; set; }
  }
}

and it’s a simple array which will be populated with one of these types below for each user being tracked by the sensor;

namespace KinectTestApp
{
  using System;
  using System.Linq;
  using System.Numerics;
  using Windows.Foundation;
  using Windows.Media.Devices.Core;
  using WindowsPreview.Media.Capture.Frames;

  class mtPoseTrackingDetails
  {
    public Guid EntityId { get; set; }
    public Point[] Points { get; set; }

    public static mtPoseTrackingDetails FromPoseTrackingEntity(
      PoseTrackingEntity poseTrackingEntity,
      CameraIntrinsics colorIntrinsics,
      Matrix4x4 depthColorTransform)
    {
      mtPoseTrackingDetails details = null;

      var poses = new TrackedPose[poseTrackingEntity.PosesCount];
      poseTrackingEntity.GetPoses(poses);

      var points = new Point[poses.Length];

      colorIntrinsics.ProjectManyOntoFrame(
        poses.Select(p => Multiply(depthColorTransform, p.Position)).ToArray(),
        points);

      details = new mtPoseTrackingDetails()
      {
        EntityId = poseTrackingEntity.EntityId,
        Points = points
      };
      return (details);
    }
    static Vector3 Multiply(Matrix4x4 matrix, Vector3 position)
    {
      return (new Vector3(
        position.X * matrix.M11 + position.Y * matrix.M21 + position.Z * matrix.M31 + matrix.M41,
        position.X * matrix.M12 + position.Y * matrix.M22 + position.Z * matrix.M32 + matrix.M42,
        position.X * matrix.M13 + position.Y * matrix.M23 + position.Z * matrix.M33 + matrix.M43));
    }
  }
}

which would be a simple class containing a GUID to identify the tracked person and an array of Points representing their tracked joints, except that I wanted those 2D Points to be in colour space, which means mapping them from the depth space that the sensor presents them in. So the FromPoseTrackingEntity() method takes a PoseTrackingEntity (one of the types from the referenced C++ project) and;

  1. Extracts the ‘poses’ (i.e. joints in my terminology)
  2. Uses the CameraIntrinsics from the colour camera to project them onto its frame having first transformed them using a matrix which maps from depth space to colour space.

Step 2 is code that I largely duplicated from the original C++ sample after trying a few other routes which didn’t end well for me 🙂

I then wrote this class which wraps up a few areas;

namespace KinectTestApp
{
  using System;
  using System.Linq;
  using System.Threading.Tasks;
  using Windows.Media.Capture;
  using Windows.Media.Capture.Frames;

  class mtMediaSourceReader
  {
    public mtMediaSourceReader(
      MediaCapture capture, 
      MediaFrameSourceKind mediaSourceKind,
      Action<MediaFrameReader> onFrameArrived,
      Func<MediaFrameSource, bool> additionalSourceCriteria = null)
    {
      this.mediaCapture = capture;
      this.mediaSourceKind = mediaSourceKind;
      this.additionalSourceCriteria = additionalSourceCriteria;
      this.onFrameArrived = onFrameArrived;
    }
    public bool Initialise()
    {
      this.mediaSource = this.mediaCapture.FrameSources.FirstOrDefault(
        fs =>
          (fs.Value.Info.SourceKind == this.mediaSourceKind) &&
          ((this.additionalSourceCriteria != null) ? 
            this.additionalSourceCriteria(fs.Value) : true)).Value;   

      return (this.mediaSource != null);
    }
    public async Task OpenReaderAsync()
    {
      this.frameReader =
        await this.mediaCapture.CreateFrameReaderAsync(this.mediaSource);

      this.frameReader.FrameArrived +=
        (s, e) =>
        {
          this.onFrameArrived(s);
        };

      await this.frameReader.StartAsync();
    }
    Func<MediaFrameSource, bool> additionalSourceCriteria;
    Action<MediaFrameReader> onFrameArrived;
    MediaFrameReader frameReader;
    MediaFrameSource mediaSource;
    MediaCapture mediaCapture;
    MediaFrameSourceKind mediaSourceKind;
  }
}

This type takes a MediaCapture and a MediaFrameSourceKind and can then report, via the Initialise() method, whether that source kind is available on that media capture. It can also apply some additional criteria if they are provided in the constructor. This class can also create a frame reader and redirect its FrameArrived events into the method provided to the constructor. There should be some way to stop this class as well but I haven’t written that yet.
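
In terms of how it’s meant to be used, it boils down to something like the sketch below (this is just an illustrative fragment of mine, the real wiring up is in the helper class that follows);

    var colorReader = new mtMediaSourceReader(
      mediaCapture,
      MediaFrameSourceKind.Color,
      frameReader =>
      {
        var frame = frameReader.TryAcquireLatestFrame();
        // ...process the frame and then Dispose() of it...
      });

    if (colorReader.Initialise())
    {
      await colorReader.OpenReaderAsync();
    }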

With those classes in place, I added the following mtKinectColorPoseFrameHelper;

namespace KinectTestApp
{
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Numerics;
  using System.Threading.Tasks;
  using Windows.Media.Capture;
  using Windows.Media.Capture.Frames;
  using Windows.Media.Devices.Core;
  using Windows.Perception.Spatial;
  using WindowsPreview.Media.Capture.Frames;

  class mtKinectColorPoseFrameHelper
  {
    public event EventHandler<mtSoftwareBitmapEventArgs> ColorFrameArrived;
    public event EventHandler<mtPoseTrackingFrameEventArgs> PoseFrameArrived;

    public mtKinectColorPoseFrameHelper()
    {
      this.softwareBitmapEventArgs = new mtSoftwareBitmapEventArgs();
    }
    internal async Task<bool> InitialiseAsync()
    {
      bool necessarySourcesAvailable = false;

      // Find all possible source groups.
      var sourceGroups = await MediaFrameSourceGroup.FindAllAsync();

      // We try to find the Kinect by asking for a group that can deliver
      // color, depth, custom and infrared. 
      var allGroups = await GetGroupsSupportingSourceKindsAsync(
        MediaFrameSourceKind.Color,
        MediaFrameSourceKind.Depth,
        MediaFrameSourceKind.Custom,
        MediaFrameSourceKind.Infrared);

      // We assume the first group here is what we want which is not
      // necessarily going to be right on all systems so would need
      // more care.
      var firstSourceGroup = allGroups.FirstOrDefault();

      // Got one that supports all those types?
      if (firstSourceGroup != null)
      {
        this.mediaCapture = new MediaCapture();

        var captureSettings = new MediaCaptureInitializationSettings()
        {
          SourceGroup = firstSourceGroup,
          SharingMode = MediaCaptureSharingMode.SharedReadOnly,
          StreamingCaptureMode = StreamingCaptureMode.Video,
          MemoryPreference = MediaCaptureMemoryPreference.Cpu
        };
        await this.mediaCapture.InitializeAsync(captureSettings);

        this.mediaSourceReaders = new mtMediaSourceReader[]
        {
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Color, this.OnFrameArrived),
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Depth, this.OnFrameArrived),
          new mtMediaSourceReader(this.mediaCapture, MediaFrameSourceKind.Custom, this.OnFrameArrived,
            DoesCustomSourceSupportPerceptionFormat)
        };

        necessarySourcesAvailable = 
          this.mediaSourceReaders.All(reader => reader.Initialise());

        if (necessarySourcesAvailable)
        {
          foreach (var reader in this.mediaSourceReaders)
          {
            await reader.OpenReaderAsync();
          }
        }
        else
        {
          this.mediaCapture.Dispose();
        }
      }
      return (necessarySourcesAvailable);
    }
    void OnFrameArrived(MediaFrameReader sender)
    {
      var frame = sender.TryAcquireLatestFrame();

      if (frame != null)
      {
        switch (frame.SourceKind)
        {
          case MediaFrameSourceKind.Custom:
            this.ProcessCustomFrame(frame);
            break;
          case MediaFrameSourceKind.Color:
            this.ProcessColorFrame(frame);
            break;
          case MediaFrameSourceKind.Infrared:
            break;
          case MediaFrameSourceKind.Depth:
            this.ProcessDepthFrame(frame);
            break;
          default:
            break;
        }
        frame.Dispose();
      }
    }
    void ProcessDepthFrame(MediaFrameReference frame)
    {
      if (this.colorCoordinateSystem != null)
      {
        this.depthColorTransform = frame.CoordinateSystem.TryGetTransformTo(
          this.colorCoordinateSystem);
      }     
    }
    void ProcessColorFrame(MediaFrameReference frame)
    {
      if (this.colorCoordinateSystem == null)
      {
        this.colorCoordinateSystem = frame.CoordinateSystem;
        this.colorIntrinsics = frame.VideoMediaFrame.CameraIntrinsics;
      }
      this.softwareBitmapEventArgs.Bitmap = frame.VideoMediaFrame.SoftwareBitmap;
      this.ColorFrameArrived?.Invoke(this, this.softwareBitmapEventArgs);
    }
    void ProcessCustomFrame(MediaFrameReference frame)
    {
      if ((this.PoseFrameArrived != null) &&
        (this.colorCoordinateSystem != null))
      {
        var trackingFrame = PoseTrackingFrame.Create(frame);
        var eventArgs = new mtPoseTrackingFrameEventArgs();

        if (trackingFrame.Status == PoseTrackingFrameCreationStatus.Success)
        {
          // Which of the entities here are actually tracked?
          var trackedEntities =
            trackingFrame.Frame.Entities.Where(e => e.IsTracked).ToArray();

          var trackedCount = trackedEntities.Count();

          if (trackedCount > 0)
          {
            eventArgs.PoseEntries =
              trackedEntities
              .Select(entity =>
                mtPoseTrackingDetails.FromPoseTrackingEntity(entity, this.colorIntrinsics, this.depthColorTransform.Value))
              .ToArray();
          }
          this.PoseFrameArrived(this, eventArgs);
        }
      }
    }
    async static Task<IEnumerable<MediaFrameSourceGroup>> GetGroupsSupportingSourceKindsAsync(
      params MediaFrameSourceKind[] kinds)
    {
      var sourceGroups = await MediaFrameSourceGroup.FindAllAsync();

      var groups =
        sourceGroups.Where(
          group => kinds.All(
            kind => group.SourceInfos.Any(sourceInfo => sourceInfo.SourceKind == kind)));

      return (groups);
    }
    static bool DoesCustomSourceSupportPerceptionFormat(MediaFrameSource source)
    {
      return (
        (source.Info.SourceKind == MediaFrameSourceKind.Custom) &&
        (source.CurrentFormat.MajorType == PerceptionFormat) &&
        (Guid.Parse(source.CurrentFormat.Subtype) == PoseTrackingFrame.PoseTrackingSubtype));
    }
    SpatialCoordinateSystem colorCoordinateSystem;
    mtSoftwareBitmapEventArgs softwareBitmapEventArgs;
    mtMediaSourceReader[] mediaSourceReaders;
    MediaCapture mediaCapture;
    CameraIntrinsics colorIntrinsics;
    const string PerceptionFormat = "Perception";
    private Matrix4x4? depthColorTransform;
  }
}

This is essentially doing;

  1. InitialiseAsync
    1. Using the MediaFrameSourceGroup type to try and find a source group that looks like it is Kinect by searching for Infrared+Color+Depth+Custom source kinds. This isn’t a complete test and it might be better to make it more complete. Also, there’s an assumption that the first group found is the best which isn’t likely to always hold true.
    2. Initialising a MediaCapture for the group found in step 1 above.
    3. Initialising three of my mtMediaSourceReader types for the Color/Depth/Custom source kinds and adding some extra criteria for the Custom source type to try and make sure that it supports the ‘Perception’ media format – this code is essentially lifted from the original sample.
    4. Opening frame readers on those three items and handling the events as frame arrives.
  2. OnFrameArrived simply passes the frame on to sub-functions based on type and this could have been done by deriving specific mtMediaSourceReaders.
  3. ProcessDepthFrame tries to get a transformation from depth space to colour space for later use.
  4. ProcessColorFrame fires the ColorFrameArrived event with the SoftwareBitmap that has been received.
  5. ProcessCustomFrame handles the custom frame by;
    1. Using the PoseTrackingFrame.Create() method from the referenced C++ project to interpret the raw data that comes from the custom sensor.
    2. Determining how many bodies are being tracked by the data.
    3. Converting the data types from the referenced C++ project to my own data types, which include less of the data and which try to map the positions of joints from 3D depth points to their respective 2D colour-space points.

Lastly, there’s some code-behind which tries to glue this into the UI;

namespace KinectTestApp
{
  using Microsoft.Graphics.Canvas;
  using Microsoft.Graphics.Canvas.UI.Xaml;
  using System.Numerics;
  using System.Threading;
  using Windows.Foundation;
  using Windows.Graphics.Imaging;
  using Windows.UI;
  using Windows.UI.Core;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += this.OnLoaded;
    }
    void OnCanvasControlSizeChanged(object sender, SizeChangedEventArgs e)
    {
      this.canvasSize = new Rect(0, 0, e.NewSize.Width, e.NewSize.Height);
    }
    async void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.helper = new mtKinectColorPoseFrameHelper();

      this.helper.ColorFrameArrived += OnColorFrameArrived;
      this.helper.PoseFrameArrived += OnPoseFrameArrived;

      var supported = await this.helper.InitialiseAsync();

      if (supported)
      {
        this.canvasControl.Visibility = Visibility.Visible;
      }
    }
    void OnColorFrameArrived(object sender, mtSoftwareBitmapEventArgs e)
    {
      // Note that when this function returns to the caller, we have
      // finished with the incoming software bitmap.
      if (this.bitmapSize == null)
      {
        this.bitmapSize = new Rect(0, 0, e.Bitmap.PixelWidth, e.Bitmap.PixelHeight);
      }

      if (Interlocked.CompareExchange(ref this.isBetweenRenderingPass, 1, 0) == 0)
      {
        this.lastConvertedColorBitmap?.Dispose();

        // Sadly, the format that comes in here, isn't supported by Win2D when
        // it comes to drawing so we have to convert. The upside is that 
        // we know we can keep this bitmap around until we are done with it.
        this.lastConvertedColorBitmap = SoftwareBitmap.Convert(
          e.Bitmap,
          BitmapPixelFormat.Bgra8,
          BitmapAlphaMode.Ignore);

        // Cause the canvas control to redraw itself.
        this.InvalidateCanvasControl();
      }
    }
    void InvalidateCanvasControl()
    {
      // Fire and forget.
      this.Dispatcher.RunAsync(CoreDispatcherPriority.High, this.canvasControl.Invalidate);
    }
    void OnPoseFrameArrived(object sender, mtPoseTrackingFrameEventArgs e)
    {
      // NB: we do not invalidate the control here but, instead, just keep
      // this frame around (maybe) until the colour frame redraws which will 
      // (depending on race conditions) pick up this frame and draw it
      // too.
      this.lastPoseEventArgs = e;
    }
    void OnDraw(CanvasControl sender, CanvasDrawEventArgs args)
    {
      // Capture this here (in a race) in case it gets over-written
      // while this function is still running.
      var poseEventArgs = this.lastPoseEventArgs;

      args.DrawingSession.Clear(Colors.Black);

      // Do we have a colour frame to draw?
      if (this.lastConvertedColorBitmap != null)
      {
        using (var canvasBitmap = CanvasBitmap.CreateFromSoftwareBitmap(
          this.canvasControl,
          this.lastConvertedColorBitmap))
        {
          // Draw the colour frame
          args.DrawingSession.DrawImage(
            canvasBitmap,
            this.canvasSize,
            this.bitmapSize.Value);

          // Have we got a skeletal frame hanging around?
          if (poseEventArgs?.PoseEntries?.Length > 0)
          {
            foreach (var entry in poseEventArgs.PoseEntries)
            {
              foreach (var pose in entry.Points)
              {
                var centrePoint = ScalePosePointToDrawCanvasVector2(pose);

                args.DrawingSession.FillCircle(
                  centrePoint, circleRadius, Colors.Red);
              }
            }
          }
        }
      }
      Interlocked.Exchange(ref this.isBetweenRenderingPass, 0);
    }
    Vector2 ScalePosePointToDrawCanvasVector2(Point posePoint)
    {
      return (new Vector2(
        (float)((posePoint.X / this.bitmapSize.Value.Width) * this.canvasSize.Width),
        (float)((posePoint.Y / this.bitmapSize.Value.Height) * this.canvasSize.Height)));
    }
    Rect? bitmapSize;
    Rect canvasSize;
    int isBetweenRenderingPass;
    SoftwareBitmap lastConvertedColorBitmap;
    mtPoseTrackingFrameEventArgs lastPoseEventArgs;
    mtKinectColorPoseFrameHelper helper;
    static readonly float circleRadius = 10.0f;
  }
}

I don’t think there’s too much in there that would require explanation other than that I took a couple of arbitrary decisions;

  1. That I essentially process one colour frame at a time, using a form of ‘lock’ to drop any colour frames that arrive while I am still in the process of drawing the last one, where ‘drawing’ involves both the OnColorFrameArrived method and the async call to OnDraw that it causes (that ‘lock’ is pulled out into the fragment just below this list).
  2. That I don’t force a redraw when a ‘pose’ frame arrives. Instead, the data is held until the next OnDraw call, which comes from handling the colour frames. It’s certainly possible that the various race conditions involved might cause that frame to be dropped and another to replace it in the meantime.
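
Pulling that ‘lock’ out of the listing above, the pattern is just this small fragment;

    // 0 == idle, 1 == a colour frame is currently being converted/drawn.
    // Only one frame at a time gets through here; any colour frames that
    // arrive in the meantime are simply dropped.
    if (Interlocked.CompareExchange(ref this.isBetweenRenderingPass, 1, 0) == 0)
    {
      // ...convert the bitmap and invalidate the canvas control...
    }

    // ...and then OnDraw resets the gate once the drawing pass has finished;
    Interlocked.Exchange(ref this.isBetweenRenderingPass, 0);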

Even though there are a lot of allocations going on in that code as it stands, here’s a screenshot of it running and the performance isn’t bad at all on my Surface Pro 3. I’m particularly pleased with the red nose that I end up with here 🙂

[image: the app running with the colour frame and red circles drawn over the tracked joints]

The code is quite rough and ready as I was learning as I went along and some next steps might be to;

  1. Draw joints that are inferred in a different colour to those that are properly tracked.
  2. Draw the skeleton rather than just the joints.
  3. Do quite a lot of optimisations as the code here allocates a lot.
  4. Do more tracking around entities arriving/leaving based on their IDs and handle multiple people with different colours.
  5. Refactor to specialise the mtMediaSourceReader class to have separate types for Color/Depth/Custom and thereby tidy up the code which uses this type.

but, for now, I was just trying to get some basics working.

Here’s the code on GitHub if you want to try things out and note that you’d need that additional sample code from the official samples to make it work.

Windows 10, 1607, UWP – Screencast of a WPF App Calling UWP APIs With/Without an .APPX Package

This post is just a companion post to a number of other posts that I’ve written around the desktop app converter, especially;

I’d made a demo of making a simple, blank WPF app and then using it to;

  • Call into UWP APIs
  • Call into UWP APIs that need a package identity
  • Package into a UWP .appx package
  • Show the deployment project template in Visual Studio “15” Preview that helps with debugging

All of that was in the original post referenced above but the video below is a little more up to date. It’s also a little ‘rough and ready’ so excuse the production values 🙂

It also lines up with this post where I have a short screen capture of automatically making a .appx package from a .MSI via the Desktop App Converter, going end-to-end from the installation of the converter through to installing/uninstalling the app;

A Quick Skip Through the Desktop App Converter

I thought I’d publish this after seeing the great post that @qmatteoq flagged on Twitter today, which covers very similar ground in more detail, so definitely check out that post as well if you’re interested in this area.

Windows 10, UWP, HoloLens and Switching 2D/3D Views

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

One of the things that I wanted to understand a bit better when I wrote this post;

Baby Steps on my HoloLens Developer Journey

was the interchange between 2D and 3D views on the HoloLens as discussed in this document;

App Model – Switching Views

and I wanted to experiment with seeing if I could get an app up and running which switched between a 2D XAML based view and a 3D Unity based view.

To get going with that, I made a fairly blank Unity project in accordance with the steps here;

Configuring a Unity project for HoloLens

and then I added a cube into my scene so that I had something to look at;

[image: a cube added to the Unity scene]

and then made sure that I was exporting my project as a XAML based project as I mentioned in this previous post;

Windows 10, UWP, Unity, HoloLens– Small Explorations of D3D and XAML based Unity Projects 

as I had a suspicion that the code I was going to write might depend on the initial view in the app coming from the 2D/XAML world rather than the 3D/D3D world, although I have yet to test that suspicion so apply a pinch of salt.

I placed a simple script onto my Cube in the scene above. The script is really a global handler so it didn’t need to be attached to the Cube, but I needed something to hang my hat on and so I used the Cube;

[image: the script attached to the Cube in the Unity inspector]

and that script looks like this;

using UnityEngine;
using System.Collections;
using UnityEngine.VR.WSA.Input;

public class TestScript : MonoBehaviour
{

  GestureRecognizer recognizer;
  // Use this for initialization
  void Start()
  {
    this.recognizer = new GestureRecognizer();
    this.recognizer.TappedEvent += OnTapped;
    this.recognizer.StartCapturingGestures();
  }

  private void OnTapped(InteractionSourceKind source, int tapCount, Ray headRay)
  {
#if !UNITY_EDITOR
    ViewLibrary.ViewManagement.SwitchTo2DViewAsync();
#endif
  }

  // Update is called once per frame
  void Update()
  {

  }
}

and so it’s a very simple script: it just waits for a tap (anywhere) before making a call into this SwitchTo2DViewAsync function, and I’ve hidden that call from the Unity editor (via the !UNITY_EDITOR conditional) so that the editor doesn’t have to think about it. The tap isn’t specific to the Cube in any way, hence my earlier comment about the script not really ‘belonging’ to the Cube.

That ViewLibrary code lives in a separate class library that I have tried to bring in to the Unity environment as a plugin;

[image: the ViewLibrary class library brought into Unity as a plugin]

and the way I did that came from this previous blog post;

Windows 10 UWP Unity and Adding a Reference to a (UWP) Class Library

The code inside that ViewManagement class looks like this and it’s a bit experimental at the time of writing but it “seems to work”;

namespace ViewLibrary
{
  using System;
  using System.Threading.Tasks;
  using Windows.ApplicationModel.Core;
  using Windows.UI;
  using Windows.UI.Core;
  using Windows.UI.ViewManagement;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;
  using Windows.UI.Xaml.Media;

  public static class ViewManagement
  {
    public static async Task SwitchTo2DViewAsync()
    {
      if (coreView3d == null)
      {
        coreView3d = CoreApplication.MainView;
      }
      if (coreView2d == null)
      {
        coreView2d = CoreApplication.CreateNewView();

        await RunOnDispatcherAsync(
          coreView2d, 
          async () =>
          {
            Window.Current.Content = Create2dUI();
          }
        );
      }
      await RunOnDispatcherAsync(coreView2d, SwitchViewsAsync);
    }
    static UIElement Create2dUI()
    {
      var button = new Button()
      {
        HorizontalAlignment = HorizontalAlignment.Stretch,
        VerticalAlignment = VerticalAlignment.Stretch,
        Content = "Back to 3D",
        Background = new SolidColorBrush(Colors.Red)
      };
      button.Click += async (s, e) =>
      {
        await SwitchTo3DViewAsync();
      };
      return (button);
    }
    static async Task RunOnDispatcherAsync(CoreApplicationView view, 
      Func<Task> action)
    {
      await view.Dispatcher.RunAsync(CoreDispatcherPriority.Normal,
        () => action());
    }
    public static async Task SwitchTo3DViewAsync()
    {
      await RunOnDispatcherAsync(coreView3d, SwitchViewsAsync);
    }
    static async Task SwitchViewsAsync()
    {
      var view = ApplicationView.GetForCurrentView();
      await ApplicationViewSwitcher.SwitchAsync(view.Id);
      Window.Current.Activate();
    }
    static CoreApplicationView coreView3d;
    static CoreApplicationView coreView2d;
  }
}

Mostly, that code came from this blog post about using multiple views in a regular UWP app but I manipulated it around a little here.

If I run this up on the emulator or on a device then I see my initial holographic view of the app containing my Cube;

[image: the holographic view of the app showing the cube]

and then if I tap I see;

[image: the 2D XAML view showing the ‘Back to 3D’ button]

and then if I click the ‘Back to 3D’ button I see;

[image: back in the holographic view showing the cube]

I wouldn’t say that I have a 100% grip on this at the time of finishing this post but I think I understand it better than when I started writing it 🙂

I’d like to dig into whether this same approach works with a project that has been exported as D3D rather than as XAML and I’ll update the post as/when I figure that out.