Windows 10, UWP, HoloLens and PeerFinder

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

A very short post. I was experimenting with whether I might be able to use the PeerFinder class for an app running on HoloLens to find and connect to an instance of the same app running on another HoloLens. I didn’t spot it in the list of APIs that have limitations here so I gave it a whirl with this piece of code;

[Image: the code in question, with the debugger showing SupportedDiscoveryTypes returning None]

and you can see from the debugger there that, running on the device, SupportedDiscoveryTypes is returning None, so I assume that this means the API isn’t going to work.
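For reference, the kind of check involved looks something like the sketch below. This is a reconstruction rather than the exact code from the screenshot above, and the class and method names are just mine for illustration; it relies only on the PeerFinder API itself;

  using Windows.Networking.Proximity;

  static class PeerSupportCheck
  {
    // A minimal sketch - query which peer discovery mechanisms the device
    // supports before trying to use PeerFinder at all. On the HoloLens this
    // came back as PeerDiscoveryTypes.None under the debugger.
    public static bool IsPeerDiscoverySupported()
    {
      var supported = PeerFinder.SupportedDiscoveryTypes;

      // None means that neither 'browse' (Wi-Fi Direct) nor 'triggered'
      // (tap/proximity) discovery is available, so PeerFinder isn't going
      // to be able to find peers on this device.
      return (supported != PeerDiscoveryTypes.None);
    }
  }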

Naturally, let me know if you know different and I’ll update the post accordingly.

In the meantime, I’m going to look to a different mechanism via which one ‘peer’ might find another with the minimum of pre-configuration.

Windows 10, UWP, IoT Core, SpeechSynthesizer, Raspberry PI and ‘Audio Popping’

My reader mailed in with a query about speech synthesis on Windows 10 and the Universal Windows Platform.

Essentially, they were doing something similar to things that I’d shown in demos in this Channel9 show about speech;

[Image: the Channel 9 show about speech]

and in this article on the Windows blog;

Using speech in your UWP apps: It’s good to talk

and the core of the code was to synthesize various pieces of text to speech and then play them one after another – something like the sample code below, which I wrote to try and reproduce the situation. It’s an event handler taken from a fairly blank UWP application;


    async void OnLoaded(object sender, RoutedEventArgs args)
    {
      using (var synth = new SpeechSynthesizer())
      {
        using (var mediaPlayer = new MediaPlayer())
        {
          TaskCompletionSource<bool> source = null;

          mediaPlayer.MediaEnded += (s, e) =>
          {
            source.SetResult(true);
          };
          for (int i = 0; i < 100; i++)
          {
            var speechText = $"This is message number {i + 1}";
            source = new TaskCompletionSource<bool>();

            using (var speechStream = await synth.SynthesizeTextToStreamAsync(speechText))
            {
              mediaPlayer.Source = MediaSource.CreateFromStream(speechStream, speechStream.ContentType);
              mediaPlayer.Play();              
            }
            await source.Task;
            await Task.Delay(1000);
          }
        }
      }
    }

Now, if I run that code on my PC then everything works as I would expect – I get 100 spoken messages separated by at least 1 second of silence.

However, as my reader pointed out – if I run this on Windows IoT Core on Raspberry PI (2 or 3) then each spoken message is preceded by a popping sound on the audio and it’s not something that you’d want to listen to in a real-world scenario.

I hadn’t come across this before and so did a bit of searching around and found this thread on the MSDN forums;

Clicking sound during start and stop of audio playback

and the upshot of that thread seems to be that the problem is caused by an issue in the firmware on the Raspberry PI that isn’t going to be fixed, so there doesn’t really seem to be a solution there.

The thread does, though, suggest that this problem might be mitigated by using the AudioGraph APIs instead of using MediaPlayer as I’ve done in my code snippet above.

That proves to be a little more tricky though because the AudioGraph APIs seem to allow you to construct inputs from;

  • an audio capture device (an AudioDeviceInputNode)
  • a file (an AudioFileInputNode)
  • frames of audio data generated by the app itself (an AudioFrameInputNode)

and I don’t see an obvious way in which any of these can be used to model a stream of data, which is what I get back when I perform Text To Speech using the SpeechSynthesizer class.
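For reference, those options map onto calls something like the ones sketched below. The class and method names here are just mine for illustration, and the graph setup mirrors the code later in this post;

  using System.Threading.Tasks;
  using Windows.Media.Audio;
  using Windows.Media.Capture;
  using Windows.Media.Render;
  using Windows.Storage;

  static class AudioGraphInputSketch
  {
    // A sketch of the three ways of creating an input node on an AudioGraph,
    // none of which takes the in-memory stream that SpeechSynthesizer returns.
    public static async Task ShowInputOptionsAsync(StorageFile someFile)
    {
      var result = await AudioGraph.CreateAsync(
        new AudioGraphSettings(AudioRenderCategory.Media));

      if (result.Status == AudioGraphCreationStatus.Success)
      {
        using (var graph = result.Graph)
        {
          // 1. An audio capture device (e.g. a microphone).
          var deviceInputResult = await graph.CreateDeviceInputNodeAsync(MediaCategory.Other);

          // 2. A file - note that this is a StorageFile rather than a stream.
          var fileInputResult = await graph.CreateFileInputNodeAsync(someFile);

          // 3. Frames of audio data generated by the app itself.
          var frameInputNode = graph.CreateFrameInputNode();
        }
      }
    }
  }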

The only way to proceed would appear to be to copy the speech stream into some file stream and then have an AudioFileInputNode reading from that stream.

With that in mind, I tried to write code which would;

  1. Create a temporary file
  2. Create an audio graph consisting of a connection between
    1. An AudioFileInputNode representing my temporary file
    2. An AudioDeviceOutputNode for the default audio rendering device on the system
  3. Perform Text to Speech
  4. Write the resulting stream to the temporary file
  5. Have the AudioGraph notice that the input file had been written to, thereby causing it to play the media from that file out of the default audio rendering device on the system

and my aim here was to avoid;

  1. having to recreate the entire AudioGraph, or either of the two input/output nodes within it, for each piece of speech
  2. having to create a separate temporary file for every piece of speech
  3. having to create an ever-growing temporary file containing all the pieces of speech concatenated together

and I had hoped to be able to rely on the fact that the nodes in an AudioGraph (and the graph itself) all have Start/Stop/Reset methods.

In practice, I’ve yet to get this to really work. I can happily get an AudioFileInputNode to play audio from a file out through its connected output node. However, once that input node has finished playing, I don’t seem to be able to find any combination of Start/Stop/Reset/Seek which will get it to play subsequent audio that arrives in the file when my code rewrites the file contents.
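For what it’s worth, the kind of re-use that I was hoping for looked something like the sketch below. This is just an illustration of the idea rather than working code (as I say, I couldn’t get the node to play anything beyond its original content) and it leans on the extension methods that are listed later in the post;

  using System;
  using System.Threading.Tasks;
  using Windows.Media.Audio;
  using Windows.Media.SpeechSynthesis;
  using Windows.Storage;

  static class SingleInputNodeSketch
  {
    // Rewrite one temporary file for each piece of speech and try to get the
    // same AudioFileInputNode to pick up and play the new content.
    public static async Task SpeakViaExistingNodeAsync(
      SpeechSynthesizer synthesizer,
      AudioFileInputNode inputNode,
      StorageFile temporaryFile,
      string text)
    {
      // Overwrite the temporary file with the newly synthesized speech
      // (SynthesizeTextToFileAsync is the extension method shown below).
      await synthesizer.SynthesizeTextToFileAsync(text, temporaryFile);

      // The combinations of Stop/Reset/Seek/Start that I tried here didn't
      // result in the node playing the rewritten file contents.
      inputNode.Stop();
      inputNode.Reset();
      inputNode.Seek(TimeSpan.Zero);
      inputNode.Start();

      // WaitForFileCompletedAsync is the awaitable wrapper around the
      // FileCompleted event, also shown below.
      await inputNode.WaitForFileCompletedAsync();
    }
  }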

The closest that I’ve got to working code is what follows below, where I create a new AudioFileInputNode instance for each piece of speech that is to be spoken.

    async void OnLoaded(object sender, RoutedEventArgs args)
    {
      var temporaryFile = await TemporaryFileCreator.CreateTemporaryFileAsync();

      using (var speechSynthesizer = new SpeechSynthesizer())
      {
        var graphResult = await AudioGraph.CreateAsync(new AudioGraphSettings(AudioRenderCategory.Media));

        if (graphResult.Status == AudioGraphCreationStatus.Success)
        {
          using (var graph = graphResult.Graph)
          {
            var outputResult = await graph.CreateDeviceOutputNodeAsync();

            if (outputResult.Status == AudioDeviceNodeCreationStatus.Success)
            {
              graph.Start();

              using (var outputNode = outputResult.DeviceOutputNode)
              {
                for (int i = 0; i < 100; i++)
                {
                  var speechText = $"This is message number {i + 1}";

                  await speechSynthesizer.SynthesizeTextToFileAsync(speechText, temporaryFile);

                  // TBD: I want to avoid this creating of 100 input file nodes but
                  // I don't seem (yet) to be able to get away from it so right now
                  // I keep creating new input nodes over the same file which changes
                  // every iteration of the loop.
                  var inputResult = await graph.CreateFileInputNodeAsync(temporaryFile);

                  if (inputResult.Status == AudioFileNodeCreationStatus.Success)
                  {
                    using (var inputNode = inputResult.FileInputNode)
                    {
                      inputNode.AddOutgoingConnection(outputNode);
                      await inputNode.WaitForFileCompletedAsync();
                    }
                  }
                  await Task.Delay(1000);
                }
              }
              graph.Stop();
            }
          }
        }
      }
      await temporaryFile.DeleteAsync();
    }

and that code depends on a class that can create temporary files;

  public static class TemporaryFileCreator
  {
    public static async Task<StorageFile> CreateTemporaryFileAsync()
    {
      var fileName = $"{Guid.NewGuid()}.bin";

      var storageFile =
        await ApplicationData.Current.TemporaryFolder.CreateFileAsync(fileName);

      return (storageFile);
    }
  }

and also on an extension to the SpeechSynthesizer which will take the speech and write it to a file;

  public static class SpeechSynthesizerExtensions
  {
    public static async Task<StorageFile> SynthesizeTextToTemporaryFileAsync(this SpeechSynthesizer synthesizer, string text)
    {
      var storageFile = await TemporaryFileCreator.CreateTemporaryFileAsync();

      await SynthesizeTextToFileAsync(synthesizer, text, storageFile);

      return (storageFile);
    }
    public static async Task SynthesizeTextToFileAsync(this SpeechSynthesizer synthesizer, string text, StorageFile file)
    {
      using (var speechStream = await synthesizer.SynthesizeTextToStreamAsync(text))
      {
        using (var fileStream = await file.OpenAsync(FileAccessMode.ReadWrite))
        {
          await RandomAccessStream.CopyAndCloseAsync(speechStream, fileStream);
        }
      }
    }
  }

and also on an extension to the AudioFileInputNode class which takes the FileCompleted event that it fires and turns it into something that can be awaited;

  public static class AudioFileInputNodeExtensions
  {
    public static async Task WaitForFileCompletedAsync(this AudioFileInputNode inputNode)
    {
      TypedEventHandler<AudioFileInputNode, object> handler = null;
      TaskCompletionSource<bool> completed = new TaskCompletionSource<bool>();

      handler = (s, e) =>
      {
        s.FileCompleted -= handler;
        completed.SetResult(true);
      };
      inputNode.FileCompleted += handler;

      await completed.Task;
    }
  }

This code seems to work fine on both PC and Raspberry PI. On Raspberry PI I still get an audible ‘pop’ when the code first starts up, but I no longer get an audible ‘pop’ for every piece of speech – i.e. it feels like the situation is improved but not perfect. I’d ideally like to;

  • Get rid of the code that ends up creating N AudioFileInputNode instances rather than 1 and somehow make the Start/Stop/Reset/Seek approach work.
  • Get rid of that initial audible ‘pop’.

I’ll update the post if I manage to come up with a better solution and do feel very free to add comments below if you know of either a solution to the original problem or a better solution than I’ve found to date…

Hitchhiking the HoloToolkit-Unity, Leg 9–Holes in the Walls

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

I must admit that two words which can sometimes strike fear into my heart are the words;

Case Study 😉

I’m partially kidding but I’m not a huge fan of case studies which can sometimes be fairly dry write-ups of the form;

“Company C took technology T and solved problem P in time T and saved D dollars”

Of course, that sort of stuff is really important and it is always going to depend on the write-up, but I don’t generally put reading case studies at the top of my to-do list.

Against that backdrop, I’ve been very pleasantly surprised by the really interesting HoloLens developer case studies that are published on this site;

HoloLens Developer Case Studies

and I’ve been working my way through them because they are much more at the level of;

“developer wanted to achieve X, this is how they went about it and the challenges they came across in doing it”

and one of the entries that I read quite a long time ago was this one about how to make holes in your walls, ceilings and floors;

“Case study – Looking through holes in your reality”

and it’s brilliant because this idea of being able to “look through surfaces” in HoloLens apps is one of the things that I’ve found to be truly magical in apps like Fragments, RoboRaid and others. The technique isn’t really a complicated one, so it’s great to see some of the magic revealed here.

If you haven’t seen aliens coming through your walls in an app like RoboRaid then there’s a video here which shows it in action and also talks a little about what the device and the app developer are doing to pull off the illusion.

Inside the HoloToolkit, there are some pieces that can help with experimenting with this type of effect, so I went off and got the toolkit (as per this video), brought the Build, Input, UI, Utilities and SpatialMapping sections into my project and set it up (as per this video).

Within the Utilities section of the toolkit there is a pre-baked scene called WindowOcclusion;

[Image: the WindowOcclusion scene within the toolkit’s Utilities section]

and that has a camera, 4 quads and 5 cubes set up to provide the ‘looking through a window’ effect that the Case Study talks about;

[Image: the scene’s camera, 4 quads and 5 cubes]

and those 4 quads are kind of ‘interesting’ in that they aren’t immediately visible here;

[Image: the scene view in which the 4 quads aren’t immediately visible]

but they are being shaded so as to occlude the content behind them;

[Image: the 4 quads shaded to occlude the content behind them]

and so the user gets the illusion of looking through the window, as the window is essentially the hole left by the 4 quads surrounding it, which occlude everything else.

For me, that illusion works best if that window is positioned on a wall whereas this pre-baked scene places the window approx 1.7m in front of wherever the user was looking when the app started. It might be relatively simple though to use spatial mapping and the ‘tap to place’ behaviour to change that and have the user position the window onto a surface.

I thought I’d give that a spin…

Adding Tap to Place

I took the two components from that scene, collected them into an empty game object and then made a prefab from that in Unity called WindowAndContent.

[Image: the WindowAndContent prefab in Unity]

and then I added a simple, blank placeholder (at the origin) and a quad 2m in front of the origin into my scene. The idea of the quad is to give me something that I can tap and place onto a wall in order to position where I want my window (and the content beyond the window) to appear;

[Image: the scene with the placeholder, the quad and the SpatialMapping prefab]

and so the intended process is going to be something like;

  • Create quad 2m in front of the user.
  • Allow the user to tap on the quad.
  • Have the quad follow the user’s gaze around their walls (this is the tap to place behaviour).
  • Allow the user to tap.
  • Remove the quad and replace it with the WindowAndContent prefab positioned at the same place and oriented the same way.

Positioning the quad and the window onto a wall involves spatial mapping, so I made sure that I had spatial perception switched on as a UWP capability and I added the SpatialMapping prefab from the HoloToolkit, as you can see in the screenshot above.

I then gave my placeholder object a bunch of behaviours in order to facilitate using the Tap to Place script on my quad;

[Image: the behaviours added to the placeholder object]

and then I added the Tap to Place script to that quad;

[Image: the Tap to Place script added to the quad]

but I hacked that script ever so slightly in order to change two things;

  1. By default, the script makes the spatial mapping mesh visible when the object has been tapped and is following the user’s gaze but I didn’t want this so I took it out.
  2. I added a line or two of code such that when the object is placed, the script would fire a Placed event (sketched just below).
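As a rough sketch (this isn’t the toolkit’s actual source and the exact place where the event gets raised depends on the version of the TapToPlace script), the addition amounted to something like this;

  // Added to the HoloToolkit's TapToPlace script - an event that other
  // components can subscribe to in order to know when placement finished.
  public event System.EventHandler Placed;

  // ...and then, in the existing code path that runs when the user taps to
  // finalise the placement of the object, raise that event;
  if (this.Placed != null)
  {
    this.Placed(this, System.EventArgs.Empty);
  }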

I wanted that Placed event so that I could add a script to my quad as shown below;

[Image: the script added to the quad]

and that script handles the Placed event on my modified TapToPlace component in order to try and get rid of the quad and replace it with the WindowAndContent prefab so that the window would appear where the quad had been positioned. There’s probably a better way of doing this but here’s the script that I used;

using HoloToolkit.Unity;
using UnityEngine;

public class QuadScript : MonoBehaviour
{
  public Transform prefab;

  void Start()
  {
    this.tapToPlace = this.GetComponent<TapToPlace>();
    this.tapToPlace.Placed += this.OnPlaced;
  }

  void OnPlaced(object sender, System.EventArgs e)
  {
    // We're done now.
    this.tapToPlace.Placed -= this.OnPlaced;
    this.tapToPlace = null;

    var windowAndContent = Instantiate(prefab);
    windowAndContent.transform.parent = this.transform.parent;
    windowAndContent.transform.localPosition = this.transform.localPosition;
    windowAndContent.transform.forward = this.transform.forward;
    this.GetComponent<MeshRenderer>().enabled = false;
  }
  TapToPlace tapToPlace;
}

and, sure enough, I can now display a quad, tap to position it on a wall in my environment and, when I tap again, have it replaced with a window that I can look through into a ‘virtual world’ on the other side of that wall.

But that ‘virtual world’ is just a cube; I need something more interesting when I look through my window…

Making the View from the Window more Interesting

I figured that I’d make the window a bit larger and then see whether I could put some more interesting content on the other side of it.

I went out to the Unity Asset Store and found this set of town models and materials;

[Image: the set of town models and materials in the Unity Asset Store]

and it came with a nice scene or two demonstrating lots of the models, so I chopped one of those down in size and positioned it on the other side of my window.

Below is a screenshot of what I ended up with – you can see the scene and the relative position of my ‘window’, placed such that everything in the scene is ‘in front’ of the window;

[Image: the town scene and the relative position of the ‘window’]

and this screenshot shows the reverse view of the quads shaded to occlude the buildings from the viewer on the other side of the window so that the window provides the only view;

[Image: the reverse view with the quads shaded to occlude the buildings]

and I baked all of this into my WindowAndContent prefab (replacing the existing cube) such that when I placed my quad on a wall this set of buildings would be instantiated on the other side of my window.

Trying this out, it all works surprisingly well.

What doesn’t work quite so well is showing how this looks in captured screenshots but hopefully the pictures below give an idea of two views through the same window.

It’s a very convincing effect when you’re actually using it – here’s a view taken from the right of the window. Note that you can see that I’ve left a little gap to the upper left of the window frame which needs closing in the Unity designer – i.e. that’s an artefact that I can fix rather than something from the device;

[Image: HoloLens capture 20161231_172822, a view from the right of the window]

and here’s a view when I’m standing more to the centre of the window;

[Image: HoloLens capture 20161231_172842, a view from nearer the centre of the window]

This is such a clever effect – I’m going to experiment some more but I’m also going to work my way through some more of those case studies; there’s a lot to learn…