Are You There? Windows 10/UWP & the UserConsentVerifier

One thing that passed me by was the functionality offered by the UserConsentVerifier class on Windows 10.

I think the docs are a bit misleading here because they say that this class;

“Checks for availability of a biometric (fingerprint) verifier device and performs a biometric verification.”

and I’m not sure that’s quite right on Windows 10. If I look at the doc for the RequestVerificationAsync method, it says that what it does is;

“Performs a fingerprint (biometric) verification.”

which doesn’t really match the results that I see if I run this code, which uses the API, on my Surface Pro 3;

      var result = await UserConsentVerifier.CheckAvailabilityAsync();

      if (result == UserConsentVerifierAvailability.Available)
      {
        var verifiedResult = await UserConsentVerifier.RequestVerificationAsync(
          "Just checking that you are really you :-)");

        if (verifiedResult == UserConsentVerificationResult.Verified)
        {
          // we're ok.
        }
      }

then, without any kind of biometric authentication mechanism, this code asks me to verify my PIN on my device to, effectively, ensure that it’s me who’s currently sitting in front of the device;

image

In this case the API asks me to say ‘Hello’ to my PC using the PIN that I’ve set up which is, clearly, not a biometric authentication mechanism but, if I get the PIN right, that satisfies the API’s requirements.

If I run this on my 950XL phone then it offers me both a biometric option (iris scanner) and a PIN option. I haven’t tried but I assume that if I plugged a RealSense or Kinect camera into my Surface Pro 3 then facial identification would become an option there.

So…a simple API for asking the logged-in user to authenticate themselves to the system at the point in time when your app needs it, and one that’s not specifically tied to biometric mechanisms.
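For completeness, the availability check can return more than just Available and so an app probably wants to branch on the other values of the UserConsentVerifierAvailability enumeration. A minimal sketch of how that might look, building on the snippet above;

      var availability = await UserConsentVerifier.CheckAvailabilityAsync();

      switch (availability)
      {
        case UserConsentVerifierAvailability.Available:
          // Safe to call RequestVerificationAsync as in the snippet above.
          break;
        case UserConsentVerifierAvailability.DeviceNotPresent:
        case UserConsentVerifierAvailability.NotConfiguredForUser:
          // No Windows Hello device or PIN set up - fall back to something else.
          break;
        case UserConsentVerifierAvailability.DisabledByPolicy:
        case UserConsentVerifierAvailability.DeviceBusy:
          // Verification isn't possible right now.
          break;
      }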

“Project Oxford”–Speaker Verification from a Windows 10/UWP App

Following up on my previous post, I said that I’d return to ‘Project Oxford’ and its capability to do speaker verification and identification.

You can read the documentation on this here and you can try it out on the web;

and I learned late in this process that, while there does not seem to be a client SDK package as such for the Windows 10/UWP developer that surfaces these REST APIs into your environment, there are some .NET samples;

and if I’d realised that up front then I could have saved myself quite a lot of effort. That said, those .NET samples seem to use NAudio to generate an audio stream whereas I wanted to use UWP APIs to do that and that’s where most of my effort went in trying out the ‘Oxford’ speaker recognition APIs.

In my own terms, verification is where you have an idea whose voice you are listening to and you want ‘Oxford’ to confirm it. Identification, on the other hand, is where you want ‘Oxford’ to take the voice and tell you who it belongs to.

I’ve only really dug into verification so far but I think that in both cases you work with ‘Profiles’ and for verification the process runs something like this;

    • You make a REST call to ask the system for a set of supported verification phrases. You choose one to present to your user.
    • You make a REST call to create a new profile (for the user)
    • You need to repeat at least 3 times what ‘Oxford’ calls ‘enrolment’ for a profile;
      • Capturing the user saying the SAME verification phrase in a WAV container holding PCM audio at 16K, 16-bit, mono.
      • Making a REST call that sends the speech to the service and checking that the service is happy to use it for enrolment.
    • Once the user is fully ‘enrolled’, you can now verify the user’s voice by capturing them speaking one of the phrases again and then making a REST call with that speech against that profile and the system says “Yes” or “No” and provides a confidence factor on that response.

That’s about it when reduced down to a few sentences. I think the service currently supports 1000 profiles so, effectively, 1000 users.
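As a very rough sketch, the first couple of those REST calls might look something like the code below – the endpoint URI and the JSON shapes here are placeholders based on my reading of the ‘Oxford’ documentation rather than anything definitive, so check them against the docs (and your own key) before relying on them;

      // Placeholder base URI - take the real one from the 'Oxford' documentation.
      static readonly string baseUri = "https://api.projectoxford.ai/spid/v1.0";

      async Task<string> CreateVerificationProfileAsync(string apiKey)
      {
        using (var client = new HttpClient())
        {
          client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", apiKey);

          // List the supported verification phrases so one can be shown to the user...
          var phrasesJson = await client.GetStringAsync(
            $"{baseUri}/verificationPhrases?locale=en-us");

          // ...and create a new verification profile for the user.
          var response = await client.PostAsync(
            $"{baseUri}/verificationProfiles",
            new StringContent("{ \"locale\" : \"en-us\" }", Encoding.UTF8, "application/json"));

          // The new profile's id comes back in the JSON response - parse it with
          // your JSON library of choice.
          return (await response.Content.ReadAsStringAsync());
        }
      }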

I wrote a little Windows 10 UWP app to try out the verification service and, not having discovered the WPF sample at that point, I wrote a bunch of classes that I could more or less have taken straight from that sample code if I’d known about it before I started;

image

I didn’t write mountains of code but it’s perhaps too big to go into a blow-by-blow account of here and most of it is just REST calls and JSON de-serialization.

I did, though, spend quite a long time on it and for one reason only…

Struggles with Posting WAV Files containing PCM Audio

I used the UWP AudioGraph APIs to create a byte stream containing a WAV container wrapping a PCM audio stream sampled at 16K in mono with 16-bits per sample.

That’s what ‘Oxford’ said that it wanted and that’s what I tried to give it. Many times over.

Every time I submitted one of these byte streams to the ‘Oxford’ speaker recognition enrolment APIs I would get some kind of ‘PCM is required’ error.

After much head scratching, I found myself going back and reading a resource that I’ve used before on WAV file formats (like when I wrote this old post).

I found that if I opened up a WAV file from c:\windows\media with the binary editor in Visual Studio then I would see a header something like;

image

whereas the WAV byte stream that was coming out of the Windows 10 UWP AudioGraph APIs looked like this;

image

Clearly, the characters ‘JUNK’ stood out for me here and, as far as I can tell, this second WAV file contains a ‘JUNK chunk’ of 36 bytes. As far as I know, this is valid and is certainly mentioned here and it doesn’t seem to cause Groove Music or Windows Media Player any troubles.

However…it did seem to cause troubles for the ‘Project Oxford’ API in that I found that when my stream contained the ‘JUNK chunk’, I got errors from the API whereas once I removed that chunk (and patched up the file length which is also stored in the stream) then things seemed to work so much better.

Consequently, my code currently has this slightly hacky method which patches up the byte stream that comes out of the AudioGraph APIs before it gets sent to ‘Oxford’;

    byte[] HackOxfordWavPcmStream(IInputStream inputStream, out int offset)
    {
      var netStream = inputStream.AsStreamForRead();
      var bits = new byte[netStream.Length];
      netStream.Read(bits, 0, bits.Length);

      // original file length
      var pcmFileLength = BitConverter.ToInt32(bits, 4);

      // take away 36 bytes for the JUNK chunk
      pcmFileLength -= 36;

      // now copy 12 bytes from start of bytes to 36 bytes further on
      for (int i = 0; i < 12; i++)
      {
        bits[i + 36] = bits[i];
      }
      // now put modified file length into bytes 40-43
      var newLengthBits = BitConverter.GetBytes(pcmFileLength);
      newLengthBits.CopyTo(bits, 40);

      // the bits that we want are now 36 onwards in this array
      offset = 36;

      return (bits);
    }

Once I had that code in place, I seemed to be able to submit audio to ‘Oxford’s speaker verification endpoints for both enrolment and verification with no hassle whatsoever.

I’m not sure whether I could somehow push the AudioGraph APIs to not emit this JUNK chunk but, for now, I’ve gone with the hack after wasting quite a few hours trying to figure it out.
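If you wanted something a little more general than my hard-coded 36-byte assumption, a hypothetical alternative would be to walk the RIFF chunks and rebuild the stream keeping only the ‘fmt ’ and ‘data’ chunks – something like the sketch below, which I haven’t tested beyond my own files so treat it as illustrative rather than definitive;

      byte[] StripNonEssentialChunks(byte[] wav)
      {
        // Assumes a well-formed WAV byte stream like the one from the AudioGraph APIs.
        using (var output = new MemoryStream())
        {
          // Copy the 12-byte RIFF header ("RIFF", size, "WAVE") - the size gets patched later.
          output.Write(wav, 0, 12);

          int offset = 12;

          while (offset + 8 <= wav.Length)
          {
            var chunkId = Encoding.ASCII.GetString(wav, offset, 4);
            var chunkSize = BitConverter.ToInt32(wav, offset + 4);

            if ((chunkId == "fmt ") || (chunkId == "data"))
            {
              output.Write(wav, offset, 8 + chunkSize);
            }
            // Chunks are word aligned, hence rounding an odd size up by one byte.
            offset += 8 + chunkSize + (chunkSize % 2);
          }
          var bits = output.ToArray();

          // Patch the RIFF size field (total length minus the first 8 bytes).
          BitConverter.GetBytes(bits.Length - 8).CopyTo(bits, 4);

          return (bits);
        }
      }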

In Action

Putting it together, I have the most basic UI which runs like this. The demo below is a little bit ‘dry’ as I’ve not got a community of users to try it out with as I write this post but maybe you can try it out yourself and see how it works for you with a set of voices;

Code

If you want the code, it’s here for download but there’s a #error in a file called Keys.cs where you would need to supply your own API key for Oxford’s speaker recognition API.

Identification?

With the code as it is, I don’t think it’d be too hard to add on calls to the identification service and try that out as well. I might get there in a follow-on post…

Beyond?

Voice verification/identification here is exciting stuff. You could imagine building a system (web connected) that used Oxford’s facial and voice verification to add 2- or 3-factor authentication to something like a secure swipe-card door access system.

At the time of writing, facial recognition is available on device (Intel’s RealSense SDK for desktop apps) and is based on depth camera images (like those used in ‘Windows Hello’).

For voice recognition, though, I’m not aware of an on-device service, whether for desktop or UWP, so it’s exciting to see this up at ‘Oxford’ in preview (free) for developers to experiment with.

Give it a whirl.

Windows 10, UWP, AudioGraph–Recording Microphone to WAV File

Just a snippet of code to share – I hadn’t tried to record from the system’s microphone before with Windows 10/UWP and I wanted to record PCM into a file so I had to spend 20 minutes trying to figure it out.

This sample was incredibly helpful and I stripped it down for my purposes to a UI with a Start and a Stop button wired to the handlers below and that seemed to work in letting me record a mono, 16K PCM file with 16-bits per sample onto my desktop.

  using System;
  using System.Collections.Generic;
  using System.Threading.Tasks;
  using Windows.Devices.Enumeration;
  using Windows.Media.Audio;
  using Windows.Media.Capture;
  using Windows.Media.Devices;
  using Windows.Media.MediaProperties;
  using Windows.Media.Render;
  using Windows.Storage;
  using Windows.Storage.Pickers;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
    }

    async void OnStart(object sender, RoutedEventArgs e)
    {
      var file = await this.PickFileAsync();

      if (file != null)
      {
        var result = await AudioGraph.CreateAsync(
               new AudioGraphSettings(AudioRenderCategory.Speech));

        if (result.Status == AudioGraphCreationStatus.Success)
        {
          this.graph = result.Graph;

          var microphone = await DeviceInformation.CreateFromIdAsync(
            MediaDevice.GetDefaultAudioCaptureId(AudioDeviceRole.Default));

          // In my scenario I want 16K sampled, mono, 16-bit output
          var outProfile = MediaEncodingProfile.CreateWav(AudioEncodingQuality.Low);
          outProfile.Audio = AudioEncodingProperties.CreatePcm(16000, 1, 16);

          var outputResult = await this.graph.CreateFileOutputNodeAsync(file,
            outProfile);

          if (outputResult.Status == AudioFileNodeCreationStatus.Success)
          {
            this.outputNode = outputResult.FileOutputNode;

            var inProfile = MediaEncodingProfile.CreateWav(AudioEncodingQuality.High);

            var inputResult = await this.graph.CreateDeviceInputNodeAsync(
              MediaCategory.Speech,
              inProfile.Audio,
              microphone);

            if (inputResult.Status == AudioDeviceNodeCreationStatus.Success)
            {
              inputResult.DeviceInputNode.AddOutgoingConnection(
                this.outputNode);

              this.graph.Start();
            }
          }
        }
      }
    }
    async void OnStop(object sender, RoutedEventArgs e)
    {
      if (this.graph != null)
      {
        this.graph?.Stop();

        await this.outputNode.FinalizeAsync();

        // assuming that disposing the graph gets rid of the input/output nodes?
        this.graph?.Dispose();

        this.graph = null;
      }
    }
    async Task<StorageFile> PickFileAsync()
    {
      FileSavePicker picker = new FileSavePicker();
      picker.FileTypeChoices.Add("Wave File (PCM)", new List<string> { ".wav" });
      picker.SuggestedStartLocation = PickerLocationId.Desktop;

      var file = await picker.PickSaveFileAsync();

      return (file);
    }
    AudioGraph graph;
    AudioFileOutputNode outputNode;
  }

It’s clearly quite a rough bit of code – all mistakes are mine and I hadn’t tried out AudioGraph before.

Speech to Text (and more) with Windows 10 UWP & ‘Project Oxford’

We’re increasingly talking to machines and, more importantly, they’re increasingly listening and even starting to understand.

In the world of the Windows 10 Universal Windows Platform, there’s the SpeechRecognizer class which can do speech recognition on the device without necessarily calling off to the cloud and it has a number of different capabilities.

The recognizer can be invoked either to recognise a discrete ‘piece’ of speech at a particular point in time or it can be invoked to continuously listen to speech. For the former case, it can also show standard UI that the user would recognise across apps or it can show custom UI of the developer’s choice (or none at all if that’s appropriate).

As a simple example, here’s a piece of code that sets a few options around timeouts and UI and then displays the system UI to recognise speech – assume that it’s invoked from a Button and that there is a TextBlock called txtResults to store the results into;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
    }
    async void OnListenAsync(object sender, RoutedEventArgs e)
    {
      this.recognizer = new SpeechRecognizer();
      await this.recognizer.CompileConstraintsAsync();

      this.recognizer.Timeouts.InitialSilenceTimeout = TimeSpan.FromSeconds(5);
      this.recognizer.Timeouts.EndSilenceTimeout = TimeSpan.FromSeconds(20);

      this.recognizer.UIOptions.AudiblePrompt = "Say whatever you like, I'm listening";
      this.recognizer.UIOptions.ExampleText = "The quick brown fox jumps over the lazy dog";
      this.recognizer.UIOptions.ShowConfirmation = true;
      this.recognizer.UIOptions.IsReadBackEnabled = true;
      this.recognizer.Timeouts.BabbleTimeout = TimeSpan.FromSeconds(5);

      var result = await this.recognizer.RecognizeWithUIAsync();

      if (result != null)
      {
        StringBuilder builder = new StringBuilder();

        builder.AppendLine(
          $"I have {result.Confidence} confidence that you said [{result.Text}] " +
          $"and it took {result.PhraseDuration.TotalSeconds} seconds to say it " +
          $"starting at {result.PhraseStartTime:g}");

        var alternates = result.GetAlternates(10);

        builder.AppendLine(
          $"There were {alternates?.Count} alternates - listed below (if any)");

        if (alternates != null)
        {
          foreach (var alternate in alternates)
          {
            builder.AppendLine(
              $"Alternate {alternate.Confidence} confident you said [{alternate.Text}]");
          }
        }
        this.txtResults.Text = builder.ToString();
      }
    }
    SpeechRecognizer recognizer;
  }

and here’s that code running;

and it’s easy to take away the UIOptions and swap the call to RecognizeWithUIAsync() to be a call to RecognizeAsync() in order to have the system drop the UI and, perhaps, provide my own UI in the form of a ProgressBar. Here’s that code;

 public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
    }
    async void OnListenAsync(object sender, RoutedEventArgs e)
    {
      this.recognizer = new SpeechRecognizer();
      await this.recognizer.CompileConstraintsAsync();

      this.recognizer.Timeouts.InitialSilenceTimeout = TimeSpan.FromSeconds(5);
      this.recognizer.Timeouts.EndSilenceTimeout = TimeSpan.FromSeconds(20);
      this.recognizer.Timeouts.BabbleTimeout = TimeSpan.FromSeconds(5);

      this.txtResults.Text = string.Empty;

      this.progressBar.Visibility = Visibility.Visible;

      var result = await this.recognizer.RecognizeAsync();

      if (result != null)
      {
        StringBuilder builder = new StringBuilder();

        builder.AppendLine(
          $"I have {result.Confidence} confidence that you said [{result.Text}] " +
          $"and it took {result.PhraseDuration.TotalSeconds} seconds to say it " +
          $"starting at {result.PhraseStartTime:g}");

        var alternates = result.GetAlternates(10);

        builder.AppendLine(
          $"There were {alternates?.Count} alternates - listed below (if any)");

        if (alternates != null)
        {
          foreach (var alternate in alternates)
          {
            builder.AppendLine(
              $"Alternate {alternate.Confidence} confident you said [{alternate.Text}]");
          }
        }
        this.txtResults.Text = builder.ToString();
      }
      this.progressBar.Visibility = Visibility.Collapsed;
    }
    SpeechRecognizer recognizer;
  }

and here it is running although I think it’s fair to say that my “UI” is not really giving the user much of a clue around what they are expected to do here.

Unless I’m building a dictation program, I might want to guide the speech recognition engine and use more of a ‘command’ mode.

For instance, if I wanted to start implementing the classic interactive programming environment of Logo with voice control then I might want commands like “left” or “right” or something along those lines. I’ve dropped an image into my UI with a RotateTransform (named rotateTransform) applied to it and I’ve changed my code;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
    }
    async void OnListenAsync(object sender, RoutedEventArgs e)
    {
      this.recognizer = new SpeechRecognizer();

      var commands = new Dictionary<string, int>()
      {
        ["left"] = -90,
        ["right"] = 90
      };

      this.recognizer.Constraints.Add(new SpeechRecognitionListConstraint(
        commands.Keys));

      await this.recognizer.CompileConstraintsAsync();

      var result = await this.recognizer.RecognizeAsync();

      if ((result != null) && (commands.ContainsKey(result.Text)))
      {
        this.rotateTransform.Angle += commands[result.Text];
      }
    }
    SpeechRecognizer recognizer;
  }

and that lets me spin the turtle;

but I probably want to move away from having to press a button every time I want the SpeechRecognizer to listen to me and use more of a ‘continuous’ speech recognition session.

I can do that while preserving my constrained set of commands that are being listened for;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += OnLoaded;
    }

    async void OnLoaded(object sender, RoutedEventArgs args)
    {
      this.recognizer = new SpeechRecognizer();

      var commands = new Dictionary<string, int>()
      {
        ["left"] = -90,
        ["right"] = 90
      };

      this.recognizer.Constraints.Add(new SpeechRecognitionListConstraint(
        commands.Keys));

      await this.recognizer.CompileConstraintsAsync();

      this.recognizer.ContinuousRecognitionSession.ResultGenerated +=
        async (s, e) =>
        {
          if ((e.Result != null) && (commands.ContainsKey(e.Result.Text)))
          {
            await this.Dispatcher.RunAsync(CoreDispatcherPriority.Normal,
              () =>
              {
                this.rotateTransform.Angle += commands[e.Result.Text];
              }
            );
            this.recognizer.ContinuousRecognitionSession.Resume();
          }
        };

      await this.recognizer.ContinuousRecognitionSession.StartAsync(
        SpeechContinuousRecognitionMode.PauseOnRecognition);
    }
    SpeechRecognizer recognizer;
  }

and now the turtle just ‘obeys’ without me having to press any buttons etc;

but it might be nice to accept a more natural form of input here where the user might say something like;

“make the turtle turn left”

where the only bit of the speech that the code is really interested in is the “left” part but the presentation to the user is more natural.

This is perhaps where we can start to guide the recogniser with a bit of a grammar and the engine here understands SRGS (there’s a better guide here on MSDN).

I made a simple grammar;

<?xml version="1.0" encoding="UTF-8"?>
<grammar 
  version="1.0" mode="voice" root="commands"
  xml:lang="en-US" tag-format="semantics/1.0"  
  xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="commands" scope="public">
    <item>make the turtle turn</item>
    <ruleref uri="#direction" />
    <tag> out.rotation=rules.latest(); </tag>
  </rule>
  <rule id="direction">
    <one-of>
      <item>
        <tag>out="left";</tag>
        <one-of>
          <item>left</item>
          <item>anticlockwise</item>
          <item>banana</item>
        </one-of>
      </item>
      <item>
        <tag>out="right";</tag>
        <one-of>
          <item>right</item>
          <item>clockwise</item>
          <item>cheese</item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>

and added it to my project as a file grammar.xml and then changed my code a little – I now have 3 terms in the grammar defined for “left”/”right” but in the code I only need to deal with “left” and “right” coming from the engine;

 public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += OnLoaded;
    }

    async void OnLoaded(object sender, RoutedEventArgs args)
    {
      this.recognizer = new SpeechRecognizer();

      var grammarFile = await StorageFile.GetFileFromApplicationUriAsync(
        new Uri("ms-appx:///grammar.xml"));

      this.recognizer.Constraints.Add(
        new SpeechRecognitionGrammarFileConstraint(grammarFile));

      await this.recognizer.CompileConstraintsAsync();

      this.recognizer.ContinuousRecognitionSession.ResultGenerated +=
        async (s, e) =>
        {
          var rotationList = e.Result?.SemanticInterpretation?.Properties?["rotation"];
          var rotation = rotationList?.FirstOrDefault();

          if (!string.IsNullOrEmpty(rotation))
          {
            var angle = 0;

            switch (rotation)
            {
              case "left":
                angle = -90;
                break;
              case "right":
                angle = 90;
                break;
              default:
                break;
            }
            await this.Dispatcher.RunAsync(CoreDispatcherPriority.Normal,
              () =>
              {
                this.rotateTransform.Angle += angle;
              });
          }
          this.recognizer.ContinuousRecognitionSession.Resume();
        };

      await this.recognizer.ContinuousRecognitionSession.StartAsync(
        SpeechContinuousRecognitionMode.PauseOnRecognition);
    }
    SpeechRecognizer recognizer;
  }

and then I can try that out;

and that seems to work pretty well.

That’s a number of different options that can all be done on the client device without coding against a cloud service but, of course, those cloud services are out there and are cross-platform so let’s try some of that functionality out…

Adding Cloud

In terms of the services that I want to add to the post here, I’m looking at what’s available under the banner of ‘Project Oxford’ and specifically the Speech APIs and the Speaker Recognition APIs.

All of these are in preview right now and there’s a need to grab an API key to make use of them which you can do on the site itself.

Speech Recognition

There’s a page here that puts the Windows 10 and ‘Oxford’ speech capabilities all on one page but I must admit that I find it quite confusing as the examples given seem to be more about the UWP SpeechRecognizer than about ‘Oxford’.

With the right couple of clicks though you can jump to the class SpeechRecognitionServiceFactory in the docs which seems to provide the starting point for Oxford-based speech recognition.

That said, for a UWP developer, there’s a bit of a blocker at the time of writing as the Oxford client libraries don’t have UWP support;

Github thread about UWP support

and so you either need to drop down to the desktop and code for WPF or Windows Forms or you can make calls to the REST API yourself without a client library.

My choice was to go with WPF given that it looks like UWP support is on its way and (I’d imagine) the APIs will end up looking similar for UWP.

So, I made a WPF application and added in the NuGet package;

image

and I went back to my original scenario for speech recognition of some freeform speech and made a UI with a Start button, a Stop button and a TextBlock to display the recognition results and that gave me some code that looked like this;

  public partial class MainWindow : Window
  {
    public MainWindow()
    {
      InitializeComponent();
    }

    private void OnStart(object sender, RoutedEventArgs e)
    {
      this.client = SpeechRecognitionServiceFactory.CreateMicrophoneClient(
        SpeechRecognitionMode.ShortPhrase,
        "en-GB",
        Constants.Key);

      this.client.OnPartialResponseReceived += OnPartialResponse;
      this.client.OnResponseReceived += OnResponseReceived;
      this.client.OnConversationError += Client_OnConversationError;

      this.client.StartMicAndRecognition();
    }
    private void Client_OnConversationError(object sender, SpeechErrorEventArgs e)
    {
      this.Dispatch(() =>
      {
        this.txtResults.Text = $"Some kind of problem {e.SpeechErrorText}";
      });
    }
    void OnResponseReceived(object sender, SpeechResponseEventArgs e)
    {
      if (e.PhraseResponse.RecognitionStatus != RecognitionStatus.RecognitionSuccess)
      {
        this.Dispatch(() =>
        {
          this.txtResults.Text = $"Some kind of problem {e.PhraseResponse.RecognitionStatus}";
        });
      }
      else
      {
        StringBuilder builder = new StringBuilder();

        foreach (var response in e.PhraseResponse.Results)
        {
          builder.AppendLine(
            $"We have [{response.Confidence}] confidence that you said [{response.DisplayText}]");
        }
        this.Dispatch(() =>
        {
          this.txtResults.Background = Brushes.LimeGreen;
          this.txtResults.Text = builder.ToString();
        });
      }
    }

    void OnPartialResponse(object sender, PartialSpeechResponseEventArgs e)
    {
      this.Dispatch(() =>
      {
        this.txtResults.Background = Brushes.Orange;
        this.txtResults.Text = $"Partial result: {e.PartialResult}";
      });
    }
    void Dispatch(Action a)
    {
      this.Dispatcher.Invoke(a);
    }

    private void OnStop(object sender, RoutedEventArgs e)
    {
      this.client.EndMicAndRecognition();
      this.client.Dispose();
    }
    MicrophoneRecognitionClient client;
  }

and that produces results that look like;

One of the interesting things here is that there is a sibling API to the CreateMicrophoneClient API called CreateDataClient and that divorces the voice capture from the speech recognition such that you can bring your own voice streams/files to the API – that’s a nice thing and it’s not there in the UWP APIs as far as I know.
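As a sketch of how I’d expect that data client to be used (based on the ‘Oxford’ .NET samples – the SendAudio/EndAudio member names come from that client library so double-check them against the version of the SDK you’re using, and the file path here is just a placeholder);

      var dataClient = SpeechRecognitionServiceFactory.CreateDataClient(
        SpeechRecognitionMode.ShortPhrase,
        "en-GB",
        Constants.Key);

      dataClient.OnResponseReceived += OnResponseReceived;

      // Push an existing WAV file (16K, 16-bit, mono PCM) at the service in chunks...
      using (var stream = File.OpenRead(@"c:\temp\speech.wav"))
      {
        var buffer = new byte[1024];
        int read;

        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
          dataClient.SendAudio(buffer, read);
        }
      }
      // ...and tell it that the audio is complete.
      dataClient.EndAudio();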

In the microphone-client code above, I’m using the speech mode of ShortPhrase which limits me to 20 seconds of speech but there is also the LongDictation mode which allows for up to 2 minutes.
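Switching to the longer mode should just be a matter of changing the mode that’s passed to the factory, along these lines;

      // LongDictation mode allows a couple of minutes of speech rather than a short phrase.
      this.client = SpeechRecognitionServiceFactory.CreateMicrophoneClient(
        SpeechRecognitionMode.LongDictation,
        "en-GB",
        Constants.Key);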

The functionality here feels somewhere between the UWP SpeechRecognizer’s two modes of continuous recognition and discrete recognition – I call Start and then I get events fired and then I call End.

Unlike the UWP APIs, I don’t think there’s a way to guide these APIs in their recognition either using a constraint list or using an SRGS grammar but there is another piece of powerful functionality lurking here and that’s the ability to use a cloud service to analyse the intent of the language being used.

Adding Intent – LUIS

If I substitute the call that I made to CreateMicrophoneClient with one to CreateMicrophoneClientWithIntent then this allows me to link up with a language ‘model’ that has been built using the Language Understanding Intelligent Service (or LUIS).

There’s a video on how this works up here on the web;

LUIS Video

and I certainly found that to be a good thing to watch as I hadn’t tried out LUIS before.

Based on a couple of minutes of watching that video, I made myself a basic LUIS model based around my previous example of a Logo turtle and wanting to turn it left, right, clockwise, anti-clockwise and so on.

I only have one entity in the model (the turtle) and a few intents (left, right and none) and I published that model and grabbed the application Id and key such that I could insert them here into the client-side code;

using Microsoft.ProjectOxford.SpeechRecognition;
using Newtonsoft.Json.Linq;
using System;
using System.Text;
using System.Windows;
using System.Windows.Media;
using System.Linq;

namespace WpfApplication26
{
  public partial class MainWindow : Window
  {
    public MainWindow()
    {
      InitializeComponent();
    }

    private void OnStart(object sender, RoutedEventArgs e)
    {
      this.client = SpeechRecognitionServiceFactory.CreateMicrophoneClientWithIntent(
        "en-GB",
        Constants.Key,
        Constants.LuisAppId,
        Constants.LuisKey);

      this.client.OnIntent += OnIntent;
      this.client.OnPartialResponseReceived += OnPartialResponse;
      this.client.OnResponseReceived += OnResponseReceived;
      this.client.OnConversationError += Client_OnConversationError;

      this.client.StartMicAndRecognition();
    }
    void OnIntent(object sender, SpeechIntentEventArgs args)
    {
      this.Dispatch(() =>
      {
        this.txtIntent.Text = args.Intent?.Body;
      });

      // We have to parse this as JSON. Ho Hum.
      var result = JObject.Parse(args.Intent?.Body);
      var intents = (JArray)result["intents"];
      var entities = (JArray)result["entities"];
      var topIntent = intents.OrderByDescending(i => (float)i["score"]).FirstOrDefault();
      var topEntity = entities.OrderByDescending(e => (float)e["score"]).FirstOrDefault();

      if ((topIntent != null) && (topEntity != null))
      {
        var entityName = (string)topEntity["entity"];
        var intentName = (string)topIntent["intent"];

        if (entityName == "turtle")
        {
          int angle = 0;

          switch (intentName)
          {
            case "left":
              angle = -90;
              break;
            case "right":
              angle = 90;
              break;
            default:
              break;
          }
          this.Dispatch(() =>
          {
            this.rotateTransform.Angle += angle;
          });
        }
      }
    }

    private void Client_OnConversationError(object sender, SpeechErrorEventArgs e)
    {
      this.Dispatch(() =>
      {
        this.txtResults.Text = $"Some kind of problem {e.SpeechErrorText}";
      });
    }
    void OnResponseReceived(object sender, SpeechResponseEventArgs e)
    {
      if (e.PhraseResponse.RecognitionStatus != RecognitionStatus.RecognitionSuccess)
      {
        this.Dispatch(() =>
        {
          this.txtResults.Text = $"Some kind of problem {e.PhraseResponse.RecognitionStatus}";
        });
      }
      else
      {
        StringBuilder builder = new StringBuilder();

        foreach (var response in e.PhraseResponse.Results)
        {
          builder.AppendLine(
            $"We have [{response.Confidence}] confidence that you said [{response.DisplayText}]");
        }
        this.Dispatch(() =>
        {
          this.txtResults.Background = Brushes.LimeGreen;
          this.txtResults.Text = builder.ToString();
        });
      }
    }

    void OnPartialResponse(object sender, PartialSpeechResponseEventArgs e)
    {
      this.Dispatch(() =>
      {
        this.txtResults.Background = Brushes.Orange;
        this.txtResults.Text = $"Partial result: {e.PartialResult}";
      });
    }
    void Dispatch(Action a)
    {
      this.Dispatcher.Invoke(a);
    }

    private void OnStop(object sender, RoutedEventArgs e)
    {
      this.client.EndMicAndRecognition();
      this.client.Dispose();
    }
    MicrophoneRecognitionClient client;
  }
}

and the key functionality here is in the OnIntent handler where the service returns a bunch of JSON telling me which intents and entities it thinks it has detected and with what confidence values.

I can then act upon that information to try and rotate the turtle as demonstrated in the video below;

In some ways, using LUIS here feels a little like using a much less formal SRGS grammar with the SpeechRecognizer on-device in the UWP but the promise of LUIS here is that it can be trained such that, over time, it gets presented with more “utterances” (i.e. speech) and the administrator can go in and mark them up so as to improve the model.

Speaker Recognition

Something that’s definitely not present in the UWP is the ‘Project Oxford’ ability to try and identify a user from their voice in one of two modes;

    1. Confirm that a speaker is a ‘voice match’ with a previously enrolled speaker.
    2. Identify a speaker by ‘voice match’ against some set of previously enrolled speakers.

This all feels very ‘James Bond’ and I’m keen to try it out but I’ll push it to a follow-on post because this post is getting long and, as far as I can tell, there’s no client library wrapper here to dress up the REST calls so it’s likely to need a little bit of work.

Customising Language & Acoustic Models

‘Project Oxford’ also offers the possibility of customising language and acoustic models via its CRIS service but this one is in “private preview” right now and I’m not on that preview so there’s nothing I can say there other than that I like these acronyms and I hope that there is a ‘Louis’ and a ‘Chris’ on the ‘Project Oxford’ team somewhere.

Summary

This became a long post and it’s mostly ‘Hello World’ type stuff but there are some things in here about speech recognition working on-device in the UWP case and then off-device (and cross-platform) in the ‘Project Oxford’ case. I think the main point that I’d make is that these technologies are very much ‘out there’ and very much commodity and yet I don’t see a lot of apps using them (yet) to great advantage.

Naturally, there’s also Cortana and I’ve written about ‘her’ before on the blog but that’s another, related aspect to what I’ve written here.

Band 2 Development–A First Try…

For a while, I’ve had it on my list to look at developing on the Microsoft Band 2. I skipped the first generation of the Band but I did jump in with the 2nd generation back in December and I like the device although I don’t use it to its full capacity, mainly using it for;

    • Sleep tracking
    • Workout tracking
    • Calendar, Mails, Text
    • Steps, Calendar, Calories

and it’s been working pretty well for me in that I get decent battery life from the device and it’s proved to be quite wearable once I’ve got used to its position on my wrist.

I’ll admit that I had a few ‘challenges’ with the device when I first got it in that some of the functionality didn’t work with my original Lumia 1020 phone for a reason that I never got to the bottom of and I found that quite frustrating. That said, I’ve since paired it with a Lumia 950XL and it’s worked fine with that device.

One of the things that I think causes some confusion around the Band 2 is the type of device that it is. I’ve talked to Microsoft folks who are convinced that the Band is a Windows 10 device and that it runs the Windows 10 OS and Universal Windows Platform (UWP) apps and, as you probably know if you’re reading this blog site, it isn’t and it doesn’t.

As an aside, it would be quite something to put the “UWP” onto a device like this. As has been said before, the Windows 10 app platform is made up of platforms (UWP, Mobile, Desktop, etc.) and those platforms are made up of contracts which are sets of APIs. The “UWP” is a contract and it contains most of the available APIs and so to put all of that functionality onto a device like a Band 2 would be “quite a piece of work”. I’m not saying it’s “impossible” but…

For me, I see the Band 2 as more of a ‘smart peripheral’. That is, it’s really a companion device for a piece of software running on another device (generally, a phone) but it does have some level of built-in functionality in that it can do things like;

    1. Guide you through a pre-defined workout.
    2. Run a stopwatch or a timer.
    3. Record details of your sleep or a run.
    4. Keep track of your activity like steps, calories, heart rate.

in isolation, which is a good thing, otherwise you’d have to take both the phone and the band with you any time you wanted to use the band.

When it comes to developing for the Band 2, there are 3 options outlined on the website.

    1. Use the Band SDK – write code to talk to the Band 2 over bluetooth from a device like a phone or PC (there’s an iOS, Android and Windows SDK).
    2. Make a ‘Web Tile’ – provide a URL that can be polled for data to display on a tile. The resulting tile can be manually installed to a Band 2 or it can be submitted to the gallery for a user to install themselves via the Health app on the Band 2.
    3. Work with the back-end data that’s ultimately gathered from a Band 2 and available over a REST API.

I think all of these have their place but it’s probably the SDK that’s of most interest to me and so I opened it up and tried out a few scenarios.

A big note – I’m not doing anything here that hasn’t been done many times before. I’ve watched developer sessions on the Band/Band 2. I’m mainly working through it here and writing it up in order to add to that weight of material but also so that it sticks in my head – I tend to find that if I haven’t written the code myself then I can’t remember how things work.

Installing the SDK

Installing the Band 2 SDK is as easy as installing the NuGet package Microsoft.Band which I did from the package manager console in Visual Studio and that seemed to succeed!

image

At the time of writing, I always have a tense “Will it install?” moment when grabbing SDKs from NuGet and trying to install them into UWP projects so I was happy that this one seemed to “just work”.

The SDK docs are a big PDF document that does a pretty good job of explaining things but I’ll add my own little experiments below.

Connecting to a Band 2

The SDK walks you through getting a connection to a Band which I think can be summarised as;

      var bands = await BandClientManager.Instance.GetBandsAsync();
      
      if (bands?.Count() > 0)
      {
        var client = 
          await BandClientManager.Instance.ConnectAsync(bands.First());

        // Do something...
      }

and that was a piece of code that worked the first time I wrote it and ran it on my phone (which is paired with my Band 2). I should say that I’d modified my manifest to include;

image

but I didn’t manually hack the manifest XML file as suggested in the SDK docs – I just used the manifest designer here.

So, I’m connected to my Band 2. What can I now do?

Reading from Sensors

There are a bunch of sensors on the Band 2 and accessing them seems pretty simple. There’s even a ‘Sensor Manager’ that handles it for you although I’ve often been told off for naming classes “XYZManager” in the past.

If you imagine that the code below is from a code-behind class in XAML and that there is a TextBlock on screen called txtStatus;

      var bands = await BandClientManager.Instance.GetBandsAsync();
      
      if (bands?.Count() > 0)
      {
        var client = 
          await BandClientManager.Instance.ConnectAsync(bands.First());

        var contact = client.SensorManager.Contact;

        // Do something...
        DispatchedHandler handler = async () =>
        {
          var status = 
            await contact.GetCurrentStateAsync();

          this.txtStatus.Text = status.State !=
            BandContactState.Worn ? "Put your band back on now!" : "Glad you are wearing Band";
        };
        handler();

        contact.ReadingChanged += (s, e) =>
        {
          this.Dispatcher.RunAsync(CoreDispatcherPriority.Normal, handler);
        };
      }

then this code does what you’d expect – it monitors whether the user is wearing their Band 2 as they take it off and put it back on.

I found that code worked pretty well given that I’d spent maybe 1-2 minutes on it and apologies for the slightly obtuse (i.e. lazy) way that I’ve written and invoked the event handler inline inside this function.

If I wanted to read from another sensor like the heart rate sensor then that’s pretty easy but it requires a couple of extra steps – I set up an interval for the reading to occur and I also have to ask for permission;

      var bands = await BandClientManager.Instance.GetBandsAsync();

      if (bands?.Count() > 0)
      {
        var client =
          await BandClientManager.Instance.ConnectAsync(bands.First());

        var hr = client.SensorManager.HeartRate;

        var allowed = (hr.GetCurrentUserConsent() == UserConsent.Granted);

        if (!allowed)
        {
          allowed = await hr.RequestUserConsentAsync();
        }
        if (allowed)
        {
          // choose the smallest interval
          hr.ReportingInterval = hr.SupportedReportingIntervals.OrderBy(i => i).First();

          await hr.StartReadingsAsync();

          hr.ReadingChanged += (s, e) =>
          {
            this.Dispatcher.RunAsync(CoreDispatcherPriority.Normal,
              () =>
              {
                this.txtStatus.Text = e.SensorReading.HeartRate.ToString();
              });
          };
        }
      }

but it’s still pretty easy and it looks like the other sensors including Accelerometer, Altimeter, Barometer, Calories, Distance, Gsr, Gyroscope, Pedometer, RRInterval, SkinTemperature and UV all follow a very similar model meaning that it’s a “learn once, repeat many” type process which I like a lot.
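To illustrate that, here’s the same pattern sketched for the skin temperature sensor – I haven’t run this one and the member names (like Temperature) come from my reading of the SDK docs, so treat it as a sketch rather than gospel;

      var skinTemp = client.SensorManager.SkinTemperature;

      var consented = (skinTemp.GetCurrentUserConsent() == UserConsent.Granted) ||
        await skinTemp.RequestUserConsentAsync();

      if (consented)
      {
        // Again, choose the smallest supported interval.
        skinTemp.ReportingInterval =
          skinTemp.SupportedReportingIntervals.OrderBy(i => i).First();

        skinTemp.ReadingChanged += (s, e) =>
        {
          this.Dispatcher.RunAsync(CoreDispatcherPriority.Normal,
            () =>
            {
              // The SDK docs say this reading is in degrees Celsius.
              this.txtStatus.Text = e.SensorReading.Temperature.ToString();
            });
        };
        await skinTemp.StartReadingsAsync();
      }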

For an application that wants to run on my phone/PC in either the foreground or the background, talk to my Band 2 for a while, gather some data from it and then “do something with it” like sending it to the cloud, it doesn’t feel like things should be too difficult to do.

What’s not so clear to me is how the Band 2 might trigger something on the phone/PC in a scenario where the phone/PC wasn’t already “listening” for that action, short of having the phone/PC app running all the time, which often isn’t a viable option. I’ll need to come back to that.

What else can be done? There are some other ‘manager’ objects hanging off that IBandClient interface that I’ve got hold of…

Notifications (Part 1)

I can send notifications from code running on my device (Phone/PC) to my Band 2. Here’s my first attempt;

      var bands = await BandClientManager.Instance.GetBandsAsync();

      if (bands?.Count() > 0)
      {
        using (var client =
          await BandClientManager.Instance.ConnectAsync(bands.First()))
        {
          await client.NotificationManager.VibrateAsync(VibrationType.OneToneHigh);
        }
      }

and that worked out fine so I thought that I’d be able to follow it up by sending a message or showing a dialog but both of those need a tile ID – where does that come from?

Tiles

The band displays tiles on its Start Strip;

image

and those can be of two different types;

    • Messaging – just have messages (titles/bodies) behind them.
    • Custom – can have more complex content behind them managed as a set of pages.

I figured that I’d create a messaging tile and see how that worked for me. The code’s pretty simple and so I tried to combine it with some code which then displayed a dialog before tidying up the tile by removing it again;

     var bands = await BandClientManager.Instance.GetBandsAsync();

      if (bands?.Count() > 0)
      {
        using (var client =
          await BandClientManager.Instance.ConnectAsync(bands.First()))
        {
          var tileSpace = await client.TileManager.GetRemainingTileCapacityAsync();

          if (tileSpace > 0)
          {
            var iconFile = await StorageFile.GetFileFromApplicationUriAsync(
              new Uri("ms-appx:///Assets/tileicon.png"));

            var smallIconFile = await StorageFile.GetFileFromApplicationUriAsync(
              new Uri("ms-appx:///Assets/smalltileicon.png"));

            using (var stream = await iconFile.OpenReadAsync())
            {
              using (var smallStream = await smallIconFile.OpenReadAsync())
              {
                var largeBitmap = new WriteableBitmap(48, 48);
                largeBitmap.SetSource(stream);
                var largeIcon = largeBitmap.ToBandIcon();

                var smallBitmap = new WriteableBitmap(24, 24);
                smallBitmap.SetSource(smallStream);
                var smallIcon = smallBitmap.ToBandIcon();

                var guid = Guid.NewGuid();

                var added = await client.TileManager.AddTileAsync(
                  new BandTile(guid)
                  {
                    Name = "Test",
                    TileIcon = largeIcon,
                    SmallIcon = smallIcon
                  }
                );
                if (added)
                {
                  // NB: This call will return back to us *before* the
                  // user has acknowledged the dialog on their device -
                  // we don't get to know their answer here.
                  await client.NotificationManager.ShowDialogAsync(
                    guid, "check-in", "are you ok?");

                  await client.TileManager.RemoveTileAsync(guid);
                }
              }
            }
          }
        }
      }

Note that while this code works, I think it’s a little suspect as I actually make a call to RemoveTileAsync which is likely to run before the user has actually seen the dialog associated with that tile on the screen. The Band 2 (or the SDK) seems to forgive me for this but it’s perhaps not the best of ideas.

I can switch the call to ShowDialogAsync for a call to SendMessageAsync by just changing that piece of code to;

              await client.NotificationManager.SendMessageAsync(
                    guid, "title", "body", DateTimeOffset.Now, MessageFlags.None);

                  await Task.Delay(20000);

                  await client.TileManager.RemoveTileAsync(guid);

and the difference here is in how the Band 2 displays a message versus a dialog.

For the dialog case it pops a dialog for me to dismiss whereas for the message case I end up with a tile with a badge on it that I can then tap to see the details of the message behind the tile.

That’s why I added the 20 second delay to the code for the message example in order to give me enough time to see the message behind the tile before it was removed. Again, the way in which I’m removing the tile here probably isn’t a great thing to be doing!

Once I’ve got a tile, I can be notified when the user taps on the tile to “enter” my app within the Band 2 and when they “leave” it by using the back button but, again, this is only going to work while I have code with an active connection to the Band 2.

It was time to shake up my “UI” a little so I made 3 buttons for Create/Register Handlers/Remove and put this code behind it;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
    }
    async void OnCreateTile(object sender, RoutedEventArgs args)
    {
      var bands = await BandClientManager.Instance.GetBandsAsync();

      if (bands?.Count() > 0)
      {
        this.client = await BandClientManager.Instance.ConnectAsync(bands.First());

        var tileSpace = await this.client.TileManager.GetRemainingTileCapacityAsync();

        if (tileSpace > 0)
        {
          var iconFile = await StorageFile.GetFileFromApplicationUriAsync(
            new Uri("ms-appx:///Assets/tileicon.png"));

          var smallIconFile = await StorageFile.GetFileFromApplicationUriAsync(
            new Uri("ms-appx:///Assets/smalltileicon.png"));

          using (var stream = await iconFile.OpenReadAsync())
          {
            using (var smallStream = await smallIconFile.OpenReadAsync())
            {
              var largeBitmap = new WriteableBitmap(48, 48);
              largeBitmap.SetSource(stream);
              var largeIcon = largeBitmap.ToBandIcon();

              var smallBitmap = new WriteableBitmap(24, 24);
              smallBitmap.SetSource(smallStream);
              var smallIcon = smallBitmap.ToBandIcon();

              this.tileGuid = Guid.NewGuid();

              var added = await this.client.TileManager.AddTileAsync(
                new BandTile(this.tileGuid)
                {
                  Name = "Test",
                  TileIcon = largeIcon,
                  SmallIcon = smallIcon
                }
              );
            }
          }
        }
      }
    }
    async void OnRemove(object sender, RoutedEventArgs e)
    {
      await this.client.TileManager.StopReadingsAsync();
      await this.client.TileManager.RemoveTileAsync(this.tileGuid);
    }

    void OnRegister(object sender, RoutedEventArgs e)
    {
      this.client.TileManager.TileOpened += OnTileOpened;
      this.client.TileManager.TileClosed += OnTileClosed;
      this.client.TileManager.StartReadingsAsync();
    }
    void OnTileClosed(object sender, BandTileEventArgs<IBandTileClosedEvent> e)
    {
      if (e.TileEvent.TileId == this.tileGuid)
      {
        // My tile!
      }
    }
    void OnTileOpened(object sender, BandTileEventArgs<IBandTileOpenedEvent> e)
    {
      if (e.TileEvent.TileId == this.tileGuid)
      {
        // My tile!
      }
    }
    IBandClient client;
    Guid tileGuid;
  }

and, sure enough, the 2 event handlers that I left empty above get called when I tap on the new tile on the band and when I use the “back” button to leave that tile.

From the SDK docs, these events are handled using intents on Android, which would mean that the events can be received whether the app on the phone is running or not – that’s not something that the Windows SDK here surfaces (although, naturally, it’s possible to do that type of thing on Windows with the UWP).

Pages, UI Elements

Beyond that, tiles can have pages within them which can contain UI pieces like buttons and text blocks, icons and barcodes within layout containers that may scroll etc.

I found this slightly more tricky to get working in the first instance as there’s a definite ‘pattern’ that’s in use here where you need to;

    1. Create your tile that specifies the page layouts that it contains and the UI elements within them which are given identifiers.
    2. Dynamically “talk” to the tile at a later point to set content for the elements that you earlier identified.

and I didn’t get this ‘2 step dance’ quite right for the first few attempts but it makes sense and is fairly easy once you’ve got the idea of it.

I left my UI as being 3 buttons for create/register/remove and I altered my tile such that it had a single page with a single button on it;

  using Microsoft.Band;
  using Microsoft.Band.Tiles;
  using Microsoft.Band.Tiles.Pages;
  using System;
  using System.Linq;
  using Windows.Storage;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;
  using Windows.UI.Xaml.Media.Imaging;
  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.buttonElementId = 1;
    }
    async void OnCreateTile(object sender, RoutedEventArgs args)
    {
      var bands = await BandClientManager.Instance.GetBandsAsync();

      if (bands?.Count() > 0)
      {
        this.client = await BandClientManager.Instance.ConnectAsync(bands.First());

        var tileSpace = await this.client.TileManager.GetRemainingTileCapacityAsync();

        if (tileSpace > 0)
        {
          var iconFile = await StorageFile.GetFileFromApplicationUriAsync(
            new Uri("ms-appx:///Assets/tileicon.png"));

          var smallIconFile = await StorageFile.GetFileFromApplicationUriAsync(
            new Uri("ms-appx:///Assets/smalltileicon.png"));

          using (var stream = await iconFile.OpenReadAsync())
          {
            using (var smallStream = await smallIconFile.OpenReadAsync())
            {
              var largeBitmap = new WriteableBitmap(48, 48);
              largeBitmap.SetSource(stream);
              var largeIcon = largeBitmap.ToBandIcon();

              var smallBitmap = new WriteableBitmap(24, 24);
              smallBitmap.SetSource(smallStream);
              var smallIcon = smallBitmap.ToBandIcon();

              this.tileGuid = Guid.NewGuid();
              this.pageGuid = Guid.NewGuid();

              var panel = new FilledPanel()
              {
                BackgroundColor = new BandColor(0xFF, 0x00, 0x00),
                Rect = new PageRect(0, 0, 245, 102),
                BackgroundColorSource = ElementColorSource.BandBase
              };
              panel.Elements.Add(
                new Microsoft.Band.Tiles.Pages.TextButton()
                {
                  ElementId = this.buttonElementId,
                  Rect = new PageRect(0, 0, 245, 102),
                  VerticalAlignment = Microsoft.Band.Tiles.Pages.VerticalAlignment.Bottom,
                  HorizontalAlignment = Microsoft.Band.Tiles.Pages.HorizontalAlignment.Center
                }
              );
              var pageLayout = new PageLayout(panel);

              var bandTile = new BandTile(this.tileGuid)
              {
                Name = "Test",
                TileIcon = largeIcon,
                SmallIcon = smallIcon
              };
              bandTile.PageLayouts.Add(pageLayout);

              var added = await this.client.TileManager.AddTileAsync(bandTile);

              if (added)
              {
                // The hard-coded 0 here means 'layout 0'
                await this.client.TileManager.SetPagesAsync(
                  this.tileGuid,
                  new PageData(
                    this.pageGuid,
                    0,
                    new TextButtonData(this.buttonElementId, "Click Me")));
              }
            }
          }
        }
      }
    }
    async void OnRemove(object sender, RoutedEventArgs e)
    {
      await this.client.TileManager.StopReadingsAsync();
      this.client.TileManager.TileOpened -= OnTileOpened;
      this.client.TileManager.TileClosed -= OnTileClosed;
      this.client.TileManager.TileButtonPressed -= OnTileButtonPressed;
      await this.client.TileManager.RemoveTileAsync(this.tileGuid);
    }

    void OnRegister(object sender, RoutedEventArgs e)
    {
      this.client.TileManager.TileOpened += OnTileOpened;
      this.client.TileManager.TileClosed += OnTileClosed;
      this.client.TileManager.TileButtonPressed += OnTileButtonPressed;
      this.client.TileManager.StartReadingsAsync();
    }

    void OnTileButtonPressed(object sender,
      BandTileEventArgs<IBandTileButtonPressedEvent> e)
    {
      if ((e.TileEvent.TileId == this.tileGuid) &&
        (e.TileEvent.PageId == this.pageGuid) &&
        (e.TileEvent.ElementId == this.buttonElementId))
      {
        // My button!
      }
    }

    void OnTileClosed(object sender, BandTileEventArgs<IBandTileClosedEvent> e)
    {
      if (e.TileEvent.TileId == this.tileGuid)
      {
        // My tile!
      }
    }
    void OnTileOpened(object sender, BandTileEventArgs<IBandTileOpenedEvent> e)
    {
      if (e.TileEvent.TileId == this.tileGuid)
      {
        // My Tile
      }
    }
    IBandClient client;
    Guid tileGuid;
    Guid pageGuid;
    short buttonElementId;
  }

and the OnCreateTile handler here creates a tile on the band with a single page with a single button on it whose data is dynamically set to say “Click Me”.

The OnRegister handler sets up event handlers for when that tile is entered/left and also for when the button is clicked – the OnTileButtonPressed handler.

That’s all quite nice and not too difficult at all. What else can the Band 2 do for me?

Personalisation

The last bit of obvious functionality in the Band SDK is the personalisation that lets you write code to;

    • Change the ‘Me Tile’
    • Change the theme

and so it feels like this is the set of APIs that underpin some (or all) of the Health app’s capability to play around with the Band 2’s settings which you might want to duplicate in your own app.
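I haven’t dug into those APIs yet but, from a skim of the SDK docs, I’d expect changing the theme to look roughly like the sketch below – the GetThemeAsync/SetThemeAsync calls and the Base colour property are taken from my reading of the docs rather than from code I’ve run, so check them before relying on this;

      // Rough sketch - turn the Band's base theme colour green.
      var theme = await client.PersonalizationManager.GetThemeAsync();

      theme.Base = new BandColor(0x00, 0xAA, 0x00);

      await client.PersonalizationManager.SetThemeAsync(theme);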

What’s Missing?

The one thing I’d really like to see from the Band 2 SDK is a means by which an action initiated on the Band 2 can cause something to happen on a paired phone or PC even if that phone or PC isn’t actively waiting for it.

I’m thinking of some kind of feature whereby an app on the phone or PC can register a background trigger (possibly on a bluetooth socket) so that the system can quietly wait for an action on the Band 2 (like a button press) and run some background code when that occurs.

From the SDK docs, it looks like that’s do-able on Android and it’s probably do-able on Windows too if you go spelunking into the bluetooth protocol that underpins the SDK but, without that, it’s the most obvious thing that I’d like to see added.

Otherwise, the SDK’s really easy to pick up and code against – I enjoyed having a quick play with the Band 2.

Windows 10 Devices and ‘The Developer Portal’

I haven’t found official documentation on this anywhere so I thought I’d write up my own quick notes which I can replace when I find the right web pages to link to. I’ve seen quite a few people talk about the portal that’s present on both IoT Core and Windows Mobile and, I’m assuming, on Surface Hub but I haven’t checked that last one at the time of writing.

By way of rough notes…

I’m currently on a mixture of wired/WiFi networking where I have my Surface Pro 3 development machine on a wired network along with a Raspberry Pi 2 running Windows IoT Core and a Windows 10 Mobile device (my phone) which connects to the same network but over WiFi.

If I use the IoT Dashboard application then it shows me a couple of devices on my network;

image

and so it’s showing both my Pi and my phone.

If I use the “Open Device Portal” option on the Raspberry Pi 2 then a browser opens;

image

with the portal for the IoT Core device and I can use this to manage aspects of the device as per the list on the left-hand side of the screen – specifically, I can use this to add/remove/stop/start apps and I can use it to set up networking and bluetooth devices.

If you’ve seen me talk about IoT Core in public then you’ll have seen me talk my way through this portal and show a few things.

What I’ve never shown in public (AFAIK) though is what happens if I use the “Open Device Portal” on my phone. In the first instance, I see an error;

image

but I can fix that by losing the :8080 port on the address here although (right now) I still find myself with a warning;

image

but if I go around that I get to;

image

and this leads over to the pages on the phone where I can set all of this up. As you can see, I have the device in developer mode;

wp_ss_20160203_0001

and you’ll notice that I have “Device discovery” switched on here;

wp_ss_20160203_0002

and that it says that this is available over both USB and WiFi, that I have no devices paired and that the portal is also switched on and advertising itself at http://192.168.0.11.

wp_ss_20160203_0003

If I then go through that ‘Pair’ process on the phone, I get a code;

wp_ss_20160203_0004

which I can then type into the web page and I’m then onto the portal on the device. Here’s the phone portal side by side with the IoT Core portal;

image

and from that phone portal I can then go and add/remove/stop/start apps and I can look at performance, networking and so on for the phone just like I can on the IoT Core device.

What I’d really like to do is to also be able to debug here over WiFi without having to connect a USB cable and I don’t know at the time of writing whether I can do that or not.

If I unplug the USB cable then Visual Studio certainly sees the device as a remote target over the network if I use the Universal authentication mode;

image

but attempting to deploy using this mechanism ends up in me being prompted for a PIN;

image

and I don’t seem to be able to provide any type of PIN here that makes the phone play nicely with me so I’m not sure whether this is just because I don’t know how to provide the right PIN or whether there isn’t a PIN that will work here right now.

Let me know if you know and I’ll update the post.

Intel RealSense SR300, Windows 10 and UWP?

I’ve written quite a few posts about RealSense and, specifically, around the F200 front-facing camera which has a tonne of capability.

I’d taken my eye off the ball a little though in that I only recently caught sight of one of the latest updates around the RealSense developer kit;

RealSense Developer Kit

which talks of the current F200 camera being transitioned to a new SR300 camera which I think was already shown at CES built into the Razer Stargazer camera;

image

and it looks like this requires USB 3.0 and is Windows 10 only and will be available in Q2.

Meanwhile, back at Intel, it looks like the developer kit isn’t yet ready but might be ready some time this month.

There’s more info on this camera in Intel’s SDK update notes in this post;

What’s New in R5 of the Intel® RealSense™ SDK R5 (v7)

which says that the SDK supports the SR300 with new, specific features;

    • Cursor Mode – a very responsive tracking of finger/hand movement to screen. Using only half the power while not requiring the identification/calibration of full hand mode with no latency, longer range (90cm) and faster speed detection (2 meters/second). Includes a click gesture that simulates clicking the mouse.
    • Windows Universal support. The SR300 can use the ‘Microsoft 3D camera’ through the WinRT* service via the Intel RealSense plugin. Windows 10 Hello and Universal Apps will use the 3D camera middleware where the Intel Realsense Runtime calls the camera API which can use the DCM.

I’m not sure that I fully understand what the ‘Microsoft 3D camera’ is but I like the idea of Universal Apps getting support for RealSense as, to date, it’s been a desktop app thing.

I’d also missed this post;

Get Ready for Intel RealSense SDK Universal Windows Platform Apps

which says that “R5 delivers … components … for developing UWP apps that use the Intel RealSense camera (SR300)”.

It sounds like the initial support here is going to be for “raw color and depth streaming and the blob tracking algorithm”.

That doesn’t seem out of line with what I looked at in terms of UWP support for the Kinect for Windows V2 camera in this post;

Kinect V2, Windows Hello and Perception APIs

but there’s also a bit of a teaser in that the article states “Other UWP-specific algorithms are in development”.

It all sounds promising and, as usual, I know no more about this than what’s written on the public web here but I’m waiting for the notification to come through that the SR300 is available and then I’ll be getting my order in to see what the SDK enables.