A comment came in to the blog from a reader asking for a UWP speech demo using an SRGS grammar. It so happens that I’ve got one or two of those kicking around, so I simplified one of them down to the example that you see running in the video below;
I’m no expert at SRGS grammars (or at anything for that matter) but I’ve written one or two in the past and so I put together this grammar here;
<?xml version="1.0" encoding="utf-8" ?>
<grammar
  version="1.0"
  mode="voice"
  root="commands"
  xml:lang="en-US"
  tag-format="semantics/1.0"
  xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="commands">
    <one-of>
      <item>
        <ruleref uri="#size"/>
      </item>
      <item>
        <ruleref uri="#move"/>
      </item>
    </one-of>
  </rule>
  <rule id="size">
    <one-of>
      <item>
        make the rectangle <ruleref uri="#sizetype" />
        <tag>out.action=rules.latest();</tag>
      </item>
    </one-of>
  </rule>
  <rule id="move">
    <one-of>
      <item>
        move the rectangle <ruleref uri="#direction" />
        <tag>out.action=rules.latest();</tag>
      </item>
    </one-of>
  </rule>
  <rule id="direction">
    <one-of>
      <item>
        <tag>out="left";</tag>
        <one-of>
          <item>left</item>
        </one-of>
      </item>
      <item>
        <tag>out="right";</tag>
        <one-of>
          <item>right</item>
        </one-of>
      </item>
    </one-of>
  </rule>
  <rule id="sizetype">
    <one-of>
      <item>
        <tag>out="bigger";</tag>
        <one-of>
          <item>bigger</item>
          <item>larger</item>
          <item>increase</item>
        </one-of>
      </item>
      <item>
        <tag>out="smaller";</tag>
        <one-of>
          <item>smaller</item>
          <item>small</item>
          <item>decrease</item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
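As an aside, one nice thing about this shape of grammar is that it’s easy to extend – purely as an illustration (this isn’t part of the sample for download), adding “up” and “down” commands would just mean adding two more items to the direction rule and, of course, handling them in the code behind;

```xml
<!-- Hypothetical extension of the "direction" rule - not in the sample. -->
<rule id="direction">
  <one-of>
    <item>
      <tag>out="left";</tag>
      <one-of>
        <item>left</item>
      </one-of>
    </item>
    <item>
      <tag>out="right";</tag>
      <one-of>
        <item>right</item>
      </one-of>
    </item>
    <item>
      <tag>out="up";</tag>
      <one-of>
        <item>up</item>
      </one-of>
    </item>
    <item>
      <tag>out="down";</tag>
      <one-of>
        <item>down</item>
      </one-of>
    </item>
  </one-of>
</rule>
```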
and then I just made a blank UWP project in Visual Studio 2015, made sure that the manifest gave my application access to the microphone, and then wrote this XAML to put a rectangle on the screen;
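For reference, the microphone piece is just a capability declared in the Package.appxmanifest file (you can tick it in the manifest designer or add it by hand) – something along these lines;

```xml
<!-- In Package.appxmanifest - gives the app access to the microphone. -->
<Capabilities>
  <DeviceCapability Name="microphone" />
</Capabilities>
```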
<Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
  <Rectangle
    Width="200"
    Height="200"
    Stroke="Black"
    StrokeThickness="3"
    Fill="Red"
    RenderTransformOrigin="0.5,0.5">
    <Rectangle.RenderTransform>
      <CompositeTransform x:Name="transform" />
    </Rectangle.RenderTransform>
  </Rectangle>
</Grid>
and I wrote this code behind it in order to use a SpeechRecognizer with a grammar constraint to recognise some simple commands around moving the rectangle;
namespace QuickSpeechDemo
{
  using System;
  using System.Collections.Generic;
  using System.Linq;
  using Windows.Media.SpeechRecognition;
  using Windows.Storage;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += OnLoaded;

      // Clearly, this complex dictionary would be better wrapped into an
      // object model - I'm being lazy.
      this.actions = new Dictionary<string, Dictionary<string, Action>>()
      {
        [SIZE_RULE] = new Dictionary<string, Action>()
        {
          [BIGGER_OPTION] = OnBigger,
          [SMALLER_OPTION] = OnSmaller
        },
        [MOVE_RULE] = new Dictionary<string, Action>()
        {
          [LEFT_OPTION] = OnLeft,
          [RIGHT_OPTION] = OnRight
        }
      };
    }
    void OnBigger()
    {
      this.transform.ScaleX += SCALE;
      this.transform.ScaleY += SCALE;
    }
    void OnSmaller()
    {
      this.transform.ScaleX -= SCALE;
      this.transform.ScaleY -= SCALE;
    }
    void OnLeft()
    {
      this.transform.TranslateX -= TRANSLATION;
    }
    void OnRight()
    {
      this.transform.TranslateX += TRANSLATION;
    }
    async void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.speechRecognizer = new SpeechRecognizer();
      this.speechRecognizer.Timeouts.BabbleTimeout = TimeSpan.FromSeconds(0);
      this.speechRecognizer.Timeouts.InitialSilenceTimeout = TimeSpan.FromSeconds(0);
      this.speechRecognizer.Timeouts.EndSilenceTimeout = TimeSpan.FromSeconds(0);

      var grammarFile = await StorageFile.GetFileFromApplicationUriAsync(
        new Uri("ms-appx:///grammar.xml"));

      this.speechRecognizer.Constraints.Add(
        new SpeechRecognitionGrammarFileConstraint(grammarFile));

      var result = await speechRecognizer.CompileConstraintsAsync();

      if (result.Status == SpeechRecognitionResultStatus.Success)
      {
        while (true)
        {
          var speechResult = await speechRecognizer.RecognizeAsync();

          if ((speechResult.Confidence == SpeechRecognitionConfidence.Medium) ||
              (speechResult.Confidence == SpeechRecognitionConfidence.High))
          {
            var lastRulePath = speechResult.RulePath.Last();

            IReadOnlyList<string> values = null;

            if (speechResult?.SemanticInterpretation?.Properties.TryGetValue(
              ACTION_IDENTIFIER, out values) == true)
            {
              var action = values.FirstOrDefault();

              // Ok, we have a rule and an action. Need to execute on it.
              this.actions[lastRulePath]?[action]?.Invoke();
            }
          }
        }
      }
    }
    SpeechRecognizer speechRecognizer;
    Dictionary<string, Dictionary<string, Action>> actions;

    static readonly string SIZE_RULE = "sizetype";
    static readonly string MOVE_RULE = "direction";
    static readonly string BIGGER_OPTION = "bigger";
    static readonly string SMALLER_OPTION = "smaller";
    static readonly string LEFT_OPTION = "left";
    static readonly string RIGHT_OPTION = "right";
    static readonly string ACTION_IDENTIFIER = "action";
    static readonly int TRANSLATION = 100;
    static readonly float SCALE = 0.5f;
  }
}
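As a side note, rather than sitting in that while(true) loop around RecognizeAsync, the SpeechRecognizer also offers a ContinuousRecognitionSession which raises an event as results arrive. I haven’t used it in this sample, but a rough sketch of the shape of it (assuming the same compiled grammar constraint as above) would be something like;

```csharp
// Sketch only - an alternative to the while(true)/RecognizeAsync loop,
// using the recognizer's continuous session instead.
this.speechRecognizer.ContinuousRecognitionSession.ResultGenerated +=
  async (session, args) =>
  {
    // This event is raised off the UI thread, so marshal back to it
    // before touching the rectangle's transform.
    await this.Dispatcher.RunAsync(
      Windows.UI.Core.CoreDispatcherPriority.Normal,
      () =>
      {
        // args.Result is a SpeechRecognitionResult just like the one
        // returned by RecognizeAsync, so the same RulePath and
        // SemanticInterpretation processing as above would apply here.
      });
  };

await this.speechRecognizer.ContinuousRecognitionSession.StartAsync();
```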
Sure, I’m being a bit lazy with some of that code and, in the real world, I’d build some kind of framework for listening to and dispatching spoken commands, but hopefully the reader will get the idea from this snippet and perhaps be able to take it further and do more complex things with grammars.
The code is here for download.