I wanted to experiment with adding simple speech capability to a Windows Phone 8.1 application, so I needed something quick to build it around. I thought I’d try out the Bing Synonym API, which lets you find alternate references to entities based on Bing’s picture of the internet.
That API is available as a technical preview on the developer centre and it’s fairly simple to use: you make an HTTP request and get a response back, with both the URI and the response format following the OData specification.
Using it from Windows Phone 8.1 hit an initial snag, though: I struggled to find a library that supports client-side parsing of OData. I tried to NuGet my way into using this one;
but that died for me at the point where it tried to bring in its dependency on System.Spatial (huh?);
and so that didn’t work. I wonder what the heck System.Spatial is doing in there – I remember when OData was just some AtomPub data that needed an HTTP client and a bit of XML parsing. When did it grow all these dependencies?
Anyway, I tried a few more including this one;
with no success so I went off to look up WCF Data Services and its support for client libraries and ended up back in the same place.
I gave up on trying to find a proper OData client-side library that would do what I wanted here and decided just to go for the cheapest, simplest approach to calling the API and so I wrote a little class to try and help with that;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.Xml.Linq;

    class SynonymApi
    {
        public SynonymApi(string apiKey)
        {
            if (string.IsNullOrEmpty(apiKey))
            {
                throw new ArgumentException("Sorry, I think they make you have a key for this API");
            }
            this.apiKey = apiKey;
        }

        public async Task<IEnumerable<string>> GetSynonymsAsync(string term)
        {
            List<string> synonyms = new List<string>();

            // The API key is passed as both the user name and the password
            HttpClientHandler handler = new HttpClientHandler()
            {
                Credentials = new NetworkCredential(apiKey, apiKey)
            };

            using (HttpClient client = new HttpClient(handler))
            {
                Uri uri = new Uri(
                    string.Format(uriFormatString, Uri.EscapeDataString(term)));

                HttpResponseMessage responseMsg = await client.GetAsync(uri);

                if (responseMsg.IsSuccessStatusCode)
                {
                    using (Stream stream = await responseMsg.Content.ReadAsStreamAsync())
                    {
                        // Pull the text of every Synonym element out of the response
                        XElement xElement = XElement.Load(stream);
                        synonyms.AddRange(
                            xElement.DescendantsAndSelf(synonymElementName).Select(x => (string)x));
                    }
                }
            }
            return (synonyms);
        }

        static readonly XName synonymElementName =
            XName.Get("Synonym", "http://schemas.microsoft.com/ado/2007/08/dataservices");

        static readonly string uriFormatString =
            "https://api.datamarket.azure.com/Bing/Synonyms/v1/GetSynonyms?Query=%27{0}%27";

        string apiKey;
    }
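To make the two halves of that class a little more concrete, here is a minimal sketch of the URI it builds and the XML parsing it does. The `SynonymParsingSketch` class and the `SampleXml` document are my own illustration: the XML is hand-written to resemble an OData Atom feed, it is not a captured response from the service, and the synonym values in it are made up.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class SynonymParsingSketch
{
    // Same data-services namespace that SynonymApi looks for
    static readonly XNamespace d = "http://schemas.microsoft.com/ado/2007/08/dataservices";

    // ASSUMPTION: hand-written sample shaped like an OData Atom feed,
    // not a real response from the Bing Synonym API.
    public const string SampleXml =
        "<feed xmlns='http://www.w3.org/2005/Atom' " +
        "xmlns:d='http://schemas.microsoft.com/ado/2007/08/dataservices' " +
        "xmlns:m='http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'>" +
        "<entry><content type='application/xml'><m:properties>" +
        "<d:Synonym>Jennifer Lopez</d:Synonym>" +
        "</m:properties></content></entry>" +
        "<entry><content type='application/xml'><m:properties>" +
        "<d:Synonym>JLo</d:Synonym>" +
        "</m:properties></content></entry>" +
        "</feed>";

    // Builds the request URI the same way SynonymApi does:
    // the term is escaped and wrapped in %27 (an encoded single quote).
    public static string BuildUri(string term)
    {
        return string.Format(
            "https://api.datamarket.azure.com/Bing/Synonyms/v1/GetSynonyms?Query=%27{0}%27",
            Uri.EscapeDataString(term));
    }

    // Returns the text of every d:Synonym element in the document
    public static string[] Parse(string xml)
    {
        return XElement.Parse(xml)
            .Descendants(d + "Synonym")
            .Select(x => (string)x)
            .ToArray();
    }
}
```

Calling `SynonymParsingSketch.BuildUri("J Lo")` gives a URI ending in `Query=%27J%20Lo%27`, and `SynonymParsingSketch.Parse(SynonymParsingSketch.SampleXml)` returns the two sample strings in document order.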
With that, I have something that I can use to call this API. In testing it out I found that the API is quite “funny” about which synonyms it brings back. For instance, it seems to love “J Lo”;
but it can’t come up with a synonym for the word “river”;
Not to worry – it works from the point of view of what I want here and I suppose the next thing to do would be to write a little app to call this API with a query and then display the results.
I wanted to keep that fairly minimal, so I’ll just write two pages with some code-behind and I’m not going to worry about proper state management, lifecycle management and that kind of thing. That’s for another day. For now, I’ll just have a MainPage and a SearchResultsPage.
MainPage.xaml/.cs
    <Page x:Class="App120.MainPage"
          xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
          xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
          xmlns:local="using:App120"
          xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
          xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
          mc:Ignorable="d">
      <Page.BottomAppBar>
        <CommandBar>
          <AppBarButton Click="OnSearch">
            <AppBarButton.Icon>
              <SymbolIcon Symbol="Find" Name="search"/>
            </AppBarButton.Icon>
          </AppBarButton>
        </CommandBar>
      </Page.BottomAppBar>
      <Page.Background>
        <ImageBrush ImageSource="Assets/backdrop.jpg" />
      </Page.Background>
      <Grid>
        <Grid.ChildrenTransitions>
          <TransitionCollection>
            <EntranceThemeTransition />
          </TransitionCollection>
        </Grid.ChildrenTransitions>
        <Grid.RowDefinitions>
          <RowDefinition Height="Auto" />
          <RowDefinition Height="*" />
        </Grid.RowDefinitions>
        <!-- Title Panel -->
        <StackPanel Grid.Row="0" Margin="19,0,0,0">
          <TextBlock Text="SYNONYMS"
                     Style="{ThemeResource TitleTextBlockStyle}"
                     Margin="0,12,0,0" />
          <TextBlock Text="search"
                     Margin="0,-6.5,0,26.5"
                     Style="{ThemeResource HeaderTextBlockStyle}"
                     CharacterSpacing="{ThemeResource PivotHeaderItemCharacterSpacing}" />
        </StackPanel>
        <!--TODO: Content should be placed within the following grid-->
        <Grid Grid.Row="1" x:Name="ContentRoot" Margin="19,9.5,19,0">
          <TextBox Header="term to search for" x:Name="txtSearch" />
        </Grid>
      </Grid>
    </Page>
and then some code;
    using Windows.UI.Xaml;
    using Windows.UI.Xaml.Controls;
    using Windows.UI.Xaml.Navigation;

    namespace App120
    {
        public sealed partial class MainPage : Page
        {
            public MainPage()
            {
                this.InitializeComponent();
            }

            private void OnSearch(object sender, RoutedEventArgs e)
            {
                if (!string.IsNullOrEmpty(this.txtSearch.Text))
                {
                    this.Frame.Navigate(typeof(SearchResultsPage), this.txtSearch.Text);
                }
            }
        }
    }
SearchResultsPage.xaml/.cs
    <Page x:Class="App120.SearchResultsPage"
          xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
          xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
          xmlns:local="using:App120"
          xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
          xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
          mc:Ignorable="d">
      <Page.Background>
        <ImageBrush ImageSource="Assets/backdrop.jpg" />
      </Page.Background>
      <Grid>
        <Grid.ChildrenTransitions>
          <TransitionCollection>
            <EntranceThemeTransition />
          </TransitionCollection>
        </Grid.ChildrenTransitions>
        <Grid.RowDefinitions>
          <RowDefinition Height="Auto" />
          <RowDefinition Height="*" />
        </Grid.RowDefinitions>
        <!-- Title Panel -->
        <StackPanel Grid.Row="0" Margin="19,0,0,0">
          <TextBlock Text="SYNONYMS"
                     Style="{ThemeResource TitleTextBlockStyle}"
                     Margin="0,12,0,0" />
          <TextBlock Text="search"
                     x:Name="txtTitle"
                     Margin="0,-6.5,0,26.5"
                     Style="{ThemeResource HeaderTextBlockStyle}"
                     CharacterSpacing="{ThemeResource PivotHeaderItemCharacterSpacing}" />
        </StackPanel>
        <!--TODO: Content should be placed within the following grid-->
        <Grid Grid.Row="1" x:Name="ContentRoot" Margin="19,9.5,19,0">
          <ListView x:Name="lstView" ItemsSource="{Binding}">
            <ListView.ItemTemplate>
              <DataTemplate>
                <TextBlock Style="{StaticResource ListViewItemTextBlockStyle}"
                           Text="{Binding}"/>
              </DataTemplate>
            </ListView.ItemTemplate>
          </ListView>
          <ProgressBar IsIndeterminate="True"
                       IsEnabled="False"
                       x:Name="progressBar"
                       Foreground="White"
                       Visibility="Collapsed"/>
        </Grid>
      </Grid>
    </Page>
and a little bit of code;
    using System.Linq;
    using Windows.UI.Xaml;
    using Windows.UI.Xaml.Controls;
    using Windows.UI.Xaml.Navigation;

    namespace App120
    {
        public sealed partial class SearchResultsPage : Page
        {
            public SearchResultsPage()
            {
                this.InitializeComponent();
            }

            // Shows or hides the indeterminate progress bar
            void SwitchBar(bool on = true)
            {
                this.progressBar.Visibility = on ? Visibility.Visible : Visibility.Collapsed;
                this.progressBar.IsEnabled = on;
            }

            protected async override void OnNavigatedTo(NavigationEventArgs e)
            {
                string term = e.Parameter as string;

                this.SwitchBar();
                this.lstView.DataContext = null;

                try
                {
                    SynonymApi api = new SynonymApi(BING_KEY);
                    var synonyms = await api.GetSynonymsAsync(term);

                    if ((synonyms != null) && synonyms.Any())
                    {
                        this.lstView.DataContext = synonyms;
                    }
                }
                finally
                {
                    // Hide the progress bar even if the call fails
                    this.SwitchBar(false);
                }
            }

            // Your own API key goes here
            static readonly string BING_KEY = "";
        }
    }
Running the App
OK, that gives me enough code to run and do a basic search, which results in;
and has me wondering how easy/hard it is to get that to work with voice commands on the phone?
Adding Voice
I’ve seen various demos of people adding voice commands to phone applications in the past but I’m not sure I’ve ever done it myself and so I took a look at this MSDN article;
http://msdn.microsoft.com/en-us/library/windowsphone/develop/dn630430.aspx
and followed through making myself a little VCD file. I got a little confused with MSDN mentioning “a VCD file” as it seems that the VCD part is really just a particular XML grammar.
I started with the example taken from that MSDN page;
    <?xml version="1.0" encoding="utf-8"?>
    <VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.1">
      <CommandSet xml:lang="en-us">
        <CommandPrefix> Contoso Widgets </CommandPrefix>
        <Example> Show today's specials </Example>
        <Command Name="showWidgets">
          <Example> Show today's specials </Example>
          <ListenFor> [Show] {widgetViews} </ListenFor>
          <Feedback> Showing {widgetViews} </Feedback>
          <Navigate Target="favorites.xaml"/>
        </Command>
        <PhraseList Label="widgetViews">
          <Item> today's specials </Item>
          <Item> best sellers </Item>
        </PhraseList>
      </CommandSet>
      <!-- Other CommandSets for other languages -->
    </VoiceCommands>
and then did a bit of poking around on the reference to see what my options were. I’m unlikely to exhaust the schema with this little app but I wanted to get a flavour of it and I got a bit bogged down in trying to figure out how I could have a voice command like;
“synonyms for [phrase]”
and, specifically, I got bogged down in how to apply a “wildcard” to that [phrase] portion until I took a look at this talk from Build and, in particular, slides 22, 23 and 24 from the PowerPoint here, which helped a lot in understanding what I was missing about how to pick up arbitrary speech.
That got me to this XML file;
    <?xml version="1.0" encoding="utf-8"?>
    <VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.1">
      <CommandSet xml:lang="en-us">
        <CommandPrefix>synonyms</CommandPrefix>
        <Example>for a word, name or phrase</Example>
        <Command Name="for">
          <Example>for a word, name or phrase</Example>
          <ListenFor>[for] {dictatedSearchTerms}</ListenFor>
          <Feedback>checking that out on Bing for you, give me a second</Feedback>
          <Navigate Target="SearchResultsPage.xaml"/>
        </Command>
        <PhraseTopic Label="dictatedSearchTerms" Scenario="Natural Language">
        </PhraseTopic>
      </CommandSet>
    </VoiceCommands>
One thing I’d say is that I’m not 100% sure on the role of the SearchResultsPage.xaml as the Target above as (as far as I can tell) I get to control the navigation myself anyway when the app is activated.
I then attempt to pick up the spoken text in my App.OnActivated handler although I’m not 100% sure that I got this right just yet;
    protected override void OnActivated(IActivatedEventArgs args)
    {
        this.CreateUI(args.Kind != ActivationKind.VoiceCommand);

        base.OnActivated(args);

        if (args.Kind == ActivationKind.VoiceCommand)
        {
            VoiceCommandActivatedEventArgs voiceArgs = (VoiceCommandActivatedEventArgs)args;

            // Unsure whether these ever come back null or not
            if ((voiceArgs.Result != null) &&
                (voiceArgs.Result.SemanticInterpretation != null) &&
                (voiceArgs.Result.SemanticInterpretation.Properties != null))
            {
                IReadOnlyList<string> list = null;

                if (voiceArgs.Result.SemanticInterpretation.Properties.TryGetValue(
                    "dictatedSearchTerms", out list))
                {
                    StringBuilder builder = new StringBuilder();

                    for (int i = 0; i < list.Count; i++)
                    {
                        builder.AppendFormat("{0}{1}", i == 0 ? string.Empty : " ", list[i]);
                    }
                    string query = builder.ToString();

                    this.rootFrame.Navigate(typeof(SearchResultsPage), query);
                }
            }
        }
    }
I’m not sure if there’s a better way to get all of the dictatedSearchTerms into a single string such that I can then pass that string as an argument to the call to rootFrame.Navigate but the method that I’m using above seems to work.
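One simpler option for the StringBuilder loop in OnActivated is `string.Join`, which accepts any `IEnumerable<string>` and so works directly on the `IReadOnlyList<string>` that comes back from the semantic interpretation. A minimal sketch, where the `DictationHelper` class and `ToQuery` method are names I have made up for illustration:

```csharp
using System.Collections.Generic;

static class DictationHelper
{
    // Joins the recognised terms with single spaces, which is the same
    // result that the StringBuilder loop produces.
    public static string ToQuery(IReadOnlyList<string> terms)
    {
        return string.Join(" ", terms);
    }
}
```

With that in place, the body of the `TryGetValue` branch could shrink to `string query = DictationHelper.ToQuery(list);` followed by the existing call to `rootFrame.Navigate`.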
The CreateUI function above is a single piece of code which creates the UI (i.e. the Frame) for the app with the boolean flag being passed to try and indicate whether the function should navigate the Frame created to the default page at startup or not – in the case where the app is activated from a speech query, this wouldn’t be necessary.
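The post doesn't list CreateUI itself, so what follows is only a guess at its shape, based on the description above and on the standard project template. It is not the actual code from the app; the `navigateToDefaultPage` parameter name is my own, and `MainPage` is the page defined earlier.

```csharp
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;

namespace App120
{
    sealed partial class App : Application
    {
        Frame rootFrame;

        // Hypothetical sketch: creates the Frame once and optionally
        // navigates it to the default page.
        void CreateUI(bool navigateToDefaultPage)
        {
            if (this.rootFrame == null)
            {
                this.rootFrame = new Frame();
                Window.Current.Content = this.rootFrame;
            }

            // Skip the default page when a voice command will navigate
            // straight to the results page instead.
            if (navigateToDefaultPage && (this.rootFrame.Content == null))
            {
                this.rootFrame.Navigate(typeof(MainPage));
            }

            Window.Current.Activate();
        }
    }
}
```

Under that assumption, OnLaunched would call `CreateUI(true)`, while OnActivated passes `false` for voice activations, as in the handler above.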
I register the voice file above every time that the application starts up (which is a bit wasteful; I could set a flag to stop this) with code like;
    async void InstallSpeech()
    {
        var storageFile =
            await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///Voice.xml"));

        await VoiceCommandManager.InstallCommandSetsFromStorageFileAsync(storageFile);
    }
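The flag idea could be sketched with the app's local settings, roughly as below. The `SpeechInstaller` class, the `InstallSpeechOnceAsync` method and the `vcdInstalled` key are hypothetical names of mine, and I haven't measured how much the repeated registration actually costs.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;
using Windows.Storage;

static class SpeechInstaller
{
    // Hypothetical settings key, not from the original code
    const string InstalledKey = "vcdInstalled";

    public static async Task InstallSpeechOnceAsync()
    {
        var settings = ApplicationData.Current.LocalSettings;

        // Nothing to do if an earlier run already registered the commands
        if (settings.Values.ContainsKey(InstalledKey))
        {
            return;
        }

        var storageFile =
            await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///Voice.xml"));

        await VoiceCommandManager.InstallCommandSetsFromStorageFileAsync(storageFile);

        // Only set the flag once the install has succeeded
        settings.Values[InstalledKey] = true;
    }
}
```

One trade-off: if Voice.xml changes in a later version of the app, a plain boolean flag would stop the new commands from being registered, so storing something like the app version in place of `true` might be the safer variant.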
and that’s pretty much it.
Using the App
With just those little pieces of code added, the experience of using the app isn’t bad at all – I’m impressed by how little I have to change.
Firstly, if I ask Cortana what I can say then I see;
and there’s my app with its suggestion as to what can be said, which I can drill slightly further into (not much, because I only have one example);
Now, if I try out a voice query without the app already running then I see;
and that works quite nicely – if I go “back” from here then because this is the only screen the app has shown, it goes back to the home screen on the phone whereas the navigation would correctly go back to the previous screen within the app itself if it had already been running when the voice query was issued.
I’m going to experiment with this some more – the code I’ve got so far is very rough and ready but if it’s of any use to you then it’s here for download.