OCR Library for WinRT Apps–Recognising Business Cards…

I was interested to see this blog post from the official Windows App Building blog;

Microsoft OCR Library for Windows Runtime

and it made me want to try those bits out a little in the context of a Windows 8.1 or Windows Phone 8.1 application.

I’ve seen a few apps in my time which do the classic thing of trying to recognise text on business cards and so I thought I’d see if I could quickly knock something together on my phone to see what this library was like to code against.

Here’s a quick screen capture of where I got to and where this blog post gets you to if you read to the bottom;

In looking at the comments on the post on the Windows App Building blog I noticed this video from one of my colleagues, Jerry, which also talks about this new OCR library so I thought that I’d include that as a good reference too;

Now, that video pretty much has the sort of “hello world” that I was thinking of putting together but, what the heck, I thought I’d take a look at it anyway just to gain from the experience – I prefer to “do” rather than just to “read”.

First Attempt – Live Video Frames Processed in Realtime

My first thought was to attempt to make one of those “live video preview” style apps where you hold a phone up to something like a business card and it overlays the results of running OCR detection on that business card onto the video frames. My idea was that the app would have some regular expressions for pieces of text that are commonly found in (English) business cards and it would draw little coloured boxes around the email, phone number and so on that it spotted in the video frames.

To that end, I made myself a blank Windows Phone 8.1 project, searched “Microsoft OCR” on NuGet, added the package, changed the CPU configuration to x86 for the emulator and then set about exploring the API. It didn’t take long; this is a neat little library and there’s not much to explore. I added the classes that I looked at to the class diagram below;

[Figure: class diagram of the OCR library classes]

and so the object model is to create an OcrEngine, feed it a language, hand it the pixels of a bitmap (height, width and a byte array) via its RecognizeAsync method and get back an OcrResult, which is a set of OcrLine objects, each of which is a set of OcrWord objects.

Done.
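To make that hierarchy concrete, here’s a minimal sketch that walks a result line by line and joins each line’s words back into one string. It uses hypothetical stand-in types (SketchResult, SketchLine, SketchWord) rather than the library’s own classes so that it runs anywhere; they only mirror the Lines and Words properties used later in this post, and the Text property on a word is an assumption for illustration. Treating a whole line as one string is handy when matching things like phone numbers, which OCR may well split into several words.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins shaped like OcrResult -> OcrLine -> OcrWord.
// They are NOT the library's classes; they just let the sketch run anywhere.
class SketchWord { public string Text; }
class SketchLine { public List<SketchWord> Words = new List<SketchWord>(); }
class SketchResult { public List<SketchLine> Lines; } // may be null, so check it

static class OcrWalkSketch
{
    // One string per recognised line, with that line's words joined by spaces.
    public static List<string> LineTexts(SketchResult result)
    {
        if (result.Lines == null)
        {
            return new List<string>();
        }
        return result.Lines
            .Select(line => string.Join(" ", line.Words.Select(word => word.Text)))
            .ToList();
    }
}
```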

I guess the first step then is to get hold of some bitmaps and feed them into this OCR engine.

To do this, I was planning to use the regular MediaCapture API along with the CaptureElement, which can display a preview of images from that MediaCapture, but for the life of me I can’t get that to work on Windows Phone 8.1 right now. My understanding is that there’s a bug in this area; whatever I tried either didn’t work or crashed the phones I was debugging on, on both the Windows Phone 8.1 developer preview and the regular released Windows Phone 8.1.

So, for the moment, I backtracked on my plan to do realtime video overlay and set about trying something a little less ambitious.

Second Attempt – Business Card Photos Taken from flickR

For my second attempt, I thought that I’d switch from trying to grab realtime video frames from the camera and instead just use a flickR image search to get some images of business cards and then I can display those and overlay the results of OCR recognition on top of the images.

I broke that down into a number of stages…

Getting Some Code to Search flickR

I have a bunch of flickR searching code kicking around already as I’ve been using it for demos for ages now but I modified it somewhat and came up with this little class which knows how to represent a flickR URL ready to call the search API;

namespace App253.FlickrSearch
{
  using System.Collections.Generic;
  using System.Text;

  internal class FlickrSearchUrl
  {
    static string apiKey = "SORRY, YOU NEED A KEY";
    static string serviceUri = "https://api.flickr.com/services/rest/?method=";
    static string baseUri = serviceUri + "flickr.photos.search&";
    public int ContentType { get; set; }
    public int PerPage { get; set; }
    public int Page { get; set; }
    public List<string> SearchTags { get; private set; }

    private FlickrSearchUrl()
    {

    }
    public FlickrSearchUrl(
        string searchTag,
        int pageNo = 0,
        int perPage = 5,
        int contentType = 1
    ) : this()
    {
      this.SearchTags = new List<string>()
      {
        searchTag
      };
      this.Page = pageNo;
      this.PerPage = perPage;
      this.ContentType = contentType;
    }
    public override string ToString()
    {
      // flickR has pages numbered 1..N (I think :-)) whereas this class thinks
      // of 0..N-1. 
      StringBuilder tagBuilder = new StringBuilder();
      for (int i = 0; i < this.SearchTags.Count; i++)
      {
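        // escape each tag so that spaces and characters such as '&' can't
        // break the query string.
        tagBuilder.AppendFormat("{0}{1}",
          i != 0 ? "," : string.Empty,
          System.Uri.EscapeDataString(this.SearchTags[i]));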
      }
      return (
        string.Format(
          baseUri +
          "api_key={0}&" +
          "safe_search=1&" +
          "tags={1}&" +
          "page={2}&" +
          "per_page={3}&" +
          "content_type={4}",
          apiKey, tagBuilder.ToString(), this.Page + 1, this.PerPage, this.ContentType));
    }
  }
}
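As a standalone sketch of the same URL-building idea, here’s a hypothetical helper (BuildSearchUrl is not part of the post’s code) which escapes each tag with Uri.EscapeDataString and converts a 0-based page index into flickR’s 1-based page parameter. The API key passed in is just a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FlickrQuerySketch
{
    // Hypothetical helper: builds a flickr.photos.search request URL.
    // pageIndex is 0-based; flickR's 'page' parameter starts at 1.
    public static string BuildSearchUrl(
        string apiKey, IEnumerable<string> tags, int pageIndex, int perPage)
    {
        // Escape each tag individually; the commas between tags stay literal.
        string tagList = string.Join(",", tags.Select(t => Uri.EscapeDataString(t)));

        return "https://api.flickr.com/services/rest/?method=flickr.photos.search" +
            "&api_key=" + Uri.EscapeDataString(apiKey) +
            "&safe_search=1" +
            "&tags=" + tagList +
            "&page=" + (pageIndex + 1) +
            "&per_page=" + perPage;
    }
}
```

For example, a tag of “business card” ends up in the URL as business%20card, and page index 0 is sent as page=1.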

and then I built a little class that represents a single image brought back from such a search (again, I had this code knocking around);

namespace App253.FlickrSearch
{
  using System.Xml.Linq;

  public class FlickrPhotoResult
  {
    public FlickrPhotoResult(XElement photo)
    {
      Id = (long)photo.Attribute("id");
      Secret = photo.Attribute("secret").Value;
      Farm = (int)photo.Attribute("farm");
      Server = (int)photo.Attribute("server");
      Title = photo.Attribute("title").Value;
    }
    public long Id { get; private set; }
    public string Secret { get; private set; }
    public int Farm { get; private set; }
    public string Title { get; private set; }
    public int Server { get; private set; }

    public string ImageUrl
    {
      get
      {
        return (string.Format(
          "http://farm{0}.static.flickr.com/{1}/{2}_{3}_z.jpg",
          Farm, Server, Id, Secret));
      }
    }
  }
}

and a class that gathers up a set of these into a list and also can store the total number of pages of images that come back from a flickR search;

namespace App253.FlickrSearch
{
  using System.Collections.Generic;

  internal class FlickrSearchResult
  {
    public int Pages { get; set; }
    public List<FlickrPhotoResult> Photos { get; set; }
  }
}

and finally I put together a little class that takes the FlickrSearchUrl, executes an HTTP request to get the data and then deserializes the XML response into one of these FlickrSearchResult objects;

namespace App253.FlickrSearch
{
  using System.IO;
  using System.Linq;
  using System.Net.Http;
  using System.Threading.Tasks;
  using System.Xml.Linq;

  internal static class FlickrSearcher
  {
    public static async Task<FlickrSearchResult> SearchAsync(FlickrSearchUrl searchUrl)
    {
      HttpClient client = new HttpClient();
      FlickrSearchResult result = new FlickrSearchResult();

      using (HttpResponseMessage response = await client.GetAsync(searchUrl.ToString()))
      {
        // throw on failure rather than hand back a result with no Photos list;
        // the incremental loading code catches exceptions and moves on.
        response.EnsureSuccessStatusCode();

        using (Stream stream = await response.Content.ReadAsStreamAsync())
        {
          XElement xml = XElement.Load(stream);
          result.Photos =
              (
                  from p in xml.DescendantsAndSelf("photo")
                  select new FlickrPhotoResult(p)
              ).ToList();

          result.Pages =
              (int)xml.DescendantsAndSelf("photos").First().Attribute("pages");
        }
      }
      return (result);
    }
  }
}

With that in place, I have some code that I can use to execute a flickR search for a particular set of search tags and I can specify a page size and a page number and so I can use this to page through results for a search.
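To see what the parsing in FlickrSearcher and FlickrPhotoResult does without needing an API key, here’s a sketch that runs the same LINQ to XML steps over a response string. The sample XML and its values are invented for illustration; only the element and attribute names match the ones the post’s code reads, and FlickrXmlSketch is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class FlickrXmlSketch
{
    // Made-up sample in the shape of a flickr.photos.search REST response.
    public const string SampleResponse =
        "<rsp stat=\"ok\">" +
        "<photos page=\"1\" pages=\"42\" perpage=\"5\" total=\"210\">" +
        "<photo id=\"123\" owner=\"someone\" secret=\"abc\" server=\"456\" farm=\"7\" title=\"A card\" />" +
        "</photos></rsp>";

    // Returns one image URL per <photo> element and reports the total page count.
    public static List<string> ImageUrls(string responseXml, out int pages)
    {
        XElement xml = XElement.Parse(responseXml);
        pages = (int)xml.DescendantsAndSelf("photos").First().Attribute("pages");

        return xml.DescendantsAndSelf("photo")
            .Select(p => string.Format(
                "http://farm{0}.static.flickr.com/{1}/{2}_{3}_z.jpg",
                (int)p.Attribute("farm"),
                (int)p.Attribute("server"),
                (long)p.Attribute("id"),
                (string)p.Attribute("secret")))
            .ToList();
    }
}
```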

I want to display those photos…

Making a FlipView that does Incremental Loading

My intention was to have this flickR search be a sort of “endless” list of photos in a FlipView without the user having to be aware of the network traffic going on behind the scenes in order to support the paged data. I’ve done similar things before so I figured I’d implement ISupportIncrementalLoading over the top of my flickR searching code so as to build out an incrementally loading set of business card data.

In doing that (and in what follows) I wanted to take care not to have a collection which contained lots of actual image data. I just wanted to have a collection of URLs and a few other strings. I built out this class;

namespace App253
{
  using App253.FlickrSearch;
  using System;
  using System.Collections.ObjectModel;
  using System.Linq;
  using System.Threading.Tasks;
  using Windows.Foundation;
  using Windows.UI.Xaml.Data;

  internal class FlickrBusinessCardPhotoResultCollection :
    ObservableCollection<FlickrPhotoResult>, ISupportIncrementalLoading
  {
    public FlickrBusinessCardPhotoResultCollection()
    {
      this.searchUrl = new FlickrSearchUrl("business card", perPage: 5);
    }
    public bool HasMoreItems
    {
      get
      {
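        // initially, we don't know because we haven't asked. once we've asked
        // we'll have a better picture. searchUrl.Page is 0-based so it has to
        // be strictly less than the number of pages that flickR reported.
        return (!pages.HasValue || this.searchUrl.Page < pages.Value);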
      }
    }
    public IAsyncOperation<LoadMoreItemsResult> LoadMoreItemsAsync(uint count)
    {
      return (this.LoadMoreItemsInternalAsync(count).AsAsyncOperation());
    }
    async Task<LoadMoreItemsResult> LoadMoreItemsInternalAsync(uint count)
    {
      LoadMoreItemsResult result = new LoadMoreItemsResult();
      uint pagesRequested = (uint)(count / this.searchUrl.PerPage);

      if ((count % this.searchUrl.PerPage) != 0)
      {
        pagesRequested++;
      }

      for (
        int i = 0;
        (
          (i < pagesRequested) &&
          (!this.pages.HasValue || this.searchUrl.Page < this.pages.Value)
        );
        i++)
      {
        try
        {
          var results = await FlickrSearcher.SearchAsync(this.searchUrl);

          this.searchUrl.Page++;

          pages = results.Pages;

          foreach (var photo in results.Photos)
          {
            this.Add(photo);
          }
          // accumulate across all the pages loaded in this call.
          result.Count += (uint)results.Photos.Count();
        }
        catch
        {
          // don't really mind why this failed in this case.
        }        
      }
      return (result);
    }
    int? pages;
    FlickrSearchUrl searchUrl;
  }
}

and you’ll spot that what it’s trying to do is to use the underlying FlickrSearcher and FlickrSearchUrl pieces to page its way through data and grow out an ever-growing ObservableCollection for some control to bind to.
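The paging arithmetic in that class boils down to two small calculations, restated here as a hypothetical standalone sketch (PagingSketch, PagesNeeded and HasMoreItems are names made up for it): round the requested item count up to whole pages and, with a 0-based next-page index, only report more items while that index is strictly less than the page count that flickR reported.

```csharp
static class PagingSketch
{
    // Round a request for 'count' items up to a whole number of pages.
    public static uint PagesNeeded(uint count, int perPage)
    {
        return (uint)((count + perPage - 1) / perPage);
    }

    // nextPageIndex is 0-based; totalPages is flickR's page count (null = not asked yet).
    public static bool HasMoreItems(int nextPageIndex, int? totalPages)
    {
        return !totalPages.HasValue || nextPageIndex < totalPages.Value;
    }
}
```

So a request for 11 items at 5 per page needs 3 pages, and once page index 3 of a 3-page result is reached there’s nothing left to fetch.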

This all works “reasonably” well but then I hit a “minor snag” in that I’d intended to put a FlipView on top of this code and I had forgotten that the FlipView doesn’t support data collections that implement ISupportIncrementalLoading.

That was a bit of a blow but I figured that I would go ahead and try to build a little derivation of the FlipView that at least worked for my purposes in terms of supporting incremental loading;

namespace App253
{
  using System;
  using System.Collections;
  using System.Threading.Tasks;
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;
  using Windows.UI.Xaml.Data;

  /// <summary>
  ///  Trying to make a cheap/cheerful FlipView that does something with a data
  ///  source that implements ISupportIncrementalLoading. Not trying to deal
  ///  with all cases like e.g. where someone explicitly sets the items source
  ///  and so on - just trying to deal with a regular, data-bound flipview.
  /// </summary>
  class IncrementalFlipView : FlipView
  {
    public IncrementalFlipView()
    {
      this.DataContextChanged += OnDataContextChanged;
      this.SelectionChanged += OnSelectionChanged;
    }
    async void OnSelectionChanged(object sender, SelectionChangedEventArgs e)
    {
      // are we getting near to the end of the items that we have.
      IList list = this.ItemsSource as IList;

      if (list != null)
      {
        if (list.Count - this.SelectedIndex <= LOAD_THRESHOLD)
        {
          // see if we've got anything more to load.
          await this.LoadNextDataItemsAsync();
        }
      }
    }
    async void OnDataContextChanged(FrameworkElement sender,
      DataContextChangedEventArgs args)
    {
      if (this.ItemsSource != this.previousItemsSource)
      {
        this.previousItemsSource = this.ItemsSource;

        this.loader = this.ItemsSource as ISupportIncrementalLoading;

        if (this.loader != null)
        {
          await this.LoadNextDataItemsAsync();
        }
      }
    }
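    async Task LoadNextDataItemsAsync()
    {
      // SelectionChanged can fire before OnDataContextChanged has found a
      // loader, so check for one before using it.
      if ((this.loader != null) && this.loader.HasMoreItems)
      {
        await this.loader.LoadMoreItemsAsync((uint)LOAD_ITEMS);
      }
    }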
    static readonly int LOAD_THRESHOLD = 3;
    static readonly int LOAD_ITEMS = 10;
    object previousItemsSource;
    ISupportIncrementalLoading loader;
  }
}

What this is trying to do is to monitor whenever the DataContext property changes (i.e. it’s only going to work if things are done by binding) and whenever that property changes we take a look at the ItemsSource property to see if it implements ISupportIncrementalLoading and, if so, we attempt to use it such that we initially load the first set of data items and then whenever the control moves to within 3 items of the end of the current data-set it sees if it can load 10 more items of data.
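The “within 3 items of the end” test is small enough to check in isolation. This hypothetical sketch restates it as a pure function and adds a simple guard, which the post’s control doesn’t have, so that a second SelectionChanged event arriving while a load is still awaiting doesn’t start an overlapping load. A plain bool is enough for the guard on the assumption that the handlers and their await continuations run on the UI thread, which is the default in a XAML app.

```csharp
static class FlipViewLoadSketch
{
    // Same test as OnSelectionChanged: are we within 'threshold' items of the end?
    // SelectedIndex is -1 when nothing is selected, which also triggers a load.
    public static bool ShouldLoadMore(int itemCount, int selectedIndex, int threshold)
    {
        return itemCount - selectedIndex <= threshold;
    }
}

// Hypothetical guard: only let one incremental load run at a time.
class LoadGate
{
    bool busy;

    public bool TryEnter()
    {
        if (busy)
        {
            return false;
        }
        busy = true;
        return true;
    }

    public void Exit()
    {
        busy = false;
    }
}
```

In IncrementalFlipView the idea would be to call TryEnter before LoadNextDataItemsAsync and Exit in a finally block.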

With that in place, I knocked up a basic ViewModel;

namespace App253
{
  using App253.FlickrSearch;
  using System;
  using System.Threading.Tasks;
  using System.Windows.Input;
  using Windows.Foundation;
  using Windows.Graphics.Imaging;
  using Windows.Storage.Streams;
  using Windows.UI;
  using Windows.UI.Xaml.Controls;
  using Windows.UI.Xaml.Media;
  using Windows.UI.Xaml.Shapes;
  using Windows.Web.Http;
  using WindowsPreview.Media.Ocr;

  class ViewModel : ViewModelBase
  {
    public ViewModel()
    {
      this.Items = new FlickrBusinessCardPhotoResultCollection();
      this.recogniseCommand = new SimpleCommand(this.OnRecognise);
      this.IsIdle = true;
    }
    public FlickrBusinessCardPhotoResultCollection Items
    {
      get;
      private set; // only set in the constructor.
    }
    public FlickrPhotoResult SelectedPhoto
    {
      get
      {
        return (this.selectedPhoto);
      }
      set
      {
        this.selectedPhoto = value;
      }
    }
    public string Name
    {
      get
      {
        return (this.name);
      }
      set
      {
        base.SetProperty(ref this.name, value);
      }
    }
    public string Email
    {
      get
      {
        return (this.email);
      }
      set
      {
        base.SetProperty(ref this.email, value);
      }
    }
    public string Phone
    {
      get
      {
        return (this.phone);
      }
      set
      {
        base.SetProperty(ref this.phone, value);
      }
    }
    public ICommand RecogniseCommand
    {
      get
      {
        return (this.recogniseCommand);
      }
    }
    async void OnRecognise()
    {
      this.IsIdle = false;

      try
      {
        if (this.selectedPhoto != null)
        {
        }
      }
      finally
      {
        this.IsIdle = true;
      }
    }
    public bool IsIdle
    {
      get
      {
        return (this.idle);
      }
      private set
      {
        this.recogniseCommand.SetEnabled(value);
        base.SetProperty(ref this.idle, value);
      }
    }
    bool idle;
    SimpleCommand recogniseCommand;
    FlickrPhotoResult selectedPhoto;
    string phone;
    string email;
    string name;
  }
}
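The ViewModel leans on two helpers, ViewModelBase (with its SetProperty method) and SimpleCommand (with its SetEnabled method), which aren’t shown here. What follows is a minimal hypothetical version of each, written only to match the way they’re called above; the real classes may well differ. DemoViewModel is a made-up example of the SetProperty pattern.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.CompilerServices;
using System.Windows.Input;

// Hypothetical base class: raises PropertyChanged only when a value really changes.
abstract class ViewModelBase : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    protected bool SetProperty<T>(ref T storage, T value,
        [CallerMemberName] string propertyName = null)
    {
        if (EqualityComparer<T>.Default.Equals(storage, value))
        {
            return false;
        }
        storage = value;
        var handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
        return true;
    }
}

// Hypothetical command: wraps an Action and can be switched on and off.
class SimpleCommand : ICommand
{
    readonly Action action;
    bool enabled = true;

    public SimpleCommand(Action action) { this.action = action; }

    public event EventHandler CanExecuteChanged;

    public void SetEnabled(bool value)
    {
        if (enabled == value) return;
        enabled = value;
        var handler = CanExecuteChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }

    public bool CanExecute(object parameter) { return enabled; }

    public void Execute(object parameter) { action(); }
}

// Made-up example of using SetProperty from a property setter.
class DemoViewModel : ViewModelBase
{
    string name;
    public string Name
    {
        get { return name; }
        set { SetProperty(ref name, value); }
    }
}
```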

and then I bound up a little UI to the Items, Phone, Email, Name, RecogniseCommand members here;

<Page xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
      xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
      xmlns:local="using:App253"
      xmlns:img="using:Windows.UI.Xaml.Media.Imaging"
      xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
      xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
      xmlns:Converters="using:App253.Converters"
      x:Class="App253.MainPage"
      mc:Ignorable="d"
      Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
  <Page.BottomAppBar>
    <CommandBar IsOpen="True">
      <AppBarButton Icon="ZoomIn"
                    Label="recognise"
                    Command="{Binding RecogniseCommand}" />
    </CommandBar>
  </Page.BottomAppBar>
  <Grid x:Name="LayoutRoot">

    <Grid.RowDefinitions>
      <RowDefinition Height="Auto" />
      <RowDefinition Height="*" />
    </Grid.RowDefinitions>

    <StackPanel Grid.Row="0"
                Margin="19,0,0,0">
      <TextBlock Text="OCR TEST APP"
                 Style="{ThemeResource TitleTextBlockStyle}"
                 Margin="0,12,0,0" />
      <TextBlock Text="cards"
                 Margin="0,-6.5,0,26.5"
                 Style="{ThemeResource HeaderTextBlockStyle}"
                 CharacterSpacing="{ThemeResource PivotHeaderItemCharacterSpacing}" />
    </StackPanel>
    <Grid Grid.Row="1"
          x:Name="ContentRoot"
          Margin="19,9.5,19,0">
      <Grid.RowDefinitions>
        <RowDefinition />
        <RowDefinition Height="Auto" />
      </Grid.RowDefinitions>
      <local:IncrementalFlipView ItemsSource="{Binding Items}"
                                 SelectedValue="{Binding SelectedPhoto, Mode=TwoWay}"
                                 IsEnabled="{Binding IsIdle}"
                                 x:Name="flipView">
        <local:IncrementalFlipView.ItemTemplate>
          <DataTemplate>
            <!-- What I want here is for the Image to size itself and for the Canvas
                 to be the same size as the Image no matter what the image does
            -->
            <Grid VerticalAlignment="Center"
                  HorizontalAlignment="Center"
                  Width="Auto"
                  Height="Auto">
              <Image Source="{Binding ImageUrl}"
                     Stretch="Uniform">
              </Image>
              <Canvas HorizontalAlignment="Stretch"
                      VerticalAlignment="Stretch"
                      RenderTransformOrigin="0.5,0.5">
                <Canvas.RenderTransform>
                  <RotateTransform Angle="0" />
                </Canvas.RenderTransform>
              </Canvas>
              <TextBlock Text="{Binding Title}" />
            </Grid>
          </DataTemplate>
        </local:IncrementalFlipView.ItemTemplate>
      </local:IncrementalFlipView>
      <StackPanel Orientation="Vertical"
                  Grid.Row="1"
                  VerticalAlignment="Top"
                  Margin="0,0,0,40">
        <TextBlock Text="name"
                   Style="{StaticResource TitleTextBlockStyle}" />
        <TextBlock Text="{Binding Name, FallbackValue=unset, TargetNullValue=unset}"
                   Style="{StaticResource ListViewItemContentTextBlockStyle}"
                   Margin="10,0,0,0" />
        <TextBlock Text="email"
                   Style="{StaticResource TitleTextBlockStyle}" />
        <TextBlock Text="{Binding Email, FallbackValue=unset, TargetNullValue=unset}"
                   Style="{StaticResource ListViewItemContentTextBlockStyle}"
                   Margin="10,0,0,0" />
        <TextBlock Text="phone"
                   Style="{StaticResource TitleTextBlockStyle}" />
        <TextBlock Text="{Binding Phone, FallbackValue=unset, TargetNullValue=unset}"
                   Style="{StaticResource ListViewItemContentTextBlockStyle}"
                   Margin="10,0,0,0" />
      </StackPanel>
    </Grid>
  </Grid>
</Page>

and so what I’ve essentially got here is the IncrementalFlipView control displaying Items, with an Image (ImageUrl bound to Source), a TextBlock (Text bound to Title) and a Canvas for each item. I’ve then got three simple text blocks bound to the Name, Email and Phone properties, which represent any OCR-recognised name, email address and phone number, and an app bar button bound to RecogniseCommand, an ICommand which is intended to kick off OCR recognition on the current image.

I’ll be returning to that Canvas in a moment but the essence of it is to provide a drawing surface on top of the image so that I can ultimately highlight the recognised text by drawing rectangles onto that Canvas. It’s going to be important to me that the Canvas stays the same size as the underlying Image and so I’ve tried to make sure that it does here.

I spent a little time trying to decide whether to put the Canvas into the item template or whether to keep it out of the IncrementalFlipView altogether and, for the moment, I’ve chosen to include it in the item template here.

I wrote a tiny bit of code-behind to wire up the view model;

namespace App253
{
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += OnLoaded;
    }
    void OnLoaded(object sender, RoutedEventArgs e)
    {
      this.viewModel = new ViewModel();
      this.DataContext = this.viewModel;
    }
    ViewModel viewModel;
  }
}

and that gives me a little FlipView experience that lets me flip my way through photos of “business cards” from flickR;

[Screenshots: flipping through flickR business card photos in the app]

Adding some OCR

So far, so good, but this post was supposed to be about using the OCR library, so I added a reference to the “Microsoft OCR” library (Microsoft.Windows.Ocr if you want the quick way of finding it rather than fumbling through the search function of the NuGet dialog in Visual Studio).

I then started to write the code behind the OnRecognise method above. To be honest, that code was pretty easy to write. The only real challenge was getting hold of the pixels of the image. That’s a challenge I’ve had many times in XAML-based applications and it always feels a bit ironic: the image is already on the screen in front of me, but grabbing its bytes in a decoded form isn’t necessarily “so easy”.

Rather than get involved in WriteableBitmaps and so on, I went down what seemed like an easier route and simply downloaded the image again. I added a little private class to my ViewModel class to represent the details of that image;

    class CurrentImageInfo
    {
      public uint Width { get; set; }
      public uint Height { get; set; }
      public byte[] Pixels { get; set; }
    }

and a few member variables;

    CurrentImageInfo currentImageInfo;
    HttpClient httpClient;
    OcrEngine ocrEngine;

that I can initialise in my modified constructor – you’ll notice that I’m explicitly requesting English language processing here from the OCR library;

    public ViewModel()
    {
      this.Items = new FlickrBusinessCardPhotoResultCollection();
      this.recogniseCommand = new SimpleCommand(this.OnRecognise);
      this.IsIdle = true;
      this.httpClient = new HttpClient();
      this.ocrEngine = new OcrEngine(OcrLanguage.English);
    }

and then I can start to implement that OnRecognise function in my ViewModel which performs the OCR recognition;

    async void OnRecognise()
    {
      this.IsIdle = false;

      try
      {
        if (this.selectedPhoto != null)
        {
          // I've deliberately avoided downloading any image bits until this point.
          // We (probably) have the image on the screen. However, that's hidden
          // inside an Image control which I'm letting do the URL->Image work
          // for me (as well as any caching it decides to do).
          // But, now, I actually need the bytes of the image itself and I can't
          // just grab them out of the image control so we go back to the web.
          try
          {
            await this.DownloadImageBitsAsync();
            OcrResult ocrResult = await this.RunOcrAsync();

            if (ocrResult != null)
            {
            }
          }
          catch
          {
            // TBD...
          }
        }
      }
      finally
      {
        this.IsIdle = true;
      }
    }

with the function which downloads the image bits looking like;

    async Task DownloadImageBitsAsync()
    {
      this.currentImageInfo = null;

      // TODO: do I really have to do all this to get the pixels, width, height or
      // can I shortcut it somehow?
      using (var inputStream =
        await this.httpClient.GetInputStreamAsync(new Uri(this.SelectedPhoto.ImageUrl)))
      {
        using (InMemoryRandomAccessStream memoryStream = new InMemoryRandomAccessStream())
        {
          await RandomAccessStream.CopyAsync(inputStream, memoryStream);
          memoryStream.Seek(0);
          BitmapDecoder decoder = await BitmapDecoder.CreateAsync(memoryStream);
          PixelDataProvider provider = await decoder.GetPixelDataAsync();
          this.currentImageInfo = new CurrentImageInfo()
          {
            Width = decoder.PixelWidth,
            Height = decoder.PixelHeight,
            Pixels = provider.DetachPixelData()
          };
        }
      }
    }

and the function which runs the OCR detection being;

    async Task<OcrResult> RunOcrAsync()
    {
      var results = await this.ocrEngine.RecognizeAsync(
        this.currentImageInfo.Height,
        this.currentImageInfo.Width,
        this.currentImageInfo.Pixels);

      return (results);
    }

which you’d spot as being “pretty simple”: it essentially just passes the details of the newly downloaded image (width, height and pixels) to the OCR engine and asks it to do recognition.
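Because RecognizeAsync takes the width, height and pixel array as separate arguments, a cheap sanity check can catch a mismatch between them before the call. This sketch assumes the decoder hands back 4 bytes per pixel with rows packed tightly, top-left first (as with a 32-bit format such as Bgra8 and no row padding); PixelBufferSketch and its methods are hypothetical and not part of the OCR library.

```csharp
static class PixelBufferSketch
{
    // Assumption: 4 bytes per pixel (e.g. Bgra8), rows packed with no padding,
    // stored row-major starting from the top-left corner.
    const int BytesPerPixel = 4;

    public static bool IsExpectedLength(uint width, uint height, byte[] pixels)
    {
        return pixels != null &&
            pixels.LongLength == (long)width * height * BytesPerPixel;
    }

    // Byte offset of the first channel of pixel (x, y).
    public static long OffsetOf(uint x, uint y, uint width)
    {
        return ((long)y * width + x) * BytesPerPixel;
    }
}
```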

With that, I’ve got OCR recognition results. Easy!

Matching Results and Getting a Canvas to Draw to

To help with displaying the results, I wrote a little class that tries to identify phone numbers, email addresses and simple names once OCR has recognised them. It looks as per below and is just using regular expressions;

namespace App253
{
  using System.Collections.Generic;
  using System.Text.RegularExpressions;

  enum RecognitionType
  {
    Other,
    Email,
    Phone,
    Name
  }
  static class CardTextRecogniser
  {
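    public static RecognitionType Recognise(string businessCardText)
    {
      RecognitionType type = RecognitionType.Other;

      foreach (var expression in expressions)
      {
        // IgnoreCase because the email expression only lists lower-case
        // letters and business cards often use capitals.
        if (Regex.IsMatch(businessCardText, expression.Value,
          RegexOptions.IgnoreCase))
        {
          type = expression.Key;
          break;
        }
      }
      return (type);
    }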
    static Dictionary<RecognitionType, string> expressions = 
      new Dictionary<RecognitionType,string>()
      {
        // regex taken from MSDN: http://msdn.microsoft.com/en-us/library/01escwtf(v=vs.110).aspx
        {
          RecognitionType.Email,
            @"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
            @"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$"
        },
        // regex taken from regex lib: http://regexlib.com/REDetails.aspx?regexp_id=296 
        {
          RecognitionType.Phone,
          @"^(\+[1-9][0-9]*(\([0-9]*\)|-[0-9]*-))?[0]?[1-9][0-9\- ]*$"
        },
        // regex taken from regex lib: http://regexlib.com/REDetails.aspx?regexp_id=247
        {
          RecognitionType.Name,
          @"^([ \u00c0-\u01ffa-zA-Z'])+$"
        },
      };
  }
}
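One thing to watch in CardTextRecogniser is that the enumeration order of a Dictionary isn’t documented as guaranteed, and the Name expression matches almost any line of letters, so it matters that it’s tried last. Here’s a hypothetical variant (OrderedCardMatcher and CardField are made-up names) that keeps the same three expressions but stores them in an array so the order is explicit: Email, then Phone, then Name.

```csharp
using System;
using System.Text.RegularExpressions;

enum CardField { Other, Email, Phone, Name }

static class OrderedCardMatcher
{
    // Checked in array order: Email, then Phone, then Name (the broadest pattern).
    // IgnoreCase matters for the email pattern, which only lists lower-case a-z.
    static readonly Tuple<CardField, Regex>[] Patterns =
    {
        Tuple.Create(CardField.Email, new Regex(
            @"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
            @"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
            RegexOptions.IgnoreCase)),
        Tuple.Create(CardField.Phone, new Regex(
            @"^(\+[1-9][0-9]*(\([0-9]*\)|-[0-9]*-))?[0]?[1-9][0-9\- ]*$")),
        Tuple.Create(CardField.Name, new Regex(
            @"^([ \u00c0-\u01ffa-zA-Z'])+$")),
    };

    public static CardField Classify(string lineText)
    {
        string text = lineText.Trim();
        foreach (var pattern in Patterns)
        {
            if (pattern.Item2.IsMatch(text))
            {
                return pattern.Item1;
            }
        }
        return CardField.Other;
    }
}
```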

There’s nothing at all clever going on in CardTextRecogniser. With that in place, I set about making sure that my code could draw onto the Canvas which sits in the ItemTemplate of my IncrementalFlipView, on top of the underlying Image.

In doing that, I need to be conscious of the fact that the image that’s displayed on the screen may have sized itself differently from the image that I’ve explicitly downloaded from the web and have passed into the OCR recognition engine.

Because of the way in which I’ve authored my ItemTemplate, the Canvas that I’m going to draw on will be the same size as the underlying Image but I need to map co-ordinates between the OCR image and the Canvas if I’m going to draw accurately onto that Canvas in order to represent the places where the OCR engine is telling me that it has found text in the image.

With that in mind, I wrote a little co-ordinate mapping class;

namespace App253
{
  using Windows.Foundation;

  class CoordinateMapper
  {
    public CoordinateMapper(Rect sourceSpace,
      Rect destSpace)
    {
      this.sourceSpace = sourceSpace;
      this.destSpace = destSpace;
    }
    public Point MapPoint(Point source)
    {
      double x = (source.X - sourceSpace.Left) / sourceSpace.Width;
      x = destSpace.Left + (x * (destSpace.Width));

      double y = (source.Y - sourceSpace.Top) / (sourceSpace.Height);
      y = destSpace.Top + (y * (destSpace.Height));

      return(new Point() { X = x, Y = y});
    }
    public double MapWidth(double width)
    {
      return(width / sourceSpace.Width * destSpace.Width);
    }
    public double MapHeight(double height)
    {
      return (height / sourceSpace.Height * destSpace.Height);
    }
    Rect sourceSpace;
    Rect destSpace;
  }
}
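As a quick worked check of the mapping: a word found at pixel (100, 50) in a 640x480 downloaded image should land at (50, 25) on a 320x240 Canvas, because both axes scale by one half. Here’s a standalone sketch of the same arithmetic, using a hypothetical RectD struct in place of Windows.Foundation.Rect so that it runs outside WinRT; note that the y co-ordinate is offset by the destination’s Top, not its Left.

```csharp
using System;

// Minimal stand-in for Windows.Foundation.Rect so this runs outside WinRT.
struct RectD
{
    public readonly double Left, Top, Width, Height;

    public RectD(double left, double top, double width, double height)
    {
        Left = left; Top = top; Width = width; Height = height;
    }
}

static class MapSketch
{
    // Normalise the point within the source rectangle (0..1 on each axis),
    // then scale and offset it into the destination rectangle.
    public static (double X, double Y) MapPoint(RectD src, RectD dst, double x, double y)
    {
        double nx = (x - src.Left) / src.Width;
        double ny = (y - src.Top) / src.Height;
        return (dst.Left + nx * dst.Width, dst.Top + ny * dst.Height);
    }
}
```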

and then hit the common problem of having written a ViewModel which is all data-bound but suddenly needs access to something in the View (the Canvas). Rather than go to town building abstractions or fancy binding converters, I simply passed the ItemContainerGenerator from the FlipView into the constructor of the ViewModel from my code-behind;

namespace App253
{
  using Windows.UI.Xaml;
  using Windows.UI.Xaml.Controls;

  public sealed partial class MainPage : Page
  {
    public MainPage()
    {
      this.InitializeComponent();
      this.Loaded += OnLoaded;
    }
    void OnLoaded(object sender, RoutedEventArgs e)
    {
      // modified...
      this.viewModel = new ViewModel(this.flipView.ItemContainerGenerator);
      this.DataContext = this.viewModel;
    }
    ViewModel viewModel;
  }
}

and then added a few more member variables to the ViewModel;

    static SolidColorBrush redBrush = new SolidColorBrush(Colors.Red);
    Canvas drawCanvas;
    ItemContainerGenerator itemContainerGenerator;

and modified the constructor to take the ItemContainerGenerator and store it;

    public ViewModel(ItemContainerGenerator generator)
    {
      // modified
      this.itemContainerGenerator = generator;
      this.Items = new FlickrBusinessCardPhotoResultCollection();
      this.recogniseCommand = new SimpleCommand(this.OnRecognise);
      this.IsIdle = true;
      this.httpClient = new HttpClient();
      this.ocrEngine = new OcrEngine(OcrLanguage.English);
    }

and then whenever the current photo changes in the IncrementalFlipView I attempt to grab hold of the Canvas from within the ItemTemplate;

    public FlickrPhotoResult SelectedPhoto
    {
      get
      {
        return (this.selectedPhoto);
      }
      set
      {
        this.selectedPhoto = value;

        // modified
        this.InitialiseDrawCanvas();
      }
    }

with that method looking like;

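    void InitialiseDrawCanvas()
    {
      this.drawCanvas = null;

      if (this.SelectedPhoto != null)
      {
        // The container may not have been generated yet (e.g. while the
        // FlipView is still being populated) so don't assume it exists.
        FlipViewItem fvi =
          this.itemContainerGenerator.ContainerFromItem(this.SelectedPhoto) as FlipViewItem;

        if (fvi != null)
        {
          this.drawCanvas = fvi.GetDescendantByType(typeof(Canvas)) as Canvas;
        }
        if (this.drawCanvas != null)
        {
          this.drawCanvas.Children.Clear();
        }
        this.Phone = this.Email = this.Name = string.Empty;
      }
    }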

and now I’ve got a Canvas to draw things to.
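InitialiseDrawCanvas relies on a GetDescendantByType extension method which isn’t shown here. Below is a hypothetical equivalent written as a plain depth-first search, with the child lookup passed in as a delegate so that the sketch runs outside XAML; in the app that delegate would wrap VisualTreeHelper.GetChildrenCount and VisualTreeHelper.GetChild. Node and FakeCanvas are made-up types for the demo.

```csharp
using System;
using System.Collections.Generic;

static class TreeSearchSketch
{
    // Depth-first search for the first descendant (not the root itself) of type T.
    public static T FindDescendant<T>(object root, Func<object, IEnumerable<object>> getChildren)
        where T : class
    {
        foreach (object child in getChildren(root))
        {
            T match = child as T ?? FindDescendant<T>(child, getChildren);
            if (match != null)
            {
                return match;
            }
        }
        return null;
    }
}

// Made-up tree types for the demo.
class FakeCanvas { }

class Node
{
    public readonly List<object> Children = new List<object>();

    public static IEnumerable<object> ChildrenOf(object o)
    {
        Node node = o as Node;
        return node != null ? node.Children : (IEnumerable<object>)new object[0];
    }
}
```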

Displaying OCR Results

Finally, I modified the code in my OnRecognise function such that once it has done the OCR recognition it does two things as per below;

    async void OnRecognise()
    {
      this.IsIdle = false;

      try
      {
        if (this.selectedPhoto != null)
        {
          // I've deliberately avoided downloading any image bits until this point.
          // We (probably) have the image on the screen. However, that's hidden
          // inside an Image control which I'm letting do the URL->Image work
          // for me (as well as any caching it decides to do).
          // But, now, I actually need the bytes of the image itself and I can't
          // just grab them out of the image control so we go back to the web.
          try
          {
            await this.DownloadImageBitsAsync();
            OcrResult ocrResult = await this.RunOcrAsync();

            if (ocrResult != null)
            {
              // Modified
              this.DrawOcrResults(ocrResult);
              this.ApplyPatternMatching(ocrResult);
            }
          }
          catch
          {
            // TBD...
          }
        }
      }
      finally
      {
        this.IsIdle = true;
      }
    }

with DrawOcrResults (and a couple of helper methods) bringing in this code;

    void DrawOcrResults(OcrResult ocrResult)
    {
      this.RepeatForOcrWords(ocrResult,
        (result, word) =>
        {
          Rectangle rectangle = MakeOcrDrawRectangle(ocrResult, word);
          this.drawCanvas.Children.Add(rectangle);
        }
      );
    }
    void RepeatForOcrWords(OcrResult ocrResult,
      Action<OcrResult, OcrWord> repeater)
    {
      if (ocrResult.Lines != null)
      {
        foreach (var line in ocrResult.Lines)
        {
          foreach (var word in line.Words)
          {
            repeater(ocrResult, word);
          }
        }
      }
    }

    Rectangle MakeOcrDrawRectangle(OcrResult ocrResult, OcrWord word)
    {
      // Avoided using CompositeTransform here as I could never quite get my
      // combination of Scale/Rotate/Translate to work right for a given
      // RenderTransformOrigin. Probably my fault but it was easier to
      // just do it myself.
      CoordinateMapper mapper = new CoordinateMapper(
        new Rect(0, 0, this.currentImageInfo.Width, this.currentImageInfo.Height),
        new Rect(0, 0, this.drawCanvas.ActualWidth, this.drawCanvas.ActualHeight));

      Rectangle r = new Rectangle()
      {
        Width = mapper.MapWidth(word.Width),
        Height = mapper.MapHeight(word.Height),
        RenderTransformOrigin = new Point(0.5, 0.5)
      };
      r.Stroke = redBrush;
      r.StrokeThickness = 1;
      Point mappedPoint = mapper.MapPoint(new Point(word.Left, word.Top));
      Canvas.SetLeft(r, mappedPoint.X);
      Canvas.SetTop(r, mappedPoint.Y);

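      // Apply the negated text angle reported by the engine to the whole canvas.
      // TextAngle is nullable, so treat null as "no rotation".
      RotateTransform rotate = this.drawCanvas.RenderTransform as RotateTransform;
      if (rotate != null)
      {
        rotate.Angle = -(ocrResult.TextAngle ?? 0.0d);
      }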
      return r;
    }

and perhaps the only slightly unusual bit here is the code at the end of MakeOcrDrawRectangle, which deals with the situation where the OCR engine reports that the text it has found is on an angle. Rather than rotating each rectangle, it applies that rotation to the Canvas being drawn to, so that the rectangles land around the “correct” part of the recognised text in the underlying image.
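CoordinateMapper isn't shown in this section, so here's a minimal sketch of the arithmetic I'd assume it performs: a linear scale-and-offset from the image's pixel coordinates onto the Canvas. Alongside it is the rotation a RotateTransform applies about the centre of an element, assuming the Canvas's RenderTransformOrigin is (0.5, 0.5). PointD, RectD, CoordinateMapperSketch and RotationSketch are hypothetical names; WinRT's own Rect and Point are swapped for small structs so the code runs outside an app.

```csharp
using System;

// Hypothetical stand-ins for Windows.Foundation.Point/Rect.
public readonly struct PointD
{
    public PointD(double x, double y) { X = x; Y = y; }
    public double X { get; }
    public double Y { get; }
}

public readonly struct RectD
{
    public RectD(double x, double y, double width, double height)
    {
        X = x; Y = y; Width = width; Height = height;
    }
    public double X { get; }
    public double Y { get; }
    public double Width { get; }
    public double Height { get; }
}

// Hypothetical: a guess at CoordinateMapper as a linear map from source to target.
// X and Y are scaled independently.
public class CoordinateMapperSketch
{
    private readonly RectD source;
    private readonly RectD target;

    public CoordinateMapperSketch(RectD source, RectD target)
    {
        this.source = source;
        this.target = target;
    }

    double ScaleX => this.target.Width / this.source.Width;
    double ScaleY => this.target.Height / this.source.Height;

    public double MapWidth(double width) => width * this.ScaleX;
    public double MapHeight(double height) => height * this.ScaleY;

    public PointD MapPoint(PointD p) => new PointD(
        this.target.X + (p.X - this.source.X) * this.ScaleX,
        this.target.Y + (p.Y - this.source.Y) * this.ScaleY);
}

public static class RotationSketch
{
    // Same value as the post's "0.0d - ocrResult.TextAngle ?? 0.0d", because ??
    // binds more loosely than binary minus; a null angle means no rotation.
    public static double CanvasAngle(double? textAngle) => -(textAngle ?? 0.0);

    // Rotate a point by angleDegrees about the element's centre, as a
    // RotateTransform does (positive = clockwise on screen, since Y points down).
    public static PointD RotateAboutCentre(PointD p, RectD element, double angleDegrees)
    {
        double cx = element.X + element.Width / 2;
        double cy = element.Y + element.Height / 2;
        double radians = angleDegrees * Math.PI / 180.0;
        double dx = p.X - cx;
        double dy = p.Y - cy;
        return new PointD(
            cx + dx * Math.Cos(radians) - dy * Math.Sin(radians),
            cy + dx * Math.Sin(radians) + dy * Math.Cos(radians));
    }
}
```

Because this mapping stretches the whole image rect onto the whole canvas rect, the boxes would only line up if the Canvas has the same aspect ratio as the image as displayed.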

The other piece of code that I added here attempts to spot email addresses, phone numbers and names in the recognised text in a fairly simplistic way. That code looks like;

    void ApplyPatternMatching(OcrResult ocrResult)
    {
      this.RepeatForOcrWords(ocrResult,
        (result, word) =>
        {
          switch (CardTextRecogniser.Recognise(word.Text))
          {
            case RecognitionType.Other:
              break;
            case RecognitionType.Email:
              this.Email = word.Text;
              break;
            case RecognitionType.Phone:
              this.Phone = word.Text;
              break;
            case RecognitionType.Name:
              this.Name = word.Text;
              break;
            default:
              break;
          }
        }
      );
    }

and is just some regular expression matching combined with some data-bound text blocks. Naturally, if this code encounters multiple strings that look like names/emails/phone numbers in the recognised text then it’s going to be a case of “last detected text wins” as the code above loops through every result looking for matches.
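CardTextRecogniser itself isn't listed here. As a hedged sketch of the kind of regular expression matching it's described as doing, here's a version with one deliberately loose pattern per RecognitionType. The patterns, the seven-digit rule for phone numbers and the Scan helper are my assumptions rather than the post's code; Scan just reproduces the “last detected text wins” behaviour of ApplyPatternMatching over a plain array of words.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical: mirrors the RecognitionType values used in ApplyPatternMatching.
public enum RecognitionType { Other, Email, Phone, Name }

// Hypothetical sketch of a CardTextRecogniser; the patterns are illustrative only.
public static class CardTextRecogniserSketch
{
    // something@something.tld with no whitespace and a single '@'.
    static readonly Regex EmailPattern = new Regex(@"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$");

    // Optional leading '+', then only digits and common separators.
    static readonly Regex PhonePattern = new Regex(@"^\+?[\d\-\.\(\)]{7,}$");

    // A single capitalised word, e.g. "Jo".
    static readonly Regex NamePattern = new Regex(@"^[A-Z][a-z]+$");

    public static RecognitionType Recognise(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return RecognitionType.Other;
        }
        if (EmailPattern.IsMatch(text))
        {
            return RecognitionType.Email;
        }
        // Also require at least seven actual digits so "1-2-3-4" doesn't count.
        if (PhonePattern.IsMatch(text) && text.Count(char.IsDigit) >= 7)
        {
            return RecognitionType.Phone;
        }
        if (NamePattern.IsMatch(text))
        {
            return RecognitionType.Name;
        }
        return RecognitionType.Other;
    }

    // Same "last detected text wins" behaviour as ApplyPatternMatching.
    public static (string Email, string Phone, string Name) Scan(string[] words)
    {
        string email = string.Empty, phone = string.Empty, name = string.Empty;
        foreach (string word in words)
        {
            switch (Recognise(word))
            {
                case RecognitionType.Email: email = word; break;
                case RecognitionType.Phone: phone = word; break;
                case RecognitionType.Name: name = word; break;
            }
        }
        return (email, phone, name);
    }
}
```

Matching one word at a time keeps things simple, but a phone number printed with spaces would probably arrive as several words, and a single-word name pattern will also match any other capitalised word on the card, so the results are only ever a best guess.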

Wrapping Up – We Have OCR!

There’s quite a bit of code in this post but only a tiny amount of it has anything to do with using the OCR library, which is a testament to the simple, clean design of the API.

With what I’ve built up above, I can now run the app, look at a few items from Flickr that look like business cards, press the “recognise” app bar button and see how the OCR engine does, as shown in the video at the top of the post.

I’d like to revisit the case where I try and combine the OCR library with more “realtime” capture from the phone’s camera but, for the moment, I’ll share the code that I’ve got here in case anyone wants to have a play with it.