This post follows on from this previous post in many ways, but it also makes reference to this post about Optical Character Recognition in Windows Phone 8.1.
I was pleased to find the other day that the (entirely local, no cloud needed) optical character recognition APIs had made it into the Universal Windows Platform. They live in the Windows.Media.Ocr namespace.
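As a quick, hedged sketch of what that namespace offers (the structure here is mine, not taken from the sample code below), creating an engine from the user's profile languages and inspecting what it resolved to looks something like this:

```csharp
using System.Diagnostics;
using Windows.Media.Ocr;

// Sketch only: OcrEngine lives in Windows.Media.Ocr and runs entirely on-device.
var ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();

if (ocrEngine != null)
{
  // The language that the engine resolved to from the user's profile.
  Debug.WriteLine(ocrEngine.RecognizerLanguage.DisplayName);

  // Images larger than this (in either dimension) need scaling down before OCR.
  Debug.WriteLine(OcrEngine.MaxImageDimension);
}
```

Note that `TryCreateFromUserProfileLanguages` returns null if no installed language pack supports OCR, so it's worth checking before use.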
In the post that I wrote about face detection, I set up a little framework which made it ‘easy’ to display preview video from a web cam and to run a separate processing loop which did face detection against the frames of video.
Having got that code and having seen the OCR APIs previously, I wondered how hard it’d be to adapt that sample to do OCR, so I set about doing that. To keep things easier for myself, I decided to draw with a Canvas rather than a Win2D CanvasControl, and so my UI simply becomes;
```xml
<Page
    x:Class="App183.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:App183"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">
  <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <CaptureElement x:Name="captureElement" Stretch="Fill" />
    <Canvas x:Name="drawCanvas" HorizontalAlignment="Stretch" VerticalAlignment="Stretch" />
  </Grid>
</Page>
```
I got rid of the FaceDetectionFrameProcessor that I’d written in my previous post and implemented a new processor based around the OcrEngine…
```csharp
using System;
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Media.Capture;
using Windows.Media.MediaProperties;
using Windows.Media.Ocr;

class OcrDetectionFrameProcessor : PreviewFrameProcessor<OcrResult>
{
  public OcrDetectionFrameProcessor(MediaCapture capture, VideoEncodingProperties videoProperties)
    : base(capture, videoProperties)
  {
  }
  protected override BitmapPixelFormat BitmapFormat
  {
    get
    {
      // The OCR engine supports anything convertible to Gray8 so I'm asking for
      // Gray8 in the first place.
      return (BitmapPixelFormat.Gray8);
    }
  }
  protected override Task InitialiseForProcessingLoopAsync()
  {
    // Nothing to await here so hand back a completed task rather than
    // declaring an async method with no awaits in it.
    this.ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
    return (Task.CompletedTask);
  }
  protected override async Task<OcrResult> ProcessBitmapAsync(SoftwareBitmap bitmap)
  {
    var results = await this.ocrEngine.RecognizeAsync(bitmap);
    return (results);
  }
  OcrEngine ocrEngine;
}
```
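To show what the processor's `RecognizeAsync` call is doing in isolation from the preview loop, here's a hedged sketch of running the engine once over a bitmap decoded from a file (the helper name and the idea of a file source are mine, purely for illustration):

```csharp
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Media.Ocr;
using Windows.Storage;
using Windows.Storage.Streams;

static async Task<string> RecognizeFileAsync(StorageFile file)
{
  using (IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read))
  {
    var decoder = await BitmapDecoder.CreateAsync(stream);

    // Gray8 keeps things consistent with the frame processor above.
    using (var bitmap = await decoder.GetSoftwareBitmapAsync(
      BitmapPixelFormat.Gray8, BitmapAlphaMode.Ignore))
    {
      var engine = OcrEngine.TryCreateFromUserProfileLanguages();
      OcrResult result = await engine.RecognizeAsync(bitmap);

      // OcrResult.Text is all of the recognised lines joined together;
      // OcrResult.Lines/Words carry the per-word bounding boxes used below.
      return (result.Text);
    }
  }
}
```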
With that in place, I took away the code that the previous post had for doing face detection and for drawing rectangles with Win2D, and I replaced my main code-behind with the code below, which uses some of the pieces from the previous post but simplifies the drawing code to make use of a Canvas;
```csharp
using System;
using System.Linq;
using System.Threading;
using Windows.Foundation;
using Windows.Media.MediaProperties;
using Windows.Media.Ocr;
using Windows.UI.Core;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;

public sealed partial class MainPage : Page
{
  public MainPage()
  {
    this.InitializeComponent();
    this.Loaded += OnLoaded;
  }
  async void OnLoaded(object sender, RoutedEventArgs args)
  {
    // NB: this example never cancels this token but it could.
    CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();

    // Rotate the whole canvas by whatever text angle the OCR engine reports.
    this.rotateTransform = new RotateTransform();
    this.drawCanvas.RenderTransform = this.rotateTransform;

    this.cameraPreviewManager = new CameraPreviewManager(this.captureElement);

    this.videoProperties =
      await this.cameraPreviewManager.StartPreviewToCaptureElementAsync(
        vcd => vcd.EnclosureLocation.Panel == Windows.Devices.Enumeration.Panel.Back);

    // This sample is highly dependent on focus control and I find focus control
    // to be quite a mystery on Windows devices with cameras. Many devices seem to say
    // that they can't do auto focus while other devices say they can do it but it
    // doesn't seem to work. I'm confused 😦
    this.cameraPreviewManager.MediaCapture.VideoDeviceController.Focus.TrySetAuto(true);

    this.ocrProcessor = new OcrDetectionFrameProcessor(
      this.cameraPreviewManager.MediaCapture,
      this.videoProperties);

    this.ocrProcessor.FrameProcessed += (s, e) =>
    {
      this.Dispatcher.RunAsync(
        CoreDispatcherPriority.Normal,
        () =>
        {
          this.DrawOcrResults(e.Results);
        });
    };

    // NB: we provide no way to cancel this - we just run forever.
    await this.ocrProcessor.RunFrameProcessingLoopAsync(
      cancellationTokenSource.Token);
  }
  void DrawOcrResults(OcrResult ocrResult)
  {
    if ((ocrResult == null) || (ocrResult.Lines == null) || (ocrResult.Lines.Count == 0))
    {
      this.drawCanvas.Children.Clear();
    }
    else
    {
      var words = ocrResult.Lines.SelectMany(l => l.Words).ToList();

      // Draw any words that we have recognised, doing our best to
      // make use of boxes that are already on the canvas.
      for (int i = 0; i < words.Count; i++)
      {
        var word = words[i];
        DisplayBox drawBox = this.CreateOrReuseDisplayBox(i);
        var scaledBox = this.ScaleBoxToCanvas(word.BoundingRect);
        Canvas.SetLeft(drawBox, scaledBox.Left);
        Canvas.SetTop(drawBox, scaledBox.Top);
        drawBox.Width = scaledBox.Width;
        drawBox.Height = scaledBox.Height;
        drawBox.Text = word.Text;
        this.rotateTransform.Angle = ocrResult.TextAngle ?? 0.0d;
      }
      // Get rid of any boxes that we have which no longer represent words
      // that we have recognised.
      for (int i = this.drawCanvas.Children.Count - 1; i >= words.Count; i--)
      {
        this.drawCanvas.Children.RemoveAt(i);
      }
    }
  }
  Rect ScaleBoxToCanvas(Rect boundingRect)
  {
    double x =
      (boundingRect.X / (double)this.videoProperties.Width) * this.drawCanvas.ActualWidth;
    double y =
      (boundingRect.Y / (double)this.videoProperties.Height) * this.drawCanvas.ActualHeight;
    double width =
      (boundingRect.Width / (double)this.videoProperties.Width) * this.drawCanvas.ActualWidth;
    double height =
      (boundingRect.Height / (double)this.videoProperties.Height) * this.drawCanvas.ActualHeight;

    return (new Rect(x, y, width, height));
  }
  DisplayBox CreateOrReuseDisplayBox(int index)
  {
    DisplayBox box = null;

    if (index < this.drawCanvas.Children.Count)
    {
      box = this.drawCanvas.Children[index] as DisplayBox;
    }
    else
    {
      box = new DisplayBox();
      this.drawCanvas.Children.Add(box);
    }
    return (box);
  }
  OcrDetectionFrameProcessor ocrProcessor;
  CameraPreviewManager cameraPreviewManager;
  VideoEncodingProperties videoProperties;
  RotateTransform rotateTransform;
}
```
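On the focus mystery mentioned in the code comments: on Windows 10 there's also the newer FocusControl hanging off the VideoDeviceController, which can be queried for support and configured for continuous auto-focus. A hedged sketch (the helper name is mine, and whether this actually helps varies a lot by device):

```csharp
using System.Linq;
using System.Threading.Tasks;
using Windows.Media.Capture;
using Windows.Media.Devices;

static async Task TryContinuousAutoFocusAsync(MediaCapture capture)
{
  var focusControl = capture.VideoDeviceController.FocusControl;

  // Not every camera supports focus control, and not every camera that
  // does supports the continuous mode.
  if (focusControl.Supported &&
      focusControl.SupportedFocusModes.Contains(FocusMode.Continuous))
  {
    focusControl.Configure(new FocusSettings() { Mode = FocusMode.Continuous });
    await focusControl.FocusAsync();
  }
}
```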
and the DisplayBox being used here is just a user control with UI;
```xml
<UserControl
    x:Class="App183.DisplayBox"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:App183"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d"
    d:DesignHeight="300"
    d:DesignWidth="400">
  <Grid Background="#55000000">
    <Rectangle Stroke="Red" StrokeThickness="1" />
    <Viewbox Stretch="Fill">
      <TextBlock x:Name="txtBlock" />
    </Viewbox>
  </Grid>
</UserControl>
```
and code;
```csharp
using Windows.UI.Xaml.Controls;

public sealed partial class DisplayBox : UserControl
{
  public DisplayBox()
  {
    this.InitializeComponent();
  }
  public string Text
  {
    get
    {
      return (this.txtBlock.Text);
    }
    set
    {
      this.txtBlock.Text = value;
    }
  }
}
```
That works reasonably well, in that I can point my camera at some text and have it detected and drawn in ‘real enough’ time, as I’ve tried to demonstrate in the screen capture below by showing a copy of a cookbook to the front facing camera on my Surface Pro 3!
The code is here for download if you want it – remember to watch out for focus settings on the camera.