Grabbing a Photo & Calling Azure Vision API from ‘Pure UWP’ Code

Just using a blog post as a pastie – I had cause to write a function today to take a photo from a camera on a UWP device and send it to the Azure Cognitive Service for Vision to ask for ‘tags’ from the image.

The intention was to call the function from a UWP-specific Unity app running on HoloLens (the code does work on HoloLens). It would need camera, microphone and internet client capabilities to work.

It’s very specific to one task but, clearly, it could be wrapped up into some class to make it a lot more general-purpose and exercise more pieces of that API; I just wanted somewhere to put the code in case I want it again in the future. This is what I had…

static async Task&lt;Dictionary&lt;string, double&gt;&gt; TakePhotoAnalyzeAzureForTagsAsync(
    string azureVisionKey,
    string azureVisionBaseEndpoint = "")
{
    var azureVisionApi = "/vision/v1.0/analyze?visualFeatures=Tags";
    Dictionary&lt;string, double&gt; resultDictionary = null;

    // Capture an image from the camera.
    var capture = new MediaCapture();

    await capture.InitializeAsync();

    var stream = new InMemoryRandomAccessStream();

    await capture.CapturePhotoToStreamAsync(
        ImageEncodingProperties.CreateJpeg(), stream);

    // Rewind the stream before handing it over for upload.
    stream.Seek(0);

    // Now send that off to Azure for processing.
    var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", azureVisionKey);

    var streamContent = new HttpStreamContent(stream);
    streamContent.Headers["Content-Type"] = "application/octet-stream";

    var response = await httpClient.PostAsync(
        new Uri(azureVisionBaseEndpoint + azureVisionApi),
        streamContent);

    if (response.IsSuccessStatusCode)
    {
        var responseString = await response.Content.ReadAsStringAsync();
        JsonObject jsonObject;

        if (JsonObject.TryParse(responseString, out jsonObject))
        {
            // The response carries a "tags" array of { name, confidence } entries.
            resultDictionary = jsonObject.GetNamedArray("tags").ToDictionary(
                tag => tag.GetObject().GetNamedString("name"),
                tag => tag.GetObject().GetNamedNumber("confidence"));
        }
    }
    return (resultDictionary);
}
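For completeness, calling it might look something like the fragment below – note that the key and endpoint values here are placeholders rather than real ones and you’d substitute your own Cognitive Services details;

```csharp
// Hypothetical key/endpoint values - substitute your own Cognitive Services details.
var tags = await TakePhotoAnalyzeAzureForTagsAsync(
    "YOUR-VISION-API-KEY",
    "https://YOUR-REGION.api.cognitive.microsoft.com");

if (tags != null)
{
    foreach (var tag in tags)
    {
        // e.g. "dog: 0.97"
        Debug.WriteLine($"{tag.Key}: {tag.Value:F2}");
    }
}
```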

As a related aside, I also revisited this blog post recently as part of looking back at Cognitive Services and its facial APIs and I realised that the code needed changing a bit to make it work, so I did that work and dropped it onto github over here;

Example of using both local UWP face detection and Cognitive Service face detection in a UWP app

Experiments with Shared Holograms and Azure Blob Storage/UDP Multicasting (Part 7)

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

A follow-up to my previous post around experiments with shared holograms using Azure blob storage and UDP multicasting techniques.

At the end of the previous post, I said that I might return and make a slightly better ‘test scene’ for the Unity project; this post is my write-up of my attempts to do that.

What’s in the New Test Scene?

I found a model of a house on;


and I made the test scene about visualising that model in a consistent place on multiple devices with the ability to rotate, scale and move it such that the multiple devices keep a consistent view.

What I built is pretty simple and the essential steps involved in the scene are;

  • The app runs and waits for the underlying library to tell it whether there are already other devices on the same network or not. During this period, it displays a ‘waiting screen’ for up to 5 seconds if it doesn’t receive notification that there are other devices on the network.


  • If the app determines that no other devices are on the network then it pops up a model of a house, gaze-locked to the device, so that the user can move it around and say ‘done’ to place it.


  • Once positioned, the app replaces the model displayed by using the APIs detailed in the previous posts to create a shared hologram which is exactly the same as the house and in the same position etc. At this point, its creation will be multicast around the network and the blob representing its world anchor will be uploaded to Azure.
  • If the app determines that there are other devices on the network at start-up time then it will inform the user of this;


  • and it will stop the user from positioning the model while waiting to bring down the position data (world anchor) from Azure. The same thing should happen in the race condition where multiple users start the app at the same time and one of them becomes the first to actually position the model.


  • Once the model has been positioned on the local device (in whichever way) it enters into a mode which allows for voice commands to be used to enter ‘rotate’, ‘scale’ and ‘move’ modes to move it around;


  • those transformations are then multicast to other devices on the network such that they all display the same model of a house in the same place.

and that’s pretty much it Smile

How’s the Test Scene Structured?

I already had a test scene within the Unity project that I’d published to github and so I just altered it rather than starting from scratch.

It’s very simple – the scene starts with the main camera parenting both a text object (to give a very poor Heads-Up-Display) and the model of the house (to give a very poor gaze-locked positioning system) as below;


there is then one object called ScriptHolder which has an instance of the Shared Hologram Controller component (and its dependency) that I discussed in the previous posts;


I’ve omitted the details of my own Azure configuration, so that would need to be filled in to specify the storage details. I’ve also told the script to synchronise transforms at a fairly high frequency which, realistically, I think I could drop down a little.

Beyond that, I also have a script here called Main Script which contains the logic for the scene with the positive part of it being that there’s not too much of it;

using SharedHolograms;
using System;
using System.Linq;
using UnityEngine;
using UnityEngine.Windows.Speech;

public class MainScript : MonoBehaviour, ICreateGameObjects
{
    // Text to display output messages on.
    public TextMesh StatusDisplayTextMesh;

    // GameObject to use as a marker to position the model (i.e. the house).
    public GameObject PositionalModel;

    // Implementation of ICreateGameObjects - because we are not creating a Unity primitive
    // I've implemented this here and 'plugged it in' but our creation is very simple in
    // that we duplicate the object that we're using as the PositionalModel (i.e. the
    // house in my version).
    public void CreateGameObject(string gameObjectSpecifier, Action&lt;GameObject&gt; callback)
    {
        // Right now, we know how to create one type of thing and we do it in the most
        // obvious way but we could do it any which way we like and even get some other
        // componentry to do it for us.
        if (gameObjectSpecifier == "house")
        {
            var gameObject = GameObject.Instantiate(this.PositionalModel);

            // The placeholder may be inactive by now so make sure the clone is visible.
            gameObject.SetActive(true);
            callback(gameObject);
        }
        else
        {
            // Sorry, only know about "house" right now.
            callback(null);
        }
    }
    void Start()
    {
        // Set up our keyword handling. Originally, I imagined more than one keyword but
        // we ended up just with "Done" here.
        var keywords = new[]
        {
            new { Keyword = "done", Handler = (Action)this.OnDoneKeyword }
        };
        this.keywordRecognizer = new KeywordRecognizer(keywords.Select(k => k.Keyword).ToArray());

        this.keywordRecognizer.OnPhraseRecognized += (e) =>
        {
            var understood = false;

            if ((e.confidence == ConfidenceLevel.High) ||
                (e.confidence == ConfidenceLevel.Medium))
            {
                var handler = keywords.FirstOrDefault(k => k.Keyword == e.text.ToLower());

                if (handler != null)
                {
                    handler.Handler();
                    understood = true;
                }
            }
            if (!understood)
            {
                this.SetStatusDisplayText("I might have missed what you said...");
            }
        };
        // We need to know when various things happen with the shared holograms controller.
        SharedHologramsController.Instance.SceneReady += OnSceneReady;
        SharedHologramsController.Instance.Creator.BusyStatusChanged += OnBusyStatusChanged;
        SharedHologramsController.Instance.Creator.HologramCreatedRemotely += OnRemoteHologramCreated;
        SharedHologramsController.Instance.Creator.GameObjectCreator = this;

        // Wait to see whether we should make the positional model active or not.
        this.PositionalModel.SetActive(false);
    }
    void OnDoneKeyword()
    {
        if (!this.busy)
        {
            this.SetStatusDisplayText("working, please wait...");

            if (this.PositionalModel.activeInHierarchy)
            {
                // Get rid of the placeholder.
                this.PositionalModel.SetActive(false);

                // Create the shared hologram in the same place as the placeholder.
                // NB: the exact shape of this Create call comes from the library
                // covered in the previous posts - treat this as approximate.
                SharedHologramsController.Instance.Creator.Create(
                    "house",
                    this.PositionalModel.transform.position,
                    this.PositionalModel.transform.forward,
                    this.PositionalModel.transform.localScale,
                    gameObject =>
                    {
                        this.SetStatusDisplayText("object created and shared");
                        this.houseGameObject = gameObject;
                        this.AddManipulations();
                    });
            }
        }
    }
    void OnBusyStatusChanged(object sender, BusyStatusChangedEventArgs e)
    {
        this.busy = e.Busy;

        if (e.Busy)
        {
            this.SetStatusDisplayText("working, please wait...");
        }
    }
    void OnSceneReady(object sender, SceneReadyEventArgs e)
    {
        // Are there other devices around or are we starting alone?
        if (e.Status == SceneReadyStatus.OtherDevicesInScene)
        {
            this.SetStatusDisplayText("detected other devices, requesting sync...");
        }
        else
        {
            this.SetStatusDisplayText("detected no other devices...");

            // We need this user to position the model so switch it on.
            this.PositionalModel.SetActive(true);
            this.SetStatusDisplayText("walk to position the house then say 'done'");

            // Wait for the 'done' keyword.
            this.keywordRecognizer.Start();
        }
    }
    void OnRemoteHologramCreated(object sender, HologramEventArgs e)
    {
        // Someone has beaten this user to positioning the model so
        // turn off the model.
        this.PositionalModel.SetActive(false);

        // Stop waiting for the 'done' keyword (if we are).
        if (this.keywordRecognizer.IsRunning)
        {
            this.keywordRecognizer.Stop();
        }
        this.houseGameObject = GameObject.Find(e.ObjectId.ToString());

        // Make sure we can manipulate what the other user has placed.
        this.AddManipulations();
    }
    void AddManipulations()
    {
        this.SetStatusDisplayText("say 'move', 'rotate' or 'scale'");

        // The Manipulations script contains a keyword recognizer for 'move', 'rotate', 'scale'
        // and some basic logic to wire those to hand manipulations.
        this.houseGameObject.AddComponent&lt;Manipulations&gt;();
    }
    void SetStatusDisplayText(string text)
    {
        if (this.StatusDisplayTextMesh != null)
        {
            this.StatusDisplayTextMesh.text = text;
        }
    }
    KeywordRecognizer keywordRecognizer;
    GameObject houseGameObject;
    bool busy;
}

If someone (anyone! please! please! Winking smile) had been following the previous set of blog posts closely, they might have noticed that in order to write that code I had to change my existing code to at least;

  • Fire an event when the device joins the network such that code can be notified of whether the messaging layer has seen other devices on the network or not.
  • Fire events when other devices on the network create/delete holograms causing them to be imported and created by the local device.
  • Fire an event as/when the underlying code is ‘busy’ doing some downloading or uploading or similar.

Having tried to implement this scene, it was immediately obvious to me that these pieces were needed but it hadn’t been obvious enough beforehand for me to have implemented them in advance, and so that was a useful output of writing this test scene.

The other thing that’s used in the scene is a MonoBehaviour named Manipulations. This is a version of a script that I’ve used in a few places in the past and it’s a very cheap and cheerful way to provide rotate/scale/move behaviour on a focused object in response to voice commands and hand manipulations.
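As a rough illustration of the shape of that script, the voice-command side might be sketched as below – note that this is my own sketch for illustration (the mode enum and class name are invented here) rather than the actual Manipulations script, which lives in the repo;

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Illustrative sketch only - the real Manipulations script is in the github repo.
public class ManipulationsSketch : MonoBehaviour
{
    enum Mode { None, Move, Rotate, Scale }

    Mode currentMode;
    KeywordRecognizer recognizer;

    void Start()
    {
        // Listen for the three mode-switching voice commands.
        this.recognizer = new KeywordRecognizer(new[] { "move", "rotate", "scale" });

        this.recognizer.OnPhraseRecognized += (args) =>
        {
            if ((args.confidence == ConfidenceLevel.High) ||
                (args.confidence == ConfidenceLevel.Medium))
            {
                switch (args.text.ToLower())
                {
                    case "move": this.currentMode = Mode.Move; break;
                    case "rotate": this.currentMode = Mode.Rotate; break;
                    case "scale": this.currentMode = Mode.Scale; break;
                }
            }
        };
        this.recognizer.Start();
    }
    // Hand-manipulation deltas would then be applied to the focused object's
    // transform according to this.currentMode...
}
```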

I placed this script and the other script that is specific to the test scene in the ‘Scene Specific’ folder;


and the Manipulations script has a dependency on the 3 materials in the Resources folder that it uses for drawing different coloured boxes around an object while it is being rotated/scaled/moved;


and that’s pretty much it.

One thing that I’d note is that when I’d used this Manipulations script before, it was always in projects that were making use of the Mixed Reality Toolkit for Unity and, consequently, I had written the code to depend on some pieces of the toolkit – specifically the IManipulationHandler interface and the IInputClickHandler interface.

I don’t currently have any use of the toolkit in this test project and it felt like massive overkill to add it just to enable this one script, so I reworked the script to remove the dependency on the toolkit. I was very pleased to find that this was only a small piece of work – i.e. the toolkit had mostly done a little wrapping of the raw Unity APIs and so it wasn’t difficult to unpick that dependency here.
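For the tap side of things, as an example, the IInputClickHandler usage can be swapped for Unity’s raw GestureRecognizer – treat the fragment below as a sketch rather than the exact code in the repo (the namespace moved between Unity versions of that era);

```csharp
using UnityEngine;
using UnityEngine.XR.WSA.Input; // was UnityEngine.VR.WSA.Input in older Unity versions.

// Sketch only - picking up an air-tap without the toolkit's IInputClickHandler.
public class TapSketch : MonoBehaviour
{
    GestureRecognizer gestureRecognizer;

    void Start()
    {
        this.gestureRecognizer = new GestureRecognizer();
        this.gestureRecognizer.SetRecognizableGestures(GestureSettings.Tap);

        // Fires when the user air-taps.
        this.gestureRecognizer.TappedEvent += (source, tapCount, headRay) =>
        {
            // Raycast along headRay to work out which object was tapped, etc.
        };
        this.gestureRecognizer.StartCapturingGestures();
    }
}
```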

Wrapping Up

I don’t intend to write any more posts in this mini-series around using Azure blob storage and UDP multicasting to enable shared holograms, I think I’ve perhaps gone far enough Smile

The code is all up on github should anyone want to explore it, try it, take some pieces for their own means.

I’m always open to feedback, so feel free to drop me a line. Be aware, though, that I’ve only tested this code in a limited way – I wrote it all on a single HoloLens device using the (supplied) test programs to simulate responses from a second device – but I’m ‘reasonably’ happy that it’s doing sensible things.

Experiments with Shared Holograms and Azure Blob Storage/UDP Multicasting (Part 4)

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

Following up on my previous post, one of the “to do” items was to allow a hologram to move, rotate, scale and have that change in its transform carried across the network to other devices.

I’m going to assume for this post that the hologram is not going to move so far as to change its parent and need re-parenting to another world-anchored object, as that would require more work right now. So I’m thinking of small movements relative to the parent object, reflected by changes in the local position, rotation and scale properties of a transform.

I added a settable property to my SharedHologramsController to flag whether the code should attempt to synchronise transforms on the shared objects that it has created;


and an interval at which to attempt synchronisation.
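In use, that looks something like the fragment below – NB: the property names here are illustrative placeholders rather than the definitive ones, which are in the github project;

```csharp
// Property names are illustrative - check SharedHologramsController in the repo.
SharedHologramsController.Instance.SynchronizeTransforms = true;

// How often (in seconds) to check shared objects for transform changes.
SharedHologramsController.Instance.SynchronizationIntervalSeconds = 0.5f;
```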

Naturally, a lot of GameObjects being synchronised over the network at a high frequency is going to equate to a lot of network messages, so there’s a trade-off to be made there.

I then added a MonoBehaviour-derived script to actually poll a GameObject and watch for changes to its transform’s localPosition, localRotation and localScale properties before dispatching them over the network using the TransformMessage message that already existed in the project.

That script is called TransformSynchronizer and it’s fairly simple (and perhaps too simple! Smile).
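In spirit, the script does something like the sketch below – the real TransformSynchronizer is in the repo and the ChangeHandler callback here is my own placeholder standing in for the code that sends a TransformMessage;

```csharp
using System;
using UnityEngine;

// Sketch of the idea behind TransformSynchronizer - poll the local transform at an
// interval and report changes. The real script is in the github project.
public class TransformSynchronizerSketch : MonoBehaviour
{
    // Called when a change is spotted - the real code dispatches a TransformMessage.
    public Action<Vector3, Quaternion, Vector3> ChangeHandler;

    // How often to poll, in seconds.
    public float IntervalSeconds = 0.5f;

    Vector3 lastPosition;
    Quaternion lastRotation;
    Vector3 lastScale;

    void Start()
    {
        this.Snapshot();
        this.InvokeRepeating("CheckForChanges", this.IntervalSeconds, this.IntervalSeconds);
    }
    void CheckForChanges()
    {
        // Unity's ==/!= operators on Vector3 and Quaternion already apply a small
        // tolerance, so tiny jitter won't generate network traffic.
        if ((this.transform.localPosition != this.lastPosition) ||
            (this.transform.localRotation != this.lastRotation) ||
            (this.transform.localScale != this.lastScale))
        {
            this.Snapshot();
            this.ChangeHandler?.Invoke(this.lastPosition, this.lastRotation, this.lastScale);
        }
    }
    void Snapshot()
    {
        this.lastPosition = this.transform.localPosition;
        this.lastRotation = this.transform.localRotation;
        this.lastScale = this.transform.localScale;
    }
}
```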

That’s all the changes to enable that sort of behaviour, but I’d really like to test it properly with multiple devices as, so far, I’ve only been able to use the approach that I mentioned in an earlier post of having the HoloLens multicast messages to a console application which then sends them back again as a way of simulating a second device.

To start to test this out, I modified the test scene in the Unity project to add a behaviour such that the first tap on a cube will start it slowly rotating whereas the second tap will delete it. That let me test changes to the cube’s rotation; I still need to add some more code to test out changes to position and scale, but rotation seems to work “reasonably”.

All of that code is contained in the TestScript in the Unity project.

I’ll make subsequent updates if I find that changes to local position and local scale don’t behave in a suitable way.