Experimenting with Research Mode and Sensor Streams on HoloLens Redstone 4 Preview

NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.

Previews, Research and Experiments

I recently installed the Redstone 4 Preview onto a HoloLens as documented here;

HoloLens RS4 Preview

and one of the many things in the preview that interested me was the piece about ‘research mode’ which (from the docs);

“Allows developers to access key HoloLens sensors when building academic and industrial applications to test new ideas in the fields of computer vision and robotics”

which then details the sensors as;

  • “The four environment tracking cameras used by the system for spatial map building and head-tracking.

  • Two versions of the depth mapping camera data – one for high-frequency (30 fps) near-depth sensing, commonly used in hand tracking, and the other for lower-frequency (1 fps) far-depth sensing, currently used by spatial mapping.

  • Two versions of an IR-reflectivity stream, used to compute depth, but valuable in its own right as these images are illuminated from the HoloLens and reasonably unaffected by ambient light.”

So, it sounds like there’s a possibility of 8 streams of data there and developers have been asking about access to these streams for some time as in this forum question;

Will we have access to the depth sensors, IR cameras, and RGB cameras data streams?

and prior to the RS4 preview the answer was “not possible” but it looks like the preview has some experimental support for getting access to these streams.

That said, in order to switch this on a developer has to (from the docs);

“First, ensure “Use developer features” and “Enable Device Portal” are set to On in Settings > Update & Security > For developers on HoloLens. Next, on a desktop PC, use Device Portal to access your HoloLens through a web browser, expand System, select Research mode, and check the box next to “Allow access to sensor streams.” Reboot your HoloLens for the settings to take effect.


Note: Apps built using Research mode cannot be submitted to the Microsoft Store.”

and if you visit that device portal and switch this setting to allow “Research Mode” then you’ll notice that it says;

[Screenshot: the device portal’s “Research mode” page with its warning about enabling sensor stream access]

So the guidance here is pretty strong and says that this setting will hurt performance, is not recommended except for active research, and will mean that an application using it cannot be submitted to the Microsoft Store.

With all of those caveats in mind, I wanted to try this out and see if I could get some data from the device and so I started to write some code.

Before getting there, I want to re-state that the code here is just my own work, likely to be quite rough and experimental and there are official samples coming in this area later in the month so keep your eye on the URL that the device portal points you to;

https://aka.ms/hololensresearchmode

for official updates. Meanwhile, on with my rough work which I’ve actually attempted before…

Previous Attempts at Accessing Sensor Data

I’d had a look at this type of stream access in this post;

InfraredFrameSources–Access to Camera Streams

where I was trying to use UWP media capture pieces (e.g. MediaCapture, MediaFrameReader, MediaFrameSourceGroup etc) to get access to sensor streams, but I only came away with a media source group called MN34150 – which I think represents the built-in webcam on the device – and it didn’t surface any depth or infrared streams, nor streams from the other 4 environment sensing cameras on the device.

That had proven to be a dead-end at the time on the Anniversary Update but I thought that I could use the same classes/techniques for trying again in the light of RS4 Preview…

A New Attempt at Accessing Sensor Streams from a 2D UWP App

I wanted to start fairly small and so I wondered whether I might write an app for HoloLens which would access 1 or 2 of these new streams and send the data from them on some frequency over the network to some other (desktop) app which would display them.

I thought I’d begin with a 2D app as I find the development time quicker there than working in 3D and so I spun up a new XAML based 2D UWP app on SDK version 17125 (I think 17133 may also be out by the time of writing so keep that in mind).

To speed things up a little further, I borrowed some socket code from this previous post;

Windows 10, UWP, HoloLens & A Simple Two-Way Socket Library

That post contained some code where I used Bluetooth LE advertising in order to connect sockets across 2 devices without any need to manually enter (or assume) IP addresses or ports – one device creates a socket and advertises its details over Bluetooth LE and the other device finds the advertisement and (assuming some common network) connects a socket to the address/port combination advertised. In that post, the main class that I wrote was named AutoConnectMessagePipe and I gave it some capability around sending raw byte arrays, strings and serialized objects but for my purposes in this experiment, I have stripped the code back to just send byte arrays back and forth.
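The stripped-back “send byte arrays back and forth” capability boils down to length-prefixed framing over a connected socket. Here’s a rough sketch of that idea in Python (the function names are mine and purely illustrative – the real code lives in AutoConnectMessagePipe);

```python
import socket
import struct

def send_message(sock: socket.socket, payload: bytes) -> None:
    # Prefix each byte array with a 4-byte big-endian length so the
    # receiver knows where one message ends and the next begins.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_message(sock: socket.socket) -> bytes:
    # Read the 4-byte length header, then keep reading until the
    # whole payload has arrived (recv may return partial data).
    (length,) = struct.unpack(">I", _recv_exactly(sock, 4))
    return _recv_exactly(sock, length)

def _recv_exactly(sock: socket.socket, count: int) -> bytes:
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)
```

The actual pipe adds the Bluetooth LE advertising and discovery on top of this so that neither side has to know the other’s address or port.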

In my new app for this post, that code ends up being run at start up time and ends up looking something like this;

            // We are going to advertise our socket details over bluetooth
            this.messagePipe = new AutoConnectMessagePipe(true);

            // We wait for someone else to see them and connect to us
            await this.messagePipe.WaitForConnectionAsync(
                TimeSpan.FromMilliseconds(-1));

Once the call to WaitForConnectionAsync completes, we should have a connected client ready to talk down the socket to our app on HoloLens and receive some media frames from the device.

To use these pieces means that my HoloLens project would need capabilities specified in its application manifest for bluetooth and probably internet (client/server) and private networks. I also figured that it might well need the webcam capability and the spatial perception capability too.

With that added to my manifest, I started to write some code that would let me get access to all the media frame source groups on the device and you can see in the screenshot below that code coming back with the new “Sensor Streaming” media frame source group;

[Screenshot: enumeration output showing the new “Sensor Streaming” media frame source group]

and that seemed fine but when I came to code which tried to create a MediaCapture using this source, I hit a bit of a snag – the device was raising a dialog asking for access to the camera but then it was crashing;

[Screenshot: the app crashing after raising the camera access dialog]

and I figured that having the spatial perception capability in my app manifest must not be enough to switch on access to these streams and so, perhaps, there was some new capability that allowed access?

I checked out the list of capabilities in the docs;

App capability declarations

and couldn’t find anything there – that doc is really good and partitions capabilities into different groups but it maybe hasn’t been updated yet for the preview and, as far as I know, that set of capabilities maps fairly literally onto the registry key;

[Screenshot: the registry keys that the capability groups map onto]

and so I had a look at the capabilityClass_Restricted key on my RS4 preview machine and compared the contents of the key named MemberCapability to the one on my Fall Creators Update machine and the list looks to contain some new restricted capabilities;

broadFileSystemAccess, deviceIdentityManagement, lpacIME, lpacPackageManagerOperation, perceptionSensorsExperimental, smbios, systemDialog, thumbnailCache, timezone, userManagementSystem, webPlatformMediaExtension

and so I figured that the one that I would need was likely to be perceptionSensorsExperimental and so I added that to my app manifest within the restricted section (as per that earlier doc on how to add restricted capabilities) as below;

  <Capabilities>
    <Capability Name="internetClient" />
    <Capability Name="internetClientServer" />
    <Capability Name="privateNetworkClientServer" />
    <uap2:Capability Name="spatialPerception" />
    <rescap:Capability Name="perceptionSensorsExperimental" />
    <DeviceCapability Name="microphone" />
    <DeviceCapability Name="webcam" />
    <DeviceCapability Name="bluetooth" />
  </Capabilities>
That manifest is probably overkill for what I need here but adding that extra capability allowed my MediaCapture to initialise ok;

[Screenshot: the MediaCapture initialising without errors]

and so I could make progress. I wasn’t quite “ready” to write code which would handle all of the available streams, though, and so I decided that I would try to access a single depth stream and a single infrared stream as a starting point. My code has an array of the stream types that it wants to access;
            var frameSourceKinds = new MediaFrameSourceKind[]
            {
                MediaFrameSourceKind.Depth,
                MediaFrameSourceKind.Infrared
            };

and I wrote this little class;

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Windows.Media.Capture.Frames;

namespace App1
{
    static class MediaSourceFinder
    {
        public static async Task<MediaFrameSourceGroup> FindGroupsWithAllSourceKindsAsync(
            params MediaFrameSourceKind[] sourceKinds)
        {
            var groups = await MediaFrameSourceGroup.FindAllAsync();

            var firstGroupWithAllSourceKinds =
                groups.FirstOrDefault(
                    g => sourceKinds.All(k => g.SourceInfos.Any(si => si.SourceKind == k)));

            return (firstGroupWithAllSourceKinds);
        }
        public static List<string> FindSourceInfosWithMaxFrameRates(
            MediaFrameSourceGroup sourceGroup, params MediaFrameSourceKind[] sourceKinds)
        {
            var listSourceInfos = new List<string>();

            foreach (var kind in sourceKinds)
            {
                var sourceInfos =
                    sourceGroup.SourceInfos.Where(s => s.SourceKind == kind);

                var maxInfo = sourceInfos.OrderByDescending(
                    si => si.VideoProfileMediaDescription.Max(
                        msd => msd.FrameRate * msd.Height * msd.Width)).First();

                listSourceInfos.Add(maxInfo.Id);
            }
            return (listSourceInfos);
        }
    }
}

which provides some limited helpers which let me take that array of MediaFrameSourceKind[] (depth/infrared) and attempt to;

  • find the first MediaFrameSourceGroup which claims that it can do all of the types I’m interested in (i.e. depth + infrared).
  • from that MediaFrameSourceGroup find the media source Ids of the “best” sources for depth, infrared.
    • here, “best” is arbitrarily chosen as the highest multiplier of frame rate * width * height just so that I end up with one depth stream and one IR stream rather than many.

Those bits of code are enough to enable me to instantiate a MediaCapture for the source group;

                // Note2: I've gone with Cpu here rather than Gpu because I ultimately
                // want a byte[] that I can send down a socket. If I go with Gpu then
                // I get an IDirect3DSurface but (AFAIK) there's not much of a way
                // to get to a byte[] from that other than to copy it into a
                // SoftwareBitmap and then to copy that SoftwareBitmap into a byte[]
                // which I don't really want to do. Hence - Cpu choice here.
                await this.mediaCapture.InitializeAsync(
                    new MediaCaptureInitializationSettings()
                    {
                        SourceGroup = firstSourceGroupWithSourceKinds,
                        MemoryPreference = MediaCaptureMemoryPreference.Cpu,
                        StreamingCaptureMode = StreamingCaptureMode.Video
                    }
                );

and once I have a MediaCapture I can then use it to open a MediaFrameReader instance for each of the sources that I am interested in.

I initially tried to do this by using MediaCapture.CreateMultiSourceFrameReader in order to have a single reader which gathered all the frames, but this seemed to throw exceptions on me, and so I switched to using the regular CreateFrameReaderAsync() on each of the sources separately. That worked fine for me, although it doesn’t have the ability to ‘synchronise’ the frames which the multi frame reader might have.

Once I had readers open on a couple of streams, I quickly realised that they were going to fire back “quite a lot of data” and that simply handling the FrameArrived event and passing the frame data over the network would eat my WiFi bandwidth.

Specifically, it seemed that I had selected depth streams firing at 30fps with either 8 or 16 bits per pixel at a resolution of 448*450 pixels. That meant that even with just 2 streams I would be trying to copy maybe ~20MB a second over the network which didn’t seem like a great idea.
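That estimate is easy to sanity-check – it’s just width × height × bytes-per-pixel × frames-per-second for each stream;

```python
# Back-of-envelope bandwidth for one Gray8 and one Gray16 stream,
# both at 448x450 pixels and 30 frames per second.
width, height, fps = 448, 450, 30

gray8_bytes_per_second = width * height * 1 * fps   # 8 bits per pixel
gray16_bytes_per_second = width * height * 2 * fps  # 16 bits per pixel

total_mb_per_second = (gray8_bytes_per_second + gray16_bytes_per_second) / 1_000_000
print(f"{total_mb_per_second:.1f} MB/s")  # ~18 MB/s before any protocol overhead
```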

Based on that, I decided that rather than try to handle every FrameArrived event, I would instead just install a timer which ticked on some interval, attempted to get the latest frame from each of the readers and sent it over the network.
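The shape of that “remember only the latest frame, send on a timer tick” approach can be sketched like this (Python with hypothetical names, standing in for the timer callback in the app);

```python
import threading

class LatestFrameSender:
    """Keeps only the most recent frame per stream; frames that arrive
    between ticks simply overwrite the previous one and are never sent."""

    def __init__(self, send):
        self._send = send          # callable taking (stream_id, frame_bytes)
        self._latest = {}
        self._lock = threading.Lock()

    def on_frame_arrived(self, stream_id, frame_bytes):
        # Called at the stream's full rate (e.g. 30fps) - cheap, no I/O.
        with self._lock:
            self._latest[stream_id] = frame_bytes

    def on_tick(self):
        # Called on a slower timer - only now do frames hit the network.
        with self._lock:
            snapshot, self._latest = self._latest, {}
        for stream_id, frame_bytes in snapshot.items():
            self._send(stream_id, frame_bytes)
```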

This seemed to work out “ok”, although the code I have was put together pretty quickly and so is rough and not very resilient to failure. It lives in this App in the solution;

[Screenshot: the HoloLens app project within the solution]

There is largely just a XAML based UI which displays a count of how many IR and how many Depth frames it thinks it has sent over the network. Behind that, there’s some code-behind plus a couple of supporting classes, along with a dependency on the code in the SharedCode project which provides the routines for establishing the socket communications and some common code around manipulating the buffers.

The “UI” ends up being a rather undramatic screen;

[Screenshot: the app’s minimal UI on HoloLens showing the IR and Depth frame counts]

In terms of the buffers, I make no attempt to compress them or anything like that – I simply send them over the network prefixed with a header including the size, buffer type (depth/infrared), width and height. I do not attempt to encode them as PNG/JPEG or similar but just leave them in their raw format, which for these 2 streams is Gray8 and Gray16.
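As an illustration of that kind of framing, here’s a hypothetical version of the header in Python – the field order and integer sizes here are my assumption for the sketch, not necessarily what the actual SharedCode project uses;

```python
import struct

# Hypothetical header: payload size, buffer type, width, height -
# four 32-bit little-endian unsigned integers, followed by raw pixels.
HEADER = struct.Struct("<IIII")
DEPTH, INFRARED = 0, 1

def pack_frame(buffer_type: int, width: int, height: int, pixels: bytes) -> bytes:
    return HEADER.pack(len(pixels), buffer_type, width, height) + pixels

def unpack_frame(message: bytes):
    size, buffer_type, width, height = HEADER.unpack_from(message)
    pixels = message[HEADER.size : HEADER.size + size]
    return buffer_type, width, height, pixels
```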

A Companion 2D Desktop App

On the desktop side, I made a second UWP XAML based app and added it to the solution and gave it a dependency on the SharedCode folder so that it could also use the socket and buffer-access routines.

[Screenshot: the desktop app project within the solution]

This app displays a blank UI with a couple of XAML Images backed by having their Source property set to instances of SoftwareBitmapSource.

On start up, the app waits for a Bluetooth LE advertisement such that it can automatically connect to the socket listening on the HoloLens.

Once connected, the app picks up the frames sent down the wire, interprets them as depth/infrared and turns them into SoftwareBitmap instances in BGRA8 format such that it can update the XAML Images with the new bitmaps by simply using SoftwareBitmapSource.SetBitmapAsync().
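That Gray-to-BGRA expansion is simple per-pixel work. A sketch of the Gray16 case in Python (assuming little-endian samples and, crudely, keeping only the high byte of each 16-bit value);

```python
def gray16_to_bgra8(pixels: bytes) -> bytes:
    # Each input pixel is 2 bytes (little-endian); each output pixel is
    # 4 bytes: B, G, R all set to the high byte, alpha fully opaque.
    out = bytearray()
    for i in range(0, len(pixels), 2):
        high = pixels[i + 1]
        out += bytes((high, high, high, 255))
    return bytes(out)
```

Scaling the 16-bit values to the visible range (rather than truncating to the high byte) would be one way to brighten up a dim depth image for display.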

There’s not too much going on in this app and it could do with a little more “UI” and some resilience around the socket connection dropping, but it seems to fundamentally “work” in that frames come over the network and get displayed.

Here’s a quick screenshot – the depth data on the left is (I think) coming from the 30fps near-depth camera. It’s perhaps only just visible here, so maybe I need to process it to brighten it up a little for display, but I can see what it’s showing on my monitor;

[Screenshot: the desktop app displaying depth frames on the left and infrared frames on the right]

and the IR on the right is much clearer to see.

So, it’s not going to win any UX or implementation awards, but it seems to “just about work”.

What’s Next?

I’ve only really had a chance to glance at this and take a first step, but I’m pleased that I was able to grab frames so easily. It would be “nice” to put a communication protocol between the 2 apps here such that the desktop app could “ask” for different streams, perhaps at different intervals, and it’d also be “nice” to display some of the other streams – so perhaps I’ll look into that for subsequent posts and follow up with some modifications.

Where’s the Code?

The code for this post is pretty rough and experimental so please keep that in mind but I shared it here on github;

http://github.com/mtaulty/ExperimentalSensorApps

so feel free to take, explore and fix etc.

Third Experiment with Image Classification on Windows ML from UWP (on HoloLens in Unity)


Following up from this earlier post;

Second Experiment with Image Classification on Windows ML from UWP (on HoloLens)

I’d finished up that post by flagging that what I was doing with a 2D UI felt weird – I was looking through my HoloLens at a 2D app which was then displaying the contents of the webcam on the HoloLens back to me. While things seemed to work fine, it felt like a hall of mirrors.

Moving the UI to an immersive 3D app built in something like Unity would make this a little easier to try out and that’s what this post is about.

Moving the code as I had it across to Unity hasn’t proved difficult at all.

I spun up a new Unity project and set it up for HoloLens development by setting the typical settings like;

  • Switching the target platform to UWP (I also switched to the .NET backend and its 4.6 support)
  • Switching on support for the Windows Mixed Reality SDK
  • Moving the camera to the origin, changing its clear flags to solid black and changing the near clipping plane to 0.85
  • Switching on the capabilities that let my app access the camera and the microphone

and, from there, I brought across the .onnx file containing my model and placed it as a resource in Unity;

[Screenshot: the .onnx model file added as a resource in the Unity project]

and then I brought the code across from the XAML based UWP project in as much as I could, conditionally compiling most of it out with ENABLE_WINMD_SUPPORT constants as most of the code that I’m trying to run here is entirely UWP dependent and isn’t going to run in the Unity Editor and so on.

In terms of code, I ended up with only 2 code files;

[Screenshot: the two code files in the Unity project]

the dachshund file started life by being generated for me in the first post in this series by the mlgen tool, although I did have to alter it to get it to work after it had been generated.

The code uses the underlying LearningModelPreview class which claims to be able to load a model from a storage file and from a stream. Because, in this instance inside of Unity, I’m going to load the model using Unity’s Resources.Load() mechanism, I’m going to end up with a byte[] for the model and so I wanted to feed it through into the LoadModelFromStreamAsync() method. I found that this didn’t seem to be implemented yet and so I had to do a minor hack and write the byte array out to a file before feeding it to the LoadModelFromStorageFileAsync() method.

That left this piece of code looking as below;

#if ENABLE_WINMD_SUPPORT
namespace dachshunds.model
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.InteropServices.WindowsRuntime;
    using System.Threading.Tasks;

    using Windows.AI.MachineLearning.Preview;
    using Windows.Media;
    using Windows.Storage;
    using Windows.Storage.Streams;

    // MIKET: I renamed the auto generated long number class names to be 'Dachshund'
    // to make it easier for me as a human to deal with them 🙂
    public sealed class DachshundModelInput
    {
        public VideoFrame data { get; set; }
    }

    public sealed class DachshundModelOutput
    {
        public IList<string> classLabel { get; set; }
        public IDictionary<string, float> loss { get; set; }

        public DachshundModelOutput()
        {
            this.classLabel = new List<string>();
            this.loss = new Dictionary<string, float>();

            // MIKET: I added these 3 lines of code here after spending *quite some time* 🙂
            // Trying to debug why I was getting a binding exception at the point in the
            // code below where the call to LearningModelBindingPreview.Bind is called
            // with the parameters ("loss", output.loss) where output.loss would be
            // an empty Dictionary<string,float>.
            //
            // The exception would be 
            // "The binding is incomplete or does not match the input/output description. (Exception from HRESULT: 0x88900002)"
            // And I couldn't find symbols for Windows.AI.MachineLearning.Preview to debug it.
            // So...this could be wrong but it works for me and the 3 values here correspond
            // to the 3 classifications that my classifier produces.
            //
            this.loss.Add("daschund", float.NaN);
            this.loss.Add("dog", float.NaN);
            this.loss.Add("pony", float.NaN);
        }
    }

    public sealed class DachshundModel
    {
        private LearningModelPreview learningModel;

        public static async Task<DachshundModel> CreateDachshundModel(byte[] bits)
        {
            // Note - there is a method on LearningModelPreview which seems to
            // load from a stream but I got a 'not implemented' exception and
            // hence using a temporary file.
            IStorageFile file = null;
            var fileName = "model.bin";

            try
            {
                file = await ApplicationData.Current.TemporaryFolder.GetFileAsync(
                    fileName);
            }
            catch (FileNotFoundException)
            {
            }
            if (file == null)
            {
                file = await ApplicationData.Current.TemporaryFolder.CreateFileAsync(
                    fileName);

                await FileIO.WriteBytesAsync(file, bits);
            }

            var model = await DachshundModel.CreateDachshundModel((StorageFile)file);

            return (model);
        }
        public static async Task<DachshundModel> CreateDachshundModel(StorageFile file)
        {
            LearningModelPreview learningModel = await LearningModelPreview.LoadModelFromStorageFileAsync(file);
            DachshundModel model = new DachshundModel();
            model.learningModel = learningModel;
            return model;
        }
        public async Task<DachshundModelOutput> EvaluateAsync(DachshundModelInput input) {
            DachshundModelOutput output = new DachshundModelOutput();
            LearningModelBindingPreview binding = new LearningModelBindingPreview(learningModel);
            binding.Bind("data", input.data);
            binding.Bind("classLabel", output.classLabel);

            // MIKET: this generated line caused me trouble. See MIKET comment above.
            binding.Bind("loss", output.loss);

            LearningModelEvaluationResultPreview evalResult = await learningModel.EvaluateAsync(binding, string.Empty);
            return output;
        }
    }
}
#endif // ENABLE_WINMD_SUPPORT

and then I made a few minor modifications to the code which had previously formed my ‘code behind’ in my XAML based app to move it into this MainScript.cs file where it performs pretty much the same function as it did in the XAML based app – getting frames from the webcam, passing them to the model for evaluation and then displaying the results. That code now looks like;

using System;
using System.Linq;
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

#if ENABLE_WINMD_SUPPORT
using System.Threading.Tasks;
using Windows.Devices.Enumeration;
using Windows.Media.Capture;
using Windows.Media.Capture.Frames;
using Windows.Media.Devices;
using Windows.Storage;
using dachshunds.model;
using System.Diagnostics;
using System.Threading;
#endif // ENABLE_WINMD_SUPPORT

public class MainScript : MonoBehaviour
{
    public TextMesh textDisplay;

#if ENABLE_WINMD_SUPPORT
    public MainScript()
    {
        this.inputData = new DachshundModelInput();
        this.timer = new Stopwatch();
    }
    async void Start()
    {
        await this.LoadModelAsync();

        var device = await this.GetFirstBackPanelVideoCaptureAsync();

        if (device != null)
        {
            await this.CreateMediaCaptureAsync(device);

            await this.CreateMediaFrameReaderAsync();
            await this.frameReader.StartAsync();
        }
    }    
    async Task LoadModelAsync()
    {
        // Get the bits from Unity's resource system :-S
        var modelBits = Resources.Load(DACHSHUND_MODEL_NAME) as TextAsset;

        this.learningModel = await DachshundModel.CreateDachshundModel(
            modelBits.bytes);
    }
    async Task<DeviceInformation> GetFirstBackPanelVideoCaptureAsync()
    {
        var devices = await DeviceInformation.FindAllAsync(
            DeviceClass.VideoCapture);

        var device = devices.FirstOrDefault(
            d => d.EnclosureLocation.Panel == Windows.Devices.Enumeration.Panel.Back);

        return (device);
    }
    async Task CreateMediaFrameReaderAsync()
    {
        var frameSource = this.mediaCapture.FrameSources.Where(
            source => source.Value.Info.SourceKind == MediaFrameSourceKind.Color).First();

        this.frameReader =
            await this.mediaCapture.CreateFrameReaderAsync(frameSource.Value);

        this.frameReader.FrameArrived += OnFrameArrived;
    }

    async Task CreateMediaCaptureAsync(DeviceInformation device)
    {
        this.mediaCapture = new MediaCapture();

        await this.mediaCapture.InitializeAsync(
            new MediaCaptureInitializationSettings()
            {
                VideoDeviceId = device.Id
            }
        );
        // Try and set auto focus but on the Surface Pro 3 I'm running on, this
        // won't work.
        if (this.mediaCapture.VideoDeviceController.FocusControl.Supported)
        {
            await this.mediaCapture.VideoDeviceController.FocusControl.SetPresetAsync(FocusPreset.AutoNormal);
        }
        else
        {
            // Nor this.
            this.mediaCapture.VideoDeviceController.Focus.TrySetAuto(true);
        }
    }

    async void OnFrameArrived(MediaFrameReader sender, MediaFrameArrivedEventArgs args)
    {
        if (Interlocked.CompareExchange(ref this.processingFlag, 1, 0) == 0)
        {
            try
            {
                using (var frame = sender.TryAcquireLatestFrame())
                using (var videoFrame = frame.VideoMediaFrame?.GetVideoFrame())
                {
                    if (videoFrame != null)
                    {
                        // From the description (both visible in Python and through the
                        // properties of the model that I can interrogate with code at
                        // runtime here) my image seems to be 227 by 227 which is an
                        // odd size but I'm assuming the underlying pieces do that work
                        // for me.
                        // If you've read the blog post, I took out the conditional
                        // code which attempted to resize the frame as it seemed
                        // unnecessary and confused the issue!
                        this.inputData.data = videoFrame;

                        this.timer.Start();
                        var evalOutput = await this.learningModel.EvaluateAsync(this.inputData);
                        this.timer.Stop();
                        this.frameCount++;

                        await this.ProcessOutputAsync(evalOutput);
                    }
                }
            }
            finally
            {
                Interlocked.Exchange(ref this.processingFlag, 0);
            }
        }
    }
    string BuildOutputString(DachshundModelOutput evalOutput, string key)
    {
        var result = "no";

        if (evalOutput.loss[key] > 0.25f)
        {
            result = $"{evalOutput.loss[key]:N2}";
        }
        return (result);
    }
    async Task ProcessOutputAsync(DachshundModelOutput evalOutput)
    {
        string category = evalOutput.classLabel.FirstOrDefault() ?? "none";
        string dog = $"{BuildOutputString(evalOutput, "dog")}";
        string pony = $"{BuildOutputString(evalOutput, "pony")}";

        // NB: Spelling mistake is built into model!
        string dachshund = $"{BuildOutputString(evalOutput, "daschund")}";
        string averageFrameDuration =
            this.frameCount == 0 ? "n/a" : $"{(this.timer.ElapsedMilliseconds / this.frameCount):N0}";

        UnityEngine.WSA.Application.InvokeOnAppThread(
            () =>
            {
                this.textDisplay.text = 
                    $"dachshund {dachshund} dog {dog} pony {pony}\navg time {averageFrameDuration}";
            },
            false
        );
    }
    DachshundModelInput inputData;
    int processingFlag;
    MediaFrameReader frameReader;
    MediaCapture mediaCapture;
    DachshundModel learningModel;
    Stopwatch timer;
    int frameCount;
    static readonly string DACHSHUND_MODEL_NAME = "dachshunds"; // .bytes file in Unity

#endif // ENABLE_WINMD_SUPPORT
}

While experimenting with this code, it certainly occurred to me that I could move it to more of a “pull” model inside of Unity by trying to grab frames in an Update() method rather than doing the work separately and then pushing the results back to the app thread. It also occurred to me that the code is very single threaded and simply drops frames if it is ‘busy’, whereas it could be smarter and process them on some other thread, including perhaps a thread from the thread pool. There are lots of possibilities.
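For what it’s worth, the “still drop frames while busy, but evaluate on another thread” variant could look something like this sketch (Python with hypothetical names, standing in for the C# above);

```python
import threading

class FrameProcessor:
    """Runs evaluation on a worker thread; frames that arrive while an
    evaluation is in flight are dropped rather than queued."""

    def __init__(self, evaluate):
        self._evaluate = evaluate
        self._busy = threading.Semaphore(1)  # plays the processingFlag role

    def on_frame_arrived(self, frame):
        # Non-blocking acquire mirrors the Interlocked.CompareExchange
        # guard: if we're already evaluating, drop this frame.
        if not self._busy.acquire(blocking=False):
            return False
        def work():
            try:
                self._evaluate(frame)
            finally:
                self._busy.release()
        threading.Thread(target=work).start()
        return True
```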

In terms of displaying the results inside of Unity – I no longer need to display a preview from the webcam because my eyes are already seeing the same thing that the camera sees. That just leaves the challenge of displaying some text, so I added a 3D Text object into the scene and made it accessible via a public field that can be set up in the editor.

[Screenshot: the 3D Text object added to the Unity scene]

and the ScriptHolder there is just a place to put my MainScript and pass it this TextMesh to display text in;

[Screenshot: the ScriptHolder object in the editor with MainScript attached and the TextMesh field assigned]

and that’s pretty much it.

I still see a fairly low processing rate when running on the device and I haven’t yet looked at that but here’s some screenshots of me looking at photos from Bing search on my 2nd monitor while running the app on HoloLens.

In this case the device (on my head) is around 40cm from the 24 inch monitor and I’ve got the Bing search results displaying quite large and the model seems to do a decent job of spotting dachshunds…

[Screenshots: the app on HoloLens classifying dachshund photos displayed in Bing image search results]

and dogs in general (although it has only really been trained on alsatians so it knows that they are dogs but not dachshunds);

[Screenshot: the app classifying an alsatian photo as a dog but not a dachshund]

and for whatever reason that I can’t explain I also trained it on ponies so it’s quite good at spotting those;

[Screenshots: the app classifying pony photos]

This works pretty well for me. I need to revisit and take a look at whether I can improve the processing speed, and also at the problem that I flagged in my previous post around not being able to run a release build but, otherwise, it feels like progress.

The code is in the same repo as it was before – I just added a Unity project to the repo.

https://github.com/mtaulty/WindowsMLExperiment

Second Experiment with Image Classification on Windows ML from UWP (on HoloLens)


Following up from this earlier post;

First Experiment with Image Classification on Windows ML from UWP

around Windows ML;

AI Platform for Windows Developers

at the end of that previous post I’d said that I would be really keen to try the code that I’d written on HoloLens but, at the time, the required Windows 10 “Redstone 4” preview wasn’t available for HoloLens.

Things change quickly these days 😉 and just a few days later there’s a preview of “Redstone 4” available for HoloLens, documented here;

HoloLens RS4 Preview

and I followed the instructions there and very quickly had that preview operating system running on my HoloLens.

The first thing that I then wanted to do was to take the code that I’d written for that previous post around WindowsML and try it out on HoloLens even though it was a 2D XAML app rather than a 3D immersive app.

My hope was that it would “just work”. Did it?

No, of course not – it’s software 🙂

I ran the code inside of Visual Studio and immediately got;

crash

Oh dear. But… I suspected that this might be because I had originally built the app with Windows 10 SDK Preview version 17110, and perhaps that wasn’t going to work so well on a device now running a 17123.* build number.

So, I went back to the Windows Insider site and downloaded the Preview SDK labelled 10.0.17125.1000 to see if that changed things for me. I retargeted my application in Visual Studio, setting its target build to 17125 and its minimum build to 16299, before doing a complete rebuild and redeploy.

I had to set the minimum build to something below 17123 as that is what the device is now running.
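For a standard UWP C# project, that retargeting shows up in the .csproj roughly as below – this is just the relevant fragment of the usual UWP project boilerplate, with the version strings taken from the post:

```xml
<!-- Fragment of the UWP .csproj after retargeting in Visual Studio:
     build against the 17125 preview SDK, but allow deployment to devices
     running anything from the Fall Creators Update (16299) onwards,
     which covers the HoloLens at 17123.* -->
<PropertyGroup>
  <TargetPlatformVersion>10.0.17125.0</TargetPlatformVersion>
  <TargetPlatformMinVersion>10.0.16299.0</TargetPlatformMinVersion>
</PropertyGroup>
```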

Once again, I got the exact same error and so I set about trying to debug, and immediately noticed that my debugger wasn’t stepping nicely. That prompted me to notice, for the first time, that Visual Studio had automatically selected the release build configuration – and it jarred a memory. I had seen this exact same exception when trying to run in release mode on the PC when I first wrote the code, and I’d never figured it out, putting it down to perhaps something in the preview SDK.

So, perhaps HoloLens wasn’t behaving any differently from the PC here? I switched to the debug configuration and, sure enough, the code doesn’t hit that marshalling exception and runs fine, although I’m not yet sure about that ‘average time’ value that I’m calculating – that needs some looking into. Here’s a screenshot of the app staring at a picture of a dachshund;

image

The screenshot is a bit weird, partly because I cropped it out of a video recording and partly because I’m holding a picture of a dachshund up in front of the app, which is displaying the view from its own webcam – a view which contains the picture of the dachshund – so it all gets a little bit recursive.

Here’s the app looking at a picture of an alsatian;

image

and it’s a little less sure about this pony;

image

So, for a quick experiment this is great, in that I’ve taken the exact same code and the exact same model from the PC and it works ‘as is’ on these preview pieces on HoloLens 🙂 Clearly, I could do with taking a look at the time it seems to be taking to process frames, but I suspect that’s down to me running debug bits and/or the way in which I’m grabbing frames from the camera.

For me, though, it’s a bit of a challenge to have this 2D XAML app get in the way of what the camera is actually looking at. The next step would be to see whether I can put this into an immersive app rather than a 2D app, and that’s perhaps where I’d follow up with a later blog post.

For this post, the code is just where it was for the previous post – nothing has changed 🙂

By the way – I still don’t know what happens if I point the model at an actual dachshund/dog/pony; I need to get some of those for testing 😉 Additionally, I suspect that once the code is comfortable finding a particular object, the next question is likely to involve locating it in the 3D scene. That might involve some kind of correlation between the colour image and a depth image, and I’m not sure whether that’s achievable – I’d need to think about that.