NB: The usual blog disclaimer for this site applies to posts around HoloLens. I am not on the HoloLens team. I have no details on HoloLens other than what is on the public web and so what I post here is just from my own experience experimenting with pieces that are publicly available and you should always check out the official developer site for the product documentation.
This post falls mainly into the category of “just for fun” but since I first got an immersive Windows Mixed Reality headset (an Acer) I’ve been quite keen to set up a situation where I could track its position using my HoloLens.
I don’t really know why and I don’t know whether there’s a tangible use for this; I just wanted to experiment with it.
What do I mean by track? Here’s a video example to explain. Please keep in mind that this was captured using mixed reality capture on a HoloLens which means that the quality is much lower than the on-device experience would be;
Tracking with Multiple HoloLens Devices
In building shared holographic experiences between multiple HoloLens devices it’s not too tricky to have multiple HoloLens devices all in one scene, networked together with each device capable of displaying the position, orientation and gaze vector of the other devices or some other shared holograms.
For the purposes of this post, I’m using the term “track” to describe the ability of one HoloLens to know the position and orientation of another device but it’s my own term rather than some official one.
There’s a script in the Mixed Reality Toolkit named RemoteHeadManager which does some of this for you and in previous blog posts like this one I’ve shown examples of doing that as demonstrated in the picture below;
where you can see a HoloLens floating and displaying its gaze ray. In that particular example the participant was remote and so there’s no local human being attached to that HoloLens but, hopefully, you get the idea.
Being able to do this piece of magic ultimately comes down to being able to agree a common co-ordinate system between the multiple devices or at least a transformation from the co-ordinate system of one device to that of another.
When you first run an application on a HoloLens the starting device (or head) position is taken as the origin of the Unity scene (i.e. a Vector3(0,0,0)) with the X,Y,Z axes pointing to the right, up and forward in the natural way with respect to the device and/or the user’s head.
This means that if multiple HoloLens devices are present in a location then, unless they all run the application by being placed in the exact same physical start up spot, they are all going to have different positions in that location meaning that their origin point (0,0,0) will be in a different physical position and their X,Y,Z axes are likely to be pointing in different directions.
How to rationalise across these different co-ordinate systems in order to be able to display consistent content? The devices need to agree on something.
HoloLens sprinkles in some magic here because the device supports the idea of Spatial Anchors – a blob of data that represents a position and orientation in physical space.
The magic comes when you first learn that a HoloLens can export a spatial anchor, pass it over the network to another HoloLens and then that receiving device can attempt to import the same spatial anchor and locate it in the same space.
If that all works successfully (and generally it does) then the two devices now have an agreement about how a (position, rotation) within the room space is represented in their respective co-ordinate systems – this makes it “relatively easy” to consistently display objects.
A common way of then achieving that is to have each device maintain a GameObject locked to the position and orientation of the spatial anchor and then parent all content to be shared across devices from that GameObject such that all that content effectively has its origin and its axes determined by the anchored object.
This then means that e.g. a co-ordinate of (3,3,3) relative to the spatial anchored object on one device will show up in the same physical place in the world as a co-ordinate of (3,3,3) relative to the spatial anchored object on another device.
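To make the parenting idea concrete, here is a minimal sketch of the arithmetic in plain Python rather than Unity C#. The anchor poses, the function names and the restriction to rotation about the vertical axis are all assumptions made for illustration; in Unity the Transform hierarchy does this work for you once content is parented to the anchored GameObject.

```python
import math

def rotate_about_y(v, yaw_deg):
    """Rotate (x, y, z) about the vertical Y axis.

    Positive yaw turns forward (+Z) towards right (+X).
    """
    a = math.radians(yaw_deg)
    x, y, z = v
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))

def anchor_to_scene(anchor_pos, anchor_yaw_deg, local):
    """Map a point given relative to the anchor into a device's scene coordinates."""
    rotated = rotate_about_y(local, anchor_yaw_deg)
    return tuple(p + r for p, r in zip(anchor_pos, rotated))

def scene_to_anchor(anchor_pos, anchor_yaw_deg, scene):
    """Inverse mapping: a scene point expressed relative to the anchor."""
    offset = tuple(s - p for s, p in zip(scene, anchor_pos))
    return rotate_about_y(offset, -anchor_yaw_deg)

# Hypothetical poses: where each device happens to see the same physical
# anchor in its own scene coordinates (position, yaw in degrees).
ANCHOR_ON_A = ((1.0, 0.0, 2.0), 0.0)
ANCHOR_ON_B = ((-4.0, 0.0, 0.5), 90.0)

shared = (3.0, 3.0, 3.0)  # the co-ordinate that both devices agree on
on_a = anchor_to_scene(*ANCHOR_ON_A, shared)
on_b = anchor_to_scene(*ANCHOR_ON_B, shared)
print(on_a)
print(on_b)
```

The two printed scene co-ordinates differ because each device has its own origin and axes, yet both refer to the same physical spot, since converting either one back with `scene_to_anchor` gives the shared (3,3,3) again (up to floating point rounding).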
So, for HoloLens this is all good because of the magic of Spatial Anchors. What about an immersive headset?
Tracking with a HoloLens and an Immersive Headset
If you’ve looked at the immersive Mixed Reality headsets then you’ll know that they feature inside-out tracking and so it’s perhaps natural to assume that an application running on a PC displaying on an immersive headset would be able to import a spatial anchor from a HoloLens meaning that the code here would be the same as for the HoloLens scenario.
As far as I know, that’s not the case and I don’t believe it’s possible today to share a spatial anchor between an immersive headset and a HoloLens although I can’t quite find the definitive link that tells me that at the time of writing.
I’d be happy to be wrong here and it’d make the rest of the post redundant but that’d be a good thing.
Additionally, it’s relevant to consider that on an immersive headset the origin (0,0,0) and axis orientation (X,Y,Z) are not just determined by the place and direction that the headset is physically sitting at the point when the application first runs.
The documentation on coordinate systems explains the different scales of experience as being orientation, seated, standing, room and world and the different frames of reference that make these experiences possible.
One of these is the stage frame of reference where the origin is going to be on the floor of the room at the point that the user defined it when they set up their headset. So, for instance it’s perfectly possible for an app to start on an immersive headset at some position of (2,0.5,2) rather than at (0,0,0) as it would on HoloLens.
So, if I’ve got a HoloLens and an immersive headset operating in the same physical space then they almost certainly will have different origins within the space and differently aligned axes.
In order then for the HoloLens to somehow track the immersive headset in its own co-ordinate system, some form of manual means is going to be needed to agree on some common reference point that can be used to span co-ordinate systems.
Now, one way of doing this might be to use something like a Vuforia tag but the immersive headsets don’t have a web camera on them and so I’m not sure this would be feasible in the way that it would be on HoloLens.
With that in mind, I set about an approach of doing this manually along the lines of the following steps;
1. HoloLens app runs up and displays some marker object that can be positioned in physical space.
   1.1. The HoloLens app can then create an empty GameObject at this co-ordinate with the same orientation.
2. Immersive headset runs up and is physically moved to the same place as the HoloLens marker object with the same orientation.
   2.1. The immersive app can then be informed (e.g. via a voice command) to create an empty GameObject at this co-ordinate with the same orientation.
3. The immersive headset sends its subsequent camera positions over the network relative to the GameObject created at step 2.1 above.
4. The HoloLens headset can now reposition its marker object using the co-ordinates sent from the immersive headset relative to the game object created at step 1.1 above.
and, while quite “manual”, this seems to work out relatively nicely and the human being does the work of telling the devices how to align their respective co-ordinate systems.
It’s like spatial anchors for the generation who remember black and white TV.
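The manual alignment described in the steps above can be sketched end to end in a few lines. This is plain Python rather than the Unity C# used in the project, and it is only a sketch under assumptions: the `Frame` class, its method names and the example poses are hypothetical, and rotation is limited to yaw about the vertical axis to keep the math short, whereas a Unity Transform would handle full orientation.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    """A reference point: a position plus a heading (yaw about Y, in degrees)."""
    position: tuple
    yaw_deg: float

    @staticmethod
    def _rotate(v, yaw_deg):
        # Positive yaw turns forward (+Z) towards right (+X).
        a = math.radians(yaw_deg)
        x, y, z = v
        return (x * math.cos(a) + z * math.sin(a), y,
                -x * math.sin(a) + z * math.cos(a))

    def point_to_local(self, world):
        offset = tuple(w - p for w, p in zip(world, self.position))
        return self._rotate(offset, -self.yaw_deg)

    def point_to_world(self, local):
        rotated = self._rotate(local, self.yaw_deg)
        return tuple(p + r for p, r in zip(self.position, rotated))

    def direction_to_local(self, world_dir):
        return self._rotate(world_dir, -self.yaw_deg)

    def direction_to_world(self, local_dir):
        return self._rotate(local_dir, self.yaw_deg)

# HoloLens side: the pose at which the user placed the marker (hypothetical values).
hololens_ref = Frame((0.0, 0.0, 1.5), 0.0)

# Immersive side: the camera pose captured when the headset was held at the
# marker and told to mark the spot (hypothetical values in its own co-ordinates).
immersive_ref = Frame((2.0, 0.5, 2.0), 180.0)

# Later, the immersive wearer has walked 1m in the direction they faced when
# marking and has turned 90 degrees to their right.
camera_pos = (2.0, 0.5, 1.0)
camera_fwd = (-1.0, 0.0, 0.0)

# What the immersive app sends: its pose relative to its own reference frame.
sent_pos = immersive_ref.point_to_local(camera_pos)
sent_fwd = immersive_ref.direction_to_local(camera_fwd)

# What the HoloLens app does on receipt: re-express that in its own co-ordinates.
marker_pos = hololens_ref.point_to_world(sent_pos)
marker_fwd = hololens_ref.direction_to_world(sent_fwd)
print(marker_pos, marker_fwd)
```

Neither device ever learns the other's origin. Each one only converts between its own scene co-ordinates and the reference point that the human lined up by hand, which is why the accuracy depends entirely on how carefully the headset was placed at the marker.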
In terms of how that was put together…
Putting it Together
The implementation of this seems relatively simple. I made a new Unity project in Unity 2017.2.0f3, brought in the Mixed Reality Toolkit and set up my project using the provided dialogs for;
- Project Settings
- Scene Settings
- UWP Capabilities (including the microphone)
and so in the scene below, the only part that I created is the GameObject labelled Root with the rest coming from the toolkit dialogs;
From there, I wanted a shared experience and was happy to make use of the sharing server, so I brought in the SharingStage prefab from the toolkit and configured it with my local PC’s IP address. I also configured it to use the AutoJoinSessionAndRoom script from the toolkit so that it automatically joins a default session and room on connection;
Also on that same object is a script called Logic.cs which simply tries to enable (in a clunky way) one of two child objects named HoloLens and Immersive depending on which type of headset the code is running on;
From there, the HoloLens object looks like this;
and so it makes use of a modified version of the CustomMessages.cs script taken from the toolkit’s tests project and then also contains this HoloLensLogic.cs script which essentially;
- Creates the prefab representing the spectacles 1.5m in front of the user and locks them to their gaze (this is a cheap way of positioning them)
- Waits for a click event and then
  - Creates a new game object at the position where the spectacles are to be used as the parent representing that transform in space
  - Registers to receive broadcasts of the immersive headset position and forward vector
- On receipt of a broadcast
  - Updates the position of the spectacles (relative to the parent) to reflect the update from the remote immersive headset
On the immersive side, the game object is as below;
and so it also uses the same CustomMessages script, it also sets itself up to handle the speech keyword “mark” and has the ImmersiveLogic.cs script set up to provide that handling which;
- Waits for the speech keyword “mark” and then creates a GameObject to represent the position, orientation of the Camera in space at the point when that keyword is received.
- Once the common co-ordinate point has been defined, transmits its Camera position and forward vector relative to that GameObject over the network to the HoloLens on every Update().
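To give a feel for how little data each update carries, here is a hedged sketch of a payload holding the relative position and forward vector. The byte layout, the message id and the function names are invented for illustration. In the actual project the CustomMessages script and the sharing service take care of serialization, so this is not the toolkit's wire format.

```python
import struct

# Hypothetical layout: one byte for a message id followed by six
# little-endian 32-bit floats (position x, y, z then forward x, y, z).
POSE_MESSAGE_ID = 1
POSE_FORMAT = "<B6f"

def encode_pose(relative_position, relative_forward):
    """Pack a pose (relative to the marked reference object) into bytes."""
    return struct.pack(POSE_FORMAT, POSE_MESSAGE_ID,
                       *relative_position, *relative_forward)

def decode_pose(payload):
    """Unpack bytes produced by encode_pose into (position, forward)."""
    message_id, px, py, pz, fx, fy, fz = struct.unpack(POSE_FORMAT, payload)
    if message_id != POSE_MESSAGE_ID:
        raise ValueError(f"unexpected message id {message_id}")
    return (px, py, pz), (fx, fy, fz)

# Sender (immersive headset), once per update:
payload = encode_pose((0.0, 0.25, 1.5), (1.0, 0.0, 0.0))

# Receiver (HoloLens), on each broadcast:
position, forward = decode_pose(payload)
print(len(payload), position, forward)
```

With this layout each update is 25 bytes, so even sending one per frame is a small amount of traffic. Single-precision floats are used here on the assumption that they are precise enough for room-scale positions.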
That’s pretty much it – nothing too complicated once I’d decided on an approach although it took me a little while to figure things out initially and I learned a couple of things during the process.
As I said at the start of the post, this was “just for fun” and I’m not yet decided on the use cases for establishing a common co-ordinate system across HoloLens/immersive but something in me would like to take it one step further and add the code to make the immersive headset display the actual position of the HoloLens in its surroundings even if that doesn’t necessarily 100% make sense in an immersive environment.
Maybe I could also then add some code to create other holograms consistently positioned across the two devices. I’m not sure what it would ‘feel’ like to position an object in the real world with HoloLens and then to don an immersive headset and have that object appear “in the same location” given that I couldn’t see that location! Perhaps I need to try.
I may update the code to do that at a later point – in the meantime, it’s all over here on GitHub.