Following up on my previous post, I wanted to take the very basic test code that I’d got working ‘reasonably’ on UWP on my desktop PC and see if I could move it to HoloLens running inside of a Unity application.
The intention would be to preserve the very limited functionality that I have which goes something like;
- The app runs up, is given the details of the signalling service (from the PeerCC sample) to connect to, and then connects to it
- The app finds a peer on the signalling service and tries to get a two-way audio/video call going with that peer displaying local/remote video and capturing local audio while playing remote audio.
That’s what I currently have in the signalling branch here and the previous blog post was about abstracting some of that out such that I could use it in a different environment like Unity.
Now it’s time to see if that works out…
Getting Some Early Bits to Work With
In order to even think about this, I needed to pick up a version of UWP webRTC that works in that environment and has some extra pieces to help me out and, as far as I know, at the time of writing that means picking up the bits mentioned in this particular issue over on github by the UWP webRTC team;
and there are instructions in that post around how to get hold of some bits;
and so I followed those instructions and built the code from that branch of that repo.
From there, I’ve been working with my colleague Pete to put together some of those pieces with the pieces that I already had from the previous blog posts.
First, a quick look around the bits that the repo gives us…
Exploring the WebRtcUnity PeerCC Sample Solution
As is often the case, this process looks like it is going to involve standing on the shoulders of some other giants because there’s already code in the UWP webRTC repo that I pointed to above which shows how to put this type of app together.
The code in question is surfaced through this solution in the repo;
Inside of that solution, there’s a project which builds out the equivalent of the original XAML+MediaElement PeerCC sample but modified so that it doesn’t have to use MediaElement to render, and that shift in the code is represented by its additional Unity dependencies;
This confused me for a little while – I was wondering why this XAML-based application suddenly had a big dependency on Unity until I realised that, to show that media can be rendered by Unity, the original sample code has been modified such that (depending on the conditional compilation constant UNITY) the app can render its media streams in one of two ways (sketched in code just after this list);
- Using MediaElement as it did previously
- Using Unity rendering pieces which are then hosted inside of a SwapChainPanel inside of the XAML UI.
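Just to make that switch concrete, here’s a rough placeholder sketch of the pattern – the class, member and plugin names here are mine, not anything lifted from the sample itself;

using Windows.Media.Core;
#if !UNITY
using Windows.UI.Xaml.Controls;
#endif

// Placeholder sketch of the UNITY conditional compilation switch described above -
// VideoRenderer, NativePlugin and the member names are hypothetical stand-ins,
// not the sample's actual code.
class VideoRenderer
{
#if UNITY
    // Unity path: hand the stream to the native rendering pieces which draw into a
    // texture that ends up hosted inside a SwapChainPanel within the XAML UI.
    public void Render(MediaStreamSource source) => NativePlugin.LoadMediaStreamSource(source);
#else
    // Original path: a XAML MediaElement does the rendering as before.
    public MediaElement MediaElement { get; set; }

    public void Render(MediaStreamSource source) => this.MediaElement.SetMediaStreamSource(source);
#endif
}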
Now, I’ve failed to get this sample to run on my machine (which I think is down to the versions of Unity that I’m running) and so I had to pick through the code a little ‘cold’, but in so far as I can see there are a couple of subprojects involved in making this work…
The Org.WebRtc.Uwp Project
This project was already present in the original XAML-based solution and in my mind this is involved with wrapping some C++/CX code around the webrtc.lib library in order to bring types into a UWP environment. I haven’t done a delta to try and see how much/little is different in this branch of this project over the original sample so there may be differences.
The MediaEngineUWP and WebRtcScheme Projects
Then there are 2 projects within the Unity sample’s MediaEngine folder which I don’t think were present in the original, purely XAML-based PeerCC sample;
The MediaEngineUWP and WebRtcScheme projects build out DLLs which seem to take on a couple of roles. I’m more than willing to admit that I don’t have this all worked out in my head at the time of writing, but I think they are about bridging between the Unity platform, the Windows Media platform and webRTC, and I think they do this by;
- Building on the existing work in the Org.WebRtc.Uwp project which integrates webRTC pieces into the Windows UWP media pipeline. I think this is done by adding a webRTC VideoSinkInterface which then surfaces the webRTC pieces as the UWP IMediaSource and IMediaStreamSource types.
- The MediaEngineUWP.dll exporting a UnityPluginLoad function which grabs an IUnityGraphics, along with a number of other exports that can be called via PInvoke from Unity to set up the textures into which code inside this DLL renders the local/remote video frames.
- There’s a class in this project named MediaEnginePlayer which is instanced per video stream and which seems to do the work of grabbing frames from the incoming Windows media pipeline and transferring them into Unity textures.
- The same class looks to use the IMFMediaEngineNotify callback interface to be notified of state changes for the media stream and responds by playing/stopping etc.
How this MediaEnginePlayer gets wired into the media pipeline is a little opaque to me but I think that it follows what is documented here and under the topic Source Resolver here. It seems to involve the code associating a URL (of the form webrtc:GUID) with each IMediaStream and providing an activatable class which the media pipeline then invokes with that URL so as to link it up to the right instance of the player.
That may be a ‘much less than perfect’ description of what goes on in these projects as I haven’t stepped through all of that code.
What I think it does mean, though, is that the code inside of the WebRtcScheme project requires the .appxmanifest of any app that consumes it to include a section that looks like;
<Extensions>
  <Extension Category="windows.activatableClass.inProcessServer">
    <InProcessServer>
      <Path>WebRtcScheme.dll</Path>
      <ActivatableClass ActivatableClassId="WebRtcScheme.SchemeHandler" ThreadingModel="both" />
    </InProcessServer>
  </Extension>
</Extensions>
I don’t know of a way of setting this up inside a Unity project, so I ended up just letting Unity build the Visual Studio solution and then manually hacking the manifest to include this section.
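To avoid doing that hack by hand after every build, a Unity post-build editor script could probably patch the generated manifest. Here’s a rough, untested sketch – the project name is a placeholder, and I’m assuming the standard UWP manifest namespace and that a package-level Extensions element appended at the end of the document keeps the schema happy;

#if UNITY_EDITOR
using System.IO;
using System.Linq;
using System.Xml.Linq;
using UnityEditor;
using UnityEditor.Callbacks;

public static class ManifestPatcher
{
    // Placeholder - needs to match the name of the generated UWP project/folder.
    const string ProjectName = "MyProject";

    [PostProcessBuild]
    public static void OnPostprocessBuild(BuildTarget target, string pathToBuiltProject)
    {
        if (target != BuildTarget.WSAPlayer)
        {
            return;
        }
        var manifestPath = Path.Combine(pathToBuiltProject, ProjectName, "Package.appxmanifest");

        XNamespace ns = "http://schemas.microsoft.com/appx/manifest/foundation/windows10";
        var document = XDocument.Load(manifestPath);

        // Only add the WebRtcScheme in-process server registration if it isn't already there.
        var alreadyRegistered = document
            .Descendants(ns + "ActivatableClass")
            .Any(e => (string)e.Attribute("ActivatableClassId") == "WebRtcScheme.SchemeHandler");

        if (!alreadyRegistered)
        {
            // Assumption: appending the Extensions element at package level, after the
            // existing children, is acceptable to the manifest schema.
            document.Root.Add(
                new XElement(ns + "Extensions",
                    new XElement(ns + "Extension",
                        new XAttribute("Category", "windows.activatableClass.inProcessServer"),
                        new XElement(ns + "InProcessServer",
                            new XElement(ns + "Path", "WebRtcScheme.dll"),
                            new XElement(ns + "ActivatableClass",
                                new XAttribute("ActivatableClassId", "WebRtcScheme.SchemeHandler"),
                                new XAttribute("ThreadingModel", "both"))))));

            document.Save(manifestPath);
        }
    }
}
#endif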
Exploring the Video Control Solution
I looked into another project within that github repo – a Unity project contained within this folder;
There’s a Unity scene which has a (UI) Canvas and a couple of Unity Raw Image objects which can be used to render to;
and a Control script which is set up to PInvoke into the MediaEngineUWP to pass the pieces from the Unity environment into the DLL. That script looks like this;
using System;
using System.Runtime.InteropServices;
using UnityEngine;
using UnityEngine.UI;
#if !UNITY_EDITOR
using Org.WebRtc;
using Windows.Media.Core;
#endif

public class ControlScript : MonoBehaviour
{
    public uint LocalTextureWidth = 160;
    public uint LocalTextureHeight = 120;
    public uint RemoteTextureWidth = 640;
    public uint RemoteTextureHeight = 480;

    public RawImage LocalVideoImage;
    public RawImage RemoteVideoImage;

    void Awake() { }
    void Start() { }
    private void OnInitialized() { }
    private void OnEnable() { }
    private void OnDisable() { }
    void Update() { }

    public void CreateLocalMediaStreamSource(object track, string type, string id)
    {
        Plugin.CreateLocalMediaPlayback();

        IntPtr nativeTex = IntPtr.Zero;
        Plugin.GetLocalPrimaryTexture(LocalTextureWidth, LocalTextureHeight, out nativeTex);

        var primaryPlaybackTexture = Texture2D.CreateExternalTexture(
            (int)LocalTextureWidth, (int)LocalTextureHeight, TextureFormat.BGRA32, false, false, nativeTex);

        LocalVideoImage.texture = primaryPlaybackTexture;

#if !UNITY_EDITOR
        MediaVideoTrack videoTrack = (MediaVideoTrack)track;
        var source = Media.CreateMedia().CreateMediaStreamSource(videoTrack, type, id);
        Plugin.LoadLocalMediaStreamSource((MediaStreamSource)source);
        Plugin.LocalPlay();
#endif
    }

    public void DestroyLocalMediaStreamSource()
    {
        LocalVideoImage.texture = null;
        Plugin.ReleaseLocalMediaPlayback();
    }

    public void CreateRemoteMediaStreamSource(object track, string type, string id)
    {
        Plugin.CreateRemoteMediaPlayback();

        IntPtr nativeTex = IntPtr.Zero;
        Plugin.GetRemotePrimaryTexture(RemoteTextureWidth, RemoteTextureHeight, out nativeTex);

        var primaryPlaybackTexture = Texture2D.CreateExternalTexture(
            (int)RemoteTextureWidth, (int)RemoteTextureHeight, TextureFormat.BGRA32, false, false, nativeTex);

        RemoteVideoImage.texture = primaryPlaybackTexture;

#if !UNITY_EDITOR
        MediaVideoTrack videoTrack = (MediaVideoTrack)track;
        var source = Media.CreateMedia().CreateMediaStreamSource(videoTrack, type, id);
        Plugin.LoadRemoteMediaStreamSource((MediaStreamSource)source);
        Plugin.RemotePlay();
#endif
    }

    public void DestroyRemoteMediaStreamSource()
    {
        RemoteVideoImage.texture = null;
        Plugin.ReleaseRemoteMediaPlayback();
    }

    private static class Plugin
    {
        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "CreateLocalMediaPlayback")]
        internal static extern void CreateLocalMediaPlayback();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "CreateRemoteMediaPlayback")]
        internal static extern void CreateRemoteMediaPlayback();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "ReleaseLocalMediaPlayback")]
        internal static extern void ReleaseLocalMediaPlayback();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "ReleaseRemoteMediaPlayback")]
        internal static extern void ReleaseRemoteMediaPlayback();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "GetLocalPrimaryTexture")]
        internal static extern void GetLocalPrimaryTexture(UInt32 width, UInt32 height, out System.IntPtr playbackTexture);

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "GetRemotePrimaryTexture")]
        internal static extern void GetRemotePrimaryTexture(UInt32 width, UInt32 height, out System.IntPtr playbackTexture);

#if !UNITY_EDITOR
        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "LoadLocalMediaStreamSource")]
        internal static extern void LoadLocalMediaStreamSource(MediaStreamSource IMediaSourceHandler);

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "LoadRemoteMediaStreamSource")]
        internal static extern void LoadRemoteMediaStreamSource(MediaStreamSource IMediaSourceHandler);
#endif

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "LocalPlay")]
        internal static extern void LocalPlay();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "RemotePlay")]
        internal static extern void RemotePlay();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "LocalPause")]
        internal static extern void LocalPause();

        [DllImport("MediaEngineUWP", CallingConvention = CallingConvention.StdCall, EntryPoint = "RemotePause")]
        internal static extern void RemotePause();
    }
}
and so it’s essentially giving me the pieces that I need to wire up local/remote media streams coming from webRTC into the pieces that can render them in Unity.
It feels like, across these projects, there are the pieces needed to plug together with my basic library project in order to rebuild the app that I had in the previous blog post and have it run inside of a 3D Unity app rather than a 2D XAML app…
Plugging Together the Pieces
Pete put together a regular Unity project targeting UWP for HoloLens and in the scene at the moment we have only 2 quads that we try to render the local and remote video to.
and then there’s an empty GameObject named Control with a script on it configured as below;
and you can see that this configuration is being used to do a couple of things (there’s a rough sketch of the script’s shape just after this list);
- Set up the properties that my conversation library code from the previous blog post needed to try and start a conversation over webRTC
- The signalling server IP address, port number, whether to initiate a conversation or not and, if so, whether there’s a particular peer name to initiate that conversation with.
- Set up some properties that will facilitate rendering of the video into the materials texturing the 2 quads in the scene.
- Widths, heights to use.
- The GameObjects that we want to render our video streams to.
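For reference, the shape of that Control script is essentially a MonoBehaviour exposing those settings to the Unity editor – something along these lines, although the exact field names in Pete’s project may well differ;

using UnityEngine;

// Illustrative sketch only - the real Control script's field names may differ, but this is
// the shape of what gets configured in the Unity editor.
public class ConversationControl : MonoBehaviour
{
    [Header("Signalling - consumed by the conversation library")]
    public string SignallingServerIpAddress = "192.168.0.1";   // placeholder address
    public int SignallingServerPort = 8888;                    // placeholder port
    public bool IsInitiator = false;                           // whether to initiate the conversation
    public string PeerName = string.Empty;                     // optional: a specific peer to call

    [Header("Video rendering")]
    public uint LocalTextureWidth = 160;
    public uint LocalTextureHeight = 120;
    public uint RemoteTextureWidth = 640;
    public uint RemoteTextureHeight = 480;

    // The quads whose material textures the local/remote video streams get rendered into.
    public GameObject LocalVideoQuad;
    public GameObject RemoteVideoQuad;
}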
Pete re-worked the original sample code to render to the texture of a material applied to a quad rather than to a 2D RawImage as in the original.
Now, it’s fairly easy to then add my conversation library into this Unity project so that we can make use of that code. We simply drop it into the Assets of the project and configure the appropriate build settings for Unity;
and also drop in the MediaEngineUWP, Org.WebRtc and WebRtcScheme DLLs;
and the job then becomes one of adapting the code that I wrote in the previous blog post to suit the Unity environment, which means implementing the IMediaManager interface that I came up with for Unity rather than for XAML.
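For context, that IMediaManager interface looks something like this – reconstructed here from the implementation that follows rather than copied verbatim from the library, so the real definition may differ slightly;

#if ENABLE_WINMD_SUPPORT
using System.Threading.Tasks;
using Org.WebRtc;

// Reconstructed from the MediaManager implementation below - the real interface in the
// conversation library may differ slightly in member names or signatures.
public interface IMediaManager
{
    Media Media { get; }
    MediaStream UserMedia { get; }
    MediaVideoTrack RemoteVideoTrack { get; set; }

    Task CreateAsync(bool audioEnabled = true, bool videoEnabled = true);
    Task AddLocalStreamAsync(MediaStream stream);
    Task AddRemoteStreamAsync(MediaStream stream);
    void RemoveLocalStream();
    void RemoveRemoteStream();
    void Shutdown();
}
#endif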
How to go about that? Firstly, we took those PInvoke signatures from the VideoControlSample and put them into a separate static class named Plugin.
Secondly, we implemented that IMediaManager interface on top of the pieces that originated in the sample;
#if ENABLE_WINMD_SUPPORT
using ConversationLibrary.Interfaces;
using ConversationLibrary.Utility;
using Org.WebRtc;
using System;
using System.Linq;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.WSA;
using Windows.Media.Core;

public class MediaManager : IMediaManager
{
    // This constructor will be used by the cheap IoC container
    public MediaManager()
    {
        this.textureDetails = CheapContainer.Resolve<ITextureDetailsProvider>();
    }

    // The idea is that this constructor would be used by a real IoC container.
    public MediaManager(ITextureDetailsProvider textureDetails)
    {
        this.textureDetails = textureDetails;
    }

    public Media Media => this.media;

    public MediaStream UserMedia => this.userMedia;

    public MediaVideoTrack RemoteVideoTrack { get => remoteVideoTrack; set => remoteVideoTrack = value; }

    public async Task AddLocalStreamAsync(MediaStream stream)
    {
        var track = stream?.GetVideoTracks()?.FirstOrDefault();

        if (track != null)
        {
            // TODO: stop hardcoding I420?.
            this.InvokeOnUnityMainThread(
                () => this.CreateLocalMediaStreamSource(track, LOCAL_VIDEO_FRAME_FORMAT, "SELF"));
        }
    }

    public async Task AddRemoteStreamAsync(MediaStream stream)
    {
        var track = stream?.GetVideoTracks()?.FirstOrDefault();

        if (track != null)
        {
            // TODO: stop hardcoding I420?.
            this.InvokeOnUnityMainThread(
                () => this.CreateRemoteMediaStreamSource(track, REMOTE_VIDEO_FRAME_FORMAT, "PEER"));
        }
    }

    void InvokeOnUnityMainThread(AppCallbackItem callback)
    {
        UnityEngine.WSA.Application.InvokeOnAppThread(callback, false);
    }

    void InvokeOnUnityUIThread(AppCallbackItem callback)
    {
        UnityEngine.WSA.Application.InvokeOnUIThread(callback, false);
    }

    public async Task CreateAsync(bool audioEnabled = true, bool videoEnabled = true)
    {
        this.media = Media.CreateMedia();

        // TODO: for the moment, turning audio off as I get an access violation in
        // some piece of code that'll take some debugging.
        RTCMediaStreamConstraints constraints = new RTCMediaStreamConstraints()
        {
            // TODO: switch audio back on, fix the crash.
            audioEnabled = false,
            videoEnabled = true
        };
        this.userMedia = await media.GetUserMedia(constraints);
    }

    public void RemoveLocalStream()
    {
        // TODO: is this ever getting called?
        this.InvokeOnUnityMainThread(
            () => this.DestroyLocalMediaStreamSource());
    }

    public void RemoveRemoteStream()
    {
        this.DestroyRemoteMediaStreamSource();
    }

    public void Shutdown()
    {
        if (this.media != null)
        {
            if (this.localVideoTrack != null)
            {
                this.localVideoTrack.Dispose();
                this.localVideoTrack = null;
            }
            if (this.RemoteVideoTrack != null)
            {
                this.RemoteVideoTrack.Dispose();
                this.RemoteVideoTrack = null;
            }
            this.userMedia = null;
            this.media.Dispose();
            this.media = null;
        }
    }

    void CreateLocalMediaStreamSource(object track, string type, string id)
    {
        Plugin.CreateLocalMediaPlayback();

        IntPtr playbackTexture = IntPtr.Zero;

        Plugin.GetLocalPrimaryTexture(
            this.textureDetails.Details.LocalTextureWidth,
            this.textureDetails.Details.LocalTextureHeight,
            out playbackTexture);

        this.textureDetails.Details.LocalTexture.GetComponent<Renderer>().sharedMaterial.mainTexture =
            (Texture)Texture2D.CreateExternalTexture(
                (int)this.textureDetails.Details.LocalTextureWidth,
                (int)this.textureDetails.Details.LocalTextureHeight,
                (TextureFormat)14, false, false, playbackTexture);

#if ENABLE_WINMD_SUPPORT
        Plugin.LoadLocalMediaStreamSource(
            (MediaStreamSource)Org.WebRtc.Media.CreateMedia().CreateMediaStreamSource((MediaVideoTrack)track, type, id));
#endif
        Plugin.LocalPlay();
    }

    void DestroyLocalMediaStreamSource()
    {
        this.textureDetails.Details.LocalTexture.GetComponent<Renderer>().sharedMaterial.mainTexture = null;
        Plugin.ReleaseLocalMediaPlayback();
    }

    void CreateRemoteMediaStreamSource(object track, string type, string id)
    {
        Plugin.CreateRemoteMediaPlayback();

        IntPtr playbackTexture = IntPtr.Zero;

        Plugin.GetRemotePrimaryTexture(
            this.textureDetails.Details.RemoteTextureWidth,
            this.textureDetails.Details.RemoteTextureHeight,
            out playbackTexture);

        // NB: creating textures and calling GetComponent<> has thread affinity for Unity
        // in so far as I can tell.
        var texture = (Texture)Texture2D.CreateExternalTexture(
            (int)this.textureDetails.Details.RemoteTextureWidth,
            (int)this.textureDetails.Details.RemoteTextureHeight,
            (TextureFormat)14, false, false, playbackTexture);

        this.textureDetails.Details.RemoteTexture.GetComponent<Renderer>().sharedMaterial.mainTexture = texture;

#if ENABLE_WINMD_SUPPORT
        Plugin.LoadRemoteMediaStreamSource(
            (MediaStreamSource)Org.WebRtc.Media.CreateMedia().CreateMediaStreamSource((MediaVideoTrack)track, type, id));
#endif
        Plugin.RemotePlay();
    }

    void DestroyRemoteMediaStreamSource()
    {
        this.textureDetails.Details.RemoteTexture.GetComponent<Renderer>().sharedMaterial.mainTexture = null;
        Plugin.ReleaseRemoteMediaPlayback();
    }

    Media media;
    MediaStream userMedia;
    MediaVideoTrack remoteVideoTrack;
    MediaVideoTrack localVideoTrack;
    ITextureDetailsProvider textureDetails;

    // TODO: temporary hard coding...
    static readonly string LOCAL_VIDEO_FRAME_FORMAT = "I420";
    static readonly string REMOTE_VIDEO_FRAME_FORMAT = "H264";
}
#endif
Naturally, this is very “rough” code right now and there’s some hard-coding going on in there but it didn’t take too much effort to plug these pieces under that interface that I’d brought across from my original, minimal XAML-based project.
So…with all of that said…
Does It Work?
Sort of. Firstly, you might notice in the code above that audio is hard-coded to be switched off because we currently have a crash if we switch audio on – it’s the release of some smart pointer in the webRTC pieces that we haven’t yet tracked down.
Minus audio, it’s possible to run the Unity app here on HoloLens and have it connect via the sample-provided signalling service to the original XAML-based PeerCC sample running (e.g.) on my Surface Book, with video streams flowing and visible in both directions.
Here’s a screenshot of that “in action” from the point of view of the desktop app receiving the video stream from the HoloLens;
and that screenshot is displaying 4 things;
- Bottom right is the local PC’s video stream off its webcam – me wearing a HoloLens.
- Upper left 75% is the remote stream coming from the webcam on the HoloLens, including its holographic content, which currently includes;
- Upper left mid section is the remote video stream from the PC replayed on the HoloLens.
- Upper right mid section is the local HoloLens video stream replayed on the HoloLens, which seemed to have disappeared when I was taking this screenshot.
You might see some numbers in there that suggest 30fps but I think that was a temporary thing; at the time of writing the performance so far is fairly bad, but we’ve not had any look at what’s going on there just yet – this ‘play’ sample needs some more investigation.
Where’s the Code?
If you’re interested in following these experiments along as we go forward then the code is in a different location to the previous repo as it’s over here on Pete’s github account;
Feel free to feed back but, of course, apply the massive caveat that this is very rough experimentation at the moment – there’s a long way to go.