Mike Taulty's Blog
Bits and Bytes from Microsoft UK
Silverlight 4 Rough Notes: Camera and Microphone Support


Note – these posts are put together after a short time with Silverlight 4 as a way of providing pointers to some of the new features that Silverlight 4 has to offer. I’m posting these from the PDC as Silverlight 4 is announced for the first time so please bear that in mind when working through these posts.

One of the things that was announced early on about Silverlight 4 was that it would have support for capture from web cams and microphones.

If you right-click on a Silverlight 4 control you’ll find a new tab;

[screenshot: the new tab for choosing default video and audio capture devices]

where you can specify which video and audio sources on your machine are the default for Silverlight to use ( naturally, this will look different on OS X ).

There are then new classes in System.Windows.Media – I put them onto a diagram as a way of trying to understand how they fit together and it helped me a bit so I’ve reproduced that here;

[diagram: the new capture classes in System.Windows.Media]

If I want to get a picture of what devices I’ve got that support capture then I can go ahead and build a UI something like;

<UserControl
    x:Class="SilverlightApplication35.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:dg="clr-namespace:System.Windows.Controls;assembly=System.Windows.Controls.Data"
    mc:Ignorable="d"
    d:DesignHeight="300"
    d:DesignWidth="400">

    <Grid
        x:Name="LayoutRoot"
        Background="White">
        <Grid.RowDefinitions>
            <RowDefinition Height="Auto"/>
            <RowDefinition />
        </Grid.RowDefinitions>
        <ComboBox
            x:Name="comboDeviceType"
            ItemsSource="{Binding}"
            DisplayMemberPath="Name"
            Margin="5" />
        <dg:DataGrid
            Grid.Row="1"
            ItemsSource="{Binding ElementName=comboDeviceType,Path=SelectedValue.CaptureDevices}"
            AutoGenerateColumns="false">
            <dg:DataGrid.Columns>
                <dg:DataGridTextColumn
                    Header="Friendly Name"
                    Binding="{Binding FriendlyName}" />
                <dg:DataGridTemplateColumn>
                    <dg:DataGridTemplateColumn.CellTemplate>
                        <DataTemplate>
                            <dg:DataGrid
                                ItemsSource="{Binding SupportedFormats}">
                            </dg:DataGrid>
                        </DataTemplate>
                    </dg:DataGridTemplateColumn.CellTemplate>
                </dg:DataGridTemplateColumn>
                <dg:DataGridTextColumn
                    Header="Desired Format"
                    Binding="{Binding DesiredFormat}" />
                <dg:DataGridTextColumn
                    Header="Audio Frame Size"
                    Binding="{Binding AudioFrameSize}" />
                <dg:DataGridCheckBoxColumn
                    Header="Default Device"
                    Binding="{Binding IsDefaultDevice}" />
            </dg:DataGrid.Columns>
        </dg:DataGrid>
    </Grid>
</UserControl>

with a little code behind it;

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Animation;
using System.Windows.Shapes;
using System.Collections;

namespace SilverlightApplication35
{
  public partial class MainPage : UserControl
  {
    public MainPage()
    {
      InitializeComponent();

      this.Loaded += (s, e) =>
        {
          this.DataContext = new DeviceType[] 
          {
            new DeviceType() { Name = "Audio Devices", 
              CaptureDevices = (IEnumerable)CaptureDeviceConfiguration.GetAvailableAudioCaptureDevices() },

            new DeviceType() { Name = "Video Devices", 
              CaptureDevices = (IEnumerable)CaptureDeviceConfiguration.GetAvailableVideoCaptureDevices() }
          };
        };
    }
  }
  public class DeviceType
  {
    public string Name { get; set; }
    public IEnumerable CaptureDevices { get; set; }
  }
}

Note that “Audio Frame Size” doesn’t apply to video sources, but I left the column in as I was being lazy. That gives me a pretty ugly UI that enumerates the audio/video devices on my machine;

[screenshots: the enumerated audio and video capture devices]

So, it’s easy to figure out what audio/video devices you’ve got and what their capabilities are, and you can set DesiredFormat to one of the SupportedFormats for capture ( if, unlike me, you’ve got enough of a clue about audio/video formats to make an informed choice :-) ).

How do we go about getting some input from one of them? First, I need to ask if that’s going to be ok;

  if (!CaptureDeviceConfiguration.AllowedDeviceAccess)
  {
    bool ok = CaptureDeviceConfiguration.RequestDeviceAccess();

    if (ok)
    {
      // Access granted - safe to start capturing.
    }
  }

So… if we haven’t already been told that it’s ok to access these devices then we ask, and the user gets a consent dialog, which currently looks like this;

[screenshot: the device-access consent dialog]

and the code gets the return value from the Yes/No option as a boolean.

Having been granted access, we need to do something with the device. One of the easiest things to do is to snap a photo. With a little UI like this;

<UserControl
    x:Class="SilverlightApplication35.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:dg="clr-namespace:System.Windows.Controls;assembly=System.Windows.Controls.Data"
    mc:Ignorable="d"
    d:DesignHeight="300"
    d:DesignWidth="400">

    <Grid
        x:Name="LayoutRoot"
        Background="White">
        <Grid.RowDefinitions>
            <RowDefinition />
            <RowDefinition Height="Auto"/>
        </Grid.RowDefinitions>
        <Image
            x:Name="snapImage"
            Margin="10" 
            Stretch="Fill"/>
        <Button
            Grid.Row="1"
            Content="Snap"
            Margin="10"
            Click="OnSnap" />
    </Grid>
</UserControl>

and with a little code behind which asks for permission to use the devices and then sets up a new CaptureSource to use the default video device and then calls AsyncCaptureImage on it to grab an image;

 public partial class MainPage : UserControl
  {
    public MainPage()
    {
      InitializeComponent();
    }
    private void OnSnap(object sender, RoutedEventArgs e)
    {
      bool ok = CaptureDeviceConfiguration.AllowedDeviceAccess;

      if (!ok)
      {
        ok = CaptureDeviceConfiguration.RequestDeviceAccess();
      }

      if (ok)
      {
        CaptureSource cs = new CaptureSource()
        {
          VideoCaptureDevice = CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice()
        };
        cs.Start();
        cs.AsyncCaptureImage((bitmap) =>
          {
            Dispatcher.BeginInvoke(() =>
              {
                cs.Stop();
                snapImage.Source = bitmap;
              });
          });
      }
    }
  }

and I took a nice picture of my phone that way;

[photo: webcam snapshot of my phone]

Or if I want to build a quick UI that displays what’s currently coming from the web-cam then that’s pretty easy. Changing the UI slightly;

<UserControl
    x:Class="SilverlightApplication35.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:dg="clr-namespace:System.Windows.Controls;assembly=System.Windows.Controls.Data"
    mc:Ignorable="d"
    d:DesignHeight="300"
    d:DesignWidth="400">

    <Grid
        x:Name="LayoutRoot"
        Background="White">
        <Grid.RowDefinitions>
            <RowDefinition />
            <RowDefinition
                Height="Auto" />
        </Grid.RowDefinitions>
        <Rectangle
            x:Name="rectVideo"
            Stroke="Gray"
            StrokeThickness="2"
            HorizontalAlignment="Stretch"
            VerticalAlignment="Stretch"
            RadiusX="5"
            RadiusY="5"
            Margin="10">
        </Rectangle>
        <StackPanel
            Orientation="Horizontal"
            Grid.Row="1">
            <Button
                Content="Start"
                Margin="10"
                Click="OnStart" />
            <Button
                Content="Stop"
                Margin="10"
                Click="OnStop" /> 
        </StackPanel>
    </Grid>
</UserControl>

and the code behind uses a CaptureSource again, sets its VideoCaptureDevice, and then feeds the CaptureSource into a VideoBrush and paints a Rectangle with that Brush;

 public partial class MainPage : UserControl
  {
    public MainPage()
    {
      InitializeComponent();
    }
    private void OnStart(object sender, RoutedEventArgs e)
    {
      bool ok = CaptureDeviceConfiguration.AllowedDeviceAccess;

      if (!ok)
      {
        ok = CaptureDeviceConfiguration.RequestDeviceAccess();
      }

      if (ok)
      {
        if (source == null)
        {
          source = new CaptureSource()
          {
            VideoCaptureDevice = CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice()
          };
          VideoBrush brush = new VideoBrush();
          brush.SetSource(source);
          rectVideo.Fill = brush;
        }
        source.Start();
      }
    }
    void OnStop(object sender, RoutedEventArgs args)
    {
      source.Stop();
    }
    CaptureSource source;
  }

giving me a live view from my webcam;

[screenshot: live webcam video painted onto the Rectangle]

( I know, it looks like the previous screenshot but this one was video, honest :-) ).

But I think that if I want to get to the actual captured audio or video then I need to look to an AudioSink or VideoSink implementation to grab the sampled bits and do something with them ( e.g. stream them over a network ).

The way this looks to work is that you derive from AudioSink and do something with the captured bytes as they come along. What I decided to try first was to push the bytes into a MemoryStream, as in;

  /// <summary>
  /// This class is going to eat a tonne of memory...
  /// </summary>
  public class MemoryStreamAudioSink : AudioSink 
  {
    protected override void OnCaptureStarted()
    {
      stream = new MemoryStream();
    }
    protected override void OnCaptureStopped()
    {      
    }
    public AudioFormat AudioFormat
    {
      get
      {
        return (audioFormat);
      }
    }
    public MemoryStream AudioData
    {
      get
      {
        return (stream);
      }
    }
    protected override void OnFormatChange(AudioFormat audioFormat)
    {
      if (this.audioFormat == null)
      {
        this.audioFormat = audioFormat;
      }
      else
      {
        throw new InvalidOperationException();
      }
    }
    protected override void OnSamples(long sampleTime, long sampleDuration, byte[] sampleData)
    {
      stream.Write(sampleData, 0, sampleData.Length);
    }
    MemoryStream stream;
    AudioFormat audioFormat;
  }

so my particular AudioSink just takes all the audio and shoves it into a MemoryStream. It then exposes ( bad idea ) that MemoryStream as a property for a caller to grab hold of. It also makes a note of the AudioFormat that comes into OnFormatChange and exposes that as a property as well and it doesn’t allow it to change ( as that’d involve more complexity for me ).
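To put a rough number on “a tonne of memory”: uncompressed PCM piles up quickly. A quick back-of-the-envelope calculation ( in Python purely for illustration, assuming CD-quality 16-bit stereo at 44.1KHz – your device’s AudioFormat may well differ );

```python
# Rate at which raw PCM audio fills the MemoryStream.
# Assumes 16-bit samples, 2 channels, 44100 samples per second.
bytes_per_second = 44100 * 2 * 16 // 8   # sample rate * channels * bits, in bytes
one_minute = bytes_per_second * 60

print(bytes_per_second)   # 176400 bytes per second
print(one_minute)         # 10584000 bytes - around 10MB per minute
```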

With that in place, I can keep my existing UI and hijack the Start/Stop buttons that I’ve already got on screen in order to make use of my new MemoryStreamAudioSink class;

  public partial class MainPage : UserControl
  {
    public MainPage()
    {
      InitializeComponent();
    }
    private void OnStart(object sender, RoutedEventArgs e)
    {
      bool ok = CaptureDeviceConfiguration.AllowedDeviceAccess;

      if (!ok)
      {
        ok = CaptureDeviceConfiguration.RequestDeviceAccess();
      }

      if (ok)
      {
        if (audioSink == null)
        {
          CaptureSource source = new CaptureSource()
          {
            AudioCaptureDevice = CaptureDeviceConfiguration.GetDefaultAudioCaptureDevice()
          };
          audioSink = new MemoryStreamAudioSink();
          audioSink.CaptureSource = source;
        }
        audioSink.CaptureSource.Start();
      }
    }
    void OnStop(object sender, RoutedEventArgs args)
    {
      audioSink.CaptureSource.Stop();

      using (FileStream stream = File.OpenWrite(
        Environment.GetFolderPath(Environment.SpecialFolder.MyMusic) + "\\test.wav"))
      {
        byte[] wavFileHeader = WavFileHelper.GetWavFileHeader(audioSink.AudioData.Length,
          audioSink.AudioFormat);

        stream.Write(wavFileHeader, 0, wavFileHeader.Length);

        // Now write the rest of the data...
        byte[] buffer = new byte[4096];
        int read = 0;

        audioSink.AudioData.Seek(0, SeekOrigin.Begin);

        while ((read = audioSink.AudioData.Read(buffer, 0, buffer.Length)) > 0)
        {        
          stream.Write(buffer, 0, read);
        }
        stream.Flush();
        stream.Close();
      }
    }
    MemoryStreamAudioSink audioSink;
  }

so, now, when we click the Start button we grab the default audio capture device, set that as the CaptureSource of my new audioSink and then tell that CaptureSource to Start(). That looks to give me a call to OnFormatChange on my MemoryStreamAudioSink class and then subsequent calls to OnSamples as audio comes in.

In the Stop button handler we tell the MemoryStreamAudioSink.CaptureSource to Stop() and then I use the MemoryStream ( available in property AudioData ) and the AudioFormat ( available in property AudioFormat ) on my MemoryStreamAudioSink class in order to write the data out to a file in the MyMusic folder.

Note – in order to have access to that file, this application would have to run out-of-browser and elevated which has not been necessary up until now and is only for the file access.

In order to get the data out in a format that can be played, I tried to write it out as a WAV file and after a little reading around what those files look like I added the WavFileHelper class here to put together the right file header. That’s included below;

  // Caveat - I know nothing about WAV files, I just read the header spec
  // on the web
  public static class WavFileHelper
  {
    public static byte[] GetWavFileHeader(long audioLength,
      AudioFormat audioFormat)
    {
      // This code could use some constants...
      MemoryStream stream = new MemoryStream(44);

      // "RIFF"
      stream.Write(new byte[] { 0x52, 0x49, 0x46, 0x46 }, 0, 4);

      // Data length + 44 byte header length - 8 bytes occupied by first 2 fields
      stream.Write(BitConverter.GetBytes((UInt32)(audioLength + 44 - 8)), 0, 4);

      // "WAVE"
      stream.Write(new byte[] { 0x57, 0x41, 0x56, 0x45 }, 0, 4);

      // "fmt "
      stream.Write(new byte[] { 0x66, 0x6D, 0x74, 0x20 }, 0, 4);

      // Size of the "fmt " chunk - 16 bytes for PCM
      stream.Write(BitConverter.GetBytes((UInt32)16), 0, 4);

      // 1 == Uncompressed
      stream.Write(BitConverter.GetBytes((UInt16)1), 0, 2);

      // Channel count
      stream.Write(BitConverter.GetBytes((UInt16)audioFormat.Channels), 0, 2);

      // Sample rate
      stream.Write(BitConverter.GetBytes((UInt32)audioFormat.SamplesPerSecond), 0, 4); 

      // Byte rate
      stream.Write(BitConverter.GetBytes((UInt32)
          ((audioFormat.SamplesPerSecond *
          audioFormat.Channels * audioFormat.BitsPerSample) / 8)), 0, 4);

      // Block alignment
      stream.Write(BitConverter.GetBytes((UInt16)
          ((audioFormat.Channels * audioFormat.BitsPerSample) / 8)), 0, 2);

      // Bits per sample
      stream.Write(BitConverter.GetBytes((UInt16)audioFormat.BitsPerSample), 0, 2);

      // "data"
      stream.Write(new byte[] { 0x64, 0x61, 0x74, 0x61 }, 0, 4);

      // Length of the rest of the file
      stream.Write(BitConverter.GetBytes((UInt32)audioLength), 0, 4);

      // ToArray() copies exactly the bytes written ( GetBuffer() can return a larger internal buffer )
      return (stream.ToArray());
    }
  }
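As a sanity check on that header layout, here’s the same 44-byte header built in Python with struct.pack ( illustrative only – the field order and sizes mirror the C# above, and wav_header is just a name I made up );

```python
import struct

def wav_header(data_length, channels, samples_per_second, bits_per_sample):
    """Builds the canonical 44-byte PCM WAV header."""
    byte_rate = samples_per_second * channels * bits_per_sample // 8
    block_align = channels * bits_per_sample // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",   # little-endian throughout
        b"RIFF",
        data_length + 44 - 8,   # file length minus the first two fields
        b"WAVE",
        b"fmt ",
        16,                     # size of the "fmt " chunk for PCM
        1,                      # 1 == uncompressed PCM
        channels,
        samples_per_second,
        byte_rate,
        block_align,
        bits_per_sample,
        b"data",
        data_length)

header = wav_header(1000000, 2, 44100, 16)
```

Packing everything through one format string makes it easy to confirm the header really is 44 bytes with every field little-endian.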

and so with that in place, I can get audio captured from the microphone and store it into a WAV file. Naturally, I could just as easily send it over the network to some server for some other kind of processing or relay.

Video works in a similar way in that I can write a VideoSink and use that to capture the video as it comes in. A dummy VideoSink that puts everything into a MemoryStream ( even less practical for video ) might be;

  public class MemoryStreamVideoSink : VideoSink
  {
    public VideoFormat CapturedFormat { get; private set; }
    public MemoryStream CapturedVideo { get; private set; }

    protected override void OnCaptureStarted()
    {
      CapturedVideo = new MemoryStream();
    }
    protected override void OnCaptureStopped()
    {      
    }
    protected override void OnFormatChange(VideoFormat videoFormat)
    {
      if (CapturedFormat != null)
      {
        throw new InvalidOperationException("Can't cope with change!");
      }
      CapturedFormat = videoFormat; 
    }
    protected override void OnSample(long sampleTime, long frameDuration, byte[] sampleData)
    {
      CapturedVideo.Write(sampleData, 0, sampleData.Length);
    }
  }

so it’s basically the same class derived from a different base class, and then I could re-purpose my existing UI code-behind once again to do something like;

public partial class MainPage : UserControl
  {
    public MainPage()
    {
      InitializeComponent();
    }
    private void OnStart(object sender, RoutedEventArgs e)
    {
      bool ok = CaptureDeviceConfiguration.AllowedDeviceAccess;

      if (!ok)
      {
        ok = CaptureDeviceConfiguration.RequestDeviceAccess();
      }

      if (ok)
      {
        if (videoSink == null)
        {
          captureSource = new CaptureSource()
          {
            VideoCaptureDevice = CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice()
          };
          videoSink = new MemoryStreamVideoSink();
          videoSink.CaptureSource = captureSource;
        }
        videoSink.CaptureSource.Start();
      }
    }
    void OnStop(object sender, RoutedEventArgs args)
    {
      captureSource.Stop();

      // Do something with the captured bytes in videoSink.CapturedVideo...
      
    }
    CaptureSource captureSource;
    MemoryStreamVideoSink videoSink;
  }

Now, that all seems to work fine for me – note that there’s a difference here from the audio code. On the call to captureSource.Stop() I found that I got “Capture Source Is Not Stopped” exceptions, and I wondered whether that was because I was reaching into the VideoSink while it was still running to try and grab the CaptureSource and stop it. So, I kept a separate reference to the CaptureSource and that seemed to resolve it – I’m not at all sure about that one at the time of writing.

Beyond that, it looks like the audio example except that I haven’t written anything that attempts to store this data in a playable file. On my machine the VideoFormat that I get by default is Format32bppArgb at 640x480 and 30 frames per second, but I don’t know ( and haven’t looked at ) how I’d write that into a file that I can play with Media Player – I suspect that’d be more work so I’m leaving it for now, as I’d guess that the primary use for this functionality will not be to capture to local files but, instead, to capture video and send it asynchronously over the network somewhere…
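That guess is partly about sheer data volume – raw 32bpp frames are big. A rough calculation ( Python, purely illustrative );

```python
# Raw data rate for Format32bppArgb video at 640x480, 30 frames per second
bytes_per_frame = 640 * 480 * 4          # 4 bytes per ARGB pixel
bytes_per_second = bytes_per_frame * 30

print(bytes_per_frame)    # 1228800 bytes - about 1.2MB per frame
print(bytes_per_second)   # 36864000 bytes - about 35MB of raw video per second
```

which is why you’d want some form of compression before streaming it anywhere.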


Posted Wed, Nov 18 2009 10:44 AM by mtaulty

Comments

Ken Smith wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Fri, Nov 20 2009 12:15 AM

Thanks, Mike, this is darned handy.  Does MS have any direction yet with regard to how best to publish a video stream up to a server somewhere?  It seems like streaming the raw video probably isn't a great option, but I'm coming up short on any Silverlight-callable codecs that could be used to transform the straight byte stream from the video into something that could reasonably be streamed to a remote server (either for recording, say, or rebroadcast).  Any thoughts as of yet?

Jacob wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Fri, Nov 20 2009 2:23 AM

Hi

mtaulty

I have a question: how can I save both the audio and video streams as a WMV file?

And how would I use sockets to develop a net-meeting app?

Thank you!

Sergio wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Fri, Nov 20 2009 4:16 AM

Hi Mike,

Thanks for the post.

How can I get one stream with both video and audio from a CaptureSource? For example, to write it to a file.

Ken wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Fri, Nov 20 2009 9:40 AM

How do I save the captured audio to a server, so that I can run this in the browser?

Jason S. Clary wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Sun, Nov 22 2009 9:07 PM

It appears the video format is just 32bpp ARGB raw bitmap frames -- one per call to OnSample().  You can copy it straight into a WriteableBitmap.

Using a library like FJCore (with a minor alteration to stop it from writing the JFIF header) you could write out a compressed Motion-JPEG stream. M-JPEG is just concatenated raw JPEG images without the JFIF container.

It shouldn't be too terribly difficult to create the RIFF headers necessary to write that and the PCM audio stream into an AVI file.  Microsoft has decent docs on the AVI format.

I'm also looking for a PCM to ADPCM codec to get a little compression there as well. It's simple enough it shouldn't be too hard to write an encoder, though.

It should also be fairly easy to bundle up the samples in RTP packets and send them via RTSP to a streaming server.  M-JPEG bundling for RTP is defined in RFC2435.

The RTP packet format for PCM and ADPCM is defined in RFC3551.

M-JPEG and ADPCM aren't the best compression methods but they don't require as much CPU as many others and they are widely supported so it's a good place to start until MS gives us some native codecs.  

This being an early beta, I'm assuming they'll probably provide a way to stream to Windows Media Services in the go-live beta.

Richard wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Wed, Nov 25 2009 10:40 AM

Is there not an option to turn off camera/microphone access from the configuration? Getting the confirmation message every time an SL application runs would be quite annoying - even more so if the app loops on RequestDeviceAccess until you click OK!

Manuel wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Thu, Nov 26 2009 10:40 AM

@Jason S. Clary

How would you go about copying the byte[] into a WriteableBitmap?

I was considering a streaming scenario where you could capture the video and send it frame by frame. the problem for me is transforming bytes into readable frames and if I could create a writeablebitmap from a byte[] my problem would be solved.

Philipp wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Tue, Dec 1 2009 1:24 PM

I never see any consent dialog after calling CaptureDeviceConfiguration.RequestDeviceAccess() as part of the constructor or the Loaded delegate.

Is there a particular step missing maybe for showing the consent dialog?

Rad wrote re: Silverlight 4 Rough Notes: Camera and Microphone Support
on Thu, Dec 10 2009 11:23 PM

I can't save this wav file – the OpenWrite method throws an exception: "System.Security.SecurityException: [FileSecurityState_OperationNotPermitted]"

How to solve it?

thanks

Blog J.Schweiss wrote Touchless Video Control
on Sun, Apr 22 2012 7:55 PM

Touchless Video Control
