One of the new areas of developer functionality that came along with Windows 10 was the new set of AudioGraph APIs.
There’s a good write-up of what these are all about over on MSDN;
I’ve only wandered into these APIs twice to date but I’ve found them to be really approachable and easy to use from the point of view of someone who doesn’t know a whole lot about mixing, routing and processing audio.
I first came across them when I wanted to call the Microsoft Cognitive Services API for speaker identification and verification.
As far as I know, there’s no UWP client library that makes those APIs ‘easy’ to call: they demand that you record PCM audio in a particular format on the client and send it to the server. So I had to reach into the AudioGraph APIs to get that audio recorded from a UWP app, and I wrote that up in these posts;
and I also tidied up that code and built it into a partial wrapper for the Cognitive Services speaker verification & identification APIs that I put onto GitHub here;
and, specifically, it’s this class that provides basic functionality to record a particular length of audio into a temporary file.
I wandered back into the APIs in the last week as I got a mail out of the blue from a developer who’d seen that earlier post and wondered whether I could help him get unblocked around his scenario of combining two WAV files together (i.e. overlaying them) and outputting into another WAV file.
I had no idea quite how to do that when the mail hit my inbox, but I thought it would be interesting to go back into AudioGraph to see if I could figure it out. I found it ‘fun’ and ‘interesting’ enough to want to write up and share here, as I don’t find a lot of AudioGraph articles when I search the web right now.
How to combine two WAV files into another WAV file? I thought I might try to generalise and combine N audio files (WAV or otherwise) into a single audio output file (WAV or otherwise).
I sketched out a quick XAML ‘UI’ to let me test that out;
<Page
    x:Class="App14.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:App14"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition />
            <RowDefinition Height="3*" />
            <RowDefinition />
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition />
            <ColumnDefinition Width="3*" />
            <ColumnDefinition />
        </Grid.ColumnDefinitions>
        <Grid Background="Silver" Padding="8" Grid.Row="1" Grid.Column="1">
            <Grid.RowDefinitions>
                <RowDefinition />
                <RowDefinition Height="Auto" />
                <RowDefinition Height="Auto" />
            </Grid.RowDefinitions>
            <ListView Background="Black" ItemsSource="{Binding InputFiles}" Header="Input Files">
                <ListView.ItemTemplate>
                    <DataTemplate>
                        <TextBlock Text="{Binding Path}" />
                    </DataTemplate>
                </ListView.ItemTemplate>
            </ListView>
            <TextBox Grid.Row="1" IsReadOnly="True" Header="Output File" Text="{Binding OutputFile.Path}" />
            <StackPanel Orientation="Horizontal" HorizontalAlignment="Right" Grid.Row="2">
                <AppBarButton Label="Add Input File" Icon="Add" Click="OnAddInputFile" />
                <AppBarButton Label="Set Output File" Icon="Save" Click="OnSetOutputFile" />
                <AppBarButton Label="Mix" Icon="Play" Click="OnMix" />
            </StackPanel>
            <Grid x:Name="gridMixing" Background="Black" Grid.RowSpan="3" Visibility="Collapsed">
                <StackPanel HorizontalAlignment="Center" VerticalAlignment="Center">
                    <ProgressRing Foreground="White" Width="50" Height="50" IsActive="True" />
                    <TextBlock Margin="4" Text="Mixing..." />
                </StackPanel>
            </Grid>
        </Grid>
    </Grid>
</Page>
which just presents a list of the input files that have been added, a read-only box for the output file, and buttons to add an input file, set the output file and kick off the mix.
With that in place, I wrote a little code-behind modelled around a class that I called AudioGraphFileMixer (naming is hard) and that code-behind looks like this;
namespace App14
{
    using System;
    using System.Collections.ObjectModel;
    using System.ComponentModel;
    using System.Runtime.CompilerServices;
    using System.Threading.Tasks;
    using Windows.Media.MediaProperties;
    using Windows.Storage;
    using Windows.UI.Popups;
    using Windows.UI.Xaml;
    using Windows.UI.Xaml.Controls;

    public sealed partial class MainPage : Page, INotifyPropertyChanged
    {
        public event PropertyChangedEventHandler PropertyChanged;

        public MainPage()
        {
            this.InitializeComponent();
            this.inputFiles = new ObservableCollection<StorageFile>();
            this.audioCombiner = new AudioGraphFileMixer();
            this.Loaded += OnLoaded;
        }
        void OnLoaded(object sender, RoutedEventArgs e)
        {
            // The page is its own view model for the bindings in the XAML.
            this.DataContext = this;
        }
        public ObservableCollection<StorageFile> InputFiles
        {
            get
            {
                return (this.inputFiles);
            }
        }
        public StorageFile OutputFile
        {
            get
            {
                return (this.outputFile);
            }
            set
            {
                this.SetProperty(ref this.outputFile, value);
            }
        }
        async void OnAddInputFile(object sender, RoutedEventArgs e)
        {
            var file = await FileDialogExtensions.PickFileForReadAsync(
                ".mp3", ".wav");

            if (file != null)
            {
                this.inputFiles.Add(file);
                this.audioCombiner.AddInputFile(file);
            }
        }
        async void OnSetOutputFile(object sender, RoutedEventArgs e)
        {
            var file = await FileDialogExtensions.PickFileForSaveAsync(
                "Audio File", ".wav", "mix");

            if (file != null)
            {
                this.OutputFile = file;
            }
        }
        async void OnMix(object sender, RoutedEventArgs e)
        {
            this.gridMixing.Visibility = Visibility.Visible;

            if ((this.inputFiles.Count > 0) && (this.outputFile != null))
            {
                try
                {
                    // The save picker above only offers .wav, so encode as WAV
                    // to match. Change the extension and the profile together
                    // if you want a different output format.
                    await this.audioCombiner.CombineFilesAsync(
                        this.outputFile,
                        MediaEncodingProfile.CreateWav(AudioEncodingQuality.High));
                }
                catch (Exception ex)
                {
                    await this.DisplayMessageAsync(ex.Message);
                }
            }
            this.gridMixing.Visibility = Visibility.Collapsed;
        }
        bool SetProperty<T>(
            ref T storage,
            T value,
            [CallerMemberName] String propertyName = null)
        {
            if (object.Equals(storage, value))
            {
                return false;
            }
            storage = value;

            // Null-conditional on the event itself: there may be no subscribers.
            this.PropertyChanged?.Invoke(
                this, new PropertyChangedEventArgs(propertyName));

            return true;
        }
        async Task DisplayMessageAsync(string message)
        {
            var dialog = new MessageDialog(message, "woops");
            await dialog.ShowAsync();
        }
        StorageFile outputFile;
        AudioGraphFileMixer audioCombiner;
        ObservableCollection<StorageFile> inputFiles;
    }
}
and then the AudioGraphFileMixer class itself ended up looking like this (as an aside, I don’t think it’d be very hard to generalise this much further to take away the ‘file’ specific part of it);
namespace App14
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;
    using Windows.Media.Audio;
    using Windows.Media.MediaProperties;
    using Windows.Media.Render;
    using Windows.Storage;

    class AudioGraphFileMixer
    {
        public AudioGraphFileMixer()
        {
        }
        public void AddInputFile(StorageFile file)
        {
            if (this.inputAudioFiles == null)
            {
                this.inputAudioFiles = new List<StorageFile>();
            }
            this.inputAudioFiles.Add(file);
        }
        public async Task CombineFilesAsync(
            StorageFile outputFile,
            MediaEncodingProfile encodingProfile)
        {
            // Create our graph - its lifetime is really just for the 'duration'
            // of this async method call.
            var creationResult = await AudioGraph.CreateAsync(
                new AudioGraphSettings(AudioRenderCategory.Media));

            if (creationResult.Status != AudioGraphCreationStatus.Success)
            {
                throw new Exception("Failed to create graph");
            }
            this.audioGraph = creationResult.Graph;

            // go through each input file in our list and create an audio graph
            // file input node for it building up an enumeration of tasks that
            // we can 'wait' for.
            var inputFileTasks = this.inputAudioFiles.Select(
                file => this.audioGraph.CreateFileInputNodeAsync(file).AsTask());

            // wait for all of those to complete.
            var results = await Task.WhenAll(inputFileTasks);

            if (results.Any(r => r.Status != AudioFileNodeCreationStatus.Success))
            {
                throw new Exception("Failed to create one or more input file nodes");
            }

            // the node that's going to mix all those together for us.
            var submixNode = this.audioGraph.CreateSubmixNode();

            foreach (var result in results)
            {
                // connect the input into the submix
                result.FileInputNode.AddOutgoingConnection(submixNode);

                // we need to know when the inputs complete.
                result.FileInputNode.FileCompleted += OnFileInputCompleted;
            }

            //// Bit of a hack here - I find that if the output file has content
            //// then creating a file output node over it seems to fail. hence
            //// truncate it.
            //using (var stream = await outputFile.OpenAsync(FileAccessMode.ReadWrite))
            //{
            //    stream.Size = 0;
            //    await stream.FlushAsync();
            //}

            // the output file node.
            var outputNode = await this.audioGraph.CreateFileOutputNodeAsync(
                outputFile, encodingProfile);

            if (outputNode.Status != AudioFileNodeCreationStatus.Success)
            {
                throw new Exception("Failed to create output node");
            }
            // the submix sends its output to the output file.
            submixNode.AddOutgoingConnection(outputNode.FileOutputNode);

            // our means of 'communicating' with the OnFileInputCompleted handler
            // which will decrement this count as files complete.
            this.remainingFiles = this.inputAudioFiles.Count;
            this.audioCompleted = new TaskCompletionSource<bool>();

            // set the graph going, pulling from the input, mixing and sending
            // to the output.
            this.audioGraph.Start();

            // wait for all the inputs to complete.
            await this.audioCompleted.Task;

            this.audioGraph.Stop();

            await outputNode.FileOutputNode.FinalizeAsync();

            // I'm a bit unsure on what needs disposing here but the output node
            // can't be disposed at this point - it throws whereas all these
            // input nodes and the graph itself can be disposed.
            // Note: walk the 'results' we already awaited. The Select() above
            // is lazy, so enumerating inputFileTasks a second time would
            // create a whole new set of input nodes.
            foreach (var result in results)
            {
                result.FileInputNode.Dispose();
            }
            this.audioGraph.Dispose();
            this.audioGraph = null;
        }
        void OnFileInputCompleted(AudioFileInputNode sender, object args)
        {
            // The last input to finish releases the awaiting CombineFilesAsync.
            if (Interlocked.Decrement(ref this.remainingFiles) == 0)
            {
                this.audioCompleted.SetResult(true);
            }
        }
        TaskCompletionSource<bool> audioCompleted;
        int remainingFiles;
        AudioGraph audioGraph;
        List<StorageFile> inputAudioFiles;
    }
}
and the only remaining piece is the little class that I sometimes use to make the file pickers in UWP take up less code;
namespace App14
{
    using System;
    using System.Threading.Tasks;
    using Windows.Storage;
    using Windows.Storage.Pickers;

    static class FileDialogExtensions
    {
        public static async Task<StorageFile> PickFileForReadAsync(
            params string[] fileExtensions)
        {
            var picker = new FileOpenPicker();

            picker.SuggestedStartLocation = PickerLocationId.Desktop;

            foreach (var fileExtension in fileExtensions)
            {
                picker.FileTypeFilter.Add(fileExtension);
            }
            var file = await picker.PickSingleFileAsync();

            return (file);
        }
        public static async Task<StorageFile> PickFileForSaveAsync(
            string typeOfFile,
            string typeOfFileExtension,
            string suggestedName)
        {
            var picker = new FileSavePicker();

            picker.FileTypeChoices.Add(
                typeOfFile, new string[] { typeOfFileExtension });

            picker.SuggestedFileName = suggestedName;
            picker.SuggestedStartLocation = PickerLocationId.Desktop;

            var file = await picker.PickSaveFileAsync();

            return (file);
        }
    }
}
and that seems to work quite nicely in terms of mixing together audio files (insofar as I’ve tried it to date) so, if it’s of use to you, feel free to take it and build on it.
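The one piece of the mixer that I could imagine reusing elsewhere is the ‘count the completions down to zero’ trick in OnFileInputCompleted. Pulled out on its own it might look something like the sketch below. To be clear, CompletionCountdown is a name I’ve made up for illustration and isn’t part of the AudioGraph APIs, and using TrySetResult rather than SetResult is just my own cautious choice.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of AudioGraph): completes a Task once
// Signal() has been called the expected number of times.
public sealed class CompletionCountdown
{
    private readonly TaskCompletionSource<bool> completed =
        new TaskCompletionSource<bool>();

    private int remaining;

    public CompletionCountdown(int count)
    {
        if (count <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }
        this.remaining = count;
    }

    // Await this to wait for every expected signal to arrive.
    public Task Completed => this.completed.Task;

    // Safe to call from any thread, e.g. from an event handler.
    public void Signal()
    {
        // Interlocked keeps the decrement atomic across threads, so only
        // one caller ever sees the count hit exactly zero.
        if (Interlocked.Decrement(ref this.remaining) == 0)
        {
            this.completed.TrySetResult(true);
        }
    }
}
```

In CombineFilesAsync this would stand in for the remainingFiles and audioCompleted fields: create it with the number of input files, call Signal() from the FileCompleted handler and await Completed after starting the graph.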
As the code comments say – one of the areas where I felt a little unsure here is around the disposing of some of those objects – I need to dig a little deeper there.
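One direction I might explore there, purely as a sketch, is collecting the disposable pieces as they get created and disposing them together when the method exits, whether it finishes or throws. DisposableBag below is a hypothetical helper of my own invention, and I’m assuming (without having confirmed it for AudioGraph) that disposing in reverse order of creation is a sensible default.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers disposables as they are created and
// disposes them in reverse order of registration.
public sealed class DisposableBag : IDisposable
{
    private readonly Stack<IDisposable> items = new Stack<IDisposable>();

    // Registers the item and hands it straight back, so that it can wrap
    // the expression that creates the object.
    public T Add<T>(T item) where T : IDisposable
    {
        this.items.Push(item);
        return item;
    }

    public void Dispose()
    {
        // Last in, first out. Note that if one Dispose() throws, the
        // items still on the stack are not disposed by this sketch.
        while (this.items.Count > 0)
        {
            this.items.Pop().Dispose();
        }
    }
}
```

The idea would be to wrap the body of CombineFilesAsync in a using block over one of these, adding the graph first and then each file input node as it is created, and leaving the output node out given that disposing it throws for me.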