This post is a direct follow-on to an earlier post which I wrote just two weeks ago about how ‘Project Oxford’ has preview APIs that can be used to verify that a piece of recorded speech belongs to a particular (pre-registered) speaker.
Not long after that post, I noticed a few news articles about one of the major UK high street banks starting to adopt voice identification technology as part of its customer login process. I’ve linked to one of the articles that I spotted on the BBC website below:
Just to be clear, I’ve no idea what sort of technology is in use by HSBC, but it was very topical for me as I’d just recently been experimenting with those preview ‘Oxford’ APIs and trying out my own version of a similar thing, so it sparked my interest.
With that in mind, I put together the little follow-on sample below, which is meant to show:
- The idea of registering for voice verification by repeating a phrase three times and submitting those recordings to the cloud so that a voice profile can be built.
- The idea of then verifying a subsequent voice sample against the cloud in order to log in to a system.
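Under the covers, the ‘Oxford’ speaker verification service is a REST API, so the enrol/verify round trips above can be sketched roughly as below. To be clear, this is my own illustration rather than the sample’s code: the host name, the `/spid/v1.0` path, and the response shapes are assumptions based on the preview documentation of the time and may well have changed, and the subscription key is obviously a placeholder.

```python
# Rough sketch of the REST round trips behind the enrol/verify steps.
# ASSUMPTIONS: the host/region, the /spid/v1.0 path and the response
# shapes come from the preview docs of the time and may have changed.
import json
import urllib.request

BASE = "https://westus.api.cognitive.microsoft.com/spid/v1.0"  # assumed endpoint
SUBSCRIPTION_KEY = "YOUR-KEY-HERE"  # placeholder

def build_request(method, path, body, content_type):
    # Every call carries the subscription key header and an explicit verb.
    return urllib.request.Request(
        BASE + path,
        data=body,
        method=method,
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": content_type,
        })

def send(req):
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else {}

def create_profile():
    # Returns a JSON body containing the new profile's GUID.
    body = json.dumps({"locale": "en-US"}).encode("utf-8")
    return send(build_request(
        "POST", "/verificationProfiles", body, "application/json"))

def enroll(profile_id, wav_bytes):
    # Called three times, once per repetition of the chosen passphrase.
    return send(build_request(
        "POST", f"/verificationProfiles/{profile_id}/enroll",
        wav_bytes, "audio/wav"))

def verify(profile_id, wav_bytes):
    # The response indicates accept/reject plus a confidence level.
    return send(build_request(
        "POST", f"/verify?verificationProfileId={profile_id}",
        wav_bytes, "audio/wav"))
```

The nice property of the enrol-three-times design is that the service, not the client, decides when the profile has enough audio to be usable, so the client just keeps submitting clips until enrollment is reported as complete.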
This demo code uses the ‘Project Oxford’ preview APIs to do the speaker verification, but it also uses the Windows 10 UWP speech APIs on the client side to handle the simple voice commands needed to navigate through the process.
Clearly, the navigation here is a little clunky, but hopefully you get the idea of how it might work. Naturally, a bit more thought would need to be given to how a mechanism like this would avoid spoofing with recorded voices and so on, but this is just some simple demo code rather than an actual banking system 🙂
I’d flag that the association between the user’s account number and the GUID that identifies their online voice profile is stored locally in the app’s storage whereas, of course, in the real world you’d store this in the cloud somewhere.
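That local lookup is nothing more than a small key/value table keyed by account number. A minimal sketch of the idea is below; the real sample keeps this in UWP application storage, so the plain JSON file and its name here are just stand-ins for illustration.

```python
# Sketch of the local account-number -> voice-profile-GUID lookup.
# ASSUMPTION: the real sample uses UWP app storage; a plain JSON file
# (with an invented name) stands in for it here.
import json
import os

def save_profile_id(account_number, profile_guid, path="voiceProfiles.json"):
    # Read any existing table, add/overwrite this account's entry, write back.
    table = {}
    if os.path.exists(path):
        with open(path) as f:
            table = json.load(f)
    table[account_number] = profile_guid
    with open(path, "w") as f:
        json.dump(table, f)

def load_profile_id(account_number, path="voiceProfiles.json"):
    # Returns None for an account that has never enrolled.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f).get(account_number)
```

In a real system this mapping would live server-side against the customer record, not least because the user would expect to log in from more than one device.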