“The Future’s So Bright, I Gotta Wear…Glasses?”

With reference to Timbuk3 and, yes, I know it’s a cheap line, but it popped into my head the other day and won’t go away, so maybe writing it down will help.

“The Future’s So Bright” by Timbuk3

I’ve been wearing glasses for about 35 years, often swapping them for contact lenses, but it’s only in recent years, watching what’s happening with mobile computing, that I’ve fully realised we may be on a path where everyone joins me and we all end up wearing glasses somewhere down the line.

We’re heading for a level playing field of “eyewear-for-everyone” (TM), or maybe it’s “eyeware-for-everyone”.

As an aside, I thought I’d just made up the term ‘eyeware’, then Googled it and, of course, there’s already a Swiss company with that name doing some type of 3D eye tracking. It’s hard to come up with anything original these days!

Recent (and long term) rumours and actual products suggest that, one way or another, all the big players are working on some form of ‘glasses’;

and Google has had something out there for a long time. Of course, there are also a tonne of other players in this space (like Magic Leap, North, RealWear, etc.).

Naturally, you might argue that Microsoft has been blazing a trail here, shipping devices and services in this space for some years, and far be it from me to stop you making that argument 😉

I saw this YouTube session go by on my timeline the other day. It presents some analysis of the current and near-future state of the ‘glasses’ market and talks through some of the motivations of the players involved. It’s not Earth-shattering but I thought it was a good watch;

Naturally, not every rumour will come to fruition and, clearly, not all ‘glasses’ are made equal but, regardless, one version of the future promises to deliver glasses-for-all.

Glasses will become the great leveller as we all have to remember to clean them, to not sit on them and to find them when we’ve misplaced them.

The only division is likely to be that some of us will still pay more than others to put prescription lenses into ours, or maybe the future will solve that problem too 🙂

Blog.Pause() – 15 Years, Time for a Breather :-)

I’ve realised recently that I’m not posting to this blog as often as I have in the past, so this post is just to flag that I’m aware of it and that I’m pausing my posts over the Summer in order to regroup.

I started posting to ‘a blog’ back in 2003 when I was working in developer consultancy for Microsoft. I was finding that there were things I was doing in my day job that I thought would be useful to share, and also some ‘spare time’ projects that I needed a home for. Blogs at the time were coming into vogue 😉 with RSS readers showing up everywhere; I think I even wrote a few of my own at the time.

The blogging activity wasn’t part of my job and the company took no interest in what I was writing.

In the 15 years that seem to have magically gone by, I’ve published around 150 posts per year as the website has been migrated from Movable Type to Community Server and on to WordPress, where it lives today minus some tags and formatting that got messed up along the way.

During that time, I also migrated from Microsoft’s developer consultancy team into the developer evangelism group and I’ve been involved with and have written about a much broader set of developer technologies than I’d ever really expected to when I started posting in 2003. Those technologies have been centred around .NET but have branched out into many other places.

I had a look at the posts on the site and along the way I’ve written about C++, C#, COM, .NET, Tablet PCs, IIS, SQL, BizTalk Server, WSE, ClickOnce, ASMX, ASP.NET, Windows, Windows Phone, WPF, WCF, WF, Visual Studio, Expression Blend, Debugging, Silverlight, LINQ, LINQ to SQL, LINQ to XML, Entity Framework, ADO.NET Data Services & OData, MEF, Reactive Extensions, WinJS, JavaScript, UWP, WinRT, Xamarin, Azure Mobile Services, RealSense, HoloLens, Mixed Reality and perhaps a few more 🙂

My approach has always been to stay hands-on and take a code-first path to learning new things. I’ve always tried to learn in public, publishing posts along the way in the hope that they might help others following a similar path later, even if they weren’t necessarily perfect or complete.

When I joined the evangelism team, its activity was mainly around in-person developer events, so a blog was a ‘nice to have’. Over time, the team’s activity shifted to more of an online model and I found that my blog started to intersect much more with the day job.

That intersection followed something of an arc where the level of interest that my job had in my blogging ramped from pretty much zero through to a period where there was much more focus on it and my management at the time wanted to steer what I was posting about (I avoided that) and to move my hosting to MSDN (also avoided) and to add analytics for their charts and reports (I gave in on that one :-)). At its peak, I can remember having some slightly warm discussions about who owned my website but that didn’t last very long and the focus soon faded away as has the developer evangelism group itself.

Today, I’ve gone full circle in that I’m in a different job role and am mostly back to where I was in 2003, with a site that’s largely a hobby and is related to my ‘day job’ only in that it provides some topics for me to think and write about.

So, it feels like a good time to pause & see whether I can come up with some themes that might set me onto another 15 years of posting 😉

In the meantime, a big thanks to everyone who’s been reading these pieces along the way & who has provided feedback via the site or by mail – it’s much appreciated! 🙂

Sketchy Thoughts on Bots/Agents/Conversations

It seems that ‘bots’ are ‘hot’, in the sense that the topic is attracting a lot of attention. If I had to pick out one great article that I’ve read in this area, I’d say it was this one on “Conversational Commerce” by Uber’s Chris Messina, which really brought it home to me. I think it’s a really good read, although I was late in coming to it 🙂

You can read about some of Microsoft’s plans in the area of bots here [Bot Framework] and here [Cortana] and there were quite a few sessions at //build that relate to this area too.

The rest of this post doesn’t relate to anything that Microsoft does or makes. It’s more my own brain dump from trying to think through some of the pieces that might be part of a platform that provides the ability to have these types of conversations.

I’ve been thinking about this topic of ‘bots’ for a little while and I wanted to;

  • start to try and get my thoughts in order and write them up so that I can come back to them and refine them
  • have a framework that I can use to look at particular platforms for ‘bots’ and try to evaluate whether a given platform covers some of the areas that I’d expect a ‘bot’ platform to address.

Beyond here, I’m going to use the term ‘Agent’ rather than ‘Bot’ to avoid getting tied up in any particular implementation or anything like that.

Once again, it’s just a brain dump and a pretty sketchy one, but you have to start somewhere 🙂

Conversations

We’ve been having conversations of different forms with software Agents for the longest time. You could argue that when I do something like;

[Image: a command typed at the command prompt and its response]

then I’m having a ā€˜conversation’ with the command prompt.

I ā€œsayā€ something and it ā€œsaysā€ something back. It’s a short conversation. It’s not a very natural conversation but, nonetheless, it’s a form of conversation.

It also doesn’t offer much in the way of choice around the input/output mechanisms here. As far as I know, I can’t speak to the command prompt and it doesn’t speak back although it may have accessibility capabilities that I’m unaware of.

At a more advanced level, I can have a conversation with an Agent on one of my devices today and I can actually say something like;

    • “Agent, can you play the song [SONG TITLE] for me?”
    • “Did you mean this one or that one?”
    • “The second one”
    • “Ok, playing it”

This one definitely feels a bit more “conversational”. An Agent that accepts speech like this usually also accepts typing as an input mechanism and can display its responses on a screen as an output mechanism.

Implicit in there is the idea that the Agent I’m speaking to knows of some kind of app or service that can actually find music and get it played for me. It’s debatable whether that app/service should display a graphical interface; maybe sometimes it should and sometimes it shouldn’t, depending on the context.

What’s interesting, though, is what happens if I then continue the conversation with something like;

    • “Agent, remember that song you played for me just before lunch? Play it again”

because I don’t know whether there are any platforms out there today that can handle even a simple conversation like that, along with the notion that conversations might last a while and might have related history.

At that point the context has been lost and we have to start again. It feels like even the simplest elements of human conversation are going to need quite a lot of work if they’re going to be brought to a ‘conversation platform’ and, naturally, this will be done incrementally, with growing value along the way.

With that in mind, I was trying to think of some of the pieces that might make up that kind of platform, mostly so that I can come back to them at a later point. I’ve seen some of these pieces referred to in other articles, videos, etc. and others I’m perhaps conjuring out of the air.

Conversational Pieces

I scratched my head for quite a while and this list of pieces that might be involved in a conversational platform, thinking of conversations in a broad sense, dropped out;

  • The Conversational Host or Canvas
  • The Agent
  • Discoverability
  • Inputs
  • Outputs
  • Initiation
  • Identity
  • Dialogs
  • Language Processing
  • User Knowledge
  • Trust
  • Context
  • Termination
  • Services
  • Decisions
  • Telemetry
  • History

That list isn’t ordered in any fashion.

I did a quick sketch below and you’ll soon realise from the number of arrows on the diagram that I haven’t reached any kind of clarity on this just yet and am still fumbling a bit and ‘making it up’ 🙂 but, again, it’s a starting point that can be refined.

[Image: a sketch of the conversational pieces listed above, with arrows between them]

The Conversational Host or Canvas

This feels like a very broad term but it seems that there’s a need for “something” to host the conversation. It might be an app that hosts a voice call or an SMS conversation. It might be a chat app or an email client. It might be a part of an operating system “shell”.

It’s the “host” of the conversation and, naturally, I might want to move from one host to another and have a conversation follow me, which almost certainly comes with a set of technical challenges.

Some conversational hosts might serve a specific purpose. For example, a device on a kitchen table that is hard-wired to play music.

Others might broker between many agents – for example a chat application that can both book train tickets and return flight information.

It seems likely to me that the Canvas will control the modes of input and output, perhaps offering some subset of those available on the device it’s running on. It also seems unlikely that developers will want to build for every Canvas out there. Over time, perhaps some canvases will be specifically targeted while others are somehow treated generically.

The Agent

The Agent is the piece of software that the user is having the conversation with through the Canvas in question. The Canvas and the Agent might sometimes be one and the same thing and/or might be produced by the same company but I guess that in the general case the Canvas (e.g. an IM chat window) could be separate from the Agent (e.g. a travel Agent) which might itself rely on separate Services (e.g. a weather service, a train booking service, a plane booking service) in order to get work done.

Discoverability

How does the user discover that a particular (complex) Canvas has an Agent available and, beyond that, how do they discover what that Agent can do?

It’s the age-old ‘Clippy’ style problem. A Canvas (e.g. a chat app) can broker conversations with N Agents but the user doesn’t know that. We see this today with personal assistants offering menus via “Tell me what I can say” type commands.

It seems to me that there’s a general need for discovery and it might involve things like;

  1. Reading the instructions that came with the Canvas
  2. Asking the Canvas before…
  3. Asking the Agent.
  4. Looking up services in a directory.
  5. Being prompted by the Canvas (hopefully with some level of intelligence) when an appropriate moment seems to arrive – e.g. “did you know that I can book tickets for you?”.

and no doubt more, but you need to know that you can start a conversation before you start a conversation 🙂
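The directory option in the list above is the easiest one to sketch. Here’s a minimal, in-memory version in Python. `AgentListing`, `AgentDirectory` and the capability strings are hypothetical names of my own, not any real platform’s API. The sketch lets a Canvas ask which Agents claim a capability and get a “Tell me what I can say” style list of phrases for one of them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentListing:
    # Hypothetical directory entry; none of these fields come from a real platform.
    name: str
    capabilities: tuple[str, ...]
    sample_phrases: tuple[str, ...]


class AgentDirectory:
    """A toy, in-memory directory that a Canvas might consult to discover Agents."""

    def __init__(self, listings: list[AgentListing]) -> None:
        self._listings = {entry.name: entry for entry in listings}

    def find(self, capability: str) -> list[str]:
        # Discovery by capability: which Agents claim they can do this?
        return [entry.name for entry in self._listings.values()
                if capability in entry.capabilities]

    def what_can_i_say(self, agent_name: str) -> tuple[str, ...]:
        # The "Tell me what I can say" style menu for a single Agent.
        entry = self._listings.get(agent_name)
        return entry.sample_phrases if entry else ()


directory = AgentDirectory([
    AgentListing("Travel Agent", ("book_train", "flight_info"),
                 ("I need a train", "When does my flight leave?")),
    AgentListing("Music Agent", ("play_music",), ("Play the song [SONG TITLE]",)),
])
print(directory.find("book_train"))  # ['Travel Agent']
```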

Inputs/Outputs

There are lots of ways to converse. We can do it by voice, by typing, by SMS. We might even stretch to include things like pointing with gamepads or waving our arms to dismiss dialogs, but maybe that’s pushing things a bit far.

Equally, there are many ways to display outputs and a lot of this is going to depend on the Canvas and device in question.

For example, if I have an Agent that knows how to find photos, I might input;

“Agent, go get me some high quality, large size images of people enjoying breakfast”

What should the output be? Maybe a set of hyperlinks? Maybe open an imaging app and display the photos themselves ready for copy/paste? Maybe offer to send me an email with all the details in so I can read it later?

I’d argue that it depends on the Canvas, the device and what I’m currently doing. If I’m walking down the street then the email option might be a good one. If I’m sitting at my PC then maybe open up an app and show me the results.

I suspect that this might get complex over time but I/O options seem to be a big part of trying to have a conversation.
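Here’s a rough illustration of that “it depends” reasoning. It’s a minimal sketch assuming a Canvas can describe its output capabilities as a few booleans. `CanvasCapabilities`, `choose_output` and the mode names are all hypothetical. The rules simply encode the street-versus-PC example above rather than any real policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CanvasCapabilities:
    # Hypothetical description of the output modes a Canvas offers on this device.
    name: str
    can_show_screen: bool
    can_speak: bool
    can_send_email: bool


def choose_output(canvas: CanvasCapabilities, user_is_on_the_move: bool) -> str:
    """Pick an output mode for search results; the rules are illustrative only."""
    if user_is_on_the_move and canvas.can_send_email:
        return "email_details"      # walking down the street: read it later
    if canvas.can_show_screen:
        return "open_imaging_app"   # sitting at a PC: show the photos themselves
    if canvas.can_speak:
        return "speak_summary"      # e.g. a speaker with no screen
    return "send_hyperlinks"        # lowest common denominator


desktop_chat = CanvasCapabilities("desktop chat app", can_show_screen=True,
                                  can_speak=False, can_send_email=True)
print(choose_output(desktop_chat, user_is_on_the_move=False))  # open_imaging_app
```

The “am I on the move?” signal is itself an assumption here; presumably it would have to come from the device or the Canvas.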

Initiation

How to start a conversation?

At what point does a conversation with an Agent begin, in the sense that the Agent tracks the flow of interactions back and forth such that it can build up Context and start to offer some kind of useful function?

Most Agents support some kind of “Hey, how are you?” type interaction, but that’s not really the conversation opener. That perhaps comes more at the point where someone says “I need a train” or “I need a ticket” or similar.

Conversations are stateful and could potentially span across many devices and Canvases and so there’s going to need to be some kind of conversational identifier that can be (re)presented to the agent at a later point in time. The analogy in the human world would be something like;

Remember when we were talking about that holiday in Spain last week?

and, no doubt, if we’re to make conversations work in the virtual world then there is likely to be an equivalent.
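Here’s a minimal sketch of that conversational identifier. It assumes an Agent that keeps conversations in memory and keys them by a random UUID. `Conversation` and `ConversationStore` are hypothetical names. The point is only that an id handed back later (perhaps from a different Canvas) lets the Agent pick up the earlier turns. The id is also checked against the user, which leads straight into Identity.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Conversation:
    conversation_id: str
    user_id: str
    # (speaker, text) pairs so the Agent can build up Context over many turns.
    turns: list[tuple[str, str]] = field(default_factory=list)


class ConversationStore:
    """Hypothetical in-memory store that hands out resumable conversation ids."""

    def __init__(self) -> None:
        self._conversations: dict[str, Conversation] = {}

    def start(self, user_id: str) -> Conversation:
        conversation = Conversation(str(uuid.uuid4()), user_id)
        self._conversations[conversation.conversation_id] = conversation
        return conversation

    def resume(self, conversation_id: str, user_id: str) -> Conversation | None:
        # An id (re)presented from another Canvas only counts for the same user.
        conversation = self._conversations.get(conversation_id)
        if conversation is None or conversation.user_id != user_id:
            return None
        return conversation


store = ConversationStore()
holiday = store.start("mike")
holiday.turns.append(("user", "I'm thinking about a holiday in Spain"))
# ...a week later, possibly from a different Canvas...
again = store.resume(holiday.conversation_id, "mike")
```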

Identity

An identifier for a conversation is one thing but it’s pretty much useless without a notion of the user who was involved in the conversation.

You’d imagine that this is perhaps one of the things that a Canvas can do for a user. For example, an IM Canvas has presumably already authenticated the user, so it might be able to provide some kind of token representing that identity to an Agent. The Agent could then tell the difference between conversations with Mike and conversations with Michelle.

If a conversation then moves from one Canvas to another then the Agent has to be able to understand whatever identity token might come from the second Canvas as well.

I suspect that this is a roundabout way of saying that it feels to me like identity is going to be an important piece in a platform that does conversations.
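Here’s a minimal sketch of that mapping step. It assumes each Canvas has already authenticated the user and that the Agent has validated whatever token the Canvas sent; this code doesn’t do either. `IdentityLinker` and the identifiers in it are hypothetical. The sketch just shows two different Canvas identities resolving to the same Agent-side user, and an unknown one resolving to nothing.

```python
from __future__ import annotations


class IdentityLinker:
    """Hypothetical map from a Canvas-issued identity to the Agent's own user id.

    Assumes each Canvas has already authenticated the user and that the Agent
    has validated the Canvas's token; this only handles the mapping step.
    """

    def __init__(self) -> None:
        self._links: dict[tuple[str, str], str] = {}

    def link(self, canvas: str, canvas_user_id: str, agent_user_id: str) -> None:
        self._links[(canvas, canvas_user_id)] = agent_user_id

    def resolve(self, canvas: str, canvas_user_id: str) -> str | None:
        # None means "unknown user on this Canvas", so the Agent would need to ask.
        return self._links.get((canvas, canvas_user_id))


linker = IdentityLinker()
linker.link("im-app", "mike@im.example", "user-001")
linker.link("sms", "sms-subscriber-42", "user-001")
```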

Context

I’m obsessed with context 🙂 and I guess that a conversation with an Agent is, in some ways, about building up the context to the point where some ‘transaction’ can be completed.

That context needs to be associated with the conversation and with the identity of the user and perhaps needs to have some kind of associated lifetime such that it doesn’t stay around for ever in a situation where a conversation starts but never completes.

There’s then the question of whether that context can be;

  • pre-loaded with some of the knowledge that either the Agent or the Canvas has about the user.
  • used to add to the knowledge that either the Agent or the Canvas keeps about the user after the conversation.

For example – if a user has a conversation with an Agent about a train journey then part of the context might be the start/end locations.

If one of those locations turns out to be the user’s home then that might become part of the future knowledge that an Agent or a Canvas has about the user such that in the future it can be smarter. Naturally, that needs to remain within the user’s control in terms of the consent around where it might be used and/or shared.

User Knowledge

I’m unsure whether knowledge about a user lives with an Agent, with a Canvas, with a Service or with all of them and I suspect it’s perhaps the latter – i.e. all of them.

No doubt, this is related to Identity, Context and Trust. If I use some Canvas on a regular basis (like a favourite chat app) and it comes from a vendor that I trust, then I might be prepared to share more personal data with that Canvas than with an Agent which does (e.g.) concert-ticket bookings and which I only use once every 1-2 years.

The sort of knowledge that I’m thinking of here stems from personal information like age, date-of-birth, gender, height, weight, etc. through into locations like home, work and so on and then perhaps also spans into things like friends/family.

You can imagine scenarios like;

“Hey, can you ask the train ticketing service to get me a ticket from home to my brother’s place early in the morning a week on Saturday and drop him an SMS to tell him I’m coming?”

and a Canvas (or Agent) that can use knowledge about the user to plug in all the gaps around the terms ‘home’ and ‘brother’ in order to work out the locations and phone numbers is a useful thing 🙂

Now, whether it’s the Canvas that turns these terms into hard data or whether it’s the Agent that does it, I’m not sure.
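Wherever it happens, the resolution step itself might look something like this minimal sketch. It assumes User Knowledge can be stored as a lookup from (term, attribute) pairs to values. `KNOWLEDGE`, `resolve_terms` and the placeholder addresses and number are hypothetical. Anything that can’t be resolved comes back as a missing slot for the Agent or Canvas to ask about.

```python
from __future__ import annotations

# Hypothetical User Knowledge: (term, attribute) -> value. Placeholder data only.
KNOWLEDGE: dict[tuple[str, str], str] = {
    ("home", "address"): "1 Example Road",
    ("brother", "address"): "2 Sample Street",
    ("brother", "phone"): "+00 0000 000000",
}


def resolve_terms(
    request: dict[str, tuple[str, str]],
    knowledge: dict[tuple[str, str], str],
) -> tuple[dict[str, str], list[str]]:
    """Turn personal terms like 'home' or 'brother' into hard data.

    Returns the resolved slots plus the slots still missing, which the
    Agent (or Canvas) would need to ask the user about.
    """
    resolved: dict[str, str] = {}
    missing: list[str] = []
    for slot, term_and_attribute in request.items():
        value = knowledge.get(term_and_attribute)
        if value is None:
            missing.append(slot)
        else:
            resolved[slot] = value
    return resolved, missing


# "...a ticket from home to my brother's place... and drop him an SMS..."
request = {
    "from": ("home", "address"),
    "to": ("brother", "address"),
    "sms_to": ("brother", "phone"),
}
slots, still_missing = resolve_terms(request, KNOWLEDGE)
```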

Trust

Trust is key here. As a user, I don’t want to have a conversation with an Agent that is then keeping or sharing data that I didn’t consent to but, equally, conversations that are constantly interrupted by privacy settings aren’t likely to progress well.

In a conversation between User<->Canvas<->Agent<->Service it’s not always going to be easy to know where the trust boundaries are being drawn or stretched and perhaps it becomes the responsibility of the Canvas/Agent to let the user know what’s going on as knowledge is disseminated? For example, in a simple scenario of;

“Hey, can you buy me train tickets from home to work next Tuesday?”

there’s a question around whether the Agent needs to prompt if it’s not aware of what ‘home’ and ‘work’ might mean or doesn’t have a means to purchase the ticket.

Also, does the Canvas attempt to help in fleshing out those definitions of ‘home’ and ‘work’, and does it do so with or without the user’s explicit consent?
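One way to make those boundaries explicit is to gate every use of User Knowledge on a recorded consent. This is a minimal sketch assuming consent is granted per (party, piece of knowledge). `ConsentRegistry`, `fetch_for` and the party names are hypothetical. The three return states map onto the questions above:

- share the value;
- ask the user before sharing it;
- ask the user because nothing is known.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    # (party, knowledge key) pairs that the user has explicitly agreed to share.
    grants: set[tuple[str, str]] = field(default_factory=set)

    def grant(self, party: str, key: str) -> None:
        self.grants.add((party, key))

    def allows(self, party: str, key: str) -> bool:
        return (party, key) in self.grants


def fetch_for(party: str, key: str, knowledge: dict[str, str],
              consent: ConsentRegistry) -> tuple[str, str | None]:
    """Share a piece of User Knowledge only where consent exists.

    Returns ("ok", value), ("ask_user", None) when the value is known but not
    yet consented for this party, or ("unknown", None) when the user must say.
    """
    if key not in knowledge:
        return ("unknown", None)
    if not consent.allows(party, key):
        return ("ask_user", None)
    return ("ok", knowledge[key])


knowledge = {"home": "1 Example Road", "work": "3 Office Park"}
consent = ConsentRegistry()
consent.grant("train-agent", "home")
```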

Dialogs

It feels like a conversational platform needs to have the ability to define dialogs for how a conversation is supposed to flow between the user and the agent.

I suspect that it probably shouldn’t be a developer who defines what the structure and the content of these dialogs should be.

I also suspect that they shouldn’t really be hard-wired into some piece of code somewhere but should, somehow, be open to being created and revised by someone who has a deep understanding of the business domain and who can then tweak the dialogs based on usage.

That would imply a need for some sort of telemetry to be captured which lets that Agent be tweaked in response to how users are actually interacting with it.

Part of defining dialogs might tie in with inputs and outputs in that you might define different sets of dialogs depending on the input/output capabilities of the Canvas that’s hosting the conversation with the Agent. It’s common to use different techniques when using speech input/output versus (say) textual input/output and so dialogs would presumably need to cater for those types of differences.

Another part of defining dialogs might be to plug in decision logic around how dialogs flow based on input from the user and responses from services.
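Here’s a minimal sketch of dialogs as data rather than hard-wired code, with a per-modality prompt and a simple decision branch. The dictionary shape, the step names and the `prompt`/`advance` functions are all hypothetical. A real platform would have its own format. The idea is that someone who knows the business domain could edit the data (e.g. held as JSON) without changing the code.

```python
from __future__ import annotations

# Hypothetical dialog definition held as data (it could live in a JSON file) so that
# a domain expert could revise prompts and flow without touching the Agent's code.
TRAIN_DIALOG: dict = {
    "start": "ask_destination",
    "steps": {
        "ask_destination": {
            "prompts": {"text": "Where would you like to travel to?", "speech": "Where to?"},
            "slot": "destination",
            "next": "ask_date",
        },
        "ask_date": {
            "prompts": {"text": "Which date would you like to travel on?", "speech": "When?"},
            "slot": "date",
            "next": "confirm",
        },
        "confirm": {
            "prompts": {"text": "Shall I book that for you? (yes/no)"},
            # Decision logic: route on the user's answer.
            "branches": {"yes": "book", "no": "ask_destination"},
        },
        "book": {"prompts": {"text": "Booking your ticket now."}},
    },
}


def prompt(dialog: dict, step_name: str, modality: str) -> str:
    prompts = dialog["steps"][step_name]["prompts"]
    # Use a modality-specific prompt if there is one, otherwise fall back to text.
    return prompts.get(modality, prompts["text"])


def advance(dialog: dict, step_name: str, answer: str, slots: dict[str, str]) -> str | None:
    """Record the answer and return the next step name (None when the dialog ends)."""
    step = dialog["steps"][step_name]
    if "slot" in step:
        slots[step["slot"]] = answer
    if "branches" in step:
        # Unrecognised answers repeat the current step.
        return step["branches"].get(answer.strip().lower(), step_name)
    return step.get("next")
```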

Language Understanding and Intent

One of the challenges of defining those dialogs is trying to cater for all the possible language variations in the ways in which a user might frame or phrase a particular question or statement. There are so many ways of achieving the same result that it’s practically impossible to define dialogs that cater for everything. For example;

  • “I want to book a taxi”
  • “I need a lift to catch my plane”
  • “Can I get a car to the airport?”

are simple variations on what may well be the very same request. It feels to me like there’s a very definite need here for a service which can take all of these variations and turn them into more of a canonical representation, reporting the intent that’s common across all three of them.

Without that, I think all developers are going to be building their own variant of that particular wheel and it’s not an easy wheel to build.
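To show the shape of what such a service hands back, here’s a deliberately naive sketch that maps the three phrasings above onto one canonical intent. Real language-understanding services use trained models rather than keyword lists. The `INTENT_KEYWORDS` table, the intent names, the scoring and the threshold are all hypothetical stand-ins of mine.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentResult:
    intent: str   # canonical intent name, or "None" if nothing matched well enough
    score: float


# Hypothetical keyword lists standing in for a trained language-understanding model.
INTENT_KEYWORDS: dict[str, set[str]] = {
    "BookTaxi": {"taxi", "cab", "lift", "car", "airport", "ride"},
    "BookTrain": {"train", "rail", "station"},
}


def recognise(utterance: str, threshold: float = 0.1) -> IntentResult:
    """Map free-form text to the best-scoring canonical intent."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_intent, best_score = "None", 0.0
    for intent, keywords in INTENT_KEYWORDS.items():
        # Score: fraction of the intent's keywords that appear in the utterance.
        score = len(words & keywords) / len(keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_score < threshold:
        return IntentResult("None", best_score)
    return IntentResult(best_intent, best_score)


for phrase in ("I want to book a taxi",
               "I need a lift to catch my plane",
               "Can I get a car to the airport?"):
    print(recognise(phrase).intent)  # BookTaxi, three times
```

Even this toy version hints at why it’s not an easy wheel to build. “I want to buy a car” lands on the same taxi intent, which is clearly wrong.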

Termination

Just for completeness, if there is an “initiation” step to a conversation with an Agent then I guess there should be a point at which a conversation is “over”, whether that be due to a time-out or because the user explicitly states that they are done.

I can see a future scenario analogous to clearing out your cookies in the browser today where you want to make sure that whatever you’ve been conversing about with some Agent via some Canvas has truly gone away.
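Here’s a minimal sketch of both ways a conversation might end, assuming an Agent that records when each conversation was last active. `ConversationRegistry` and its methods are hypothetical. `end` forgets everything about a conversation (the “clearing cookies” case). `sweep` ends any conversation that has been idle for longer than a timeout. The clock is injectable purely so that the behaviour can be checked without waiting.

```python
from __future__ import annotations

import time
from collections.abc import Callable


class ConversationRegistry:
    """Hypothetical store that ends conversations on request or after idling."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._last_active: dict[str, float] = {}
        self._facts: dict[str, dict[str, str]] = {}

    def touch(self, conversation_id: str, **facts: str) -> None:
        # Any interaction keeps the conversation alive and may add context.
        self._last_active[conversation_id] = self._clock()
        self._facts.setdefault(conversation_id, {}).update(facts)

    def end(self, conversation_id: str) -> None:
        # Explicit termination: forget everything, like clearing browser cookies.
        self._last_active.pop(conversation_id, None)
        self._facts.pop(conversation_id, None)

    def sweep(self) -> list[str]:
        # Time-out termination: end conversations idle for longer than the limit.
        now = self._clock()
        expired = [cid for cid, seen in self._last_active.items()
                   if now - seen > self._idle_timeout]
        for cid in expired:
            self.end(cid)
        return expired

    def __contains__(self, conversation_id: str) -> bool:
        return conversation_id in self._facts
```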

Services

An Agent is a representative for some collection of Services. These might be a bunch of RESTful services or similar and it’s easy to think of some kind of travel Agent that provides a more natural and interactive front end to a set of existing back-end services for booking hotels, flights, trains, ferries, etc. and looking up timetables.

A platform for conversations would probably want to make it as easy as possible to call services, bring back their data and surface it into decision-logic.

Decisions

Sticking with decisions – there’s likely to be a need for making decisions in all but the simplest of conversations, and those decisions might well steer the conversational flow in terms of the ‘dialogs’ that are presented to the user.

Those decisions might be based on the user’s input, on the responses that come back from services invoked by the Agent, on User Knowledge or on some ambient context like the current date and time, weather, traffic or similar.

Some of that decision making might be influenced by use of the Agent itself. For example, suppose the Agent uses telemetry to figure out that 95% of all users go with the default options across steps 1 to 5 of a conversation. Maybe it can then start to adapt and offer the user shortcuts based on that knowledge?
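That shortcut rule is simple enough to sketch. This assumes telemetry records, for each completed conversation, whether the user took the default at each step. `should_offer_shortcut` is a hypothetical name, and the 95% threshold is just the figure from the paragraph above.

```python
from __future__ import annotations


def should_offer_shortcut(runs: list[list[bool]], steps: int = 5,
                          threshold: float = 0.95) -> bool:
    """Decide whether to offer a 'use the usual defaults' shortcut.

    `runs` holds one entry per completed conversation: True at position i if the
    user accepted the default option at step i + 1.
    """
    eligible = [run for run in runs if len(run) >= steps]
    if not eligible:
        return False  # no evidence yet, so keep the full dialog
    took_all_defaults = sum(1 for run in eligible if all(run[:steps]))
    return took_all_defaults / len(eligible) >= threshold
```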

Telemetry

I’d expect an Agent to gather telemetry as it goes, such that aggregate data is available across areas like;

  • Agent usage – i.e. where traffic is coming from, how long conversations last, etc.
  • Dialog flow – which paths through the Agent’s capabilities are ‘hot’ in the sense of being followed by most users.
  • Dialog blockers – points where conversations are consistently abandoned.
  • Choices – the sorts of options that users are choosing as they navigate the Agent.
  • Failures – places where an agent isn’t understanding the user’s intent.

I’m sure there’s a lot more telemetry that an Agent would gather, but it’s definitely an important part of the picture.
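To make a few of those bullets concrete, here’s a minimal aggregation sketch assuming the Agent emits simple per-step events. `TelemetryEvent`, the event kinds and `summarise` are hypothetical. The sketch derives three things:

- hot paths;
- dialog blockers (the last step reached in conversations that never completed);
- failures (steps where the user’s intent wasn’t understood).

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetryEvent:
    conversation_id: str
    step: str
    kind: str  # hypothetical kinds: "entered", "completed", "not_understood"


def summarise(events: list[TelemetryEvent]) -> tuple[Counter, Counter, Counter]:
    """Aggregate hot paths, dialog blockers and understanding failures."""
    paths: dict[str, list[str]] = {}
    completed: set[str] = set()
    failures: Counter = Counter()
    for event in events:
        if event.kind == "entered":
            paths.setdefault(event.conversation_id, []).append(event.step)
        elif event.kind == "completed":
            completed.add(event.conversation_id)
        elif event.kind == "not_understood":
            failures[event.step] += 1
    # Dialog flow: how many conversations followed each exact path.
    hot_paths = Counter(tuple(path) for path in paths.values())
    # Dialog blockers: where abandoned conversations stopped.
    blockers = Counter(path[-1] for cid, path in paths.items() if cid not in completed)
    return hot_paths, blockers, failures


E = TelemetryEvent
events = [
    E("c1", "ask_destination", "entered"), E("c1", "ask_date", "entered"),
    E("c1", "confirm", "entered"), E("c1", "confirm", "completed"),
    E("c2", "ask_destination", "entered"), E("c2", "ask_destination", "not_understood"),
    E("c2", "ask_date", "entered"),
    E("c3", "ask_destination", "entered"), E("c3", "ask_date", "entered"),
]
hot, blockers, failures = summarise(events)
```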

History

It’s pretty common in the real world to refer back to conversations that happened at some earlier point in time, perhaps reaching back months or years. I think that, over time, a conversation platform needs to think about how to support this.

That needs to fit with Trust but I think it would add a lot of value to an Agent to be able to cope with the idea of something like;

“I need to re-order the same lightbulbs that I ordered from you six months ago”

or similar. Whether that needs to be done by the Agent “remembering” the conversation or whether it needs to be done by one of its supporting Services taking on that responsibility, I’m not sure.
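If a supporting Service takes on that responsibility, it might be a lookup along these lines. This is a minimal sketch assuming the Service keeps an order history. `Order`, `find_previous_order`, the 45-day tolerance and the sample data are all hypothetical. “Six months ago” is treated loosely as 182 days.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Order:
    user_id: str
    item: str
    quantity: int
    placed_on: date


def find_previous_order(history: list[Order], user_id: str, keyword: str,
                        around: date, tolerance_days: int = 45) -> Order | None:
    """Find this user's order matching `keyword` placed closest to `around`."""
    candidates = [
        order for order in history
        if order.user_id == user_id
        and keyword.lower() in order.item.lower()
        and abs((order.placed_on - around).days) <= tolerance_days
    ]
    # Closest in time wins; None means the Agent should ask the user for details.
    return min(candidates, key=lambda o: abs((o.placed_on - around).days), default=None)


# Hypothetical sample history.
history = [
    Order("mike", "LED lightbulbs, pack of 4", 2, date(2016, 4, 3)),
    Order("mike", "Kettle", 1, date(2016, 4, 10)),
    Order("mike", "LED lightbulbs, pack of 4", 1, date(2016, 9, 20)),
    Order("michelle", "LED lightbulbs, pack of 4", 1, date(2016, 4, 3)),
]
six_months_ago = date(2016, 10, 1) - timedelta(days=182)
previous = find_previous_order(history, "mike", "lightbulb", six_months_ago)
```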

Done

That’s my initial ramble over. I need to go away and think about this some more. Please feel free to give feedback as these are just my rough notes. There are a few things in here that I think are probably going to stick with me as I think more on this topic of conversations…