It seems that ‘bots’ are ‘hot’ in the sense that the topic’s really attracting a lot of attention. If I had to pick out one great article that I’ve read in this area then I’d say it was this one on “Conversational Commerce” by Uber’s Chris Messina, which really brought it home to me. I was late in coming to it but I think it’s a really good read.
The rest of this post does not relate to anything that Microsoft does or makes, it’s more my own brain dump from trying to think through some of the pieces that might be part of a platform that provides an ability to have these types of conversations.
I’ve been thinking on this topic of ‘bots’ for a little while and I wanted to;
- start to try and get my thoughts in order and write them up so that I can come back to them and refine them
- have a framework that I can use to look at particular platforms for ‘bots’ and try to evaluate whether that platform covers some of the areas that I’d expect a ‘bot’ platform to address.
Beyond here, I’m going to use the term ‘Agent’ rather than ‘Bot’ to avoid getting tied up in any particular implementation or anything like that.
Once again, it’s just a brain dump and a pretty sketchy one but you have to start somewhere.
We’ve been having conversations of different forms with software Agents for the longest time. You could argue that when I do something like;
then I’m having a ‘conversation’ with the command prompt.
I “say” something and it “says” something back. It’s a short conversation. It’s not a very natural conversation but, nonetheless, it’s a form of conversation.
It also doesn’t offer much in the way of choice around the input/output mechanisms here. As far as I know, I can’t speak to the command prompt and it doesn’t speak back although it may have accessibility capabilities that I’m unaware of.
At a more advanced level, I can have a conversation with an Agent on one of my devices today and I can actually say something like;
- “Agent, can you play the song [SONG TITLE] for me?”
- “Did you mean this one or that one?”
- “The second one”
- “Ok, playing it”
This one definitely feels a bit more “conversational” and an Agent that accepts speech like this usually accepts at least typing as another possible input mechanism and displaying on a screen as another possible output mechanism.
Implicit in there is the idea that the Agent that I’m speaking to knows of some kind of app or service that can actually find music and get it played for me and it’s debatable as to whether that app/service does or doesn’t display a graphical interface as maybe sometimes it should and sometimes it shouldn’t depending on the context.
What’s interesting though would be that if I then continued the conversation with something like;
- “Agent, remember that song you played for me just before lunch? Play it again”
then I don’t know whether there are any platforms out there today that can handle even a simple conversation like that, or the notion that conversations might last a while and might have related history.
The context has been lost at that point and we have to start again. It feels like even the simplest elements of human conversations are going to need quite a lot of work if they’re going to be brought to a ‘conversation platform’ and, naturally, this will be done incrementally with ever-growing value along the way.
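To make that idea of conversational history a bit more concrete, here’s a minimal sketch (all names are my own invention, not any real platform’s API) of an Agent keeping a timestamped record of what it did, so that a request like “the song you played just before lunch” could at least be looked up:

```python
from dataclasses import dataclass, field
from datetime import datetime, time

# Hypothetical sketch: an in-memory history that would let an Agent answer
# "remember that song you played for me just before lunch?"
@dataclass
class ConversationHistory:
    events: list = field(default_factory=list)  # (timestamp, action, detail)

    def record(self, when: datetime, action: str, detail: str) -> None:
        self.events.append((when, action, detail))

    def last_before(self, action: str, cutoff: time):
        # Most recent matching event that happened before the cutoff time.
        matches = [e for e in self.events
                   if e[1] == action and e[0].time() < cutoff]
        return max(matches, key=lambda e: e[0], default=None)

history = ConversationHistory()
history.record(datetime(2016, 4, 1, 9, 30), "play_song", "Song A")
history.record(datetime(2016, 4, 1, 11, 45), "play_song", "Song B")
history.record(datetime(2016, 4, 1, 14, 0), "play_song", "Song C")

# "the song you played just before lunch" -> the latest play before 12:00
event = history.last_before("play_song", time(12, 0))
print(event[2])  # Song B
```

The hard part, of course, isn’t storing the history – it’s understanding that “just before lunch” maps to a time-based query in the first place.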
With that in mind, I was trying to think of some of the pieces that might make up that kind of platform and mostly so that I can come back to them at a later point. Some of these pieces I’ve seen referred to in other articles, videos, etc. and others I’m perhaps conjuring out of the air.
I scratched my head for quite a while and, thinking of conversations in a broad sense, this list dropped out of some of the pieces that might be involved in a conversational platform;
- The Conversational Host or Canvas
- The Agent
- Language Processing
- User Knowledge
That list isn’t ordered in any fashion.
I did a quick sketch below and you’ll soon realise from the number of arrows on the diagram that I haven’t reached any kind of clarity on this just yet and am still fumbling a bit and ‘making it up’ but, again, it’s a starting point that can be refined.
The Conversational Host or Canvas
This feels like a very broad term but it seems that there’s a need for “something” to host the conversation and it might be something like an app that hosts a voice call or an SMS conversation. It might be a chat app or an email client. It might be a part of an operating system “shell”.
It’s the “host” of the conversation and, naturally, I might want to move from one host to another and have a conversation follow me which almost certainly comes with a set of technical challenges.
Some conversational hosts might serve a specific purpose. For example, a device on a kitchen table that is hard-wired to play music.
Others might broker between many agents – for example a chat application that can both book train tickets and return flight information.
It seems to me that it’s likely that the Canvas will control the modes of input and output, perhaps offering some subset of those available on the device that it’s running on. It also seems to me that it’s unlikely that developers will want to build for every Canvas out there and so, over time, perhaps some canvases will be specifically targeted whereas others might somehow be treated generically.
The Agent
The Agent is the piece of software that the user is having the conversation with through the Canvas in question. The Canvas and the Agent might sometimes be one and the same thing and/or might be produced by the same company. I guess, though, that in the general case the Canvas (e.g. an IM chat window) could be separate from the Agent (e.g. a travel Agent) which might itself rely on separate Services (e.g. a weather service, a train booking service, a plane booking service) in order to get work done.
How does the user discover that a particular (complex) Canvas has an Agent available and, beyond that, how do they discover what that Agent can do?
It’s the age-old ‘Clippy’ style problem. A Canvas (e.g. a chat app) can broker conversations with N Agents but the user doesn’t know that and we see this today with personal assistants offering menus via “Tell me what I can say” type commands.
It seems to me that there’s a general need for discovery and it might involve things like;
- Reading the instructions that came with the Canvas
- Asking the Canvas before…
- Asking the Agent.
- Looking up services in a directory.
- Being prompted by the Canvas (hopefully with some level of intelligence) when an appropriate moment seems to arrive – e.g. “did you know that I can book tickets for you?”.
and no doubt more, but you need to know that you can start a conversation before you start a conversation.
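The “looking up services in a directory” option could be sketched as each Agent publishing a small capability manifest that a Canvas (or a directory service) can query. This is purely illustrative – the agent names, manifest shape and `agents_for_intent` lookup are all assumptions of mine, not any real platform:

```python
# Hypothetical sketch: each Agent publishes a capability manifest that a
# Canvas or directory service could query to answer "what can you do?"
AGENTS = {
    "travel-agent": {
        "description": "Books train tickets and looks up flights",
        "intents": ["book_train_ticket", "flight_info"],
    },
    "music-agent": {
        "description": "Finds and plays songs",
        "intents": ["play_song"],
    },
}

def agents_for_intent(intent: str) -> list:
    """Directory lookup: which Agents claim to handle this intent?"""
    return [name for name, manifest in AGENTS.items()
            if intent in manifest["intents"]]

print(agents_for_intent("book_train_ticket"))  # ['travel-agent']
```

A manifest like this would also give the Canvas something to draw on when prompting the user “intelligently” at an appropriate moment.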
There’s lots of ways to converse. We can do it by voice, by typing, by SMS. We might even stretch to include things like pointing with gamepads or waving our arms to dismiss dialogs but maybe that’s pushing things a bit far.
Equally, there’s many ways to display outputs and a lot of this is going to depend on the Canvas and device in question.
For example, if I have an Agent that knows how to find photos, I might input;
“Agent, go get me some high quality, large size images of people enjoying breakfast”
What should the output be? Maybe a set of hyperlinks? Maybe open an imaging app and display the photos themselves ready for copy/paste? Maybe offer to send me an email with all the details in so I can read it later?
I’d argue that it depends on the Canvas, the device and what I’m currently doing. If I’m walking down the street then the email option might be a good one. If I’m sitting at my PC then maybe open up an app and show me the results.
I suspect that this might get complex over time but I/O options seem to be a big part of trying to have a conversation.
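That “it depends on the Canvas, the device and what I’m currently doing” decision could be sketched as a simple policy function. This is a toy under my own assumptions (the activity labels and channel names are invented), but it shows the shape of the choice:

```python
# Hypothetical sketch: picking an output mechanism from what the Canvas
# offers and from what the user is currently doing.
def choose_output(available: set, user_activity: str) -> str:
    if user_activity == "walking" and "email" in available:
        return "email"          # send the details so they can be read later
    if "screen" in available:
        return "screen"         # open an app and show the results
    if "speech" in available:
        return "speech"
    return "text"

print(choose_output({"email", "screen", "speech"}, "walking"))   # email
print(choose_output({"email", "screen", "speech"}, "at_desk"))   # screen
```

A real platform would presumably need something far richer than a hard-coded rule chain, but even this toy makes the point that the choice lives outside the Agent’s core logic.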
How to start a conversation?
At what point does a conversation with an Agent begin in the sense that the Agent tracks the flow of interactions back and forth such that it can build up Context and start to offer some kind of useful function?
Most Agents support some kind of “Hey, how are you?” type interaction but that’s not really the conversation opener, it perhaps comes more at the point where someone says “I need a train” or “I need a ticket” or similar.
Conversations are stateful and could potentially span across many devices and Canvases and so there’s going to need to be some kind of conversational identifier that can be (re)presented to the agent at a later point in time. The analogy in the human world would be something like;
Remember when we were talking about that holiday in Spain last week?
and, no doubt, if we’re to make conversations work in the virtual world then there is likely to be an equivalent.
An identifier for a conversation is one thing but it’s pretty much useless without a notion of the user who was involved in the conversation.
You’d imagine that this is perhaps one of the things that a Canvas can do for a user – e.g. an IM Canvas has presumably already authenticated the user so it might be able to provide some kind of token representing that identity to an Agent such that the Agent can know the difference between conversations with Mike and conversations with Michelle.
If a conversation then moves from one Canvas to another then the Agent has to be able to understand whatever identity token might come from the second Canvas as well.
I suspect that this is a roundabout way of saying that it feels to me like identity is going to be an important piece in a platform that does conversations.
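As a very rough sketch of that token idea – and a real platform would use something standard like OAuth/OpenID Connect rather than anything home-grown like this – the Canvas could hand the Agent a signed token naming the user, which the Agent verifies before keying any conversation state. The shared secret here is purely an assumption for the sketch:

```python
import hashlib, hmac

# Hypothetical sketch: a Canvas hands the Agent a signed token naming the
# user, so the Agent can key conversations per user. Real platforms would
# use a standard identity protocol rather than this toy scheme.
SHARED_SECRET = b"canvas-agent-secret"   # assumption: pre-shared for the sketch

def issue_token(user_id: str) -> str:
    sig = hmac.new(SHARED_SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}:{sig}"

def user_from_token(token: str):
    user_id, sig = token.rsplit(":", 1)
    expected = hmac.new(SHARED_SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return user_id if hmac.compare_digest(sig, expected) else None

token = issue_token("mike")
print(user_from_token(token))            # mike
print(user_from_token("mike:forged"))    # None
```

The point of the sketch is the division of labour: the Canvas authenticates, the Agent only verifies and uses the identity – which is exactly what gets tricky when the conversation moves to a second Canvas with a different identity scheme.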
I’m obsessed with context and I guess that a conversation with an Agent is, in some ways, about building up the context to the point where some ‘transaction’ can be completed.
That context needs to be associated with the conversation and with the identity of the user and perhaps needs to have some kind of associated lifetime such that it doesn’t stay around for ever in a situation where a conversation starts but never completes.
There’s then the question of whether that context can be;
- pre-loaded with some of the knowledge that either the Agent or the Canvas has about the user.
- used to add to the knowledge that either the agent or the Canvas keeps about the user after the conversation.
For example – if a user has a conversation with an Agent about a train journey then part of the context might be the start/end locations.
If one of those locations turns out to be the user’s home then that might become part of the future knowledge that an Agent or a Canvas has about the user such that in the future it can be smarter. Naturally, that needs to remain within the user’s control in terms of the consent around where it might be used and/or shared.
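The “associated lifetime” idea could look something like the sketch below – context that quietly forgets itself once the conversation has gone stale. The class and the 30-minute figure are both my own assumptions, just to illustrate the idea:

```python
import time

# Hypothetical sketch: conversation context that expires, so an abandoned
# conversation doesn't keep state around forever.
class ConversationContext:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.data = {}
        self.touched = time.monotonic()

    def set(self, key, value):
        self.data[key] = value
        self.touched = time.monotonic()

    def get(self, key):
        if time.monotonic() - self.touched > self.ttl:
            self.data.clear()            # lifetime elapsed: forget everything
        return self.data.get(key)

ctx = ConversationContext(ttl_seconds=1800)   # e.g. a 30 minute lifetime
ctx.set("journey_from", "home")
ctx.set("journey_to", "work")
print(ctx.get("journey_from"))  # home
```

Pre-loading the context from User Knowledge, or promoting parts of it into User Knowledge afterwards, would then just be reads and writes against a store like this – subject to the consent questions above.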
User Knowledge
I’m unsure whether knowledge about a user lives with an Agent, with a Canvas, with a Service or with all of them and I suspect it’s perhaps the latter – i.e. all of them.
No doubt, this is related to Identity, Context and Trust in the sense that if I use some Canvas on a regular basis (like a favourite chat app) and if it comes from a vendor that I trust then I might be prepared to share more personal data with that Canvas than I do perhaps with an Agent which does (e.g.) concert-ticket bookings and which I only use once every 1-2 years.
The sort of knowledge that I’m thinking of here stems from personal information like age, date-of-birth, gender, height, weight, etc. through into locations like home, work and so on and then perhaps also spans into things like friends/family.
You can imagine scenarios like;
“Hey, can you ask the train ticketing service to get me a ticket from home to my brother’s place early in the morning a week on Saturday and drop him an SMS to tell him I’m coming?”
and a Canvas (or Agent) that can use knowledge about the user to plug in all the gaps around the terms ‘home’ and ‘brother’ in order to work out the locations and phone numbers is a useful thing.
Now, whether it’s the Canvas that turns these terms into hard data or whether it’s the Agent that does it, I’m not sure.
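Whichever of the two does it, the mechanics would be some kind of slot filling against a store of User Knowledge. A minimal sketch, with the knowledge store and addresses entirely made up for illustration:

```python
# Hypothetical sketch: filling the gaps around terms like 'home' and
# 'brother' from a store of knowledge about the user. Whether the Canvas
# or the Agent holds that store is the open question in the text above.
USER_KNOWLEDGE = {
    "home": {"location": "23 Acacia Avenue"},
    "brother": {"location": "14 High Street", "phone": "+44 7700 900123"},
}

def resolve_slots(slots: dict) -> dict:
    """Replace personal terms with hard data where we have it."""
    resolved = {}
    for name, term in slots.items():
        known = USER_KNOWLEDGE.get(term)
        resolved[name] = known["location"] if known else term
    return resolved

print(resolve_slots({"from": "home", "to": "brother"}))
# {'from': '23 Acacia Avenue', 'to': '14 High Street'}
```

The unresolved case (a term the store doesn’t know) is exactly where the prompting and consent questions in the next section kick in.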
Trust is key here. As a user, I don’t want to have a conversation with an Agent that is then keeping or sharing data that I didn’t consent to but, equally, conversations that are constantly interrupted by privacy settings aren’t likely to progress well.
In a conversation between User<->Canvas<->Agent<->Service it’s not always going to be easy to know where the trust boundaries are being drawn or stretched and perhaps it becomes the responsibility of the Canvas/Agent to let the user know what’s going on as knowledge is disseminated? For example, in a simple scenario of;
“Hey, can you buy me train tickets from home to work next Tuesday?”
there’s a question around whether the Agent needs to prompt if it’s not aware of what ‘home’ and ‘work’ might mean and doesn’t have a means to purchase the ticket.
Also, does the Canvas attempt to help in fleshing out those definitions of ‘home’ and ‘work’ and does it do so with or without the user’s explicit consent?
It feels like a conversational platform needs to have the ability to define dialogs for how a conversation is supposed to flow between the user and the agent.
I suspect that it probably shouldn’t be a developer who defines what the structure and the content of these dialogs should be.
I also suspect that they shouldn’t really be hard-wired into some piece of code somewhere but should, somehow, be open to being created and revised by someone who has a deep understanding of the business domain and who can then tweak the dialogs based on usage.
That would imply a need for some sort of telemetry to be captured which lets that Agent be tweaked in response to how users are actually interacting with it.
Part of defining dialogs might tie in with inputs and outputs in that you might define different sets of dialogs depending on the input/output capabilities of the Canvas that’s hosting the conversation with the Agent. It’s common to use different techniques when using speech input/output versus (say) textual input/output and so dialogs would presumably need to cater for those types of differences.
Another part of defining dialogs might be to plug in decision logic around how dialogs flow based on input from the user and responses from services.
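Pulling those last few paragraphs together, dialogs-as-data might look something like this sketch – per-modality wording plus a simple branching hook, all of it editable without touching the Agent’s code. Every name here is hypothetical:

```python
# Hypothetical sketch: dialogs defined as data (not hard-wired into code)
# with per-modality wording and simple branching, so someone who knows the
# business domain could tweak them based on usage.
DIALOGS = {
    "ask_destination": {
        "speech": "Where would you like to go?",
        "text": "Destination?",
        "next": lambda answer: "confirm" if answer else "ask_destination",
    },
    "confirm": {
        "speech": "Shall I book that for you?",
        "text": "Book it? (y/n)",
        "next": lambda answer: None,     # the conversation ends here
    },
}

def prompt_for(step: str, modality: str) -> str:
    return DIALOGS[step][modality]

print(prompt_for("ask_destination", "speech"))       # Where would you like to go?
print(DIALOGS["ask_destination"]["next"]("London"))  # confirm
```

In a real platform the branching logic would presumably be richer than a lambda – pulling in service responses and User Knowledge as well as the raw answer – but the separation of dialog content from Agent code is the point.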
Language Understanding and Intent
One of the challenges of defining those dialogs is trying to cater for all the possible language variations in the ways in which a user might frame or phrase a particular question or statement. There are so many ways of achieving the same result that it’s practically impossible to define dialogs that cater for everything. For example;
- “I want to book a taxi”
- “I need a lift to catch my plane”
- “Can I get a car to the airport”
are simple variations of possibly the very same thing and so it feels to me like there’s a very definite need here for a service which can take all of these variations and turn them into more of a canonical representation which can report the intent that’s common across all three of them.
Without that, I think all developers are going to be building their own variant of that particular wheel and it’s not an easy wheel to build.
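Just to pin down what “canonical representation” means, here’s a deliberately naive sketch. A real language-understanding service would use trained models rather than keyword matching, but even this toy shows the shape of the mapping from many phrasings to one intent:

```python
# Hypothetical sketch: mapping many phrasings onto one canonical intent.
# A real platform would use a trained language-understanding service; this
# toy keyword match just illustrates the shape of the problem.
INTENT_KEYWORDS = {
    "book_taxi": {"taxi", "lift", "car", "cab"},
}

def intent_of(utterance: str):
    words = set(utterance.lower().replace("?", "").split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None

for phrase in ["I want to book a taxi",
               "I need a lift to catch my plane",
               "Can I get a car to the airport"]:
    print(intent_of(phrase))   # book_taxi, three times
```

The gap between this toy and something that copes with real language is exactly the wheel that developers shouldn’t each have to build for themselves.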
Just for completeness, if there is an “initiation” step to a conversation with an Agent then I guess there should be a point at which a conversation is “over” whether that be due to a time-out or whether it be that the user explicitly states that they are done.
I can see a future scenario analogous to clearing out your cookies in the browser today where you want to make sure that whatever you’ve been conversing about with some Agent via some Canvas has truly gone away.
An Agent is a representative for some collection of Services. These might be a bunch of RESTful services or similar and it’s easy to think of some kind of travel Agent that provides a more natural and interactive front end to a set of existing back-end services for booking hotels, flights, trains, ferries, etc. and looking up timetables.
A platform for conversations would probably want to make it as easy as possible to call services, bring back their data and surface it into decision-logic.
Sticking with decisions – there’s likely to be a need for making decisions in all but the simplest of conversations and those decisions might well steer the conversational flow in terms of the ‘dialogs’ that are presented to the user.
Those decisions might be based on the user’s input, the responses that come back from services invoked by the Agent or might be based on User Knowledge or some ambient context like the current date and time, weather, traffic or similar.
Some of that decision making might be influenced by use of the Agent itself – e.g. if the Agent uses telemetry to figure out that 95% of all users go with the default options across step 1 to step 5 of a conversation then maybe it can start to adapt and offer the user shortcuts based on that knowledge?
I’d expect an Agent to be gathering telemetry as it was progressing such that aggregate data was available across areas like;
- Agent usage – i.e. where traffic is coming from, how long conversations last, etc.
- Dialog flow – which paths through the Agent’s capabilities are ‘hot’ in the sense of followed by most users.
- Dialog blockers – points where conversations are consistently abandoned.
- Choices – the sorts of options that users are choosing as they navigate the Agent.
- Failures – places where an agent isn’t understanding the user’s intent.
I’m sure that there’s probably a lot more telemetry that an Agent would gather – it’s definitely an important part of the picture.
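The mechanics of that telemetry could be as simple as the sketch below – recording events as the conversation progresses and aggregating them afterwards. The event names are my own invention, echoing the list above:

```python
from collections import Counter

# Hypothetical sketch: recording telemetry events as a conversation
# progresses, then aggregating them to spot hot paths and blockers.
events = []

def track(kind: str, detail: str) -> None:
    events.append((kind, detail))

track("dialog_step", "ask_destination")
track("dialog_step", "confirm")
track("abandoned", "confirm")
track("dialog_step", "ask_destination")
track("intent_not_understood", "play my holiday mix")

summary = Counter(kind for kind, _ in events)
print(summary["dialog_step"])             # 3
print(summary["intent_not_understood"])   # 1
```

In practice this would be aggregated across all users and fed back into refining the dialogs, rather than sitting in a list in memory.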
It’s common in the real world to refer to a previous conversation from a current one, perhaps reaching back months or years, and I think that over time a conversation platform needs to cater for this.
That needs to fit with Trust but I think it would add a lot of value to an Agent to be able to cope with the idea of something like;
“I need to re-order the same lightbulbs that I ordered from you six months ago”
or similar. Whether that needs to be done by the Agent “remembering” the conversation or whether it needs to be done by one of its supporting Services taking on that responsibility, I’m not sure.
That’s my initial ramble over. I need to go away and think about this some more. Please feel free to give feedback as these are just my rough notes but there’s a few things in here that I think are probably going to stick with me as I think more on this topic of conversations…