Ah, the early ’90s. Everyone remembers that story: the woman who sued McDonald’s became the canonical example of a frivolous lawsuit. But read the link. It doesn’t sound like she was greedy, and McDonald’s did know it was handing out coffee that was too hot. McDonald’s presented a different story, and that version became embedded in society. Media and truth are fungible concepts.
To understand how the different pieces of building a chatbot fit together, I decided to prototype a Facebook Messenger insurance claim submission chatbot. The set of requirements was:
- Be able to submit an auto accident-related claim (location, vehicle, number of passengers) and be able to check the status of a claim.
- Be able to upload photos of the accident.
- Be able to understand and process when the claimant is submitting multiple pieces of relevant information at once.
- Leverage Facebook Messenger as the consumer touchpoint.
There are three components to chatbots:
- Consumer touch point (social platforms, website, SMS, etc.)
- Middleware, which handles the business logic and serves as the knowledge base and database for customer information
- Tools and platforms that do the natural language processing to help the middleware understand input from the consumer.
Let’s discuss the tools and platforms. Like any technology, there are easy-to-use tools that perform limited functions, and there are complex ones that have the ability to provide incredible experiences but require time and resources to build. It’s the difference between using a free website builder like Wix or Weebly and creating a custom website on Sitecore or Drupal. For our prototype, we are going to tie into the IBM Watson platform, which won Jeopardy! in 2011.
There are two Watson services we used: Natural Language Classifier (NLC) and Dialog. The purpose of NLC is to understand the context and intent of the user input; we used it to know when a user was trying to submit a claim versus checking the status of one. The Dialog service is trained with a general script, which it uses as building blocks to create a logic flow for the conversation. This flow helps Dialog ask clarifying questions (or recognize when it has all the information) and gather the data it needs to complete the given task. The natural language processing ability of Dialog allows it to parse the user input and extract the required information (location, vehicle, number of passengers). For all this magic to work, both NLC and Dialog have to be trained with a set of grammars the user might input, since these are domain-specific.
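As a rough sketch of that first classification hop, here is what a classify call to NLC might look like from the middleware. The endpoint path follows NLC's v1 REST API, but the classifier ID, the lack of auth handling, and the response shape shown in the comment are illustrative assumptions; check IBM's current documentation before relying on them.

```javascript
// Sketch: build the request for Watson NLC's classify endpoint.
// The classifier ID is a placeholder and credentials are omitted.
function buildClassifyRequest(classifierId, text) {
  return {
    method: 'POST',
    url: 'https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/' +
         classifierId + '/classify',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: text })
  };
}

// The middleware would then inspect the top class in the response, e.g.
// { top_class: "submit_claim", classes: [{ class_name: "submit_claim", confidence: 0.93 }] }
function topIntent(nlcResponse) {
  return nlcResponse.top_class;
}
```

In practice the middleware would branch on `topIntent` to decide whether the claimant is submitting a claim or checking the status of one.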
Our middleware provides several functions for the prototype. First, it is the interface to Facebook Messenger. The middleware takes in user input from Messenger and decides whether to forward it on to NLC or Dialog. At the beginning of a conversation, user input is received and forwarded to the NLC to process and analyze what task the claimant is attempting to complete. From then on, the middleware talks to the Dialog and Messenger service to facilitate the rest of the conversation.
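The routing decision described above can be sketched as a small function. The session object and its field names are assumptions for illustration, not part of any Watson or Messenger API:

```javascript
// Sketch of the middleware routing decision: the first message of a
// conversation goes to NLC to determine intent; once an intent is known,
// subsequent messages go to Dialog for the rest of the conversation.
function routeMessage(session, text) {
  if (!session.intent) {
    // No intent yet: this is the start of a conversation, ask NLC.
    return { service: 'nlc', text: text };
  }
  // Intent already known: let Dialog drive the conversation.
  return { service: 'dialog', intent: session.intent, text: text };
}
```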
Second, when the Dialog service is in use, we check to see if the user uploaded photos. If the user did upload a photo, the middleware stores it.
Third, once the claimant has submitted all the necessary data, the middleware stores it. Last, when the claimant is requesting a status update, the middleware creates a Messenger-defined image template, including the claimant submitted photo and information, and sends it to Messenger.
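A minimal sketch of the completeness check the middleware might run before persisting a claim. The field names are our own illustrative choices, not anything mandated by Watson or Messenger:

```javascript
// Fields the prototype collects before a claim can be stored.
const REQUIRED_FIELDS = ['location', 'vehicle', 'passengers'];

// Return the list of required fields the claimant has not yet provided.
function missingFields(claim) {
  return REQUIRED_FIELDS.filter(function (field) {
    return claim[field] === undefined || claim[field] === null;
  });
}

// The claim is ready to persist only once nothing is missing.
function isReadyToStore(claim) {
  return missingFields(claim).length === 0;
}
```

When `missingFields` is non-empty, the middleware can hand the gaps back to Dialog so it knows which clarifying questions still need asking.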
Node.js was our programming language of choice for the middleware. While using Node.js is not strictly necessary, since interactions with Facebook and the IBM Watson services are REST API calls, the majority of documentation provides example code in Node.js.
MongoDB was used as the persistence layer since it does not force pre-defined schemas, which allows for easier prototyping. Mongoose was used to interface with MongoDB.
Webhooks are a way of integrating systems. They are usually configured to call a URL when an event is triggered. Setting up a bot for Facebook Messenger requires a webhook URL; Messenger calls it with the user input (whether text, photos, or voice) as the payload. The middleware has to be reachable by Messenger at setup time. This is not a problem when deploying the chatbot on a cloud infrastructure like Heroku or AWS, but it can create problems if the chatbot resides inside a private network. In that case, the best option is a service like ngrok, which can provide a public interface for the duration of the development cycle.
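A hedged sketch of unpacking a Messenger webhook POST body into individual events. The `entry[].messaging[]` structure follows Messenger's documented webhook payload at the time of writing, but verify it against the current Messenger Platform docs:

```javascript
// Flatten a Messenger webhook body into a list of simple events the
// middleware can route. Fields that may be absent (e.g. attachments on a
// text-only message) are defaulted defensively.
function extractMessagingEvents(body) {
  if (body.object !== 'page') return [];
  var events = [];
  (body.entry || []).forEach(function (entry) {
    (entry.messaging || []).forEach(function (event) {
      events.push({
        senderId: event.sender && event.sender.id,
        text: event.message && event.message.text,
        attachments: (event.message && event.message.attachments) || []
      });
    });
  });
  return events;
}
```

Each extracted event would then be fed to the NLC/Dialog routing logic, with attachments (accident photos) diverted to storage.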
Like any good webhook caller, Messenger expects a response status for each request. Obviously, this can be problematic while building the chatbot, especially when not all error handling code is in place. Expect Messenger to keep resending messages until they are acknowledged.
Access to Watson APIs is through IBM’s Bluemix platform, which is the largest Cloud Foundry installation. Cloud Foundry CLI tools can be used to set up all the Watson services before use. While we used NLC and Dialog for the prototype, it is worth investigating the full array of Watson APIs. In the future, we plan on integrating the sentiment analysis, tone analyzer, and emotional analyzer to determine how the chatbot can communicate more intelligently with users.
As seen below, there is no shortage of platforms to leverage when building chatbots.
One platform of note is wit.ai. It is owned by Facebook (but not meant to be exclusively used on Facebook) and has an interface for building mock conversations, called Stories, in the browser. Wit creates models from the Stories and tracks user input that can be categorized to provide supervised learning for the AI.
The existing rules of product design apply to building chatbots: analyze the intended goal, identify where chatbots can help, prototype and evaluate, and repeat as necessary.
Gain real consensus on the purpose of the chatbot. Stakeholders need to understand what is possible and what the limitations are. Scope creep will create complexity at an exponential rate that will be hard to implement and will frustrate users. On the flip side, when the chatbot starts interacting with humans, it should make clear the scope of what it can do.
For humans, conversation is natural; it is a process we have been learning since birth. Codifying it well enough for a machine is a challenge. Mocking up conversations becomes a critical part of the process. Tools like Twine help map out the flow of communication. The next step is user research: would users interact with the chatbot as mapped? Understanding what users think, feel, and how they react while interacting with the chatbot provides valuable feedback for gathering the data the NLP systems need.
Temporary middleware systems are necessary because of how some customer-facing touch points, like Facebook, interact with the middleware. If the chatbot middleware is down for maintenance or upgrades, a user may try to initiate a conversation with a “Hello.” If the user receives no response, they will likely say “Hello” again. When the chatbot comes back online, Facebook will send both “Hello” inputs to the middleware, and the chatbot will probably treat both statements as the start of a conversation and provide duplicate responses. Obviously a terrible user experience. A solution is to stand up a temporary middleware system that responds to the user after the initial “Hello,” stating the chatbot is unavailable and will return soon. At the same time, the temporary system needs to store the user input so that the chatbot can respond appropriately when back online.
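The temporary “we’re down” middleware described above could be sketched like this. The reply text, the in-memory storage, and the function names are all illustrative assumptions; a real system would persist the queue:

```javascript
// Holding middleware: reply once to each user that the bot is down, and
// queue every message so the real chatbot can replay them when it returns.
function makeHoldingMiddleware() {
  var queued = {}; // senderId -> messages to replay later (in-memory for the sketch)
  return {
    handle: function (senderId, text) {
      var firstContact = !queued[senderId];
      queued[senderId] = (queued[senderId] || []).concat([text]);
      // Only the first message gets a reply, so the user isn't spammed.
      return firstContact
        ? 'The claims assistant is temporarily unavailable. We saved your message and will follow up soon.'
        : null;
    },
    // Called by the real chatbot on restart to catch up on queued input.
    drain: function (senderId) {
      var messages = queued[senderId] || [];
      delete queued[senderId];
      return messages;
    }
  };
}
```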
Remember to provide an escape route. In the early iterations of a chatbot, it will likely not know what to do in every scenario. Track whether the conversation is going sideways and provide alternative communication methods.
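One way to sketch that escape route is a per-session miss counter; the threshold and the handoff copy below are illustrative choices, not from any framework:

```javascript
// If the bot fails to understand this many messages in a row, hand the
// user off to a human channel instead of looping.
var MAX_MISSES = 2;

function nextReply(session, understood, normalReply) {
  if (understood) {
    session.misses = 0; // conversation is back on track
    return normalReply;
  }
  session.misses = (session.misses || 0) + 1;
  if (session.misses > MAX_MISSES) {
    // Escape route: offer an alternative communication method.
    return 'I seem to be stuck. You can reach a claims agent at our ' +
           'support line, or type "agent" and we will follow up by email.';
  }
  return "Sorry, I didn't catch that. Could you rephrase?";
}
```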
We are early in the consumer-facing AI chatbot revolution. There are opportunities in enhancing existing methods of communication to create better engagement, but new types of interactions will also develop that bring brands to consumers, instead of depending on customers to land where brands want them to be.
While we are discussing text-based communications, the next evolution is already underway: voice interactions. The current incarnations of Siri, Google Now, and Alexa have shown us what will be possible shortly. The good news is the infrastructure and systems brands will build for chatbots will be portable to voice.
Since the mainstreaming of the Internet, the assumption has been that we are at the cusp of the next big revolution.
We are going on 20 years, and it’s not here yet. Will it ever?
Let’s back up a bit. My degree is in computer engineering, so I learned how to design circuits and chips and how to write operating systems. To me, there is no magic to digital or to executions which are digital. Digital is what we want it to be. Nothing more, nothing less.
Then why do people have unrealistic expectations of it?
Look at your average consultant, CX expert, or futurist talking about digital. They understand the current applications of digital; they have all used digital tools, the output of digital. The people who are or have been in the bowels of digital are more neutral about its prospects (unless they are trying to sell you something).
I stumbled across Robert Gordon’s US Economic Growth is Over: The Short Run Meets the Long Run, which codified some of what I was seeing and felt.
Visit a third world country; it is easy to see what is revolutionary: things like electricity, gas, water, and sewer access.
The combination of these connections resulted in the societal shifts of the 1920s to 1970s. It changed where and how we lived our lives. That is what we are seeing in the BRIC countries currently. Everyone expected the benefits of these new connections to continue and compound and get us our jetpacks in the 21st century. But Peter Thiel said it best: “We wanted flying cars, instead we got 140 characters.” Most people are still commuting and pushing around (digital) paper.
We don’t have the right combination to create the next epic change. Think of marketing: you need to get the 4Ps correct. Read the Ten Types of Innovation. The authors point out that for a company to differentiate, it needs to get a combination of four or five things right.
Will big data and AI get us there? If only applied to digital, probably not. We are going to need more significant changes from other fields. How about the nano-, neuro-, and green-?
With AI chatbots, the conversation is the hard part, not the frameworks and tools.
Think about prototyping an insurance bot, integrating IBM Watson, Facebook Messenger, and claims databases. The most challenging component, by far, is the conversation. We have to understand the audience and their communication patterns: talk to customers who have filed claims or are filing claims over the phone and online, talk to customer service reps and claims adjusters, listen to logged phone calls and read case emails, and research conversation patterns in messaging.
Right now chatbot developers are treating the conversation component like a simple decision tree. But people don’t talk in decision trees; it’s a back-and-forth negotiation. Non-developers are using the “no coding” frameworks, and the results are disappointing. When developers try to build in conversational intelligence themselves, they fail too. Given a clear problem definition, developers are great at integrating tools and frameworks. But mapping out how ordinary people converse, in general or within a domain, is a different skill set. Most UX professionals aren’t prepared for that either, so there is a talent gap.
tl;dr: There are lots of good use cases for AI chatbots, but too many people see it as a gold rush. Most are not doing the work to solve real problems.
Chatbots are hot right now; they have all the hype. That and virtual reality. But let’s talk about chatbots. I’m writing a post on the company blog about chatbots, but I thought I’d put down some thoughts here. Figuring out why chatbots even matter is an interesting exercise. The answer seems obvious: communication on the Internet is an utter disaster at the moment.
There are four factors. First, online ads downright suck. The ads themselves are not creative, and they are in your face. That is a terrible combination. Agencies and marketers don’t get excited about banner ads and text ads, unless they are “digital” marketers and that is all they know. Video and native ads have possibilities, but the executions are bad. Our brains have become dulled from creating banner and text ads. What’s wrong with interrupting the user experience with overlay ads and tiny close buttons?
A more important question is why TV and radio get away with it. My main theory is that even though we all grew up with interruptive ads in those mediums, we had no alternatives; there was no Facebook or Snapchat to occupy our attention during commercial breaks. Would TV and radio ads be possible if created today? Another theory is that Americans like breaks in stories. Look at popular American sports: football, baseball, and basketball. They are all situational: one team gets the opportunity to score, and then the next team tries to score. While not entirely applicable to basketball, timeouts make it situational.
If a viewer has watched a few shows on Netflix, how can they go back to live TV? It is painful. The only reason to watch live TV is live events like sports. But social media, which is taking eyeballs from TV ads, makes live TV worth watching. Sharing thoughts on social networks when watching live TV makes it more enjoyable.
Back to why chatbots are huge right now. The second factor is that messaging is such a big deal. Business Insider created the chart above showing how messaging usage has surpassed social networks. It’s understandable: as soon as a social network gets big, mom and dad arrive, and the advertisers move in. Then the cool kids migrate to another social network. But with messaging, you don’t have to see your dumb uncle’s Trump posts. Messaging gives users one-on-one interaction while also letting them interact with separate small groups. Kids are comfortable with the UX and asynchronous nature of messaging; they are the perfect audience for chatbot-type interactions.
Next, websites and apps are painful. Marketers and agencies design websites and apps without putting the customer at the center. Look at any “digital” agency website: stuff is flying around, punch-the-visitor-in-the-face annoying. The websites are a demonstration of how well they can grab the visitor’s attention. Sure. Throw in the bloat websites carry from ads and trackers. Why are we surprised ad blocking is on the rise?
The last factor: AI is more accessible. Computing and infrastructure costs have plummeted, and the resources to build AI and machine learning are available to anyone. Granted, a lot of what we see is more decision tree than AI, but it is a start in the right direction. The investments in these technologies are already evident: Siri (ok, not so much), Alexa, the multitude of Google products. The plugin architecture of these platforms increases the impact and value of what AI provides. Once users are comfy with AI as their primary interface, they will demand it everywhere. More on this next time.
Let’s hope brands don’t build crappy chatbots and screw up this game-changing opportunity.
List of bot resources I’ve investigated or tried out:
Natural Language Processing (NLP) Platforms:
No-coding Bot Platforms:
Brand Bots (brands that have their own bots):
Other Bot Aggregator Lists: