Time picked these photos. They are memorable, and they are influential in part because everyone knows them. Could any image, video, or story become that prominent today? There are too many channels vying for our attention. Seventy-five million people watched the Seinfeld series finale; The Walking Dead drew a 9.6 rating in 2016. Sure, comparing a series finale against a regular episode may not be fair, but how does any group of people build a set of common cultural stories when there are no standard channels for consumption? The assumption is that sharing is now so much easier and faster that, theoretically, it is possible. But if all the social platforms are algorithmically personalizing content for clicks, engagement, or whatever, customizing for the individual instead of the common, how do you build the common?
Fascinating topic. We tend to think societal and cultural breakdowns led to the collapse of the Roman Empire. That might not be true. Rome had a thin layer of bureaucracy: local officials ran the different regions and cities. They invested in their communities, and they connected the locals to Rome. Over time and through changes in leadership, the governing model became more centralized. Rome co-opted the local officials, who became more interested in making Rome happy and less invested in their local communities. Collapse coming in 10…9…8…
Ah, the early ’90s. Everyone remembers that story. The woman who sued McDonald’s became the poster child for frivolous lawsuits. But read the link. It doesn’t sound like she was greedy, and McDonald’s did know it was handing out coffee that was too hot. McDonald’s presented a different story, and that version became embedded in society. Media and truth are fungible concepts.
To understand how the different pieces of building a chatbot fit together, I decided to prototype a Facebook Messenger chatbot for insurance claim submission. The requirements were:
- Be able to submit an auto accident-related claim (location, vehicle, number of passengers) and be able to check the status of a claim.
- Be able to upload photos of the accident.
- Be able to understand and process when the claimant is submitting multiple pieces of relevant information at once.
- Leverage Facebook Messenger as the consumer touchpoint.
There are three components to chatbots:
- Consumer touchpoint (social platforms, website, SMS, etc.)
- Middleware, which handles the business logic and serves as the knowledge base and database for customer information
- Tools and platforms that do the natural language processing to help the middleware understand input from the consumer.
Let’s discuss the tools and platforms. Like any technology, there are easy-to-use tools that perform limited functions, and there are complex ones that have the ability to provide incredible experiences but require time and resources to build. It’s the difference between using a free website builder like Wix or Weebly and creating a custom website on Sitecore or Drupal. For our prototype, we are going to tie into the IBM Watson platform, which won Jeopardy! in 2011.
There are two Watson services we used: Natural Language Classifier (NLC) and Dialog. The purpose of NLC is to understand the context and intent of the user input; we used it to recognize when a user was trying to submit a claim versus checking the status of one. The Dialog service is trained with a general script, which it uses as building blocks to create a logic flow for the conversation. This flow lets Dialog ask clarifying questions, recognize when it has all the information, and gather whatever it still needs to complete the given task. Dialog’s natural language processing ability allows it to parse the user input and extract the required information (location, vehicle, number of passengers). For all this magic to work, both NLC and Dialog have to be trained with sets of example phrases (grammars) the user might input, since these are domain-specific.
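To make the classification step concrete, here is a toy stand-in with hypothetical names and a deliberately naive keyword-overlap scorer. The real NLC service trains a statistical model from uploaded text/intent pairs via its REST API; this sketch only illustrates the input and output shape of that step.

```javascript
// Toy stand-in for Watson NLC: map a user utterance to an intent by
// counting keyword overlap with labeled training phrases.
// (Illustrative only -- NLC itself trains a real model from such pairs.)
const TRAINING = [
  { text: 'i want to file a claim for my accident', intent: 'submit_claim' },
  { text: 'i need to report a car crash', intent: 'submit_claim' },
  { text: 'what is the status of my claim', intent: 'check_status' },
  { text: 'has my claim been processed yet', intent: 'check_status' }
];

function classifyIntent(utterance) {
  const words = new Set(utterance.toLowerCase().split(/\W+/));
  let best = { intent: null, score: 0 };
  for (const sample of TRAINING) {
    // Score each training phrase by how many of its words appear
    // in the utterance; keep the highest-scoring intent.
    const overlap = sample.text.split(/\W+/).filter(w => words.has(w)).length;
    if (overlap > best.score) best = { intent: sample.intent, score: overlap };
  }
  return best.intent;
}
```

The point is only that the classifier's job is utterance in, intent label out; everything downstream (Dialog, the middleware) keys off that label.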
Our middleware provides several functions for the prototype. First, it is the interface to Facebook Messenger: it takes in user input from Messenger and decides whether to forward it to NLC or Dialog. At the beginning of a conversation, user input is forwarded to NLC to analyze what task the claimant is attempting to complete. From then on, the middleware talks to the Dialog and Messenger services to facilitate the rest of the conversation.
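That routing decision can be sketched roughly like this (all names hypothetical; `nlc` and `dialog` stand in for the Watson service clients):

```javascript
// First message of a conversation goes to NLC to detect the task;
// every later message goes to Dialog, which drives the conversation.
const sessions = new Map(); // senderId -> { intent }

function routeMessage(senderId, text, nlc, dialog) {
  if (!sessions.has(senderId)) {
    const intent = nlc.classify(text);       // e.g. 'submit_claim'
    sessions.set(senderId, { intent });
    return dialog.start(intent, senderId);   // kick off the scripted flow
  }
  return dialog.converse(senderId, text);    // continue the flow
}
```

In the real middleware the session state would live in the database rather than in memory, but the branch is the same: classify once, then hand the conversation to Dialog.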
Second, when the Dialog service is in use, we check to see if the user uploaded photos. If the user did upload a photo, the middleware stores it.
Third, once the claimant has submitted all the necessary data, the middleware stores it. Last, when the claimant requests a status update, the middleware creates a Messenger-defined image template, including the claimant-submitted photo and information, and sends it to Messenger.
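The status card can be assembled as a Messenger "generic template" attachment. A minimal sketch, assuming hypothetical claim fields; the nested attachment structure follows the Messenger Send API:

```javascript
// Build a Messenger generic-template message for a claim status card.
// (Claim fields are hypothetical; the attachment shape is Messenger's.)
function buildStatusCard(claim) {
  return {
    attachment: {
      type: 'template',
      payload: {
        template_type: 'generic',
        elements: [{
          title: `Claim ${claim.id}: ${claim.status}`,
          subtitle: `${claim.vehicle} -- ${claim.location}`,
          image_url: claim.photoUrl   // photo the claimant uploaded
        }]
      }
    }
  };
}
```

The middleware POSTs this message object back to Messenger, which renders it as a card with the claimant's photo and details.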
Node.js was our programming language of choice for the middleware. While Node.js is not strictly necessary, since the interactions with Facebook and the IBM Watson services are plain REST API calls, the majority of the documentation provides example code in Node.js.
MongoDB was used as the persistence layer since it does not force pre-defined schemas, which allows for easier prototyping. Mongoose was used to interface with MongoDB.
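Why a schemaless store helps while prototyping: claim documents can gain fields (photos, passenger counts) without a migration. A minimal in-memory stand-in for the claims collection, with hypothetical field names:

```javascript
// In-memory stand-in for a MongoDB collection, to show the benefit of
// schemaless documents during prototyping (names hypothetical).
const claims = [];

function saveClaim(doc) {
  const record = { _id: claims.length + 1, ...doc };
  claims.push(record);
  return record;
}

// Early prototype: only location and vehicle.
saveClaim({ location: 'I-95 exit 12', vehicle: 'Ford Focus' });

// A later iteration adds fields -- no schema change needed.
saveClaim({ location: 'Main St', vehicle: 'Jeep', passengers: 2,
            photos: ['accident-front.jpg'] });
```

With Mongoose you would still declare a schema for the fields you know about, but adding a field during a prototype iteration is a one-line change rather than a database migration.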
Webhooks are a way of integrating systems: they are configured to call a URL when an event is triggered. When setting up a bot, Facebook Messenger requires a webhook URL it will call with the user input (whether text, photos, or voice) as the payload. The middleware has to be reachable by Messenger at setup time. That is not a problem when deploying the chatbot on cloud infrastructure like Heroku or AWS, but it can be when the chatbot resides inside a private network. In that case, the best option is a service like ngrok, which can provide a public interface for the duration of the development cycle.
Like any good webhook caller, Messenger expects a response status for each request. Obviously, this can be problematic while building the chatbot, especially before all the error handling code is in place. Expect Messenger to keep re-sending messages until they are acknowledged.
Access to the Watson APIs is through IBM’s Bluemix platform, which is the largest Cloud Foundry installation. The Cloud Foundry CLI tools can be used to set up all the Watson services before use. While we used only NLC and Dialog for the prototype, it is worth investigating the full array of Watson APIs. In the future, we plan on integrating the sentiment, tone, and emotion analysis services to determine how the chatbot can communicate more intelligently with users.
As seen below, there is no shortage of platforms to leverage when building chatbots.
One platform of note is wit.ai. It is owned by Facebook (but not meant to be exclusively used on Facebook) and has an interface for building mock conversations, called Stories, in the browser. Wit creates models from the Stories and tracks user input that can be categorized to provide supervised learning for the AI.
The existing rules of product design apply to building chatbots: analyze the intended goal, identify the parameters of where chatbots can help, prototype and evaluate, and repeat as necessary.
Gain real consensus on the purpose of the chatbot. Stakeholders need to understand both what is possible and what the limitations are. Scope creep will create complexity at an exponential rate, making the chatbot hard to implement and frustrating to use. On the flip side, when the chatbot starts interacting with humans, it should make the scope of what it can do clear.
For humans, conversation is natural; it is a process we have learned since birth. Codifying it for a machine is a challenge. Mocking up conversations becomes a critical part of the process, and tools like Twine help map out the flow of communications. The next step is user research. Would users interact with the chatbot as mapped? Understanding what users think, feel, and how they react while interacting with the chatbot will provide valuable feedback and help gather the data the NLP systems need.
Temporary middleware systems are necessary because of how some customer-facing touchpoints, like Facebook, interact with the middleware. If the chatbot middleware is down for maintenance or upgrades, a user may try to initiate a conversation with a “Hello.” Receiving no response, they will likely say “Hello” again. When the chatbot comes back online, Facebook will send both “Hello” inputs to the middleware, and the chatbot will probably treat both statements as the start of a conversation and provide duplicate responses. Obviously a terrible user experience. A solution is to stand up a temporary middleware system that can respond after the initial “Hello,” stating that the chatbot is unavailable and will return soon. At the same time, the temporary system needs to store the user input so that the chatbot can respond appropriately when back online.
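The holding behavior amounts to a few lines: reply once that the bot is down, queue everything, and replay the queue on restart. A sketch with hypothetical names:

```javascript
// Temporary "we're offline" responder: acknowledge each user once,
// queue their messages, and let the real middleware drain the queue
// when it comes back online. (All names hypothetical.)
const pending = new Map(); // senderId -> queued messages

function holdingResponder(senderId, text, send) {
  if (!pending.has(senderId)) {
    pending.set(senderId, []);
    send(senderId, 'The claims assistant is briefly offline -- ' +
                   'we saved your message and will reply shortly.');
  }
  pending.get(senderId).push(text);
}

function drainQueue(senderId) {
  const queued = pending.get(senderId) || [];
  pending.delete(senderId);
  return queued; // replayed into the real chatbot on recovery
}
```

Because the user gets one acknowledgment instead of silence, the duplicate-"Hello" problem above never arises, and no input is lost.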
Remember to provide an escape route. Early iterations of a chatbot will likely not know what to do in every scenario. Track whether the conversation is going sideways and offer alternative communication methods.
We are early in the consumer-facing AI chatbot revolution. There are opportunities to enhance existing methods of communication and create better engagement, but new types of interactions will develop that bring brands to the consumers instead of depending on customers to land where brands want them to be.
While we have been discussing text-based communications, the next evolution is already underway: voice interactions. The current incarnations of Siri, Google Now, and Alexa have shown us what will soon be possible. The good news is that the infrastructure and systems brands build for chatbots will be portable to voice.
Since the mainstreaming of the Internet, the assumption has been that we are at the cusp of the next big revolution.
We are going on 20 years, and it’s not here yet. Will it ever?
Let’s back up a bit. My degree is in computer engineering, so I learned how to design circuits and chips and how to write operating systems. To me, there is no magic to digital or to executions that are digital. Digital is what we want it to be. Nothing more, nothing less.
Then why do people have unrealistic expectations of it?
Look at your average consultant, CX expert, or futurist talking about digital. They understand the current applications of digital, and they have all used digital tools, the output of digital. The ones who are or have been in the bowels of digital are more neutral about its prospects (unless they are trying to sell you something).
I stumbled across Robert Gordon’s *US Economic Growth is Over: The Short Run Meets the Long Run*, which codified some of what I was seeing and feeling.
Visit a third-world country and it is easy to see what is revolutionary: things like electricity, gas, water, and sewer access.
The combination of these connections produced the societal shifts of the 1920s to 1970s, changing where and how we lived our lives. That is what we are seeing in the BRIC countries now. Everyone expected the benefits of these new connections to continue compounding and to get us our jetpacks in the 21st century. But Peter Thiel said it best: “We wanted flying cars, instead we got 140 characters.” Most people are still commuting and pushing around (digital) paper.
We don’t have the right combination to create the next epic change. Think of marketing: you need to get the 4Ps correct. Read the Ten Types of Innovation; the authors point out that for a company to differentiate, it needs to get a combination of four or five things right.
Will big data and AI get us there? If only applied to digital, probably not. We are going to need more significant changes from other fields. How about the nano-, neuro-, and green-?
With AI chatbots, the conversation is the hard part, not the frameworks and tools.
Think about prototyping an insurance bot that integrates IBM Watson, Facebook Messenger, and claims databases. The most challenging component, by far, is the conversation. We have to understand the audience and their communication patterns: talk to customers who have filed claims or are filing claims over the phone and online; talk to customer service reps and claims adjusters; listen to logged phone calls and read case emails; research conversation patterns in messaging.
Right now, chatbot developers are treating the conversation component like a simple decision tree. But people don’t talk in decision trees; a conversation is a back-and-forth negotiation. Non-developers are using the “no coding” frameworks, and the results are disappointing. Developers, meanwhile, fail when they try to build the conversational intelligence themselves. Given a clear problem definition, developers are great at integrating tools and frameworks, but mapping out how ordinary people converse, in general or within a domain, is a different skill set. Most UX professionals aren’t prepared for it either, so there is a talent gap.
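The difference is easy to show in code. A decision tree asks one fixed question at a time; slot filling accepts the same information in any order and any combination. A toy slot-filling sketch for the claim flow, with deliberately naive regex extraction standing in for a real NLP service:

```javascript
// Slot filling: pull whatever claim details appear in the utterance,
// then ask only for what is still missing. (Extraction is a toy;
// a real system would use an NLP service, not regexes.)
const REQUIRED = ['location', 'vehicle', 'passengers'];

function extractSlots(text) {
  const slots = {};
  const passengers = text.match(/(\d+)\s+passengers?/i);
  if (passengers) slots.passengers = Number(passengers[1]);
  const vehicle = text.match(/(?:my|a|the)\s+(Honda|Toyota|Ford)\b/i);
  if (vehicle) slots.vehicle = vehicle[1];
  const location = text.match(/\bon\s+([A-Z][\w ]+?)(?:[.,]|$)/);
  if (location) slots.location = location[1].trim();
  return slots;
}

function nextPrompt(filled, text) {
  Object.assign(filled, extractSlots(text));
  const missing = REQUIRED.filter(k => !(k in filled));
  return missing.length ? `What is the ${missing[0]}?` : 'Claim ready.';
}
```

A user who volunteers everything in one sentence is done immediately; a user who gives one detail at a time gets prompted only for the gaps. A decision tree forces both users down the same rigid path.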
tl;dr: There are lots of good use cases for AI chatbots, but too many people see them as a gold rush, and most are not doing the work to solve real problems.