Building a Facebook Messenger Chatbot

Posted — Oct 18, 2016

To understand how the different pieces of building a chatbot fit together, I decided to prototype a Facebook Messenger insurance claim submission chatbot. The requirements were simple: let a claimant submit a new claim, complete with photos, and check the status of an existing claim.

There are three components to chatbots:

  1. Consumer touch point (social platforms, website, SMS, etc.)
  2. Middleware that handles the business logic and serves as the knowledge base and database for customer information
  3. Tools and platforms that perform the natural language processing that helps the middleware understand input from the consumer.

Let’s discuss the tools and platforms. Like any technology, there are easy-to-use tools that perform limited functions, and there are complex ones that have the ability to provide incredible experiences but require time and resources to build. It’s the difference between using a free website building tool like Wix or Weebly and creating a custom website on Sitecore or Drupal. For our prototype, we are going to tie into the IBM Watson platform, the technology that won Jeopardy! in 2011.

There are two Watson services we used: Natural Language Classifier (NLC) and Dialog. The purpose of NLC is to understand the context and intent of the user input. We used NLC to know when a user was trying to submit a claim versus checking the status of a claim. The Dialog service is trained with a general script, which it uses as building blocks to create a logic flow for the conversation. This flow helps Dialog ask clarifying questions (or recognize when it has all the information) and gather the information it needs to complete the given task. The natural language processing ability of Dialog allows it to parse the user input and extract the required information (location, vehicle, the number of passengers). For all this magic to work, both NLC and Dialog have to be trained with domain-specific sets of grammars the user might input.
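To make the NLC piece concrete, here is a minimal sketch of classifying a user utterance from Node.js. It assumes the watson-developer-cloud npm package of the era and a classifier already trained on domain-specific examples; the credentials and CLASSIFIER_ID environment variables are placeholders.

```javascript
// Sketch: ask Watson NLC which intent a user utterance maps to.
var watson = require('watson-developer-cloud');

var classifier = watson.natural_language_classifier({
  username: process.env.NLC_USERNAME,   // placeholder credentials
  password: process.env.NLC_PASSWORD,
  version: 'v1'
});

classifier.classify({
  text: 'I just got into a fender bender on I-95',
  classifier_id: process.env.CLASSIFIER_ID // a trained classifier
}, function(err, response) {
  if (err) { return console.error(err); }
  // top_class is the most likely intent, e.g. submitting a claim
  // versus checking the status of one.
  console.log(response.top_class, response.classes);
});
```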

Our middleware provides several functions for the prototype. First, it is the interface to Facebook Messenger. The middleware takes in user input from Messenger and decides whether to forward it to NLC or Dialog. At the beginning of a conversation, user input is forwarded to NLC to determine what task the claimant is attempting to complete. From then on, the middleware talks to the Dialog and Messenger services to facilitate the rest of the conversation, as sketched below.
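Here is an illustrative sketch of that routing decision. The `classifier` client comes from the previous sketch, `sendToMessenger` is a hypothetical reply helper, and the in-memory session map stands in for real persistence.

```javascript
// Sketch: the first message of a conversation goes to NLC to pick an
// intent; every message after that goes to the Dialog service.
var watson = require('watson-developer-cloud');

var dialog = watson.dialog({
  username: process.env.DIALOG_USERNAME, // placeholder credentials
  password: process.env.DIALOG_PASSWORD,
  version: 'v1'
});

var sessions = {}; // senderId -> conversation state (in-memory sketch)

function handleText(senderId, text) {
  if (!sessions[senderId]) {
    // New conversation: let NLC decide what the claimant wants to do.
    classifier.classify(
      { text: text, classifier_id: process.env.CLASSIFIER_ID },
      function(err, result) {
        if (err) { return console.error(err); }
        sessions[senderId] = { intent: result.top_class };
        continueDialog(senderId, text);
      });
  } else {
    continueDialog(senderId, text);
  }
}

function continueDialog(senderId, text) {
  var session = sessions[senderId];
  dialog.conversation({
    dialog_id: process.env.DIALOG_ID,
    client_id: session.clientId,             // undefined on first turn;
    conversation_id: session.conversationId, // Dialog assigns both
    input: text
  }, function(err, result) {
    if (err) { return console.error(err); }
    session.clientId = result.client_id;
    session.conversationId = result.conversation_id;
    sendToMessenger(senderId, result.response.join(' '));
  });
}
```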

Second, while the Dialog service is in use, we check whether the user uploaded photos; if so, the middleware stores them.
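The photo check itself is straightforward: Messenger delivers attachments inside the webhook payload, so the middleware can look for image attachments and persist their URLs. A sketch, where `saveClaimPhoto` is a hypothetical persistence helper:

```javascript
// Sketch: pull image attachments out of a Messenger webhook event.
function handleMessengerEvent(event) {
  var message = event.message;
  if (message && message.attachments) {
    message.attachments.forEach(function(attachment) {
      if (attachment.type === 'image') {
        // Messenger hosts the uploaded photo and hands us a URL.
        saveClaimPhoto(event.sender.id, attachment.payload.url);
      }
    });
  }
}
```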

Third, once the claimant has submitted all the necessary data, the middleware stores it. Last, when the claimant requests a status update, the middleware builds a Messenger-defined image template containing the claimant-submitted photo and information and sends it to Messenger.
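Messenger’s generic template is a natural fit for that status card. A sketch of constructing one with the era’s request package; the claim fields and token handling are illustrative:

```javascript
var request = require('request');

// Sketch: send a claim status card back via the Messenger Send API.
function sendClaimStatus(recipientId, claim) {
  request({
    method: 'POST',
    uri: 'https://graph.facebook.com/v2.6/me/messages',
    qs: { access_token: process.env.PAGE_ACCESS_TOKEN },
    json: {
      recipient: { id: recipientId },
      message: {
        attachment: {
          type: 'template',
          payload: {
            template_type: 'generic',
            elements: [{
              title: 'Claim #' + claim.claimNumber,
              subtitle: 'Status: ' + claim.status,
              image_url: claim.photoUrl // the photo the claimant uploaded
            }]
          }
        }
      }
    }
  }, function(err) {
    if (err) { console.error(err); }
  });
}
```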

Technical Notes

Node.js was our programming language of choice for the middleware. Node.js is not strictly necessary, since the interactions with Facebook and the IBM Watson services are REST API calls that any language can make, but the majority of the documentation provides example code in Node.js.

MongoDB was used as the persistence layer since it does not force pre-defined structures, which allows for easier prototyping. Mongoose was used to interface with MongoDB.
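A sketch of what a claim might look like in Mongoose; the field names are illustrative, and MongoDB’s schema flexibility means the shape can evolve freely while prototyping:

```javascript
var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/chatbot'); // placeholder URI

// Illustrative claim model: Mongoose adds light structure on top of
// MongoDB without locking the prototype into a rigid design.
var claimSchema = new mongoose.Schema({
  senderId: String,        // Messenger sender ID of the claimant
  location: String,
  vehicle: String,
  passengers: Number,
  photoUrls: [String],     // URLs Messenger hands us for uploaded photos
  status: { type: String, default: 'submitted' },
  createdAt: { type: Date, default: Date.now }
});

module.exports = mongoose.model('Claim', claimSchema);
```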

Webhooks are a way of integrating systems; they are usually configured to call a URL when an event is triggered. Setting up a bot for Facebook Messenger requires a webhook URL that Messenger will call with the user input (whether text, photos, or voice) as the payload. The middleware has to be accessible to Messenger at setup time. That is not a problem when deploying the chatbot on cloud infrastructure like Heroku or AWS, but it can create problems if the chatbot resides inside a private network. In that case, the best option is to use a service like ngrok, which can provide a public interface for the duration of the development cycle.
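A minimal sketch of the webhook endpoint in Express, reusing `handleMessengerEvent` from earlier; the verify token is whatever was entered in the Facebook app settings:

```javascript
var express = require('express');
var bodyParser = require('body-parser');

var app = express();
app.use(bodyParser.json());

// Facebook verifies ownership of the webhook with a GET challenge.
app.get('/webhook', function(req, res) {
  if (req.query['hub.verify_token'] === process.env.VERIFY_TOKEN) {
    res.send(req.query['hub.challenge']);
  } else {
    res.sendStatus(403);
  }
});

// All user input (text, photos, voice) arrives as POSTs to this URL.
app.post('/webhook', function(req, res) {
  req.body.entry.forEach(function(entry) {
    entry.messaging.forEach(function(event) {
      if (event.message) { handleMessengerEvent(event); }
    });
  });
  // Acknowledge immediately; otherwise Messenger re-delivers events.
  res.sendStatus(200);
});

app.listen(process.env.PORT || 3000);
```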

As with any webhook integration, Messenger expects a response status for each request. This can be problematic while building the chatbot, especially before all the error handling code is in place: expect Messenger to keep re-sending messages until they are acknowledged.

Access to the Watson APIs is through IBM’s Bluemix platform, the largest Cloud Foundry installation. The Cloud Foundry CLI tools can be used to set up all the Watson services before use. While we used only NLC and Dialog for the prototype, it is worth investigating the full array of Watson APIs. In the future, we plan on integrating sentiment analysis, the Tone Analyzer, and emotion analysis to determine how the chatbot can communicate more intelligently with users.
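One Bluemix detail worth noting: when the middleware itself runs on Cloud Foundry, the bound Watson service credentials arrive through the VCAP_SERVICES environment variable. A sketch of reading them; the service keys reflect the Bluemix catalog naming of the time and should be treated as assumptions:

```javascript
// Sketch: read bound Watson credentials from Cloud Foundry's
// VCAP_SERVICES when on Bluemix, falling back to local env vars.
var vcap = JSON.parse(process.env.VCAP_SERVICES || '{}');

function credentialsFor(serviceName) {
  var instances = vcap[serviceName];
  return instances ? instances[0].credentials : null;
}

// 'natural_language_classifier' and 'dialog' are assumed catalog names.
var nlcCreds = credentialsFor('natural_language_classifier') || {
  username: process.env.NLC_USERNAME,
  password: process.env.NLC_PASSWORD
};
var dialogCreds = credentialsFor('dialog') || {
  username: process.env.DIALOG_USERNAME,
  password: process.env.DIALOG_PASSWORD
};
```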

There is no shortage of platforms to leverage when building chatbots.

One platform of note is wit.ai. It is owned by Facebook (but not meant to be used exclusively on Facebook) and has a browser-based interface for building mock conversations, called Stories. Wit creates models from the Stories and tracks user input that can be categorized to provide supervised learning for the AI.

Lessons Learned

The existing rules of product design apply to building chatbots: analyze the intended goal, identify where a chatbot can help, prototype and evaluate, and repeat as necessary.

Gain real consensus on the purpose of the chatbot. Stakeholders need to understand both what is possible and what the limitations are. Scope creep creates complexity at an exponential rate, making the chatbot hard to implement and frustrating to use. On the flip side, once the chatbot starts interacting with humans, it should make clear the scope of what it can do.

For humans, conversation is natural; it is a process we learn starting at birth. Codifying it for a machine is a challenge, which makes mocking up conversations a critical part of the process. Tools like Twine help map out the flow of communications. The next step is user research: would users interact with the chatbot as mapped? Understanding what users think and feel, and how they react while interacting with the chatbot, provides valuable feedback for gathering the data the NLP systems need.

Temporary middleware systems are necessary because of how some customer-facing touch points, like Facebook, interact with the middleware. If the chatbot middleware is down for maintenance or upgrades, a user may try to initiate a conversation with a “Hello.” Receiving no response, they will likely say “Hello” again. When the chatbot comes back online, Facebook will deliver both “Hello” inputs to the middleware, which will probably treat each as the start of a conversation and send duplicate responses. Obviously a terrible user experience. A solution is to stand up a temporary middleware system that responds to the user after the initial “Hello,” states the chatbot is unavailable and will return soon, and stores the user input so the chatbot can respond appropriately when back online.
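A sketch of such a stopgap: a few lines of Express standing in for the real middleware. The queue is in-memory for illustration and would need a durable store in practice:

```javascript
// Stopgap responder for maintenance windows: tell each user once that
// the chatbot is away, queue their input for the real middleware, and
// acknowledge every request so Messenger stops re-delivering.
var express = require('express');
var bodyParser = require('body-parser');
var request = require('request');

var app = express();
app.use(bodyParser.json());

var queued = [];    // swap for a durable store in practice
var notified = {};  // users already told about the downtime

app.post('/webhook', function(req, res) {
  req.body.entry.forEach(function(entry) {
    entry.messaging.forEach(function(event) {
      if (!event.message) { return; }
      queued.push(event); // hold the input for the chatbot's return
      if (!notified[event.sender.id]) {
        notified[event.sender.id] = true;
        sendText(event.sender.id,
          'Our claims assistant is briefly offline for maintenance. ' +
          'Your message is saved, and we will reply shortly.');
      }
    });
  });
  res.sendStatus(200);
});

function sendText(recipientId, text) {
  request({
    method: 'POST',
    uri: 'https://graph.facebook.com/v2.6/me/messages',
    qs: { access_token: process.env.PAGE_ACCESS_TOKEN },
    json: { recipient: { id: recipientId }, message: { text: text } }
  });
}

app.listen(process.env.PORT || 3000);
```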

Remember to provide an escape route. In its early iterations, a chatbot will not know what to do in every scenario. Track whether the conversation is going sideways and offer alternative communication methods.
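One simple way to detect a sideways conversation is to watch NLC’s confidence scores and count consecutive misses. A sketch, reusing the `sendText` helper from the previous example; the threshold, the miss limit, and the contact details are all illustrative:

```javascript
// Sketch of an escape hatch: if NLC confidence stays low for several
// turns in a row, stop guessing and offer the user another channel.
var missCounts = {}; // senderId -> consecutive low-confidence turns

function checkEscapeRoute(senderId, classification) {
  var topConfidence = classification.classes[0].confidence;
  missCounts[senderId] = topConfidence < 0.5   // illustrative threshold
    ? (missCounts[senderId] || 0) + 1
    : 0;
  if (missCounts[senderId] >= 3) {             // three misses: bail out
    sendText(senderId,
      'Sorry, I am having trouble understanding. You can also reach ' +
      'our claims team at 1-800-555-0100.');   // placeholder number
    missCounts[senderId] = 0;
    return true; // caller should pause the automated flow
  }
  return false;
}
```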

What’s Next

We are early in the consumer-facing AI chatbot revolution. There are opportunities to enhance existing methods of communication to create better engagement, but new types of interactions will develop that bring brands to consumers, rather than depending on customers to land where brands want them to be.

While we have been discussing text-based communications, the next evolution is already underway: voice interactions. The current incarnations of Siri, Google Now, and Alexa have shown us what will soon be possible. The good news is that the infrastructure and systems brands build for chatbots will be portable to voice.