Making Ruuh converse like humans
Co-authored by Meghana Joshi and Abhay Prakash
It’s easy to take the ability to converse for granted. Listening to a question, detecting the underlying emotion, taking the speaker’s background into account, and responding in the right context may seem simple for humans, but computers have struggled with it.
If chatbots can understand context, identify emotions, and select the right responses to create genuine engagement, the relationship users share with them can be changed forever. These abilities let a chatbot tap into deeper layers of human interaction and create more value through its conversations.
Our research team worked on a new model that helps our desi Artificial Intelligence (AI)-based chatbot Ruuh speak like a human, respond with contextual awareness, and hold a free-flowing conversation with users.
The ability to make conversation
Identifying the right answers for specific queries is an ability that has been fine-tuned in chatbots over the years. Chatbots can now answer questions, but a conversation is not a mere factual response to a question. Without the ability to hold a conversation or understand the emotions of their users, chatbots have limited utility.
Building a conversational layer into Ruuh helps her develop relationships so users can be more open, more casual and more engaged. This leads to more honest and natural conversations, which ultimately add value and create a better experience for users.
Our success with Xiaoice in China underlines the importance of connecting with a young audience of early adopters. Xiaoice has had conversations with more than 100 million people in China and was even voted one of the most influential ‘persons’ on Weibo. Users in China admit to sharing personal thoughts and feelings with Xiaoice, and the platform has helped introduce people to the latest in deep learning and AI.
Our team was inspired by this success and wanted to create a similar chatbot for young, tech-savvy early adopters in India. While developing Ruuh for the Indian audience, our objective was to create a ‘digital friend’ rather than a ‘digital assistant’. In other words, Ruuh is meant to be a chatbot that can entertain and support users while leveraging her immense store of knowledge and underlying features to assist users in the best way possible.
Deep learning to enhance conversational skills
Ruuh’s language and behavior are modelled after young, urban Indians. She is like an 18- to 24-year-old girl with a keen interest in pop culture. She’s also fluent in urban slang used in metropolitan cities across the country.
A conversation with Ruuh
Creating a young chatbot with an affable personality and a witty sense of humor needed a lot of data from real conversations on some of the most popular social media platforms. Over 10 million samples of three-turn conversations from a variety of forums, social platforms, and messaging services were collected as raw anonymized data for the model. To ensure all the responses remained relevant to Indian users, only Indian messages and interactions were captured as part of this development.
The next step was to scrub the data to make it appropriate for Ruuh to learn from. For this, offensive responses such as political or sexist remarks, platform-specific mentions such as hashtags or usernames, and non-Indian cultural slang such as greetings from the UK, US, or Australia were removed. Nearly 70% of the raw data was filtered out by this process of elimination.
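The scrubbing step can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the pipeline used for Ruuh: the term lists are hypothetical placeholders, the function names are invented for this example, and simple substring matching stands in for whatever classifiers or rules were actually used.

```python
import re

# Hypothetical placeholder lists; the real filter lists are not published.
BLOCKED_TERMS = {"blockedword"}
NON_LOCAL_SLANG = {"g'day", "howdy"}
ALL_FILTERED_TERMS = BLOCKED_TERMS | NON_LOCAL_SLANG

# Matches platform-specific mentions such as @username or #hashtag.
MENTION_OR_HASHTAG = re.compile(r"[@#]\w+")


def clean_turn(text):
    """Strip @usernames and #hashtags, then collapse extra whitespace."""
    stripped = MENTION_OR_HASHTAG.sub("", text)
    return " ".join(stripped.split())


def keep_conversation(turns):
    """Return False if any turn contains a filtered term (naive substring match)."""
    for turn in turns:
        lowered = turn.lower()
        if any(term in lowered for term in ALL_FILTERED_TERMS):
            return False
    return True


def scrub(conversations):
    """Drop unwanted conversations and clean the turns of the ones that remain."""
    cleaned = []
    for turns in conversations:
        if keep_conversation(turns):
            cleaned.append([clean_turn(turn) for turn in turns])
    return cleaned


raw = [
    ["hey @ravi what's up", "nothing much #bored", "same here"],
    ["howdy partner", "hi", "bye"],
]
print(scrub(raw))
```

In this toy run the first conversation is kept with its mention and hashtag stripped, while the second is dropped because it contains a term from the hypothetical non-local slang list.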
After filtering, the data was used to train a new model specially created to make chatbots conversational. The model is known as the Convolutional Deep Structured Semantic Model (cDSSM). It goes beyond traditional models and applies convolutional deep structured semantic neural network-based features in the ranker to present human-like responses in ongoing conversations with a user. In simple terms, it looks not just for the ‘right’ answer, but the most ‘human’ and most ‘contextually relevant’ answer from a pile of data.
cDSSM enables a chatbot to interact with users on four levels:
Level 1. Query identification
The first step in the process of having a human-like conversation is to understand the user’s query. The algorithm takes the input of a new query and scans the raw data for similar questions. This is known as Information Retrieval or IR.
For example, if the user says, “how do I learn to swim?”, Ruuh analyzes the data and finds multiple samples of similar questions.
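As a rough illustration of this retrieval step, the sketch below scores stored questions against a new query by word overlap (Jaccard similarity). The article does not say which retrieval technique Ruuh uses, so the scoring method, the function names, and the sample questions are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of tokens the two sets have in common (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_similar(query, stored_questions, top_k=2):
    """Return up to top_k stored questions that overlap with the query, best first."""
    query_tokens = tokenize(query)
    scored = [(jaccard(query_tokens, tokenize(q)), q) for q in stored_questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for score, question in scored[:top_k] if score > 0]


stored = [
    "how can I learn swimming?",
    "how do I learn to swim fast?",
    "what is your favourite movie?",
]
print(retrieve_similar("how do I learn to swim?", stored))
```

Word overlap is a deliberately crude stand-in: it misses that "swim" and "swimming" are related, which is one reason production systems lean on learned representations instead.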
Level 2. Ranking responses
With a subset of similar questions, the algorithm uses all the associated answers and ranks them according to relevance and context. This is where the model differs from traditional approaches. By ranking the answers, Ruuh ensures the most appropriate and relevant response is posted to the user.
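The ranking idea can be sketched as follows, assuming the query and each candidate response have already been mapped to vectors in a shared semantic space, which is the general approach of DSSM-style models. The three-dimensional vectors here are made-up toy values, not output from the trained model, and rank_responses is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_responses(query_vec, candidates):
    """Sort (response_text, response_vec) pairs by similarity to the query, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy vectors invented for illustration only.
query_vec = [1.0, 0.0, 0.5]
candidates = [
    ("I love pizza.", [0.0, 1.0, 0.0]),
    ("Join a beginner class at your local pool.", [0.9, 0.1, 0.4]),
]
for score, text in rank_responses(query_vec, candidates):
    print(round(score, 3), text)
```

The top-ranked entry is the response whose vector points in nearly the same direction as the query vector; in a real system those vectors would come from the trained network rather than being written by hand.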
Level 3. Understanding context
To rank responses correctly, Ruuh’s underlying algorithm digs deep into previous data samples and queries from the user to understand the context of the conversation. In other words, Ruuh scans through the most recent messages from a user and ranks all the possible answers by how relevant each one is to the current topic of conversation. This contextual awareness sets Ruuh apart.
Question: “Do you like ice cream, Ruuh?”
Ruuh: “Yes, I like it.”
Question: “Which flavors do you like?”
Ruuh: “Chocolate and Vanilla.”
Here Ruuh knows that the topic of discussion is ‘ice cream’ and applies this when responding to the second question.
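One way to picture how context changes the outcome is the sketch below. It assumes stored three-turn samples of the form (previous turn, question, response) and adds a weighted context-match term to the question-match score. The word-overlap scoring, the context_weight value, and the sample data are assumptions for illustration; the actual model learns these relationships rather than counting words.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_response(history, message, samples, context_weight=0.5):
    """Pick the response whose stored question and previous turn best match the chat."""
    message_tokens = tokens(message)
    context_tokens = tokens(" ".join(history))
    best_score, best_response = -1.0, None
    for prev_turn, question, response in samples:
        score = overlap(message_tokens, tokens(question))
        score += context_weight * overlap(context_tokens, tokens(prev_turn))
        if score > best_score:
            best_score, best_response = score, response
    return best_response


# Hypothetical three-turn samples: (previous turn, question, response).
samples = [
    ("I bought new crisps", "which flavors do you like?", "Salt and vinegar."),
    ("I love ice cream", "which flavors do you like?", "Chocolate and Vanilla."),
]
history = ["Do you like ice cream, Ruuh?", "Yes, I like it."]

print(pick_response([], "which flavors do you like?", samples))
print(pick_response(history, "which flavors do you like?", samples))
```

With an empty history both samples match the question equally well, so the first one listed wins. Once the ice cream exchange is supplied as history, the context term tips the choice toward the ice cream response.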
Level 4. Detecting and responding to emotional cues
The final step to making Ruuh’s conversations natural and human involves detecting and responding to underlying emotions. The model is trained not just to pick up relevance and context, but also emotional cues. Emotion detection allows the model to look deeper into the conversation and make judgements about how the user is feeling and behaving in certain situations. Many of the emotional cues are detected through pattern recognition and the emojis used in conversations. With the help of these signals, the model detects whether a user is happy, excited, sad, or upset.
This makes Ruuh emotionally aware. She uses this awareness to generate emotion-appropriate responses. The responses match the user’s intent and emotions, making the interaction more empathetic and leading to a better experience for the user.
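A toy version of emoji- and pattern-based emotion detection might look like the sketch below. The cue tables, the weights, and the "neutral" fallback are assumptions made for this example; the trained model described above learns such cues from data rather than from hand-written lookups.

```python
from collections import Counter

# Hypothetical cue tables covering the four emotions named above.
EMOJI_CUES = {
    "\U0001F600": "happy",    # grinning face
    "\U0001F60A": "happy",    # smiling face with smiling eyes
    "\U0001F389": "excited",  # party popper
    "\U0001F622": "sad",      # crying face
    "\U0001F620": "upset",    # angry face
}
KEYWORD_CUES = {"glad": "happy", "yay": "excited", "miss": "sad", "annoyed": "upset"}


def detect_emotion(message):
    """Vote over emoji and keyword cues; return 'neutral' when no cue is found."""
    votes = Counter()
    for char in message:
        if char in EMOJI_CUES:
            votes[EMOJI_CUES[char]] += 2  # emojis weighted above words (arbitrary choice)
    for word in message.lower().split():
        word = word.strip(".,!?")
        if word in KEYWORD_CUES:
            votes[KEYWORD_CUES[word]] += 1
    if not votes:
        return "neutral"
    return votes.most_common(1)[0][0]


print(detect_emotion("I failed my exam \U0001F622"))
print(detect_emotion("yay we won \U0001F389"))
print(detect_emotion("ok"))
```

A detected label like this could then be used to prefer candidate responses with a matching tone, for example a consoling reply when the user seems sad.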
Extracting relevant features using deep learning
To summarize, the deep learning model combines the conversation context with the user’s message to extract the appropriate response. It extracts the context from the message, retrieves previous messages, creates a group of appropriate responses, ranks them according to relevance, and generates the final output.
Let’s understand this better with an example. If a user asked Ruuh, “Which pizza toppings are most popular?”, Ruuh would identify the query as pertaining to ‘pizza toppings’ and retrieve the most relevant answers based on this query. Ruuh would rank similar answers from the database based on relevance to generate the most appropriate response. With contextual awareness, Ruuh can easily answer follow-on questions such as, “Which ones do you like?” by replying “I love mushroom and pineapple”.
By comparing this model to traditional ways of matching responses to queries, we found that the addition of contextual awareness, relevance score ranking, and emotion detection resulted in a statistically better model. Our model helps Ruuh understand people better and create more value for users by being a friendly conversationalist.
The impact of human-like engagement
The cDSSM is an improvement over traditional chatbot models. By enabling Ruuh to pick up social slang, emotional cues, context and subtexts, we have created a chatbot that is not just a digital assistant but a human-like digital friend. This model could have a far-reaching impact on our relationship with technology.
Humanizing technology is essential to drive adoption. If users can connect with or be entertained by a new technology, they are more likely to interact with it regularly and derive value from it. Meanwhile, more usage will enable the platform to learn on a deeper level as more data is gathered over time. This will lead us down a path where machines understand humans more deeply!