Google’s Meena, World’s Best Chatbot

Bright minds at Google have developed Meena, which they claim to be the best open-domain chatbot in the world.

Meena is an end-to-end, neural conversational model that learns to respond sensibly to a given conversational context. The Meena model has 2.6 billion parameters and is trained on 341 GB of text, filtered from public domain social media conversations. Compared to an existing state-of-the-art generative model, OpenAI GPT-2, Meena has 1.7x greater model capacity and was trained on 8.5x more data.

Meena is based on the Evolved Transformer seq2seq (sequence-to-sequence) architecture. Generally, a seq2seq model turns one sequence into another; classic implementations use a recurrent neural network such as an LSTM, in which the context for each item is the output from the previous step.
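The core seq2seq idea can be sketched in a few lines. The toy encoder and decoder below are stand-ins (a trained model would produce hidden states and score a vocabulary), but the loop shows the essential pattern: each decoding step conditions on the encoded input plus the tokens generated so far.

```python
# Illustrative seq2seq decoding loop (not Meena's actual code).

def encode(tokens):
    # Stand-in encoder: a real model would produce hidden states here.
    return tuple(tokens)

def decode_step(context, generated):
    # Stand-in decoder step: echoes the input reversed, one token at a time.
    # A trained model would instead score a vocabulary and pick a token.
    remaining = list(reversed(context))[len(generated):]
    return remaining[0] if remaining else "<eos>"

def seq2seq(tokens, max_len=10):
    context = encode(tokens)
    output = []
    for _ in range(max_len):
        token = decode_step(context, output)
        if token == "<eos>":   # stop when the decoder signals end-of-sequence
            break
        output.append(token)
    return output

print(seq2seq(["how", "are", "you"]))  # → ['you', 'are', 'how']
```

The same loop structure applies whether the encoder/decoder pair is recurrent or, as in the Evolved Transformer, attention-based.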

When evaluating the chatbot’s performance, Google applied its own newly designed human evaluation metric, the Sensibleness and Specificity Average (SSA), which captures basic but important attributes of natural conversation. Meena’s responses were graded not only on sensibleness (“Does it make sense?”) but also on specificity (“Is it specific enough for this discussion?”) in relation to the statement each response was given to. For example, if A says, “I love tennis,” and B responds, “That’s nice,” then the utterance should be marked “not specific”, since that reply could be used in dozens of different contexts. But if B responds, “Me too, I can’t get enough of Roger Federer!” then it is marked “specific”, since it relates closely to what is being discussed.
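As the name suggests, SSA is simply the average of two per-response rates. A minimal sketch of the calculation, assuming each rated response gets a binary sensible/specific label pair:

```python
# Hypothetical illustration of the SSA calculation: human raters label each
# response as sensible (0/1) and specific (0/1); SSA averages the two rates.

def ssa(labels):
    """labels: list of (sensible, specific) pairs, each 0 or 1."""
    sensibleness = sum(s for s, _ in labels) / len(labels)
    specificity = sum(p for _, p in labels) / len(labels)
    return (sensibleness + specificity) / 2

# Four rated responses: three sensible, two of those also specific.
ratings = [(1, 1), (1, 0), (1, 1), (0, 0)]
print(ssa(ratings))  # → 0.625
```

Note that a response marked not sensible is also not specific under the scheme described above, which the generic “That’s nice” example illustrates: safe but vague replies score on sensibleness while losing points on specificity.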

For comparison, the SSA metric was used to evaluate the performance of the current state-of-the-art open-domain chatbots Mitsuku, Cleverbot, XiaoIce, and DialoGPT. Meena beats all of them by a large margin, achieving 79% SSA compared to 86% for humans. This seems to be the closest a chatbot has ever gotten to the level of human performance.

Another interesting metric used in Meena’s training and performance evaluation is model perplexity, an automatic metric readily available to any neural seq2seq model that measures the uncertainty of a language model. When building Meena, Google discovered that perplexity correlates strongly with human evaluation scores such as SSA. As Google explains it, the lower the perplexity, the more confident the model is in generating the next token (character, subword, or word). Conceptually, perplexity represents the number of choices the model is weighing when producing the next token. Therefore, the training objective was to minimize perplexity.
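Concretely, perplexity is the exponential of the average negative log-likelihood the model assigns to the tokens it should produce. The worked example below (with made-up token probabilities) shows why it reads as a "number of choices": a model that always assigns probability 1/8 to the correct token has a perplexity of 8.

```python
# Perplexity from a model's per-token probabilities: the exponential of the
# average negative log-likelihood. Probabilities here are illustrative.

import math

def perplexity(token_probs):
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that always assigns probability 1/8 is effectively choosing
# uniformly among 8 options at each step.
print(perplexity([0.125] * 5))  # → 8.0
```

Lower perplexity means the probability mass is concentrated on fewer candidates at each step, which is why it serves as a proxy for the model's confidence.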

Google might release Meena as an external research demo in the coming months after the chatbot has been tested for safety and bias and the risks and benefits associated with Meena’s release have been evaluated.

Read the original post by Google AI here.

Apart from the general-domain discussion, this development is interesting from a LegalTech standpoint too. To get closer than ever to human-level conversation, Google fed a mind-blowing amount of text data through a state-of-the-art neural network architecture. While the latter is open-sourced, the former is the most challenging part: the resources spent on collecting and correctly labeling high-quality data are usually the main bottleneck for implementing AI solutions. In the field of law, this is even more challenging due to the cost of legal services (i.e. lawyers who review and label data) and the often confidential nature of legal documents. The advancement of LegalTech AI products largely depends on the amount of well-structured and adequately labeled data available. Quite logically, the path to truly amazing results in LegalTech lies through the combination of high-quality legal datasets and leading legal AI expertise.

Sergii Shcherbak