Episode 2

Expanding What’s Possible: The Advancement of Conversational AI Technology

Welcome back as we continue the video series from our team here at Interactions! In this series, we are sharing conversations between our CMO, Peter Mullen, and some of the incredible people who are key to shaping the future of Conversational AI.

Today, Peter is joined by Srinivas Bangalore, Vice President of AI Research at Interactions. With decades of research experience in technology, over 100 patents, and more than 100 published research articles, Srinivas has a lot of insight to share with us today.

In this episode, you’ll hear about Srinivas’ journey into the world of AI, the birth and evolution of AI, and how the human mind is the most powerful technology we have. You’ll also hear why it is vital that AI and human intelligence work together to deliver a superior customer experience.

Short on time? Here are some quick takeaways:

  1. AI is all about understanding human intelligence.

    Ever since the early days of AI in the ‘50s and ‘60s, it has been all about understanding human intelligence and shaping it in such a way that we could teach machines to understand language. So you could consider AI an avenue for better understanding ourselves.

  2. AI interactions should be created to be worthy of our time.

    There is nothing worse than walking away from an AI interaction (or a customer service experience overall) feeling frustrated and annoyed. That is why we recommend that AI be designed to seamlessly incorporate human assistance where necessary, offering a superior interaction. Human language is ever-evolving and can even differ from person to person, which is exactly where that human backstop matters.

  3. Do we need to worry about sentient AI?

    Will there be a robot uprising? Not quite. AI is a tool, and it will never be able to replace a human. While AI can only reflect the intelligence we give it, Srinivas shares that we still need to be concerned about the leaders who are in control of the technology. If the leaders who use these tools have strong values and seek to improve the customer experience, then we don’t need to fear the developments that allow AI to become more “human.”

Srinivas Bangalore


SVP, Engineering

Dr. Srinivas Bangalore is currently the Director of AI Research technologies at Interactions LLC. He was a Lead Inventive Scientist at Interactions (2015-2017) and a Principal Research Scientist at AT&T Labs--Research (1997-2014). He has a PhD in Computer Science from the University of Pennsylvania and has made significant contributions to many areas of natural language processing, including Spoken Language Translation, Multimodal Understanding, Language Generation, and Question-Answering. He has co-edited three books on Supertagging, Natural Language Generation, and Language Translation, has authored over 100 research publications, and holds over 120 patents in these areas. Dr. Bangalore has been an adjunct associate professor at Columbia University (2005), a visiting professor at Princeton University (2008-present), and Otto Mønsted Professor at Copenhagen Business School (2013). He has been awarded the Morris and Dorothy Rubinoff Award for outstanding dissertation, the AT&T Outstanding Mentor Award in recognition of his support of and dedication to the AT&T Labs Mentoring Program, and the AT&T Science & Technology Medal for technical leadership and innovative contributions in Spoken Language Technology and Services. He has served on the editorial boards of the Computational Linguistics and Computer Speech and Language journals, and on program committees for a number of ACL and IEEE speech conferences.
Transcript

Peter Mullen
Hello. My name is Peter Mullen, CMO of Interactions. Today we’re continuing our conversation series from Interactions with leaders in conversational AI excellence, and we are joined by Srinivas Bangalore, Vice President of AI Research for Interactions. I’ve got to say, our team at Interactions has some of the most incredible, world-class brains thinking about AI and ML, where we are going, where we are today, and what the future holds for improving the overall customer experience using a combination of artificial intelligence and humans.

And Srini is the incredible conductor of this orchestra who helps put so many different pieces together to deliver an excellent result to millions of customers. He has decades of research experience and experience in technology, more than 100 patents, and more than 100 published research articles, and we have a ton to learn from him. Let’s dive right in.

So, Srini, thank you for being here. I’d love to kick off by having you tell us a little bit about yourself, where you come from and how you arrived at Interactions.

Srinivas Bangalore
Thank you, Peter. Thank you for the opportunity to talk about myself, the team, and the work that we do at Interactions. My story begins in India, and then I came to the US in 1991 to get my PhD in natural language understanding at the University of Pennsylvania.

Subsequent to that, in ‘97 when I graduated, I had the incredible opportunity to work with the best minds at AT&T Labs, where I was a part of the speech and language team. After leading teams, inventing, and patenting, I had the opportunity to come to Interactions in 2014 to turn the technologies that I used to invent into business-impacting solutions, and that’s what we do now at Interactions.

I lead a team of some of the best and brightest engineers and experienced researchers and scientists in spoken language understanding, speech recognition, dialog management, machine learning, and so forth. And the idea here is to have the team contribute to the most impactful business problems and provide differentiation for Interactions. That’s my job in a nutshell.

Peter Mullen
I want to go all the way back for a quick second. I did not realize you came to America in 1991. A young Srini getting off the airplane, landing in Philadelphia, with the hum and vibrancy of this American city, going to the University of Pennsylvania, where so many great minds have built the future that we now live in today.

Srini, that was four years before the Internet became a thing with Netscape. Natural language, of course, had been talked about and built out decades earlier, but it was still in its infancy. Talk to me a little bit about what that was like in the early nineties, when you were right at the cutting edge of all of this.

Srinivas Bangalore
Yeah. In the early days of AI, you know, tracing back to its genesis in the fifties and sixties, it was all about trying to understand human intelligence by modeling it to a point where we could teach the machine how to understand language or speech or vision, or move around like robots do. We think about AI, in one sense, as a mechanism to understand ourselves: how does the human brain work, and so forth.

So when I came to the University of Pennsylvania, I was part of a cognitive research program. I used to be in the Cognitive Science Institute there. And the idea was to bring interdisciplinary activities into one single place and have psychologists talk to computer scientists, talk to linguists, and design solutions that would enable us to further our understanding of ourselves, in one sense.

And that was AI back in the early days. Subsequently, as you rightly said, the Internet was just beginning to happen and search engines were on the horizon. And, you know, how do you apply natural language understanding so that, from a matter of a few keywords, we would understand what the intent of the expression ought to be?

And so there was a great deal of excitement toward the mid-nineties and late nineties to make this real, in one sense, and take it from the lab and put it out into the world. That was an accomplishment in itself: taking the engineering aspect of it seriously and making it available for the masses while the academic research was still continuing to happen. We see the fruits of that labor manifested in many of the technologies that we use today.

Peter Mullen
I’m going to surge forward several decades and talk about the implications of this, where we are today, and the mass consumer adoption, or maybe it’s the embrace, of AI technology. It’s a part of almost all of our lives, at least in the first world and in much of the developing world. But before I do that, I want to talk about Interactions, because we came into existence still at the front end of mass adoption and usage of AI, back in 2004 when Interactions was founded.

And we came at this with an incredibly unique way of solving problems in customer contact centers and for customer support that we still use today: adaptive understanding. Let’s get really nitty-gritty and dive into what Interactions does. How are we different from all of the other companies with natural language technology and AI?

Srinivas Bangalore
Yeah, absolutely. The challenge, as a technologist, has always been that you want the technology to be at the forefront and to show the difference the technology makes. The idea at Interactions is more about customer experience. Ultimately, we don’t want to hold the customer hostage to technology gaps: no matter the progress that has been made in AI in the last 20 years, it is not perfect.

So if you treat customer experience as a non-negotiable entity, the eventual solution that you come up with is: can you have a human in the background to assist the technology so that we can deliver that maximum customer experience? Interactions is unique in that we have a blended, hybrid solution that brings in human assistance in real time so that we can deliver the frictionless conversational experience we all aspire to have.

Even as a consumer, I would like to call a call center and have an experience that is worthy of my time, as opposed to being repeatedly annoyed because they couldn’t get my name or what have you. So this human-assisted understanding is a unique differentiation, because you get real-time assistance when a difficult name appears, when an email has to be captured, or when the intent the user is expressing is difficult to discern because there is noise in the background, or the TV is blaring, or a child is crying.

These are challenging situations where technology is likely to fail, because it’s not designed to work in those kinds of adverse conditions. So instead of annoying customers by repeatedly asking for the same thing in different ways, with customers hyper-articulating and getting emotionally upset, we have humans in the background to help the technology do its thing. A consequence of that is not only a better user experience, but also a virtuous cycle of annotated, supervised conversations that become the next generation of data feeds, so to speak, so we can constantly improve the technology while still delivering a customer experience that is second to none.

So that’s the real part of human assisted AI.
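To make that human-assisted loop concrete, here is a minimal sketch in Python. It is an illustration of the pattern Srinivas describes, not Interactions’ production system: the confidence threshold, function names, and intent labels are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical threshold below which the AI defers to a human assistant.
CONFIDENCE_THRESHOLD = 0.85

@dataclass
class Interpretation:
    intent: str        # e.g. "pay_bill"
    confidence: float  # recognizer confidence in [0, 1]

def ask_human(utterance: str) -> str:
    """Stand-in for a real-time human annotation queue."""
    return "pay_bill"

def log_training_example(utterance: str, label: str) -> None:
    """The human's correction feeds the next model update (the 'virtuous cycle')."""
    print(f"logged training example: ({utterance!r}, {label!r})")

def route(utterance: str, interpretation: Interpretation) -> str:
    """Let the AI handle confident turns; bring in a human for the rest."""
    if interpretation.confidence >= CONFIDENCE_THRESHOLD:
        return f"AI handles intent {interpretation.intent!r}"
    # Low confidence (noise, accents, unusual names): a human resolves the
    # turn in real time, and the labeled pair becomes supervised data.
    label = ask_human(utterance)
    log_training_example(utterance, label)
    return f"human-assisted intent {label!r}"

print(route("uh, I wanna, um... pay my bill?", Interpretation("pay_bill", 0.42)))
```

The caller never notices the handoff; the point is that the customer experience, not the model’s confidence, stays non-negotiable.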

Peter Mullen
You know, I think this is what gets me really excited about what we do. And this is not a commercial for Interactions, but I’m new to Interactions and when I looked at the opportunity, one of the things that I just found both practical and thrilling was this constant interplay between human and machine. So I live in Silicon Valley, as you know, and we are so optimistically standing around the altar of machines and technology and zeros and ones that I think sometimes we move ahead too fast.

And I’ve been around long enough to see how pushing too fast and too hard with ideals does not lead to a practical solution. Yet at Interactions we knew from day one that machines would be fantastic and exponentially improve their efficiencies. But the greatest computer is always the human brain. And so I see this happening all the time, this interplay back and forth.

I think of it metaphorically as a professional basketball team passing the ball down the court between the AI and the human, while the customer is having a seamless, frictionless experience. So just for a second, could you compare the different types of AI experiences we have as consumers to the Conversational AI experiences in customer support? I recognize that I might call three 800 numbers today and get extremely different experiences from one to the next.

Talk to the people who are making those calls about the differences they are experiencing.

Srinivas Bangalore
Yeah. In the early days, there were simple IVR systems where we used DTMF: press one for this, press two for that, press three for that. Back in 2000 or so, when I was part of AT&T Labs Research, there was a revolution, in one sense. Speech recognition came to a point where we could simply say what we wanted. There was a 14-way classification system where you could say, well, I want to pay my bill, or I need technical support, or what have you, and that would lead us down some sort of a tree in the conversation.

And that was phenomenally successful in that sense. The third generation of conversational systems started with automating those aspects where it’s difficult for us to capture certain entities: names, email addresses, account numbers, because they’re complicated and the technology is not perfect. That’s partly where Interactions came in at that point and said, look, we are not going to hold the customer hostage to the gaps in the technology; we will treat customer experience as the product that we are going to sell, right?

So instead of letting technology be the place where the experience stops, we are going to have a blended experience, blending technology with humans in order to deliver the experience we all aspire to. That’s partly the differentiation, that’s partly the innovation, because it is enormously challenging to understand what is being said when the speaker has an unfamiliar accent or an utterance is spoken over a noisy background.

Now, having said that, and having delivered 80-plus applications that handle about a billion utterances a year, we are moving to the next level of a blended AI solution: having understood the customer, what is the next step in the conversation? It is when the customer has to accomplish some sort of task. It could be a payment task, it could be a technical assistance task, it could be a new account setup task.

Whatever the task, these tasks now could be resolved, could be accomplished, not just within AI solutions, but in a blended human-AI solution. For example, it may be a difficult task that requires human intelligence. Or it could be as simple as, well, we don’t have access to the system, so I can’t tell you whether you have that or not.

So those are some of the advances that we are making at Interactions: going beyond human-assisted recognition and understanding to human-assisted task completion.
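For readers curious what the “classification” Srinivas mentions looks like in practice, below is a toy version of an open-ended intent classifier, sketched with scikit-learn. The training utterances and intent labels are invented for illustration; real systems are trained on vastly more data and run over speech-recognition output.

```python
# Toy open-ended intent classifier: say what you want, get routed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utterances = [
    "I want to pay my bill", "pay my bill please", "make a payment",
    "my internet is down", "I need technical support", "nothing is working",
    "open a new account", "I'd like to sign up", "set up a new account",
]
train_intents = [
    "pay_bill", "pay_bill", "pay_bill",
    "tech_support", "tech_support", "tech_support",
    "new_account", "new_account", "new_account",
]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_utterances, train_intents)

# An utterance the system has never seen, routed toward the likeliest intent.
print(model.predict(["hi, I have a problem, my service stopped working"])[0])
```

The real leap from DTMF menus was exactly this: the caller speaks freely, and the classifier, rather than a keypad tree, chooses the branch.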

Peter Mullen
As we head into 2023, what are the expectations for today’s consumer when they encounter an IVR system?

Srinivas Bangalore
Yeah, I think 2023 customers are extremely savvy. In the plain old days of telephony, around 2000, people would pick up the phone and make a phone call, and it was a linear channel: speech coming in, speech going out, and there was not much more you could do. But in the current times, with smartphones and streaming devices and smart speakers in the home, customers expect to interact with conversational systems across a variety of different channels seamlessly. They expect the conversation to begin in one channel and continue in a different channel.

After all, a screen on the second channel can display or collect information much more effectively on a device like a smartphone than on a plain old telephone channel. These are the aspects that customers are starting to expect, because in everyday life we use these devices quite effectively to do a variety of different things. So why can’t the conversation be omni-channel, in one sense, across different channels?

Why should it be tied down to a particular channel? And why can’t a conversational system really leverage the characteristics of these different devices? Some devices are better for certain things. So why don’t we use the bandwidth and acoustics of these devices and deliver much more effective conversations and much richer interactions, in one sense? So I expect that in 2023 and beyond, these are aspects that will play out.

Peter Mullen
At Interactions, we can get very precise about these prefixes. As we talk about omnichannel, we also talk about something called opti-channel, which is not yet a commonplace word. What is the difference between the two? What does opti-channel mean for us?

Srinivas Bangalore
Yeah, it’s a great question. So omni means that you can interact with conversational systems across a variety of different channels. It could be speech, text, SMS, or what have you. The opti-channel concept is to optimize the conversation across these different channels. So for example, if I have to verify and authenticate the customer, it doesn’t necessarily have to be speech; it could be sending an SMS with a form to be filled out. It’s a similar story with making a reservation.

I don’t need to ask you for five different pieces of information in speech. It could be a form that is sent to your phone to be filled out. So the idea of opti-channel is to maximize the information throughput across different channels, as opposed to a set of siloed channels, which is more or less what omni is.
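As a rough sketch of the opti-channel idea, consider a channel picker that chooses the most efficient available channel for each step of a task rather than staying siloed in the channel the call started on. The task names and channel preferences below are illustrative assumptions, not Interactions’ actual routing rules.

```python
# Opti-channel in miniature: pick the channel with the highest information
# throughput for each task, falling back to speech when nothing else exists.
PREFERRED_CHANNELS = {
    "authenticate":  ["sms_form", "web_form", "speech"],
    "collect_email": ["sms_form", "web_form", "speech"],
    "reservation":   ["web_form", "sms_form", "speech"],
    "quick_answer":  ["speech", "sms_form"],
}

def pick_channel(task: str, available: set) -> str:
    """Return the best available channel for a task; speech is the fallback."""
    for channel in PREFERRED_CHANNELS.get(task, ["speech"]):
        if channel in available:
            return channel
    return "speech"

# Caller on a smartphone: SMS is available, so send a form instead of
# asking for five pieces of information by voice.
print(pick_channel("authenticate", {"speech", "sms_form"}))  # sms_form
# Caller on a plain landline: speech is all we have.
print(pick_channel("authenticate", {"speech"}))              # speech
```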

Peter Mullen
You know, there is a recent survey indicating that 20-plus percent of respondents would rather spend a night in jail than call customer service even once. And there are other surveys saying 85% of customers will leave a brand if they have one negative experience with customer service. Why do we still have so many challenges getting this sector of the overall customer experience right?

Srinivas Bangalore
Yeah, that’s a great question. And you know, I was a victim of such an experience not long ago, over the weekend, when I called a big brand name and spent an hour and a half getting an issue resolved. But I was persistent and patient. And yes, it is true that you become hostage to these systems. The challenge is two-fold, in my opinion. One is, of course, that language is complicated and ambiguous, and people have to express the issue in a highly dynamic situation.

And machines are not designed to handle such dynamic situations. That’s on the front end. Then, on the back end, there are complicated back office systems. Most of these enterprises have grown organically; as a result, there is a multitude of systems, and you have to stitch information together across multiple systems to even come up with a resolution, which, as you know, takes a long time.

Then there is the third aspect, which is that there is increasing cost pressure on customer care centers to minimize things like average handle time. As a result, the experience is diminished when you have to compromise that conversation. So at Interactions, what we are looking at are ways in which we can augment the experience of an agent, not to replace them, but to amplify their effort through a variety of different technologies that are under development.

One such technology we are looking at is how you either attenuate or amplify a signal in a way that enables the conversation to go faster and the conversational experience to be richer, yet all within the constraints of the contact center’s realities.

Peter Mullen
Are you a guy who presses zero or shouts agent into the phone?

Srinivas Bangalore
Knowing the technology that I develop, I am patient with the technology because I believe that I have to provide the data so that the technology can be ready.

Peter Mullen
That’s your curse: you are trapped by being an academic to the core! I will press zero. I am that guy, just to move it faster. And of course it puts me back into a loop, and it never really solves the problem.

How many ways are there to say ‘yes’ in the English language that you analyze and work with?

Srinivas Bangalore
Well, sometimes yes can also mean no, right? So it’s highly contextualized. Currently, Interactions systems have been designed with humans in the loop, and we have automated many of the yeses and nos. However, we still sometimes have challenges getting exactly what the user means when he or she responds with an expression that does not directly answer yes or no.

And so quite a few of the challenges that we have are driven by functional interpretations of the system prompts. When the system asks, do you mean X or Y, the user may come back with a yes as an answer. What does that really mean, without the user actually making a choice? Some of these are issues because we have a user base that is diverse, and some of these are design choices that our experts make.

So in a sense, many of the diverse challenges that we face in interactive systems are not necessarily technology-oriented, but human-related aspects that naturally come into the picture.
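To illustrate the “yes can mean no” problem, here is a deliberately simple interpreter: it maps a free-form reply to yes, no, a specific choice, or a human handoff. The word lists and logic are toy assumptions; production systems use trained models plus human assistance for exactly the ambiguous cases shown here.

```python
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "correct", "right"}
NEGATIVE = {"no", "nope", "nah", "wrong", "incorrect"}

def interpret(reply: str, choices=None) -> str:
    """Resolve a confirmation reply in the context of the prompt."""
    words = set(reply.lower().replace(",", " ").replace(".", " ").split())
    if choices:
        picked = [c for c in choices if c.lower() in reply.lower()]
        if len(picked) == 1:
            return picked[0]  # "the savings one" -> savings
        if words & AFFIRMATIVE:
            # "Do you mean X or Y?" answered with a bare "yes": the user
            # confirmed without choosing, so a human should disambiguate.
            return "unclear -> human assist"
    if words & AFFIRMATIVE and not words & NEGATIVE:
        return "yes"
    if words & NEGATIVE and not words & AFFIRMATIVE:
        return "no"
    return "unclear -> human assist"

print(interpret("yeah, sure"))                                        # yes
print(interpret("yes", choices=["checking", "savings"]))              # unclear -> human assist
print(interpret("the savings one", choices=["checking", "savings"]))  # savings
```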

Peter Mullen
I think so much of what you do is punk rock. And when I say that, what I mean is that it’s constantly evolving and pushing the envelope. I have a very good friend who works for one of the largest Conversational AI providers in the world, handling hundreds and hundreds of millions of utterances per day. Her role is to translate from English into French; she’s a French national.

And I said, what is the biggest pressure on your job? Her response is that the slang is changing so quickly that every several weeks she must catch up, and she cannot. And that’s a real struggle right now, because this is emerging so quickly. But the world is emerging and evolving so quickly as well, right?

Srinivas Bangalore
Yes. Language is the most organic object, as I label it, in one sense. Language is personal and language is dynamic. And when language is coupled with a business, you know, you have a trifecta, because the business keeps changing; if an enterprise releases a new product or service, people can talk about it in ways that you have not seen before.

So how do you keep up, in one sense, with the dynamic nature of language and the personal nature of language? Especially if you want to design a user experience that is second to none, you need to understand the user in all their dimensionality: the speech, the user profile they have, the services and products they have subscribed to, and the past calls they have made. We want to bring all of that knowledge together so that the next time the system says something, we can personalize that experience.

So in one sense, that’s the challenge of our enterprise virtual agents, partly because the business changes dynamically, and there are new products and services with different price plans and different technical features. We have many, many conversations in our Interactions data logs, so to speak, where folks are trying to understand what this new object is that has been created by the business and how to interact with it.

And that knowledge has to be assimilated into the conversational system for it to be maximally efficient. That’s where human-assisted understanding comes in: we constantly introduce new intents and new meanings for words that didn’t exist before, because of the dynamic nature of the business. And so it’s ever evolving.

Peter Mullen
Where does it go over the next couple of years? This year there have been articles and concerns about machines becoming sentient. There was a big kerfuffle about that. And if we look at many of the science fiction movies of the past 30 years, they have actually turned out to predict a lot of what we’re looking at today.

So, you know, when we think about this, what is the opportunity and what are some of the risks of this entire category over the next, say, 20 years?

Srinivas Bangalore
Yes, that indeed is a good question. And this is my perspective: for me, AI is a tool. AI is never going to replace a human. There is a lot of talk about AGI, artificial general intelligence. However, I view AI as a tool to enhance human enterprise, whether that’s conversation, understanding each other, or building trust.

AI has evolved in recent times, in the last 10 to 15 years, to become heavily, heavily data-driven. And the idea is that the historical data we collect will reflect some of the intelligence we are going to need for predicting the future. Now, that comes with a risk factor, because the society that created that data may not be the society we want to create in the future.

We don’t want to be a part of such a society in the future. So how do you, in a sense, bleach out the inadequacies of the society that created that dataset and yet build valuable tools that will help people in the future? I strongly believe that technology alone is not the answer to this.

It’s the human elements and the humanitarian values that have to be inculcated in the leaders who use the technology, and how they use it has to be instilled from the very beginning, because it’s not the fault of the technology. We talk a lot about bias in AI these days. It is not a property of the technology itself; the technology is doing what it’s supposed to be doing.

However, because it was trained on data that reflected the state of society at that time, we see the technology as biased.

Peter Mullen
I have observed how the pendulum has swung back and forth, and how, as technology accelerated, the rest of the world, let’s in this case say the humanity side of the world, has not been able to keep pace, because technology moves at a different cadence. The swing of the pendulum that I’m observing right now is the reintroduction of morality and ethics to the conversation.

And many people will say it had never not been there. But I am saying there is a much stronger emphasis today, and that is an extremely good thing. The philosophers have not been pushed aside; they had just been moved to the back of the room. Now they’re coming back to the front of the room, as far as I can tell. And as you just indicated, this is what has to happen.

We have complex questions. We have convergence occurring around this category. We’ve got to solve this; we’ve got to get ahead of it. And one of the neat things, I think, about your work and our teammates’ work is that we actually are thinking about this. We’re thinking about the biases that might get built in and how we can break those down before they ever get out into the world.

And it’s a never-ending job, right?

Srinivas Bangalore
It is not an easy task and it’s never ending.

Peter Mullen
What do you think of this new, I’m going to call it category of art, for lack of a better word, where people are able to articulate a vision, and there are now several companies turning that into graphical representations? Google, for example, just announced a tool this week where you can speak sentences and it will create an image.

What is that and what does that lead to?

Srinivas Bangalore
It’s a fascinating development, and it traces back to the origins of AI, where we wanted systems that would be comprehensive, in one sense, across all aspects and all of our senses: vision and speech and text and hearing and so forth. However, along the way, in the eighties and nineties, each of these areas became a gigantic subject area by itself, so much so that we lost the interlinked way of talking across different fields of AI.

With the advent of deep neural networks, we have regained a common parlance, a lingua franca, in the sense that it allows us to blend speech with text, with images, with video, with robotics even, and think about a framework that spans these seemingly different perceptual tasks.

And that opportunity can be leveraged, in a sense, as a translation mechanism. You could take one modality, like speech, and turn it into an image. You could turn an image into a piece of text, and you could turn music into an image or a video. Once you have a common representation in the lingua franca, it’s just like human language: you could go from English to French if you are able to understand the meaning, without necessarily going directly from English to French like many translation systems do.

Once you have that lingua franca, then we can actually mix and match. The constraints of one modality no longer apply; the information that’s expressed in one modality can be easily transformed into another.
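A small sketch of what that shared space buys you: if a text encoder and an image encoder (CLIP-style models, for instance) map into the same vector space, cross-modal “translation” reduces to nearest-neighbor search. The vectors below are hand-made stand-ins for real encoder outputs.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend shared 3-d space; real trained encoders would produce these vectors.
text_embeddings = {
    "a dog playing in the park": np.array([0.9, 0.0, 0.8]),
    "a violin concerto":         np.array([0.0, 0.95, 0.1]),
}
image_embeddings = {
    "photo_dog_park.jpg":  np.array([0.85, 0.05, 0.75]),
    "photo_orchestra.jpg": np.array([0.05, 0.9, 0.2]),
}

def text_to_image(caption: str) -> str:
    """Cross the modality boundary through the shared embedding space."""
    query = text_embeddings[caption]
    return max(image_embeddings, key=lambda name: cosine(query, image_embeddings[name]))

print(text_to_image("a dog playing in the park"))  # photo_dog_park.jpg
```

The same lookup works in any direction, image to text, speech to image, which is exactly the mix-and-match Srinivas describes.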

Peter Mullen
It’s so exciting. And it’s hard for people to conceive until they can see it. When you look at some of these new apps that enable you to speak what you want to see, and then something gets created, a tapestry right there on your screen, it stuns you, and you can feel the walls begin to crumble between these different disciplines. And you just gave me a new phrase: lingua franca.

I love that. I love what is happening, and how we can think about what this does for the world because of the creativity it starts to unleash. And it comes all the way back to customer experience. The ability to have these walls come down creates challenges, curiosities, opportunities, and innovation. And I’m thrilled that we are so much at the cutting edge of thinking about this in all the right, complex, fun ways.

What else should we discuss as we move toward the end of this conversation? Is there anything you want to share about what you are specifically working on here at Interactions right now?

Srinivas Bangalore
Sure. As we talked about, Interactions started with conversational systems that were designed by experts, voice user interface experts, and we have successfully deployed them in many, many different enterprise applications. The question that we would still like to understand is: if I were given a million conversations between customers and agents, would I be able to understand them and design a conversational bot by gleaning all of that information, in a way that facilitates a human-machine conversation, at least in parts of the dialog?

It doesn’t have to be everything; the entire conversation would not be automated. But are there pieces of the conversation that are so repeated, so formulaic, that we can model them and replace them and have hybrid conversations, with some parts handled by machines and some parts by humans? That’s the same sort of spirit and philosophy: hybrid intelligence, as opposed to one or the other.

So that’s something we are actively looking at, engaging with a multitude of tools at Interactions; we’re trying to pin down the value we can offer when such large amounts of data are available in real time. The second piece, along those lines: speech, as we discussed, is a personal medium for people to express their point. Words are one aspect of that medium.

There are other dimensions to speech. I can tell your age, I can tell your emotion level, I can tell your ethnicity, in a sense, based on what you say and how you say it. Socioeconomics, too. These are aspects that have to be gleaned so as to give a personalized experience. I don’t want to talk to a 20-year-old the same way I talk to a 60-year-old, right?

So those are signals that we would like to extract and assimilate, to design systems that deliver that level of personalization. Those are a couple of the research directions we are looking at.
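As a thumbnail of that first direction, here is a sketch of mining agent transcripts for formulaic turns. Real systems would cluster paraphrases with embeddings rather than count exact strings; the transcripts and the 30% threshold are invented for illustration.

```python
from collections import Counter

# A tiny stand-in for millions of logged human-agent turns.
agent_turns = [
    "thank you for calling, how can I help you today",
    "can I have your account number please",
    "can I have your account number please",
    "let me check on the outage in your area",
    "can I have your account number please",
    "thank you for calling, how can I help you today",
]

counts = Counter(agent_turns)
total = len(agent_turns)

# Turns covering a large share of traffic are formulaic enough to be
# candidates for automation, leaving the rest of the dialog to humans.
for turn, n in counts.most_common():
    if n / total >= 0.3:
        print(f"candidate for automation ({n}/{total}): {turn!r}")
```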

Peter Mullen
This is why Interactions is the best solution in the world for this front door experience for customer support. This type of nuance, with everything we’ve talked about, the different organizational structures, the ethics and morality associated with it, the speed and the efficiency, is what makes us wake up each morning and try to drive this great, as we say, front door experience when someone calls in or has an omnichannel experience. Because 99.9% of people know what they want when they contact customer support.

And if we can make that happen efficiently, we’ve given them a positive experience, which then of course leads to a better business outcome for all of our customers. Srini, this is fantastic. I hope that you will come back so we can continue this type of conversation, because there’s so much here that we can dive into.

Srinivas Bangalore
I’ve truly enjoyed it Peter. Thank you.

Peter Mullen
Well, thank you for that. For those of you who have had a chance to watch or listen to this: we do this regularly with our experts at Interactions, where we try to pull back the layers, get into what makes Interactions special, and learn from our experts about what they do and why they do it. So thank you for joining me.

I’m Peter Mullen of Interactions. Until next time. Thank you.