Powered by the WRKdefined Podcast Network. 

[00:00:00] Welcome to PeopleTech, the podcast of WorkforceAI.news. I'm Mark Pfeffer. My guest today is Magnus Revang, the Chief Product Officer of Openstream AI. Their engine, which they call Ava, allows users to interact with technology naturally,

[00:00:28] using plain language but avoiding AI issues like back-end complexity or hallucinations. We're going to talk about true conversational AI, human collaboration, agents, and how you build a solution with data that's never been written down. All on this edition of PeopleTech. Hey Magnus, welcome. You folks are very active in building conversational AI. And I wondered if we could talk about that a bit.

[00:00:58] Can you tell me what sort of led you into wanting to get into conversational AI and what you think its potential is? Of course, yeah. So conversational AI is the ability for a machine or an AI to hold a conversation with a human, a natural language conversation. And really, it's the most natural way to communicate, right?

[00:01:27] If you look at the science fiction movies going back to the 1950s, where people imagine sort of the future operating system, the future way to communicate with machines, they have always ended up pointing out that it's going to be a conversation, a natural English conversation with a machine. And conversational AI has always been a part of computing way back.

[00:01:56] But really, it was, I think, around 2014, 2015, when this concept of chatbots started to rise in the market. What people didn't tell you was that there was very little AI in conversational AI at the time.

[00:02:22] So basically, it was the ability to take what somebody said and do two things with it. It tried to find out: by saying this, what is the intent of the user? What's the intention? And what keywords were mentioned that fill out the slots of the request?
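That intent-and-slot step can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the intents, keywords, and slot names below are invented:

```python
# Sketch of an early keyword-based chatbot: guess the intent, fill the slots.
# All intents, keywords and slot names here are invented examples.

INTENT_KEYWORDS = {
    "cancel_card": ["cancel", "card"],
    "open_account": ["open", "account"],
}

SLOT_KEYWORDS = {
    "card_type": ["credit", "debit"],
}

def classify(utterance):
    words = utterance.lower().split()
    # Intent: first intent whose keywords all appear in the utterance.
    intent = next(
        (name for name, kws in INTENT_KEYWORDS.items()
         if all(kw in words for kw in kws)),
        None,
    )
    # Slots: any slot keyword mentioned anywhere in the utterance.
    slots = {slot: kw for slot, kws in SLOT_KEYWORDS.items()
             for kw in kws if kw in words}
    return intent, slots

print(classify("I want to cancel my credit card"))
# -> ('cancel_card', {'card_type': 'credit'})
```

Everything downstream of a step like this was hand-built branching keyed on the detected intent, which is why unexpected phrasings trip it up.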

[00:02:50] So basically, that just mapped to a decision tree that was part of a dialogue tree, so to speak, right? And from there, it was very little AI. And that's how conversational AI started out. Now, my interests have always been in unscripted conversational AI.

[00:03:15] Conversational AI that can deal with things that become too complex for a script. And when BERT first came in 2018, and later on the GPT versions in 2022 with ChatGPT and instruction-following LLMs, things started to become very interesting.

[00:03:43] Now, one thing that you have to think about in conversational AI is that if you're doing something for consumers, you don't need to have full control, right? Just like an LLM, you don't have full control of what it's going to say. But if you're an enterprise where you're doing things like opening accounts, canceling credit cards,

[00:04:10] and doing customer service things, you can't really live with something that is unscripted and low control. You need to have unscripted and high control, right? And that is where I play around with conversational AI, where you have full control of what the machine can say,

[00:04:34] but also the ability of the machine to deal with things that a scripted approach can't. So what would be a thing that scripted approaches can't do? Simple things where there is some sort of ambiguity. For example, take a simple thing like, what's your name, right?

[00:04:58] Well, a lot of women, for example, don't necessarily remember: oh, did I get this account before I was married or after? Is it registered with my maiden name or my married name, right? And being able to say both is actually hard with a scripted approach, because the scripted approach assumes that you're going down one path

[00:05:28] and not splitting and checking both in parallel, for example. So that's a simple example of where you can trip up any scripted chatbot: just give two names, or, oh, maybe I use my middle name, three names, right? And you start to add to it. Another example is scheduling, right? Where the natural way to schedule is that I give you a set of constraints.

[00:05:56] So if you ask me, can we do a podcast next week, right? I will go, hey, any Monday or Tuesday after three o'clock, right? And suddenly I have constraints on it. And you go, maybe, yeah, six o'clock on Tuesday. Is that okay? And I go, oh, that's a little bit too late, right? Maybe five. So it becomes a negotiation between us in a way.

[00:06:26] And being able to deal with that in a scripted sense is also, you know, almost impossible, because the number of branches the script would need is too many to create anything. So you'd need alternate approaches to it. LLMs can come in and do some of it, but with LLMs, you don't have the full control. You don't have the guardrails to say, don't go outside of these things.
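The scheduling negotiation above can be handled by a symbolic core that owns the decision, with the wording kept separate. A minimal sketch, not anyone's production system; the time windows and the stubbed `phrase` function are invented:

```python
from datetime import time

# A symbolic scheduler keeps full control over *what* is decided; a separate
# (here stubbed) phrasing step decides *how* to say it.

def intersect(a, b):
    """Intersect two (start, end) time windows; None if they don't overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

guest = (time(15, 0), time(23, 59))   # "any time after three o'clock"
host = (time(14, 0), time(17, 30))    # host's availability

slot = intersect(guest, host)

def phrase(slot):
    # A real system might hand this to an LLM for wording; the decision
    # itself never leaves the symbolic layer, so it cannot go off the rails.
    if slot is None:
        return "I don't see an overlap; could you suggest other times?"
    return f"How about {slot[0]:%H:%M}? I'm free until {slot[1]:%H:%M}."

print(phrase(slot))  # How about 15:00? I'm free until 17:30.
```

Each counter-offer in the negotiation just intersects another window, so no branch of the dialogue has to be scripted in advance.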

[00:06:56] So often you need a more symbolic AI system to determine what to say. And then you can use LLMs to determine how to say it, right? And you can combine the technologies in various ways. And that's how you can get to this form of conversational AI with high control and with kind of the strengths of,

[00:07:24] you know, LLM-based systems in having unscripted approaches. But conversational AI is expanding as well. So for us, I mean, we do things like avatars and emotion and personality detection, being able to render personality and emotion both in the paraphrasing of what's being said,

[00:07:54] but also in tone of voice if it's going to be rendered for text-to-speech, right? So all of those things come in as well. And it's becoming quite complex.

[00:09:51] And how far along are we with this from the end user's point of view? A lot of people say, well, you know, I was expecting Star Trek. You know, I'm waiting for something that's that natural and that good. And what you're describing is very complex. So where are we? I mean, where is the reality leaving us right now? And where do you think it could go?

[00:10:21] So the reality is that if you dial back the control a bit and live with the uncertainty of things going off the rails and stuff like that, you can do a lot more than if you need full control over the tools being used, and assurance that nobody's canceling a credit card by tricking your agent or whatever.

[00:10:49] So that's kind of the challenge, right? In all of this, with consumer technology, you see all these demos that are super impressive, right? But then you find out that, well, there are limitations in how you can force it to follow business process, how to follow compliance directives,

[00:11:15] how to follow business policies and stuff like that. If you start to tweak the technology to be able to do that as well, suddenly it gets a lot less impressive, because it's basically based on just audio from a lot of conversations that it's mimicking, right? And you have no way to go through that training data to see if it follows the policy and stuff like that.

[00:11:43] So this is kind of the balance that we see: enterprises can't necessarily adopt the technology that, you know, demos nicely in a voice setting. Also, if you look at a lot of the voice demos out there, they kind of trick you as well.

[00:12:05] Because if you have a conversation with somebody that is flirty and pays attention to you, laughs, you know, stuff like that, and is very casual, a lot of people see past the faults of it. They will accept some inconsistencies and stuff like that, because it's so nice to have this kind of conversation where I'm the center of attention. But if you're doing customer service,

[00:12:33] you don't want your assistant to go flirting with the customers and, you know, doing stuff like that. There's a certain professionalism that needs to be there. For most brands, I should say; there are some brands where you could get away with it. And suddenly, you might not be as tolerant of the assistant. And add to that,

[00:12:59] you're actually mad at the brand because the credit card didn't work or something like that. So you're starting from a... And if somebody comes along and starts to flirt with you and laugh and stuff like that when you're actually having a problem, you know, that might not be as good either. So when you adopt this in the enterprise, there are so many more factors to consider, so many more control mechanisms that you need,

[00:13:29] that a lot of the same technologies can't actually be brought into that domain easily. You know, I always think that conversational AI is kind of like the holy grail of user experience and the user interface. And it makes me wonder, you know, why aren't there more people out there really focusing on developing this? And I wonder, are there a lot of people out there doing it?

[00:13:58] They're just doing it quietly? It seems like sort of a natural. It is. The problem, again, is this balance between control and non-control. Because if you look at the field of conversational design, which is a field that emerged as a way to make these natural language interfaces, what they concentrated on initially,

[00:14:28] and it's slowly moving away from this, but initially, was scripted dialogue. It basically creates scripted dialogue, right? That was seen as compelling and stuff like that. And really, that's not where you want to go. You want to create systems where you're able to have any conversation within that conversational system, right? And that's much harder, because there is no standardization.

[00:14:57] There is no standard model of how to do that. We have a lot of technology there, but it's, you know, it's not based on standards. Now, I would say that conversational AI is kind of just the beginning as well. And I think that this is important to say because conversational AI is just the human

[00:15:25] and natural approach to what you can say is unplanned or unscripted interaction. So sometimes you want to bring in the abilities that a phone or a computer have into the conversation in the form of multimodal interaction. So if you're going to, you know,

[00:15:55] point out where the accident happened? If you're talking to an insurance company, do you want them to say an address in natural language? You know, most are driving on the road, right? So they don't know the address where the accident happened. Or just show a map and have them point at the map, right? In this area, circle it. And that becomes the input. Or if you're a cable company, right?

[00:16:24] And somebody calls in and, you know, what kind of set-top box do you have? Or just take a picture of your set-top box. I take a picture of the set-top box, I can see what type it is and what lights are on the front of it. So suddenly, I have two pieces of information, right? And I can actually do more than what a human could in that conversation. So I think it's important to think about it

[00:16:54] more as this: the ability to go back and forth, the ability to go down many paths, and to have, you know, a long conversation or a short conversation depending on the skill level or the complexity of the problem that's being solved. And I think that is the most important aspect of conversational AI. So I don't think it stops at just being natural language. It can go further than that.
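One way to picture the multimodal point above: every modality, speech, a map tap, a photo, reduces to the same structured fact the dialogue layer reasons over. A sketch with invented field names, not a real API:

```python
from dataclasses import dataclass

# Every modality normalizes into the same structured fact for the dialogue
# layer to reason over. Field names are illustrative, not a real API.

@dataclass
class Fact:
    source: str    # which modality produced it
    name: str
    value: object

def from_speech(text):
    return Fact("speech", "description", text)

def from_map_tap(lat, lon):
    return Fact("map", "location", (lat, lon))

def from_photo(labels):
    # In practice an image model would produce these labels.
    return Fact("photo", "device_state", labels)

facts = [
    from_speech("my set-top box keeps rebooting"),
    from_photo(["model_x200", "red_light_blinking"]),
]
# Two pieces of information from one turn, one of which natural language
# alone could not have supplied.
```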

[00:17:22] So how does this align with the work of agents, or how do they go together? Because agents are obviously the big thing right now that everyone's talking about. Yeah. And if you ask 10 people for the definition of agent, you will get 12 answers, right? So one thing is that you have agents that are single agents. And a single agent is just an encapsulation of an AI system

[00:17:51] with ways to communicate and render outputs, right? So, you know, a simple LLM could be an agent, right? And a virtual assistant could be an agent, or a virtual assistant with an avatar and the ability to detect emotions and all stuff like that. That could be an agent. It's that encapsulation of capabilities around the AI

[00:18:19] and the knowledge base and everything. So an agent is sort of a software construct, in a way. Now, the interesting thing is when you start to talk about multi-agent systems as well, where you have multiple agents that are communicating with each other in a system, and sometimes also communicating with humans. So a human operator, or multiple humans,

[00:18:49] and a lot of AI agents, together solving a problem that might require the agents to communicate with each other and share information. They can be from different vendors, they can be from different systems and stuff like that. And I think that, you know, agents we have had for a long time; multi-agent systems as we see them now,

[00:19:19] you know, although there are multi-agent protocols going back to the 1990s, the current ones just rely on natural language as the communication between the agents in a multi-agent system. And with modern large language models, it's possible to have that communication, and it's very, very interesting what can be done.

[00:19:49] Well, it sounds challenging, because you also have to have this conversation with something or somebody that's going to tell all these multiple agents what needs to be done. And I can't imagine that's easy. You know, multi-agent systems are hard to build. There are three things you want them to solve, right?

[00:20:19] First, you want them to solve the ability to communicate between agents without having a predefined API or protocol. So they can use natural language between them and figure out what is being said and how it applies to me, right? So you can do away with protocols and stuff like that. But you also want them to solve discoverability. So if I put five agents

[00:20:49] together, or I put ten together in a multi-agent system, I want to do the minimum amount of work as a developer or business user when I invite a new agent in, in telling all the other agents what it's capable of and how to use it, right? And the third is, of course, the ability to self-assemble. So the promise of multi-agent systems is that

[00:21:18] they self-discover, they self-assemble, and do things like that. Now, the reality is that in the agent frameworks that are out there today, very little self-discovery and self-assembly is taking place. That is actually done by the developer. They are doing the discovery, and they set up a graph of agents or a workflow or something like that to make them work together. You know, but the promise is much grander,

right? And what current systems look like is more like microservices frameworks with AI. It's not very different. But I do think that if you give users the ability to instruct their own agents, create their own specialist agents, and then instruct groups of agents what to do and guide them and stuff

[00:22:17] like that, multi-agent systems can deliver, you know, in a couple of years on that self-assembly and self-discovery problem as well. And that's where it becomes really, really interesting, because the capability of swarms of dumb AI services is probably going to be much higher than the capability of a single smart AI system.
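The discoverability idea above can be sketched as agents advertising their capabilities in plain language and a router matching requests against those descriptions. Here a crude word-overlap score stands in for the LLM matching a real system would use; the agent names and descriptions are invented:

```python
# Each agent advertises its capabilities in plain language; a router matches
# a request against those descriptions. A crude word-overlap score stands in
# for the LLM matching a real system would use. All names are invented.

AGENTS = {
    "billing_agent": "answer questions about invoices payments and refunds",
    "scheduler_agent": "book move or cancel meetings and appointments",
}

def route(request):
    words = set(request.lower().split())
    # Pick the agent whose capability description shares the most words
    # with the request.
    return max(AGENTS, key=lambda name: len(words & set(AGENTS[name].split())))

print(route("please cancel my meetings on friday"))  # scheduler_agent
```

Adding a new agent means adding one natural-language description, which is the "minimum amount of work" property: no other agent needs to be updated.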

[00:22:48] Let me shift gears for a little bit, because there's something else that you guys are up to that really fascinates me, which is, you know, this talk of unstructured data. Everyone talks about that as a challenge, but you're actually putting together knowledge bases based on unwritten materials. So you haven't even gotten to unstructured yet. Can you talk about that? What are you folks up to? It's cool that you mention that. So of course

[00:23:17] we have a knowledge ingestion system that takes all kinds of data, from Excel, PDFs and stuff like that, and brings that in and creates knowledge graphs and stuff like that from it. But the interesting thing when you work with professionals, highly paid experts, is that they have a lot of knowledge that is only in their head, and they're not writing it down. It's just how it's always been, and this is how we do it, or, you know, I always do

this and that. And when we have a conversational AI front end and we have multiple agents behind it, what we do is capture knowledge from the interaction between the expert user and the agents. So if somebody, for example, is writing a report on investments or whatever, an investment report,

[00:24:18] they can say things like, whenever you encounter this code or that code, they're effectively the same category. Well, you know, that's a thing that the system didn't know, but it's being told as it's working with the expert user, so it stores it for subsequent work in that domain.
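One way to picture that capture step: each expert statement becomes a scoped, reviewable rule that only reaches future runs once approved. This is a sketch under assumptions; the scope labels, the example codes, and the field names are invented:

```python
from dataclasses import dataclass

# Each expert statement becomes a scoped, reviewable rule. The scope labels
# and the example codes are invented; classifying the scope is the hard part.

@dataclass
class LearnedRule:
    text: str               # generic paraphrase of what the expert said
    scope: str              # "this_task" or "all_tasks"
    approved: bool = False  # supervised check before the rule is reused

KNOWLEDGE_BASE = []

def capture(utterance, scope):
    rule = LearnedRule(text=utterance, scope=scope)
    KNOWLEDGE_BASE.append(rule)
    return rule

def rules_for_prompt():
    # Only approved, generally-scoped rules get appended to future prompts.
    return [r.text for r in KNOWLEDGE_BASE
            if r.approved and r.scope == "all_tasks"]

capture("Treat code 4410 and code 4411 as the same category.", "all_tasks")
```

Until a supervisor flips `approved`, the rule sits in a review queue rather than changing behavior, which is the "checks and balances" idea.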

[00:24:48] And capturing that expert knowledge, you know, there are so many nuances to it. You have to figure out: oh, is this for this particular case, or is this something that applies to all cases, right? What's the scope of it? Is it guidance? Is it a correction? Is it feedback on what I need to do? Is it additional knowledge? Is it an additional instruction? There are all kinds of facets around

[00:25:16] just what something is that somebody says. But the idea, right, and this is strange: a lot of AI systems out there don't learn. They've been trained, but they don't learn, right? And to me, that's a very, very strange thing, because why would you give up the ability to continuously learn?

[00:25:50] You would want there to be supervised checks and balances on what is being learned, but why have an AI system that is just trained one time and doesn't learn after that, right? The ability of AI to learn is what you really want, and you want that to happen over time, especially when you work with expert users, because you want to capture that

[00:26:19] knowledge, and the more you can capture, the more value the system gets. So it's about capturing more and more over time. How do you capture it? Yeah, there are different ways to do it, but basically you're categorizing what kind of feedback the user is giving. That's the first thing. And you're determining from that feedback whether

[00:26:49] this is something that is outside the scope of this particular task. So is this something that applies to other tasks as well? And if it is, you run it into a system that captures that instruction, and you paraphrase it in a way that's generic. In some cases we can do it unsupervised, but in most cases you will have a supervised list

[00:27:19] of the most common feedbacks given by the expert users, right? And an admin or a super user goes in and goes, oh, yes, I would want to add that, and that, and that. And when those are added, they are added to the system's interpretation of what to do. So it's added to the instructions to the system. And if it's an LLM, it's added to the prompt, and then you optimize the prompt and

[00:27:49] stuff like that. Or if it's not an LLM, you maybe have to change the weights and do some reinforcement training or things like that. But it's possible in a lot of different AI systems to capture that and make it part of later runs. And, you know, it's interesting, because the super users that are doing this in a supervised fashion might go and say, hey, oh, yeah, I'm taking

[00:28:18] this for granted, right? Or, this is just an assumption we always make; I didn't know that that had to be explicitly said. And that makes training of new expert users and stuff like that easier as well, because it might make them more aware of that unwritten knowledge. So it actually also affects the expert users and how good the humans can be, in addition to the machines. And I

[00:28:48] think for a lot of the tasks that we use AI for, you're not replacing expert users, highly paid expert users in things like investment banking and insurance underwriting and stuff like that, anytime soon. You're helping them be faster, more precise, more effective,

[00:29:17] so they can do the things that humans are really good at, which is the intuition part. They can spend more time on that, versus collecting information, researching information, comparing different things to each other and stuff like that, which machines are really good at. Magnus, thank you so much. It was great to talk with you. I hope we can do it again. Thank you. Absolutely. Thanks for having me.

[00:29:46] I'll gladly be on again. My guest today has been Magnus Revang, the Chief Product Officer of OpenStream.ai. And this has been PeopleTech, the podcast of WorkforceAI.news. We're part of the WRKdefined Podcast Network. Find them

[00:30:13] at www.wrkdefined.com. And to keep up with technology in HR, subscribe to Workforce AI today. We're the most trusted source of news in the HR tech industry. Find us at www.workforceai.news. I'm Mark Pfeffer.