Hello, and welcome to the NVIDIA AI Podcast. I'm your host, Noah Kravitz. Digital humans and AI agents are poised to be big news this year. They're already making waves, in fact. At CES 2025 in Las Vegas, Logitech G's Streamlabs unveiled its Intelligent Streaming Assistant, powered by technologies from Inworld AI and NVIDIA.
The Intelligent Streaming Assistant is an AI agent designed to provide real-time commentary during downtime and amplify excitement during high-stakes moments like boss fights or chases. The collaboration brings together Streamlabs' expertise in live streaming tools; NVIDIA ACE technology for digital humans, including AI vision models that can understand what's happening on screen; and Inworld's advanced generative AI capabilities for perception, cognition, and adaptive output.
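To make that perception, cognition, and adaptive output split a little more concrete, here is a minimal sketch of how a co-host pipeline like this could be wired together. Every name, threshold, and return value below is a hypothetical stand-in for illustration, not the actual Streamlabs, Inworld, or NVIDIA ACE implementation.

```python
from dataclasses import dataclass

@dataclass
class GameState:
    summary: str        # what the vision model thinks is on screen
    excitement: float   # 0.0 (downtime) to 1.0 (boss fight, chase)

def perceive(frame: bytes) -> GameState:
    """Stand-in for an AI vision model reading the streamer's screen."""
    # A real system would run a vision-language model over the captured frame.
    return GameState(summary="player is looting between fights", excitement=0.2)

def decide_commentary(state: GameState) -> str:
    """Stand-in for the cognition step: choose the style of output."""
    if state.excitement > 0.7:
        return f"Hype line about: {state.summary}"
    return f"Casual filler chat about: {state.summary}"

if __name__ == "__main__":
    fake_frame = b"\x00" * 16        # placeholder for a captured video frame
    state = perceive(fake_frame)     # perception
    line = decide_commentary(state)  # cognition
    print(line)                      # adaptive output: would go to TTS and the overlay
```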
But what are digital humans? Where are they going to be used? And where are they going to make an impact in the enterprise, and in gaming and entertainment in particular? And how does a designer design in the age of digital humans, agentic AI, and beyond? Chris Covert, director of product experiences at Inworld AI, whose press release, by the way, or blog post, I paraphrased heartily from in that introduction (so credit to Chris), is here to dive into these points and more
as we talk about some of the technology that I think is poised to make a big impact this year and going forward, and to shape this new era of digital experience for all of us in what we'll broadly call the AI age. But at any rate, Chris is here. So Chris, thank you so much for joining the AI Podcast, and welcome. Thank you for having me today. It's a pleasure to be here not only as a longtime partner of NVIDIA, but also as a huge fan of your recent announcements around AI at CES this year. Genuinely an amazing time to be talking about AI in this space.
It really is. And so you were at the show. You were in Vegas. Couldn't make it personally. I'll be doing DICE, I'll be doing GDC, but our team was there. We worked on the project. All right, so let's get into it. Tell us a bit about Inworld AI for listeners who may not know. We can get deeper into the assistant in particular a little later in the conversation, if that's cool, just so you can kind of set the stage for all of us about what it is that we're going to be talking about.
Yeah, so, you know, I'm extremely biased: I have the best job in the world. At Inworld, we make the leading AI engine for video games, and I get to work with the most creative minds in the industry, the most creative minds in the world of gaming and entertainment, to answer the question: how can we make fun more accessible to the people that make our favorite experiences? Sounds like a great job.
It is, it is. And I'm not just blowing smoke here. And we'll definitely, as we go through this, emphasize fun, accessibility, and people, and why I think those are the most important words here. But Inworld's mission is to be that AI engine for this industry, right? The more the technology progresses, the lower the barrier to entry to make AI experiences. But we find there are still challenges, regardless of whether you're a studio with a massive staff or you're an AI-native
company building up these experiences from the grassroots: there's a massive gap between prototyping and deployment of AI systems. And then to do that and find the fun for a user, for an experience, is still incredibly challenging. So we offer not only the platform, but the services to help you create that fun in a deployable manner. So kind of briefly, run down maybe what you guys talked about at CES this year, and then we'll put a pin in the assistant, like I said, and come back.
Yeah, it's awesome. So the blog post is out there, fantastic videos are out there, a lot of demonstrations of what happened. But succinctly, to plug the collaborative effort here: Inworld, NVIDIA, and Streamlabs got together for CES, and we put to the test this streaming companion that could not only act as a co-host to a streamer, but also serve to support the creator in other ways, like handling production for them on their stream as well.
We showed a demo of this happening at the conference. Always a risk, because you're playing in a big space with low internet bandwidth, but we played this over Fortnite, with the streamer, the demoer in this case, and their agentic co-host chatting during the game, grilling each other for bad rounds, reacting to trends happening in chat, cheering together when you manage to get kills. But then, interestingly enough, when the streamer wants a replay, they just have to ask the agent; it'll clip it, set up the screen for that replay, and keep the stream feeling fluid.
So it's one of those use cases, again, trying to align on where agentic systems actually play in enterprise and in this entertainment industry: we're finding they're hyper-specialized. And this use case is a perfect example. One streamer has to wear all of these different hats during their stream. How can an agentic system assist them so that they can focus on making the best content, while all the other work of managing that content is done for them?
So back in the day, and this is part of why we've got to come back to it, I did a lot of YouTube stuff, and we tried to do some live streaming. This was in the early to mid 2000s. And I remember setting up open source plugins to try to get graphics over the live stream, and then I had a secondary camera, like a phone with a camera and a USB cable and all this stuff. But that sounds amazing, so I'm excited to talk about that.
But to kind of back into it, or start with the technology and go forward: agentic AI, co-hosts, digital humans. Let's talk about those things. Do you want to start with digital humans? Because we've talked about agentic AI, and we'll get to it, and what an agent is or isn't, I think, is still a little bit malleable and context specific. But when we talk about digital humans, when Inworld talks about digital humans, what are we talking about?
You know, that's such a good question, and it brings up an important distinction. When most people say digital humans, they're thinking of chatbots: text-based tools that respond with mostly pre-programmed scripts to help with specific tasks or guide users through a set of questions or instructions. But at Inworld, we focus on AI agents that go far beyond simple chat. These agents can autonomously plan, take actions, and proactively engage with their environment.
Unlike a typical chatbot, which waits for a user to come in with some kind of question or input and then provides a canned response, our agents at Inworld are designed to interpret multiple types of inputs, whether that's speech or text or even vision or sensor data, and then dynamically think and respond in real time. They don't just answer questions; they can initiate tasks, they can adapt to different contexts, and they can carry out sophisticated interactions that make sense within the context of what you're doing.
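As an illustration of that multi-input, respond-or-initiate idea, here is a small sketch in which an agent consumes speech, vision, and sensor events and can act even when nobody asked it a question. The event types and rules are invented for the example; they are not Inworld's actual interfaces.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class SpeechEvent:
    text: str

@dataclass
class VisionEvent:
    description: str

@dataclass
class SensorEvent:
    name: str
    value: float

Event = Union[SpeechEvent, VisionEvent, SensorEvent]

def handle(event: Event) -> List[str]:
    """Return zero or more actions; the agent can answer, or initiate on its own."""
    actions: List[str] = []
    if isinstance(event, SpeechEvent):
        actions.append(f"reply: answer to '{event.text}'")
    elif isinstance(event, VisionEvent) and "boss" in event.description:
        actions.append("initiate: hype commentary")       # contextual and unprompted
    elif isinstance(event, SensorEvent) and event.value > 0.9:
        actions.append(f"initiate: warn about {event.name}")
    return actions

print(handle(VisionEvent(description="boss fight starting")))
print(handle(SpeechEvent(text="clip that last round")))
```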
So if you look at where digital humans are used mostly today, and again, I'm super biased, it's often in the high-volume chatbot or personal digital assistant space. But if you consider where they can have the biggest impact, that's only when they function as truly autonomous agents, planning and acting and proactively helping people in ways that traditional chatbots can't. It's not question and answer; it's answering a question I didn't even know I had yet. And that's the core of our focus at Inworld: building AI agents that can not only look human,
but can think and react and solve problems like one, using multiple modalities to interact with their world, in our case a digital one. When it comes to building with agentic AI, do you feel like you have a solid vision of what is to come, even near term? Or is it still very early days of figuring out everything, from trying different types of infrastructure to the design and higher-level thinking about how to architect these things?
That's a great question. And the reality is, we're shifting toward an agentic AI framework, as I mentioned before, and shifting toward that framework is no longer really optional. It's becoming essential for us and our partners. When you need any sense of autonomy, or real-time or runtime adaptability, latency optimization, any sense of custom decision making, you need a robust framework that lets you take control of that stack.
No one platform is going to serve the needs of our industry standalone. So building a framework that is flexible, that is robust to the growing, changing, super diverse needs of our industry, is incredibly important. What we're seeing more and more of is enterprises that want to own that architecture end to end.
What I mean by that is they need the flexibility to decide what models to use, where to use them, how data is fed into their systems, how to manage compliance and risk, all of these factors that require really custom logic and custom implementations. So it's adopting this agentic AI framework
that really puts the power in their hands. It's not just about future-proofing; it's about giving them full control over how AI evolves within their organization at any given time. The industry is going to move quickly, and we want to make sure our partners can move just as quickly as they get inspired. But to me, my favorite key differentiator here, the future of Inworld, why the framework model, why all of these changes, is that such a framework doesn't just guarantee technical flexibility. That's great,
and that's going to make a lot of people happy, but it also opens up the creative design space as well. By collaborating with people who are thinking up the most outrageously beautiful, outrageously wonderfully weird experiences you could possibly imagine, and giving them tools open enough to really let them craft their own AI architectures, we're really helping push the boundaries of what and how we can deliver innovative experiences.
We made the joke earlier, but if we say that conversational AI, this chatbot nature, is now feeling very entry-level, very baseline, then what we're doing with this agentic framework is building out the blueprint for the future, so that our partners and our customers can imagine, build, and deploy AI experiences that genuinely didn't feel possible even six months ago, taking that to a whole new level.
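One way to picture that "own the architecture end to end" idea is as a configuration surface: which models to use, where they run, what data feeds them, and what compliance rules apply. The sketch below is purely hypothetical; none of these field names come from Inworld's actual framework, they just make the decision points concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AgentStackConfig:
    dialogue_model: str = "any-llm-you-choose"   # which model, swappable per use case
    vision_model: Optional[str] = None           # optional perception component
    run_location: str = "on_prem"                # "on_prem", "cloud", or "edge"
    data_sources: List[str] = field(default_factory=list)  # what feeds the agent
    log_retention_days: int = 30                 # compliance / risk knobs
    blocked_topics: List[str] = field(default_factory=lambda: ["pii"])

config = AgentStackConfig(vision_model="any-vlm", data_sources=["game_state", "chat"])
print(config)
```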
If we go more than about two years out, I'm at a 10% confidence level on where this is actually going to be. I appreciate the candor. Yeah, and I've done this a couple of times in my career, where I'll work with somebody in an advisory capacity looking at the future three to five years out, and by the time we finish our analysis, the thing we had pegged two years out is now open sourced.
And you're like, cool, back to the drawing board, because the industry moves so quickly that what is possible is actually really hard to nail down. And we will hopefully get into, in this conversation, how designing around AI and these systems is exactly that challenge, because I think we've learned some fantastic lessons that hopefully people listening can take away as well.
You know, trying not to sound too obvious: when we're talking about agents, I really see the evolutionary chain of how an agent will grow, how the technology behind it will progress even in the near future, coming down to agency. And again, they're agents, so it sounds obvious. But what I really mean by that is the complexity of that cognition engine. And I'll try to crack this now with an analogy, and I'll try to make it fast, because I could talk about this forever. No, take your time. We'll take our time with the analogy.
The first phase, again, is that conversational AI phase, and I'll use a gaming analogy, right? The conversational AI phase gives avatars, gives agents, I'll use those interchangeably today, extremely little agency to do anything other than speak. It may be able to respond to my input if I ask it to do something, but it's not going to physically change the state of anything other than the dialogue it tells me back. So think in terms of an analogy like a city-building simulation game.
Okay. You know, in this phase, conversational AI, I can ask it about architecture and it can give me some facts about architecture. I can ask it about plans I have to build this little area and it can give me some advice. I can ask it, hey, how do I limit the congestion, or how do I maximize traffic over this little commercial area, and it can tell me in words how to potentially do those things. But it won't be able to do things like place buildings or reconstruct roads or even select the optimal facade for the style of the region, because that's completely out of its scope and it can't act.
It can't act. It can't act at all. It has a very rudimentary perception, what is it seeing, what context does it have, and cognition, how is it planning, how is it reasoning. Very simple engines drive that.
Really, I'm going to talk to it, and it's going to say, oh, this person cares about architecture, I'm going to pull everything I know about architecture, turn that into a nice response, weave their question into my response, and boom, now we have simple conversational AI. But not too far ahead of that,
and I say this, you know, lovingly and endearingly, is simple task-completion AI, right? It's very much still call and response. You have an action engine. Say I have an agent that can only construct new buildings exactly where and when I tell it: I can say place a building at this intersection, and boom, it does it. I can say change the building to this type, and boom, it does it. Right.
While you could argue that requires a little bit more perception, maybe a little bit more reasoning, definitely some action, it's limited to an extremely small set of actions, and it's pretty much just scripting with extra steps. It knows what to expect. It's waiting for me to say something it can map: that is the place-a-building action, I'm going to place a building, done.
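That "scripting with extra steps" phase can be sketched in a few lines: recognized phrases map onto a small, fixed action set, and anything outside that set simply can't be acted on. The commands and actions below are invented for the city-builder analogy, not taken from any real engine.

```python
ACTIONS = {
    "place a building": lambda args: f"placed a {args.get('type', 'generic')} building at {args.get('where', '?')}",
    "change the building": lambda args: f"changed the building to {args.get('type', '?')}",
}

def run_command(utterance: str, args: dict) -> str:
    for trigger, action in ACTIONS.items():
        if trigger in utterance.lower():
            return action(args)                    # mapped command -> execute it
    return "Sorry, that's outside my action set."  # unmapped -> the agent cannot act

print(run_command("Place a building at this intersection", {"type": "shop", "where": "5th and Main"}))
print(run_command("Make traffic flow better around downtown", {}))  # no mapping, no action
```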
If a language model provides the description, including something like, I wish I could change the background to mountains or whatever, or it envisions a background with mountains but it can't act, can the system you're talking about be automated, so that the language model can tell the action or cognition model what to do? Every term gets used; I have the term large action model in my head, and I have thoughts about that terminology, but I don't know if they're my own thoughts or not.
We can get into it, but you're definitely hinting at, and I should have said this up front, there are four phases, and we are at phase two. I think a lot of virtual assistants are at that phase two. What you're describing, kind of intent recognition, or even passive intent recognition... Yes.
is, you know, there are no formal names for this, but I'll call it an adaptive partner phase, where the AI is observing and responding to changes on its own, right? And this is the natural evolution of where a lot of AI systems are heading today. CES was a good example of that. I have no doubt that throughout the year it'll get really close to becoming another core standard. I think right now a lot of the simple task-completion AI is the core standard across enterprise and gaming.
In this adaptive partner phase, the agent in this extended analogy would notice changes, like newly built roads or an influx of new residents into this tiny area, and automatically adapt its construction plans. It's not micromanaging every decision, but it feels like you're collaborating with an agent or a unit that has just enough context to make smart decisions on its own, like an evolution of a recommendation engine being driven by a cognition engine.
It's not just learning; it feels like it's learning what we need even before we ask. And I expect to see a lot of this in the next year. Again, I think that's phase three. And I think there's still a phase four, and that's a fully autonomous agent. That stage, continuing our analogy, is a player two. Where phase three is it adapting to us, phase four is,
hey, this thing is an agent all on its own. It feels like I'm playing against another human. It is making decisions that feel optimal to its own objectives, aligned with mine or not. And I think this is where a lot of people think about agentic AI; they think we are there today. There's quite a big gap in the deployment of these systems to get to that truly autonomous agent.
That is kind of the gold rush right now: creating a perception, cognition, and action engine ecosystem that can feel natural to a user.
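A bare-bones way to picture that perception, cognition, and action engine split is the loop below. All three "engines" are stubs, so it only shows the shape of the loop described here (observe changes, plan proactively, act), not a real agent.

```python
import time

def perception_engine(world: dict) -> dict:
    """What changed in the world? (stub: surfaces a single signal)"""
    return {"new_roads": world.get("new_roads", 0)}

def cognition_engine(observations: dict, goals: list) -> list:
    """Plan actions from observations and the agent's own objectives."""
    plan = []
    if observations["new_roads"] > 0:
        plan.append("rezone lots along the new roads")  # proactive: nobody asked for this
    return plan

def action_engine(plan: list) -> None:
    for step in plan:
        print("executing:", step)

world = {"new_roads": 2}
for _ in range(3):                      # a real agent would loop continuously
    observations = perception_engine(world)
    plan = cognition_engine(observations, goals=["keep traffic flowing"])
    action_engine(plan)
    world["new_roads"] = 0              # the world changes as the agent acts
    time.sleep(0.1)
```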
I'm speaking with Chris Covert. Chris is director of product experiences at Inworld AI. I made us kind of put a pin in their product announcement to bring this from the abstract a little bit into the concrete. I'm glad you humored me, Chris, because I think digging into what this stuff actually means is worthwhile. One thing I should say is that it's all moving and evolving; maybe the lingo is a moving target, but the tech and what's actually happening is evolving, and still so fast.
All right. So let's talk about the big announcement from CES. You said you wanted to talk through the design process a little bit and how you make these choices. So if that's something that makes sense to do in the context of talking about the streaming assistant, great. And if not, take it whichever way you want to go first. That sounds perfect. Yeah. So I'll be honest here, right? In any demo that you see publicly,
especially when you have more than one company demoing it, there's a lot of design, but also a lot of urgency to show something quick and cool. So of the things I'd call design axioms that I love to live by, we had to sacrifice some just to make sure that something awesome could be demonstrated at CES. Putting a timeline on creative vision is always challenging.
Right. I mean, I don't build software for a living, you do, but we live in such an age where we're past the point where beta became, all right, everybody ships beta, and then we got used to it. And it's like, right, there'll be updates and downloadable content and this and that, but
not CES. That's the date: show up or don't. Exactly. Exactly. But in terms of design, and this might sound odd coming from a tech startup on the call here today, my first piece of advice, regardless of who we're working with, is always to ignore the technology and focus solely on the experience that you really want to create. Where is the value in the experience, the why?
And as soon as you do lead with the tech, "hey, you know, I have an idea, what if we had an LLM that could...", you're already risking thinking too narrowly and missing other opportunities. And I genuinely would love to say the reason behind this is because it's human-centered. I come from the intersection of autonomous systems, AI, and human-centered design.
So I would love to say that it's a human-centered approach to design and that's why I do it. But the reality is that the technology just advances so quickly. By the time your idea goes into production, that tech-grounded idea you had now looks like a stone wheel compared to the race car you thought it was six months ago. So again,
the urgency behind some of these things is challenging, especially when you come from a tech-grounded design standard. I'm a firm believer in the moonshot-first approach, where you begin by clarifying why you want to build something before deciding how to build it, especially for anything with a shelf life greater than just a few months at this stage. Again, if you start with how, you end up with a bunch of low-hanging fruit, and a bunch of low-hanging fruit makes for a really bad smoothie.
So what is a moonshot experience? In design thinking workshops I often refer to the classic impact versus feasibility matrix. And this isn't a very exciting conversation, so I'll try to keep it high energy. But typically when you're looking at impact versus... The smoothie, sorry, I couldn't let the smoothie go unacknowledged, but I didn't want to interrupt. Well, we'll come back to the smoothie. Okay, good.
Because again, it's in that matrix, right? Impact versus feasibility: your low-hanging-fruit ideas are the ones that are going to be low feasibility or low impact. But typically you're aiming for the upper right quadrant, the highest feasibility and the highest value, where impact and feasibility are so high the ideas feel easy to build and have a lot of inherent value in them. But what I've experienced when working with partners and their AI ambitions is that that's actually a trap,
that that quadrant right there is doomed to failure, because often the ideas most worthy of pursuit are the ones that almost feel impossible but, if real, would deliver extraordinary impact.
As you get closer to building that value, and you're going from idea to prototype to implementation, the technology has likely grown not only in capability and efficiency but also in accessibility, in ways that probably outpace our own traditional lens of what is feasible and our imagination of where the tech can go. That sounds like a lot of nothing.
So my actual advice is: when we're designing these types of experiences, start from that moonshot that feels impossible and then break it down into a functional roadmap. What you saw at CES in this demo, with Streamlabs, NVIDIA, and Inworld technology creating the streaming assistant,
is actually just horizon one, that insight: what can we build that gives us insight into whether this is something that provides value, where it provides value, and to whom? The thing we demonstrated at CES was just the first-pass proof of concept of this capability. The kind of
upper potential of where this can go, where we want it to go, and where we're going to explore its value is still very much architected out toward a much longer roadmap. And again, showing what we showed at CES, and what I love about this industry right now, is that we're able to show really compelling integrations
with very meaningful use cases in this industry. I'm not about to wax poetic that our industry is the same as healthcare or other parts of enterprise, but showing digital companions, virtual assistants, digital humans at the state we're showing them now, knowing what's to come, is an incredible place to look ahead from and say, okay, cool, this scratches an itch that either the market needed or that, technologically, is doing something that could never be done traditionally before.
Where this moves in the next six months to six years is anyone's game. Okay, so two questions, but they're related. One's easy; the other is a follow-up. What is the availability of the assistant? It was demoed at a very early state. Do you have a roadmap for that? I shouldn't say availability, but roadmap.
And that might be the answer to the second part, but what else is on the horizon for you, for Inworld AI? I mean, you talked a little bit before about the broader horizon for agentic AI and avatars and assistants and such, but you can go back there if you like. What's coming up this year, maybe near term, that you're excited about? Well, man, this year being near term is funny at Inworld, because
we move very quickly. This year is many near-terms stacked one against the other. Well, right, and I'm figuring we're taping now, but it's going to go live, you know, so there's a little bit of a buffer. Yeah, so we certainly have productization ambitions for this demo.
What we're doing post-CES is taking the feedback on what we built and augmenting it in many new ways. Again, what was built for the demo was a proof of concept, as many proofs of concept on show floors are: it wasn't robust to all the inputs we would want, it wasn't robust to all the games we would want it to be playable with. So we're trying to build out that ecosystem in an intelligent, strategic way, so that if it were to go to market, it would be usable by as many streamers as possible who want to leverage this type of technology.
Keep your ears on the beat for what's about to come out. I have no doubt that between NVIDIA, Inworld, and Streamlabs, all the announcements, and all the possible show floors where we can show our advancements, will come at the right time. So, super exciting for that. As it relates to Inworld, oh boy.
It's such a fascinating question, because a lot of what we're doing, we're doing with partners with such fascinating game development lead times that I hope we'll be playing more Inworld-driven experiences in the next, well, by the time this releases, but also over the next four or five years, depending on the scale of the company we're working with.
I'm genuinely excited for what's in store as our platform develops, again, as we've seen in just the last year or so, with more and more competitive players in the space providing AI tools, and fantastic partners helping this industry become more accessible to people of all different backgrounds. I really hope to see
the, I wouldn't say consolidation of tools, but the accessibility, I'll keep using that word, the accessibility of different AI platforms, hey, I want to use this model but in this engine, become a lot easier. And Inworld's goal is certainly to make that happen for as many industries as we can, in particular the gaming and entertainment industry. But it doesn't happen without partners like NVIDIA and the work that you guys are doing with ACE. So,
where I think Inworld is going is to make that easier, to continue to work with studios to find the fun, and to convince every player, and in particular, let me be honest, the YouTube commenters, that hey, there is actually a world here where this technology is not only fun and immersive, but something the entire industry views as its gold standard.
So I think we're there a lot sooner than we think. I think it's right around the corner. But I could not be more excited to continue to work with creatives to help them tell stories, to help them flex and use their imagination as much as possible to make the best possible experiences. We do it every day. We may not see it in the near term, because games take a while to make, but I'm genuinely excited.
Excellent. Well, I'm all for more fun. The world always needs more fun to counterbalance everything else. Chris, for listeners who would like to learn more about anything specific we might have talked about, or just broadly about Inworld, where can they go online? Website, obviously, but are there social handles? Is there a separate blog, or even a research blog? Where would you direct folks?
Yes to all of the above. You can find all of that at inworld.ai. We have our blog for partners and experiences there, we have technical releases, we have research, and all of our socials are linked there. If you want to stay up to date with all the announcements that we make, we have a lot of fun and we like to talk about it, so definitely stay up to date. Fantastic. Well, thanks for taking a minute to come talk about it with us. We appreciate it, and best of luck to you and Inworld AI with everything you're doing this year. And maybe we can do it again down the road.
I love it. Thank you so much, and I appreciate it.