Hi, No Priors listeners. I hope it's been an amazing 2024 for you all. Looking back on this year, we wanted to bring you highlights from some of our favorite conversations. First up, we have a clip with the one and only Jensen Huang, CEO of NVIDIA, the company powering the AI revolution.
Since our 2023 No Priors chat with Jensen, NVIDIA has tripled in stock price, adding almost $100 billion of value each month of 2024 and entering the $3 trillion club. More recently, Jensen shared his perspective with us again, this time on why NVIDIA is no longer a chip company but a data center ecosystem. Here's our conversation with Jensen.
NVIDIA has moved into larger and larger, let's say, units of support for customers. I think about it going from single chip to server to rack and NVL72. How do you think about that progression? What's next? Could NVIDIA do a full data center? In fact, we build full data centers. That's the way we build everything. If you're developing software, you need the computer in its full manifestation. We don't build PowerPoint slides and ship the chips.
We build a whole data center. Until you get the whole data center built up, how do you know the software works? How do you know your fabric works, and that all the efficiencies you expected hold? How do you know it's really going to work at scale? And that's the reason why it's not unusual to see somebody's
actual performance be dramatically lower than their peak performance as shown in PowerPoint slides. Computing is just not what it used to be. I say that the new unit of computing is the data center. To us, that's what you have to deliver. That's what we build. We build the whole thing, in every combination: air-cooled, x86, liquid-cooled, Grace,
Ethernet, InfiniBand, NVLink, no NVLink, you know what I'm saying? We build every single configuration. We have five supercomputers in our company today. Next year, we're going to build easily five more. So if you're serious about software, you build your own computers. If you're serious about software, you're going to build your whole computer, and we build it all at scale. This is the part that is really interesting: we build it at scale, and we build it vertically integrated. We optimize it full stack, and then we disaggregate everything
and we sell it in parts. That's the part that is completely, utterly remarkable about what we do. The complexity of that is just insane. And the reason for it is that we want to be able to graft our infrastructure into GCP, AWS, Azure, OCI. All of their control planes and security planes are different, and the way they all think about their cluster sizing is different.
And yet we make it possible for all of them to accommodate NVIDIA's architecture, so that CUDA can be everywhere. That's really, in the end, the singular thought: we would like to have a computing platform that developers can use that's largely consistent. Modulo 10% here and there, because people's infrastructure is optimized slightly differently, everything they build will run everywhere.
This is one of the principles of software that should never be given up, and we protect it quite dearly. It makes it possible for our software engineers to build once and run everywhere. That's because we recognize that the investment in software is the most expensive investment, and it's easy to test:
look at the size of the whole hardware industry, and then look at the size of the world's industries. It's $100 trillion sitting on top of this $1 trillion industry, and that tells you something. The software you build, you basically have to maintain for as long as you shall live. We, of course, have to mention our conversation with the lovely Andrej Karpathy, where we dig into the future of AI as an exocortex, an extension of human cognition.
Andrej, who's been a key figure in AI development from OpenAI to Tesla to the education of us all, shares a provocative perspective on ownership of and access to AI models, and also makes a case for why future models might be much smaller than we think. If we're talking about an exocortex, that feels like a pretty fundamentally important thing to democratize.
How do you think about the current market structure of what's happening in LLM research? There's a small number of large labs that actually have a shot at training the next generation of models. How does that translate to what people have access to in the future?
So what you're alluding to, maybe, is the state of the ecosystem, right? We have kind of an oligopoly of a few closed platforms, and then we have an open platform that is a bit behind, like Meta's Llama, et cetera. And this is kind of mirroring the open-source ecosystem. I do think that changes when we start to think of this stuff as an exocortex.
There's a saying in crypto: not your keys, not your coins. Yeah. So is it the case that if it's not your weights, it's not your brain? That's interesting, because a company would effectively be controlling your exocortex through their product. Yeah, it starts to feel kind of invasive. This is my exocortex. I think people will care much more about ownership, yes. Yeah, you realize you're renting your brain. It seems strange to rent your brain. The thought experiment is: are you willing to give up ownership and control to rent a better brain? Because I am.
Yeah, so I think that's the trade-off. We'll see how it works out, but maybe it's possible to use the closed versions by default because they're amazing, while keeping a fallback for various scenarios. And I think that's kind of the way things are shaping up even today, right? When APIs go down at some of the closed-source providers, people start to implement fallbacks to the open ecosystems that they fully control, and they feel empowered by that. So maybe that's just what the extension to the brain will look like: you fall back on the
open-source stuff should anything happen, but most of the time you use the closed models. So it's quite important that the open-source stuff continues to progress. I think so, 100%. And this is not an obvious point, or something people necessarily agree on right now, but I think 100%. I guess one thing I've been wondering about a little bit is:
what is the smallest performant model you can get to, in some sense, either in parameter count or however you want to think about it? I'm a little curious about your view; you've thought a lot about distillation and small models. I think it can be surprisingly small. And I do think the current models are wasting a ton of capacity remembering stuff that doesn't matter. They remember SHA hashes, they remember ancient stuff.
Because the data set is not curated. Yeah, exactly. And I think this will go away. I think we just need to get to the cognitive core, and the cognitive core can be extremely small. It's just this thing that thinks, and if it needs to look up information, it knows how to use different tools. Is that, like, three billion parameters? Is that 20 billion?
I think even a billion might surprise us; we'll probably get to that point, and the models can be very, very small. And I think the reason they can be very small is, fundamentally, that distillation works. Maybe that's the only thing I would say: distillation works surprisingly well. Distillation is where you take a really big model, or a huge amount of compute, or something like that, and have it supervise a very small model.
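To make the distillation idea concrete, here is a minimal sketch of the standard soft-label distillation objective, where a frozen teacher's output distribution supervises a small student. This is the generic textbook formulation, not a recipe Andrej describes in the episode:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: the small student is trained to match
    the big teacher's output distribution rather than hard labels."""
    # Softening with a temperature exposes the teacher's "dark knowledge"
    # (relative probabilities of wrong answers), which is a big part of
    # why distillation works so well.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

# Usage sketch: the teacher is frozen; only the small student gets gradients.
# teacher_logits = big_model(batch).detach()
# loss = distillation_loss(small_model(batch), teacher_logits)
```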
Our conversation with Bret Taylor, OpenAI board member and founder of Sierra, painted a really different picture of how we'll interact with businesses in the future. Here's a clip of Bret explaining company agents and why the website is going to take a backseat.
The other category, which is the one my company, Sierra, works in, is what I call company agents. And it's really less about simple automation or autonomy; it's about, in this world of conversational AI, how does your company exist digitally? I would use a metaphor: we're in 1995.
You know, in 1995, existing digitally meant having a website and being in the Yahoo directory, right? In 2025, existing digitally will probably mean having a branded AI agent that your customers can interact with to do everything they can do on your website, whether that's asking about your products and services, doing commerce, or doing customer service.
That domain, I think, is shovel-ready right now with current technology because, again, like the persona-based agents, it's not boiling the proverbial ocean technically. You have well-defined processes for your customer experience and well-defined systems of record. And it's really about saying: in this world where we've gone from websites to apps to now conversational experiences, what is the conversational experience you want around your brand?
And it doesn't mean it's perfect or easy; otherwise we wouldn't have started a company around it. But it's at least well defined. And I think that right now in AI, if you're working on artificial general intelligence, your version of "agent" probably means something different, and that's okay; that's just a different problem to be solved. But particularly in the areas where Sierra works, and for a lot of the companies you all have invested in, the question is: are there shovel-ready opportunities right now with existing technology? And I absolutely think there are.
Can you describe the cycle of building a company agent? What is the gap between research and reality? What do you invest in as an engineering team? How do you understand the scope of different customer environments? What are the factors of investment here?
And maybe, as a starting point, it might be worth defining what products Sierra provides today for its customers, and where you want that to go. Then maybe we can feed that back into what the components are. Because obviously you folks are really emerging as a leader in your vertical, but it'd be great for a broader audience to understand what you focus on.
Yeah, sure. I'll just give a couple of examples to make it concrete. If you buy a new Sonos speaker, or you're having technical issues with your speaker and you get the dreaded flashing orange light, you'll now chat with the Sonos AI, which is powered by Sierra, to help you onboard or help you debug whether it's a hardware issue or a Wi-Fi issue,
things like that. If you're a SiriusXM subscriber, their AI agent is named Harmony, which I think is a delightful name. It handles everything from upgrading or downgrading your subscription level to, if you get a trial when you purchase a new vehicle, talking with you about that.
Broadly speaking, I would say we help companies build branded, customer-facing agents. And branded is an important part of it: it's part of your brand experience. I think that's really interesting and compelling because, going back to the proverbial 1995, your website was on your business card; it was the first time you had this sort of digital presence. And I think the same novelty applies; we'll probably look back at today's agents with the same sense of, oh, that was quaint.
You know, if you go back to the Wayback Machine and look at early websites, it was either someone's phone number and that's it, or it looked like a DVD intro screen with lots of graphics. A lot of the agents customers start with are often around customer service, which is a really great use case. But I truly believe that if you fast-forward three or four years, your agent will encompass all that your company does. I've used this example before, but I like it: just imagine an insurance company and everything you can do when you're engaged with them.
Maybe you're filing a claim. Maybe you're comparing plans. We were talking about our kids earlier; maybe you're adding your child to your insurance policy when they get old enough to have a driver's license. All of the above will be done by your agent. So that's what we're helping companies build.
Next, we talked to the Sora team at OpenAI, which is building an incredibly realistic AI video generation model. In this clip, we talk about their research and how models that understand the world fit into the road to AGI. Is there anything you can say about how the work you've done on Sora affects the broader research roadmap? Yes, I think something here is about
the knowledge that Sora ends up learning about the world just from seeing all this visual data. It understands 3D, which is one cool thing, because we didn't train it to. We didn't explicitly bake 3D information into it whatsoever; we just trained it on video data, and it learned about 3D because 3D exists in those videos. And it learned that when you take a bite out of a hamburger, you leave a bite mark. So it's learning so much about our world, and
when we interact with the world, so much of it is visual. So much of what we see and learn throughout our lives is visual information. So we really think that just in terms of intelligence, in terms of
leading toward AI models that are more intelligent, that better understand the world like we do, this will actually be really important: having this grounding of, hey, this is the world that we live in. There's so much complexity in it, so much about how people interact, how things happen, how events in the past end up impacting events in the future, that this will actually lead to much more intelligent AI models, far beyond just generating videos.
It's almost like you simultaneously invented the future of the visual cortex plus some part of the reasoning parts of the brain. Yeah, and that's a cool comparison, because a lot of the intelligence that humans have is actually about world modeling, right? All the time, when we're
thinking about how we're going to do things, we're playing out scenarios in our heads. When we have dreams, we're playing out scenarios in our heads. We're thinking in advance of doing things: if I did this, this thing would happen; if I did this other thing, what would happen? So we have a world model, and building Sora as a world model is very similar to a big part of the intelligence that humans have.
How do you guys think about the analogy to humans having a very approximate world model, versus something as accurate as, let's say, a physics engine in the traditional sense? Because if I hold an apple and drop it, I expect it to fall at a certain rate, but most humans don't think of that as articulating a path with a speed, as a calculation. Do you think that sort of learning is parallel in large models?
I think it's a really interesting observation. The way we think about it is that it's almost a deficiency in humans that our world model is not so high-fidelity. The fact that we can't do very accurate long-term prediction, when you get down to a really narrow set of physics, is something these systems can improve upon. And so we're optimistic that Sora will supersede that kind of capability and, in the long run, will be
more intelligent than humans as a world model one day. But humans are certainly an existence proof that high-fidelity prediction isn't necessary for other types of intelligence. Regardless, it's still something that Sora and future models will be able to improve upon.
So it's very clear that trajectory prediction for throwing a football is going to be better in the next versions of these models than in us, let's say. If I could add something to that: this relates to the paradigm of scale and the bitter lesson, the idea that we want methods that get better and better as you increase compute. Something that works really well in this paradigm is the simple but challenging task of just
predicting data. You can try coming up with more complicated tasks, for example something that doesn't use video explicitly but operates in some space that simulates things approximately. But all that complexity actually isn't beneficial when it comes to the scaling laws, to how methods improve as you increase scale. What works really well as you increase scale is to just
predict data. That's what we do with text: we just predict text. And that's exactly what we're doing with visual data in Sora. We're not concocting some complicated new thing to optimize. We're saying, hey, the best way to learn intelligence in a scalable manner is to just predict data. That makes sense. And relating to what you said, Bill, predictions will just keep getting better, with no necessary ceiling at human level.
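As a sketch of the "just predict data" point: the same generic prediction objective used for text can be applied to visual data once it's turned into tokens. This is a minimal autoregressive illustration; Sora itself is reported to predict spacetime patches with a diffusion objective, but the principle of directly predicting the data is the same:

```python
import torch
import torch.nn.functional as F

def predict_data_loss(model, tokens):
    """The scalable objective: given a sequence of token ids of shape
    (batch, seq), predict each token from the ones before it."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)  # (batch, seq-1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

# The same loss applies whether the ids came from a text tokenizer or
# from a visual tokenizer mapping video into spacetime patch ids; no
# modality-specific objective is needed, which is what scales well.
```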
We also sat down with Dmitri Dolgov, co-CEO of Waymo. Today the company is scaling its self-driving fleet, completing over 100,000 fully autonomous rides per week in cities like San Francisco and Phoenix. It's my favorite way to travel. In this clip, Dmitri explains why achieving full autonomy, removing the driver entirely and reaching 100% reliability rather than 99.99%, is much harder than it might appear.
What is the gap between, let's say, advanced driver assistance, which seems to work in more and more scenarios, and full autonomy? What's the delta? Yeah. It's the number of nines.
And it is the nature of this problem. If you think about where we started in 2009, one of our first milestones, one of the goals we set for ourselves, was to drive ten routes. Each one was 100 miles long, all over the Bay Area: freeways, downtown San Francisco, around Lake Tahoe, everything. And you had to do the 100 miles with no interventions; the car had to drive autonomously from beginning to end. That's the goal we created for ourselves.
It took about a dozen of us maybe 18 months, and we did it. In 2009: no ImageNet, no ConvNets, no transformers, no big models, tiny computers. It's very easy to get started; that's always been the property of this problem, and with every wave of technology it gets even easier, with that early part of the curve getting steeper and steeper. But that's not where the complexity is. The complexity is in the long tail of the many, many, many nines.
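To make the "number of nines" framing concrete, here's a rough back-of-envelope sketch (the ride volume echoes the intro above; the trip length and reliability figures are illustrative assumptions, not Waymo numbers):

```python
# Each extra "nine" of per-mile reliability multiplies the distance
# between failures by 10; at fleet scale the difference is stark.
RIDES_PER_WEEK = 100_000   # from the intro above
MILES_PER_RIDE = 5         # assumed average trip length

for nines, reliability in [(2, 0.99), (4, 0.9999), (6, 0.999999)]:
    miles_between_failures = 1 / (1 - reliability)
    weekly = RIDES_PER_WEEK * MILES_PER_RIDE / miles_between_failures
    print(f"{nines} nines: one failure per {miles_between_failures:,.0f} "
          f"miles -> ~{weekly:,.1f} failures per week fleet-wide")
```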
And you don't see that if you build a prototype or a driver-assist system, and this is where we've been spending all of our time; that's the only hard part of the problem. And I guess nowadays it has been getting easier with every technology cycle. With all of the advances in AI, especially the generative AI world, the LLMs and VLMs, you can take something almost off the shelf. Transformers are amazing.
VLMs are amazing. You can take a VLM that accepts images or video and has a decoder where you give it a text prompt and it produces text, and you can fine-tune it with just a little bit of data so that, instead of words, it outputs trajectories or whatever decisions you want from, let's say, camera data on a car. You just take the thing as a black box.
Yeah, you take whatever has been pre-trained, fine-tune it a little bit, and that's it. I think if you asked any good grad student in computer science to build this today, that's what they would do. Yeah. And out of the box, that's amazing, right? The power of transformers, the power of these large models, is mind-blowing. With just a little bit of effort, you get something on the road and it works. You can drive, I don't know, hundreds of miles, and it will blow your mind.
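A minimal sketch of the off-the-shelf recipe Dmitri describes: treat a pretrained VLM as a black box and fine-tune it so its decoder emits trajectories instead of words. The interface shown (pixel_values/labels returning a .loss) is an assumed HF-style API, and the waypoint tokenization is hypothetical:

```python
import torch

def finetune_step(vlm, optimizer, frames, waypoints, bins=256):
    """One fine-tuning step: camera frames in, trajectory tokens out.

    Assumes `vlm` is a pretrained vision-language model whose forward
    pass returns a language-modeling loss given target token ids
    (an HF-style interface; the exact API is a stand-in).
    """
    # Discretize (x, y) waypoints (assumed normalized to [0, 1)) into
    # token ids, so the existing text decoder can emit a trajectory
    # the same way it emits words.
    target_ids = torch.tensor(
        [[int(c * bins) for x, y in waypoints for c in (x, y)]]
    )
    loss = vlm(pixel_values=frames, labels=target_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```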
But then, is that enough? Is that enough to remove the driver and drive millions of miles with a safety record that is demonstrably better than humans? No, right? And I guess we've seen this, and come to appreciate it, with every technology evolution and every breakthrough in AI.
Up next, we have my dear friend Dylan Field, CEO of Figma. Dylan shares his prediction for how user interfaces will evolve in an AI-driven world. While many predict a shift toward conversational or agent-based interfaces, Dylan suggests that new interface paradigms will complement existing ones. He also highlights the exciting potential of visual AI and intelligent cameras as the next frontier in input methods.
How do you think about the shift in UI in general that's going to come with AI? A lot of things are collapsing, in the short run, into chat interfaces. And there are a lot of people talking about a future agentic world, which does away with most UI altogether, with all this programmatic stuff happening in the background.
How do you think about where UI is going in general right now? I think this comes back to the Rabbit point I was making earlier. Yes, there's a lot of innovation happening in terms of agents, but in terms of the way we use UI to interact with agents, we're just at the beginning.
And I think the interfaces will get more sophisticated. But even if they don't, I suspect it's like any new media type: when it's introduced, the old media types don't go away, right? Just because you have TikTok
doesn't mean you no longer watch YouTube. Even if it's true that the new form of interaction is via chat interfaces, which I'm not even sure I believe, but even if we take that as a prior on the No Priors podcast, then I think you still have UI. And actually, I think you have more UI and more software than before. Do you have any predictions in terms of multimodality?
Do you think there's more need for voice? A lot of the debate people have is: when are you going to use voice versus text versus other types of interfaces? You could imagine arguments in all sorts of directions about when you'd use what. A lot of people are suggesting that, because of the rise of multimodal models, you'll have more voice input, because you'll be able to do real-time,
smart, contextual, semantic understanding of conversation. So you'd have more of a verbal, conversational UI versus a text-based UI, and that kind of changes how you think about design. So I was curious if you have any thoughts on that sort of future-looking stuff.
There are all sorts of contexts where a voice UI is really important. And it might be that we find voice UIs start to map onto more traditional UIs, because that's something you can obviously do in a more generalized way.
But yeah, I mean, personally, I don't want to navigate the information spaces that I interact with every day all day via voice.
I also don't want to do it in Minority Report style on the Vision Pro exactly either. Maybe with a keyboard and mouse and an amazing Vision Pro monitor setup or Oculus, that could be cool. But I don't want to do the Minority Report thing. And so it's interesting because I think that we get these new glimpses at interaction patterns that are really cool.
And the natural inclination is to extrapolate and say they're going to be useful for everything. But I think they each have their role, and it doesn't mean they're going to be ubiquitous across every interaction we have. That's a natural cycle to be in, though.
And I think it's good; it's healthy to have that almost-mania around what something could do, because if you don't have that, you don't get to find out. So I'm supportive of people exploring as much as possible, because that's how you make progress on HCI and figure out how to use computers to the fullest potential possible.
One of the things I am really bullish on, and you might just think of it as an input mode or a peripheral, is that it's really hard for people to describe things visually. And so the idea of intelligent cameras, even in the most basic sense.
Oh, it worked. It worked. I think that's actually a really fun space to be exploring, as you said, because I actually think it will be useful. And it's something every user is capable of, right? Taking pictures, capturing a video. So I'm pretty bullish on that.
To wrap up our favorite moments from 2024, we have Scale CEO Alexandr Wang. In this clip, he shares his bold take on the road to AGI. Alex also dives into why generalization in AI is harder than many think, and why solving niche problems with more data and evals is key to advancing the technology. What's something you believe about AI that other people don't?
My biggest belief here is that the path to AGI looks a lot more like curing cancer than developing a vaccine. And what I mean by that is: I think the path to building AGI is one where
you're going to have to solve a bunch of small problems, and you don't get that much positive leverage from solving one problem to solving the next. It's like curing cancer: you have to zoom into each individual cancer and solve them independently. And eventually, over a multi-decade timeframe, we're going to look back and realize that we've
built AGI, we've cured cancer, but the path to get there will be this quite plodding road of solving individual capabilities and building individual data flywheels that still support the end mission. Whereas I think a lot of people in the industry paint the path to AGI as: eventually we'll just, boop, get there; we'll solve it in one fell swoop.
I think there are a lot of implications for how you think about the technology arc and how society is going to deal with it. I actually think it's a pretty bullish case for society adapting to the technology, because progress will be consistent and slow for quite some time, and society will have time to fully acclimate to the technology as it develops.
When you say solve one problem at a time, if we pull away from the analogy a little bit, should I think of that as: generality in multi-step reasoning is really hard, Monte Carlo tree search is not the answer people think it might be, we're just going to run into scaling walls? What are the dimensions of solving multiple problems?
I think the main thing, fundamentally, is that there's very limited generality that we get from these models. Even with multimodality, for example, my understanding is there's no positive transfer from learning in one modality to another modality. Training on a bunch of video doesn't really help you that much with your text problems, and vice versa.
And so I think what this means is that each niche of capability, each area of capability, is going to require its own separate data flywheel to push through and drive performance. You don't yet believe in video as the basis for a world model that helps?
I think that's a great narrative. I don't think there's strong scientific evidence for it yet; maybe there will be eventually. But I think the base case, let's say, is one where there's not that much generalization coming out of the models, and so we actually just need to slowly solve lots and lots of little problems to ultimately arrive at AGI.
Thank you so much for listening in 2024. We've really enjoyed talking to the people reshaping the world with AI. If you want to dive more deeply into any of the conversations you've heard today, we've linked the full episodes in the description. Please let us know who you want to hear from and what your questions are for next year. Happy holidays.
Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen; that way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.