If you asked people what AGI was, they would say it's a model that you can actually interact with. It passes the Turing test. It can look at things. It can write code. It can even draw an image for you. Yeah. And we've had this for years. And if you said, okay, well, what happens when you get all those capabilities? They'd say, well, everybody's out of a job and it's game over for humanity. And none of that is happening. I think in the big picture, we're reaching that bottleneck for pre-training and data. But now we have this new mechanism with reasoning and test-time compute.
What we're going to see out of reasoning is that it's really going to unlock the possibility of agents doing actions on your behalf, which has sort of always been possible, but it's just never been quite good enough. And you really need a lot of reliability. I think that is now in sight.
Hey guys, we have a real treat today. Bob McGrew, formerly chief research officer at OpenAI. You were a part of building a lot of the research team. You know, what was that like early at OpenAI?
The really interesting thing about OpenAI is that I did not originally intend to go to a research lab. When I left Palantir, I wanted to start a company. I had a thesis that robotics would be the first real business that was built out of deep learning. This was back in 2015. And I talked my way into a friend's nonprofit. I never had a badge, but I would go in, he'd open the door for me. And I learned deep learning by teaching a robot how to play checkers from vision.
And in the process of doing this, I learned a lot about robotics, and I learned that robotics was definitely not the right startup to start in 2015 or 2016. I ended up going to OpenAI basically because it was a place full of very smart people, and it had big ambitions. It was a place where I could really learn.
I had all this management experience from Palantir, but it was just a place for me to really become an expert in deep learning. And, you know, from there, figure out what it could actually be used and applied for. What were some of the earliest things that you remember working on and how did that play into what everyone knows OpenAI to be now?
Yeah, when OpenAI started, the goal was always to build AGI. But the theory early on was that we would build AGI by doing a lot of research and writing a lot of papers. And we knew that this was a bad theory. I think, for a lot of the early people who were startup people, you know, Sam, Greg, myself, it felt sort of painful and a little academic, but at the same time, it was what we could do at the time. And so among the early projects, I worked on a robotics project where we took a robot hand, a humanoid robot hand, and we taught it to solve a Rubik's Cube. The idea in doing that was that if we could make the environments complicated enough, the artificial intelligence would be able to generalize out of the narrow domain it was taught
and learn something more complicated, which is one of the ideas we see coming back later with LLMs. The other really big early project was solving Dota 2. There's a long history of solving games as a path towards building better AI, from Othello to Go. And after beating Go, the next hardest set of games are actually video games. They're not very classy, but they're a lot of fun.
And I can assure you that mathematically they were harder. And so DeepMind went after StarCraft, and OpenAI went after Dota 2. And there was real insight generated there, which really strengthened our belief that scale was the path to improving artificial intelligence. With Dota 2, the secret idea was that we could take huge amounts of experience and feed it into a neural network, and that the neural network would actually learn and generalize from that. And later, we went back and applied this to the robot hand, and that became the key idea for the robot hand.
And at the same time as these two big projects were going on, Alec Radford was experimenting with language. And the core idea behind GPT-1 is that if you have a transformer and you apply this super simple objective of guessing the next token, guessing the next word, that would be enough signal to actually get something that could generate coherent text. And in retrospect, it sounds sort of obvious, right?
You know, clearly this was going to work. But no one thought this would work at the time. Alec, you know, really had to persevere for years in order to make this work. And that became GPT-1. And then after GPT-1 seemed successful, we brought in the ideas from Dota and from the robot hand of training at larger and larger amounts of scale, training on a really diverse set of data, and looking for generalization. And together that brings you to GPT-2 and GPT-3 and GPT-4.
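To make that next-token objective concrete, here is a minimal sketch of the training loss in PyTorch. The `model` and the batch of token ids are placeholders assumed for illustration, not the actual GPT-1 setup.

```python
import torch
import torch.nn.functional as F

# Toy batch of token ids with shape (batch, sequence_length). In practice
# these come from a tokenizer run over a large text corpus.
tokens = torch.randint(0, 50257, (8, 128))

def next_token_loss(model, tokens):
    # Assumes `model` is any autoregressive transformer that maps token ids
    # to per-position logits over the vocabulary (a hypothetical stand-in).
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one position
    logits = model(inputs)                            # (batch, seq - 1, vocab)
    # Cross-entropy between the predicted distribution and the actual next token.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```

The entire pre-training signal is just that one loss, minimized over a very large corpus.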
So one of the things that you and OpenAI really pioneered and sort of figured out was this concept of scale. How is it that it was OpenAI that made the right decisions and sort of found large language models first?
Early on, there were sort of, you know, a couple of big projects, as I said, and then some room for exploratory research. And in the very earliest days, the exploratory research was really about what the researcher wanted to do, but it was also about sort of the company's opinion. And that opinion was primarily formed by Ilya, with influence from a lot of people, but I think Ilya was really the guiding light here early on.
Sometimes I think about the OpenAI culture, and I like to contrast it with Google Brain and DeepMind. Early on, the DeepMind culture, you know, a caricature of it, was that Demis had a big plan, and he wanted to hire a bunch of researchers so he could tell them to move forward with his plan.
And Google Brain said, let's rebuild academia. Let's like bring in all these super talented researchers. Let's not tell them anything. Let's just let them figure out what they want to do, give them lots of resources, and hope that amazing products pop out. And of course, they did, but they didn't necessarily happen at Google.
And we took a different approach, which was really more like a startup, where there was no sort of big centralized plan. But at the same time, it wasn't just sort of let a thousand flowers bloom. Instead, we had opinions about what needed to be done, and about things like how you show scale as a way of making your idea better.
And that opinion was set by the research leadership, you know, again, early on, people like Ilya, people like Dario. That was how we made sure that we didn't just sort of throw resources at everybody. But neither did we have just one set of ideas that were there. We found this sort of happy medium between the two.
I guess one of the critiques of maybe pure academia or some of the AI research labs, we don't have to name any of them, but we've heard stories about looking at the number of researchers on any given paper. There might be way more people on it than you'd expect. And if you really dig into some of the papers, they look like maybe a little bit of this plus a little bit of that.
And that sort of reflected the nature of what it took to get compute. And this is at other AI labs. I mean, what was it about OpenAI? Were you able to sort of avoid that? Well, I think that paper example is a really interesting one, because I think it's sort of both good and bad. I am hugely positive on academics and researchers, but actually pretty negative on academia. I think academia is good for this very narrow thing of small groups
trying out crazy ideas. But academia has a lot of incentives that prevent people from collaborating. And in particular, in academia, there's this obsession with credit. One of the things that's interesting about the way that papers have turned out in big labs is that early on, we made the decision that we would try to be as catholic as possible in putting everybody's name on them. And on one of the early robotics papers, we actually said, cite as OpenAI,
because we didn't want to get into a fight. The first author is the one who gets cited, and their name shows up every single time. So we said, we're not going to try to have this fight. We're not going to say who is the person who really did it. We're just going to say, cite as OpenAI. And I think that is actually a really important cultural piece, the ability to accept that people want
credit, but to be able to channel it so that it's your internal reputation, not the position you have on the paper, that really matters. And for a long time, OpenAI didn't really have any titles except for, you know, obviously a CEO title, right? It didn't really have a lot of titles within the organization itself. But people always knew who the great researchers were.
Once you have the scaling laws, and certainly with how AI research is being done now, there's sort of a shift where basically scale is all you need for more and more AI domains. It's sort of potentially coming true in image diffusion models, or, to bring it back to what you were starting out with earlier,
there's some sense that similar principles to scaling laws actually do apply in the right domains in robotics. Is that sort of one of the things that you're seeing or how would you respond to that? I think if you look at AI progress, you see scaling laws all over the place. And so the interesting question is, well, if scaling laws exist and they're commonplace,
What does that mean? What does that mean for you if you're a company, if you're a researcher, if you're trying to make things better? Why didn't we take advantage of scaling laws earlier in these other domains? Well, I think we were really trying to. The first step is actually getting to a scaling law. To take an example that's not LLMs, if you think about DALL-E, which was, how do you take text and make an image out of it,
I think Aditya Ramesh, who built that model, spent 18 months, maybe two years, just getting to the first version that clearly worked. I remember he'd be working on this, and Ilya would come and show me. He'd be like, you know, Aditya's been working on this for a year. He's trying to make a pink panda that's skating on ice, because it's something that's clearly not in the training set. And here's an image, and you can see it's like pink up there and white down there. It's really beginning to work. And I would look at that and I'd be like,
Really? I mean, maybe, maybe, I don't know. But just getting to that point where it sort of plausibly begins to work is a hugely difficult problem. And it's completely separate from using scaling laws. Now, once you get it to work, that's when scaling laws come into play. And with scaling laws, you have two hard things that you can do.
One of them is just the pure scale itself. Scaling is not easy. It is, in fact, probably the practical problem in any sort of model building. It's a systems problem. It's a data problem.
It's an algorithmic problem, even if you're just trying to scale the same architecture. The second thing you can do is you can try to change the slope of the scaling law or just bump it up a little bit. And that is searching for better architectures, searching for better optimization algorithms, all of the algorithmic improvements that you can do. And if you put all of those together, that is what explains the very fast progress that we're seeing in AI today.
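As a rough illustration of what that means in practice, scaling laws are usually written as power laws, so changing the slope means changing the exponent, and bumping the curve up means improving the constants. The numbers below are made up for illustration, not fitted to any real model.

```python
def scaling_law(compute, a=10.0, b=0.05, irreducible=1.7):
    """Illustrative power-law fit: loss falls as compute grows.
    `a` sets the offset, `b` the slope on a log-log plot, and
    `irreducible` the loss floor. All values here are invented."""
    return a * compute ** (-b) + irreducible

# Better architectures or optimizers would show up as a larger `b`
# (steeper slope) or a smaller `a` (the whole curve shifts down).
for c in [1e20, 1e21, 1e22]:
    print(f"compute {c:.0e} -> predicted loss {scaling_law(c):.3f}")
```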
I guess that is one of the bigger debates that's ongoing certainly out there in the community. Are the scaling laws going to continue to hold, or are we hitting some sort of bottlenecks? I don't know how much you can talk about it, but what's your view at this point on, you know, maybe LLM scaling, but certainly other domains too?
It is definitely the case that there is a data wall, and if you take the same techniques that we were using to scale LLMs, at some point you're going to run into it. The thing that's been really exciting, of course, is going from the LLM scaling of pre-training, where you're just bringing in bigger and bigger corpora and trying to predict the next token, and shifting gears and using techniques like reasoning, which OpenAI shipped in its o1 and o3 models, and Gemini has now also shipped in Gemini Flash Thinking.
You know, if you think about Moore's law, Moore's law is sort of one big exponential curve, but it's actually the sum of a bunch of little S curves. You start off with Dennard scaling, and at some point that breaks. But if you think about how NVIDIA has gone, Moore's law has continued. It's just come through a different mechanism.
So you solve some bottleneck, but then you exhaust that particular solution, and there are bottlenecks in other places. And then you have a new bottleneck, and you have to go attack that. And so, you know, I think in the big picture, we're reaching that bottleneck for pre-training and data. Are we exactly there? It's a little hard to say, but now we have this new mechanism with reasoning and test-time compute. I think if you go back and you think about what it took for
building AGI, for, you know, I would say the last five years, people at the big frontier labs have felt that, you know, step one was pre-training, and that the remaining gap to something that could scale all the way to AGI was reasoning: some ability to take the same pre-trained model, give it more time to think or more compute of various kinds, and get a better answer at the other end.
And now that that has been cracked, at this point, I think we actually have a very clear path to just focus on scaling. We were talking about the zero-to-one part, the part that's not about scaling. I think there's a really strong case to be made that in LLMs, that's not relevant anymore, and that now we're in the pure scaling regime.
I'm pretty impressed by the five levels of AGI, and it feels like things are basically playing out the way that original post on the OpenAI website sort of described it. It's, you know, reasoners are here, and then I'm hearing a ton about innovators. So taking a thing like o3, or, you know, maybe when o3 Pro comes out, that'll be a real moment where you can
hook that up to a bio lab and have autonomous exploration of scientific spaces. What can you say about that stuff? The really interesting thing about that is we're probably going to be blocked for now on the ability of the models to work in the physical world. It's going to be a little strange. We're probably going to have a model that can explore scientific hypotheses and figure out how to run experiments with them before we have something that can actually run the experiments itself.
And so maybe that's one of those new S curves. We're back to robotics then. Yeah, exactly. And we're back to robotics. The other thing that I think is really interesting that the reasoning models enable is agents. And it's a very generic term. It's probably a little overplayed. But fundamentally, what reasoning is, is it's the ability for a model to have a coherent chain of thought that is steadily making progress on a problem over a long period of time.
And the techniques that give that to you in terms of thinking harder also apply to taking action, you know, in the real world and in the virtual world. I think what we're going to see out of reasoning, out of long thinking, is that it's really going to unlock the possibility of agents doing actions on your behalf, which has sort of always been possible, but it's just never been quite good enough. And you really need a lot of reliability. In order for you to be willing to wait five minutes or five hours for something to happen, it's got to actually work at the end. And I think that is now in sight. The thing that prevents people from trusting an agent to take the action is mainly the frequency of how often that action is the correct action versus the wrong action. Yeah.
There's a rule of thumb that I like, which is basically, if you wanna add a nine, if you wanna go from 90 to 99% or 99 to 99.9%, that's maybe an order of magnitude increase in compute. And historically, we've only been able to make order of magnitude increases in compute by training bigger models.
And now, with reasoning, we're able to do that by letting the models think for longer. And look, letting the models think for longer, this is a really hard problem. With o1, with o3, you're getting longer and longer chains. It requires more scaling. We just talked about scaling as the central problem. So this is not easy. It's not done by any means. But there's a very clear path now that allows you to get to those higher and higher levels of reliability. And I think that unlocks so many things downstream.
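As a quick way to see the arithmetic behind that rule of thumb, here is the ten-times-per-nine figure taken at face value; the baseline compute unit is arbitrary and the numbers are purely illustrative.

```python
# Rule of thumb from the conversation: each extra "nine" of reliability
# costs roughly an order of magnitude more compute. Baseline is arbitrary.
base_compute = 1.0  # compute units needed to reach 90% reliability
for extra_nines, reliability in enumerate(["90%", "99%", "99.9%", "99.99%"]):
    print(f"{reliability} reliable -> ~{base_compute * 10 ** extra_nines:g}x compute")
```

So an agent you trust to run unattended for hours sits several orders of magnitude of compute away from one that merely works most of the time, which is why test-time compute matters so much here.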
What do you think's happening with distillation? I was looking at some of the capability graphs of some of the mini models, and it sounds like the mini models are increasingly getting better and better. Is that sort of a function of parent models teaching child models, or, you know, what's happening there, and what can people expect?
Yeah, I think over the last year, the big frontier labs and a lot of other people have figured out the tricks to take big models, take a very particular distribution of user input, and train a model that is almost as good as the big model, but much, much smaller and much, much faster. And so I think we're going to see this a lot going forward, especially if you look at, you know, Sonnet versus Haiku, Gemini versus Gemini Flash, o1 versus o1-mini. Every lab has really focused on this. And in fact, you see distillation as a service coming.
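For reference, here is a minimal sketch of the generic knowledge-distillation loss, soft targets from a teacher matched by a smaller student; this is the textbook recipe, not any particular lab's method.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Match the student's output distribution to the teacher's softened one.
    Generic knowledge distillation; real production recipes differ."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence, scaled by t^2 as in the standard distillation formulation.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)
```

In practice a term like this is usually mixed with the ordinary next-token loss on the same user-like data distribution the speaker mentions.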
What would you say to people watching who are trying to build AI startups right now, often vertical startups? But some of them are consumer too, actually. Yeah, I would say if you're a founder, the right approach is to start with the very best model you can. Because your startup is only going to be successful if it exploits something about AI that realistically is going to be on the frontier. So start with the very best model that you can
and get it to work. And once you've gotten it to work, then you can use distillation. You can take a dumber model and try prompting it. You can try to have the frontier model train the smaller model. But the most important thing in a startup is actually your time. Unless you have to, you don't want to be like Palantir, taking three years to get to market. You want to be able to build that product as quickly as possible. And only once you've actually figured out where the value is, probably by iterating with your users, then you can think about cost.
Working backwards, it sort of feels like the movie Her is more or less inevitable. I am a little skeptical of the deep emotional connection, that guys are going to have AI girlfriends. I think that's not what guys are looking for in a girlfriend, frankly. I think an AI that shops for you, well, there it's really helpful to know a lot about your preferences.
An AI that is your assistant at work, again, very helpful to know about your preferences. One other thing I think would be cool would be an AI that is Gary's AI bot. If I want to know what Gary's thinking, I could just ask your AI bot. If I get a good enough answer, then I can go about my job, and if not, then I have to actually bother you in person. I think that would be just a tremendous feat of personalization if you could make something like that happen.
And anything that works with you at work needs a huge amount of context about you. It should be able to see your Slack and your Gmail and all the different productivity tools that you have. And I think it's actually surprising. I think this is actually a real hole in the market, because that's not something I can go out and purchase today.
I mean, in my mind's eye, what I can imagine is kind of like a super intelligent genie. It knows, you know, who you are, what you're about. And it might actually know, you know, your job, your goals in life. And it'll actually tell you, oh, hey, you should probably do this. And it might go out and get an appointment for you. And like, oh, yeah, it's time to take the LSAT, buddy. You said you wanted to, you know, go be a lawyer, like, well, this is the first step. You know, do you want to do it? Yes or no? Right.
And there's something really interesting about this idea, because I think it's very compelling that the AI is your life coach. But then it goes back to, so what are you even doing with your life in the first place, right, if the AI is better than you? And I think there's actually a really deep mystery here. When we were first thinking about GPT-1 back in 2018,
if you asked people what AGI was, they would say, well, it's a model that you can actually interact with. It passes the Turing test. It can look at things. It can write code. It can even draw an image for you. We're there. Yeah. We've had this for years. If you said, okay, well, what happens when you get all those capabilities? They'd say, well, everybody's out of a job. All laptop jobs are immediately automated, and it's game over for humanity. None of that is happening.
I mean, yes, AI has had some effects, particularly on people who write code, but I don't think you can see it in the productivity statistics unless it's about how big the data centers are that we're building. And I think this is a really deep mystery. Why is it that AI adoption is so slow relative to what we thought should be happening in 2018?
What you just said really reminds me of our days at Palantir, actually, where one of the core missions that Palantir started with, really, is this idea that the technology is already here. It's just not evenly distributed. And I feel like that was one of the things you guys actually really discovered and
You know, part of the reason why Palantir actually exists. You went into places in government, three-letter agencies, some of the most impactful decisions that a society might have to make, and you looked around and there was no software in there. And that was what Palantir, and certainly Palantir's government business, was very early on.
The fun piece there was just thinking through what it is that these people do, and then how you could just completely reimagine it with technology, where if you were checking to see if a particular person who was flying into the US had a record or if there was any suspicion, you look through 20 different databases.
One approach would be to say, well, let's make it faster to look through 20 different databases. Another approach is to say, well, maybe you can just look once and it checks all the databases for you. And I think we need some twist like that for AI, something that lets people figure out how to use the AI to solve the problem they actually have,
not just sort of take their existing workflow and have AI do that work. Yeah. It's not just having the data. It's not just having the intelligence. I mean, what AI desperately needs right now is, like you said, the UI, the software. It's just building software. And if you can
put that in a package that a particular person really, really needs. I feel like that's one of the big things that we learned at Palantir. It's like, there's a whole job that is exactly that. Forward deployed engineer. It's a very evocative term, right? Like forward deployed. You're not way back at the HQ. You're all the way in the customer's office. You're sitting right next to them at their computer, watching how they do something.
and then you're making the perfect software that they would never otherwise get access to. Like, the alternative is an Excel spreadsheet, writing SQL statements yourself, or a cost-plus government integrator like Accenture, and they're never gonna get something usable. Whereas a really good engineer who's a good designer, who can understand exactly what that person needs and is trying to do, can build the perfect thing for that very person.
And so maybe that's the answer to your question. Like, why didn't it happen yet? It's like we just need more software engineers who are like that forward deployed engineer to link up the intelligence and we're there.
I think it's really funny, because if you think back to 2015 when I left Palantir, people were skeptical of Palantir because of the existence of the forward deployed engineers. If you had a really good product, you wouldn't need the forward deployed engineers. You wouldn't need to specialize it to every customer. And then wait five years, and Palantir has a great IPO; wait ten years, and it's a very valuable company. Suddenly everybody is talking about building their forward deployed engineering function.
And I think it's a good thing. I think hopefully this gives us a lot of software that is actually very tuned to what the customers need, not just something off the rack where you then say, well, there's a way to accomplish what you want to do, go figure it out. Bob, both of us are parents, and we just spent a lot of time talking about some pretty wild concepts that are about to affect all of society. Has that affected how you think about what we should be doing with our kids?
I really struggle with this. And there's a very crisp version of this for me, which is that my eight-year-old son is really excited about coding. He actually is really excited, but he wants to start a company. He has a great name and it's going to do asteroid mining and all sorts of cool stuff. And so every day, he says, dad, can you teach me a little bit about how to code? This is actually what I do most with language models.
is that I figure out what he's interested in, and I have the language model make a lesson for him that teaches some idea that I want to teach him. It teaches him about networking, or it teaches him about loops. And it fits his idea.
And my wife asked, why are you doing this if the language models are gonna be able to code? And I think the answer is that, right now, this is still how you learn how to do critical thinking. And I think back to Paul Graham's idea of the resistance of the medium. Even once the computer can do the programming for you, I think there's still something to having had your hands in it yourself and knowing what's possible and what's not possible,
and that you can have that intuition. I think there are going to be two roles that we're going to be playing. One will be something like a lone genius, you know, the Alec Radfords of the world working alone at their computer, coming up with some crazy idea, but now with that computer being able to leverage them up so much. And the other role is manager: you will be the CEO of your own firm, and that firm will mostly be AI. I think there will be other humans in there. I don't think the whole company gets replaced, although this is another really interesting question for us to answer.
But I think those will be the two jobs of the future, genius and manager. I think that that is actually pretty awesome. Those are two things that would be really fun jobs, honestly. When the photographic camera and film came out, what happened to artists? They're still around, and people still learn to paint.
And there are probably more people who learn to paint because more people have an appreciation for art and painting and the visual arts. So my hope is that that's what happens. And I think if you go back to the last time we automated away most human jobs, in the 1880s most people were farmers.
And now maybe 3% of Americans are farmers. And we all do jobs that, if we tried to explain them to people from 1880, being a software engineer or running a startup incubator, they'd be like, what the hell is this? These aren't real jobs. At the end of it, I'm very much an optimist about humanity. I think that humans will have important and valuable roles to play.
But just like that first 90% of jobs that got automated away, those farmers didn't know what the jobs of their grandchildren would look like. I think we have that same period now where we don't know what the jobs of our grandchildren will look like. And we're just going to have to play it by ear and figure it out.
I guess going back to robotics, you know, one of my hopes is actually that maybe the level four innovators will suddenly break through on a bunch of very specific problems that currently hold back robotics. Have you spent time, you know, back in that space recently and what are the odds of that?
coming together in the next, I don't know, couple of years even? Like, do you feel like there will be continued breakthroughs on maybe the Figure robot and different things like that? What's your sense for robotics in the next year or two? Robotics companies now are where LLM companies were five years ago. So I think in five years, or even sometime in the next five years, we will see the ChatGPT moment for robotics. I think it's a little harder to scale because you've got to build physical robots. But if you look at companies like Skild AI or Physical Intelligence, who are building foundation models for robots, the progress that we've seen there is just really dramatic. At some point we're going to get out of that zero-to-one phase where you're just trying to make it work at all,
and we're going to be in something where it kind of works, and then we're just scaling to increase the reliability and increase the scope of the market. I remember working with Sam Altman at YC, and he was bringing in some pretty wild hard-tech companies, like Helion focused on fusion or Oklo in the energy space. And at the time, I don't know if I totally understood
why. But now the AGI part is becoming much more real, and plus that, it feels like if you add robotics, that's one of the more profound sort of triumvirates of technology that might come together, and it will create quite a lot more abundance for everyone.
Yeah, I mean, you know, whatever part of the stack isn't automated becomes the bottleneck. And so, weirdly, I think we're going to end up automating the scientist, the innovator, before we automate the experiment doer. But then, you know, if that comes through,
I think the potential for really fast scientific advance is totally there. I think we will find some other bottleneck. I think we're going to look back at this conversation where I say we did all the things and science is only going like.
30% faster than it was. Why isn't it 300 times faster? And we'll have to figure that out. I mean, it'd be a great problem to have. That's right, yeah. 30% faster is great, but 300%, that would be insane. Hey, room for thousands more startups. That sounds great. Bob, thank you so much for joining us. I feel like I learn a lot every time I get to see you. So great to see you again. Thanks for coming on the channel. It's always fun to have these conversations with you, Gary.