Hi listeners and welcome to No Priors. Today we're hanging out with Aidan Gomez, co-founder and CEO of Cohere, a company valued at more than $5 billion in 2024, which provides AI-powered language models and solutions for businesses.
Aidan founded Cohere in 2019, but before that, during his time as an intern at Google Brain, he was a co-author on the landmark 2017 paper, Attention Is All You Need. Aidan, thanks for coming on today. Yeah, thank you for having me. Excited to be here.
Maybe we can start just a little bit with the personal background. How do you go from growing up in the woods in Canada to, you know, working on the most important technical paper in the world? A lot of luck and chance. But yeah, I happened to go to school at the place where Geoff Hinton taught. And so obviously Geoff recently won the Nobel Prize. He's kind of
credited with being the godfather of deep learning. At U of T, the school where I went, he was a legend, and pretty much everyone who was in computer science, studying at the school, wanted to get into AI. And so in some sense, I feel like I was raised into AI. Like as soon as I stepped out of high school, I was steeped in an environment that really
saw the future and wanted to build it. And then from there, it was a bunch of happy accidents. So I somehow managed to get an internship with Łukasz Kaiser at Google Brain. And I found out at the end of that internship,
I wasn't supposed to have gotten that internship. It was supposed to have been for PhD students. And so they were throwing a goodbye party for me, the intern. And Łukasz was like, okay, so Aidan, you're going back. How many years have you got left in your PhD? And I was like, oh, I'm going back into third year undergrad. And he was like, we don't do undergrad internships. So I think it was a bunch of
really lucky mistakes that led me to that team. Working on really interesting, important things at Google, what convinced you that you should start Cohere? Yeah, so I bounced around. Like when I was working with Łukasz, Noam, and the Transformer guys, I was in Mountain View. And then I went back to U of T, started working with Hinton and my co-founder Nick at Google Brain in Toronto.
And then I started my PhD and I went to England. And I was working with Jakob, who's another Transformer paper author, in Berlin. And we had Jakob on the podcast. Oh, nice. Yeah, yeah, okay. Fan of the pod. Good, good. So yeah, I was working with Jakob in Berlin. And then I was also collaborating remotely with Jeff Dean and Sanjay on Pathways, which was like their
you know, bigger-than-a-supercomputer training program. The idea was wiring together supercomputers to create a new, larger unit of compute that you could train models on. And at that stage, GPT-2 had just come out.
And it was pretty clear the trajectory of the technology, like we were on a very interesting path. And these models that were ostensibly models of the internet, models of the web, were going to yield some pretty interesting things. So I called up Nick, I called up Ivan, my co-founders. And I said, maybe we should figure out how to build these things. I think they're going to be useful.
For anyone who doesn't know yet, can you just describe at the high level what Cohere's mission is, and then what the models and products are? Yes, our mission, the way that we want to create value in the world, is by enabling other organizations to adopt this technology and make their workforce more productive or transform their product and the services that they offer. So we're very focused on the enterprise. We're not going to build a ChatGPT competitor. What we want to build is a platform and a series of products to enable enterprises to adopt this technology and make it valuable.
And in terms of your north star of how you organize the team and invest, you obviously come from a research background yourself. How much do you think Cohere's success is dependent on core models versus other platform and go-to-market support investments you make? It's all of the above. The models are the foundation. And if you're building on a foundation that doesn't meet the customer's needs, then nothing else will help. And so the models are crucial, and they're like the heart of the company. But in the enterprise world, things like customer support, reliability, security, these are all key. And so we've heavily invested on both sides. We're not just a modeling organization. We're a modeling and go-to-market organization. And increasingly, product is becoming a priority for Cohere, and so we're figuring out ways to shorten time to value for our customers. Over the past 18 months or so,
since the enterprise world sort of woke up to the technology, we've watched folks build with our models, seen what they're trying to accomplish, seen the common mistakes that they make. That's been helpful. It's been sometimes frustrating, right? Watching the same mistake again and again. But we think there's a huge opportunity to be able to help enterprises avoid those mistakes and implement things right the first time. And so that's really where we're pushing towards.
Yeah, can we make that a little bit more real? Like what is the mistake that frustrates you most, and how can product go meet that? Yeah, well, I think all language models are quite sensitive to prompts, to the way that you present data. They all have their own individual quirks; the way that you talk to one might not work for another. And so when you're building a system like a RAG system, where there is an external database, it really matters how you present the retrieved results to the model. It matters how the data is actually stored in those databases. The formatting counts. And these small details are often lost on people. They overestimate the models. They think they're like humans. And that has led to a lot of
repeat failures. People try to implement a RAG system. They don't know about these idiosyncratic elements of implementing one properly, and then it fails. And so in 2023, there were a lot of these POCs. A lot of people trying to get familiar with the technology, wrap their heads around it. And a lot of those POCs failed because of unfamiliarity, because of these common errors that we've seen. And so moving forward, we have two approaches. One is making the models more robust. So the model should be robust to a lot of different ways that you present data. And the second piece is being more structured about the product that we expose to the user. So instead of just handing over a model and saying, prompt it, good luck, we're actually putting more structure around it, creating APIs that more rigorously define how you're supposed to use the model. These sorts of pieces, I think, just reduce the chances of failure and make these systems much more usable for the user.
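To make that concrete, here is a minimal sketch of what "more structure" can look like: retrieved chunks passed to the model as structured documents rather than pasted into a free-form prompt, so formatting quirks can't silently break the system. The shape follows Cohere's Python SDK chat API, but treat the exact field names, the API key, and the example documents as illustrative placeholders rather than anything from the conversation.

```python
# Structured RAG call: retrieved chunks go in as documents, not prompt text.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Chunks your retriever pulled from the manuals (assumed already retrieved).
docs = [
    {"title": "Assembly manual, sec. 4", "snippet": "Part 7A torque spec: 12 Nm."},
    {"title": "Diagnostics guide", "snippet": "Error E12: re-seat part 7A."},
]

response = co.chat(
    message="What torque should I use on part 7A?",
    documents=docs,  # the API, not the user, owns how these are formatted
)
print(response.text)
```

The design point is the one he describes: the rigor lives in the API contract, so the user doesn't have to know each model's formatting idiosyncrasies.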
What are people trying to do? Can you give us a flavor of some of like the biggest use cases you see in the enterprise? It's super broad. So it spans pretty much every vertical. I mean, the common things are like Q&A. So speaking to a corpus of documents, for instance, if you're a manufacturing company, you might want to build a Q&A bot for your engineers or your workers who are on the assembly line and plug in
all of the manuals of the different tools and diagnostic manuals for common errors in parts, and then let the user chat to that instead of having to open up a thousand-page book and try to find what they need. Similarly, Q&A bots for the average enterprise worker. So, plugging in your IT FAQs, your HR docs, all the things about your company, and having a centralized chat interface onto the knowledge of your organization so that they can get their questions answered. Those are some of the common ones. Beyond that, there are kind of specific functions that we power. A good example might be for a healthcare company: they have these longitudinal health records
of patients, and that consists of every interaction that patient has with the healthcare system, from visits to a pharmacy, to the different labs or tests that they're getting, to doctor's visits, and it can span decades. And so it's a huge, huge record of someone's medical history. And typically what happens is that patient will call in and they'll ring up the receptionist and be like, my knee hurts, I need an appointment. And the doctor then needs to kind of comb through the past few entries. Has this come up before? And maybe they missed something that was two years ago, because they only have 15 minutes before an appointment. But what we can do is we can feed that entire history in alongside the reason they're coming in, so contextually relevant, right, to what they said they're coming in for, and surface a briefing for the doctor. And so this tends to be, one, dramatically faster for the doctor to review, but also it often catches things
that a doctor couldn't possibly review before every patient meeting. They're not going through 20 years of medical history. It's just not possible. But the model can do that. We can do that in under a second. So those are the sorts of functions that we're seeing, summarization, Q&A bots. A lot of these, you might think of them as mundane, but the impact is immense.
We see tons of startups working on problems such as, let's say, enterprise search overall, or specialized applications, let's say technical support for a particular vertical, even looking at health records and reasoning against them and retrieving from them. How do you think about what the end state, there's no end state, but what some stable equilibrium state is for how enterprises consume from,
let's say specialist AI powered application providers versus custom applications built in house with AI platforms and model APIs.
I think it's going to be a hybrid. You can imagine a pyramid, where at the bottom of that pyramid is the stuff every organization needs, like a copilot, a generalist chatbot in the hands of every single employee to answer their questions. And then as you head up the pyramid, it's more specific to the company itself or the specific domain or product that they operate in or offer. And as you push up that pyramid,
It's much less likely you're going to find an off-the-shelf solution to address it. And so you're going to have to build it yourself. What we've pushed organizations to do is have a strategy that encompasses that full pyramid. Yes, you need the generalist standard stuff. Maybe there are some industry-specific tools that you can go and buy, but then if you're building, don't build those things that you could buy. Instead, focus on the stuff that no one's going to sell to you, that uniquely gives you a competitive advantage. We worked with this insurance company, and they insure large industrial development projects.
It turns out, and I don't know how much you think about this space, but what they do is respond to RFPs. There's an RFP put out by a mine or something, whatever the project is, for insurance. And they have actuaries jump on that RFP, do tons of research about the land that it's on, the potential risks, et cetera. And then it's essentially a race: whoever responds first usually gets it. And so it's a time-based thing. How quickly can these actuaries put forward a good research proposal?
And what we built with them was like a research assistant. So we plugged in all the sources of knowledge that these actuaries go to to do their research via RAG, and we gave them a chatbot. And it dramatically sped up their ability to respond to RFPs. And so it grew their business because they were just winning many more of them.
And so it's tough because, you know, we build horizontal technology. An LLM is kind of like a CPU. I don't know all the applications of an LLM, right? It's so broad. And really the deep insight or the competitive advantage, the thing that puts you ahead,
is listening to the customer and letting them tell you what would put them ahead. And so that's a lot of what we've been doing is just being a thought partner and helping brainstorm these projects and ideas that are strategic to them.
I'd wager that this company is winning because the vast majority of their competitors haven't been able to move as quickly to adopt and build this kind of research assistant product. What is the biggest barrier you see to enterprise adoption generally? I think the big one is trust. So security is a big one. In particular, in regulated industries like finance,
healthcare. Data is often not in a cloud, or if it is in a cloud, it can't leave their VPC.
And so it's very locked down, it's very sensitive. And so that's a unique differentiator of Cohere: the fact that we haven't locked ourselves into one ecosystem, and we're flexible to deploy on-prem if you want us, in a VPC, outside of a VPC, literally whatever the customer wants. We're able to touch more data, even the most sensitive data, and provide something that's more useful. So I would say security and privacy
is probably the biggest one. Beyond that, there's knowledge, right? Like the knowledge to know how to build these systems. They're new, it's unfamiliar to folks. The people with the most experience have a few years of experience. And so that's the other major piece. That bit, I think it's honestly just a time game. Like eventually, developers will become more familiar with building with this technology.
But I think it's going to take another two or three years before it really permeates. Do you think, in a traditional hype cycle for enterprise technologies, probably for most technologies, but in particular enterprise, there's this trough of disillusionment concept, where people get very excited about something and it ends up being harder to apply or more expensive than they thought. Do we see that in AI? I'm sure we see some of it, for sure.
But I think honestly, like the core technology is still improving at a steady clip and new applications are getting unlocked every few months. So I don't think we're in that trough of disillusionment yet. Yeah, it feels like we're super early. It feels like we're really, really early. And if you look at the market, this technology just unlocks an entire new set of things that you couldn't build.
You just fundamentally couldn't build them before, and now you can. And so there's a resurfacing of technology, product, systems that's underway. Even if we didn't train a single new language model, like, okay, all the data centers blow up, we can't improve the LLMs, we only have what we have today, there's a half decade of work to go integrate this into the economy, to build all these things, to build the insurance RFP response bots, to build the healthcare record summarizer. Like there's a half decade of just resurfacing to go do. So there's a lot of work ahead of us. I think we're kind of past that point. There was a question of, oh, is there too much hype? Is this technology actually going to be useful? But it's in the hands of 100 million people, hundreds of millions of people now. It's in production. There's very clear value. The project now is
putting it to work and delivering it to the world. In this question of integration into the real world, some piece of it is, of course, interfaces and change management and figuring out how users are going to understand the model outputs and guardrails and all of that. Specifically, when we think about the model and specialization, do you have some framework you offer customers or use internally around what version of it they should invest in? So we have pre-training, post-training, fine-tuning, retrieval in the traditional sense, prompting, especially as we get longer context. How do you tell customers to make sense of how to specialize?
It really depends on the application. For instance, we partnered with Fujitsu, who's the largest systems integrator in Japan, to build a Japanese language model. There's just no way you can do that without intervening on pre-training. You can't fine-tune or post-train Japanese into these models effectively. And so you have to start from scratch.
On the other side, there are more narrow things: if you want to change the tone of the model, or you want to, I don't know, change how it formats certain things, I think you can just do fine-tuning. You can take the end-state model. And so there is this gradient. What we usually recommend to customers is start from the cheapest, easiest thing, which is fine-tuning, and then work backwards. And so start with fine-tuning, then go back into post-training, like SFT and RLHF, and then, if you need to, pre-training.
It's kind of a journey, right? Like as you're talking about a production system and the constraints are getting higher and higher, you potentially will need to touch pre-training. Hopefully not all of pre-training. Hopefully it's like 10% of pre-training at the very end or maybe 20% of pre-training. But yeah, that's usually how we think about it is like this journey from the simplest, cheapest thing to the most sophisticated, but most performant.
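For intuition on why those steps sit on one gradient, here is a rough sketch: fine-tuning, SFT, and continued pre-training all minimize the same next-token loss, and mostly differ in what data you feed and at what scale. This assumes a generic Hugging Face-style causal LM; the "base-model" checkpoint name and the domain corpus below are placeholders, not anything Cohere-specific.

```python
# All three steps share this machinery; data and scale change, not the loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("base-model")        # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("base-model")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

domain_texts = ["...raw in-domain text...", "...more text..."]  # placeholder corpus

for text in domain_texts:
    inputs = tok(text, return_tensors="pt", truncation=True)
    # Causal LM loss: predict each next token. For fine-tuning/SFT you would
    # feed prompt-completion pairs; for continued pre-training, raw domain
    # text at a much larger scale -- the expensive end of the gradient.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```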
Moving along the gradient from the cheapest thing makes sense to me. The idea that any enterprise customer will invest in pre-training is, I think, a bit more controversial. I believe some of the lab leaders would say nobody should be touching this, and it doesn't make any sense for people, from the scale of compute, the data curation effort, and just sort of the talent required, to do pre-training in any sort of competitive way. How would you react to that? I think if you're a big enterprise and you're sitting on a ton of data, like hundreds of billions of tokens of data, pre-training is a real lever that you're able to pull. I think for most SMBs, and certainly startups, it makes no sense; you should not be pre-training a model. But if you're a large enterprise,
I think it should be a serious consideration. The question is how much pre-training? It's not like you have to start from scratch and do a $50 million training run, but you can do a $5 million training run. That's what we've seen succeed, these sort of continuation pre-training efforts. So yeah, that's one of the offerings that we have. But of course, we don't jump straight into that. You don't need to
spend massively if you don't want to. And usually the enterprise buying cycle or technology adoption cycle is quite slow. And so you have time to move back into it. I would say it's totally at the customer's discretion. But to the folks who say that no one should be pre-training. No one outside of, let's say, AGI labs should be pre-training. That's empirically wrong.
Maybe that's a good jumping off point into just talking a little bit more about what's going on in the technical landscape and also what that means for Cohere. What is the bar you set internally for Cohere? You said the models are the foundation. And I believe you've also said there's no market for last year's models. How do you square that with the capital expense of competition and the rise of open source models now?
Well, there's some minimum threshold that you need to be spending at in order to build a model that's useful. Things get cheaper: the compute to train the model gets cheaper, and the sources of data, well, in some directions they get cheaper and in others not. With synthetic data, it's gotten dramatically cheaper, but expert data is getting harder and harder and more expensive. And so what we've seen is, today, you can build a model that's as good as GPT-4 in all the things that enterprises might care about for $10 million, $20 million, just orders of magnitude less than what was spent to develop that model. And so if you're willing to wait six months or a year
to build the technology, you can build it at a fraction of what those frontier labs have paid to develop it. And so that's been a key part of cohere's strategy is we don't need to build that thing first. What we'll do is we'll figure out how to do it dramatically cheaper and we'll focus on the parts of it that matter to our customers. So we'll focus on the capabilities that our customers really depend on. Now, at the same time,
We still have to spend a lot relative to a regular startup. We have to pay for a supercomputer, and those things cost hundreds of millions of dollars a year. So it is capital hungry, but it's not capital inefficient. It's very clear that we'll be able to build a very profitable business off of what we're building. So that's the strategy: don't lead, don't burn, you know,
$3, $5, $7 billion a year to be at the front, be six months behind, and offer something to market to enterprises that actually fits their needs at a price point that makes sense for them. Why spend on the supercomputer and the training yourself at all if you have increasingly open source options? Well, you don't, not really. Say more.
Yeah, you get the base model at the end, when it's cooled down and it has zero gradient. You get the post-trained model at the end, when it's cooled down and has zero gradient. Taking those models and trying to fine-tune them, it's just not as effective as building it yourself, and you have much fewer levers to pull than if you actually have access to the data and you can change the data that goes into that process. And so we feel that by being vertically integrated and by building these models ourselves, we just have dramatically more leverage to offer our customers.
Maybe if we go to projections, and we'll hit on a few things that you've mentioned as well: where are we in scaling laws? How much capability improvement do you expect over the next few years? We're pretty far along, I would say. We're starting to enter into a flat part of the curve. We're certainly past the point where, if you just interact with a model,
You can know how smart it is. Like the vibe checks, they're losing utility. And so instead, what you need to do is you need to get experts to measure within very specific domains like physics, math, chemistry, biology.
You need to get experts to actually assess the quality of these models because the average person can't tell the difference at this stage between generations. Yes, there's still much more to go do, but those gains are going to be felt in very specialized areas and have impacts on more researchy domains. I think for enterprises and the general tasks that they want to automate or tools that they want to build,
The technology is already good enough, or close enough, that a little bit of customization will get them there. That's the stage that we're at. There's a new unlock in terms of the category of problems that you can solve, and that's reasoning. Online reasoning is something that has been missing. These models previously didn't have an internal monologue. They didn't really think to themselves. You would just ask them a question and then expect them to immediately answer that question. They couldn't reason through it. They couldn't fail, make a mistake, catch that mistake, fix it, and try again. And so the fact that we now have reasoning models coming online, of course, OpenAI was the first to put it into production, but Cohere's been working on it for about a year now.
This category of tech, I think, is really interesting. There's a new set of problems that you can go solve. And it also changes the economics.
So before, if I had a customer come to me and say, Aidan, I want your model to be better at X, or I want a smarter model, I would say, okay, you know, give us six to 12 months. We need to go spin up a new training run, train it for longer, train a bigger model, et cetera, et cetera.
Now that was kind of the only lever we had to pull to improve the performance of our product. There's now a second lever, which is you can charge the customer more. You can say, okay, let's spend twice as many tokens or let's spend twice as much time at inference time and you'll get a smarter model.
So there's a much nicer product experience. Okay, you want a smarter model, you can have it today, you just need to pay this. And so they have that option, they don't need to wait six months. And similarly for model builders,
I don't need to go double the size of my supercomputer to hit a requisite intelligence threshold. I can just double the amount of inference time compute that my customers pay for. So I think that's a really interesting structural change in how we can go to market and what products we can build and what we can offer to the customer.
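A minimal sketch of that second lever, assuming nothing beyond a sampled model call: spend more inference-time compute by drawing several answers and majority-voting over them (self-consistency). The `generate_answer` function here is a hypothetical stand-in for any chat-model call, not a real API.

```python
# Trade inference-time compute for accuracy: sample N times, majority-vote.
from collections import Counter

def generate_answer(question: str, temperature: float = 0.8) -> str:
    """Placeholder for a sampled LLM call; assumed, not a real API."""
    raise NotImplementedError

def answer_with_budget(question: str, n_samples: int) -> str:
    # Doubling n_samples roughly doubles cost, but tends to raise accuracy
    # on multi-step problems -- the "second lever" described above.
    votes = Counter(generate_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# cheap = answer_with_budget(q, 1)   # fast, inexpensive
# smart = answer_with_budget(q, 16)  # pay more per query, get a "smarter" answer
```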
I agree. I think it's perhaps undervalued in the ecosystem right now how much more appealing it should be to all types of customers that you can move from a capex model of improvement to a consumption model of improvement, right? And it's not like these are apples-and-oranges things, but I think you'll see people invest a lot more in solving problems when they don't have to pony up for a training run and have this delay, as you describe.
Yeah, it hasn't been clocked. People haven't really priced in the impact of inference-time compute delivering intelligence. There are loads of consequences, even at the chip layer.
Like what sort of chips you want to build, what you should prioritize for data center construction. If we have a new avenue, which is inference time compute, that doesn't require this densely interconnected supercomputer. It's fine to have nodes. You can do a lot more locally and less distributed. I think it has loads of impact up and down this chain. And it's a new paradigm of what these models can do and how they do it.
You were dancing around this, but because your average person doesn't spend that much time thinking about what reasoning is, do you have any intuition you can offer people for what types of problems this allows us to tackle better?
Yeah, I think any sort of multi-step problem. There's some multi-step problems you can just memorize, which is what we've been asking models to do so far, like solving a polynomial. Really, that should be approached multi-step. That's how humans solve it. We don't just get given a polynomial and then boom. There's a few that maybe we've memorized.
By and large, you have to work through those problems, break them down, solve the smaller parts, and then compose it into the overall solution. And that's what we've been lacking. And we've had stuff like Chain of Thought, which has enabled that. But it's sort of like a retrofitting. We train these models to just memorize input output pairs. And we found a nice little hack to elicit the behavior that mimics reasoning.
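As a toy illustration of that chain-of-thought "hack" he mentions: the same question posed directly versus with an instruction to show intermediate steps. The prompts below are invented examples, not from the conversation.

```python
# Direct prompting vs. chain-of-thought prompting for a multi-step problem.
question = "A train leaves at 3:40 pm and the trip takes 2 h 35 min. When does it arrive?"

direct_prompt = question  # the model must map input -> answer in one shot

cot_prompt = (
    question
    + "\nThink step by step, showing your reasoning, "
    + "then give the final answer on the last line."
)
# With cot_prompt the model emits the intermediate steps (3:40 + 2:00 = 5:40;
# 5:40 + 0:35 = 6:15 pm), which empirically improves multi-step accuracy.
```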
I think what's coming now is from scratch, the next generation of models that is being built and delivered will have that reasoning capability burned into it from scratch. And it's not surprising that it wasn't there to begin with, because we've been training these models off of the internet. And the internet is like a set of documents, which are the output of a reasoning process.
with the reasoning all hidden. It's like a human wrote an article and spent weeks thinking about this thing and deleting stuff and blah, blah, blah, but then posted the final product. And that's what you get to see. Everything else is implicit, hidden, unobservable. And so it makes a lot of sense why the first generation of language models lacked this inner monologue. But now what we're doing is we're with human data and with synthetic data,
We're explicitly collecting people's inner thoughts. So we're asking them to verbalize it, and we're transcribing that, and we're going to train a model on that part of the problem-solving process. And so I'm really excited for that. I think right now it's extremely inefficient and it's quite brittle, similar to the early versions of language models.
But over the next two or three years, it's going to become incredibly robust and unlock just a whole new set of problems. What is the basic driver of the slowdown, of reaching the flat part of the curve that you describe with scaling? Is it the cost of increasingly expert data and collecting, as you said, hidden reasoning traces, which is harder and more expensive than just taking the data on the internet? Is it the difficulty of having evals for increasingly complex problems? Is it just the overall cost of compute? Why do you think that flattening is happening? When someone's making an oil painting, they do a base coat and just cover the whole canvas. And then they sort of paint in the shapes of the mountains and the trees. And as you get more and more detailed, you bring out very fine brushstrokes, and there are a lot more of them that you need to make. Before, you could just take a big wedge and throw paint across the canvas and accomplish the thing that you wanted to accomplish. But as you start to get more and more targeted or more and more detailed in what you're trying to accomplish,
It requires a much more fine instrument. And so that's what we've seen with language models. We're able to do a lot of the common, simple, easy tasks quite quickly, but as we've approached much more specific sensitive domains like science, math, that's where we've started to see
resistance to improvement. And in some places, we've gotten around that by using synthetic data, like in code and math. These are places where the answer is very verifiable. You know when you're right or you're wrong. And so you can generate tons of synthetic data and just verify whether it's correct or not. It's correct? Okay, let's train on it. In other areas that require
testing and knowledge in the real world, like in biology, like in chemistry. There's a bigger bottleneck to creating that sort of data, and you have to go to experts who know the field, who have experienced it for decades, and basically distill their knowledge.
But eventually you run out of experts and you run out of that data and you're at the frontier of what humans know about X, Y or Z. There's just increasing friction to fill in these much finer details of this portrait.
I think that's a fundamental problem. I don't think that there's any shortcuts around that. At some stage, we're going to have to give these models the ability to run their own experiments to fill in areas of their knowledge that they're curious about. But I think that's quite
quite a ways away. And it's going to be tough to scale that. It will take many, many years to do. We will do it. We're going to get there 100%. But for the stuff that I care about today with Cohere, I think there are many applications for which this technology is ready for production.
And so the primary focus is getting it to production and ensuring that our economy adopts this technology and integrates it as quickly as possible, gets that productivity uplift. And so while that technical question is super interesting about why is progress slowing down, I think it should be kind of obvious, right? It's like the models are getting so good, they're hitting, they're running into the thresholds of human knowledge.
which is really where they're getting their capability from. You are so grounded in getting the capabilities we have, and that will continue to progress even if the curve is flattening, into production. I think I know the answer to this, but how much do you, or how much does Cohere, think about AGI and takeoff, and does that matter to you?
Well, AGI means a lot of things to a lot of different people. I completely believe in us building generally intelligent machines. It's like, of course, we're going to do that. But AGI has been conflated. How soon? We're already there. It's not a binary. It's not discrete. It's continuous. And we're well on our way. We're pretty far down that road. There are some definitions elsewhere in industry where you can put a breakpoint in, even if you have this continuous function: there's intelligence that replaces an educated adult professional in any digital role. Your view is there's no really important breakpoint that's happening, no objective checklist thing where, when you've checked all these boxes, then you've got it? I think you can always find a counterexample. You're like, oh, well, it hasn't actually beaten this one human over here who's doing this random, random thing. No, I think it's pretty continuous, and we're quite far, quite far along. But the AGI that I really don't subscribe to is the superintelligence takeoff, self-improvement, just leading to the Terminator that exterminates us all.
Or creates abundance, unclear. Yeah, or creates abundance, right? No, I think we'll be the one to create abundance. We don't need to wait for this God to emerge and do it for us. Let's go do it with the tech that we're building. We don't need to depend on that. We can go do it ourselves.
We will build AGI if what you mean is very useful, generally capable technology that can do a lot of the stuff that humans can do and flex into a lot of different domains. If what you mean is, are we going to build God? No. What do you think is the driver in that difference of opinion? I don't know. I think maybe I'm
a little bit more in the weeds of the practical frustrations of the technology, where it breaks, where it's slow, where we start to see things plateau or slow down.
And perhaps others are more, maybe they're more optimistic. Maybe they see a curve increasing, and they just think it goes on forever, that it will just continue arbitrarily, which I disagree with. I think there's friction. There is genuinely friction that enters in. Even if, in theory, a neural net is a universal approximator that can learn anything, to universally approximate, you would need to build a neural net the size of the universe. And so there are some fundamental barriers to reaching the limits that people extrapolate out to, and I think those will bound the practically realizable forms of this technology.
Are there domains where you just believe LLMs as we have them today are not a good fit for prediction? And so an example might be, are we going to get to physics simulation from sequence-to-sequence models? I mean, probably, yeah. Like physics is just a series of states and transition probabilities. So I think it's probably quite well modeled by sequence modeling. But are there areas where it's poorly suited? I'm sure that there are better models for certain things, more efficient models. If you zoom into a specific domain, you can take advantage of structure in that domain
to carve off some of the unnecessary generalities of the transformer or of this category of architectures and get a more efficient model. That's definitely true when you zoom in. It doesn't sound like you think it's at its core like a representation issue or it's just not going to work.
There's irreducible uncertainty in the world. There are things that you genuinely cannot know, and building a bigger model will not help you know this genuinely random or unobservable thing. And so we'll never be able to model those things effectively until we learn how to observe them. I think the transformer and this category of model can do much more than people give it credit for. It's a very general
architecture, many, many things can be phrased as a sequence. And these models are just sequence models. And so if you can phrase it as a sequence, a transformer can do a fairly good job at picking up any
regularity in it. But I'm certain that there are examples that I'm just not able to think of right now, where sequence modeling is super inefficient. Like you can do it with sequences, you can phrase a graph as a sequence. But it's just like the wrong model. And you would pay dramatically less compute if you approached it from a different angle.
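A toy example of that point, with invented names: you can serialize a graph into a token sequence that a sequence model could consume; it's just often the "wrong model" compared to a graph-native approach that exploits the structure directly.

```python
# "Phrasing a graph as a sequence": flatten edges into a token stream a
# sequence model could train on. Workable, but a graph-native method can
# exploit the same structure at dramatically less compute.
edges = [("A", "B"), ("B", "C"), ("C", "A")]

def graph_to_sequence(edges):
    # Emit one edge per segment: "A -> B ; B -> C ; C -> A"
    return " ; ".join(f"{u} -> {v}" for u, v in edges)

print(graph_to_sequence(edges))  # "A -> B ; B -> C ; C -> A"
```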
Okay, one last question for you. So you said earlier that with scaling compute at inference time, people have noticed, but it's not really priced in how big of a change this is. Is there anything else you think is not priced in by the market right now, that Cohere thinks about or that you think about? Yeah, I think there's this idea of commoditization of models. I don't really think that's true. I don't think that models are actually
getting commoditized. I think what you see is you see price dumping. And so you see people giving it out for free, giving it out at a loss, giving it zero margin.
And so they see the prices coming down, and they assume prices coming down means commoditization. I think in reality, the state of the world is there's a total technological refactor that's going on right now, and it'll last the next 10 to 15 years. And it's kind of like we have to repave every road on the planet. And there are like four or five companies that know how to make concrete. And maybe today, some of them give their concrete away for free. But over time, there's a very small number of parties that know how to do this thing, a huge job in front of us, and pressure to drive growth and to show return on investment. It's an unstable present state to be operating at a loss or giving away very expensive technology for free. So growth pressures of the market will push things in a certain direction. And yeah, you know, the price of Haiku 4x'd two weeks ago. And this has been super fun. Thank you so much for doing this with us. Yeah, my pleasure. My pleasure. It was super fun. Great seeing you.
Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.