DeepSeek R1 & The Short Case For Nvidia Stock | Jeffrey Emanuel
January 28, 2025
TLDR: The new DeepSeek AI model matches GPT-4 performance at 1/45th the cost, contributing to a 20% dip in Nvidia's stock price and raising concerns about its market dominance, as argued by investor-technologist Jeffrey Emanuel in his viral article.

In this blog summary, we delve into the critical insights from the podcast episode titled DeepSeek R1 & The Short Case For Nvidia Stock featuring investor-technologist Jeffrey Emanuel. The episode explores the significant implications of China's DeepSeek AI model on the AI hardware market, notably contributing to a 20% drop in Nvidia stock.
Key Themes and Insights
The podcast covers several pivotal themes regarding the ongoing shifts in AI compute economics, Nvidia's market dominance, and competitive pressures these advancements create in the tech industry.
1. The Rise of DeepSeek AI
- Performance vs. Cost Efficiency: The DeepSeek AI model is reported to match GPT-4's capabilities at 1/45th the cost, dramatically affecting Nvidia's competitive edge.
- Market Reaction: The announcement of DeepSeek contributed to a drastic $600 billion loss in Nvidia’s market value, emphasizing the profound impact of competitive AI advancements.
2. Jeffrey Emanuel’s Article
- Viral Response: Emanuel’s substantial 12,000-word article, titled The Short Case for Nvidia Stock, went viral, hypothesizing that Nvidia faces increasing threats due to competitors like DeepSeek and new custom silicon developments by major tech companies.
- Shift in Market Sentiment: The podcast discusses how Emanuel's detailed analysis reshaped investor perceptions, suggesting that Nvidia’s previously unassailable lead is now at risk due to emerging technologies.
3. Shifts in Economic Models
- Increased Competition: The episode discusses the potential unbundling of Nvidia's market hold as companies like Amazon and Google develop their own chip solutions.
- Reduced Demand for Nvidia’s Solutions: With alternatives emerging, companies may reconsider their reliance on Nvidia, potentially resulting in lower margins and demand decrease.
4. DeepSeek’s Technological Innovations
- Efficiency Breakthroughs: DeepSeek's approach includes aggressive memory and communication optimizations, resulting in tremendous efficiency gains that could threaten existing computing paradigms.
- Implications for AI Development: Higher efficiency could mean AI models now require less computing power, fundamentally altering projections regarding resource allocation and training costs in the AI tech stack (see the back-of-the-envelope sketch after this list).
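As a rough back-of-the-envelope sketch of what these claims imply: the 1/45th and 95% figures are the episode's claims, while the $100M baseline training cost is an illustrative assumption, not a reported number.

```python
# Illustrative arithmetic only; the baseline cost is an assumption.
baseline_training_cost = 100_000_000          # assumed GPT-4-class training cost, USD
deepseek_training_cost = baseline_training_cost / 45
print(f"Implied training cost at 1/45th: ${deepseek_training_cost:,.0f}")   # ~$2.2M

baseline_api_price = 1.00                     # normalized incumbent API price
deepseek_api_price = baseline_api_price * (1 - 0.95)
print(f"API price at 95% less: {deepseek_api_price:.2f}x the incumbent")    # 0.05x
```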
5. The Market’s Overreliance on Nvidia
- Investor Caution: The podcast warns against over-reliance on Nvidia’s dominance, which could lead to unexpected market volatility in response to emerging technologies and competition.
- Historical Context: Emanuel highlights previous instances where tech giants lost market share to more nimble, innovative competitors when investors overlooked signs of change.
Actionable Takeaways
For investors and tech enthusiasts alike, this podcast episode provides valuable insights into the evolving landscape of AI technologies and market dynamics. Key takeaways include:
- Stay Informed: Regularly analyze industry advancements and articles from reputable sources, as they can influence market perceptions and investor decisions considerably.
- Diversification: To mitigate risks associated with single-entity reliance (like Nvidia), consider diversified investment portfolios within the tech sector.
- Monitor Development Trends: Keeping track of new technologies (such as DeepSeek) and their implications on compute economics could signal larger market shifts.
Conclusion
The conversation between Jeffrey Emanuel and the podcast hosts encapsulates a critical moment for the tech industry, where the intersection of groundbreaking AI technology and stock market volatility underscores the need for vigilance in investment strategies. The emergence of competitive technologies like DeepSeek signals not just an immediate challenge for Nvidia but potentially a shift toward more sustainable and efficient AI solutions across the industry.
I basically was trying to help the organic search ranking of my little YouTube tool. And then, in the process, I may have inadvertently contributed to $2 trillion getting wiped off global equity markets, because, you know, the fact is, all of the news headlines came out saying the stock market crashed because of DeepSeek. I'd like to point out that the DeepSeek V3 technical paper came out December 27th, okay, that's a month ago.
Even the newer model, the R1 model that does the chain of thought, that paper came out a week ago, and people were all over that. So why suddenly on Monday did everything crash? And I'd like to think, and I'm pretty sure it's the case, that it's because I wrote this article in a way that sort of speaks to hedge fund managers, so they can understand it. And I published it in the middle of the night on Friday, and then it started taking off.
And then it got shared by Chamath, who has, you know, whatever, 1.8 million followers, right? And it's been viewed over 2 million times. The follower count is 2.5 million. And then Gary Tan and the Y Combinator account, between them, they have millions of followers. And not only did they share it, but they were very effusive in their praise, like, this is really smart. And that went crazy.
Everyone is talking about this new DeepSeek AI model from China that is reportedly 45 times more cost-efficient than US-based AI models and charges 95% less money to use than ChatGPT. As a result, NVIDIA is down 20%, wiping out $600 billion in market value, and both OpenAI and Meta's AI labs are scrambling to discover
how a relatively unheard-of Chinese AI lab was able to outperform their very expensive models with a Chinese-grown model that cost just $6 million to train. The guest on the show today is Jeffrey Emanuel, who actually thinks that this part of the story, the DeepSeek AI model part, is over-indexed on, and that it's actually a confluence of other factors that is contributing to the unbundling of NVIDIA's market share.
And it's not the release of DeepSeek that triggered the 20% drawdown, but instead a 12,000-word article that he wrote on his blog, which quickly went from just a handful of readers to over 2 million readers over the weekend, and which coincided with the 20% drop in Nvidia's price when the market opened on Monday.
In this episode, Jeffrey and I go through his article and the reasoning behind why NVIDIA is under threat of getting unbundled by other chip suppliers, in addition to DeepSeek's impact on the entire resource supply chain of training and inference around LLMs. Let's go ahead and get right into this episode with Jeffrey. But first, a moment to talk about some of these fantastic sponsors that make the show possible.
Are you ready to swap smarter? Uniswap apps are simple, secure, and seamless tools that crypto users trust. The Uniswap protocol has processed more than $2.5 trillion in all-time swap volume, proving it's the go-to liquidity hub for swaps. With support for a growing number of chains, including Ethereum Mainnet, Base, Arbitrum, Polygon, and ZKsync, Uniswap apps are built for a multi-chain world. Uniswap syncs your transactions across its web interface, mobile apps, and Chrome browser extension, so you're never tied to one device. And with self-custody for your funds and MEV protection, Uniswap keeps your crypto secure while you swap anywhere, anytime. Connect your wallet and swap smarter today with the Uniswap web app, or download the Uniswap wallet, available now on iOS, Android, and Chrome. Uniswap:
the simple, secure way to swap in a multi-chain world. With over $1.5 billion in TVL, the mETH Protocol is home to mETH, the fourth-largest ETH liquid staking token, offering one of the highest APRs among the top ten LSTs. And now, cmETH takes things even further. This restaked version captures multiple yields across Karak, EigenLayer, Symbiotic, and many more, making cmETH the most efficient and most composable LRT solution on the market. Metamorphosis Season One dropped $7.7 million in COOK rewards to mETH holders. Season Two is currently ongoing, allowing users to earn staking, restaking, and AVS yields, plus rewards in COOK, mETH Protocol's governance token, and more. Don't miss out on the opportunity to stake, restake, and shape the future of mETH Protocol with COOK. Participate today at meth.mantle.xyz.
What if the future of Web3 gaming wasn't just a fantasy, but something you could explore today? Ronin, the blockchain already trusted by millions of players and creators, is opening its doors to a new era of innovation starting February 12th. For players and investors, Ronin is home to a thriving ecosystem of games, NFTs, and live projects like Axie and Pixels. With its permissionless expansion, the platform is about to unleash new opportunities in gaming, DeFi, AI agents, and more. Sign up for the Ronin wallet now to join 17 million others exploring the ecosystem.
And for developers, Ronin is your platform to build, grow, and scale. With fast transactions, low fees, and proven infrastructure, it's optimized for creativity at scale. Start building on the testnet today and prepare to launch your ideas, whether it's games, meme coins, or an entirely new Web3 experience. Ronin's millions of active users and wallets mean tapping into a thriving ecosystem of 3 million monthly active addresses ready to explore your creations. Sign up for Ronin Wallet at wallet.roninchain.com and explore the possibilities. Whether you're a player, investor, or builder, the future of Web3 starts on Ronin.
Bankless Nation, very excited to introduce Jeffrey Emanuel. He is both an investor and a technologist. He, however, is a very specific flavor of both of those things. On the tech side, he is deeply informed about the research advances that come out of major AI labs like OpenAI, Meta, and Google.
And on the investing side, he plays in the markets as a value investor, one who dares to go short at times.
Thanks for having me. Jeffrey, I really enjoyed your article. I want to kind of start with the punchline. I want to read one of the last paragraphs in your article that I really felt summed up the entire digestion of everyone's analysis of how the new DeepSeek model has impacted the market. So this is actually the second-to-last paragraph in your article. You wrote: perhaps the most devastating to NVIDIA's moat is DeepSeek's recent efficiency breakthrough, achieving comparable model performance at approximately 1/45th the compute cost.
This suggests the entire industry has been massively over-provisioning compute resources. Combined with the emergence of more efficient inference architectures through chain-of-thought models, the aggregate demand for compute could be significantly lower than current projections assume. The economics here are compelling: when DeepSeek can match GPT-4-level performance while charging 95% less for API calls, it suggests either NVIDIA's customers are burning cash unnecessarily or margins must come down dramatically.
To me, Jeffrey, that was the punchline for what I think everyone felt in the market on Monday when Nvidia stock fell 17%. To me, I'm summing this up as: there is a tug of war between hardware and software. And with the emergence of DeepSeek, the software side of this tug of war got a very large W. That's my interpretation. That's my analysis. Check me on that. How do you feel about that kind of conclusion?
You know, it's funny, because DeepSeek is the part that everybody's the most focused on. But I actually think the whole short thesis still works pretty well without that, for all the other reasons that we can discuss. And the one issue with the DeepSeek part is, it's funny, there's this whole Jevons paradox thing.
Nobody was talking about this until suddenly now everybody's saying Jevons every other word. And, you know, it's something that comes from energy economics, which is: you think you make things more energy efficient, great, we're going to use less energy. But then what ends up happening is that the price of energy goes down and everybody wants to use more energy. And so it actually increases demand for energy. And so everyone's saying now that all this DeepSeek stuff is wrong because of Jevons.
You know, I am sympathetic to that to a degree, but it's not always so clear. And it's not like the Jevons stuff happens immediately. There's often, you know, sort of... what causes booms and busts is these temporary dislocations between anticipated demand and realized demand. And really, you know, what I think people miss is that
the big decisions about CapEx come down to a couple of people, like, you know, Mark Zuckerberg, and a lot of it is sort of gut feel, like Masayoshi Son: is this a good time to just push on the accelerator? And I think someone like Zuck has to take a step back and say, listen, I know my guys are really smart, but
maybe the answer is not necessarily to spend another $3 billion on NVIDIA chips that are very expensive. Literally, they're paying 40 grand for a GPU that's costing NVIDIA maybe $3,500 to make.
They're putting a lot of money in Nvidia's pocket, and maybe they can, you know, pump the brakes just a little bit, because they projected that they needed a certain amount of chips for their forecasted demand. And the DeepSeek stuff is all public, so they can look at the technical report. They can start making these changes themselves internally, theoretically,
at least for the next generation of models they're training. And as a result, maybe they can pull back a bit, because I think there is still some skepticism on Wall Street, like, are they going to see a return on this money? Because it's not like anyone's paying to use all this Meta AI stuff yet.
And so I think it's a little... I'm not convinced by the "oh yeah, well, Jevons" response. It's like, okay, let's see if that's actually the case. But then, really separately from that, like I was saying, even if you remove DeepSeek entirely,
I believe that NVIDIA in particular, and I want to clarify, I am such a bull on AI. I'm about as bullish, like 99th percentile, on AI as anyone you will ever meet. I live in the AI future all day, every day. I have three cloud accounts. I'm using this stuff nonstop all day, every day. So I'm a huge bull.
But Nvidia as a company, and this just goes back to my training in investing, you see this over and over again: with the one exception of a regulatory enforced monopoly, you do not have companies that just get to print infinite profits, with triple-digit revenue growth and 90% gross margins, without having everyone and their brother trying to figure out a way to beat them. And that's what's happening. And so you look at, you know, these companies Cerebras and Groq, with the Q; these companies already have extremely compelling hardware that largely does get around the NVIDIA moat, at least for inference. And, you know, in the case of Cerebras, I think for training too.
And there are all these other threads. I mean, the other thing is, you know, normal companies of the scale of NVIDIA tend to have extremely diversified revenue sources, whereas with NVIDIA, all the high-margin data center revenue is coming from, like, you know, five hyperscalers or something. It's very much a power-law distribution. And it's funny, because when I started writing the article, which I started writing because, you know, my friend who's a hedge fund guy asked me about it on Friday, as I was explaining it to him, I realized, I should just write this up.
And it's funny because it started out as: if I was forced to make the short case for Nvidia, here's what it would be. And by the time I had finished, I was like, shit, this actually is a short. Because I knew there was a lot of custom silicon in the works, but it was kind of eye-opening to me that every single hyperscaler customer
is literally making their own custom silicon, in some cases for both training and inference. Amazon, Microsoft, OpenAI, Meta: they're all doing this. And as soon as they get this stuff to work, the other thing that's so important to remember is it doesn't necessarily have to be better than Nvidia's stuff, right?
Because NVIDIA is charging 10x what it costs them. So if you can make it yourself for 1x what it costs, then you can cut the price by 50% to your end customers and still make a huge margin. And what matters to you as a hyperscaler is how many requests you can handle to your APIs and stuff per dollar. You don't care if you need more chips. That's fine,
as long as you don't have to pay these inflated prices for them. There are other pieces we can talk about, but I actually think all of that stuff should be just as much of a focal point as the DeepSeek news.
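To make that economics concrete, here is a minimal sketch of the cost-per-performance argument. The $40,000 price and the ~$3,500 build cost are figures from the episode; the custom-chip cost and the 40%-as-good figure are purely illustrative assumptions.

```python
# A sketch of the hyperscaler build-vs-buy math, with assumed numbers.
nvidia_price = 40_000         # what a hyperscaler pays per GPU (episode figure)
nvidia_build_cost = 3_500     # NVIDIA's rough cost to make it (episode figure)
custom_chip_cost = 4_000      # assumed in-house cost per chip
relative_performance = 0.40   # assume the custom chip is only 40% as good

print(f"NVIDIA gross margin: {1 - nvidia_build_cost / nvidia_price:.0%}")  # ~91%

nvidia_cost_per_perf = nvidia_price / 1.0
custom_cost_per_perf = custom_chip_cost / relative_performance
print(f"NVIDIA: ${nvidia_cost_per_perf:,.0f} per unit of performance")     # $40,000
print(f"Custom: ${custom_cost_per_perf:,.0f} per unit of performance")     # $10,000
# Even at 40% of the performance, the in-house chip serves the same
# workload for roughly a quarter of the hardware cost per request.
```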
Yeah, maybe to go back and trace over your article. I see your article in two parts. It's the moat of Nvidia and how it's being unbundled at the margins by the various set of companies, some of which you just mentioned. And some of these moats are the fast GPU interconnect. Nvidia has had this amazing ability to make their GPUs talk to each other with extreme bandwidths.
as if they are one big unit, like one big GPU. And that is getting unbundled by another company that is just making very large chips that reduce the need for... well, not GPUs. They're making custom... it's not really a GPU, it's like this weird mega chip. I mean, it's funny, because the H100 is
considered like an absolute unit when it comes to chip size, because it's this massive freaking package. But then the Cerebras thing is, they literally took an entire 300-millimeter wafer and just made the entire thing one enormous chip. I mean, these chips are extremely expensive to make. But yeah, you don't need to worry about wiring things together. It's all on the same wafer, right? Right. And
I actually just want to point out, too, that even Nvidia didn't make that interconnect technology. They bought Mellanox, an Israeli company, and that basically doubled their size. I think they had 10,000 employees by the time they bought Mellanox for $7 billion, and that brought in, you know, about the same again. So it was a big,
really smart acquisition. I mean, if they hadn't bought that company, they would not be in the dominant position they are today with data center stuff. But yeah, everyone has been sort of relying on: oh yeah, but what about interconnect? Even if AMD could get their act together and come out with a decent driver and some alternative to CUDA, they don't have the interconnect, so you can't use it for this. You hear that argument a lot. And I think, well,
you're starting to see, on the training side, this company Cerebras with the wafer-scale chip. But then also, the other big news that started before DeepSeek was the O1 model from OpenAI, and that sort of unlocked this other new scaling law, which is about inference-time compute. It used to be that almost all the
processing power was needed on the training side, and then the inference was pretty fast. But nowadays with these models that do chain of thought, the more they compute at the time you give them a request, the better the answer they can give. And so people are now saying, whoa, so actually most of the compute might be on the inference side. But the inference side is a very different
compute problem. So right now they use the same GPUs for training and inference. Okay, can we just quickly define training? Training is actually making the model. You have like a zillion, you know, gigs' worth of data, text from the internet, Wikipedia, blah, blah, blah, broken up into these tokens. DeepSeek used 15 trillion of them.
And then you take thousands of GPUs and you basically learn how to condense all that data down to 99% less space in the weights. And in the process, the model learns this coherent model of the world and how to understand things. Because the only way to compress stuff that much without losing all the information is to understand it.
Whereas with inference, you already have a trained model, and now I want to ask it to write me an essay or do a logic problem for me. Inference is a very different problem. You don't need thousands of GPUs to do it, because you've already got the trained model. You just need a couple of GPUs, maybe, and you can get the answers.
And so just to really trace that over one more time: training is ChatGPT, OpenAI, creating their products, creating the models that I go on to ChatGPT and use. And then when I type in a query, I am doing inference. And so there's a weighting here of a ton of compute upfront to make the model once, and then hopefully a little amount of compute to run inference on it, which is just the daily requests.
And, like, in theory, there's a trade-off here between how much compute you do initially to train the model, and hopefully that just makes all future inferences as efficient as possible. But there's still compute on both sides. When you invest more in training, it just makes the model smarter. Smarter, yeah. And that way you get better answers. But what changed recently: it used to be that basically all the inference used this sort of moderate, fixed compute budget. But now it's open-ended. Now, like, you know, O1,
which is their flagship model from OpenAI. If you pay $20 a month for ChatGPT Plus, you can use O1 for a certain number of requests per week. If you pay 10 times as much, $200 a month for ChatGPT Pro, which I do and I recommend to anyone who uses this stuff a lot, you've got O1 Pro. It's the same model as regular O1. The only difference is that it takes like
much longer to respond, because while it's doing inference, it's using up far more of these intermediate logic tokens, as it were, this chain of thought, which is sort of like the scratch pad of its internal thinking process. And then it gives you an answer, but the answer is better, like your code will work the very first time.
You won't have any kind of mistakes in your essay or whatever. And so, going over this one more time, just so it's clear: the $200-a-month version and the $20-a-month version are the same model, but there's this extra step, this extra layer of things happening, where the Pro version is running that same model over and over and over again in chunks. And it is able to go back and trace over previous work
to check its work before it actually gives you an output. And you're saying that just because of this... It's not an additional layer. It's just that they do it for longer. It's basically a dial. You say: how much money do I want to spend generating tokens before I give the final answer?
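A toy sketch of that dial, purely for illustration; this is not OpenAI's actual API or implementation, just a simulation of spending more chain-of-thought tokens before answering.

```python
# Hypothetical illustration of an inference-time compute budget ("the dial").
def answer_with_budget(question: str, reasoning_token_budget: int) -> str:
    scratch_pad: list[str] = []
    tokens_spent = 0
    while tokens_spent < reasoning_token_budget:
        # In a real system each step is the model emitting hidden reasoning
        # tokens; here we only simulate the spend.
        scratch_pad.append(f"intermediate step {len(scratch_pad) + 1}")
        tokens_spent += 100
    return f"answer after {tokens_spent} reasoning tokens ({len(scratch_pad)} steps)"

print(answer_with_budget("logic puzzle", reasoning_token_budget=1_000))   # Plus-like dial
print(answer_with_budget("logic puzzle", reasoning_token_budget=50_000))  # Pro-like dial
```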
And with Pro, it would not be economical for them to use the amount of tokens that they use for Pro for the Plus tier. In fact, it's funny, because everyone on Hacker News, and in the industry, all these developers, were like, $200 a month? Get real. How could that make sense? And Sam Altman came out later and said, believe it or not, we're actually losing money
charging $200 a month, because people are using it and it just uses insane amounts of compute. And so it really flips the equation in terms of how much compute is being used for inference versus training. And this is really relevant because, like I said, with the NVIDIA GPUs, you buy an H100 data center GPU for 40 grand from NVIDIA, and you're going to use the same GPU to train the model
and do inference on it. But this company Groq, with the Q, and everyone gets confused because of Grok with the K... Not the Twitter Grok. That one is the Twitter one. Exactly. But Groq with the Q should be better known, because this company has, I mean, unbelievable technology. They basically said: we're not going to try to solve training at all. We only care about inference. And so if you want to optimize the entire stack for inference only, how might you approach that?
And the result of that is that they can do inference from, you know, a standard model like Llama 3.3 70B, which, until DeepSeek came out, was the sort of leading-edge open-source model. And, you know, if you get a fancy desktop computer with one NVIDIA 4090 GPU, which you can get for under $1,000 now,
you could get, I don't know, maybe 40 tokens per second, which is actually good enough that you could use that as your sort of home version of ChatGPT, and it works pretty well. When you try it on Groq, and anyone can try this for free, you just sign up with your Google account, you can do inference from this model, and it's insane. Instead of like 40 or 50 tokens per second, it's like 1,500 per second. And so you click the thing and boom,
there's your answer. And it's like, whoa, that's pretty interesting. And so even though the Groq hardware costs like millions of dollars for one server, if you have enough demand that you can just keep it busy all the time, it's actually much cheaper to use. And most importantly, you're not giving your money to Nvidia, you're giving it to Groq. So it's just an example of how people manage to... you know, if you're trying to assault a castle that has a big moat,
instead of trying to cross the moat and get, you know, shot up by arrows, why don't you dig a tunnel under the moat, or, you know, use a catapult to go over? You find creative ways to get around it. And that's what's happening: everybody's been focused on, well, a frontal assault is not going to work. And it's like, okay, but there are other ways to seize the castle. And that's what you're seeing:
all the ingenuity of the market, and the reason is because the prize is so big. You too can make your company worth a trillion dollars if you can take a big piece of this pie, whereas that was not true in 2016. It was like a backwater, you know? And the wheels take a lot of time. Even if you're Amazon, with infinite money to spend, if you want to make your own chips,
what do you know about making silicon? First, you have to poach or hire the really brilliant people, and then it's going to take them probably two or three years to design a really good chip, and then you're going to have to come with giant sacks of cash to TSMC and try to convince them to give you volumes at their fabs, because they're already just being
you know, inundated with money from Nvidia and Apple and stuff. And it takes a while to get ramped up. But eventually the chips start coming out. And, you know, the irony of it is like, again, it's like, even though, you know, none of these custom silicon chips are going to be as good as the Nvidia chips.
the sort of way they're made is pretty similar, in that they're all going to be using TSMC as the fab. And they're all using the same machines from this Dutch company ASML that actually does the lithography. So it's like, yeah, they won't have the same brilliant design, maybe. But again, that's the thing people miss: it doesn't need to be as good. It could be one-fifth as good.
And it still makes sense for Amazon to use it, because they don't have to pay a 90% gross margin to Nvidia. Nvidia has had the luxury of very high margins, and what that creates is: well, if your product is 90% as good, but you only take 10% of the margins, then all of a sudden you're capturing a lot of the market.
But I'm saying, when your margins are so high... just to put things into perspective: companies that sell chips, in the semiconductor industry, it's generally not such a great industry. It's very subject to boom and bust cycles of overcapacity. And so if you look at another area like memory, DRAM, which everyone has in their phones and their computers,
you know, you might think on the surface that this should be this great business, because there are only basically three companies in the world that do it: Micron, Samsung, and SK Hynix. I mean, there used to be like 15 memory companies, but they all either went bust or merged. And so you would think it would be this oligopolistic thing with great pricing and margins. But if you look at the history of it over the last 10, 15 years, it's very cyclical.
And at the very peak, when the supply-demand mismatch is really out of whack and they can charge really high prices, they make like a 60% gross margin. But if you take the average over the cycle, it's closer to like 20%. And at the bottom of the cycle, gross margins actually turn negative. Okay. And so then you look at NVIDIA and you're like, you have a 90-plus percent gross margin on data center. Their overall gross margins are more like 75%,
because they make much lower margins on the consumer stuff, like playing video games. And that's because they have competition from AMD there; that's what happens in a competitive market. But my point is that when your margins are that high, it doesn't need to be 90% as good. It could be literally like 40% as good, and it's still a no-brainer for Amazon to switch as many workloads over to their own thing, because, you know,
it's like when you buy a handbag from Hermes for 40 grand: how much do you think it costs them to make? Even though it's made by hand by some French guy, it's probably only like two, three thousand bucks tops, and then they're charging you $40,000 in the end. It's very similar margins for
the GPUs from NVIDIA. And what matters is, the users don't care. They're submitting requests. They want to use a model, Llama 3.3 70B, but they don't care if an NVIDIA card is doing the inference on it. And so Amazon... you know, Amazon made their own CPUs, called Graviton.
And they are very aggressive with the pricing of that, to try to switch people over: if you normally use an Intel or AMD CPU, try using one of our things and you'll save a lot of money. And you're going to see the same thing here, where they're going to try to push people over to their product, because by making it themselves, they can basically split the savings with the customers.
And so all that stuff, you know, it's like death by a million cuts: the combination of the competition from these different areas. And then of course, AMD does compete with them effectively in consumer stuff, but they've been completely absent in this whole data center AI stuff, which is,
you know, just crazy. I mean, they're going to be writing business school case studies about how they squandered a trillion-dollar opportunity. You can't get too mad at them, because they also managed to kill Intel, so it's not like they're not good too. And it's so funny, because Lisa Su, the CEO of AMD, is like first cousins with Jensen Huang from NVIDIA. I did not know that. Which is just like, how good are the genes in this family?
So, yeah, I mean, if they can get their act together... and it's so funny, because they're so out of it. I just don't understand it. But there are literally people like George Hotz, the guy who's famous for jailbreaking the iPhone and all this stuff,
who's literally, by himself, without any help from them, writing his own software stack that's like, you know, going to make these GPUs usable for doing at least some training and inference. And so you might even see AMD coming up as a real competitor. And yeah.
Yeah, so going back to tracing over the broad strokes of your article, I kind of break it out into two parts, two halves. There is the unbundling of NVIDIA's moat on the hardware side of things via hardware competitors, as you've kind of just traced over. But then also the DeepSeek side of things is a rebalancing of the value of software and algorithm design,
maybe is one way to put it. Maybe you can take us to the second half of that equation: how did DeepSeek really impact people's understanding of the value of software and its impact on the value of hardware? Well, so, you know, when you say, what is the software side of the thesis?
It actually has very little to do with DeepSeek. What it has to do with is that one of the biggest sources of NVIDIA's moat has been... because, you know, AMD has quite reasonably good chips. The reason is that NVIDIA was basically very forward-thinking, and they noticed that this deep learning stuff was really taking off, back in 2012.
And so they really figured out that we need to make it easy to use our chips for this sort of thing. And so they have this system called CUDA. Because you have to understand, these GPUs are insanely complicated. I mean, in the old days, you'd have one CPU with one core. Now CPUs are pretty complicated; I have a CPU in my computer that's 32 cores. But
these NVIDIA GPUs have like thousands of cores. That's their whole deal: they have lots of cores. And so if you were to try to write code naively to take your problem, break it up, send it to thousands of cores, and reassemble it, no one can do that, basically. And so instead, you
describe the problem using these much more abstract, high-level concepts. And then CUDA turns that into hyper-optimized code that runs really, really well on NVIDIA GPUs, but not anywhere else. And CUDA is an NVIDIA-built software package to allow developers to use NVIDIA GPUs to their best degree possible. Yeah, without being like Einstein. They can be very smart, but... I mean, it's kind of like a driver?
No, it's more like a framework. Yeah. The driver is a sort of separate layer. It allows the power of NVIDIA GPUs to be expressed to more people without them having to be... yeah, it's like the difference between writing code in Python versus writing code in assembler, which is the lowest level. And actually, even with CUDA,
most people don't even write CUDA directly. Most people use machine learning frameworks. It used to be TensorFlow, but it's been sort of totally replaced by something called PyTorch, which was sponsored by Meta. And so that's what most researchers use, PyTorch, which lets them think in terms of the math and, as a researcher, say: oh, I have this loss function, I have this optimizer.
And everything's modular and plug-and-play. And then you write high-level Python code, which is very, very high level, and then internally PyTorch runs that through CUDA and onto an NVIDIA GPU very, very efficiently.
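For readers who haven't seen it, this is roughly what that "very high-level Python code" looks like in practice: a minimal, standard PyTorch training step, with a toy model and random data standing in for a real workload.

```python
import torch
import torch.nn as nn

# The researcher writes math-level code; PyTorch dispatches it to optimized
# CUDA kernels underneath (or falls back to the CPU if no GPU is present).
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 512, device=device)          # a toy batch of inputs
y = torch.randint(0, 10, (64,), device=device)   # toy labels

loss = loss_fn(model(x), y)   # forward pass, executed on the GPU
loss.backward()               # backward pass: gradients, also on the GPU
optimizer.step()              # the user never writes a line of kernel code
print(f"loss: {loss.item():.4f}")
```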
But if you have an AMD GPU, it's not as easy to have your stuff run really, really fast using PyTorch and stuff. And so a lot of people were saying that it doesn't matter what anyone else does in terms of chips: if they don't have CUDA, you know, it's game over. And there are, I think, two big
assaults on that. The first is that you're seeing the rise of these even more high-level frameworks for expressing highly parallelized programming. MLX is one. There's another one called Triton.
And these are gaining momentum. For them, CUDA is just one target: you can write your stuff in MLX and basically run it on an NVIDIA GPU really, really fast. But you could also make
another compilation target of MLX that could run on a completely different chip, like the Trainium chip, you know, that Amazon is making internally. And it's also a very high-level language. So maybe, instead of targeting CUDA, you should target MLX or Triton. And then you can still run it using CUDA, but you could also run it using these other things. And then you're not locked into using the really expensive NVIDIA chips.
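As an illustration of the kind of higher-level kernel language he's describing, here is the standard Triton vector-add pattern, adapted from Triton's own tutorial material; the point is that the kernel targets Triton's abstractions rather than raw CUDA.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(4096, 1024),)                     # enough blocks to cover the data
add_kernel[grid](x, y, out, 4096, BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```

Whether MLX or Triton actually grows mature backends for every competing chip is the open question; the sketch only shows what "writing against the abstraction" looks like.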
So that's one assault. And then the other one, I think, is this idea that, and I haven't heard a lot of people talk about this, but one thing I'll say: I use LLMs all the time for programming, and they're just stunningly good at that now. And what they're really, really good at is, if you already have a working prototype
of code in Python or JavaScript or whatever, so they can really understand what it is you're trying to do, they're unbelievably good at porting that to another language. So if you have this Python algorithm and you want to turn it into Rust or Golang, they do that unbelievably well. Maybe not on the first shot, but, you know, with a couple of iterations, you can get it all working. And so what that made me realize is that, you know,
part of the CUDA thing is that it's become a lingua franca: everyone who's good at this kind of programming knows it. And so they think in terms of CUDA concepts; it's just the fastest way for them to express these algorithms.
And so I was thinking that they could write their code in CUDA like they normally do. But then instead of running it on an NVIDIA GPU, they could use it almost as what's called a specification language, where it's just for documenting the algorithm in a very efficient, elegant way.
And then they could feed that into an LLM and say: all right, now port this into this other framework, which will work really well with, you know, AMD GPUs, or with, you know, Cerebras or something. And I think you really explained this well in the article, when you illustrated that there's like a job market for CUDA engineers. Yeah.
And it's insular from the rest of, you know, the engineering jobs out there. It's very special. There's this own independent vertical of a job market, and the cost for these engineers. And the way that you illustrate it in the article is like, well, those walls break down, and all of a sudden there's just not really the same monopoly around...
No, no, it's not that. I think they'll still use CUDA, but the question is: can they use CUDA, but then not use an NVIDIA GPU? Right. Which is where NVIDIA's moat gets at least part of its value from. Yeah. And now, you did bring up a point that DeepSeek in a sense is software, because by writing smarter training software, they did reduce the demand. But I'd say that's sort of
separate, kind of orthogonal, if you will, to this other stuff. Again, even if you took away the DeepSeek part of it, you can still see the big threats to the moat, software and hardware both. Now, let me just say, right before we started talking, somebody said: here's why this thesis of yours is all wrong. And they're saying that, well, the problem is that TSMC, which is Taiwan Semiconductor, which builds all these chips,
and they're basically the only ones who can do it. I mean, they're not the only ones, because Samsung can also make pretty good chips, but for the most part, yeah, they make all the NVIDIA stuff and most of the Apple stuff. And, by the way, I want to point out again: yes, it would be best to do something in a four-nanometer process node, which is the smallest you can do.
But you know, you could use a bigger, older process node, and your chips won't be as fast and they won't be as energy efficient. But you've got a lot of wiggle room, because you just don't need it to be as good. You just need it to be cheap. But anyway, the objection to my thesis is that these guys are booked solid. Even if you came to them with, you know, giant bags of money, they're booked solid. And the reason is because
the manufacturer is booked solid. They're backed up. They have too many orders. Yeah. For the next couple of years, they don't care how much money you give them, because they're all booked solid and they can't just instantly make a new fab. Although I will say, you know, Taiwan Semi built a fab in Arizona, and there's all this stuff about, oh, it's taking them so long and they can't hire good people. But you know what? They finally did get it all up and running, and
they could literally, if there was enough money in it to do it, just copy-paste the blueprints, get another big chunk of land, and replicate what they just did, and they could do that. It wouldn't take that long. So in any case, that's the objection. So even if everything I said is true,
these companies, Cerebras and Groq, and the hyperscalers like Amazon and Google and, blah, blah, blah, Meta, won't even be able to make these chips in enough volume to put a dent in Nvidia. And my response to that is: okay, your analysis is essentially conceding that this is a highly transitory circumstance here, that they're just very temporarily
going to have this advantage, and then as soon as the additional capacity comes online or opens up, then there's going to be this massive flood of alternative supply, which is going to pressure market share.
But even if the pie grows, the market share is going to go down. But most importantly, there's some stuff that has nothing to do with technology that's just basic economic, industrial finance thinking about how do markets work.
And the difference between having basically a monopoly and having even one or two competitors is that the margins really can fall quickly. It's like, you know, if you have two office buildings that are both 98% occupied, nobody's in a race to the bottom to cut rents. But if both of them start losing tenants,
and, you know, every day that goes by with a floor empty, they're just losing money, then there's a race to the bottom, and there's this critical threshold where, you know, once, let's say, the occupancy rate in a market for office space
dips below, let's say, 80%, it's very nonlinear: if occupancy falls another 5%, rents are going to fall a hell of a lot more than 5% to make the market clear. And I think you'll see that the margins can fall very, very quickly once there are real competitors. And then the question is, okay, again, this is not about technology. This is about: how do you rationally value a stock?
And I mean, one of my favorites... I mentioned in my piece that, you know, I once won a prize from this Value Investors Club website for a short idea. This was more than 10 years ago, but I'll quickly tell you the story of it, because I think it's so relevant here. This was a company called PetroLogistics, PDH was the ticker. And they were a company that just had a single plant that took propane and turned it into propylene.
Through this, you know, random thing, basically because the shale play happened, and I don't have to get into all the details. Suffice it to say, they were earning an unbelievably high spread, much, much higher than historical, or than what they ever expected to earn when they built the plant. They were earning so much that their profit in one year from running this plant was like 80% of the cost of building a new plant. And it's not like
rocket science to build one of these plants. You can just go to a big construction company like Bechtel and say: I want a conversion plant for propane to propylene. And they have off-the-shelf blueprints; they'll make it for you, guaranteed, in a couple of years. And sure enough, this company was earning these high returns, and people were putting a big multiple on the earnings, because they're like, look at this, the earnings have gone up so much.
But you could tell that all these other plants were already under construction, and you actually knew approximately when those plants would come online. And so you could basically figure out: all right, even if I grant you that they're going to continue earning these massive margins,
it's going to start stepping down in like a year. And then in 18 months, it's really going to step down. And in 24 months, it's going to be right back to normal. So if I want to value this as, let's say, the present value of the future cash flows, discounted because of the time value of money, I can do that. I can say: big, big profits this year, a little bit less profits next year, and then after that, normal profits.
And add up the discounted cash flows, and you realize you can't put a big multiple on earnings that are not sustainable. And right now, if you tell me that, oh, well, you're wrong because Nvidia is going to keep earning these huge profits
for the next two or three years, it's like, dude, you're putting a 30-40x multiple on that. That's essentially implying that it's going to sustain at this rate, like, indefinitely. And that's just not how you should think about the value of a stock.
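A minimal sketch of the discounted-cash-flow logic he's describing; every input below (the discount rate, the earnings path fading back to normal) is an illustrative assumption, not a forecast.

```python
# Toy DCF: discount a fading earnings stream and back out the implied multiple.
discount_rate = 0.10
peak_earnings = 100.0                                 # normalized peak-year earnings
earnings_path = [100.0, 70.0, 40.0] + [25.0] * 17     # 20 years, stepping down to "normal"

present_value = sum(
    e / (1 + discount_rate) ** (t + 1) for t, e in enumerate(earnings_path)
)
print(f"PV of the fading stream: {present_value:.0f}")                             # ~330
print(f"Implied multiple on peak earnings: {present_value / peak_earnings:.1f}x")  # ~3.3x
# A 30-40x multiple on peak earnings is only justified if the peak sustains.
```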
This is really why I wanted to say, with a lot of the Jevons stuff: yeah, I am bullish on the aggregate. The amount of total demand for inference is going to skyrocket. The pie is going to grow. That's a totally separate question from: will NVIDIA be able to continue growing revenues triple-digit percentages year over year
at these insanely high margins. That's a completely separate bet. And you need to answer that question if you want to feel comfortable putting such a high multiple on that earnings stream. You have to know that it's going to sustain. And it seems actually quite likely that it won't sustain.
I do want to dive headfirst into the DeepSeek efficiency gains part of this conversation, because I think that's kind of where we should go next. One thing that you wrote in your article, you said: the sum total of all of these innovations, these are innovations referring to the lab that made DeepSeek, when layered together, has led to the 45x efficiency improvement numbers that have been tossed around online,
and I'm perfectly willing to believe that these are in the right ballpark. Maybe you can just explain the significance of this new ChatGPT-like model, DeepSeek, and how it got to be 45x more efficient, and what that 45x efficiency means when it comes to the industries that form the supply chain for creating and using these models.
Sure. So look, I mean, it's funny. In the West, we have this sort of resource curse, almost, where we have too much money. It's easier to just throw money at a problem than to try to be really clever. And so, you know, the joke, or the sort of parallel I make, is when you look at people's houses in Saudi Arabia, they're not very energy efficient. And that's because
they get subsidized power, because they have unlimited energy there. And so there's no point in wasting all this extra construction cost on double-pane glass and blah, blah, blah. And it's a similar thing at Meta and Google: they just have so much operating cash flow hitting, you know, every quarter, that it's like, fuck it, let's just hire more. Money's not an object? Yeah, let's pay our people $5 million a year,
or whatever, a million a year, and let's just send over to Jensen another $3 billion. Whereas China, they're not getting paid that much, that's for sure. And they do have these export controls. Now, I know a lot of people say, oh, they're smuggling them in through Singapore. I'm sure that's happening, but... Smuggling chips. First of all,
Under Biden, they basically have a slightly crippled version of the NVIDIA GPU just for the China market or export market. That's not as good as the H100.
But then also, what people point to, which I think makes a lot of sense, is that something between 15 and 20% of NVIDIA's revenue comes from the tiny nation-state of Singapore. It's like, really, they're using that many GPUs there? And it's like, because everyone knows that they're somehow getting laundered and smuggled into
China. And so the question is, we don't even know how many NVIDIA GPUs are in China. And so we don't really know how many DeepSeek used. But the point is, they don't have as many as we do, and it's not as easy for them to get them. And so the punchline you're making is like that Tony Stark Iron Man meme: Tony Stark was able to build this in a cave. Exactly. And that's China. They don't have an abundance of chips. They have plenty of capital, but
they don't have the ability to... and by the way, that's a whole other story, but they poached some of the smartest guys at TSMC to make their national champion, SMIC or whatever it's called, and they're obviously not there yet, but they made a pretty good Huawei
CPU. And I wouldn't be surprised if... I mean, that's the other giant wild card that nobody's really taking into account. Don't count them out. They got some of the smartest people from Taiwan Semi over there, and they'll buy the machines from ASML too. But anyway,
what I wanted to say is that for their engineers, necessity is the mother of invention. But also, you know, in the West, we tend to have this sort of bifurcation in the market where you're either in the AI research track, in which case you have a PhD and you've written these papers and you're the guy who does stuff on the whiteboard or whatever,
and often these people are not very good engineers. There's a joke that these researchers are actually horrible at programming: they're good at math, horrible at writing optimized code. It's obviously not universally true; some people are great at both. So what happens is usually the researchers think at this high level, and then they make a prototype,
and then they hand it over to these people who are more engineers, high-performance optimization guys, people like John Carmack or Jeff Dean at Google, who are not going to invent the new optimizer or, you know, some new loss function for AI models. But if you give them an algorithm, they know how to make it run really fast
you know, on a computer. And so the way we do it in the West is this sort of two-step process, where the researchers design the thing and prototype it, then hand it off to the engineering department that says: all right, we have this algorithm. How can we make it go fast?
The DeepSeek guys are unbelievable at both, so instead of having it be two teams working one after the other, they kind of inverted it. They started out with: let's start first with how can we saturate every ounce of performance on these GPUs, so that nothing is wasted.
Because it almost doesn't matter how fast the GPU can calculate: if it's waiting to get the data it needs to do the calculation, it's just sitting there idle. And there's a lot of this interconnect, a lot of the GPUs talking to each other. And so normally you have to dedicate a big chunk of your processing power just to handling that communication overhead.
So they did a lot of really clever work making the communication stuff as efficient as possible, so there's very little overhead. They basically started with, rather than saying, how do I make this algorithm go fast, they said: how can I really run these GPUs as hard as I can, and then design a smart training system based on that? So they sort of inverted things. And so there's just this collection of optimization tricks. And by the way, I want to point out that many of these ideas were not invented by them.
Many of them were actually published by American and other researchers, like Noam Shazeer, who just got rehired by Google for a zillion dollars. They bought his startup just to get him, because he's like that smart. But it's implementing them in a clever way. And so I'll just give you a couple of examples of this. So, you know,
this whole ChatGPT thing really exploded because there's this model design called the Transformer, which came out in 2017. It's probably the most cited paper in history now. It's called "Attention Is All You Need." And this kind of combined the sort of regular neural nets that
we've been using for a while with something called the attention mechanism, which is this very clever way of contextualizing the information, so that instead of always processing it the same way, it depends on its context. And you automatically learn how to think about that context.
And storing all that data while you're training is one of the major things that uses up memory. And the memory is very important, because you can't use the system memory on a computer. You have to do everything in what's called the VRAM, the very fast memory on the GPU itself.
And that's pretty limited. And so if you can save on the amount of memory you're using, that's huge, because not only can you do more with fewer GPUs, but you're also not transferring as much data, because it's just smaller. And so anyway, there are these KV, key-value, caches and indices that you need to keep in memory while you're training a transformer model.
And they came up with this incredibly smart... I mean, this is probably the coolest thing in the whole paper, the DeepSeek V3 technical paper: they realized that it's very wasteful how it's done normally, that you're storing way more data than you need to, and that only some very small subset of that data is actually meaningful.
And in fact, by storing more than you need to, you're almost overfitting to noise, and it's not necessary. And so... Maybe a simple way to explain this for listeners who want some extra help is that it's closer to how your brain works with attention: when you're applying attention somewhere, you're not thinking about every single thing under the sun all at once. You're kind of focusing on what's necessary. And I think maybe you can't go too far with the
anthropomorphizing; attention in this context means a very specific thing, so I don't think it's going to help people. Maybe, and I can't remember if I heard this in your article, maybe a different one, but it's like if a house has, you know, 20 different rooms and the lights are on in every single room, even though a person is only in one room. And this new model only keeps the lights on for the specific room that the person is in at that one given time.
It's some loose, broad-strokes pattern like that. It's basically, instead of just naively storing this massive amount of key-value data... it shows you, if you have the word "job,"
it's very different if you say "nice job" versus "I just got a new job" or "are you going to be able to handle that job for me." And so the word job has a certain representation in the model, but that representation has to be altered depending on its context. That's what attention is about.
And that means that for every word, every token, you really have to store lots of different things depending on the context. And that's why it takes up so much memory. And they were able to store that in a very efficient way, basically by just storing this sort of subset of the data in a compressed representation.
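Some back-of-the-envelope arithmetic makes the memory pressure concrete. The shapes below are illustrative placeholders, not DeepSeek's actual configuration, and the 8x compression factor is likewise an assumption.

```python
# Rough KV-cache sizing for a transformer with a long context.
n_layers, n_heads, head_dim = 60, 64, 128
seq_len, bytes_per_value = 32_768, 2          # fp16/bf16 storage

# Standard attention caches a key AND a value vector per token, layer, and head:
kv_bytes = 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_value
print(f"Naive KV cache per sequence: {kv_bytes / 1e9:.1f} GB")      # ~64 GB

# If a compressed latent representation needs, say, only 1/8th of that:
print(f"Compressed latent cache:     {kv_bytes / 8 / 1e9:.1f} GB")  # ~8 GB
```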
So that's one thing they did that saved a lot of memory. Another thing they did that's very smart is what's called multi-token prediction. So usually these models predict the next token, the next word basically, based on the preceding tokens or words, one at a time. And this is this bottleneck. And they're like, well,
What if we tried to do, let's say two or three at a time? Now, the problem with that is you can't really predict the second token without knowing what the next token is. And so how can you start with the second token until you know the first token, but you can do what's called a speculative
decoding. And so, but you're, especially decoding might be wrong, in which case you wasted your time computing that second token. But what they did is they got very good at guessing what that second one would be, such that 95% of the time they get it right.
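Here is a hedged sketch of the speculative decoding loop being described. The `draft_model` and `full_model` objects and their methods are stand-ins for whatever cheap guesser and full network get paired up, not a real library API.

```python
# Sketch of greedy speculative decoding: a cheap draft guesses the next
# few tokens, the full model checks them in one pass, and every guess
# that matches is accepted essentially for free.
def speculative_step(full_model, draft_model, context, k=2):
    """Propose k tokens cheaply, then keep the longest verified prefix."""
    # 1. Cheap draft: guess the next k tokens one after another.
    guesses, ctx = [], list(context)
    for _ in range(k):
        t = draft_model.next_token(ctx)
        guesses.append(t)
        ctx.append(t)

    # 2. The big model scores all k positions in a single forward pass.
    #    That is the whole win: one pass instead of k sequential ones.
    checks = full_model.next_tokens_batch(context, guesses)

    # 3. Accept guesses until the first disagreement; the big model's
    #    token at the mismatch position is still usable output.
    accepted = []
    for guess, check in zip(guesses, checks):
        if guess != check:
            accepted.append(check)
            break
        accepted.append(guess)
    return context + accepted
```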
And basically just from that, you can roughly double your throughput on inference. By the way, that's part of the reason they're able to charge so little for their API: this is all about inference cost. They said that one trick let them almost double throughput at no additional cost. So that's a very clever trick. And then they did another very clever trick with the parameters. These models are basically just a gigantic list of numbers, called the parameters of the model, and they figured out how to store those parameters in a much more compressed form. Normally these models are trained using more precision. You can think of it almost as more decimal places of accuracy; that's not actually how it works, but it's close enough to understand conceptually. And then often, once they've trained the model that way, to make it run on a cheaper GPU they do what's called quantization, where they truncate and round off the numbers a bit. But that does hurt the quality, the intelligence, of the model.
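For contrast, here is a minimal sketch of that conventional approach: train in full precision, then quantize the finished weights in a separate lossy step. The int8 scheme below is illustrative only; the point is that the rounding happens after training, which is what costs model quality.

```python
# Minimal sketch of conventional post-training quantization:
# train in float32, then round weights onto a small integer grid.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights onto 256 integer levels with one scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)   # the lossy rounding step
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, s = quantize_int8(w)
print("memory: 4 bytes -> 1 byte per weight")
print("max rounding error:", np.abs(w - dequantize(q, s)).max())
```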
What the DeepSeek guys did is, instead of having to train at a higher precision and then quantize to a lower precision at the end, they figured out how to mostly do the entire process end to end using the smaller representation. And again, it's one of these things where the efficiency gains pay for themselves many times over: not only do you use less memory, but the calculations go faster, and you don't need as much inter-GPU communication because there's less data. These efficiency gains pay off in multiple different ways.
So that's another thing they did. There's this whole laundry list of little tricks and optimizations that, when you add them all together, aren't additive; they're multiplicative. If this thing doubles it, and this one increases it by 40%, and this one doubles it also, you're multiplying those multipliers, if you will. And that's how you can get to a very big number like 45 times, which, by the way, we don't really know for sure. They could have lied about the number of GPU hours they used. One thing is clear, though: they are charging 95% less for inference. Either they're losing money on that, or they really can do at least the inference part much cheaper than we can here in the West.
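The compounding is easy to see with toy numbers. The individual factors below are made up; only the multiplication is the point.

```python
# Independent efficiency gains compound multiplicatively, not additively.
# All factors here are hypothetical illustrations.
gains = {
    "compressed KV cache": 2.0,
    "multi-token prediction": 1.9,
    "low-precision training": 2.0,
    "other kernel / communication tricks": 3.0,
}
total = 1.0
for name, factor in gains.items():
    total *= factor
print(f"combined speedup: {total:.0f}x")  # 2 * 1.9 * 2 * 3 ~ 23x
```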
Yeah. That 95% less money for inference, I think, is really the shocking number that's sending companies like Meta and OpenAI, like Sam Altman, reeling. No, actually, Meta was up, I think. Look, on the one hand it's bad for Meta in that they've spent so many billions of dollars on GPUs, and they pay so much money to their team to come up with the Llama models and so on. It does make them look a little foolish when these guys are able to beat them at their own game on a shoestring. But at the same time, what they really care about is how much it costs them to serve AI to their billions of users around the world.
So it's actually good for them if they can cut their costs 95%. That's great. Who it's bad for is OpenAI and Anthropic, because it's going to put more pressure on their pricing, and they're not in good shape right now. Right now OpenAI charges a fortune for the o1 model API, and even GPT-4o is much more expensive. So they're probably going to have to respond by cutting their API prices significantly. And that's where they get their profit from, right? Well, they don't actually have profits. Both companies are deeply unprofitable at the consolidated level, and I actually suspect that even at the incremental marginal level they're not all that profitable, because they're prioritizing revenue growth above pretty much everything else.
I don't think it's a case where they lose money on every unit sold on the margin. Any fast-growing company is going to post consolidated losses, just because they're always spending on growth and on new models.
So the real question is: if OpenAI and Anthropic completely stopped trying to do R&D and make new models, and just tried to milk the business they have now for money, would they be able to eke out a profit? I think the answer is probably yes. But if they have to cut their pricing by 80%, then it's very unclear. So that's where it starts to get pretty real.

The Arbitrum Portal is your one-stop hub for entering the Ethereum ecosystem.
With over 800 apps, Arbitrum offers something for everyone. Dive into the epicenter of DeFi, where advanced trading, lending, and staking platforms are redefining how we interact with money. Explore Arbitrum's rapidly growing gaming hub, from immersive role-playing games and fast-paced fantasy MMOs to casual luck-battle mobile games. Move assets effortlessly between chains and access the ecosystem with ease via Arbitrum's expansive network of bridges and on-ramps. Step into Arbitrum's flourishing NFT and creator space, where artists, collectors, and socials converge, and support your favorite streamers, all on chain. Find new and trending apps and learn how to earn rewards across the Arbitrum ecosystem with limited-time campaigns from your favorite projects. Empower your future with Arbitrum. Visit portal.arbitrum.io to find out what's next on your web3 journey.
Celo is transitioning from a mobile-first, EVM-compatible Layer 1 blockchain to a high-performance Ethereum Layer 2 built on the OP Stack, with EigenDA and one-block finality, all happening soon with a hard fork. With over 600 million total transactions, 12 million weekly transactions, and 750,000 daily active users, Celo's meteoric rise would place it among the top Layer 2s, built for the real world and optimized for fast, low-cost global payments. As the home of the stablecoins, Celo hosts 13 native stablecoins across seven different currencies, including native USDT on Opera MiniPay, with over 4 million users in Africa alone. In November, stablecoin volumes hit $6.8 billion, made for seamless on-chain FX trading. Plus, users can pay gas with ERC20 tokens like USDT and USDC and send crypto to phone numbers in seconds. But why should you care about Celo's transition to a Layer 2? Layer 2s unify Ethereum; L1s fragment it. By becoming a Layer 2, Celo leads the way for other EVM-compatible Layer 1s to follow. Follow Celo on X and witness the great Celo Halvening, where Celo cuts its inflation in half as it enters its Layer 2 era while continuing its environmental leadership.

So, Jeff, I just want to zoom out and sum everything up. We have this new DeepSeek model, which is 45 times more efficient than ChatGPT and other competing models. That's caused a repricing in Nvidia, because people think: wow, 45 times more efficient.
We just need much less hardware to make that outcome happen; we're getting more from less hardware. So maybe we've been overpricing the hardware, and that's what has shocked the market into repricing Nvidia. And now OpenAI and Sam Altman are getting squeezed because DeepSeek is charging 95% less for inference requests. But my broad question to you is: isn't this the expected outcome? AI technology is on a very steep curve, and we're seeing breakthrough efficiency gains across the complete tech stack, whether it's hardware or the models. We've always known AI was going to accelerate very quickly, and isn't this just what that looks like? Of course we're going to get more efficient; that's how technology works. Why is everyone surprised?
I mean, it's clearly not the expected outcome, because otherwise the stock wouldn't have moved so much. It was the expected outcome for me, which is why I wrote my article. But I think the answer is that everyone does expect progress: progress on the hardware front, where every year the chips get faster and bigger, and progress on the algorithmic front, where you come up with a better way to train the models or do inference that makes things faster. When these LLMs first really came out a couple of years ago, they had a much more limited context window, the amount of text you could put into them. That has gone up dramatically. Originally everyone thought it would be really hard to increase, because they assumed it would dramatically increase the amount of memory required. But people came up with really brilliant inventions, new algorithms, that made it faster. So people do expect some level of algorithmic improvement and some level of hardware improvement every year.
But they expect it to be a Moore's Law type of progression, somewhat predictable. What really catches people off guard are step-function changes that happen overnight. If the news had been that they tripled efficiency, I mean, can you imagine if you made an air conditioner that was three times more energy efficient? You'd crush the competition; you'd get huge market share. Tripling something is a big deal in any normal industry. If you had a car with triple the mileage, that would do great. But we've become so used to that pace in technology that it takes more to shock us. 45 times, though? Okay, now we're talking. That's really crazy. And when that happens overnight, in a way people didn't anticipate, that's when you get this kind of shock.
There's this expression: priced to perfection. Nvidia's share price only looked reasonable to people who extrapolated these curves out, and you have to be very careful when you extrapolate revenue growth that's been running at 120% year over year. Again, it's not just the revenues, it's the margins. They were basically assuming the margins would hold and the revenues would keep growing at this incredible rate, and as a result, every single investment bank basically had a strong buy on Nvidia.
All of them got caught completely offsides by this thing. Honestly, they were all scrambling to read my article. I got inbound requests from some investment banks asking for help, because nobody even wants to talk to their analysts about this; they want to talk to experts. So they're scrambling to find experts. Not that I'm even an expert, but compared to the equity analysts on the sell side, apparently I am.
So it was not expected at all that a step-function change like this would happen. That's what makes it a body blow to the stock: this thing was priced for clear skies, and all of a sudden it's, oh, there actually are these threats. And again, it's not just DeepSeek; people were ignoring a lot of other threats too. I don't know why, because these are literally people whose full-time job is to cover Nvidia for Goldman Sachs and Morgan Stanley, and I don't know what the hell they were doing. How come they weren't talking about the competitive threats to CUDA from the likes of Cerebras and Groq? Maybe they mentioned them, but they certainly didn't figure out that this was actually going to be really important.
And with this step-function change, it's not just a step-function improvement, because it's also an improvement in a slightly different direction than what the market was expecting, correct? We aren't just skipping ahead; we're also going in a different direction. Well, it's additive to everything else. You're still going to have faster chips next year. You're still going to have more chips next year. You're still going to have other algorithmic improvements on the margin. But on top of that, now every big AI lab in the world, the Llama team at Meta, the Anthropic guys, is going to adopt this. You'd better believe Zuck has brought these guys into his office and said, we need to use every one of the tricks these guys are using for Llama 4.
Yeah. So as a consumer of AI products, if you're not exposed to Nvidia, if you don't hold OpenAI equity in the private markets, if you are just a consumer, you're stoked. The products coming down the pipeline are going to be sick in very short order. And not only that: you'll be able to run this shit on your own computer. You get a $1,000 Mac laptop and you're going to be able to have something like AGI on your computer, privately. It's the most miraculous thing ever; no one would have believed this even a few years ago. Is that why Apple is up on the week? I think I saw Apple up three or four percent while Nvidia was down 20%.
Apple is one of the interesting ones here, actually. It's so funny, because Amazon and Microsoft and OpenAI are all trumpeting big press releases about the custom chips they're making. Apple is so different, so secretive. They have one of the best silicon teams in the world, but they only announce something when they're ready to sell it to consumers. If they're making chips internally for their own use, no one even knows about it, and all the people who do know are signed up with NDAs and don't talk. For all we know, they have pretty fucking awesome chips already.
But they're essentially users of AI, so this is good for them. It means they'll be able to use some of these tricks to run some of these models. In fact, there's an app, I think it's called Apollo, on the App Store that lets you download these models, and if you have something like an iPhone 16 Pro, you can just run them on the phone. You could be on an airplane with no internet, or in a bunker somewhere, and have, not quite AGI on tap, but certainly something smarter than most college students on a lot of topics. And it's wild to watch it go. You could go into airplane mode and ask it all these questions about chemistry and physics and history, and it'll give you really good responses at a reasonable pace. So yeah, it's good for Apple. And I think it's ultimately good for Meta, which is why Meta's stock wasn't down.
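As a rough illustration of what running a model fully offline looks like, here is a sketch using the llama-cpp-python library. The model filename is a placeholder assumption; any GGUF-format open-weights model downloaded ahead of time would work the same way.

```python
# Sketch of fully local inference with llama-cpp-python.
# The .gguf filename is a placeholder for a model you downloaded in advance.
from llama_cpp import Llama

llm = Llama(model_path="local-open-model.gguf", n_ctx=4096)

# No network calls happen here: weights and inference are both on-device.
out = llm("Explain why the sky is blue in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```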
It's not a bad thing; it's a recalibration. But I do think it was excessive that the whole $2 trillion of capital got wiped out in one day. I'm not saying you should be buying the dip in Nvidia, though, because I think it did get ahead of itself, and it could still fall. Look, it could fall to two trillion, and two trillion is still a lot of money, okay? This is a company that earned something like five billion dollars a few years ago, so that's still quite a big valuation.
Jeffrey, there's one last conversation before I let you go: synthetic data. This, I think, comes from having stronger and better models, which creates this notion of synthetic data, and it's also part of the equation in how people are rebalancing the way they value things. Can you walk us through the synthetic data conversation? What is synthetic data? What do different, stronger models have to do with it? And what does it mean for the overall supply chain of AI?
Well, I'm not so sure how much it applies to those particular things, but I think it's an important concept. When you're training these models, the pre-training that actually makes the model smart is partly a function of how much compute you apply, how many GPUs you have and how fast they are. But it's also the amount and quality of the data you're training on. When DeepSeek says they used 15 trillion tokens in their training set, that's what they're talking about. And the thing is, there's only so much data out there of high enough quality that you'd even want to use it to train a model. If you take all of Wikipedia, I don't know how many tokens that is, but it's not that many; it's measured in the low single-digit billions, maybe. If you take all the books out there, we're really talking just a couple trillion. And if you take all the newspapers that have ever been written,
it's a couple trillion. But what you're saying is that the quality data out there is a finite, exhaustible amount. No, I'm saying we're running out of data. We're running out of data, yes. People aren't writing smart books fast enough to just keep supplying us with more and more data. So that's a big wall we've been facing: how are we going to keep improving the models if we can't scale up the data they're trained on? And people say, oh, but you could just take every YouTube video. Have you seen most YouTube videos? That's not going to make your model smarter; it's going to make it dumber. But there is an exception to this rule. Synthetic data is using an LLM to generate text
and then turning around and training a new model on that text. That sounds very circular, like me trying to teach myself in a room without a book or anything, just talking to myself; how is that supposed to work in terms of getting new information? Isn't that, in a sense, getting high on your own supply, so it's not going to help you? And that's sort of true if you're talking about, say, the history of the Peloponnesian War; you're not going to get anything new by regurgitating your own output. The exception to all that is when you're talking about logic, math, and
computer programs, because in those domains the big difference is that you can verify that what you said is correct. The rules of chess are very simple, but the space of possible chess games is almost unlimited in complexity. It's the same thing here: there are so many possible simple Python programs of a hundred lines or less that we've only ever seen a tiny subset of them. So you could say, I want a Python program that does x, y, z; generate a candidate; then test it: when I run it, do I get the expected output? If you do, you know the program is right, so now you can add it to the training set. It wasn't in the training set originally, but it's correct and good. In the same way, you could start exploring the world of all possible math theorems, working out the proofs, verifying that they're right, and adding them to the training set. That way you can generate lots of data that's known to be super high quality. And that's why these models are getting better at logic and math at a much faster rate than they're getting better at anything else: you can just keep cranking out this synthetic training data.
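A hedged sketch of that generate-and-verify loop: propose a program, run it against a known input/output pair, and keep only the verified results as new training data. The `model.generate` call is a stand-in for whatever LLM API you use; the verification step is the real point.

```python
# Sketch of mining verified synthetic training data for code.
import os
import subprocess
import tempfile

def verify(program_text: str, stdin_data: str, expected_output: str) -> bool:
    """Run the candidate program in a subprocess and compare its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program_text)
        path = f.name
    try:
        result = subprocess.run(
            ["python", path], input=stdin_data,
            capture_output=True, text=True, timeout=5,
        )
        return result.stdout.strip() == expected_output.strip()
    finally:
        os.unlink(path)

training_set = []

def mine_synthetic_example(model, task, stdin_data, expected_output):
    # model.generate is a stand-in, not a real library call.
    candidate = model.generate(f"Write a Python program that {task}")
    if verify(candidate, stdin_data, expected_output):
        training_set.append((task, candidate))   # provably correct new data
```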
And then the scaling can just keep going. So it's sort of funny that the rich, quantitative jobs are the ones most at risk from AI. A lot of people thought, well, you're still going to need people for the really, really smart quantitative stuff. I've got news for you: that's the thing AI is going to become superhuman at before anything else. You're still going to want to read the history book written by a really smart human before you read the AI's history book. But the AI mathematician might be pretty good two years from now.
Jeffrey, we've talked heavily about your article, which I'll have linked in the show notes so people can go read it firsthand. But tell us a little more about you: where you come from, what you do, and what else you're working on.
Sure. So in my day job for the last couple of years, I'm the founder and CEO of Pastel Network, which is a crypto project. PSL is our ticker; we trade on a few exchanges like MEXC and Gate. We started out as sort of an NFT platform. It's an interesting project, based on the Bitcoin Core proof-of-work concept but with additional layers on top. In the last year, though, we've done a big pivot to decentralized AI inference, and I've written a tremendous amount of code to essentially let you do inference across all sorts of modalities and all sorts of providers of
AI models, including totally uncensored models. And you don't have to dox yourself by giving an email address, a credit card, and your IP address; you can just pay with crypto, and it's all pseudonymous. And it's decentralized: all the inference is handled by these supernodes that anyone can spin up and run themselves. The example I like to joke about is that you can use one of the uncensored versions of the Llama models and ask, how do I make meth at home? And it'll actually just tell you the recipe, whereas good luck trying that on ChatGPT or Claude. Is this what you call the sovereign AI sector?
Yeah, but it's really about being decentralized. Part of the thinking, for me, was not necessarily the consumer level, like ChatGPT, although I did make something like that; if you go to inference.pastel.network, you can try it all in a browser and do inference across all these models. But it's also meant to be an API. Say you have another crypto project, like a prediction market, and you want anyone to be able to create their own prediction event in a decentralized way, but you want some rules around that. You don't want people making assassination markets, where they're predicting that somebody is going to die by a certain date. So you need some kind of moderation.
But you don't necessarily want a human moderator who has the power to delete things, because how is that decentralized? I think the better way to implement something like that is to have an LLM do it in a totally impartial way. You have a prompt that says: you're not allowed to create an event involving any of these subjects. Then when a user wants to create a prediction event, they have to describe what's being predicted, and at the moment they try to create the event in the system, it shows the description to an LLM. The LLM says yes or no, and based on that the system says, no, you can't do this, you have to change it.
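In code, the moderation flow he describes might look something like the sketch below. The `llm` callable stands in for a decentralized inference endpoint, and the policy wording is invented for illustration.

```python
# Sketch of impartial LLM-based moderation for prediction-market events.
# `llm` is a stand-in for any text-completion endpoint.
POLICY_PROMPT = """You are an impartial moderator for a prediction market.
Reject any event involving assassination, violence against a named person,
or other prohibited subjects. Answer with exactly YES (allowed) or NO."""

def moderate_event(llm, event_description: str) -> bool:
    answer = llm(f"{POLICY_PROMPT}\n\n"
                 f"Proposed event: {event_description}\nAllowed?")
    return answer.strip().upper().startswith("YES")

# Usage idea: the market only creates the event if moderate_event(...)
# returns True, so no human moderator holds unilateral delete powers.
```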
Now, if you have a prediction market that's decentralized, you can't really go and use OpenAI or Claude for this, because that requires an API key hooked up to a credit card, and that's not decentralized. It can't work like that; it has to actually be decentralized all the way down. So that's the idea: they could use Pastel and say with a straight face, in all honesty, that this is decentralized right down the line, and that it can never be shut down by just turning off one API key or credit card. So that's the basic idea. And then I have some other side projects, like my YouTube transcript optimizer, which is the site where I published the article. People are very confused: why is this not on Medium or Substack? And I'm like, sorry.
It's so funny, because I was basically trying to help the organic search ranking of my little YouTube tool, which has generated maybe $1,000 of revenue, and in the process I may have inadvertently contributed to $2 trillion getting wiped off global equity markets. Look, I don't want people to say I'm some megalomaniac here, but the fact of the matter is that all the news headlines came out saying the stock market crashed because of DeepSeek. And I'd like to point out that the DeepSeek V3 technical paper, the one that covered the efficiency gains, came out December 27th. That's a month ago. All these famous people like Andrej Karpathy were all over it, talking about it weeks ago on Twitter. Even the newer model, the R1 model that does the chain of thought: that paper came out a week ago, and people were all over that too. So why did everything suddenly crash on Monday? I'd like to think, in fact I'm pretty sure, it's because I wrote this article in a way that speaks to hedge fund managers, so they can understand it, and I published it in the middle of the night on Friday, and then it started taking off.
And then it got shared by Chamath, who has, whatever, 1.8 million followers, and his post has been viewed over two million times. And by Naval, whose account has two and a half million. And then Gary Tan and the Y Combinator account; between them they have millions of followers. And not only did they share it, they were very effusive in their praise: this is really smart. And that went crazy. I can tell you I have been inundated with requests from huge funds that wanted to talk to me about this. And I believe that it did, in fact, as crazy as it sounds, precipitate the decline. Obviously I didn't cause it; it was caused by the underlying situation. But in terms of highlighting it,
it didn't come from the investment banks. And I think part of the problem is just that people are talking in different circles. The people buying Nvidia with billions of dollars at a big fund are not reading the technical papers, and they're not even necessarily reading the tweets from Andrej Karpathy; they're relying on this consensus of where things are going. All it took was a really in-depth explanation that made sense to them, and they were like, holy shit, I didn't know this. One other funny thing: because it's running on my own blog, I have Google Analytics, so I can see in real time, not who's reading it, but where they are.
And it's so funny, because when it started going viral, at first I was thrilled that 50 people were reading it at once. Then before I knew it, it was 1,500 people at any given moment. And it's a 60-minute read, 12,000 words, so it's not short. At first it was mostly people in New York, because that's where all the hedge funds are. But then I noticed, right before I fell asleep on Saturday night, that the biggest place where people were reading it was San Jose. And I thought, huh, that sounds like where Nvidia is based. There were hundreds of people from San Jose reading the thing at the same time. As of yesterday, when I last checked, over 2,000 people from San Jose had read my article. And the funny thing about Nvidia is that the stock has gone up so much that something like 80% of the employees have more than $10 million in the stock. It's the main thing they talk about with their spouses and friends: man, I have a lot of this stock, should I keep staying on for this ride?
They understand the technology, but maybe they don't understand how to value a company, and they read this, and it started spreading around like wildfire, and I was like, oh my god, I bet Jensen's reading this too. And I think there's a lot of stock that never hit the market because it was awarded to these people as RSUs and options, and it only takes a little bit of that on the margin to start causing imbalances. So I wouldn't be surprised if a lot of the selling pressure came from Nvidia employees. But also, these big hedge funds control a lot of fast money, and those players suddenly got spooked. So it's wild to think that it could actually have been the Reichstag fire, if you will, that set off
this whole course of events. I'm sure there are people who'll say, no, this other guy wrote this and that other guy wrote that. And yeah, but my thing went pretty freaking viral. And other people's stuff cited yours, your article. Sure, although maybe not all of them. I saw Ben Thompson from Stratechery; it sort of sounded like he paraphrased my thing without giving me any credit, but whatever. But I just think it's really funny that there are headline stories today from The New York Times and The Wall Street Journal that both said,
well, they're always trying to assign causality to things, and they said the crash was caused by DeepSeek itself. And I'm like, not really, because that's a ludicrous explanation: the 45x efficiency case was known a month ago, so you have to explain the one-month lag. Whereas this is very understandable: the article spread like wildfire from thought leaders like Chamath and Naval. Naval is put on such a pedestal by the VC guys, and the tech hedge fund guys look up to the VC guys; Naval is one of those guys. And the Y Combinator guys, they're the experts, right? So when you have those guys saying this is a great article, that can very quickly convince people. And it's not like you have to convince everyone; you just have to convince the guys at, say, Coatue, who manage about $70 billion, that they should maybe sell a little to get in front of this. That's all you need. So anyway, I emailed both of the journalists to say, at the least, you should be aware that you may have gotten the causality a little wrong on this. But anyway,
Well, Jeffrey, it's an honor to have the original source of the information on the podcast. It was great to have you as a guest. And as these AI wars and Nvidia chip wars progress, we didn't even get a chance to talk about USA versus China, maybe we can get you back on to keep commentating. Yeah, I really appreciate you coming on. Thanks a lot.
Bankless Nation, you guys know the deal. Crypto is risky; you can lose what you put in. But it sounds like the traditional markets are risky, too. We're headed west. This is the frontier. It's not for everyone, but we're glad you're with us on the Bankless journey. Thanks a lot.