The Memory Wall: How RAM Became AI's Biggest Bottleneck
A Forward Thesis Deep Dive

Executive Summary
The explosion of AI has exposed a critical challenge that threatens to slow its advance: memory constraints. While processors keep getting faster, the ability to feed them data efficiently hasn't kept pace.
This "memory wall" is reshaping the semiconductor industry and creating opportunities for new approaches to AI computing.
This analysis explores how the memory bottleneck is driving changes across the AI hardware landscape, from traditional DRAM to specialized AI accelerators.

Picture this: you're trying to read a book where you can only see one word at a time through a tiny window, and you have to walk across the room to fetch each new word. That's essentially the position modern AI chips are in - they can process data incredibly quickly, but getting that data from memory to the processing cores has become the main performance bottleneck.
This wasn't always such a pressing issue. Traditional computing tasks like running your email client or web browser don't require moving massive amounts of data around constantly. But AI workloads, particularly large language models, are different. They need to juggle enormous amounts of data simultaneously - model parameters, intermediate calculations, and input/output data all need to be readily accessible.
To put this in perspective: GPT-4 reportedly requires more than 1 TB of memory just to run inference (serving an already-trained model), and training such models demands even more memory capacity and bandwidth. This creates a fundamental challenge: how do you move data to increasingly powerful AI processors quickly enough to keep them busy?
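To get a feel for the arithmetic, here is a rough back-of-the-envelope sketch (all numbers are illustrative assumptions, not measurements). When each generated token requires streaming the model's weights from memory once, memory bandwidth alone sets a hard ceiling on decode speed:

```python
def memory_bound_tokens_per_sec(param_count: float,
                                bytes_per_param: float,
                                bandwidth_gb_s: float) -> float:
    """Upper bound on autoregressive decode speed when every token
    requires reading all model weights from memory once.
    All inputs here are illustrative assumptions."""
    bytes_per_token = param_count * bytes_per_param
    bytes_per_second = bandwidth_gb_s * 1e9
    return bytes_per_second / bytes_per_token

# Hypothetical example: a 70B-parameter model held in 16-bit precision,
# served from memory offering roughly 3 TB/s of bandwidth.
print(memory_bound_tokens_per_sec(70e9, 2, 3_000))  # ~21 tokens/sec per sequence
```

On this kind of back-of-the-envelope model, adding more raw compute doesn't raise the ceiling at all - only more bandwidth, or moving fewer bytes per token, does.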

To understand why this is such a challenge, it helps to look at how memory works in modern computers.
There's a hierarchy of memory types, each with different trade-offs between speed, capacity, and cost:
On-chip memory (SRAM): Lightning fast but tiny and expensive. This is like having a few notes on your desk - instantly accessible but limited space.
High Bandwidth Memory (HBM): Very fast and close to the processor, but still limited and expensive. Think of this as a small bookshelf within arm's reach.
System DRAM: Larger capacity but slower. Like having a library in the next room - more space but takes time to access.
Storage (SSDs/HDDs): Huge capacity but very slow. Equivalent to having to drive to a warehouse to get information.
The closer memory is to the processor, the faster and more expensive it becomes. This worked fine for traditional computing where data access patterns were more predictable and localized. AI has broken this model.
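To make the hierarchy concrete, the sketch below estimates how long one full pass over a large set of model weights would take from each tier. The bandwidth figures are rough order-of-magnitude assumptions chosen for illustration, not specifications of any particular part:

```python
# Assumed order-of-magnitude bandwidths (GB/s) per memory tier.
# Real devices vary widely; these values are for illustration only.
TIERS = {
    "On-chip SRAM": 10_000,
    "HBM stack":     1_000,
    "System DRAM":     100,
    "NVMe SSD":          5,
}

def time_to_stream(working_set_gb: float) -> dict:
    """Seconds to read a working set once from each memory tier."""
    return {tier: working_set_gb / bw for tier, bw in TIERS.items()}

# Example: a 140 GB working set (roughly a 70B-parameter model at 16-bit).
# Capacity also shrinks sharply as you move up the hierarchy - on-chip
# SRAM could never actually hold a working set this large - so the point
# here is bandwidth, not feasibility.
for tier, seconds in time_to_stream(140).items():
    print(f"{tier:>13}: {seconds:8.3f} s per full pass")
```

The gap between tiers is measured in orders of magnitude, which is why where data lives matters as much as how fast the processor is.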

High Bandwidth Memory has become the crucial enabling technology for AI accelerators. It's essentially DRAM dies stacked vertically and placed right next to the processor, with the layers wired together by thousands of tiny vertical connections called through-silicon vias (TSVs) and the stack linked to the processor across a silicon interposer. This delivers massive bandwidth while keeping the physical distance data has to travel very short.
The latest HBM3E can deliver over 1 TB/s of bandwidth per stack. But this comes at a steep cost: the latest versions reportedly account for over 50% of the total cost of high-end AI chips like Nvidia's H100, largely because manufacturing these stacked memory chips is incredibly complex and prone to yield issues.
Samsung, SK Hynix, and Micron are the only companies capable of manufacturing HBM at scale, creating intense competition for supply. This has driven prices up while limiting availability, becoming a key constraint on AI chip production.

This memory bottleneck has sparked innovation across the industry. Companies are taking several approaches to tackle the problem:
Processing-In-Memory (PIM)
Companies like d-Matrix are developing chips that combine memory and processing in the same device. This eliminates the need to move data back and forth, but comes with its own challenges in terms of programming complexity and scaling to larger workloads.
Their latest chip claims 20x faster inference than Nvidia's H100 for certain workloads - but only when staying within on-chip memory. Performance drops dramatically when they need to access external memory, highlighting how crucial the memory bottleneck has become.
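A simple model illustrates why performance collapses once a workload spills out of fast on-chip memory: effective bandwidth behaves like a weighted harmonic mean of the fast and slow tiers, so even a small fraction of off-chip traffic drags the average down sharply. The figures below are generic assumptions, not numbers for any particular vendor's chip:

```python
def effective_bandwidth(on_chip_gb_s: float,
                        off_chip_gb_s: float,
                        off_chip_fraction: float) -> float:
    """Effective bandwidth when some fraction of bytes must come from
    off-chip memory (a weighted harmonic mean of the two tiers)."""
    on_chip_fraction = 1.0 - off_chip_fraction
    seconds_per_byte = (on_chip_fraction / on_chip_gb_s
                        + off_chip_fraction / off_chip_gb_s)
    return 1.0 / seconds_per_byte

# Assumed tiers: 10 TB/s on-chip SRAM, 0.5 TB/s external memory.
for frac in (0.0, 0.05, 0.20, 0.50):
    print(f"{frac:4.0%} off-chip -> {effective_bandwidth(10_000, 500, frac):6.0f} GB/s")
```

With these assumed numbers, sending just 5% of traffic off-chip roughly halves effective bandwidth, and 20% cuts it by nearly a factor of five.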
Specialized Inference Architectures
Several startups are developing chips specifically optimized for AI inference, taking various approaches to the memory challenge:
Groq has designed a chip with massive on-chip memory and a unique architecture that can process tokens much faster than traditional GPUs for certain workloads. They recently demonstrated running Llama-70B by linking 576 of their chips together.
Tenstorrent is taking a hybrid approach, including small RISC-V cores alongside their AI processing units to better manage memory access patterns. This provides more flexibility in how workloads are structured.
FuriosaAI has developed what they call "tensor contraction processing" - essentially processing data in its native multi-dimensional form rather than converting everything to 2D matrices like most AI accelerators do.
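As a generic illustration of that idea (and not a description of Furiosa's actual hardware), the same computation can be written either as a direct contraction over a multi-dimensional tensor or as the flatten-to-2D matrix multiply most accelerators lower it to. The math is identical; what changes is the implied data layout and movement:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.standard_normal((8, 16, 32, 64))   # (batch, heads, tokens, dim)
weights = rng.standard_normal((64, 128))              # (dim, out_dim)

# Native tensor contraction: contract the last axis of the 4-D tensor directly.
native = np.einsum("bhtd,do->bhto", activations, weights)

# The usual lowering: flatten to a 2-D matrix, multiply, then reshape back.
lowered = (activations.reshape(-1, 64) @ weights).reshape(8, 16, 32, 128)

print(np.allclose(native, lowered))  # True: same result, different data movement
```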
Memory-Semantic Processors
Some companies are exploring chips that treat memory and processing as a single unified resource rather than separate systems. This is still an emerging approach, but it could potentially offer better efficiency for AI workloads.

The memory wall isn't just a technical challenge - it's reshaping the competitive landscape of the AI hardware industry.
Some key implications:
Memory Manufacturers Gaining Power
The limited number of companies capable of producing HBM gives them significant leverage. This has driven consolidation in the memory industry and is leading to higher margins than memory makers have historically enjoyed.
Specialized AI Chips Becoming Viable
The memory bottleneck creates an opening for specialized AI chips that can better manage memory access. While Nvidia still dominates training, the inference market may be more contestable.
System-Level Innovation
Companies are increasingly competing on system-level solutions rather than just raw chip performance. This includes innovations in packaging, cooling, and interconnects between chips.

The memory wall isn't going away anytime soon. While new memory technologies are in development (like next-gen HBM4), fundamental physics makes it difficult to keep reducing the energy and time cost of moving data around.
This suggests several trends to watch:
Continued innovation in chip architectures that minimize data movement
Growth in specialized chips optimized for specific AI workloads
Increasing importance of software that can efficiently manage memory access patterns (a simple batching sketch follows this list)
Potential for new memory technologies to emerge that better balance the competing demands of AI workloads
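As one example of that software lever, batching requests lets the same weights be reused across many tokens in a single pass, raising arithmetic intensity (FLOPs performed per byte moved) until a workload crosses from memory-bound to compute-bound. The hardware numbers in this sketch are assumptions chosen purely for illustration:

```python
def arithmetic_intensity(batch_size: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs per byte for one decode step: each parameter is
    read once (bytes_per_param bytes) and used in ~2 FLOPs (multiply + add)
    per sequence in the batch. KV-cache traffic is ignored for simplicity."""
    return 2.0 * batch_size / bytes_per_param

# Assumed accelerator: 1000 TFLOP/s of compute and 3 TB/s of bandwidth,
# giving a "ridge point" of ~333 FLOPs/byte above which compute dominates.
RIDGE = 1000e12 / 3e12

for batch in (1, 8, 64, 512):
    ai = arithmetic_intensity(batch)
    bound = "compute-bound" if ai >= RIDGE else "memory-bound"
    print(f"batch {batch:4d}: ~{ai:6.1f} FLOPs/byte -> {bound}")
```

This is the intuition behind techniques like continuous batching: the hardware hasn't changed, but smarter scheduling moves far fewer bytes per useful FLOP.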
The companies that best navigate these challenges while delivering practical solutions that developers can actually use will be best positioned to capture value in the evolving AI hardware landscape.
Conclusion
The memory wall represents both a crucial challenge for the AI industry and a significant opportunity for innovation. While traditional architectures struggle with the massive memory demands of modern AI, new approaches are emerging that could help break through this bottleneck.
For investors and industry participants, understanding these dynamics will be crucial for evaluating opportunities in the AI hardware space.
But that’s why you read The Forward Thesis.
Until next time.
Forward Thesis provides detailed analysis of technology markets and emerging opportunities. None of the content written by The Forward Thesis should be taken as financial advice. This deep dive is part of our ongoing coverage of the AI semiconductor sector and its market implications.