NVIDIA Vera Rubin Sparks Memory Demand: Analyzing the Strengths and Weaknesses of SK Hynix, Samsung, Micron, and SanDisk

At CES 2026, NVIDIA CEO Jensen Huang officially announced that Vera Rubin has entered mass production, marking a critical turning point in the history of artificial intelligence (AI): from the early days of generative AI centered on model training to an era dominated by agentic AI and large-scale inference.

(Jensen Huang's CES keynote sets the tone for 2026: Vera Rubin in full-scale production, AI autonomous vehicles to launch in Q1, key process nodes from TSMC)

This report examines how this technological shift is reshaping the hardware layer of data centers, in particular the new G3.5 storage tier and the Inference Context Memory Storage (ICMS) platform. Against this backdrop, the world’s four major memory and storage giants—SK Hynix, Samsung Electronics, Micron Technology, and SanDisk—face unprecedented opportunities and challenges.

What are HBM, DRAM, NAND? Memory terminology explained

Before diving into the main content, let’s clarify these terms with simple descriptions:

Layman’s explanation of memory terms: HBM (including HBM3E, HBM4, HBM5)

HBM stands for High Bandwidth Memory. Think of it as stacking many layers of DRAM chips like a multi-layer cake, connected to the GPU via a very wide, very fast data highway, enabling super-fast data transfer.

HBM3E: Currently the main version, used in the latest generation of GPUs, offering high speed and relatively low power consumption.

HBM4: The next generation, designed for more powerful GPUs like Vera Rubin, with higher bandwidth and larger capacity.

HBM5: The upcoming (planned) generation, expected to further increase speed and capacity, preparing for larger future models.

Multiple HBM stacks are packaged right next to the Rubin GPU, allowing it to access data at ultra-high speed. The core compute for AI training and inference relies entirely on HBM to keep it fed with data, making HBM the star of the current AI server supply shortage. Manufacturers are shifting large amounts of production capacity to HBM, leaving other memory types in tight supply. In the Vera Rubin era, HBM is the most critical component of all.
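For a rough sense of what the wide interface buys, the back-of-envelope sketch below estimates per-stack bandwidth from bus width and per-pin data rate; the HBM3E and HBM4 figures used are approximate, publicly quoted values (they are not from this article and vary by vendor and speed grade).

```python
# Back-of-envelope per-stack bandwidth for HBM generations.
# Bus widths and pin rates are approximate public figures and vary by vendor/speed grade.

def stack_bandwidth_tb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in TB/s (decimal units)."""
    bits_per_second = bus_width_bits * pin_rate_gbps * 1e9
    return bits_per_second / 8 / 1e12  # bits -> bytes -> terabytes

hbm3e = stack_bandwidth_tb_s(bus_width_bits=1024, pin_rate_gbps=9.6)  # ~1.2 TB/s
hbm4 = stack_bandwidth_tb_s(bus_width_bits=2048, pin_rate_gbps=8.0)   # ~2.0 TB/s

print(f"HBM3E, per stack: ~{hbm3e:.1f} TB/s")
print(f"HBM4, per stack:  ~{hbm4:.1f} TB/s")
print(f"Eight HBM4 stacks on one GPU: ~{8 * hbm4:.0f} TB/s aggregate")
```

At these assumed figures, one HBM4 stack roughly doubles the bandwidth of an HBM3E stack, which is why Rubin-class GPUs are paired with HBM4.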

Layman’s explanation of memory terms: SSD

An SSD is like a huge USB flash drive used for long-term data storage, retaining information even when powered off. The files, videos, and games on a computer live on SSDs (or traditional hard drives). In the Vera Rubin era, to let AI chatbots remember vast amounts of text, dialogue history, and knowledge, Vera Rubin needs to connect to many SSDs that act as a massive data library. Citi estimates that a Vera Rubin server requires about 1,152TB of SSDs (the equivalent of 1,152 one-terabyte drives) to operate the new ICMS system.

Previously, SSDs were more like supporting actors in data warehouses; now, in ICMS/long-context inference, they play a crucial role.

Layman’s explanation of memory terms: NAND

The material inside SSDs that actually stores data is called NAND flash memory. Think of it as the pages in a book; SSDs are like bookshelves, and NAND chips are the individual pages. Vera Rubin’s ICMS uses many SSDs, which are filled with NAND chips, so AI models require a large amount of NAND. As AI models grow bigger and dialogue memories become longer, more NAND is needed to store text and intermediate results.

Layman’s explanation of memory terms: DRAM

DRAM is like short-term memory whiteboards. When a computer performs calculations, it writes data on DRAM; when powered off, the whiteboard is wiped clean. It’s much faster than SSD but forgets everything when shut down. In Vera Rubin, DRAM serves as the working space for CPUs and GPUs during general computations. It doesn’t store long-term conversations or huge models directly but supports system operation. However, because manufacturers have shifted capacity to HBM, the supply of regular DRAM has decreased, causing prices to surge and shortages to occur.

Layman’s explanation of memory terms: LPDDR5X / DDR5

DDR5 is the main memory used in servers and desktops, faster than the older DDR4.

LPDDR5X is a power-efficient version designed for mobile devices or high-density CPU modules, akin to “energy-saving DRAM.”

The Vera CPU requires a large amount of LPDDR5X or DDR5 as system memory for control, scheduling, and housekeeping tasks. This memory is not attached directly to the GPUs like HBM, but it is fundamental to the stable operation of AI servers. Because capacity has been diverted to HBM, the supply of DDR5 / LPDDR5X is tight and prices are rising.

Layman’s explanation of memory terms: High Bandwidth Flash (HBF)

HBF can be thought of as a speed-enhanced NAND, aiming to make flash memory faster and more memory-like. Compared to standard SSDs, HBF emphasizes “high throughput and low latency,” enabling AI inference to read and write large amounts of context quickly.

In Vera Rubin, HBF is one of the core components of ICMS: storing large KV caches and long-context data on this high-speed flash, using network technologies (like RDMA) to allow GPUs to access data at near-memory speeds. This is the G3.5 concept. It elevates flash from just storage to an external memory that can participate in computation workflows.

Vera Rubin Generation: Fundamental Reconfiguration of Hardware Architecture

Extreme Co-design and Rack-scale Computing

At CES 2026, Jensen Huang revealed a core philosophy: in the Rubin generation, the unit of computation is no longer just a single GPU or server but an entire data center rack. The Rubin platform consists of six core chips: Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet Switch.

This strategy, called extreme co-design, aims to eliminate communication bottlenecks between chips, transforming the Vera Rubin NVL72 rack into a single supercomputer with 3.6 ExaFLOPS inference compute power and 75TB of high-speed memory.

This architectural evolution is not merely about stacking performance but is driven by the fundamental change in AI workloads. From Blackwell to Rubin, AI models have evolved from simple question-answering machines to intelligent agents capable of multi-step reasoning, long-term memory retrieval, and tool use. These workloads demand hardware that not only offers high throughput but also extremely low latency and vast context retention capabilities.

Acquisition of Groq and the Inference Revolution: Defensive Mergers and the ASIC Era

By the end of 2025, NVIDIA acquired AI chip startup Groq through a $20 billion talent acquisition and technology licensing deal. Groq’s core architecture, LPU (Language Processing Unit), is essentially an ASIC optimized for Transformer models. Unlike traditional GPUs relying on HBM, Groq uses on-chip SRAM and a compiler-first design.

In real-time interaction scenarios, this architecture can deliver token generation speeds 10 times those of traditional GPUs, with 10 times the energy efficiency. NVIDIA’s goal is to pair the CUDA ecosystem with the low-latency inference at which Groq’s LPU excels. Cloud giants like Google (TPU) and Amazon (Inferentia) have already demonstrated the huge cost advantage of dedicated inference chips, so NVIDIA must leverage Groq’s technology to defend its position.

The Context Wall Challenge

In long-context inference, the Key-Value (KV) cache is the mechanism AI models use to remember dialogue history. As the context window expands to millions of tokens, the size of the KV cache grows linearly with it, quickly exhausting the expensive, capacity-limited GPU HBM. When HBM is full, data is evicted to system DRAM or local SSDs. This leads to a KV-cache crisis: GPUs frequently sit idle while waiting for historical data to be fetched back.
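To make the scaling concrete, the sketch below estimates KV-cache size with the standard formula (two tensors per layer, times layers, KV heads, head dimension, bytes per element, and tokens); the model dimensions used are illustrative assumptions, not the specifications of any particular model.

```python
# Back-of-envelope KV-cache size for long-context inference.
# The model dimensions below are illustrative assumptions, not any specific model's specs.

def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size in GB: one key and one value tensor per layer, per token."""
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens
    return total_bytes / 1e9

# Hypothetical large model: 80 layers, 64 KV heads, head dimension 128, FP16 (2 bytes).
for tokens in (128_000, 1_000_000):
    size = kv_cache_gb(tokens, layers=80, kv_heads=64, head_dim=128)
    print(f"{tokens:>9,} tokens -> ~{size:,.0f} GB of KV cache per request")
```

At these assumed dimensions, even a single 128K-token request already rivals or exceeds the HBM attached to one GPU, which is exactly why the cache spills to DRAM and SSD as contexts lengthen.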

G3.5 Layer: Inference Context Memory Storage Platform (ICMS)

In the Vera Rubin architecture, the most disruptive and far-reaching change for the memory industry is the emergence of the G3.5 memory tier, i.e., the Inference Context Memory Storage (ICMS) platform. This innovation is not just an upgrade; it marks the arrival of the context-aware computing era.

ICMS uses BlueField-4 DPU and Spectrum-X Ethernet to establish a shared, flash-based buffer pool at the rack (pod) level. This G3.5 layer sits between DRAM and traditional storage, utilizing RDMA (Remote Direct Memory Access) technology, enabling GPUs to access remote Flash KV Cache at near-memory speeds.
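As a purely conceptual sketch (the class and tier names below are hypothetical and do not represent NVIDIA’s actual ICMS software interface), the tiering logic the article describes looks roughly like this: check HBM first, fall back to the rack-level flash pool, and recompute only on a full miss.

```python
# Conceptual sketch of a G3.5-style tiered KV-cache lookup.
# Class and method names are hypothetical; this is not NVIDIA's ICMS API.
from typing import Optional

class TieredKVCache:
    def __init__(self, hbm_capacity_blocks: int):
        self.hbm: dict[str, bytes] = {}          # fast, capacity-limited tier
        self.flash_pool: dict[str, bytes] = {}   # rack-level flash tier (ICMS-like)
        self.hbm_capacity = hbm_capacity_blocks

    def put(self, block_id: str, data: bytes) -> None:
        # Evict the oldest HBM block to the flash pool when HBM is full.
        if len(self.hbm) >= self.hbm_capacity:
            victim_id, victim = next(iter(self.hbm.items()))
            del self.hbm[victim_id]
            self.flash_pool[victim_id] = victim
        self.hbm[block_id] = data

    def get(self, block_id: str) -> Optional[bytes]:
        if block_id in self.hbm:                 # HBM hit: near-zero latency
            return self.hbm[block_id]
        if block_id in self.flash_pool:          # flash hit: RDMA-style remote fetch
            data = self.flash_pool[block_id]
            self.put(block_id, data)             # promote back into HBM
            return data
        return None                              # miss: caller must recompute

cache = TieredKVCache(hbm_capacity_blocks=2)
cache.put("turn-1", b"kv-block-1")
cache.put("turn-2", b"kv-block-2")
cache.put("turn-3", b"kv-block-3")              # forces turn-1 out to the flash tier
assert cache.get("turn-1") == b"kv-block-1"     # served from flash, then promoted
```

The point of the G3.5 tier is that the second branch, the flash hit, becomes fast enough (via RDMA over Spectrum-X) that GPUs rarely need to take the expensive recompute path.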

Forcing New Technical Standards (HBF & AI-SSD)

To enable NAND flash to handle high-intensity, near-memory workloads, the industry is accelerating technological iterations, changing the roadmap of major memory manufacturers.

High Bandwidth Flash (HBF): To pursue higher bandwidth, SK Hynix and SanDisk are developing HBF. This is a 3D-stacked technology similar to HBM but using NAND wafers, aiming to provide several times the throughput of traditional SSDs, specifically for AI inference.

AI-specific SSD (AI-NP): SK Hynix is working closely with NVIDIA to develop AI-NP SSDs capable of 100 million IOPS. This performance is 100 times that of current top-tier SSDs, designed to meet the extreme demands of ICMS for random read speeds, ensuring data can be fed to GPUs instantly.

The G3.5 ICMS layer is a critical bridge extending the AI value chain from expensive HBM down to NAND flash. It solves the pain point of AI agents needing infinite memory for complex tasks, transforming NAND from a cyclical storage commodity into an indispensable core resource in AI computing infrastructure.

Rubin NVL72 Storage Inflation Effect

According to analyses by Citi and other market research firms, the ICMS in the Vera Rubin architecture has an explosive demand for NAND. Besides standard storage, ICMS driven by BlueField-4 adds about 16TB of high-speed NAND flash per GPU. For a fully loaded NVL72 rack with 72 GPUs, this means an additional 1,152TB (about 1.15PB) of NAND demand.

If 100,000 such racks are deployed globally by 2026, they will generate over 115 exabytes of additional NAND demand, equivalent to about 12% of total NAND supply in 2025. This demand is not only massive but also highly performance-critical, leading to market shortages of enterprise SSDs and triggering a seller-dominated super cycle.
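The quoted figures reduce to straightforward multiplication; the short sketch below reproduces the arithmetic, treating Citi’s 16TB-per-GPU estimate and the 100,000-rack deployment scenario as the article’s own assumptions.

```python
# Reproducing the article's NAND-demand arithmetic; all inputs are the article's own estimates.
ICMS_NAND_PER_GPU_TB = 16        # extra high-speed NAND per GPU (Citi estimate)
GPUS_PER_NVL72_RACK = 72
RACKS_DEPLOYED_2026 = 100_000    # hypothetical global deployment assumed in the text

per_rack_tb = ICMS_NAND_PER_GPU_TB * GPUS_PER_NVL72_RACK   # 1,152 TB, about 1.15 PB
global_eb = per_rack_tb * RACKS_DEPLOYED_2026 / 1e6        # 1 EB = 1,000,000 TB

print(f"Per NVL72 rack: {per_rack_tb:,} TB (~{per_rack_tb / 1000:.2f} PB)")
print(f"100,000 racks:  ~{global_eb:.0f} EB of additional NAND demand")
```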

This architectural revolution has pushed the memory market into a “triple super cycle” (DRAM price hikes, NAND shortages, HBM sell-outs). Below is an in-depth competitive analysis of the four major players:

SK Hynix: The AI architecture designer

Position

Absolute leader in the HBM market (roughly 50-60% HBM3/3E market share) and a core partner of NVIDIA.

Advantages

HBM4 monopoly: Estimated to account for over 70% of initial HBM4 orders for the Vera Rubin platform, with its 2026 capacity already announced as sold out.

HBF Standard Setting: Collaborating with SanDisk to promote High Bandwidth Flash (HBF), aiming to elevate NAND to near-memory levels.

AI-NP SSD: Developing ultra-high-performance SSDs with 100 million IOPS specifically for ICMS.

Disadvantages

SK Hynix is currently the biggest beneficiary of the AI super cycle, with HBM3E/HBM4 capacity nearly fully booked, but 2026 also carries potential risks of price corrections and intensifying competition. Multiple institutions point out that once HBM supply expands and prices fall after 2026, SK Hynix, the most HBM-dependent of the four, faces the greatest risk of profit erosion.

Samsung Electronics: The empire’s counterattack and capacity advantage

Position

Comprehensive solution provider, a capacity giant.

Advantages

Turnkey HBM4: Offers “memory + logic foundry + packaging” as a one-stop service, highly attractive to customers like Google and Amazon who develop their own chips.

Direct beneficiary of G3.5: As the world’s largest NAND manufacturer, Samsung has the strongest supply capability in enterprise SSDs and CXL memory (PBSSD), able to meet HBM and massive storage demand simultaneously.

Disadvantages

Its HBM technology got a later start, so it must rebuild customer confidence in the Rubin generation; its NAND capacity, while abundant, carries less pricing power than HBM.

Micron Technology: Efficiency and geopolitical benefits

Position

The first choice for U.S. sovereign AI, driven by the dual engines of HBM and NAND.

Advantages

Dual benefits: The only U.S. manufacturer with both HBM3E/4 capacity and advanced enterprise SSDs, able to capitalize on both Rubin GPU memory and ICMS storage layer benefits.

Energy efficiency leader: HBM products claim to be 30% more energy-efficient than competitors, aligning with AI data centers’ extreme TCO requirements.

Geopolitical advantage: As the only memory manufacturer headquartered in the United States, it is the top choice for North American sovereign AI clouds.

Disadvantages

Its total capacity is smaller than that of the Korean giants, so it relies on technology premiums to maintain high margins and cannot compete on price.

SanDisk: Re-evaluating value from storage to computation

Position

The biggest pure beneficiary of the G3.5 layer, transforming into an AI infrastructure stock.

Advantages

Purest G3.5 concept stock: The 1,152TB of NAND demand per Rubin system is pure incremental demand for SanDisk, and its Stargate enterprise SSDs have been certified by major clients.

Business transformation: After spinning off from Western Digital, its strategy shifted entirely to data centers (revenue up 26% annually), shedding consumer-grade baggage.

Explosive pricing potential: Amid supply shortages, enterprise NAND prices could double again, giving SanDisk high profit elasticity.

Disadvantages

It lacks its own wafer fabs and operates on a fabless model, depending on foundry partners, which gives it weaker ability to lock in capacity than IDM manufacturers.

2026 Outlook: Establishing a Memory Seller’s Market

Nomura and Citi both predict a severe supply-demand imbalance in 2026: DRAM revenue is expected to grow 51% year over year, and NAND wafer contract prices could double. Due to cleanroom shortages and HBM’s wafer consumption (producing HBM consumes roughly three times as many wafers as standard DRAM), supply tightness will persist until mid-2027. In this $10 trillion wave of industry modernization, the emergence of the Vera Rubin and ICMS platforms elevates memory manufacturers from supporting roles to leading players.

Looking ahead to 2026–2028, the memory seller’s market will be driven not only by limited HBM expansion and ICMS pressure on enterprise SSDs, but also by a potential accelerant: the commercialization timeline of HBF (stacked high-bandwidth NAND flash) may be pulled forward. The recent consensus in academia and industry is that because HBF can partially reuse the stacking and packaging foundations laid in the HBM era, its adoption could ramp faster than HBM’s did, entering a major acceleration phase around 2027.
