24 March 2026

8 min reading time

$68 billion in quarterly revenue, a chip architecture with 336 billion transistors, and a server rack drawing as much power as 100 single-family homes. Nvidia’s GTC 2026 in San Jose didn’t just unveil new hardware – it reset the coordinates by which IT decision-makers plan data centers, calculate cloud budgets, and draft infrastructure roadmaps.

Jensen Huang spent three and a half hours on stage at the SAP Center. His core message: AI workloads are growing faster than hardware can keep pace. Nvidia’s answer is Vera Rubin – a platform designed to eclipse Blackwell. Alongside it comes the GPU cost pressure that is already forcing IT teams to justify every compute minute. The question is no longer whether Nvidia dominates – but what that dominance means concretely for European cloud strategies.

TL;DR

  • Vera Rubin delivers 50 petaflops per chip – five times the inference performance of Blackwell. A single NVL72 rack achieves 3.6 exaflops (Nvidia Newsroom, March 2026).
  • 120 kW per rack mandates liquid cooling – existing data centers cannot operate Blackwell racks without major retrofitting. Air cooling alone is no longer sufficient.
  • Deutsche Telekom is building Europe’s largest AI factory – 10,000 Blackwell GPUs in Munich, operational in Q1 2026, delivering a 50% increase in Germany’s AI compute capacity (Deutsche Telekom press release).
  • $20 billion Groq deal – Nvidia licenses the startup’s inference chip technology and integrates its leadership team (CNBC, December 2025).
  • AMD’s ROCm has reached 80-90% CUDA parity – competition is intensifying, yet migration remains complex. Multi-vendor strategies are becoming standard.

Vera Rubin: Five Times Faster Than Blackwell

The Vera Rubin platform is Nvidia’s response to the exponential surge in demand for AI inference performance. The Rubin GPU chip contains 336 billion transistors – 1.6× more than its Blackwell predecessor. It leverages HBM4 memory and delivers 22 terabytes per second of bandwidth per GPU.

The underlying Vera CPU features 88 ARM v9.2 cores and communicates with the GPU via NVLink-C2C at 1.8 terabytes per second. Together, they form a fully integrated system delivering 50 petaflops in NVFP4 inference mode.

At rack scale, the numbers become staggering. The Vera Rubin NVL72 – a system comprising 72 Rubin GPUs and 36 Vera CPUs – reaches 3.6 exaflops in FP4 mode. For context: That exceeds the total computing power of the world’s fastest supercomputers three years ago.

  • Transistors: 336 billion in the Rubin GPU – 1.6× more than Blackwell
  • Inference performance: 50 petaflops per chip – 5× faster than the GB200
  • Rack performance: 3.6 exaflops for the Vera Rubin NVL72 (72 GPUs + 36 CPUs)

Source: Nvidia Newsroom, March 2026
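
The rack-level figure follows directly from the per-chip number. A quick sanity check, assuming the 50-petaflop NVFP4 figure scales linearly across all 72 GPUs (interconnect overhead ignored):

```python
# Back-of-the-envelope check on the rack-scale claim, assuming the
# per-chip NVFP4 figure simply scales across the 72 GPUs in a rack.
gpus_per_rack = 72
pflops_per_gpu = 50                   # NVFP4 inference, per Nvidia's figures
rack_exaflops = gpus_per_rack * pflops_per_gpu / 1000
print(f"{rack_exaflops} exaflops per NVL72 rack")   # -> 3.6
```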

Jensen Huang also announced Vera Rubin Ultra – codenamed “Kyber” – slated for 2027. Next on the roadmap is Feynman. The cadence is clear: Nvidia delivers a new architecture every year, not every two years as previously customary.

“Orders for Blackwell and Vera Rubin will reach one trillion dollars through 2027.”

– Jensen Huang, CEO of Nvidia, GTC 2026 keynote, paraphrased (CNBC, March 16, 2026)

Blackwell Ultra: What’s Already Running at Hyperscalers

While Vera Rubin remains a promise, the Blackwell generation has already arrived in data centers. The B300 – also known as Blackwell Ultra – delivers 15 petaflops in dense FP4 mode, ships with 288 GB of HBM3e memory, and carries a 1,400-watt thermal design power (TDP).

Google Cloud already offers A4 and A4X instances powered by B200 and GB200 chips in general availability. AWS has launched EC2 G7e instances with Blackwell GPUs in US East – and signed a deal for over one million Nvidia GPUs by 2027, confirmed by Ian Buck, VP Hyperscale at Nvidia (Reuters, March 2026). Microsoft Azure and Oracle Cloud have likewise announced Blackwell-based systems.

What Blackwell delivers in practice: According to Nvidia-supported benchmarks from SemiAnalysis, the GB200 NVL72 system delivers ten times more tokens per watt than the prior Hopper generation. That translates to one-tenth the cost per token for inference workloads. The upcoming GB300 NVL72 promises another 1.5× efficiency gain – infrastructure teams currently booking Hopper instances will face radically different unit economics within twelve months.

One important caveat: These benchmark figures come from tests co-funded by Nvidia. Independent comparisons in production environments remain pending. The trend direction is clear – but exact savings depend heavily on the specific workload.
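
How a tokens-per-watt ratio turns into cost per token is simple arithmetic. A minimal sketch, with placeholder throughput figures; only the tenfold ratio comes from the benchmarks cited above:

```python
# Illustrative only: how "ten times more tokens per watt" becomes
# "one-tenth the cost per token". The throughput numbers are made-up
# placeholders, not published benchmarks; only the ratio matters.
def energy_cost_per_million_tokens(tokens_per_sec, watts, usd_per_kwh=0.15):
    usd_per_second = watts / 1000 * usd_per_kwh / 3600
    return usd_per_second / tokens_per_sec * 1_000_000

hopper = energy_cost_per_million_tokens(tokens_per_sec=10_000, watts=120_000)
blackwell = energy_cost_per_million_tokens(tokens_per_sec=100_000, watts=120_000)
print(f"${hopper:.2f} vs ${blackwell:.2f} per million tokens (energy only)")
```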

120 Kilowatts Per Rack: The Infrastructure Question No One Wants to Ask

This is where things get uncomfortable for IT leaders. A single GB200 NVL72 rack draws 120-132 kilowatts of sustained power; in HPE’s configuration, liquid cooling removes 115 kW of that heat and air handles the remaining 17 kW. By comparison, an H100 rack consumed 10-15 kW – an increase by a factor of eight to ten.

One hundred such racks require 12 megawatts – equivalent to the electricity consumption of 10,000 households. Existing data centers cannot support this density without major upgrades. Liquid cooling becomes mandatory. Power grid connections become the bottleneck – operators of large AI clusters often wait three to five years for grid capacity.

120 kW – the power draw of a single GB200 NVL72 rack, roughly eight times an H100 rack (Source: Nvidia GB200 NVL72 specifications / Sunbird DCIM)

Nvidia argues on the basis of efficiency per token: a tenth of the energy per processed token compared with the previous generation. That’s true – but only if total capacity doesn’t scale proportionally. If enterprises simultaneously run more models across more GPUs, absolute consumption still rises.

For European IT decision-makers, this means: Anyone planning to run AI workloads on-premises or in colocation over the next two years must resolve physical infrastructure questions now. Power contracts, cooling architecture, and grid connectivity – not GPU availability – are the new bottlenecks.
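
What that planning looks like at its simplest: a rough sizing sketch, where the facility budget and PUE are assumptions to replace with your own figures:

```python
# A rough capacity-planning sketch: how many GB200 NVL72 racks fit a
# given facility power budget. The budget and PUE are assumptions,
# not vendor figures.
import math

facility_budget_kw = 5_000       # hypothetical colocation allotment
rack_kw = 132                    # worst-case NVL72 draw cited above
pue = 1.3                        # assumed power usage effectiveness

racks = math.floor(facility_budget_kw / (rack_kw * pue))
print(f"{racks} racks, {racks * rack_kw:,.0f} kW of IT load")
```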

Deutsche Telekom: 10,000 Blackwell GPUs in Munich

Deutsche Telekom, in partnership with Nvidia, has announced the Industrial AI Cloud – described in its press release as one of Europe’s largest AI factories. Location: Munich. Equipped with more than 1,000 DGX B200 systems and RTX PRO servers, totaling approximately 10,000 Nvidia Blackwell GPUs.

Operations are scheduled to begin in Q1 2026. If timelines hold, this will boost Germany’s AI compute capacity by roughly 50 percent. Target customers are German enterprises seeking to train AI models using their own data – on European servers, under European law.

This isn’t an isolated initiative. At GTC Paris 2025, Nvidia announced strategic partnerships with France, Germany, the UK, Italy, and Spain. Plans call for 20 AI factories across Europe, five of them at gigafactory scale. Collectively, they aim to deliver more than 3,000 exaflops of Nvidia Blackwell compute power for European sovereign-AI initiatives.

For IT teams in the DACH region, this becomes concrete: Those who previously booked GPU capacity from U.S. hyperscalers – and worry about data sovereignty – now have an alternative with the Telekom Cloud, designed to comply with GDPR and the EU AI Act. The open question is whether pricing and availability can match AWS and Google.

DGX Spark: The $4,699 AI Computer

Alongside rack-scale systems, Nvidia unveiled two desktop products designed to bring AI infrastructure from the data center to the desk.

The DGX Spark, priced at $4,699, is built around the GB10 Grace Blackwell Superchip. It offers 128 GB of unified memory, delivers one petaflop in FP4 mode, and can locally execute models with up to 200 billion parameters. Up to four Spark units can be combined into a desktop cluster.
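
The 200-billion-parameter figure is plausible from the memory budget alone. A weight-only estimate, assuming FP4 quantization and ignoring KV cache and activations:

```python
# Why roughly 200 billion parameters fit in 128 GB: a weight-only
# estimate at 4-bit (FP4) precision. KV cache and activations add
# real overhead on top, so this is a lower bound.
params = 200e9
bytes_per_param = 0.5            # 4 bits per weight
weight_gb = params * bytes_per_param / 1e9
print(f"{weight_gb:.0f} GB of weights vs 128 GB unified memory")  # -> 100 GB
```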

The DGX Station goes further: GB300 chip, 784 GB of coherent memory, 20 petaflops FP4. This enables local execution of trillion-parameter models – without any cloud connection. Manufacturers including Dell, HP, and MSI will offer the Station starting in early 2026.

Who benefits? Organizations unable – or unwilling – to send sensitive data to the cloud. Research teams, security departments, compliance-driven industries. The DGX Spark makes local AI inference an investment that fits a departmental budget – not a capital expenditure plan.

At GTC, Jensen Huang explicitly drew the comparison: A $4,699 DGX Spark replaces, for many use cases, a monthly cloud contract costing several thousand dollars. That math works for mid-market firms – especially for teams that work with large language models daily and want to avoid cloud latency. Yet maintenance remains an open question: Who operates the local AI computer? Who updates the models? Who monitors utilization? That infrastructure work previously vanished inside the cloud bill.
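
The comparison is easy to rerun against your own numbers. A minimal break-even sketch, with the monthly cloud spend as an assumed placeholder:

```python
# A hedged version of Huang's comparison. The monthly cloud spend is
# an assumed placeholder, not a quoted price; plug in your own bill.
spark_usd = 4_699
cloud_usd_per_month = 2_000      # hypothetical GPU-instance spend
print(f"Break-even after {spark_usd / cloud_usd_per_month:.1f} months")
```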

Groq Deal: $20 Billion for Inference Chips

In December 2025, Nvidia closed its largest deal to date: For approximately $20 billion, the company licensed Groq’s technology and absorbed its leadership team. Crucially: Nvidia is not acquiring Groq as a company – this is an IP and talent acquisition. Groq continues operating independently under new CEO Simon Edwards.

Groq’s Language Processing Units (LPUs) are chips optimized specifically for AI inference. They process tokens significantly faster than GPUs – a domain where Nvidia’s market share (60-75%) lags well behind its training dominance (>90%).

Jensen Huang stated it plainly: “While we are adding talented employees to our ranks and licensing Groq’s IP, we are not acquiring Groq as a company.” The Groq-3 LPU unveiled at GTC 2026 signals Nvidia’s intent: It won’t serve the inference market solely with GPUs – but will augment them with specialized accelerators.

CUDA vs. ROCm: Is Competition Heating Up?

Nvidia holds roughly 80% of the AI accelerator market. Its moat isn’t hardware – it’s CUDA. This software ecosystem has existed for 17 years and boasts over four million registered developers.

But AMD is catching up. The MI300X offers 192 GB of HBM3 memory – 2.4× more than the H100 – at a 30-50% lower price. According to SemiAnalysis, ROCm 7 achieves 80-90% CUDA parity. The MI350, which AMD slated for the second half of 2025, promises 35× the inference performance of the MI300 series.

Enterprise reality: Full migration away from CUDA is rare. What’s emerging instead are multi-vendor strategies. AMD GPUs for cost-optimized inference; Nvidia for training and complex workloads. Any organization planning cloud infrastructure today should evaluate both options – not out of idealism, but pure cost calculus.
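
One reason multi-vendor setups are workable: mainstream frameworks abstract the vendor away. A minimal PyTorch sketch; ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda interface, so code like this runs unchanged on either stack:

```python
# A minimal portability sketch: PyTorch's ROCm builds expose AMD GPUs
# through the same torch.cuda API, so device-agnostic code like this
# runs unchanged on both vendors' hardware.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)

with torch.no_grad():
    y = model(x)

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # reports Nvidia or AMD device
```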

“Every SaaS company will become an Agent-as-a-Service company.”

– Jensen Huang, GTC 2026 keynote, paraphrased (TechRadar/MSN Liveblog, March 16, 2026)

China Export Dispute: The Geopolitical Dimension

Parallel to its technical offensive, a political tug-of-war is unfolding in Washington over Nvidia chip exports to China. The short version: The Trump administration has permitted H200 sales to approved Chinese customers under strict conditions – capped at 50% of U.S. domestic volume and verified by a U.S.-controlled third-party lab.

The U.S. Senate is pushing back. Senators Elizabeth Warren and Jim Banks introduced a bipartisan bill demanding the suspension of all Nvidia export licenses to China. The House Foreign Affairs Committee is drafting legislation featuring a 30-day review window and a two-year Blackwell export ban.

For European cloud strategies, this matters: If China vanishes – or shrinks – as a market, Nvidia’s focus shifts more decisively toward Western markets, especially Europe. Sovereign-AI initiatives and the Telekom partnership must be read against this geopolitical backdrop.

Market Forecast: $2.5 Trillion in AI Spending in 2026

Gartner’s numbers contextualize what GTC announcements mean globally. Worldwide AI spending is projected to hit $2.52 trillion in 2026, up 44% from 2025. More than half flows into infrastructure: roughly $1.37 trillion for servers, networking, cooling, and power delivery (Gartner, January 2026).

Most striking: AI-optimized Infrastructure-as-a-Service – i.e., cloud GPU capacity – is forecast to more than double, from $18.3 billion in 2025 to $37.5 billion in 2026, a growth rate of 105%. No other cloud segment is expanding remotely this fast.

Simultaneously, Gartner places AI in the “Trough of Disillusionment” for 2026 – the phase in the Hype Cycle where pilot projects fail against reality and enterprises demand practical ROI evidence rather than vision decks. Translation: Investment continues rising – but expectations for measurable outcomes rise in lockstep. For IT budget holders, that’s good news: Investments in GPU infrastructure will be judged on concrete business cases – not hype.

Nvidia’s Q4 earnings report reinforces the trend: $68.1 billion in revenue, of which $62.3 billion came from the datacenter segment – a 75% year-on-year increase. For Q1 of fiscal year 2027, Nvidia forecasts $78 billion. The company is on track to become the first firm generating $300 billion annually solely from datacenter hardware (Nvidia Earnings, February 2026).

What IT Decision-Makers Should Do Now

GTC 2026 delivered a clear message: AI infrastructure is becoming more powerful, more energy-intensive, and more expensive at the physical layer – yet cheaper per processed token. For IT teams, this creates concrete action items.

First: Accelerate energy planning. Anyone planning to deploy Blackwell or Rubin hardware on-premises within the next 18 months needs liquid cooling and power delivery exceeding 100 kW per rack. This is an infrastructure project – not a procurement exercise.

Second: Evaluate multi-vendor options. AMD’s MI300X and MI350 are no longer novelties. For inference workloads with well-defined models, ROCm 7 can work – delivering a 30-50% price advantage. Recommendation: Launch an AMD pilot alongside your Nvidia stack.

Third: Assess sovereign-cloud alternatives. Deutsche Telekom’s Industrial AI Cloud and similar European offerings make local AI processing economically viable for compliance-driven sectors – for the first time. Request competitive quotes before signing your next cloud contract.

Fourth: Extend FinOps to GPU costs. GPU instances often account for 70-80% of cloud bills for AI workloads. Failing to track and optimize them separately means overlooking the largest cost block.
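
In its simplest form, that tracking is a grouping exercise over the billing export. A minimal sketch; the line items and schema below are hypothetical placeholders for whatever your provider’s export delivers:

```python
# A minimal FinOps sketch: splitting a cloud bill into GPU and non-GPU
# spend. The record layout and figures are hypothetical; real billing
# exports (e.g. AWS Cost and Usage Reports) use different schemas.
line_items = [
    {"service": "compute", "sku": "gpu-8x-accelerator", "usd": 8_200.0, "gpu": True},
    {"service": "storage", "sku": "object-standard",    "usd":   310.0, "gpu": False},
    {"service": "compute", "sku": "cpu-general",        "usd": 1_450.0, "gpu": False},
]

gpu_spend = sum(i["usd"] for i in line_items if i["gpu"])
total = sum(i["usd"] for i in line_items)
print(f"GPU share of bill: {gpu_spend / total:.0%}")   # flag and track this
```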

Frequently Asked Questions

What’s the difference between Blackwell and Vera Rubin?

Blackwell is Nvidia’s current GPU generation, available at hyperscalers since 2025. Vera Rubin is its successor platform – featuring 336 billion transistors, HBM4 memory, and five times the inference performance. Vera Rubin is scheduled for availability in the second half of 2026.

How much does a GB200 NVL72 system cost?

Nvidia does not publish an official list price. Cloud providers like Corvex offer GB200 NVL72 capacity starting at approximately $4.49 per hour. A full on-premises system is estimated to cost in the low single-digit millions.

Do I need liquid cooling for Blackwell GPUs?

Yes. A GB200 NVL72 rack draws 120-132 kW. Pure air cooling cannot handle this power density. On-premises Blackwell deployment requires investment in liquid cooling infrastructure.

Is AMD’s MI300X a real alternative to Nvidia?

Yes – for certain inference workloads. AMD offers 192 GB of HBM3 memory at 30-50% lower cost. ROCm 7 achieves 80-90% CUDA parity. For training complex models, Nvidia remains the default choice – for now.

What is Nvidia’s Groq deal?

Nvidia licensed Groq’s inference chip technology and leadership team for approximately $20 billion. Groq continues operating as an independent company. The deal strengthens Nvidia’s position in specialized inference accelerators.

What does the Deutsche Telekom Industrial AI Cloud offer?

Telekom operates ~10,000 Nvidia Blackwell GPUs in Munich as a cloud service. The platform targets German enterprises wanting to train AI models in GDPR-compliant fashion on European servers – without sending data to U.S. hyperscalers.

When will Vera Rubin launch?

Nvidia announced that Rubin-based systems will be available at major cloud providers in the second half of 2026. The Ultra variant, Kyber, is planned for 2027.

Header Image Source: Pexels / Tara Winstead (px:8386440)


A magazine by Evernine Media GmbH