AMD’s AI Strategy: Open Ecosystem, Scalable Hardware, and Developer-Centric Innovation

Executive Summary

In her keynote at the AMD Advancing AI 2025 event on June 12th, 2025, CEO Dr. Lisa Su outlined a comprehensive vision for AMD’s role in the rapidly evolving AI landscape. The presentation emphasized three core strategic pillars:

  1. A broad, heterogeneous compute portfolio spanning CPUs, GPUs, FPGAs, DPUs, and adaptive SoCs, each targeting specific AI workload characteristics.
  2. An open, developer-first ecosystem, centered on ROCm and integration with popular frameworks such as PyTorch, vLLM, and SGLang, a structured-generation language and serving framework for large language models.
  3. Full-stack solutions enabling scalable distributed inference, training, and deployment across edge, cloud, and enterprise environments.

The central thesis is that no single architecture can dominate all AI workloads. Instead, success depends on matching the right compute engine to the use case—while ensuring openness, performance, and interoperability across hardware and software layers.


Three Critical Takeaways

1. ROCm 7: A Maturing Open Software Stack for AI Workloads

Technical Explanation

ROCm 7 represents a significant advancement in performance and usability, particularly targeting inference and training workloads. Key features include:

  • Optimized support for vLLM and SGLang, accelerating large language model (LLM) serving.
  • Implementation of flashAttentionV3, enhancing memory efficiency during attention computations.
  • Improved Pythonic kernel authoring tools and a robust communications stack for distributed systems.
  • Up to 3.5x generation-over-generation performance gains in LLMs such as DeepSeek and Llama 4 Maverick, under mixed precision modes.
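To make the vLLM integration concrete, here is a minimal serving sketch using vLLM's offline API; the same code path is intended to run unchanged on ROCm-enabled GPUs as on CUDA. The model name and sampling settings are illustrative placeholders, not figures from the keynote.

```python
# Minimal vLLM offline-serving sketch; model name and sampling
# parameters are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any supported HF model
params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

prompts = ["Summarize the benefits of an open AI software stack."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```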

Critical Assessment

While NVIDIA’s CUDA remains dominant in GPU computing, AMD’s open, standards-based approach is gaining traction. The reported 40% better token-per-dollar ratio versus closed ecosystems suggests meaningful economic advantages for cloud providers.

However, adoption challenges persist:

  • Ecosystem maturity: ROCm supports major frameworks, but tooling, community resources, and third-party integrations remain less extensive than CUDA’s mature ecosystem.
  • Developer inertia: Porting CUDA-optimized codebases requires significant effort, compounded by a lack of seamless abstraction layers comparable to CUDA Graphs or Nsight tooling.

Competitive/Strategic Context

| Feature | AMD ROCm 7 | NVIDIA CUDA |
| --- | --- | --- |
| Licensing | Fully open source | Proprietary |
| Framework Support | PyTorch, TensorFlow, vLLM, SGLang | Native, highly optimized |
| Performance | Up to 4.2x gen-on-gen improvement | Industry standard, mature optimizations |
| Community Tools | Growing, less mature | Extensive profiling, debugging, and optimization tools |

Quantitative Support

  • Llama 4 Maverick: Achieves three times the tokens per second compared to its prior generation.
  • MI355 GPUs: Deliver up to 40% more tokens per dollar than comparable NVIDIA B200-based solutions.
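The tokens-per-dollar claim is easy to sanity-check with back-of-envelope arithmetic. The throughput and pricing inputs below are hypothetical placeholders, not figures from the keynote; the point is the shape of the calculation.

```python
# Hypothetical tokens-per-dollar comparison; all inputs are illustrative.
def tokens_per_dollar(tokens_per_sec: float, instance_cost_per_hour: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return tokens_per_hour / instance_cost_per_hour

amd = tokens_per_dollar(tokens_per_sec=12_000, instance_cost_per_hour=10.0)
nvidia = tokens_per_dollar(tokens_per_sec=11_000, instance_cost_per_hour=12.8)
print(f"AMD:    {amd:,.0f} tokens/$")
print(f"NVIDIA: {nvidia:,.0f} tokens/$")
print(f"Advantage: {amd / nvidia - 1:.0%}")
```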

2. Ultra Accelerator Link (UALink): Scaling Beyond Rack-Level AI Systems

Technical Explanation

UALink is an open interconnect protocol designed to scale AI systems beyond traditional rack-level limitations. It:

  • Supports up to 1,024 coherently connected accelerators (per the UALink 1.0 specification).
  • Utilizes Ethernet-compatible physical interfaces, enabling cost-effective and widely compatible deployment.
  • Incorporates pod partitioning, network collectives, and resiliency features.
  • Targets both training and distributed inference workloads.

The specification was released by the Ultra Accelerator Link Consortium, which includes major hyperscalers and system integrators.
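A simple cost model helps show why scale-up bandwidth matters for the collectives UALink targets. The sketch below estimates the bandwidth-bound time of a ring all-reduce over a gradient tensor; link speeds and model size are assumptions for illustration, not UALink specifications.

```python
# Bandwidth-bound ring all-reduce estimate: each of N nodes sends and
# receives 2*(N-1)/N of the payload. All numbers are illustrative.
def ring_allreduce_seconds(payload_bytes: float, nodes: int, link_gbytes_per_s: float) -> float:
    traffic = 2 * (nodes - 1) / nodes * payload_bytes
    return traffic / (link_gbytes_per_s * 1e9)

grads = 70e9 * 2  # 70B parameters in BF16 (~140 GB of gradients)
for n in (8, 64, 512):
    t = ring_allreduce_seconds(grads, n, link_gbytes_per_s=200)
    print(f"{n:>4} accelerators: {t:.2f} s per all-reduce")
```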

Critical Assessment

UALink addresses a critical limitation in current AI infrastructure: efficiently scaling beyond tightly coupled racks. Using standardized Ethernet-like signaling promises lower costs and easier integration.

Potential concerns include:

  • Adoption velocity: NVLink and CXL are already entrenched in many leading data centers, posing challenges to UALink’s market penetration.
  • Performance parity: Independent benchmarks and ecosystem maturity are not yet publicly available.

Competitive/Strategic Context

| Interconnect | Vendor Lock-in | Scalability | Bandwidth | Openness |
| --- | --- | --- | --- | --- |
| NVLink | Yes | Rack-scale (up to 72 GPUs per NVLink domain) | Very high | Closed |
| CXL | No (industry-wide) | Moderate | High | Semi-open |
| UALink | No | Up to 1,024 accelerators | High | Fully open |

Quantitative Support

  • Latency reduction: Promises measurable improvements in collective communication primitives crucial for distributed training.
  • Scalability: Designed to scale from small enterprise clusters to gigawatt-scale hyperscale data centers.

3. Agentic AI and the Need for Heterogeneous Compute Orchestration

Technical Explanation

AMD showcased its readiness to support agentic AI, where multiple autonomous agents collaborate to solve complex tasks. This requires:

  • Flexible orchestration between CPUs and GPUs.
  • Efficient memory management for models with billions of parameters.
  • Low-latency interconnects (e.g., UALink) to coordinate agents.
  • Integration with OpenRack infrastructure for modular, scalable deployment.
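A minimal sketch of what heterogeneous CPU/GPU orchestration can look like in practice, assuming a PyTorch-style runtime: lightweight planning steps stay on the CPU while a heavier reasoning model runs on whatever accelerator is present. The models and routing rule are stand-ins, not AMD software.

```python
import torch
import torch.nn as nn

# Stand-in "agents": a tiny planner kept on the CPU and a larger
# reasoning model placed on an accelerator when one is available.
planner = nn.Linear(128, 16)  # cheap routing/planning step
reasoner = nn.Sequential(nn.Linear(128, 4096), nn.ReLU(), nn.Linear(4096, 128))

accel = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm builds also expose "cuda"
reasoner = reasoner.to(accel)

def agent_step(observation: torch.Tensor) -> torch.Tensor:
    plan = planner(observation)            # stays on CPU
    if plan.argmax().item() % 2 == 0:      # toy routing decision
        return reasoner(observation.to(accel)).cpu()
    return observation                     # skip the heavy compute path

print(agent_step(torch.randn(1, 128)).shape)
```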

AMD’s Helios platform, expected in 2026, combines high memory bandwidth, fast interconnects, and OCP compliance to meet these demands.

Critical Assessment

Agentic AI is an emerging frontier that significantly increases architectural complexity. AMD’s heterogeneous compute approach, coupled with open standards, positions it well for this future.

Key challenges include:

  • Software maturity: Coordinating multiple agents across CPUs and GPUs remains an active research area with limited production-ready tooling.
  • Workload portability: Robust abstraction layers and middleware will be essential to support diverse hardware configurations and agent workflows.

Competitive/Strategic Context

| Architecture | Focus | Strengths | Weaknesses |
| --- | --- | --- | --- |
| NVIDIA DGX | Homogeneous GPU clusters | Mature toolchain, high throughput | Limited CPU/GPU balance |
| AMD Helios | Heterogeneous, agentic AI | Balanced CPU/GPU, open standards | Early lifecycle, ecosystem still forming |
| Intel Gaudi | Training-centric, Ethernet fabric | Cost-efficient, good MLPerf scores | Less focus on inference and agentic workloads |

Quantitative Support

  • Helios offers leading memory capacity, bandwidth, and interconnect speeds.
  • Designed for frontier models, enabling inference scaling across thousands of nodes.

Final Thoughts: AMD’s Path Forward in AI

Dr. Lisa Su’s keynote reaffirmed AMD’s positioning not merely as a hardware vendor but as a platform architect for the AI era. Its strengths lie in embracing heterogeneity, openness, and full-stack engineering—principles deeply aligned with modern enterprise and cloud-native innovation.

However, challenges remain:

  • CUDA’s entrenched dominance remains a substantial barrier to AMD’s widespread adoption.
  • Real-world validation of new protocols like UALink at scale is still awaited.
  • Developer experience must continue to improve to attract and retain talent.

AMD’s openness bet could yield significant returns if it sustains momentum among developers and ecosystem partners. As the industry advances toward agentic AI, distributed inference, and hybrid architectures, AMD’s roadmap aligns well with the future trajectory of AI innovation.

Jensen Huang’s GTC Paris Keynote: A Technical Deep Dive

Executive Summary

At the GTC Paris Keynote during VivaTech 2025, on June 11th, 2025, NVIDIA CEO Jensen Huang presented a comprehensive and ambitious vision for the future of computing. The keynote emphasized the convergence of AI, accelerated computing, and quantum-classical hybrid systems. Central to this vision is the Grace Blackwell architecture, a revolutionary datacenter-scale GPU design optimized for agentic AI workloads demanding massive compute throughput and efficiency.

NVIDIA is repositioning itself beyond a GPU vendor, as a key infrastructure enabler of the next industrial revolution driven by AI agents, digital twins, and embodied intelligence such as robotics. Huang also unveiled CUDA-Q, a platform bridging classical and quantum computing, signaling NVIDIA’s strategic move into the post-Moore’s Law era.

The keynote was structured around three core technical pillars:

  1. Grace Blackwell Architecture: A new breed of GPU designed to power complex agentic AI.
  2. CUDA-Q and Quantum-Classical Computing: A framework to unify classical GPUs and quantum processors.
  3. Industrial AI and Robotics: Leveraging simulation-driven training through Omniverse to scale AI in physical systems.

1. Grace Blackwell: A Thinking Machine for Agentic AI

Technical Explanation

Grace Blackwell is a radical rethinking of datacenter GPU design. It is a single virtualized GPU composed of 72 interconnected packages (144 GPU dies) linked by fifth-generation NVLink, offering 130 TB/s of aggregate bandwidth, more than estimated global internet backbone traffic. This scale is critical to support multi-step, agentic AI workflows, where a single prompt triggers thousands of tokens generated via recursive reasoning, planning, and external tool use.

Key innovations include:

  • NVLink Spine: A copper coax backplane connecting packages with ultra-low latency.
  • Integrated CPUs connected directly to GPUs, eliminating PCIe bottlenecks.
  • Liquid cooling system capable of handling rack-level power densities up to 120kW.
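To put the 130 TB/s figure in perspective, the sketch below estimates how quickly a large set of model weights could traverse the NVL72 fabric, assuming transfers are purely bandwidth-bound and ignoring protocol overhead. The fabric figures come from the keynote; the model size is illustrative.

```python
# Bandwidth-bound transfer estimates across the NVL72 domain.
FABRIC_BYTES_PER_S = 130e12    # 130 TB/s aggregate NVSwitch bandwidth
PER_GPU_BYTES_PER_S = 1.8e12   # 1.8 TB/s per-GPU NVLink bandwidth

model_bytes = 1.8e12           # ~1.8 TB, e.g. a ~900B-parameter model in FP16
print(f"Full-fabric sweep: {model_bytes / FABRIC_BYTES_PER_S * 1e3:.1f} ms")
print(f"Single-GPU link:   {model_bytes / PER_GPU_BYTES_PER_S:.2f} s")
```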

Critical Comments & Suggestions

  • Latency and coherence management: Maintaining cache coherency at this scale is non-trivial. You should probe NVIDIA’s solutions for minimizing coherence delays and packet loss. Latency sensitivity can significantly impact AI model performance, especially for reasoning pipelines with iterative token generation.
  • Thermal management risks: Liquid cooling at datacenter scale remains unproven in operational reliability and maintainability. Investigate contingency plans for cooling failures and maintenance overhead—critical for data center uptime guarantees.
  • Software stack maturity: The promised 40x performance gain hinges on runtime and compiler optimizations (Dynamo, cuTensor). Be skeptical until real-world workloads demonstrate these gains under production conditions.
  • Competitive landscape: While AMD and Google have strong offerings, NVIDIA’s focus on scale and bandwidth could be decisive for agentic AI. Your evaluation should include real-world benchmarks once available.

2. CUDA-Q: Quantum-Classical Acceleration

Technical Explanation

CUDA-Q extends NVIDIA’s CUDA programming model to hybrid quantum-classical workflows. It integrates cuQuantum to accelerate quantum circuit simulations on GPUs, while preparing for execution on actual quantum processors (QPUs) once they mature.

Key features:

  • Tensor network contraction acceleration for simulating quantum states.
  • Hybrid execution model enabling programs that partly run on GPUs and partly on QPUs.
  • GPU-accelerated quantum error correction loops, critical for near-term noisy quantum devices.
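For a sense of the programming model, here is a minimal sketch using the CUDA-Q Python bindings (the cudaq package) to sample a small entangling circuit on the GPU-accelerated simulator. The target name and shot count are assumptions that may vary across releases.

```python
# Minimal CUDA-Q example: build and sample a 2-qubit Bell state.
import cudaq

cudaq.set_target("nvidia")  # GPU-accelerated state-vector simulator

@cudaq.kernel
def bell():
    qubits = cudaq.qvector(2)
    h(qubits[0])
    x.ctrl(qubits[0], qubits[1])
    mz(qubits)

counts = cudaq.sample(bell, shots_count=1000)
print(counts)  # expect roughly even "00" and "11" counts
```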

Critical Comments & Suggestions

  • Simulated vs. real quantum advantage: While GPU acceleration boosts quantum simulation speed, this is not a substitute for genuine quantum hardware breakthroughs. Carefully evaluate CUDA-Q’s value proposition for near-term R&D versus long-term quantum computing scalability.
  • Hardware dependency: The practical impact of CUDA-Q depends heavily on stable, scalable QPUs, which remain under development. Keep tabs on quantum hardware progress to assess when CUDA-Q’s hybrid model becomes commercially viable.
  • API complexity and abstraction: Extending CUDA semantics to quantum workflows risks developer confusion and integration issues. Recommend a close examination of SDK usability and developer adoption metrics.
  • Competitive analysis: IBM Qiskit and Microsoft Azure Quantum offer mature hybrid frameworks but lack GPU acceleration layers, positioning CUDA-Q uniquely for hardware-accelerated quantum simulation.

3. Industrial AI and Robotics: Omniverse as a Training Ground

Technical Explanation

NVIDIA’s Omniverse platform aims to revolutionize robotic AI by providing physically accurate, photorealistic simulations where robots train using large vision-language-action transformer models. The simulation-to-reality transfer approach uses:

  • 100,000 unique simulated environments per robot to build robust policies.
  • Transformer-based motor controllers embedded in the Thor DevKit robot computer.
  • Policy distillation and reinforcement learning frameworks to accelerate deployment.
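Of the techniques listed above, policy distillation is the most self-contained to sketch. The snippet below shows the core teacher-student loss in PyTorch; the networks and observations are illustrative stand-ins, not NVIDIA's training stack.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy policy distillation: a small student imitates a larger teacher's
# action distribution. Networks and observations are stand-ins.
obs_dim, n_actions = 64, 12
teacher = nn.Sequential(nn.Linear(obs_dim, 512), nn.ReLU(), nn.Linear(512, n_actions))
student = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(200):
    obs = torch.randn(256, obs_dim)  # stand-in for simulator observations
    with torch.no_grad():
        teacher_logits = teacher(obs)
    loss = F.kl_div(F.log_softmax(student(obs), dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```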

Critical Comments & Suggestions

  • Domain gap challenge: Simulation fidelity remains an open problem. Real-world deployment risks failure due to edge cases missing in simulations. Continuous validation with physical trials is indispensable.
  • Compute resource demands: Exascale computing may be required for training humanoid or dexterous robot behaviors. Evaluate infrastructure investment and cost-efficiency tradeoffs.
  • Toolchain maturity: Developer ecosystems around Omniverse AI training are still emerging. Consider ecosystem maturity before committing large projects.
  • Competitive context: Google’s RT-2 and Meta’s LlamaBot pursue alternative real-world data-driven approaches. Omniverse’s simulation focus is differentiated but complementary.

Conclusion

Jensen Huang’s GTC Paris keynote sketches a bold and integrated vision of future computing, anchored in scalable AI reasoning, quantum-classical hybridization, and embodied intelligence.

  • The Grace Blackwell architecture pushes datacenter GPU design to new extremes, promising unparalleled performance for agentic AI but requiring validation of cooling, latency, and software orchestration challenges.
  • CUDA-Q strategically positions NVIDIA in the nascent quantum-classical frontier but depends heavily on quantum hardware progress and developer adoption.
  • The Omniverse robotics strategy aligns with academic advances but needs to bridge simulation and reality gaps and build mature developer ecosystems.

For CTOs and system architects, the imperative is clear: infrastructure planning must anticipate AI-driven workloads at unprecedented scales and heterogeneity. The boundary between classical, quantum, and embodied computation is blurring rapidly.


My Final Recommendations for Your Strategic Focus

  1. Follow up with NVIDIA’s developer releases and early benchmarks on Grace Blackwell to validate claims and integration complexity.
  2. Monitor CUDA-Q’s ecosystem growth and partnerships—quantum hardware readiness will determine near-term relevance.
  3. Pilot simulation-driven robotic AI in controlled environments, measuring domain gap impacts and training costs carefully.
  4. Build expertise around hybrid computing workflows, preparing your teams for managing multi-architecture pipelines.

Thales Bets on Open Source Silicon for Sovereignty and Safety-Critical Systems

Executive Summary

Bernhard Quendt, CTO of Thales Group, delivered a compelling presentation at RISC-V Summit Europe 2025 on May 28th, 2025, on the strategic adoption of open-source hardware (OSH), particularly RISC-V and the CVA6 core, to build sovereign and reliable supply chains in safety- and mission-critical domains. The talk emphasized how tightening geopolitical controls—export restrictions from both U.S.-aligned and China-aligned blocs—are accelerating the need to decouple from proprietary IP.

Quendt highlighted three technical thrusts in this initiative: an open-source-based spaceborne computing platform based on CVA6, a compact industrial-grade CVA6-based microcontroller (CVA62) for embedded systems, and a forthcoming CVI64 core with MMU support for secure general-purpose OSes.

Thales is not adopting OSH merely as a cost-cutting measure. Rather, it views open hardware as foundational—alongside AI acceleration, quantum computing, and secure communications—for enabling digital sovereignty, reducing integration costs, and maintaining complete control over high-assurance system architectures.


Three Critical Takeaways

1. CVA6-Based Spaceborne Computing Platform

Technical Overview

Thales Alenia Space has developed a modular onboard computer based on the CVA6 64-bit RISC-V core. This system incorporates secure open-source root-of-trust blocks and vector accelerators. The platform supports mixed-criticality software and is tailored for the unique reliability and certification needs of space environments.

The modularity of the platform allows faster design iteration and decoupling of hardware/software verification cycles—critical benefits in aerospace development.

Assessment

The strategy is forward-leaning but not without risk. Toolchains and verification flows for open-source processors remain less mature than those in the Arm or PowerPC ecosystem. Furthermore, CVA6 is not yet hardened against radiation effects (e.g., single event upsets or total ionizing dose), which poses challenges for LEO and deep-space applications.

Thales likely mitigates this through board-level fault tolerance and selective redundancy, though such architectural decisions were not disclosed.

Market Context

This approach diverges from legacy reliance on processors like LEON3 (SPARCv8) or PowerPC e500/e6500, which are radiation-tolerant and supported by ESA/NASA toolchains. The open RISC-V path offers increased configurability and transparency at the expense of hardened IP availability and TRL maturity.

Quantitative Support

While specific metrics were not shared, RISC-V-based radiation-tolerant designs typically aim for performance in the 100–500 DMIPS range. Proprietary IP licenses for space-qualified cores can exceed $1–2 million per program, underscoring the potential cost advantage of open-source silicon.


2. CVA62: Low-Area, Safety-Ready Microcontroller

Technical Overview

Thales introduced CVA62, a 32-bit microcontroller derivative of CVA6, targeting embedded systems and industrial IoT. CVA62 is designed on TSMC 5nm and adheres to ISO 26262 safety principles, aiming for ASIL-B/D applicability. Its RTL is formally verified and publicly auditable.

It supports the RV32IMAC instruction set, features a configurable pipeline depth, and prioritizes area and power efficiency. Its release aligns with growing demand for safety-certifiable open cores.

Assessment

A formally verified open-source MCU with ISO 26262 alignment is a strong differentiator—especially for defense, automotive, and infrastructure markets. However, achieving full ASIL-D certification also depends on qualified toolchains, documented failure modes, and compliance artifacts. The current RISC-V ecosystem has yet to meet these rigorously.

Still, the availability of a verified baseline—combined with collaboration-friendly licensing—could enable safety qualification through industry-specific efforts.

Competitive Context

CVA62 competes with Cortex-M7 and SiFive E31/E51 in the deterministic MCU space. While Arm cores offer rich toolchains and pre-certified software stacks, CVA62 provides transparency and configurability, with the tradeoff of less polished ecosystem support.

| Feature | CVA62 | Cortex-M7 |
| --- | --- | --- |
| ISA | RISC-V (RV32IMAC) | Armv7E-M |
| Pipeline | Configurable | Fixed 6-stage |
| MMU Support | No | No |
| Open Source | Yes | No |
| ISO 26262 Alignment | Planned | Available (via toolchain vendors) |
| Target Process | TSMC 5nm | 40nm–65nm typical |

Quantitative Support

Public benchmarks for RV32-class cores show CVA62 class devices achieving 1.5–2.0 CoreMark/MHz depending on configuration. Power efficiency data is pending silicon tape-out but is expected to improve over larger legacy MCUs due to 5nm geometry.


3. CVI64: MMU-Enabled RISC-V Application Core

Technical Overview

Thales is collaborating on CVI64, a 64-bit RISC-V core with memory management unit (MMU) support and a clean-slate deterministic design philosophy. The first silicon is targeted for Technology Readiness Level 5 (component validation in relevant environment) by Q3 2025.

CVI64 is intended to support real-time Linux and deterministic hypervisors, with applications in avionics, defense systems, and certified industrial platforms.

Assessment

Adding MMU support unlocks Linux-class workloads—but increases architectural complexity. Issues like page table walk determinism, cache coherence, and privilege transitions must be tightly constrained in safety contexts. Out-of-order execution, if implemented, would further complicate timing analysis.

Early ecosystem maturity will likely lag that of SiFive U-series or Arm Cortex-A cores, but CVI64 may find niche adoption where auditability and customization trump software availability.

Competitive Context

CVI64 enters a field occupied by SiFive S7/S9, Andes AX45, and Arm Cortex-A53/A55. Unlike these, CVI64 will be fully open and verifiable. This suits users requiring full-stack trust anchors—from silicon up to operating system.

| Feature | CVI64 | SiFive S7 | Cortex-A53 |
| --- | --- | --- | --- |
| ISA | RV64GC | RV64GC | Armv8-A |
| MMU | Yes | Yes | Yes |
| Execution Model | In-order (planned) | In-order | In-order (dual-issue) |
| Target Frequency | TBD (~1 GHz class) | 1.5–2.0 GHz | 1.2–1.5 GHz |
| Open Source | Yes (100%) | Partial | No |

Quantitative Support

SiFive U84-based SoCs have reached 1.5 GHz on 7nm. CVI64 will likely debut at lower performance (~800–1000 MHz) due to early-phase optimizations and tighter deterministic design goals.


Final Thoughts

Thales’s adoption of open-source silicon reflects a strategic shift across defense and aerospace sectors. OSH enables sovereignty, customization, and long-term maintenance independence—critical in an era of increasingly politicized semiconductors.

Yet major challenges persist: toolchain immaturity, limited availability of safety-certifiable flows, and uncertain community governance. Organizations pursuing this path should adopt a phased integration model—deploying OSH first in non-critical components while building verification and integration expertise in parallel.

Significant investment will be required in:

  • Formal verification frameworks (e.g., SymbiYosys, Boolector, Tortuga Agilis)
  • Mixed-language simulation environments (e.g., Verilator, Cocotb)
  • Cross-industry ecosystem building and long-term funding models

Thales is making a long-term bet on auditability and openness in silicon. If the RISC-V ecosystem can deliver the tooling and robustness demanded by regulated industries, it could catalyze a new wave of mission-grade open architectures. The opportunity is real—but so is the engineering burden.

AMD at COMPUTEX 2025: Pushing the Boundaries of Compute

At COMPUTEX 2025 on May 21st, 2025, AMD’s Jack Huynh—Senior VP and GM of the Computing and Graphics Group—unveiled a product vision anchored in one central idea: small is powerful. This year’s keynote revolved around the shift from centralized computing to decentralized intelligence—AI PCs, edge inference, and workstations that rival cloud performance.

AMD’s announcements spanned three domains:

  • Gaming: FSR Redstone and Radeon RX 9060 XT bring path-traced visuals and AI rendering to the mid-range.
  • AI PCs: Ryzen AI 300 Series delivers up to 34 TOPS of local inferencing power.
  • Workstations: Threadripper PRO 9000 and Radeon AI PRO R9700 target professional AI developers and compute-intensive industries.

Let’s unpack the technical and strategic highlights.


1. FSR Redstone: Machine Learning Meets Real-Time Path Tracing

The Technology

FSR Redstone is AMD’s most ambitious attempt yet to democratize path-traced rendering. It combines:

  • Neural Radiance Caching (NRC) for learned lighting estimations.
  • Ray Regeneration for efficient reuse of ray samples.
  • Machine Learning Super Resolution (MLSR) for intelligent upscaling.
  • Frame Generation to increase output FPS via temporal inference.

This hybrid ML pipeline enables real-time lighting effects—like dynamic GI, soft shadows, and volumetric fog—on GPUs without dedicated RT cores.

Why It Matters

By applying learned priors to ray-based reconstruction, Redstone achieves the appearance of path-traced realism while maintaining playable frame rates. This lowers the barrier for mid-range GPUs to deliver high-fidelity visuals.

Caveats

The ML approach, while efficient, is heavily scene-dependent. Generalization to procedurally generated content remains an open question. Visual artifacts can emerge in dynamic geometry, and upscaling introduces trade-offs in motion stability.

Competitive Lens

| Feature | FSR Redstone | DLSS 3.5 | XeSS |
| --- | --- | --- | --- |
| Neural Rendering | ✅ | ✅ | ✅ |
| Ray Regeneration | ✅ | ✅ | ⚠️ Partial |
| Open Source Availability | ✅ (via ROCm) | ❌ | ⚠️ Partial |
| Specialized Hardware Req. | ❌ | ✅ (Tensor Cores) | ❌ (XMX optional) |

In essence: Redstone is AMD’s answer to DLSS—built on open standards, deployable without AI-specific silicon.


2. Ryzen AI 300 Series: On-Device Intelligence for the AI PC Era

The Technology

The new Ryzen AI 300 APUs feature a dedicated XDNA 2-based NPU delivering up to 34 TOPS (INT8). This enables local execution of:

  • Quantized LLMs (e.g., Llama 3 8B)
  • Real-time transcription and translation
  • Code assist and image editing
  • Visual search and contextual agents

The architecture distributes inference across CPU, GPU, and NPU with intelligent workload balancing.
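In practice, distributing a model across those engines usually happens through a runtime's execution-provider list. The sketch below uses ONNX Runtime with a prioritized provider list; the Vitis AI provider is AMD's published NPU path for Ryzen AI, but the model file and fallback order here are illustrative assumptions.

```python
import onnxruntime as ort

# Prefer the NPU, fall back to a GPU provider if installed, then CPU.
# Provider availability depends on the installed ONNX Runtime build.
preferred = ["VitisAIExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
available = ort.get_available_providers()
providers = [p for p in preferred if p in available] or ["CPUExecutionProvider"]

# Hypothetical quantized model file, for illustration only.
session = ort.InferenceSession("llama3_8b_int4.onnx", providers=providers)
print("Running on:", session.get_providers()[0])
```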

Why It Matters

Local inferencing improves latency, preserves privacy, and reduces cloud dependencies. In regulated industries and latency-critical workflows, this is a step-function improvement.

Ecosystem Challenges

  • Quantized model availability is still thin.
  • ROCm integration into PyTorch/ONNX toolchains is ongoing.
  • AMD’s tooling for model optimization lacks the maturity of NVIDIA’s TensorRT or Apple’s CoreML.

Competitive Positioning

| Platform | NPU TOPS (INT8) | Architecture | Ecosystem Openness | Primary OS |
| --- | --- | --- | --- | --- |
| Ryzen AI 300 | 34 | x86 + XDNA 2 | High (ROCm, ONNX) | Windows, Linux |
| Apple M4 | ~38 | ARM + CoreML NPU | Low (CoreML only) | macOS, iOS |
| Snapdragon X | ~45 | ARM + Hexagon DSP | Medium | Windows, Android |

Ryzen AI PCs position AMD as the open x86 alternative to Apple’s silicon dominance in local AI workflows.


3. Threadripper PRO 9000 & Radeon AI PRO R9700: Workstation-Class AI Development

The Technology

Threadripper PRO 9000 (“Shimada Peak”):

  • 96 Zen 5 cores / 192 threads
  • 8-channel DDR5 ECC memory, up to 4TB
  • 128 PCIe 5.0 lanes
  • AMD PRO Security (SEV-SNP, memory encryption)

Radeon AI PRO R9700:

  • 1,500+ TOPS (INT4)
  • 32GB GDDR6
  • ROCm-native backend for ONNX and PyTorch

This pairing provides a serious platform for AI fine-tuning, quantization, and even training of small LLMs.
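Whether a given model fits in the R9700's 32 GB is mostly arithmetic. The sketch below estimates weight memory at different quantization levels; the overhead factor for KV cache and activations is a rough assumption.

```python
# Rough VRAM estimate for local LLM work; the overhead factor is an assumption.
def fits_in_vram(params_billion: float, bits_per_weight: int, vram_gb: float = 32.0,
                 overhead: float = 1.3) -> bool:
    weight_gb = params_billion * bits_per_weight / 8  # GB for weights alone
    return weight_gb * overhead <= vram_gb

for size in (8, 13, 34, 70):
    for bits in (16, 8, 4):
        if fits_in_vram(size, bits):
            print(f"{size}B model at {bits}-bit weights: fits in 32 GB")
            break
```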

Why It Matters

This workstation tier offers an escape hatch from expensive cloud runtimes. For developers, AI researchers, and enterprise teams, it enables:

  • Local, iterative model tuning
  • Predictable hardware costs
  • Privacy-first workflows (especially in defense, healthcare, and legal)

Trade-offs

ROCm continues to trail CUDA in terms of ecosystem depth and performance tuning. While AMD offers competitive raw throughput, software maturity—especially for frameworks like JAX or Triton—is still catching up.

Competitive Analysis

| Metric | TR PRO 9000 + R9700 | NVIDIA RTX 6000 Ada |
| --- | --- | --- |
| CPU Cores | 96 (Zen 5) | N/A |
| GPU AI Perf (INT4) | ~1,500 TOPS | ~1,700 TOPS |
| VRAM | 32GB GDDR6 | 48GB GDDR6 ECC |
| Ecosystem Support | ROCm (moderate) | CUDA (mature) |
| Distributed Training | ❌ (limited) | ✅ (via NVLink) |
| Local LLM Inference | ✅ (8B–13B) | ✅ |

AMD’s strength lies in performance-per-dollar and data locality. For small-to-mid-sized models, it offers near-cloud throughput on your desktop.


Final Thoughts: Decentralized Intelligence is the New Normal

COMPUTEX 2025 made one thing clear: the future of compute is not just faster—it’s closer. AMD’s platform strategy shifts the emphasis from scale to locality:

  • From cloud inferencing to on-device AI
  • From GPU farms to quantized workstations
  • From centralized render clusters to ML-accelerated game engines

With open software stacks, power-efficient inference, and maturing hardware, AMD positions itself as a viable counterweight to NVIDIA and Apple in the edge-AI era.

For engineering leaders and CTOs, this represents an inflection point. The question is no longer “When will AI arrive on the edge?” It’s already here. The next question is: What will you build with it?

Arm at COMPUTEX 2025: A Strategic Inflection Point for AI Everywhere

Executive Summary

Chris Bergey, Senior Vice President and General Manager of the Client Line of Business at ARM, delivered a keynote at COMPUTEX 2025 on May 20th, 2025 that framed the current era as a historic inflection point in computing—one where AI is no longer an idea but a force, reshaping everything from cloud infrastructure to edge devices. The presentation outlined ARM’s strategic positioning in this new landscape, emphasizing three core pillars: ubiquitous platform reach, world-leading performance-per-watt, and a powerful developer ecosystem.

Bergey argued that the exponential growth in AI workloads—both in scale and diversity—demands a fundamental rethinking of compute architecture. He positioned ARM not just as a CPU IP provider but as a full-stack platform company delivering optimized, scalable solutions from data centers to wearables. Key themes included the shift from training to inference, the rise of on-device AI, and the growing importance of power efficiency across all form factors.

The talk also featured panel discussions with Kevin Deierling (NVIDIA) and Adam King (MediaTek), offering perspectives on technical constraints, innovation vectors, and the role of partnerships in accelerating AI adoption.


Three Critical Takeaways

1. AI Inference Is Now the Economic Engine—Not Training

Technical Explanation

Bergey distinguished between the computational cost of model training vs. inference, highlighting that while training requires enormous compute (on the order of 10^25–10^26 FLOPs for a frontier run), each inference query is far cheaper (~10^14–10^15 FLOPs) but scales with usage volume. For example, if each web search used a large language model, roughly ten days' worth of inference compute could rival the compute invested in training the model itself.
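That crossover is easy to sanity-check. The sketch below compares cumulative inference compute against a one-off training run using the orders of magnitude quoted above; the query volume is a hypothetical stand-in for global web-search traffic.

```python
# Order-of-magnitude check of the training-vs-inference claim.
TRAINING_FLOPS = 1e25      # one large training run (lower end of the range above)
FLOPS_PER_QUERY = 1e14     # cost of one LLM-backed query (lower end of the range)
QUERIES_PER_DAY = 8.5e9    # hypothetical: roughly global web-search volume

inference_per_day = FLOPS_PER_QUERY * QUERIES_PER_DAY
days_to_match_training = TRAINING_FLOPS / inference_per_day
print(f"Daily inference compute: {inference_per_day:.1e} FLOPs")
print(f"Days of inference to equal one training run: {days_to_match_training:.0f}")
```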

This implies a shift in focus: monetization stems not from model creation, but from scalable deployment of efficient inference engines across mobile, wearable, and embedded platforms.

Critical Assessment

This framing aligns with current trends. While companies like NVIDIA continue optimizing training clusters, the greater opportunity lies in edge inference, where latency, power, and throughput are paramount. However, the keynote underplays the complexity of model compression, quantization, and hardware/software co-design, which are critical for deployment at scale.

ARM’s V9 architecture and Scalable Matrix Extensions (SME) are promising for accelerating AI workloads in the CPU pipeline, potentially reducing reliance on NPUs or GPUs—a differentiator in cost- and thermally-constrained environments.

Competitive/Strategic Context

  • x86 Alternatives: Intel and AMD dominate traditional markets but lag ARM in performance-per-watt. Apple’s M-series SoCs, based on ARM, demonstrate clear efficiency gains.
  • Custom Silicon: Hyperscalers like AWS (Graviton), Google (Axion), and Microsoft (Cobalt) increasingly favor ARM-based silicon, citing up to 40% efficiency improvements.
  • Edge NPU Trade-offs: Competitors like RISC-V and Qualcomm Hexagon push AI logic off-core, whereas ARM integrates it into the CPU, improving software portability but trading off peak throughput.

Quantitative Support

  • Over 50% of new AWS CPU capacity since 2023 is ARM-based (Graviton).
  • ARM-based platforms account for over 40% of 2025 PC/tablet shipments.
  • SME and NEON extensions yield up to 4x ML kernel acceleration without dedicated accelerators.

2. On-Device AI Is Now Table Stakes

Technical Explanation

Bergey emphasized that on-device AI is becoming the norm, driven by privacy, latency, and offline capability needs. Use cases include coding assistants, chatbots, and real-time inference in industrial systems.

ARM showcased its client roadmap, including:

  • Travis CPU: Next-gen core with IPC improvements and enhanced SME.
  • Draga GPU: Advanced ray tracing and sustained mobile graphics.
  • ARM Accuracy Super Resolution (AASR): AI upscaling previously limited to consoles, now on mobile.

Critical Assessment

On-device AI is architecturally sound for privacy-sensitive or latency-critical apps. Yet, memory and thermal constraints remain obstacles for large model execution on mobile SoCs. ARM’s strategy of enhancing general-purpose cores aids flexibility, though specialized NPUs still offer superior throughput for vision or speech applications.

While ARM’s developer base (22 million) is substantial, toolchain fragmentation and driver inconsistencies complicate cross-platform integration.

Competitive/Strategic Context

  • Apple ANE: Proprietary and tightly integrated but closed.
  • Qualcomm Hexagon: Strong in multimedia pipelines but hampered by software issues.
  • Google Edge TPU: Power-efficient but limited in scope.

ARM’s open licensing and platform breadth support broad AI enablement, from Chromebooks to premium devices.

Quantitative Support

  • MediaTek’s Kompanio Ultra delivers 50 TOPS of AI performance on ARM V9.
  • Travis + Draga enables 1080p upscaling from 540p, achieving console-level mobile graphics.

3. Taiwan as the Nexus of AI Hardware Innovation

Technical Explanation

Bergey emphasized Taiwan’s pivotal role in AI hardware: board design, SoC packaging, and advanced fab technologies. ARM collaborates with MediaTek, ASUS, and TSMC—all crucial for AI scalability.

He highlighted the DGX Spark platform, which pairs a 20-core ARM V9 CPU with an NVIDIA GB10 Blackwell GPU, delivering petaflop-class AI compute in compact systems.

Critical Assessment

Taiwan excels in advanced packaging (e.g., CoWoS) and silicon scaling. But geopolitical risks could impact production continuity. ARM’s integration with Taiwanese partners is a strategic strength, yet resilience planning remains essential.

DGX Spark is a compelling proof-of-concept, though mainstream adoption may be constrained by power and cost considerations, especially outside research or high-end enterprise.

Competitive/Strategic Context

  • U.S. Foundries: Lag in packaging tech; TSMC leads sub-5nm.
  • China: Investing heavily but remains tool-dependent.
  • Europe: Focused on sustainable compute but lacks vertical integration.

ARM’s neutral IP model facilitates global partnerships despite geopolitical tensions.

Quantitative Support

  • Taiwan expects 8x data center power growth, from megawatts to gigawatts.
  • DGX Spark packs 1 petaflop compute into a desktop form factor.

Conclusion

ARM’s COMPUTEX 2025 keynote presented a strategic vision for a future where AI is ubiquitous and ARM is foundational. From hyperscale to wearable, ARM aims to lead through performance-per-watt, platform coverage, and ecosystem scale.

Challenges persist: model optimization, power efficiency, and political risk. Still, ARM’s trajectory suggests it could define the next computing era—not just through CPUs, but as a full-stack enabler of AI.

For CTOs and architects planning future compute stacks, ARM’s approach offers compelling value, especially where scalability, energy efficiency, and developer reach take precedence over peak raw performance.

Microsoft Build 2025: A Platform Shift for the Agentic Web

Executive Summary

Satya Nadella’s opening keynote at Microsoft Build 2025, on May 20th, 2025, painted a comprehensive vision of the evolving developer landscape, centered around what Microsoft calls the agentic web—a system architecture where autonomous AI agents interact with digital interfaces and other agents using standardized protocols. This shift treats AI agents as first-class citizens in software development and business processes.

This is not just an incremental evolution of existing tools but a transformation that spans infrastructure, tooling, platforms, and applications. While Microsoft presents this as a full-stack transformation, practical maturity across the stack remains uneven—particularly in orchestration and security.

The central thesis was clear: Microsoft is positioning itself as the enabler of this agentic future, offering developers a unified ecosystem from edge to cloud, with open standards like MCP (Model Context Protocol) at its core.

This blog post distills three critical takeaways that represent the most impactful innovations and strategic moves presented at the event.


Critical Takeaway 1: GitHub Copilot Evolves into a Full-Stack Coding Agent

Technical Explanation

GitHub Copilot has evolved beyond code completion and chat-based assistance into a full-fledged coding agent capable of autonomous task execution. Developers can now assign issues directly to Copilot, which will generate pull requests, triage bugs, refactor code, and even modernize legacy applications (e.g., Java 8 → Java 21). These features are currently in preview.

It integrates with GitHub Actions and supports isolated branches for secure operations. While there is discussion of MCP server configurations in future integrations, public documentation remains limited.

Microsoft has also open-sourced the integration scaffolding of Copilot within VS Code, enabling community-driven extensions, though the underlying model remains proprietary.

Critical Assessment

This represents a major leap forward in developer productivity. By treating AI not as a passive assistant but as a peer programmer, Microsoft is redefining how developers interact with IDEs. However, the effectiveness of such agents depends heavily on the quality of training data, token handling capacity, and context-awareness.

Potential limitations include:

  • Context fidelity: Can the agent maintain state and intent across large codebases given current token limits?
  • Security and auditability: Transparency around sandboxing and trace logs is essential.
  • Developer trust: Adoption hinges on explainability and safe fallback mechanisms.

Competitive/Strategic Context

Competitors like Amazon CodeWhisperer and Tabnine offer similar capabilities but lack GitHub’s deep DevOps integration. Tabnine emphasizes client-side privacy, while CodeWhisperer leverages AWS IAM roles but offers limited CI/CD interaction.

| Feature | GitHub Copilot Agent | Amazon CodeWhisperer | Tabnine |
| --- | --- | --- | --- |
| Autonomous PR generation | ✅ | | |
| Integration with CI/CD | ✅ (GitHub Actions) | Limited | |
| Open-sourced in editor | Partial | ✅ (partial) | |
| Multi-agent orchestration | Planned | | |

Quantitative Support

  • GitHub Copilot has over 15 million users.
  • Over 1 million agents have been built using Microsoft 365 Copilot and Teams.
  • Autonomous SRE agents reportedly reduce incident resolution time by up to 40%.

Critical Takeaway 2: Azure AI Foundry as the App Server for the Agentic Era

Technical Explanation

Azure AI Foundry is positioned as the app server for the next generation of AI applications—analogous to how Java EE or .NET once abstracted deployment and lifecycle management of distributed applications.

Key features:

  • Multi-model support: 1,900+ models including GPT-4o, Mistral, Grok, and open-source variants.
  • Agent orchestration: Enables deterministic workflows with reasoning agents.
  • Observability: Built-in monitoring, evals, tracing, and cost tracking.
  • Hybrid deployment: Supports cloud-to-edge and sovereign deployments.

Foundry includes a model router that automatically selects models based on latency, performance, and cost, reducing operational overhead.
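The model-router idea itself is vendor-neutral and simple to sketch. The routing policy below is illustrative Python, not the Foundry SDK: it picks the cheapest registered model that satisfies latency and quality floors.

```python
from dataclasses import dataclass

# Illustrative model router (not the Azure AI Foundry SDK): pick the cheapest
# model that meets the request's latency and quality requirements.
@dataclass
class ModelProfile:
    name: str
    cost_per_1k_tokens: float  # USD
    p95_latency_ms: float
    quality_score: float       # e.g. internal eval score, 0-1

CATALOG = [
    ModelProfile("small-distilled", 0.0004, 120, 0.72),
    ModelProfile("mid-tier", 0.002, 350, 0.85),
    ModelProfile("frontier", 0.015, 900, 0.95),
]

def route(max_latency_ms: float, min_quality: float) -> ModelProfile:
    candidates = [m for m in CATALOG
                  if m.p95_latency_ms <= max_latency_ms and m.quality_score >= min_quality]
    if not candidates:
        raise ValueError("no model satisfies the constraints")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

print(route(max_latency_ms=400, min_quality=0.8).name)  # -> "mid-tier"
```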

Critical Assessment

Foundry addresses the lack of a standardized app server for stateful, multi-agent systems. Its enterprise-grade reliability is particularly appealing to organizations already invested in Azure.

Still, complexity remains. Building distributed intelligent agents demands robust coordination logic, long-term memory handling, and fault-tolerant execution—all areas that require ongoing refinement.

Competitive/Strategic Context

AWS Bedrock and Google Vertex AI offer model hosting and inference APIs, but Azure Foundry differentiates through full lifecycle support and tighter integration with agentic paradigms. Support for open protocols like MCP also enhances portability and neutrality.

| Capability | Azure AI Foundry | AWS Bedrock | Google Vertex AI |
| --- | --- | --- | --- |
| Multi-agent orchestration | ✅ | Limited | |
| Model routing | ✅ | | |
| Memory & RAG integration | ✅ | Limited | |
| MCP support | ✅ | | |

Quantitative Support

  • Over 70,000 organizations use Foundry.
  • In Q1 2025, Foundry processed more than 100 trillion tokens (5x YoY growth).
  • Stanford Medicine reduced tumor board prep time by 60% using Foundry-based agents.

Critical Takeaway 3: The Rise of the Agentic Web with MCP and NLWeb

Technical Explanation

Microsoft is building an open agentic web anchored by:

  • MCP (Model Context Protocol): A lightweight, HTTP-style protocol for secure, interoperable agent-to-service communication. A native MCP registry is being integrated into Windows to allow secure exposure of system functionality to agents. Public availability is currently limited to early preview.
  • NLWeb: A framework that enables websites and APIs to expose structured knowledge and actions to agents, functioning like OpenAPI or HTML for agentic interaction. Implementation requires explicit markup and wrappers.

Together, these technologies support a decentralized, interoperable agent ecosystem.
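MCP builds on JSON-RPC 2.0, so an agent's tool invocation is just a structured message. The sketch below constructs a tools/call request and a plausible response by hand; the tool name and arguments are illustrative, and the transport (stdio or HTTP) is omitted.

```python
import json

# Hand-built MCP-style messages (JSON-RPC 2.0). Tool name and arguments
# are illustrative; a real client would send this over stdio or HTTP.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_files",
        "arguments": {"query": "quarterly report", "max_results": 5},
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "3 matching files found"}]},
}

print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))
```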

Critical Assessment

MCP solves the critical problem of safe, permissioned access to tools by agents. NLWeb democratizes agentic capabilities for web developers without deep ML expertise.

Challenges include:

  • Standardization: Broad adoption of MCP beyond Microsoft is still nascent.
  • Security: Risk of misuse via overly permissive interfaces.
  • Performance: Real-time agentic calls could introduce latency bottlenecks.

Competitive/Strategic Context

LangChain and MetaGPT offer agent orchestration but lack the web-scale interoperability MCP/NLWeb target. Microsoft’s emphasis on open composition is reminiscent of the REST API revolution.

| Feature | MCP + NLWeb | LangChain Tooling | MetaGPT |
| --- | --- | --- | --- |
| Web composability | ✅ | | |
| Interoperability | ✅ | Limited | Proprietary |
| Open source | ✅ | ✅ | |
| Security model | OS-integrated | Manual | Manual |

Quantitative Support

  • Windows MCP registry enables discovery of system-level agents (files, settings, etc.).
  • Partners like TripAdvisor and O’Reilly are early adopters of NLWeb.
  • NLWeb supports embeddings, RAG, and Azure Cognitive Search integration.

Conclusion

Microsoft Build 2025 marked a definitive pivot toward the agentic web, where AI agents are not just tools but collaborators in software, science, and operations. Microsoft is betting heavily on open standards like MCP and NLWeb while reinforcing its dominance in developer tooling with GitHub Copilot and Azure AI Foundry.

For CTOs and architects, the message is clear: the future of software is agentic, and Microsoft aims to be the platform of choice. The success of this vision depends on Microsoft’s ability to balance openness with control and to build trust across the developer ecosystem.

The tools are now in place—and the race is on.

Jensen Huang’s COMPUTEX 2025 Keynote: A Technical Deep Dive into the Future of AI Infrastructure

Executive Summary

In his keynote at COMPUTEX 2025 on May 19th, 2025, NVIDIA CEO Jensen Huang outlined a detailed roadmap for the next phase of computing, positioning artificial intelligence as a new foundational infrastructure layer—on par with electricity and the internet. Rather than focusing on individual product SKUs, Huang presented NVIDIA as the platform provider for enterprises, industries, and nations building sovereign, scalable AI systems.

Central to this vision is the replacement of traditional data centers with “AI factories”—integrated computational systems designed to generate intelligence in the form of tokens. Huang introduced key architectural advancements including the Grace Blackwell GB300 NVL72 system, next-generation NVLink and NVSwitch fabrics, and the strategic open-sourcing of Isaac GR00T, a foundational robotics agent model.

This post dissects the three most technically significant announcements from the keynote, with a focus on implications for system architects, CTOs, and principal engineers shaping next-generation AI infrastructure.


1. The GB300 NVL72 System: Scaling AI Factories with Rack-Scale Integration

Technical Overview

The Grace Blackwell GB300 NVL72 system represents a fundamental rethinking of rack-scale AI infrastructure. Each rack contains 72 B300 GPUs and 36 Grace CPUs in a liquid-cooled configuration, delivering up to 1.4 exaflops (FP4) of AI performance. Notable improvements over the H100/H200 era include:

  • ~4× increase in LLM training throughput
  • Up to 30× boost in real-time inference throughput
  • 192 GB of HBM3e per GPU (2.4× increase over H100)
  • 5th-generation NVLink with 1.8 TB/s per GPU of bidirectional bandwidth

A 4th-generation NVSwitch fabric provides 130 TB/s of all-to-all, non-blocking bandwidth across the 72 GPUs, enabling a unified memory space at rack scale. The system operates within a 120 kW power envelope, necessitating liquid cooling and modernized power distribution infrastructure.
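Two quick derived figures make the "single compute unit" framing concrete; both follow directly from the per-GPU numbers quoted above.

```python
# Derived rack-level figures from the per-GPU specs quoted above.
GPUS = 72
HBM_PER_GPU_GB = 192
NVLINK_PER_GPU_TBS = 1.8

pooled_hbm_tb = GPUS * HBM_PER_GPU_GB / 1000
aggregate_nvlink_tbs = GPUS * NVLINK_PER_GPU_TBS
print(f"Pooled HBM3e across the rack: {pooled_hbm_tb:.1f} TB")
print(f"Sum of per-GPU NVLink bandwidth: {aggregate_nvlink_tbs:.0f} TB/s "
      f"(fabric is specified at 130 TB/s all-to-all)")
```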

Architectural Implications

The GB300 NVL72 exemplifies scale-up design: high-bandwidth, tightly coupled components acting as a single compute unit. This architecture excels at training and inference tasks requiring massive memory coherence and fast interconnects.

However, scale-out—distributing computation across multiple racks—remains bottlenecked by inter-rack latency and synchronization challenges. NVIDIA appears to be standardizing the NVL72 as a modular “AI factory block,” favoring depth of integration over breadth of distribution.

The thermal and electrical demands are also transformative. 120 kW per rack mandates direct-to-chip liquid cooling, challenging legacy data center design norms.

Strategic and Competitive Context

| Feature / Vendor | NVIDIA GB300 NVL72 | AMD MI300X Platform | Google TPU v5p | Intel Gaudi 3 |
| --- | --- | --- | --- | --- |
| Primary Interconnect | 5th-Gen NVLink + NVSwitch (1.8 TB/s/GPU) | Infinity Fabric + PCIe 5.0 | ICI + Optical Circuit Switch | 24× 200 GbE RoCE per accelerator |
| Scale-Up Architecture | Unified 72-GPU coherent fabric | 8-GPU coherent node | 4096-chip homogeneous pods | Ethernet-based scale-out |
| Programming Ecosystem | CUDA, cuDNN, TensorRT | ROCm, HIP | JAX, XLA, PyTorch | SynapseAI, PyTorch, TensorFlow |
| Key Differentiator | Best-in-class scale-up performance | Open standards, cost-effective | Extreme scale-out efficiency | Ethernet-native, open integration |

Quantitative Highlights

  • Performance Density: A single 120 kW GB300 NVL72 rack (1.4 EFLOPS FP4) approaches the compute capability of the 21 MW Frontier supercomputer (1.1 EFLOPS FP64), yielding over 150× higher performance-per-watt, though with different numerical precision.
  • Fabric Bandwidth: At 130 TB/s, NVSwitch bandwidth within a rack exceeds peak estimated global internet backbone traffic.
  • Power Efficiency: On the order of 5–6 TFLOPS per watt at FP8 (derived from 1.4 EFLOPS FP4 within a 120 kW envelope), reflecting architectural and process node advances.

2. NVLink-C2C: Opening the Fabric to a Semi-Custom Ecosystem

Technical Overview

NVIDIA announced NVLink-C2C (Chip-to-Chip), a new initiative to allow third-party silicon to participate natively in the NVLink fabric. Three key integration paths are available:

  1. Licensed IP Blocks: Partners embed NVLink IP in their own SoCs, ASICs, or FPGAs.
  2. Bridge Chiplets: Chiplet-based bridges allow legacy designs to connect without redesigning core logic.
  3. Unified Memory Semantics: Ensures full coherence between NVIDIA GPUs and partner accelerators or I/O devices.

This enables hybrid system architectures where NVIDIA GPUs operate alongside custom silicon—such as domain-specific accelerators, DPUs, or real-time signal processors—in a shared memory space.

Strategic Assessment

NVLink-C2C is a strategic counter to open standards like CXL and UCIe. By enabling heterogeneity within its own high-performance ecosystem, NVIDIA retains control while expanding use cases.

Success depends on:

  • Partner ROI: Justifying the cost and engineering complexity of proprietary IP over CXL’s openness.
  • Tooling & Validation: Supporting cross-vendor debug, trace, and profiling tools.
  • Performance Guarantees: Ensuring third-party devices do not introduce latency or stall high-bandwidth links.

This move also repositions NVIDIA’s interconnect fabric as the system backplane, shifting the focus from CPUs and PCIe roots to GPUs and NVLink hubs.

Ecosystem Comparison

| Interconnect Standard | NVLink-C2C | CXL | UCIe |
| --- | --- | --- | --- |
| Use Case | GPU-accelerated chiplet/silicon cohesion | CPU-to-device memory expansion | Die-to-die physical interface for chiplets |
| Coherence Model | Full hardware coherence | CXL.cache and CXL.mem | Protocol-agnostic |
| Governance | Proprietary (NVIDIA) | Open consortium | Open consortium |
| Strategic Goal | GPU-centric heterogeneous integration | Broad heterogeneity and ecosystem access | Chiplet disaggregation across vendors |

Confirmed partners: MediaTek, Broadcom, Cadence, Synopsys.


3. Isaac GR00T and the Rise of Physical AI

Technical Overview

Huang identified a strategic shift toward embodied AI—autonomous agents that operate in the physical world. NVIDIA’s stack includes:

  • Isaac GR00T (Generalist Robot 00 Technology): A robotics foundation model trained on multimodal demonstrations—text, video, and simulation. Designed to be robot-agnostic.
  • Isaac Lab & Omniverse Sim: A highly parallelized simulation environment for training and validating policies via reinforcement learning and sim-to-real pipelines.
  • Generative Simulation: AI-generated synthetic data and environments, reducing dependence on real-world data collection.

Together, these components define a full-stack, simulation-first approach to training robotics agents.
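A core ingredient of that simulation-first pipeline is domain randomization, which is easy to show in miniature. The loop below is generic, illustrative Python, not the Isaac Lab or Omniverse API: every episode draws a freshly randomized environment so the learned policy cannot overfit to a single scene.

```python
import random

# Generic domain-randomization sketch (illustrative only).
def make_randomized_env():
    return {
        "friction": random.uniform(0.4, 1.2),
        "payload_kg": random.uniform(0.0, 5.0),
        "lighting": random.choice(["dim", "indoor", "sunlight"]),
        "sensor_noise": random.gauss(0.0, 0.02),
    }

def train_policy(num_envs: int = 100_000):
    policy_score = 0.0
    for _ in range(num_envs):
        env = make_randomized_env()
        # stand-in for a rollout plus policy update in the simulator
        policy_score += 1.0 - abs(env["sensor_noise"])
    return policy_score / num_envs

print(f"mean episode score: {train_policy(1_000):.3f}")  # small run for illustration
```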

Challenges and Opportunities

While simulation fidelity continues to improve, the sim-to-real gap remains the key barrier. Discrepancies in dynamics, perception noise, and actuator behavior can derail even well-trained policies.

Other critical considerations:

  • Safety and Alignment: Embodied AI introduces physical risk; rigorous validation and fail-safe mechanisms are mandatory.
  • Fleet Orchestration: Deploying, updating, and monitoring robots in real-world environments requires industrial-grade orchestration platforms.
  • Edge Compute Requirements: Real-time control necessitates high-performance, low-latency hardware—hence NVIDIA’s positioning of Jetson Thor as the robotics edge brain.

Competitive Landscape

| Company / Platform | NVIDIA Isaac | Boston Dynamics | Tesla Optimus | Open Source (ROS/ROS 2) |
| --- | --- | --- | --- | --- |
| AI Approach | Foundation model + sim-to-real | Classical control + RL | End-to-end neural (vision-to-actuation) | Modular, limited AI integration |
| Simulation | Omniverse + Isaac Lab | Proprietary | Proprietary | Gazebo, Webots |
| Business Model | Horizontal platform + silicon | Vertically integrated hardware | In-house for vehicle automation | Community-led, vendor-neutral |

Strategic Implications for Technology Leaders

1. Re-Architect the Data Center for AI Factory Workloads

  • Plan for 120 kW/rack deployments, with liquid cooling and revamped power infrastructure.
  • Network performance is system performance: fabrics like NVSwitch must be part of core architecture.
  • Talent pipeline must now blend HPC, MLOps, thermal, and hardware engineering.

2. Engage in Heterogeneous Compute—But Know the Tradeoffs

  • NVLink-C2C offers deep integration but comes at the cost of proprietary lock-in.
  • CXL and UCIe remain credible alternatives—balance performance against openness and cost.

3. Prepare for Digital-Physical AI Convergence

  • Orchestration frameworks must span cloud, edge, and robotic endpoints.
  • Edge inferencing and data pipelines need tight integration with simulation and training platforms.
  • Robotics will demand security, safety, and compliance architectures akin to automotive-grade systems.

Conclusion

Jensen Huang’s COMPUTEX 2025 keynote declared the end of general-purpose computing as the default paradigm. In its place: AI-specific infrastructure spanning silicon, system fabrics, and simulation environments. NVIDIA is building a full-stack platform to dominate this new era—from rack-scale AI factories to embodied agents operating in the physical world.

But this vision hinges on a proprietary ecosystem. The counterweights—open standards, cost-conscious buyers, and potential regulatory scrutiny—will define whether NVIDIA’s walled garden becomes the new industry blueprint, or a high-performance outlier amid a more modular and open computing future.

For CTOs, architects, and engineering leaders: the choice is not just technical—it is strategic. Infrastructure decisions made today will determine whether you’re building on granite or sand in the coming decade of generative and physical AI.

Intel Foundry’s Back-End Technology Update: A Deep Dive into Heterogeneous Integration Strategy

Executive Summary

In his presentation at Direct Connect 2025 on April 29th, 2025, Navid Shahriari, Executive Vice President and General Manager of Intel Foundry’s integrated technology development and factory network, outlined a comprehensive roadmap for advanced packaging technologies under the umbrella of heterogeneous integration. The talk emphasized Intel Foundry’s evolution into an OSAT (Outsourced Semiconductor Assembly and Test) partner of choice, offering full-stack flexibility—from design to manufacturing—while addressing critical challenges in quality, yield, and cost.

Shahriari positioned heterogeneous integration as a transformative force powering the AI revolution, moving from a niche concept to a mainstream necessity. His technical roadmap included enhancements to EMIB (Embedded Multi-die Interconnect Bridge), the introduction of Foveros-R and Foveros-B, hybrid bonding (Foveros Direct), and innovations in power delivery, thermal management, and co-packaged optics. The strategic goal is clear: provide scalable, flexible, and cost-effective packaging solutions that meet the extreme demands of next-generation AI systems.


Three Critical Takeaways

1. Enhanced EMIB with TSV-Based Power Delivery (EMIB-T)

Technical Explanation

Intel introduced EMIB-T, an enhancement to its existing EMIB (Embedded Multi-die Interconnect Bridge) technology. EMIB enables high-density interconnect between multiple die using a silicon bridge embedded in the organic substrate. EMIB-T adds Through-Silicon Vias (TSVs) to this architecture, enabling direct power delivery through the substrate rather than relying on thin metal layers in the bridge itself.

This addresses IR drop issues that become significant at higher data rates (e.g., HBM4 operating at 12 Gbps per pin). By routing power vertically through TSVs, EMIB-T reduces both AC and DC noise, improving signal integrity and performance stability.

Key specs:

  • Supports HBM4 and UCIe (Universal Chiplet Interconnect Express)
  • Scalable pitch down to 9µm
  • Panel-based DLAST process enables large-scale integration (up to 80x80mm² packages)
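The IR-drop motivation is easy to quantify. The sketch below compares the voltage drop of a high-current rail delivered through a thin bridge metal path versus a lower-resistance TSV path; the resistances, rail voltage, and power draw are illustrative assumptions, not Intel's figures.

```python
# Illustrative IR-drop comparison (V = I * R). All values are assumptions.
def ir_drop_mv(current_a: float, resistance_mohm: float) -> float:
    return current_a * resistance_mohm  # amps x milliohms = millivolts

rail_voltage = 0.75                      # volts
package_power = 700                      # watts drawn through the rail
current = package_power / rail_voltage   # ~930 A

for label, r_mohm in [("thin bridge metal", 0.10), ("TSV power path", 0.02)]:
    drop = ir_drop_mv(current, r_mohm)
    print(f"{label:>18}: {drop:5.1f} mV drop ({drop / (rail_voltage * 1000):.1%} of rail)")
```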

Critical Assessment

The addition of TSV-based power delivery represents a pragmatic solution to a well-known limitation of 2.5D interposer architectures. While silicon interposers offer excellent interconnect density, their use for power distribution has always been suboptimal due to limited metal thickness and current-carrying capacity.

By embedding vertical TSVs directly into the EMIB structure, Intel effectively combines the best of both worlds: the cost and scalability benefits of panel-based packaging with the robustness of TSV-based power rails. However, the long-term reliability of these TSVs under high current densities remains a concern, especially for kilowatt-level AI chips.

Competitive/Strategic Context

Compared to TSMC’s CoWoS-S, which uses a full silicon interposer with redistribution layers, EMIB/EMIB-T offers better cost scaling because it avoids wafer-level reticle stitching constraints. TSMC’s approach excels in maximum bandwidth but suffers from lower throughput and higher costs at scale.

| Feature | Intel EMIB/EMIB-T | TSMC CoWoS-S |
| --- | --- | --- |
| Interconnect Type | Embedded Silicon Bridge | Full Silicon Interposer |
| Power Delivery | TSV-enhanced | Thin Metal Layers |
| Cost Scaling | Good | Poor |
| Max Reticle Size | Panel-scale | Wafer-scale |

Quantitative Support

  • Over 16 million units of EMIB already shipped
  • Targeting 8x reticle size by 2026 and beyond
  • Supports up to 12 HBM stacks

2. Hybrid Bonding (Foveros Direct): 9µm Pitch Copper-to-Copper Bonding

Technical Explanation

Intel announced progress in hybrid bonding, specifically Foveros Direct, achieving a 9µm pitch copper-to-copper bond for 3D stacking. This allows direct metallurgical bonding between dies without microbumps, reducing parasitics and enabling ultra-high-density interconnects.

Hybrid bonding is crucial for future chiplet architectures, where logic-on-logic or logic-on-memory stacking is needed with minimal latency and power overhead.

Critical Assessment

Hybrid bonding is widely regarded as the next frontier in advanced packaging. Intel’s reported yield improvements are promising, but real-world reliability metrics remain sparse. Reliability qualification typically requires multiple learning cycles across temperature, voltage, and mechanical stress—data that was not shared.

Another consideration is alignment accuracy: achieving consistent bond quality across millions of pads at 9µm pitch is non-trivial and will require precision equipment and control algorithms. Intel’s roadmap suggests production readiness within a year, which aligns with industry expectations.

Competitive/Strategic Context

Intel competes here with TSMC’s SoIC and Samsung’s X-Cube hybrid bonding offerings. Both foundries have demonstrated similar pitches (down to ~6–7µm), though commercial deployment is still limited.

Feature               | Intel Foveros Direct       | TSMC SoIC
Bond Type             | Cu-Cu                      | Cu-Cu
Pitch                 | 9µm                        | 6–7µm
Production Readiness  | Sampling now, 2026 target  | Limited availability
Yield Data            | Improving                  | Not publicly available

Quantitative Support

  • Achieved 9µm pitch hybrid bonding
  • High-volume sampling underway
  • Targeting production readiness in 2026

3. Known-Good Die (KGD) Testing & Singulated Die Services

Technical Explanation

As chiplets and multi-die packages become more complex, ensuring known-good die (KGD) becomes mission-critical. Intel highlighted its mature singulated die test capability, developed over a decade, supporting advanced probing and burn-in processes.

This includes custom test flows, integration with ATE ecosystems (like Teradyne or Advantest), and support for customer-specific test vectors and protocols.
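
As a rough illustration of how such a flow can be expressed, the sketch below models a singulated-die test sequence as data and applies an all-insertions-must-pass criterion. Stage names, temperatures, and durations are hypothetical and do not describe Intel’s actual flow or its ATE integration.

```python
# A minimal sketch of a singulated-die (KGD) test flow expressed as data.
# Stage names, temperatures, durations, and the pass criterion are illustrative
# assumptions, not Intel's flow.
from dataclasses import dataclass

@dataclass
class TestInsertion:
    name: str
    temperature_c: float   # test temperature
    duration_min: float    # insertion time, including soak/burn-in where applicable

# Hypothetical flow: wafer sort, post-singulation probe, burn-in, final KGD bin
kgd_flow = [
    TestInsertion("wafer_sort", 25, 5),
    TestInsertion("singulated_probe", 85, 8),
    TestInsertion("burn_in", 125, 240),
    TestInsertion("final_bin", 25, 3),
]

def die_is_known_good(results: dict[str, bool]) -> bool:
    """A die ships only if every insertion in the flow passed."""
    return all(results.get(stage.name, False) for stage in kgd_flow)

print(die_is_known_good({"wafer_sort": True, "singulated_probe": True,
                         "burn_in": True, "final_bin": True}))  # True
```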

Critical Assessment

The economic impact of defective dies in multi-die systems can be catastrophic. Intel’s singulated die test infrastructure is a major differentiator, especially when compared to OSATs that lack such capabilities or rely on less rigorous binning strategies.

However, the cost and time overhead of exhaustive KGD testing must be balanced against yield improvements. For example, if a package integrates 100 die and each die carries a 1% defect rate, overall package yield falls to roughly 37% (0.99^100), highlighting the importance of near-perfect KGD assurance.
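
The compound-yield arithmetic behind that example is shown below; the only assumptions are independent die defects and the two defect rates chosen for comparison.

```python
# Compound-yield arithmetic behind the KGD argument: if each of N die
# independently has defect probability p, package-level yield is (1 - p)**N.

def package_yield(n_die: int, defect_rate: float) -> float:
    return (1.0 - defect_rate) ** n_die

print(f"{package_yield(100, 0.01):.1%}")   # ~36.6% -- roughly two thirds of packages scrapped
print(f"{package_yield(100, 0.001):.1%}")  # ~90.5% once die are 99.9% known-good
```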

Competitive/Strategic Context

Most third-party OSATs do not offer end-to-end KGD services, instead focusing on assembly rather than pre-packaging test. Intel positions itself uniquely by offering KGD as a service, either standalone or as part of a broader flow.

Capability           | Intel KGD Service | Typical OSAT Offering
Pre-Packaging Test   | Yes               | No
Burn-In Capabilities | Yes               | Rare
Custom Test Flow     | Supported         | Limited
Integration with ATE | Deep              | Basic

Quantitative Support

  • Over 10 years of production experience
  • Piloting with select customers showing strong results
  • Essential for managing cost in multi-chiplet, high-reticle designs

Conclusion

Navid Shahriari’s presentation painted a compelling picture of Intel Foundry’s ambitions to lead in the post-Moore’s Law era through advanced packaging and heterogeneous integration. From enhanced EMIB with TSV power delivery to hybrid bonding and KGD-centric test strategies, the roadmap reflects a deep understanding of the evolving needs of AI-driven compute architectures.

While the technical claims are backed by impressive deployment figures (e.g., 16M+ EMIB units shipped), the true validation will come from sustained yield improvements, reliability data, and ecosystem adoption. Intel Foundry’s ability to offer modular, OSAT-like flexibility while maintaining world-class packaging innovation puts it in a unique position to serve both traditional and emerging semiconductor markets.

As AI continues to push the boundaries of system complexity and power density, Intel Foundry’s back-end roadmap may well define the next generation of compute platforms—not just for Intel, but for the broader ecosystem seeking alternatives to monolithic scaling.

Intel’s 18A and Beyond: A Deep Dive into Process Technology Innovation

Executive Summary

In this presentation at Direct Connect 2025 on April 29th, 2025, Intel’s Vice President and GM Ben Sell, along with Myung-Hee Na, outlined the company’s roadmap for next-generation process technologies. The central thesis revolves around extending Moore’s Law through architectural innovation—particularly via gate-all-around (GAA) transistors (RibbonFET) and backside power delivery (PowerVia). These innovations aim to deliver significant performance-per-watt improvements while enabling advanced 3D integration for AI and high-performance computing workloads.

The roadmap includes:

  • Intel 18A: First production GAA node with PowerVia, targeting Q4 2025 volume production.
  • Intel 18AP: Enhanced version of 18A with better transistor performance and additional threshold-voltage (VT) options, slated for late 2026.
  • Intel 18APT: Base die for 3D ICs with TSVs optimized for signal and power, entering risk production in 2026.
  • Intel 14A: Full-node scaling over 18A with second-gen RibbonFET and PowerVia, expected in 2027.

The talk also emphasized technology co-optimization, system-aware design, and long-term R&D into post-silicon materials like molybdenum disulfide (MoS₂) and alternative packaging techniques.


Three Critical Takeaways

1. RibbonFET + PowerVia: A Dual Innovation for Performance and Density

Technical Explanation

Intel’s RibbonFET is a gate-all-around (GAA) transistor architecture that improves electrostatic control, particularly beneficial for low-voltage operation. Each transistor comprises four stacked ribbons, allowing for better current modulation and reduced leakage.

PowerVia rethinks traditional front-side power routing by moving it to the backside of the wafer. This approach:

  • Reduces voltage drop from bump to transistor
  • Relaxes lower-layer metal pitch requirements (from <25nm to ~32nm)
  • Improves library cell utilization

This dual innovation delivers:

  • >15% performance improvement at same power
  • 1.3x chip density improvement over Intel 3

Critical Assessment

The combination of RibbonFET and PowerVia addresses two major bottlenecks: transistor scalability and power delivery efficiency. However, the cost implications of adding backside metallization are non-trivial. Intel claims the added cost is offset by simplified front-side patterning enabled by EUV lithography.

One unstated assumption is the long-term yield stability of these complex processes, especially as they scale into multi-die stacks and 3D ICs. Early data shows yields matching or exceeding historical Intel nodes, but sustained HVM (high-volume manufacturing) yields remain to be seen.

Competitive/Strategic Context

Competitors are also pursuing GAA: Samsung introduced its MBCFET variant of gate-all-around at its 3nm node, while TSMC is moving to nanosheet FETs with N2. However, Intel’s early integration of backside power delivery is unique and could offer advantages in chiplet-based designs and AI accelerators where power delivery and thermal management are critical.

Quantitative Support

Metric                         | Intel 18A vs. Intel 3
Performance gain (same power)  | >15%
Chip density improvement       | 1.3x
Lower metal pitch relaxation   | <25nm → ~32nm
High-density SRAM cell area    | ~89% of Intel 3 (≈0.89x scaling)

2. System-Aware Co-Optimization for AI Workloads

Technical Explanation

Myung-Hee Na highlighted the shift from Design-Technology Co-Optimization (DTCO) to System-Technology Co-Optimization (STCO). This approach involves:

  • Understanding workload-specific compute needs (especially AI)
  • Co-designing silicon, packaging, and system architecture together
  • Enabling 3D ICs with fine-pitch TSVs and hybrid bonding

Intel’s Intel 18APT is designed specifically as a base die for 3D integration, offering:

  • 20–25% compute density increase
  • 25–35% power reduction
  • ~9x increase in die-to-die bandwidth density

Critical Assessment

This marks a strategic pivot toward domain-specific optimization, aligning with trends in AI hardware acceleration and heterogeneous computing. However, implementing STCO requires deep collaboration across the stack—from EDA tools to OS-level scheduling—and may introduce new layers of complexity in verification and toolchain support.

While promising, Intel’s roadmap lacks concrete details on software enablement and toolchain readiness—key factors in realizing the benefits of co-optimized systems.

Competitive/Strategic Context

Other players like AMD and NVIDIA have pursued similar strategies via chiplet architectures and NVLink interconnects, respectively. However, Intel’s focus on bottom-up co-integration (silicon + packaging + system) sets them apart. The challenge will be maintaining coherence between rapidly evolving AI algorithms and fixed silicon pipelines.

Quantitative Support

Feature                      | Intel 18APT Improvement
Compute density              | +20–25%
Power consumption            | -25–35%
Die-to-die bandwidth density | ~9x increase

3. High-NA EUV: Cost Reduction Through Simplified Patterning

Technical Explanation

Intel is leveraging high-NA EUV to reduce process complexity and cost. For example, certain patterns previously requiring three EUV exposures and ~40 steps can now be achieved with a single pass using high-NA EUV.

This not only shortens the process flow but also allows for metal layer depopulation, which can improve RC delay and overall performance.
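
The sketch below restates the step-count comparison from the talk and converts it into an illustrative cycle-time saving; the per-step duration is an assumed average, not an Intel figure, and per-step costs are not modeled.

```python
# Step-count comparison from the talk: ~40 steps for the multi-exposure EUV
# flow vs. ~10-15 for a single high-NA pass. The per-step cycle-time figure is
# an assumption used only to illustrate the magnitude of the saving.

multi_pass_steps = 40
high_na_steps = 12            # midpoint of the ~10-15 range quoted above
days_per_step = 0.5           # assumed average, not an Intel figure

print(f"Step reduction: {1 - high_na_steps / multi_pass_steps:.0%}")  # ~70%
saved_days = (multi_pass_steps - high_na_steps) * days_per_step
print(f"Illustrative cycle-time saving: {saved_days:.0f} days per layer")  # ~14 days
```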

Critical Assessment

The move to high-NA EUV is both technically sound and strategically necessary given the rising cost of multi-patterning. However, high-NA tools are still rare and expensive. ASML currently produces them in limited quantities, and full deployment across Intel’s foundry network will take time.

Additionally, there’s an implicit assumption that design rules can accommodate relaxed geometries without sacrificing performance—this remains to be validated in real-world SoC implementations.

Competitive/Strategic Context

TSMC and Samsung are also investing heavily in high-NA EUV, but Intel appears to be ahead in its integration timeline, particularly for logic applications. Their use case—combining high-NA with PowerVia—is novel and could provide a cost-performance edge in high-margin segments like client and server CPUs.

Quantitative Support

Approach                   | Steps Required | Metal Layers Used
Traditional Multi-Pass EUV | ~40            | Multiple
High-NA EUV Single Pass    | ~10–15         | Reduced (depopulated)

Conclusion

Intel’s Direct Connect 2025 presentation paints a compelling picture of process innovation driven by architectural foresight. With RibbonFET, PowerVia, and system-aware co-design, Intel is positioning itself to regain leadership in semiconductor manufacturing.

However, the path ahead is fraught with challenges:

  • Sustaining yield improvements at scale
  • Ensuring robust ecosystem support for novel flows
  • Managing the cost and availability of high-NA EUV

For CTOs and system architects, the key takeaway is clear: the future of compute lies in tightly integrated, domain-optimized silicon-and-packaging solutions. Intel’s roadmap reflects this vision, and while execution risks remain, the technical foundation is undeniably strong.

Intel Foundry 2025: A Strategic Shift in Semiconductor Manufacturing

Executive Summary

At the Direct Connect 2025 keynote on April 29th, 2025, Intel CEO Lip-Bu Tan outlined a bold and necessary pivot: transforming Intel into a leading global foundry. His central message was clear—innovation depends on deep collaboration, customer-centricity, and sustained execution.

Intel is now building its future on four interlocking pillars:

  • Process Technology Leadership
  • Advanced Packaging at Scale
  • Open Ecosystem Enablement
  • Manufacturing Scalability and Trust

Tan emphasized Intel’s singular position as the only U.S.-based company with both advanced R&D and high-volume manufacturing capabilities in logic and packaging. Key partnerships with Synopsys, Cadence, Siemens EDA, and PDF Solutions aim to establish a truly open and modern foundry model—one that is competitive with TSMC and Samsung on technology, but differentiated by geography, trust, and strategic alignment with national priorities.

This strategic direction was substantiated by in-depth presentations from executives Naga Chandrasekaran and Kevin O’Buckley, detailing progress on Intel 18A, advanced packaging (EMIB and Foveros), and the ecosystem infrastructure supporting customer design and yield enablement.


Three Critical Takeaways

1. Intel 18A: Gate-All-Around and Backside Power, Delivered at Scale

Technology Leadership

Intel 18A introduces gate-all-around (GAA) RibbonFET transistors and PowerVia, a backside power delivery network that routes power beneath the transistor layer, freeing up top-side metal layers for signal routing.

Key benefits:

  • ~10% improvement in cell utilization
  • ~4% performance uplift at iso-power
  • ~30% density gain over Intel 20A

This architecture is tailored for compute-intensive, bandwidth-constrained domains like AI training, HPC, and edge inference, where energy efficiency and signal integrity dominate system-level constraints.

Competitive Perspective

While Samsung (3GAE) and TSMC (N2) also offer GAA, Intel is first to pair GAA with backside power in a commercially viable, high-volume node. This combination offers a compelling differentiator in power efficiency and routing simplicity, particularly for multi-die systems and 3D packaging strategies.

Feature             | Intel 18A | TSMC N2 | Samsung 3GAE
GAA                 | Yes       | Yes     | Yes
Backside Power      | Yes       | No      | No
High EUV Use        | Yes       | Yes     | Moderate
U.S. Foundry Option | Yes       | No      | No

Execution Status

  • Risk production in progress; volume production planned for 2025
  • Yield indicators tracking toward target defect densities
  • 100+ customer engagements under NDA
  • Early silicon achieving ~90–95% of performance targets

2. Advanced Packaging as the New Integration Frontier

Platform Capability

Intel is doubling down on heterogeneous integration via:

  • EMIB (Embedded Multi-die Interconnect Bridge): 2.5D packaging enabling high-bandwidth, low-latency links between chiplets
  • Foveros: 3D stacking with active interposers, TSVs, and logic-on-logic die integration

New variants include:

  • EMIB-T: Incorporating TSVs for enhanced vertical power delivery
  • Foveros R/B/S: Feature-integrated versions supporting voltage regulation and embedded passive elements (e.g., MIMCAPs)

Intel now supports reticle-scale and sub-reticle tile stitching, with packages up to 120×188 mm², enabling compute fabrics, stacked DRAM, and integrated accelerators in single systems-in-package.

Strategic Implication

Advanced packaging is Intel’s bridge between Moore’s Law economics and modular, chiplet-based innovation. While CoWoS and X-Cube offer similar capabilities, Intel’s advantage lies in its U.S.-based, vertically integrated packaging supply chain—a critical factor for defense, aerospace, and regulated markets.

Metric                       | Intel EMIB/Foveros | TSMC CoWoS | Samsung X-Cube
Reticle Stitching            | Yes                | Partial    | No
TSV-Enabled                  | Yes                | Limited    | Yes
Power Integrity Enhancements | Yes                | Yes        | Moderate
Domestic Packaging           | Yes                | No         | No

Execution Status

  • Microbump pitch below 25 μm in production
  • Inline ML-based defect detection reduces test and soak costs by >20%
  • Packaging roadmap aligned with 18A and 14A node cadence

3. Ecosystem Enablement: Toward a Modern, Open Foundry

Infrastructure Build-Out

Intel is transitioning from an internal IDM model to an open, customer-facing foundry supported by industry-standard tools and workflows. Key developments:

  • PDK Access: 18A and 14A enabled through Synopsys and Cadence
  • Design Signoff: Siemens Calibre certified on 18A
  • Yield Analytics: PDF Solutions integrated into ramp flow, reducing yield learning cycles

Intel Foundry aims to meet external customer expectations on design readiness, IP portability, and predictable tapeout schedules—areas where TSMC has set the bar.

Market Context

While Intel’s ecosystem is still maturing, its combination of geopolitical alignment, manufacturing transparency, and customer co-design programs creates a differentiated value proposition—especially for companies operating in defense, automotive, or AI infrastructure sectors that require U.S.-based capacity.

Capability          | Intel Foundry | TSMC      | Samsung
External IP Support | Moderate      | Extensive | High
Open PDK Access     | Yes           | Yes       | Yes
AI Yield Tuning     | Yes (PDF)     | Yes       | Emerging
Domestic Compliance | Full          | None      | Partial

Execution Status

  • 18A tapeouts supported via pre-qualified tool flows
  • Over 100 design teams actively engaged across customer and internal tapeouts
  • Full stack support (RTL to GDSII to HVM) expected by Q4 2025

Conclusion

Intel’s 2025 foundry strategy marks a decisive inflection point for the company—and for the U.S. semiconductor industry at large. With 18A, Foveros, and an open design ecosystem now moving into execution, Intel is not merely catching up, but defining a new kind of foundry model: one built on technical excellence, geographic trust, and systems-level collaboration.

However, the path forward will demand discipline in yield ramping, transparency in roadmap delivery, and deep ecosystem support. For engineering leaders and CTOs, Intel presents a strategic alternative—not only in performance, but in resilience and sovereignty.

In a world where manufacturing location, IP control, and system integration are as important as process node performance, Intel Foundry may well become the preferred partner for the next generation of compute platforms.