Google Ironwood TPU: Next-Gen AI Inference Chip

Unlocking Agentic AI: The Power of Google’s Ironwood TPU

Google unveiled Ironwood, its seventh-generation Tensor Processing Unit (TPU), a custom-designed chip built to advance the company’s AI infrastructure. The new architecture is a deliberate strategic step designed to meet the evolving needs of Google’s most advanced Gemini models. Ironwood is designed to excel at simulated reasoning tasks, which Google calls “thinking.”

Google believes its advanced AI models perform best when paired with custom-designed infrastructure, with each shaping the other. Ironwood embodies this philosophy by delivering faster inference and supporting larger context windows for powerful AI models. Google describes Ironwood as its most scalable and powerful TPU yet, built to let AI systems autonomously gather information and generate outputs that serve users proactively. Google frames this “agentic AI” as a proactive system oriented around user needs, and Ironwood is the hardware meant to power this new generation of inference.

Performance Unleashed: Ironwood’s Impressive Specs

Ironwood delivers substantial throughput improvements over earlier Google TPUs. Google plans to deploy it at large scale, in clusters of up to 9,216 liquid-cooled Ironwood chips working together. An enhanced Inter-Chip Interconnect (ICI) links the chips across these massive arrays, providing high-bandwidth, low-latency data transfer throughout the system.

Both Google’s internal teams and Google Cloud developers will have access to this processing capability. Ironwood will be offered in two configurations: a 256-chip configuration for more modest workloads and the full 9,216-chip cluster for the most demanding AI tasks.

The computational power of a full Ironwood pod is staggering: 42.5 exaflops of inference compute. Google reports that each Ironwood chip reaches a peak throughput of 4,614 TFLOPS, a significant advance over earlier designs. Each chip carries 192 GB of memory, six times the capacity of the Trillium TPU, and memory bandwidth has increased 4.5 times to 7.2 TB/s.
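The headline figures above are internally consistent, which a few lines of arithmetic can confirm. The following is a minimal Python sketch, assuming Google’s reported per-chip peaks; the helper names (`pod_exaflops`, `pod_memory_tb`) are hypothetical, the 256-chip totals are derived rather than official, and the Trillium memory and bandwidth values it prints are back-calculated from the stated 6x and 4.5x ratios, not quoted from Google.

```python
# Back-of-the-envelope check of the reported Ironwood figures.
# Per-chip values are Google's reported peaks; everything else is derived
# here by simple arithmetic and is NOT an official Google figure.

CHIP_PEAK_TFLOPS = 4_614   # reported peak per Ironwood chip (FP8)
CHIP_HBM_GB = 192          # reported memory per chip
CHIP_HBM_BW_TBPS = 7.2     # reported memory bandwidth per chip, in TB/s

# The two configurations described in the article.
POD_SIZES = {"small": 256, "full": 9_216}


def pod_exaflops(chips: int) -> float:
    """Aggregate peak compute in exaflops (1 EFLOPS = 1,000,000 TFLOPS)."""
    return chips * CHIP_PEAK_TFLOPS / 1_000_000


def pod_memory_tb(chips: int) -> float:
    """Aggregate memory in TB, using decimal units (1 TB = 1,000 GB)."""
    return chips * CHIP_HBM_GB / 1_000


# Implied Trillium baselines, derived from the "6x memory" and
# "4.5x bandwidth" claims (assumption: the ratios are exact).
implied_trillium_hbm_gb = CHIP_HBM_GB / 6
implied_trillium_bw_tbps = CHIP_HBM_BW_TBPS / 4.5

for name, chips in POD_SIZES.items():
    print(f"{name}: {chips} chips, {pod_exaflops(chips):.2f} EFLOPS, "
          f"{pod_memory_tb(chips):.1f} TB memory")
print(f"implied Trillium memory: {implied_trillium_hbm_gb:.0f} GB")
print(f"implied Trillium bandwidth: {implied_trillium_bw_tbps:.1f} TB/s")
```

Multiplying 9,216 chips by 4,614 TFLOPS gives about 42.52 exaflops, matching the 42.5-exaflop pod figure, and the stated ratios imply a Trillium baseline of roughly 32 GB of memory and 1.6 TB/s of bandwidth.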

Contextualizing the Power: Ironwood’s Place in the AI Landscape

Comparing AI chips is difficult because vendors use different measurement methodologies. Google quotes Ironwood’s performance at FP8 precision. The company claims that Ironwood pods deliver 24 times the performance of comparable segments of the world’s top supercomputers, but such comparisons deserve caution because some supercomputers lack native FP8 support in hardware.

Google left its own TPU v6 (Trillium) out of these direct performance comparisons, though it says Ironwood delivers twice the performance per watt of Trillium. Google positions Ironwood as the successor to TPU v5p, whereas Trillium succeeded TPU v5e. For reference, Trillium reached a peak of roughly 918 TFLOPS at FP8 precision.
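To put the per-chip numbers side by side, the sketch below divides Ironwood’s reported peak by Trillium’s and then asks what per-chip power ratio a 2x performance-per-watt gain would imply. It is illustrative only and rests on two assumptions the source does not confirm: that both peaks are measured at the same precision, and that “performance per watt” means peak throughput divided by power draw. The constants and helper functions are hypothetical names chosen for this example.

```python
# Illustrative per-chip comparison of Ironwood and Trillium.
# ASSUMPTIONS (not confirmed by Google): both peak figures use the same
# precision, and "performance per watt" = peak throughput / power draw.

IRONWOOD_PEAK_TFLOPS = 4_614   # reported Ironwood peak per chip
TRILLIUM_PEAK_TFLOPS = 918     # approximate Trillium peak, as reported
PERF_PER_WATT_GAIN = 2.0       # Google's claimed Ironwood vs. Trillium gain


def peak_ratio(new_tflops: float, old_tflops: float) -> float:
    """How many times higher the newer chip's peak throughput is."""
    return new_tflops / old_tflops


def implied_power_ratio(throughput_ratio: float, efficiency_gain: float) -> float:
    """Power ratio implied if throughput = efficiency * power for both chips."""
    return throughput_ratio / efficiency_gain


ratio = peak_ratio(IRONWOOD_PEAK_TFLOPS, TRILLIUM_PEAK_TFLOPS)
power = implied_power_ratio(ratio, PERF_PER_WATT_GAIN)
print(f"peak throughput ratio: {ratio:.2f}x")
print(f"implied per-chip power ratio: {power:.2f}x")
```

Under those assumptions, Ironwood’s peak is about 5x Trillium’s, which together with a 2x efficiency gain would mean each Ironwood chip draws roughly 2.5x the power. If either assumption does not hold, the implied power ratio changes accordingly.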

The Road Ahead: Ironwood and the Future of AI

Despite the complexities of benchmarking, the message is clear: Ironwood is a considerable advance for Google’s AI infrastructure. Its gains in performance and efficiency build on the substantial groundwork laid by older TPUs, which already enabled rapid progress in models such as Gemini 2.5.

Google expects Ironwood’s inference capabilities and improved efficiency to drive further AI advances over the coming year. Ironwood supplies the processing power needed for complex models and agentic capabilities, which are central to Google’s “age of inference” vision, in which AI becomes a fundamental part of digital experiences.