By Anindita Nayak
Bhubaneswar, May 16
Search engine giant Google unveiled a new chip, ‘Trillium’, for training and running its foundation large language models (LLMs) such as Gemma and Gemini at its annual I/O conference on Tuesday.
Trillium is the sixth-generation TPU (Tensor Processing Unit) and delivers 4.7 times the peak compute performance per chip of the previous generation. The new chips are also 67% more energy efficient.
Google plans to use Trillium in its AI Hypercomputer, a supercomputing architecture designed for cutting-edge AI-related workloads. This new chip will be made available to Google Cloud customers in late 2024.
“Trillium TPUs achieve an impressive 4.7X increase in peak compute performance per chip compared to TPU v5e. We doubled the High Bandwidth Memory (HBM) capacity and bandwidth, and also doubled the Interchip Interconnect (ICI) bandwidth over TPU v5e. Additionally, Trillium is equipped with third-generation SparseCore, a specialized accelerator for processing ultra-large embeddings common in advanced ranking and recommendation workloads,” Amin Vahdat, VP of the machine learning, systems, and cloud AI team at Google, wrote in a blog post.
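For readers unfamiliar with the workload Vahdat describes, the sketch below is a minimal, hypothetical illustration (not Google's code) of the sparse embedding lookups that accelerators like SparseCore target: a ranking or recommendation model gathers a handful of rows from a very large embedding table, an access pattern that dense matrix hardware handles poorly. The table size, dimensions, and function names here are invented for illustration.

```python
# Illustrative sketch only: the kind of sparse embedding lookup that
# SparseCore-style accelerators are designed for. All sizes are made up.
import jax
import jax.numpy as jnp

VOCAB_SIZE = 100_000   # hypothetical number of embedding rows
EMBED_DIM = 128        # hypothetical embedding width

key = jax.random.PRNGKey(0)
table = jax.random.normal(key, (VOCAB_SIZE, EMBED_DIM))

def embed(ids):
    # Gather a few rows from a very large table. The access pattern is
    # sparse, so a dedicated gather engine helps where a dense matmul
    # unit would be a poor fit.
    return jnp.take(table, ids, axis=0)

ids = jnp.array([3, 17, 42_001, 99_999])  # sparse feature IDs for one example
vectors = jax.jit(embed)(ids)
print(vectors.shape)  # (4, 128)
```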
He said that generative AI is transforming the technology sector but, at the same time, demands ever greater compute, memory, and communication resources to run seamlessly.
“Generative AI is transforming how we interact with technology while simultaneously opening tremendous efficiency opportunities for business impact. But these advances require ever greater compute, memory, and communication to train and fine-tune the most capable models and to serve them interactively to a global user population,” he said in the blog post.
The company asserted that Trillium would power the next generation of AI models. “Trillium TPUs will power the next wave of AI models and agents, and we’re looking forward to helping enable our customers with these advanced capabilities. Support for training and serving of long-context, multimodal models on Trillium TPUs will also enable Google DeepMind to train and serve the future generations of Gemini models faster, more efficiently, and with lower latency than ever before,” Vahdat wrote.