Wednesday, December 18, 2024

Google Cloud moves its AI-focused Trillium chips into general availability


Google LLC’s cloud unit today announced that Trillium, the latest iteration of its tensor processing unit artificial intelligence chip, is now generally available.

The launch of the TPU comes seven months after the search giant first detailed the custom processor. It offers up to three times the inference throughput of Google’s previous-generation chip.

When used for AI training, it can provide up to two and a half times the performance of its predecessor. Google says that the chip delivers those speed increases while using 67% less power, which means AI workloads can run more cost-efficiently in the company’s cloud.

Trillium is partly made up of compute modules called TensorCores. Each module includes, among other components, circuits optimized to perform matrix multiplications, the mathematical operations that AI models rely on heavily to process data. One reason Trillium is faster than its predecessor is that Google increased the size of those matrix multiplication units and boosted clock speeds.
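For readers who want a concrete picture, the snippet below is a minimal JAX sketch of the kind of operation those units accelerate: a single matrix multiplication compiled through XLA. The shapes and bfloat16 dtype are illustrative only and don't reflect Trillium's actual unit dimensions.

```python
import jax
import jax.numpy as jnp

# Illustrative shapes and dtype; a TPU's matrix units execute
# tiles of exactly this kind of product directly in hardware.
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(k1, (128, 256), dtype=jnp.bfloat16)
b = jax.random.normal(k2, (256, 512), dtype=jnp.bfloat16)

# jit compiles the function through XLA, which lowers the dot
# product onto the accelerator's matrix multiplication units.
matmul = jax.jit(lambda x, y: jnp.dot(x, y))
print(matmul(a, b).shape)  # (128, 512)
```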

The company’s engineers also enhanced another Trillium building block, the SparseCore. It’s a set of circuits optimized to process embeddings, the mathematical structures AI models use to represent the data they work with. The new SparseCore is particularly adept at crunching large embeddings that hold a significant amount of information. Such supersized embeddings are used extensively by the AI models that underpin product recommendation tools and search engines.
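The lookup-heavy access pattern that embedding workloads produce can be sketched in a few lines of JAX. The table size, IDs and aggregation below are hypothetical, and this illustrates the general pattern SparseCore-style hardware targets, not Google's actual API.

```python
import jax
import jax.numpy as jnp

# Hypothetical sizes; production recommendation models can hold
# hundreds of millions of rows, which is what makes the workload
# memory-bound rather than compute-bound.
vocab_size, embed_dim = 100_000, 128
table = jax.random.normal(jax.random.PRNGKey(0), (vocab_size, embed_dim))

# Looking up a batch of IDs gathers a handful of rows scattered
# across a very large table: the sparse access pattern that
# SparseCore-style hardware is built to handle.
item_ids = jnp.array([3, 17, 42, 99_999])
vectors = table[item_ids]            # shape (4, 128)
profile = vectors.mean(axis=0)       # toy aggregation of the embeddings
print(profile.shape)                 # (128,)
```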

Trillium’s logic circuits are supported by an HBM memory pool twice as large as the one in Google’s previous-generation chip. It’s also twice as fast. HBM is a pricey, high-speed type of RAM, also used in graphics cards, that allows AI models to quickly access the data they require for calculations.
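A back-of-the-envelope calculation shows why that bandwidth matters. The figures below are made up for illustration, not Trillium's real specifications: if an inference step must stream every model weight from memory, bandwidth sets a hard floor on step time, and doubling it roughly halves that floor.

```python
# Illustrative figures only, not Trillium's real specifications.
params = 70e9                 # a 70-billion-parameter model
bytes_per_param = 2           # bfloat16 weights
weight_bytes = params * bytes_per_param      # 140 GB of weights

bandwidth_old = 1.0e12        # hypothetical 1 TB/s of HBM bandwidth
bandwidth_new = 2.0e12        # the same pool with doubled bandwidth

# Lower bound on reading every weight once, e.g. per decode step.
print(weight_bytes / bandwidth_old)   # 0.14 seconds
print(weight_bytes / bandwidth_new)   # 0.07 seconds
```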

Google also added other improvements, including to the interconnect through which Trillium chips exchange data with one another. The company doubled the component’s bandwidth, which means the chips can share model data with one another twice as fast, speeding up calculations.
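The communication pattern such interconnects carry during training can be sketched with a JAX collective. This is an illustrative all-reduce over a named device axis, not Google's internal software; the axis name "chips" is arbitrary.

```python
import jax
import jax.numpy as jnp

# Sketch of the all-reduce pattern inter-chip links carry during
# training: each device computes a local gradient, then all devices
# exchange and sum those values over the interconnect.
n = jax.local_device_count()

def step(local_grad):
    # psum is the collective whose cost scales with link bandwidth,
    # the component Google says it doubled.
    return jax.lax.psum(local_grad, axis_name="chips")

grads = jnp.arange(float(n)).reshape(n, 1)   # one value per device
total = jax.pmap(step, axis_name="chips")(grads)
print(total)  # every device holds the same summed result
```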

In Google’s data centers, Trillium processors are deployed as part of server clusters called pods that each feature 256 of the AI chips. Pods’ hardware resources are provided to AI workloads in the form of so-called slices. Each slice corresponds to a subset of the AI chips in a pod.
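In JAX, the chips a slice exposes surface as an ordinary device list that programs arrange into a logical mesh and shard data across. The sketch below is a minimal illustration under that assumption; the batch and feature sizes are arbitrary.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Whatever devices the slice exposes appear as an ordinary list,
# which can be arranged into a logical mesh (a full Trillium pod
# would expose 256 of them).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Shard a batch across the slice's chips along its leading axis.
x = jnp.zeros((len(devices) * 8, 1024))
sharded = jax.device_put(x, NamedSharding(mesh, PartitionSpec("data")))
print(sharded.sharding)
```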

With Trillium, Google Cloud introduced a new software capability for managing slices. The feature groups multiple slices into a pool of hardware resources called a collection. “This feature allows Google’s scheduling systems to make intelligent job-scheduling decisions to increase the overall availability and efficiency of inference workloads,” Mark Lohmeyer, Google Cloud’s vice president and general manager of compute and AI infrastructure, wrote in a blog post.

Alongside Trillium chips, a pod includes a range of other components. A system called Titanium offloads certain computing tasks from a Trillium server’s central processing unit to specialized chips known as infrastructure processing units, or IPUs. Those chips perform the offloaded chores more efficiently than the CPU, which makes data center operations more efficient.

Operating large AI clusters is difficult partly because of scalability constraints. When engineers add more processors to a cluster, technical limitations mean applications can use only a portion of the extra compute capacity. Moreover, those diminishing returns often become more pronounced as the cluster grows.

Google says the design of its Trillium-powered pods addresses those limitations. In one internal run, a set of 12 pods achieved 99% scaling efficiency when training an AI model with 175 billion parameters. In another test, which involved Llama-2-70B, Google’s infrastructure likewise achieved 99% scaling efficiency.
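Scaling efficiency is conventionally measured as achieved throughput relative to perfect linear scaling. The arithmetic below uses made-up throughput numbers to illustrate how a 99% figure is computed; they are not Google's measurements.

```python
# Made-up throughput figures to illustrate the metric, not Google's data.
pods = 12
throughput_one_pod = 100.0        # e.g. training samples/sec on one pod
throughput_measured = 1_188.0     # hypothetical measurement on 12 pods

ideal = pods * throughput_one_pod  # perfect linear scaling: 1,200
efficiency = throughput_measured / ideal
print(f"{efficiency:.0%}")         # 99%
```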

“Training large models like Gemini 2.0 requires massive amounts of data and computation,” Lohmeyer wrote. “Trillium’s near-linear scaling capabilities allow these models to be trained significantly faster by distributing the workload effectively and efficiently across a multitude of Trillium hosts.”

