Thursday, December 12, 2024

Has Google Turned A Corner In Quantum And AI Computing?


Google has recently announced three new hardware platforms, spanning quantum, AI, and Arm-based computing. And these chips are indeed impressive.

Ok, I’ll admit it. I underestimated Google’s ability to drive meaningful innovation in the computing realm. Yes, the company had great potential, but it seemed that few developments crossed the competitive threshold, and they didn’t get much traction in the market, perhaps due more to marketing than product. Recent announcements, however, made it clear that this is no longer the case, and the stock responded, skyrocketing more than 11% over the last two trading days. It looks like the market has decided that hardware is cool again.

Google has introduced the breakthrough Willow chip for quantum computing, the new Trillium AI chip that powers the newly released Gemini 2.0 AI Model, and GA’d its first Arm-based server chip, called Axion, to compete with Amazon’s Graviton and displace some Intel and AMD x86 cloud instances. Let’s take a look.

Willow Quantum AI Computing

Google has been competing with Amazon, IBM, Microsoft, and others to advance the state of quantum computing. While quantum remains a research project worldwide, the potential is staggering, and it could reshape the computing landscape to solve currently unsolvable problems in the next decade.

The 105-qubit Willow replaces Google’s 53-qubit Sycamore in a big way, roughly doubling the qubit count and improving error correction to make those qubits more usable. Google groups physical qubits into logical qubits to reduce error rates, one of the primary factors currently limiting quantum computing to the experimental stage.

State-of-the-art quantum computers typically suffer an error roughly once in every hundred to thousand operations. Traditional binary computers, by contrast, produce memory errors only about once in a billion billion calculations. So, if one does the math, quantum error rates run roughly a quadrillion to ten quadrillion times higher than classical ones, as quantum circuits are susceptible to a range of disturbances such as temperature fluctuations, electromagnetic radiation, and vibration.
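A quick back-of-the-envelope check of that ratio, using only the rough rates quoted above rather than measured chip data, looks like this:

    # Back-of-the-envelope comparison using the rates quoted above;
    # these are illustrative figures from the article, not measured chip data.
    quantum_error_rate_high = 1e-2   # ~1 error per 100 quantum operations
    quantum_error_rate_low = 1e-3    # ~1 error per 1,000 quantum operations
    classical_error_rate = 1e-18     # ~1 memory error per billion billion operations

    print(quantum_error_rate_high / classical_error_rate)  # 1e16 -> ten quadrillion times higher
    print(quantum_error_rate_low / classical_error_rate)   # 1e15 -> one quadrillion times higher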

Typically, error rates increase as quantum systems scale to larger numbers of qubits. But that is not the case with Willow: the bigger the mesh of qubits becomes, the better Willow performs. “Every time we increased our logical qubits – or our groupings – from 3-by-3 to 5-by-5 to 7-by-7 arrays of physical qubits, the error rate didn’t go up,” Google researcher Michael Newman said. “It kept coming down. It went down by a factor of two every time we increased the size.” The error rates are still far from those of traditional computers, but that scaling behavior puts Google on a path to viable commercial operation in the next five years or so as qubit counts exceed 1,000.
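In error-correction terms, that halving corresponds to an exponential suppression of the logical error rate as the code distance grows. Here is a minimal sketch of the scaling, assuming an illustrative starting error rate and a suppression factor of two per step (neither figure is Google’s published measurement):

    # Illustrative logical-error scaling as the surface-code distance grows.
    # The starting rate and suppression factor are assumptions for illustration,
    # not Google's published Willow measurements.
    starting_logical_error = 3e-3   # assumed logical error rate for the 3x3 (distance-3) array
    suppression_factor = 2          # error rate roughly halves with each step up in array size

    for distance in (3, 5, 7):
        steps = (distance - 3) // 2
        logical_error = starting_logical_error / (suppression_factor ** steps)
        print(f"{distance}x{distance} array: ~{logical_error:.1e} logical error rate")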

Willow also shows significantly improved coherence, the ability of qubits to maintain their state over time. Willow demonstrated a doubling of memory coherence, another essential step in reducing inherent error rates.

Willow Quantum Achieves What Supercomputers Simply Cannot

In the Nature paper announcing Willow, Google claimed that the new system solved a benchmark problem called Random Circuit Sampling, or RCS, which is designed to be effectively intractable for classical machines. Willow ran the RCS benchmark in less than five minutes, while the massive Frontier supercomputer at Oak Ridge National Laboratory would theoretically need ten septillion years (a 1 followed by 25 zeros).
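For a sense of just how large that claimed gap is, a quick calculation with the figures quoted above (which are theoretical estimates, not measured classical runtimes) gives the ratio:

    # Rough scale of the claimed Willow-vs-supercomputer gap on RCS,
    # using the estimates quoted above rather than a measured classical run.
    willow_seconds = 5 * 60                            # under five minutes
    frontier_years = 1e25                              # ten septillion years (theoretical estimate)
    frontier_seconds = frontier_years * 365.25 * 24 * 3600

    print(f"Claimed speedup: ~{frontier_seconds / willow_seconds:.0e}x")   # roughly 1e30x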

Willow is a research tool that achieves a significant milestone on the path toward quantum supremacy, or utility, in commercial applications beyond the reach of today’s largest supercomputers. But like other quantum computers, Willow has few, if any, commercial applications today. Give the industry a few years, and that could change. The stock market didn’t care about that nit, however, and rewarded Google with a remarkable 7% price increase on the announcement day.

Trillium: The 6th Generation Of Google TPU For AI

Google announced today the general availability of Trillium, the latest and most advanced TPU in the company’s nine-year effort to offer a competitive AI accelerator. Trillium was built to scale and is available on Google Cloud with over 100,000 Trillium chips per Jupiter network fabric, capable of scaling a single distributed training job to hundreds of thousands of accelerators. Trillium was used to create the new Gemini 2.0 AI model and serve inference processing of that model.

Trillium comes in a 256-chip pod and scales to hundreds of thousands of chips, making it possible to train trillion-parameter AI models. Trillium boasts the following improvements over its predecessor, the TPU v5e:

  • Over 4x improvement in training performance
  • Up to 3x increase in inference throughput
  • A 67% increase in energy efficiency
  • An impressive 4.7x increase in peak compute performance per chip
  • Double the High Bandwidth Memory (HBM) capacity
  • Double the Interchip Interconnect (ICI) bandwidth
  • 100K Trillium chips in a single Jupiter network fabric
  • Up to 2.5x improvement in training performance per dollar and up to 1.4x improvement in inference performance per dollar

While TPUs have always been excellent accelerators, they have struggled to gain significant market share because they are available only on Google Cloud Platform. However, Apple has now said that Apple Intelligence was trained on Google TPUs, and the TPU holds a majority share among the in-house AI accelerators fielded by cloud providers, ahead of those from Microsoft and Amazon AWS.

But where Trillium really shines is Mixture of Experts models, or MoEs. These models stress the network that connects accelerators, and Trillium pods are nearly four times faster than the previous TPU v5e at the same size. As MoE adoption snowballs, this could help Google capture more market share for its cloud.
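To see why MoE models lean so heavily on the interconnect, consider a minimal sketch of top-k expert routing; the dimensions, expert count, and routing scheme here are illustrative assumptions, not Trillium or Gemini internals:

    # Minimal sketch of Mixture-of-Experts (MoE) routing, illustrating why these
    # models stress the interconnect: each token is dispatched to a small subset
    # of experts, which in a large system live on different accelerators, so every
    # layer triggers chip-to-chip traffic. All sizes are made up for illustration;
    # this is not Google's implementation.
    import numpy as np

    num_tokens, hidden_dim, num_experts, top_k = 8, 16, 4, 2

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(num_tokens, hidden_dim))
    router = rng.normal(size=(hidden_dim, num_experts))      # learned router weights in practice
    experts = rng.normal(size=(num_experts, hidden_dim, hidden_dim))

    scores = tokens @ router                                  # (tokens, experts) routing logits
    chosen = np.argsort(scores, axis=1)[:, -top_k:]           # top-k experts per token

    output = np.zeros_like(tokens)
    for t in range(num_tokens):
        for e in chosen[t]:                                    # in a real system this dispatch
            output[t] += tokens[t] @ experts[e]                # crosses the chip-to-chip network

    print("Tokens routed to each expert:", np.bincount(chosen.ravel(), minlength=num_experts))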

Google Finally Enters The Arm Battle

Google recently released its own Arm-based server CPU, Axion, following Amazon AWS and Microsoft Azure, which have already fielded Arm server CPUs of their own. While less glamorous than AI and quantum processors, Arm has been steadily gaining server CPU market share. TrendForce projects that the penetration rate of Arm architecture in data center servers will reach 22% by the end of 2025, posing a significant threat to the x86 CPU market share.

Axion is a well-designed server CPU with 35% better performance, 60% better price performance, and 65% better power efficiency than “current generation” x86 chips. Now that it has become generally available, we should expect Axion to garner a significant share of the Google Cloud compute fleet for internal and customer applications.

Where Does This Leave Google?

We believe that Google now has a portfolio of complete in-house systems to compete with, and in some cases beat, other cloud providers and chip companies. Google has a fast Arm-based CPU, a speedy and scalable AI accelerator that excels at the latest workloads such as training MoE models, a quantum computing platform that should keep up with or even surpass most competitors, and the AI software stack and models developers need to attract the ecosystem and end users to its cloud computing platform.

Disclosures: This article expresses the author’s opinions and should not be taken as advice to purchase from or invest in the companies mentioned. Cambrian-AI Research is fortunate to have many, if not most, semiconductor firms as our clients, including Blaize, BrainChip, Cadence Design, Cerebras, D-Matrix, Eliyan, Esperanto, Flex, GML, Groq, IBM, Intel, Nvidia, Qualcomm Technologies, Si-Five, SiMa.ai, Synopsys, Ventana Microsystems, Tenstorrent and scores of investment clients. Like many in the tech industry, our portfolio does hold Nvidia, but otherwise we have no investment positions in any of the companies mentioned in this article and do not plan to initiate any in the near future. For more information, please visit our website at http://www.cambrian-ai.com .
