Friday, January 10, 2025

How AWS is leaning into AI with chips, code, models


re:Invent Amazon Web Services CEO Matt Garman opened the cloud computing service’s 13th annual re:Invent conference by observing: “Generative AI has the potential to disrupt every single industry out there.”

While that may be worrisome for those defending against disruption, it’s an appealing apocalypse for AWS, which will happily sell you the tools for that disruption: not just on its own Arm-based servers, but on its x86 fleet and its AI-oriented silicon services, too.

“In 2018, we saw a trend in compute,” recalled Garman. “We were looking out there, and we saw that Arm cores were getting faster. Most of them were in mobile, but they were getting more powerful. And we had this idea that there’s this opportunity that maybe we could go combine that technology curve with our knowledge of what’s most important to customers running inside of AWS and develop a custom general purpose processor.”

Today, that processor – the Graviton – is widely used by almost every AWS customer through EC2 instances in one way or another.

“Graviton delivers 40 percent better price performance than x86,” claimed Garman. “It uses 60 percent less energy.”

In 2019, AWS was a $35 billion business, said the AWS chief, adding: “Today, AWS has more Graviton capacity than we had total capacity in 2019.” According to Dave Brown, VP of AWS Compute, over the last two years, more than 50 percent of AWS’s additional CPU capacity has been Graviton-based.

Coincidentally, Intel, creator of the x86 architecture, is looking for a new CEO.

The biggest compute problem currently, Garman argued, involves generative AI. Graviton is aimed at general computation, but AWS has more AI-focused offerings.

“Now, the vast majority of generative AI workloads today run on Nvidia GPUs, and AWS is by far the best place anywhere in the world to run GPU workloads,” boasted Garman. “Part of the reason is because AWS and Nvidia have been collaborating together for 14 years to ensure that we’re really great at operating and running GPU workloads.”

AWS is doubling down on that partnership, Garman declared, with plans to launch instances next year featuring Nvidia’s forthcoming Blackwell chips.

Garman claimed those “P6” instances “will give you up to 2.5 times faster compute than the current generation of GPUs.”

More immediately, Garman said Trainium2 (Trn2), the second coming of AWS’s AI training chip, has reached general availability. The Amazon-designed silicon delivers 30 to 40 percent better price performance than current GPU systems, he claimed.

“These are purpose-built for the demanding workloads of cutting-edge generative AI training and inference,” said Garman.

There will also be EC2 Trn2 UltraServers, which consist of four Trn2 instances tied together using the company’s NeuronLink interconnect for a total of 64 Trainium2 chips capable of up to 83.2 FP8 petaflops.
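Those headline numbers hang together on the back of an envelope. The per-instance chip count and per-chip figure below are simply what the stated totals imply, not separately confirmed specs:

```python
# Back-of-the-envelope check of the Trn2 UltraServer figure:
# 4 Trn2 instances making up 64 chips implies 16 Trainium2 chips per instance.
instances = 4
chips_per_instance = 16          # implied: 64 chips / 4 instances
total_chips = instances * chips_per_instance

# 83.2 FP8 petaflops across 64 chips implies 1.3 petaflops per chip.
total_pflops = 83.2
per_chip_pflops = total_pflops / total_chips
print(total_chips, per_chip_pflops)  # 64 1.3
```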

According to Garman, Databricks, an AWS partner, expects to use Trainium2 to reduce its total cost of ownership (TCO) for joint customers by up to 30 percent.

Benoit Dupin, senior director of machine learning and AI at Apple, made a brief appearance to plug Apple Intelligence and – rather unusually for the inwardly focused iGiant – Apple’s relationship with AWS.

“We’re in the early stages of evaluating Trainium 2, and we expect our model numbers to gain up to 50 percent improvement in efficiency in pre-training,” said Dupin. “With AWS, we found that working closely together and taking advantage of the latest technologies has helped us be more efficient in the cloud. AWS expertise, guidance, and services have been instrumental in supporting our scale and growth.”

We can’t recall the last time an Apple exec showed up at a public event and gushed about a tech supplier – particularly in a way that highlights the sort of external technical dependency that the iBiz typically tries to avoid.

Buckets of cloudy upgrades

Also at the conf, Amazon announced its S3 cloud storage service has a new flavor: S3 Tables, which promises 3x faster query performance, and 10x more transactions per second for Apache Iceberg tables. There’s also S3 Metadata in preview, for managing metadata in S3.

Amazon Aurora, the cloud-based MySQL/PostgreSQL relational database management system, is celebrating its ten-year anniversary, and has a new sibling called Aurora DSQL.

The “D” in this instance stands for distributed. The associated service manages to provide distributed data storage that is both strongly consistent and low latency – a combination that, for those knowledgeable in the ways of databases and networks, represents a bit of a technical achievement.

“What we did is we added a hardware reference clock in every single EC2 instance all around the world,” explained Garman. “And those hardware reference clocks sync with satellite-connected atomic clocks. So that means that every EC2 instance now has microsecond-precision accurate time that’s in sync with any instance anywhere in the world.”
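Why do synchronized clocks matter to a database? If every node's clock agrees to within microseconds, a commit timestamp taken locally can be compared globally, so independent regions can agree on transaction order without an extra round trip. A toy illustration of that idea (not AWS code – the record shape is made up):

```python
import time

# Toy sketch: each "region" stamps its commit with its local clock. If the
# clocks are tightly synchronized, those timestamps are globally comparable.
def commit(region, payload):
    return {"region": region, "payload": payload, "ts_ns": time.time_ns()}

log = [commit("us-east-1", "debit $5"), commit("eu-west-1", "credit $5")]

# Any replica that later merges the two logs just sorts by timestamp to
# recover a single consistent global order.
merged = sorted(log, key=lambda e: e["ts_ns"])
print([e["region"] for e in merged])
```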

Aurora DSQL provides distributed storage with low-latency reads and writes, as well as strong consistency across multiple regions. According to Garman, it does so 4 times faster than Google Spanner – a comparable distributed database management service – given a 10-statement SQL transaction.
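For readers wondering what a "10-statement SQL transaction" benchmark shape looks like, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Aurora DSQL (the table and values are hypothetical):

```python
import sqlite3

# One transaction containing ten statements, committed atomically.
# sqlite3 stands in for Aurora DSQL here; table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

with conn:  # the context manager commits all ten inserts (or rolls back) together
    for i in range(10):
        conn.execute("INSERT INTO orders (item) VALUES (?)", (f"item-{i}",))

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 10
```

In a distributed database, the cost of such a transaction is dominated by how many cross-region round trips the commit protocol needs – which is exactly the axis on which AWS claims its advantage over Spanner.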

Amazon DynamoDB (NoSQL) tables now also support multi-region strong consistency, as a preview capability.

Of course there’s AI news, too

Amazon Bedrock, AWS’s managed AI model service, also received some attention. Amazon Bedrock Model Distillation is a new service, in preview, to reduce the size and cost of AI models.

“What model distillation does is you take this large frontier model,” explained Garman. “In this example, it’s a Llama 405B model. You take this highly capable model, and you send it all your prompts and all the questions that you might want to ask it. Then you take all of the data and the answers that come out of that and together with the questions, you use that to train a smaller model, in this case a Llama 8B model, to be an expert at that one particular thing.”
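The workflow Garman describes can be sketched in a few lines. The functions below are stand-in stubs to show the shape of the pipeline – the real thing runs through the Bedrock API, not local code:

```python
# Sketch of the distillation workflow: harvest answers from a large "teacher"
# model, then train a small "student" on them. All functions are stand-ins.
def teacher_model(prompt):            # stands in for the Llama 405B teacher
    return f"expert answer to: {prompt}"

prompts = ["What is my refund policy?", "How do I reset my password?"]

# Step 1: send your real prompts to the large model and keep its answers.
training_set = [(p, teacher_model(p)) for p in prompts]

# Step 2: fine-tune a small model (the Llama 8B student) on those pairs so it
# imitates the teacher on this one narrow task.
def fine_tune(student_base, pairs):   # placeholder for the actual training job
    return dict(pairs)                # toy "model": a lookup over seen prompts

student = fine_tune("llama-8b", training_set)
print(student["What is my refund policy?"])
```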

The result is a smaller, faster model that’s more affordable – up to 500 percent faster and up to 75 percent less expensive than the original model, according to Amazon, with an accuracy loss of less than two percent on use cases like Retrieval Augmented Generation (RAG).

Bedrock has also gained another preview capability: multi-agent collaboration. It allows customers to design agents for specific tasks that can run in parallel or in sequence under a supervisory agent.
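The supervisor pattern itself is simple enough to sketch. The agents below are hypothetical stand-ins, not the Bedrock API:

```python
# Toy sketch of the supervisor pattern: specialist agents handle narrow tasks,
# and a supervisory agent dispatches work to them and combines the results.
def market_risk_agent(company):
    return f"{company}: market risk assessed"

def credit_risk_agent(company):
    return f"{company}: credit risk assessed"

def supervisor(company):
    # Bedrock can run specialists in parallel or in sequence; for simplicity
    # this sketch just calls each one in turn and stitches the output together.
    sections = [agent(company) for agent in (market_risk_agent, credit_risk_agent)]
    return " | ".join(sections)

report = supervisor("ExampleCorp")
print(report)
```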

Garman described beta testing work done with financial service provider Moody’s to develop a proof-of-concept application capable of generating financial risk reports for its customers.

“Before this proof of concept, this was a workflow that would take one of their agents or one of their employees about a week to do,” said Garman. “They ran this proof of concept for this multi-agent collaboration. They were able to accomplish the same task in one hour, and the ability to seamlessly scale it across any number of companies in parallel.”

Seeing things

Perhaps most impressive of all, AWS claims to have an answer to one of generative AI’s most pernicious problems: hallucination. Bedrock Guardrails has gained a new capability in preview called Automated Reasoning Checks.

“Automated Reasoning Checks prevent factual errors due to model hallucinations,” explained Garman. “So, when you implement one of these automated reasoning checks, what happens is Bedrock can actually check that the factual statements made by models are accurate.”

The idea is that this will allow businesses to deploy AI for customer-facing applications where responding correctly is critical. As an example, Garman described how an insurance company would want to be sure that an AI help system could correctly answer a customer query about whether or not insurance coverage applied to a particular situation.
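At its core, the idea is to encode the ground truth – the insurance policy, say – as explicit rules, then verify the model's claim against those rules rather than trusting the model. A toy illustration under that assumption (the policy and claims here are invented; Bedrock's actual checks use formal logic, not a dictionary lookup):

```python
# Toy illustration of checking a model's factual claim against known rules.
POLICY = {"flood": True, "earthquake": False}  # is this peril covered?

def check_claim(peril, model_says_covered):
    actual = POLICY.get(peril)
    if actual is None:
        return "unknown"        # policy is silent; flag for a human
    return "valid" if model_says_covered == actual else "contradicts policy"

print(check_claim("earthquake", True))   # model hallucinated coverage
```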

Model building

Andy Jassy, CEO of Amazon and former AWS boss, took a turn on stage to announce Amazon Nova – a set of foundation models. They include Nova Micro (text-based), Nova Lite (multimodal), Nova Pro (multimodal), and Nova Premier (multimodal, Q1 2025), as well as Nova Canvas (image generation) and Nova Reel (video generation). They are, by Amazon’s assessment, “at least 75 percent less expensive than the best performing models in their respective intelligence classes in Amazon Bedrock.”

An Amazon Nova speech-to-speech model is planned for 2025, along with an any-to-any modality model.

Amazon developed the Nova models itself, without the involvement of Anthropic, AWS’s preferred AI partner and recipient of some $7 billion of investment.

The Nova model card and AI Service cards attest to various responsible AI tests and security evaluations that have been conducted, but offer little in the way of safety comparisons with other models or current vulnerabilities.

That all said, AWS teased Project Rainier, a supercomputing cluster said to contain “hundreds of thousands” of Trainium2 chips, due to power up in 2025, which Anthropic will use to build its models.

New toys for coders

Software developers were not forgotten in all this. Amazon’s Q coding assistant has been bestowed with appealing new capabilities such as generating unit tests, documentation, and performing code reviews.

“We think that these agents can materially reduce a lot of the time that’s spent on these really important, but maybe undifferentiated tasks, and allow your developers to spend more time on those value-added activities,” claimed Garman.

Customers would love an easy button to get off of Windows

The artificially intelligent Q Developer has become more adept at code migration. It can convert Windows-only .NET Framework applications to modern .NET Core (which can be hosted on Linux) far faster than was previously the case, can transform workloads built to run under VMware into cloud-native architectures, and can even assist with the conversion of mainframe applications.

“Customers would love an easy button to get off of Windows,” observed Garman. “They’re tired of constant security issues, the constant packing or patching, all the scalability challenges that they have to deal with. And they definitely hate the onerous licensing costs.”

Garman estimated that Windows refugees can save as much as 40 percent. He said Signaturit – a European digital transaction firm – worked with an early beta of Q Developer to move from Windows .NET applications to Linux. He boasted: “A project that they estimated was gonna take six to eight months, they actually completed in just a few days.”

More detail on the goodies in Q Developer can be found at DevClass.

Amazon’s summary of all the stuff it’s teased and talked up at re:Invent so far can be found here, for those who want to dive into the individual announcements. We’ll have more coverage this week, including what’s going on with SageMaker. ®
