Monday, December 23, 2024

AI demonstrates the changing face of cloud infrastructure


Analysis: Cloud infrastructure is undergoing an upheaval, with service providers rushing to deploy servers configured for AI model training, often at the cost of postponing the usual refresh cycle for their standard server hardware.

The generative AI boom has reshaped the priorities of many cloud providers, with a recent report indicating that capital expenditure on infrastructure for cloud datacenters is set to rise by 30 percent this year.

But even that estimate may be out of date, according to Vlad Galabov, head of Cloud and Datacenter Research at Omdia.

“Some of the cloud service providers have just indicated that they might be raising their capex above of what they forecast before, and as a reminder, capex was up by about a third, so to raise it even further, it really is reaching unprecedented levels,” Galabov said.

In fact, AI servers are set to make up more than half of the overall server capex this year – 66 percent, according to Omdia’s current estimates – which Galabov described as truly extraordinary.

“We haven’t seen anything as disruptive that has happened before,” he said. “We’ve now added about $10 billion to our previous spending forecast.”

This large-scale investment is coming from a wider group than just the big three cloud providers: AWS, Azure, and Google Cloud. It spans the top 10 cloud service providers, as well as a growing number of specialized AI cloud providers, such as CoreWeave, that are also investing heavily.

Although 66 percent of current server spend is going on AI-capable hardware, this doesn’t mean such systems make up the bulk of those being deployed into cloud datacenters. In fact, AI servers make up only 19 percent of shipments, according to Omdia’s figures, an indication of just how costly these GPU-stuffed boxes are.
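As a rough back-of-the-envelope check, and assuming the 66 percent spend share and the 19 percent shipment share cover the same servers over the same period (an assumption, since the two measures may be counted differently), dividing each category’s share of spend by its share of units gives the implied price gap between an average AI server and an average non-AI server:

$$
\frac{\text{AI spend share} / \text{AI unit share}}{\text{non-AI spend share} / \text{non-AI unit share}} = \frac{0.66 / 0.19}{0.34 / 0.81} \approx \frac{3.47}{0.42} \approx 8.3
$$

On those headline numbers, then, an AI server would cost roughly eight times as much as a conventional one. This is illustrative arithmetic on Omdia’s published shares, not a figure Omdia has given.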

One question this raises is whether all this spending on AI systems is hampering the cloud providers’ ability to refresh their existing server fleets. But the picture appears to be more complicated than that, especially as the big cloud operators have already invested billions in other custom silicon to make their platforms run more efficiently.

“At the biggest cloud service providers, a very active consolidation of their non-AI server infrastructure is upon us, to lower their costs outside of AI,” Galabov said.

One of the best examples is what Google has done with its video processing servers, he told us. The company developed a custom processor called the Video Coding Unit (VCU), optimized for H.264 video, and a single one of these coprocessors was able to replace two Intel Skylake servers.

That meant that at the company’s next refresh, instead of replacing its Skylake servers one-for-one with newer machines, Google was able to replace several servers with a single system fitted with 12 VCUs.

Afterward, Google optimized the processor for a more efficient video compression format called VP9, and this meant a server with 20 VCUs was able to replace several racks of Skylake servers.

“So they are saving money by not having to renew several racks of servers and instead they’re replacing them with a single server that’s highly accelerated and highly optimized to that application,” Galabov said.

“This is something that we anticipate will happen increasingly. Meta is doing something similar and we think it’s likely this is a general trend. Database applications are likely going to go through a similar optimization cycle, as are networking infrastructure applications.”

(There are already hardware accelerators from companies such as Pliops that can boost the performance of the key-value stores that underpin many databases, for example.)

Meanwhile, many enterprises still haven’t ditched their own infrastructure in favor of public cloud resources. Instead, there has been a recent trend toward bringing some workloads back in-house, so-called cloud repatriation, which The Register has covered previously.

One reason for this is that many companies found consuming public cloud resources can be just as costly as running their own IT, if not more so. While a number of factors lie behind this (AWS is often accused of having byzantine billing), it sometimes seems to come down simply to poor management of resources.

“It was ballooning infrastructure costs and under-utilization of infrastructure that drove a lot of enterprises to the cloud,” Galabov said. But when they moved to the cloud, some users kept the same management practices, so their infrastructure remained under-utilized and they ended up paying for resources they didn’t use.

For this reason, Galabov expects an even greater focus in future on right-sizing, or placing each workload where it makes the most sense, an approach Omdia refers to as “cloud smart”, though Galabov says it is an industry term.

In the near future at least, Galabov believes enterprises will rely more on IT-as-a-service platforms such as Dell APEX or HPE GreenLake for their in-house compute.

These are intended to give customers the benefits of public cloud services, but in their own datacenter or colocation site. The vendor supplies complete, preconfigured infrastructure and also handles its management, with the option of pay-as-you-go financing.

“I personally don’t anticipate a mass departure from the cloud. It’s rather an optimization,” he said.

Evidence for this is that enterprise spending on cloud infrastructure services is stronger than ever, up 21 percent to more than $76 billion in the first quarter of this year.

Some 72 percent of all that spending now goes to the big three cloud operators, although second-tier cloud players such as Snowflake, MongoDB, and Oracle showed the strongest year-on-year growth rates in those same figures.

Perhaps these challengers, along with specialized AI providers such as CoreWeave, will gradually erode the dominance of the cloud triumvirate, or perhaps the antitrust scrutiny now being applied to the big three will open up the market to greater competition. Whatever happens, cloud isn’t going away. ®
