
Cisco launches integrated AI infrastructure ‘Pods’


Cisco has packaged its AI infrastructure for enterprises looking to purchase a certified technology stack for AI inference, the process of running a trained model on new data to produce predictions or other outputs.
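To make the term concrete, here is a minimal inference sketch in Python. It uses the open-source Hugging Face transformers library and its default sentiment model purely as stand-ins; the article does not specify which serving stack Pods actually run.

```python
# Minimal illustration of inference: a trained model producing an output
# for a new input. The transformers pipeline and its default model are
# stand-ins, not the stack Cisco's Pods ship with.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a small default model
print(classifier("This wire transfer pattern looks unusual."))
# -> [{'label': 'NEGATIVE', 'score': ...}]
```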

Cisco introduced the AI packages, called Pods, on Tuesday at its Partner Summit in Los Angeles. The company, which sells most of its products through channel partners, will begin taking orders for Pods in November.

Four Pod configurations, varying by the number of CPUs and Nvidia H100 and H200 Tensor Core GPUs, will be available. Each configuration will be offered within a Cisco UCS X-Series Modular System, which is managed and monitored through Cisco’s cloud-based Intersight software.

Many organizations use cloud providers, such as AWS, Google and Microsoft, to run AI applications. Pods are tailored AI infrastructure for organizations that want to keep data on-premises for security or compliance reasons, said Michael Leone, an analyst with TechTarget’s Enterprise Strategy Group. 

Cisco’s strategy of providing pre-integrated stacks for AI applications is similar to that of competitors Dell, Hewlett Packard Enterprise and Lenovo.

“Everybody is trying to put together their validated stack, their pre-integrated stack, to address inferencing,” Leone said. “This is the opportunity for all traditional infrastructure vendors.”

AI model inference is typically paired with a technique called retrieval-augmented generation, or RAG. RAG lets enterprises supply their private data to models at query time, without retraining them, for tasks such as fraud detection, text summarization, personal digital assistants and medical image analysis.
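In outline, RAG retrieves documents relevant to a query and prepends them to the prompt sent to the model. The sketch below shows the pattern with a toy keyword-overlap retriever and invented policy documents; real deployments substitute an embedding model and a vector database.

```python
# Toy sketch of the RAG pattern: retrieve relevant private documents,
# then build a grounded prompt for the model. The documents and the
# scoring function are illustrative; production systems use vector search.

DOCUMENTS = [
    "Policy 14: wire transfers over $10,000 require two approvals.",
    "Policy 7: flag accounts with more than five failed logins per hour.",
    "Policy 3: PTO requests must be filed two weeks in advance.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the query."""
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: len(words & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, DOCUMENTS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# The assembled prompt is what gets sent to the model at inference time.
print(build_prompt("When should a wire transfer be flagged?"))
```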

Cisco’s X-Series Modular System for Pods uses a 7RU UCS X9508 chassis. UCS, or Unified Computing System, is an integrated data center or edge platform that includes computing, networking, management, virtualization and storage.

Each UCS X9508 chassis can house up to eight UCS X-Series M7 computing servers and four X440p PCIe nodes that support up to 16 Nvidia GPUs. Other chassis components include:

  • The UCS 9108 Intelligent Fabric Module for up to 100 Gbps of connectivity per computing node, with eight uplink ports of either 25 Gbps SFP28 or 100 Gbps QSFP28 connections;
  • The 1RU UCS 6536 switch, which supports 10/25/40/100 Gigabit Ethernet, Fibre Channel over Ethernet and Fibre Channel, and delivers 7.42 Tbps of throughput across 36 ports. Another option is the UCS Fabric Interconnect 9108 100G Intelligent Fabric Module for enterprises using UCS blade servers within the X9508 chassis;
  • The UCS X9416 X-Fabric Module, which plugs into the X9508 chassis and provides direct PCIe connections from each computing node to the GPUs.

Pods include Intersight and a subscription to the Nvidia AI Enterprise (NVAIE) software suite and the Nvidia HPC-X software toolkit. NVAIE provides tools and frameworks to fine-tune the included pre-trained models for specific tasks. The Nvidia HPC-X toolkit provides communication libraries and tools for optimizing high-performance computing applications.
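To give a sense of what task-specific fine-tuning involves, here is a minimal sketch. It uses the open-source Hugging Face transformers and datasets libraries rather than NVAIE itself, and the checkpoint and stand-in corpus are placeholders for an enterprise's own model and labeled data.

```python
# Minimal fine-tuning sketch with open-source tooling (not NVAIE itself).
# Adapts a pre-trained checkpoint to a binary text-classification task.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "distilbert-base-uncased"  # placeholder pre-trained checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Stand-in corpus; an enterprise would load its own labeled records here.
dataset = load_dataset("imdb", split="train[:1000]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset.map(tokenize, batched=True),
)
trainer.train()  # updates the pre-trained weights on the task data
```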

Other Pod components include licensing for the Red Hat OpenShift platform, used to develop and deploy AI applications across hybrid cloud environments. Options include storage from NetApp or Pure Storage; both vendors offer toolkits that help developers and data scientists with data management tasks.

New UCS server

In addition to Pods, Cisco launched its first UCS server dedicated to running AI workloads on GPUs. The UCS C885A M8 can run AI applications on up to eight Nvidia GPUs.

Companies can order the Cisco UCS C885A M8 now, with shipments expected by the end of the year.

Most enterprises are moving toward deploying generative AI. A recent Enterprise Strategy Group global survey of 832 organizations found that 30% are in mature or early production of GenAI applications and 33% are in the pilot or proof-of-concept stage.

Those organizations will have to upgrade servers, storage and networking to support GenAI applications, Leone said. “It’s really hard to integrate existing infrastructure components to deliver something that performs optimally for AI workloads.”

Antone Gonsalves is an editor at large for TechTarget Editorial, reporting on industry trends critical to enterprise tech buyers. He has worked in tech journalism for 25 years and is based in San Francisco. Have a news tip? Please drop him an email.
