AI infrastructure is rapidly transforming industries, especially in networking and data transfer.
Networking is an exciting and necessary part of building high-performance infrastructure for data transfer and machine learning training, and the market is shifting toward on-premises infrastructure due to cost and data privacy concerns, according to Raj Yavatkar (pictured), chief technology officer of Juniper Networks Inc.
“It is an exciting place to be because everybody is building huge GPU clusters,” he said. “All of those clusters need to be fed data from storage networks. Then once you have data in, you need to have data transferred between these GPUs. So, networking is needed no matter what — very high performance, a high throughput, low latency kind of network. That’s being built. More exciting part is that a lot of the infrastructure is not just being built in hyperscalers, it’s also being built by enterprises on-prem, which is a big shift in the market.”
Yavatkar spoke with theCUBE Research’s John Furrier at the AI Infrastructure Silicon Valley – Executive Series event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the integration of AI into networking, the shift toward AI-native infrastructure and the challenges of scaling networks to support large GPU clusters in enterprise environments.
AI infrastructure drives the shift to on-premises solutions
AI is being used to automate network configurations based on workload, with the Ultra Ethernet Consortium working to evolve the Ethernet standard to match training and inferencing requirements, according to Yavatkar.
“What’s happening with GPU-based clusters, these things are coming together. The standard Ethernet, we are being able to show it’s not just good enough. It performs as well or better than InfiniBand-based networks,” he said. “That is being recognized by industry now by creating this Ultra Ethernet Consortium, which is a consortium of all the vendors trying to evolve Ethernet standard to add some new capabilities to match the training workloads and inferencing workloads.”
Networking has evolved to an intelligence-based model that integrates telemetry and machine learning for application-aware assurance, automating root-cause diagnostics and, potentially, remediation, Yavatkar added.
“You start collecting telemetry from application, from compute, GPU, networking, operating systems, and you start correlating using machine learning model,” he said. “If you do that, now you can start finding out where the problem is because you can find the anomalies, you find the switch buffers are running out of space, you find the packet loss has increased, latency has spiked. Based on that, you can point out where the problem is, and then you can do automated root cause diagnostics.”
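The diagnostic loop Yavatkar describes — collect telemetry across layers, correlate the streams, flag anomalies such as buffer exhaustion or latency spikes — can be illustrated with a minimal sketch. The metric names, sample values and z-score threshold below are hypothetical stand-ins, not Juniper's implementation:

```python
import statistics

# Hypothetical telemetry samples per metric (illustrative values only):
# switch buffer occupancy (%), packet loss (%) and latency (ms).
telemetry = {
    "switch_buffer_occupancy": [41, 43, 40, 42, 44, 43, 41, 97],
    "packet_loss_pct": [0.01, 0.02, 0.01, 0.02, 0.01, 0.02, 0.01, 0.02],
    "latency_ms": [1.1, 1.2, 1.0, 1.1, 1.3, 1.2, 1.1, 9.8],
}

def find_anomalies(samples, threshold=2.0):
    """Flag sample indices more than `threshold` std deviations from the mean."""
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples)
    if stdev == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mean) / stdev > threshold]

# Correlate across metrics: streams that spike at the same sample index
# point toward a common root cause (here, the last collection interval).
anomalous = {metric: find_anomalies(s) for metric, s in telemetry.items()}
suspects = [metric for metric, idxs in anomalous.items() if idxs]
print(suspects)  # buffer occupancy and latency spike together; packet loss does not
```

A production system would replace the z-score check with trained models and stream the telemetry continuously, but the shape of the workflow — per-metric anomaly detection followed by cross-metric correlation — is the same.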
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the AI Infrastructure Silicon Valley – Executive Series event:
Photo: SiliconANGLE