Data processing units (DPUs) are having a moment.
As artificial intelligence (AI), high-performance computing (HPC), increasingly complex cloud architectures and security demands put more pressure on central processing units (CPUs) and graphics processing units (GPUs), DPUs have stepped up their game.
In baseball parlance, DPUs can be seen as a “utility player,” in that they can be deployed in different ways to support the workload-focused players – the CPUs and GPUs. DPUs trace their history to the smart network interface card (SmartNIC), which was initially designed to offload some network functions from the CPU to improve network traffic flow.
But traditional SmartNICs aren’t smart enough to process the high volumes and varied patterns of data that DPUs can. The democratization of AI through ChatGPT put a lot of pressure on the backend as frontend users consumed it ever faster – once it hit a tipping point, everyone was using it. Not all DPUs are created equal, however, and AI is just one use case driving their adoption.
Security is a job of its own
One position DPUs can play is security. In September 2022, Nvidia introduced a DPU designed to implement zero-trust security in distributed computing environments, both in the data center and at the edge. Nvidia’s BlueField-2 DPUs were specifically tailored to be used in Dell PowerEdge systems with the aim of improving the performance of virtualized workloads based on VMware vSphere 8.
The concept of a DPU goes back nearly a decade, Kevin Deierling, senior VP of networking at Nvidia, told Fierce Electronics in an interview, with Nvidia beginning its early work in 2016. Security and networking functions like those taken on by the company’s BlueField-2 DPU handle essential cloud operations while freeing up the CPU to run customer workloads.
Nvidia’s BlueField-3, meanwhile, is the equivalent of 300 CPU cores, Deierling said, and it employs purpose-built accelerators to handle storage, security, networking and traffic steering. “All of that is done in hardware.”
DPUs are used to offload, accelerate and isolate, Deierling said. Isolation means 100% of the CPU cores are devoted to the customer’s workload, and customers don’t need to worry about malevolent software or bad actors impacting them or other customers hosted on the shared data center infrastructure. “Isolation means that the cloud service provider policies are running on the DPU, not in the host CPU that's running in the application domain.”
Deierling said that even if an application is compromised and a rogue actor is running in that data center, the attack can’t propagate east-west through the cloud and impact other tenants.
AI is upping the ante for DPUs, he said, and Nvidia started looking at its requirements – along with those of generative large language models (LLMs) and deep learning models – four years ago to see how best to do collective offloads.
DPUs enable proliferating AI to scale
The next generation of AI is AI agents talking to other AI agents, Deierling said, and DPUs play a key role in automation and scalability as humans increasingly interact with AI. “All of that needs to be orchestrated.” He said a DPU acts as a traffic cop as all these different AI elements spin up, autonomously communicate with each other, and then spin down when they’re not needed. “It needs to be secure.”
Deierling said DPUs were purpose-built initially for networking, storage, and security. “Those are all necessary but not sufficient for AI workloads.”
Nvidia is building out a microservices platform, NIM, that leverages DPUs to orchestrate AI interactions, providing containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models. “We have all of these different NIMs that are talking to each other,” Deierling said. “We might have dozens of different AIs.”
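For a concrete sense of what microservices “talking to each other” looks like, here is a minimal sketch of a client querying a self-hosted inference microservice over an OpenAI-compatible HTTP API. The endpoint URL, port and model name are illustrative assumptions for the sketch, not details provided by Nvidia.

```python
# Hypothetical sketch: querying a self-hosted, GPU-accelerated inference
# microservice over an OpenAI-compatible HTTP API. The endpoint, port and
# model name below are illustrative assumptions, not values from the article.
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local endpoint

payload = {
    "model": "example/chat-model",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Summarize what a DPU offloads."}],
    "max_tokens": 128,
}

# One microservice (or agent) calling another; in a larger deployment,
# dozens of these services would exchange requests like this one.
response = requests.post(ENDPOINT, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```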
He said relying on CPUs will not get the desired performance per watt, which is a metric for data center performance. “If you rely on the application processor to do all that orchestration of all the different AIs, you get dramatically worse efficiency in terms of performance per dollar and performance per watt, so that's really where we see the DPU playing a super strong role today.”
Deierling said DPUs will eventually be able to understand the workloads that are running and optimize accordingly as they communicate with other DPUs and GPUs.
Hyperscalers need DPU help, too
The utility of DPUs in modern AI data centers has meant that other players beyond semiconductor vendors are seeing the value of owning their own DPU technology. Microsoft acquired Fungible to integrate its DPU technology into its data center infrastructure – it was already using DPUs in Azure, and there are plenty of options. Aside from Nvidia, Intel, Broadcom and AMD all have DPU offerings. Amazon Web Services, meanwhile, has its own DPUs, dubbed “Nitro.”
AMD bulked up its DPU capabilities in 2022 with the acquisition of Pensando and its distributed services platform, which had a footing in Azure, IBM Cloud and Oracle Cloud.
Eddie Tan, senior director of the networking technologies and solutions group at AMD, told Fierce Electronics in an interview that thanks to the Pensando acquisition, AMD’s DPU has been adopted by several hyperscale cloud providers. It has also been incorporated into smart switches from Aruba and Cisco, he said.
Taking on front-end networking functions to securely connect to the GPU continues to be a key role for DPUs, Tan said, as well as enabling secure isolation in multi-tenant hyperscale cloud data centers. The DPU can also accelerate storage access and speed up data ingestion – the increased volumes of training data are driving speed and performance requirements, and hence more offloading to the DPU, he said.
Programmable DPUs also facilitate back-end networking where security is paramount. “There's a lot of sensitive data being moved between the GPUs,” Tan said.
Networking is today’s bottleneck
In the meantime, there are protocols that make GPU communications more efficient as well as efforts to modernize Ethernet to better serve the GPU, he said.
With the explosive growth of generative AI, networking is currently the bottleneck as the number of GPUs in data centers grows, which is why DPUs are essential for managing data traffic and taking on functions so the GPUs can focus on application workloads.
Shane Corban, senior director of product management in AMD’s networking technologies and solutions group, said cloud providers have been using SmartNICs and router infrastructure for some time now. What AMD has done with its DPU is engineer a domain-specific processor to offload and accelerate multiple functions at once, he said, and cloud providers can now have dedicated DPUs in different form factors for specific functions like software-defined networking, security and storage. “We've built a very unique architecture that's fully programmable from the ground up.”
Corban said meeting compliance requirements in the enterprise data center is becoming more challenging, so the platform AMD developed with Aruba allows for extremely granular east-west segmentation and stateful firewalling on every single port, rather than through agents running on the server’s CPU.
Networking and security functions like firewalling, encryption and connection tracking can all be done inline at line rate, he said, and having a combination of different security functions makes it easier for customers to address compliance requirements.
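To illustrate the kind of state such firewalling keeps, here is a purely illustrative Python sketch of stateful connection tracking checked against a per-port segmentation policy. A DPU performs the equivalent logic inline in hardware at line rate; the addresses, ports and rule format below are assumptions for the sketch, not AMD’s actual implementation.

```python
# Toy software illustration of stateful firewalling and connection tracking.
# A real DPU does this inline in hardware; the flow-table structure and
# policy below are illustrative assumptions only.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowKey:
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str

# East-west segmentation policy for one (virtual) port: only flows matching
# an allow rule may establish state; everything else is dropped.
ALLOW_RULES = {
    ("10.0.1.0/24", "10.0.2.10", 5432, "tcp"),  # app tier -> database, hypothetical
}

connection_table: set[FlowKey] = set()  # tracked, established flows

def allowed_by_policy(flow: FlowKey) -> bool:
    # Simplified /24 prefix match on the source address for the sketch.
    src_prefix = ".".join(flow.src_ip.split(".")[:3]) + ".0/24"
    return (src_prefix, flow.dst_ip, flow.dst_port, flow.protocol) in ALLOW_RULES

def handle_packet(flow: FlowKey, is_syn: bool) -> str:
    if flow in connection_table:
        return "forward"            # established connection, already tracked
    if is_syn and allowed_by_policy(flow):
        connection_table.add(flow)  # create state for the new connection
        return "forward"
    return "drop"                   # unsolicited or policy-violating traffic

print(handle_packet(FlowKey("10.0.1.5", "10.0.2.10", 5432, "tcp"), is_syn=True))  # forward
print(handle_packet(FlowKey("10.0.3.7", "10.0.2.10", 5432, "tcp"), is_syn=True))  # drop
```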
Tan said AMD’s DPU efforts will continue to focus on better serving AI workloads while driving open standards, including the Ultra Accelerator Link, an interconnect for AI accelerators that AMD is developing along with Broadcom, Intel, Google, and Microsoft, among others.
Leveraging the broader Ethernet ecosystem will continue to be essential for scaling out AI and enabling DPUs to intelligently route traffic and handle functions offloaded from CPUs and GPUs, Tan said. “Ethernet always wins.”