*Updated with Intel MLPerf benchmark results.
Nvidia is doubling down on its message of the AI factory of the future, in which reasoning models like the open-source DeepSeek R1 will require massive accelerated computing at scale to drive down the cost of cloud performance.
For at least a year, the company has been explaining to investors and buyers that the big compute investments once needed for training will still be needed for inference, where reasoning occurs to solve the more practical problems organizations face.
Intel has made the case that inference is where its CPUs can perform well, without directly attacking Nvidia’s GPU approach with its Hopper and Blackwell products. Intel far trails Nvidia and even AMD in the accelerator competition. *In a statement Tuesday, Intel said its Xeon 6 CPUs with Performance cores achieved an average 1.9x performance increase over 5th Gen Xeons in the latest MLPerf benchmarks. It was the only server CPU submitted to MLPerf, and the company said the results demonstrated it is "ideal...for AI workloads, offering a balance of performance and energy efficiency."
CEO Jensen Huang described the AI factory concept at the GTC conference March 17-19, but since that time investors have not been overly impressed. Nvidia has been battered along with the broader Nasdaq, which has slumped amid shifting tariff announcements from the Trump administration. The company has seen a 4% decline in shares since GTC, and more than a 6% decline over the past six months. (Still, the stock is up 21% over the past year.)
In explaining the AI factory concept, Nvidia is able to brag that it outperformed all competitors on 11 MLPerf Inference V5.0 benchmarks released Tuesday. Nvidia often far exceeds the field, so arguably its achievements in the latest benchmarks are nothing new.
More specifically, performance on the Hopper architecture, first introduced three years ago, increased 60% over last year, while Blackwell on a GB200 NVL72 rack-scale server configuration tripled interactive performance, a key indicator for reasoning work.

“Hopper still has headroom and is widely available on every server maker,” said Dave Salvator, Nvidia’s director of accelerated computing products, in a briefing with analysts and reporters.
“As we move into agentic AI you’ll see Blackwell performance move ahead and ahead of, really, the rest of the market.” He noted that server makers made 15 submissions on Nvidia hardware in the benchmark, with very strong results.
Pertaining to the AI factory and the massive scaling it will require, Nvidia unveiled Dynamo at GTC, an open-source AI inference server that succeeds Nvidia Triton. Dynamo is designed to scale inference workloads across GPU fleets with features such as intelligent resource scheduling and request routing, optimized memory management, and seamless data transfer. It supports all major AI inference backends.
Previously, Nvidia disclosed that Dynamo increased throughput on the open-source DeepSeek-R1 671B reasoning model on GB200 NVL72 by up to 30 times the number of tokens per second per GPU, while Hopper doubled throughput on Llama 2 70B. “Dynamo is the ideal solution for developers looking to accelerate and scale generative AI models with the highest efficiency at the lowest cost,” Nvidia posted recently on its developer pages.
In his briefing, Salvator added: “It takes a full stack platform CPU and networking and storage and a full complement of software and Dynamo to deliver on the promise of physical AI. And when we think about design at Nvidia, we are thinking design at data center scale…a complete full-stack solution.”
The Blackwell architecture with Dynamo is more efficient on DeepSeek R1, and its performance gains can provide a 40x increase in “revenue opportunity” for an organization delivering leading-edge AI, Salvator added. For a cloud provider, this could conceivably mean renting out capacity at much higher revenue potential, giving enterprises greater ability to expand into agentic or physical AI work. “Certainly, with agentic AI, there are lots of agents coming to market and we’re really just getting started with agentic.”
Salvator confirmed that Dynamo was not part of the MLPerf results, which differed slightly from earlier findings that included Dynamo.
Salvator was asked to assess DeepSeek’s impact on AI, to which he responded: “What DeepSeek did is changing the game and showed the world what’s possible. Some incorrectly surmised that you wouldn’t really need as much infrastructure but that’s not really an understanding of what DeepSeek did.” He explained, instead, that DeepSeek’s innovation lies in its algorithmic approach to AI. “When algorithmic innovations come, they enable the next round of AI and in this case, it’s agentic AI.”
Whether Salvator’s explanation will persuade investors questioning Nvidia’s future to buy its stock is unclear, but it underscores the concerns some analysts have raised about whether Nvidia’s profits in the AI space can continue indefinitely.