Nvidia eyes Llama 3.1 to help enterprises build AI 'supermodels'

Lots of people would like to spend more time with supermodels, and now they can. Nvidia announced this week that Nvidia AI Foundry users can combine Meta’s newly available Llama 3.1 405B large language model with their own data and Nvidia Nemotron models to build their very own AI “supermodels.”

(Yes, supermodels to support production-level generative AI applications. What other kind of supermodels did you think we were talking about?)

Nvidia said Accenture will be the first corporate giant to use the service, employing Llama 3.1 and its clients’ own data to create customized models for them through Accenture’s AI Refinery framework.

Kari Ann Briski, vice president of generative AI software product management at Nvidia, said during a press briefing that Nvidia aims to make it easier for enterprises to move their AI plans from vision to production, which remains a challenge for much of the enterprise market. The customized models enterprises create can then be served in production with Nvidia’s NIM inference microservices, she added.
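In practice, a deployed NIM microservice exposes an OpenAI-compatible API, so querying a customized model can look roughly like the sketch below. The endpoint URL, model name, and prompt are illustrative assumptions, not details from Nvidia’s announcement.

```python
# Minimal sketch: querying a locally deployed NIM container, assuming it
# exposes the OpenAI-compatible API on port 8000. The base_url and model
# name here are illustrative, not confirmed by the article.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-used",                   # local deployments typically ignore the key
)

response = client.chat.completions.create(
    model="meta/llama-3.1-405b-instruct",  # hypothetical customized model name
    messages=[{"role": "user", "content": "Summarize our Q3 support tickets."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```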

The Nvidia announcement came the same day Meta announced the open availability of Llama 3.1, a collection of multilingual generative AI models whose largest member, the 405B model, Nvidia said was trained on more than 16,000 H100 Tensor Core GPUs.

In a blog post, Meta stated, “Llama 3.1 405B is the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. With the release of the 405B model, we’re poised to supercharge innovation—with unprecedented opportunities for growth and exploration. We believe the latest generation of Llama will ignite new applications and modeling paradigms, including synthetic data generation to enable the improvement and training of smaller models, as well as model distillation—a capability that has never been achieved at this scale in open source.”
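The synthetic data generation Meta mentions points at a concrete workflow: use the 405B model as a “teacher” to produce training examples for a smaller model. The sketch below shows the general shape under assumed details (the endpoint, model name, and prompt are illustrative); it is not code from Meta or Nvidia.

```python
# Hypothetical teacher-to-student data generation: a large model writes
# training examples that can later fine-tune a smaller model. Endpoint,
# model name, and prompt are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

with open("synthetic_train.jsonl", "w") as out:
    for topic in ["refunds", "shipping delays", "password resets"]:
        reply = client.chat.completions.create(
            model="meta/llama-3.1-405b-instruct",  # assumed teacher model
            messages=[{
                "role": "user",
                "content": f"Write one realistic customer question about "
                           f"{topic} and an ideal answer, as a single JSON "
                           "object with keys 'prompt' and 'response'.",
            }],
            max_tokens=300,
        )
        # Each line becomes one training example for a smaller model.
        out.write(reply.choices[0].message.content.strip() + "\n")
```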

AMD published its own blog post stating that its platforms have the memory capacity to be “ready to run” Llama 3.1: “We have confirmed that a server powered by eight AMD Instinct MI300X accelerators can fit the entire Llama 3.1 405B parameter model using the FP16 datatype. This means organizations can benefit from significant cost savings, simplified infrastructure management, and enhanced performance efficiency.”
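AMD’s claim holds up on a weights-only, back-of-the-envelope basis: 405 billion FP16 parameters at two bytes each is roughly 810 GB, while eight MI300X accelerators at 192 GB of HBM3 apiece provide 1,536 GB. A quick sketch of that arithmetic (ignoring activations, KV cache, and runtime overhead):

```python
# Back-of-the-envelope check of AMD's claim (weights only; ignores
# activations, KV cache, and framework overhead).
params = 405e9            # Llama 3.1 405B parameters
bytes_per_param = 2       # FP16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"Model weights: {weights_gb:.0f} GB")   # ~810 GB

mi300x_hbm_gb = 192       # HBM3 capacity per MI300X accelerator
server_gb = 8 * mi300x_hbm_gb
print(f"8x MI300X memory: {server_gb} GB")     # 1536 GB
print(f"Fits: {weights_gb < server_gb}")       # True
```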