Can an AI teammate deliver a virtual high-five? Many organizations are about to find out as they explore the creation of AI agents to be integrated with human workforces, with goals such as improving customer service, security, organizational productivity, corporate decision-making, and more.
For a while, the notion of an “AI agent” seemed almost synonymous with “chatbot,” but if agentic AI were merely about building a better chatbot there would not be too much to get excited about. The reason why companies across many industries are exploring the development of AI agents is that at their most promising, they could represent the beginning of a digital workforce, trained on corporate data and with reasoning capabilities to make very quick, well-informed decisions to support and enhance the productivity and effectiveness of their human counterparts within an organization.
Realizing that vision will not be easy, because organizations have yet to develop any kind of infrastructure to create and operate connected, secure AI agents. Nvidia is looking to help them overcome this hurdle with its NeMo microservices, an end-to-end platform of developer tools designed to help get agentic AI workforces up and running, and to keep them operating effectively.
The company this week announced general availability for its NeMo microservices. Part of the company’s existing AI Enterprise software platform, these tools put developers on a path to building what Nvidia described as “AI teammates that tap into data flywheels” informed by inference and business data, as well as user preferences, and that can continuously optimize their performance.
The NeMo microservices might be easy to confuse with Nvidia’s existing NIM microservices for inference optimization, but Joey Conway, senior director of generative AI software at Nvidia, explained in a briefing, “The way we think about it is NIM today is focused on inference deployment, meaning running the model so that responses come back out. The NeMo microservices are focused on how to improve that model. So the NeMo microservices do things like data preparation, training, techniques, evaluation, and when it is finished, that model gets deployed back in.”
One of the major benefits of using microservices is that “it allows us to have an easy package and deploy and simplify the developer experience in that they have a few APIs to interact with, instead of thousands of lines of the Python code,” Conway said.
But with NeMo microservices, the job is not over after initial deployment. They are designed to help developers continue to optimize AI agents as those agents receive a constant stream of inputs from databases, user interactions or real-world signals that could otherwise weaken their reliability. Conway said AI agents in production mode require three types of data: inference data to gather insights and adapt to evolving data patterns, up-to-date business data to provide intelligence, and user feedback data to indicate whether the model and application are performing as expected. The NeMo microservices help developers tap into these three data types.
The new microservices include:
● NeMo Customizer, which accelerates LLM fine-tuning, delivering up to 1.8x higher training throughput. This high-performance, scalable microservice uses popular post-training techniques including supervised fine-tuning and low-rank adaptation.
● NeMo Evaluator, which simplifies the evaluation of AI models and workflows on custom and industry benchmarks with just five application programming interface calls.
● NeMo Guardrails, which improves compliance protection by up to 1.4x with only half a second of additional latency, helping organizations implement robust safety and security measures that align with organizational policies and guidelines.
Conway described how some very large companies already are using NeMo microservices to make a difference in their AI agent deployments. For example, AT&T, in collaboration with Arize and Quantiphi, built an AI agent to process a knowledge base of nearly 10,000 documents that is refreshed weekly. The scalable, high-performance AI agent is fine-tuned for three key business priorities: speed, cost efficiency and accuracy, which are all increasingly critical as adoption scales.
Using NeMo Customizer and NeMo Evaluator to fine-tune a Mistral 7B model, AT&T was able to improve the accuracy of its AI agent by up to 40%, helping it deliver personalized services, prevent fraud and optimize network performance, Conway said. He added that Meta and Cisco Systems are among other large companies using the microservices to boost the value their new AI teammates can provide.