AI

Circuit makers rush to boost data center efficiency as energy demands soar

Handling an enormous surge in data created by GenAI and other AI has become a priority for the electronics field, from silicon makers to data center integrated circuit (IC) suppliers. Nvidia and others advertise greater energy efficiency with the latest, more powerful chips like Blackwell, while electric utilities are left to find ways to produce enough power to run such chips.

The demands on data centers are growing so quickly that it is becoming hard to find available land and sufficient water, although water can be recycled at considerable cost. Data centers globally account for about 2% of electricity demand today, a share projected to grow to 7% by 2030, a widely cited figure that corresponds to the energy consumption of India, according to the International Energy Agency.

In Northern Virginia in the US, just west of Washington, D.C., the world’s largest concentration of data centers, in Loudoun County and nearby areas, consumes as much as 20% of the state’s electricity supply, according to Dominion Energy and others. Elsewhere, Ireland and Singapore have halted new data center construction due to scarcity of resources.

“The fear of breaking the grid is real,” said Athar Zaidi, senior vice president and business line head of power integrated circuit and connectivity systems at Infineon Technologies, in comments to global press on Tuesday. “It’s a very challenging problem but it’s good to have everybody…Everybody has to solve the energy problem we have in front of us.”

Aside from finding more electricity for data centers, a major industry focus is making data center server racks more efficient: improving how circuits at the physical level handle interconnects between GPUs and CPUs, and how air and water cooling systems are managed with electronics hardware and software to keep hot chips cool. Almost daily, another new technology approach is introduced.

In one example, on June 4, Cisco and Nvidia announced the Cisco Nexus HyperFabric AI cluster, which combines Cisco networking and Nvidia accelerated computing and AI software with a robust VAST data store. The two companies argue the approach provides data centers with an AI platform and control plane that simplify deployment of the accelerated computing, networking and software needed for AI.

Cisco’s Ethernet switching expertise is combined with Nvidia’s accelerated computing and AI Enterprise software. A Cisco Optics family of QSFP-DD modules (network optical transceivers) is designed to support high-density loads, while Nvidia Tensor Core GPUs will be part of the package, starting with the Nvidia H200 NVL, among other hardware components. Ultimately, better cloud management is the objective of the Cisco-Nvidia offering, which aligns with the IT goal of better total cost of ownership (TCO), a goal that inevitably becomes interwoven with greater energy efficiency in any given data operation.

A much different approach to reducing energy use and improving data center efficiency is being pursued by other companies, including Infineon. The company, the largest semiconductor maker in Germany with 58,000 employees, is a big supplier of power supply units (PSUs) that work inside data center racks, often on a small circuit board. A PSU primarily converts AC power to DC power (as in a personal computer), but it also adjusts voltage and power levels to meet the needs of different hardware components, and may offer multiple power sockets for use by multiple hardware components.
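
As a rough illustration of that power budgeting, here is a minimal sketch of a PSU serving several DC rails behind a single conversion efficiency. The rail names and wattages are hypothetical, chosen only to show the arithmetic; they are not Infineon specifications.

```python
# Minimal sketch (not Infineon's design): a PSU converts AC input into
# several DC rails behind one overall conversion efficiency.

def psu_input_power(rail_loads_w: dict[str, float], efficiency: float) -> float:
    """Return the AC input power (W) needed to serve the given DC rail loads."""
    return sum(rail_loads_w.values()) / efficiency

# Hypothetical rail loads for one server board, illustrative numbers only.
rails = {"12V_gpu": 700.0, "12V_cpu": 350.0, "5V_storage": 60.0, "3.3V_logic": 40.0}

dc_load = sum(rails.values())
ac_in = psu_input_power(rails, efficiency=0.975)  # efficiency figure cited below
print(f"DC load: {dc_load:.0f} W, AC input: {ac_in:.0f} W, heat: {ac_in - dc_load:.0f} W")
```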

Infineon announced new PSUs in May that rely on circuits made of silicon as well as the newer materials silicon carbide and gallium nitride to improve energy efficiency. All three materials are combined into a single module for top performance. The company currently offers 3.3-kilowatt PSUs and will move to an 8 kW unit in Q1 2025, followed by a 12 kW PSU sometime afterward, officials told Fierce on May 24.

Infineon noted that top GPUs today require up to 1 kW per chip and will reach 2 kW by 2030.

Each B200 unit from Nvidia’s just-announced Blackwell platform will draw 1,200 watts of power, up from the Nvidia H100’s 700 watts. The new Infineon PSUs are designed to be 97.5% efficient, and the 8 kW PSU supports AI racks of up to 300 kW.
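
Those figures allow a quick sanity check: a 300 kW rack fed by 8 kW units implies roughly 38 PSUs, and even at 97.5% efficiency the conversion stage alone sheds several kilowatts of heat per rack. The sketch below works that out; the N+2 redundancy policy is an assumption, not something Infineon specified.

```python
import math

# Sanity-checking the article's rack numbers: 300 kW rack, 8 kW PSUs,
# 97.5% conversion efficiency. The N+2 redundancy policy is an assumption.

RACK_POWER_W = 300_000      # max AI rack load the 8 kW PSU family supports
PSU_CAPACITY_W = 8_000
EFFICIENCY = 0.975

psus_minimum = math.ceil(RACK_POWER_W / PSU_CAPACITY_W)   # 38 units
psus_installed = psus_minimum + 2                         # assumed N+2 redundancy

conversion_loss_w = RACK_POWER_W / EFFICIENCY - RACK_POWER_W
print(f"PSUs at minimum: {psus_minimum}, installed with N+2: {psus_installed}")
print(f"Conversion loss per rack: {conversion_loss_w:,.0f} W of heat")  # ~7,692 W
```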

In addition to combining Si, SiC and GaN semiconductor materials, Infineon described for reporters on Tuesday how its 48V architecture and vertical power delivery design can help data centers increase power density via smaller boards, allowing a further increase in compute power (presumably to accommodate Blackwell or other chips).
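
The appeal of a 48 V bus comes down to Ohm’s law: delivering the same power at four times the voltage of a legacy 12 V rack quarters the current and cuts resistive (I²R) conduction loss by a factor of 16. A minimal sketch, with an assumed path resistance:

```python
# Same power, same conductors: raising the bus from 12 V to 48 V quarters
# the current and cuts I^2*R conduction loss 16x. R_PATH is an assumed value.

def conduction_loss_w(power_w: float, bus_voltage_v: float, resistance_ohm: float) -> float:
    current_a = power_w / bus_voltage_v
    return current_a ** 2 * resistance_ohm

POWER_W = 1_000.0    # one ~1 kW GPU, per the article
R_PATH = 0.001       # assumed 1 milliohm distribution path

for bus_v in (12.0, 48.0):
    print(f"{bus_v:>4.0f} V bus: {conduction_loss_w(POWER_W, bus_v, R_PATH):6.2f} W lost")
# 12 V: ~6.94 W per GPU; 48 V: ~0.43 W, a 16x reduction
```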

Infineon contends its vertical power delivery design, compared with a traditional lateral power design, can reduce the power lost in an average data center of about 100,000 CPU nodes by more than 7 megawatts (MW). That improvement would in turn produce a greater than 12% improvement in TCO compared with lateral power delivery networks, according to Infineon’s Zaidi.
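
To put that 7 MW figure in annual terms, a rough back-of-the-envelope calculation follows; the electricity price is an illustrative assumption, and the 12% TCO figure is Infineon’s claim, not derived here.

```python
# Annualizing the claimed >7 MW loss reduction across ~100,000 CPU nodes.
# The electricity price below is an illustrative assumption.

SAVED_POWER_MW = 7.0
HOURS_PER_YEAR = 8_760
ASSUMED_PRICE_PER_MWH = 80.0   # $/MWh, varies widely by region and contract

saved_mwh_per_year = SAVED_POWER_MW * HOURS_PER_YEAR
print(f"Energy saved: {saved_mwh_per_year:,.0f} MWh/year")
print(f"At ${ASSUMED_PRICE_PER_MWH:.0f}/MWh: "
      f"${saved_mwh_per_year * ASSUMED_PRICE_PER_MWH:,.0f}/year")
# ~61,320 MWh/year, roughly $4.9M/year under these assumptions
```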

In that approach, as shown in the diagram below, the conventional lateral power design puts the converter and power stage components, as well as the AI-related GPUs and CPUs, on top of a printed circuit board (PCB). With vertical power, the power stage components move to just below the PCB, where they tunnel through the board and connect to the AI GPUs and CPUs on top. Power loss over that short vertical distance is far lower than over the lateral path, and the PCB is smaller, taking up less space overall in a rack.

[Chart: lateral vs. vertical power interconnect]
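
The intuition can be sketched numerically: at roughly 1 V core voltages, a 1 kW processor draws on the order of 1,000 A, so shortening the high-current path from centimeters of lateral board copper to a few millimeters through the board slashes I²R loss. The geometry and resistance figures below are illustrative assumptions, not Infineon measurements.

```python
# Illustrative lateral vs. vertical comparison. A ~1 kW processor at ~1 V
# draws on the order of 1,000 A; loss scales with path resistance, which
# scales with path length. All geometry/resistance values are assumptions.

def path_loss_w(current_a: float, length_m: float, ohms_per_m: float) -> float:
    return current_a ** 2 * (length_m * ohms_per_m)

CURRENT_A = 1_000.0     # ~1 kW at ~1 V core voltage
OHMS_PER_M = 0.0005     # assumed effective resistance per meter of power path

lateral_w = path_loss_w(CURRENT_A, 0.05, OHMS_PER_M)    # ~5 cm across the board
vertical_w = path_loss_w(CURRENT_A, 0.003, OHMS_PER_M)  # ~3 mm through the board
print(f"Lateral: {lateral_w:.1f} W  Vertical: {vertical_w:.1f} W per processor")
# ~25 W vs. ~1.5 W under these assumed values
```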

Zaidi said large-capacity GPUs from Nvidia, including the Grace Blackwell, are being produced in trays used in data center racks, where a GPU and CPU can be connected with high-speed InfiniBand and high-speed Ethernet. “It’s one big giant GPU. Big changes are coming at the rack level.”

He also said power delivery components from Infineon and others will change along with the way data centers are cooled, as facilities advance from air cooling to liquid cooling and even immersion systems. “Many changes are coming now,” Zaidi said.

For data center operators, power efficiency will become a key management concern in coming years, he added. It typically takes three years to build a medium-sized data center, with much of that time spent clearing permits for water and power. Once a data center is permitted to operate, “the power limit is fixed” and it becomes a question of packing more compute into that power envelope. “If you don’t have a superefficient system, for the same power, it’s less efficient, which affects ROI,” he said.