
MLPerf Training Results Showcase Unprecedented Performance, Scalability

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks.

NVIDIA more than tripled its performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale (more than triple the 3,584 H100 GPUs submitted a year ago) and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, the NVIDIA Eos AI supercomputer can now train massive AI models like GPT-3 175B even faster, and this performance translates into significant business opportunities. For example, in NVIDIA's recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years by running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60/M tokens, with an HGX H200 server throughput of 24,000 tokens/second.
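As a rough sanity check on that arithmetic, here is a minimal sketch in Python. The throughput and token price come from the paragraph above; the assumption of continuous full utilization over four years and the server cost figure are illustrative placeholders, not numbers from NVIDIA.

```python
# Back-of-the-envelope check of the "$1 in, $7 out over 4 years" claim for
# serving Llama 3 70B on an HGX H200 server. Throughput and token price are
# taken from the article; server cost and 100% utilization are assumptions.

TOKENS_PER_SECOND = 24_000        # HGX H200 server throughput (from article)
USD_PER_MILLION_TOKENS = 0.60     # serving price (from article)
YEARS = 4
SERVER_COST_USD = 260_000         # hypothetical all-in investment per server

seconds = YEARS * 365 * 24 * 3600
revenue = TOKENS_PER_SECOND * seconds * USD_PER_MILLION_TOKENS / 1e6
print(f"Revenue over {YEARS} years: ${revenue:,.0f}")                  # ~$1.8M
print(f"Return per dollar invested: {revenue / SERVER_COST_USD:.1f}x")  # ~7x
```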

NVIDIA H200 GPU Supercharges Generative AI and HPC 

The NVIDIA H200 Tensor Core GPU builds upon the strength of the Hopper architecture, with 141GB of HBM3e memory and over 40% more memory bandwidth compared to the H100 GPU. Pushing the boundaries of what's possible in AI training, the NVIDIA H200 Tensor Core GPU extended the H100's performance by up to 47% in its MLPerf Training debut.

NVIDIA Software Drives Unmatched Performance Gains

Additionally, our submissions using a 512 H100 GPU configuration are now up to 27% faster compared to just one year ago thanks to numerous optimizations to the NVIDIA software stack. This improvement highlights how continuous software enhancements can significantly boost performance, even with the same hardware.

This work also delivered nearly perfect scaling. As the number of GPUs increased by 3.2x, going from 3,584 H100 GPUs last year to 11,616 H100 GPUs with this submission, so did the delivered performance.
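One way to quantify "nearly perfect scaling" is to divide the achieved speedup by the increase in GPU count. A minimal sketch, using a 3.0x performance ratio as a stand-in for "more than tripled" (the published MLPerf results contain the exact training times):

```python
# Scaling efficiency = (performance ratio) / (GPU-count ratio); 1.0 is linear.
gpus_2023, gpus_2024 = 3_584, 11_616
perf_ratio = 3.0                     # assumed stand-in for "more than tripled"

gpu_ratio = gpus_2024 / gpus_2023    # ~3.24x more GPUs
print(f"GPU ratio: {gpu_ratio:.2f}x")
print(f"Scaling efficiency: {perf_ratio / gpu_ratio:.0%}")  # ~93% of linear
```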

Learn more about these optimizations on the NVIDIA Technical Blog.

Excelling at LLM Fine-Tuning

As enterprises seek to customize pretrained large language models, LLM fine-tuning is becoming a key industry workload. MLPerf introduced a new LLM fine-tuning benchmark this round, based on the popular low-rank adaptation (LoRA) technique applied to Meta Llama 2 70B.
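For context on how LoRA works: rather than updating a full pretrained weight matrix, it trains a small pair of low-rank matrices whose product is added to the frozen weights, cutting the number of trainable parameters dramatically. Below is a minimal PyTorch sketch of that idea; the layer shape and rank are assumed for illustration, and this is not the MLPerf reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = x @ W.T + x @ (B @ A).T * (alpha / r)."""
    def __init__(self, in_features: int, out_features: int,
                 r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)   # pretrained weights stay frozen
        # A is initialized with small random values, B with zeros, so the
        # low-rank update starts at zero and the model begins unchanged.
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"Trainable params: {trainable:,} of {total:,}")
```

For a 4096x4096 layer with rank 8, only about 65K of roughly 16.8M parameters are trainable, which is what makes fine-tuning a 70B-parameter model tractable.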

The NVIDIA platform excelled at this task, scaling from eight to 1,024 GPUs, with the largest-scale NVIDIA submission completing the benchmark in a record 1.5 minutes.

Accelerating Stable Diffusion and GNN Training

NVIDIA also accelerated Stable Diffusion v2 training performance by up to 80% at the same system scales submitted last round. These advances reflect numerous improvements to the NVIDIA software stack, showcasing how software and hardware improvements go hand in hand to deliver top-tier performance.

On the new graph neural network (GNN) test based on R-GAT, the NVIDIA platform with H100 GPUs excelled at both small and large scales. The H200 delivered a 47% boost on single-node GNN training compared to the H100. This showcases the powerful performance and high efficiency of NVIDIA GPUs, which make them well suited for a wide range of AI applications.

Broad Ecosystem Support

Reflecting the breadth of the NVIDIA AI ecosystem, 10 NVIDIA partners submitted results, including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo, Oracle, Quanta Cloud Technology, Supermicro and Sustainable Metal Cloud. This broad participation, and the partners' own impressive benchmark results, underscores the industry's widespread adoption of and trust in NVIDIA's AI platform.

MLCommons' ongoing work to bring benchmarking best practices to AI computing is vital. By enabling peer-reviewed comparisons of AI and HPC platforms, and by keeping pace with the rapid changes that characterize AI computing, MLCommons provides companies everywhere with crucial data that can help guide important purchasing decisions.

And with the NVIDIA Blackwell platform, next-level AI performance on trillion-parameter generative AI models for both training and inference is coming soon.
