“NVIDIA was once again the only company to run all MLPerf tests. H100 GPUs demonstrated the fastest performance and the greatest scaling in each of the nine benchmarks.”
Wow, I wonder how long that would have taken on Turing- or Volta-era hardware.
What does it mean to train on such-and-such number of tokens?
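Roughly: a tokenizer splits the training text into small units (words or subword fragments), and "training on N tokens" means the model consumed N such units across all of its training steps. A minimal sketch of the bookkeeping, using naive whitespace splitting as a stand-in for a real subword tokenizer like BPE (the corpus and epoch count here are made up for illustration):

```python
# Naive whitespace tokenizer as a stand-in for a real subword tokenizer (e.g. BPE),
# which would typically produce more tokens than there are words.
def tokenize(text):
    return text.split()

# Toy corpus; real training corpora run to trillions of tokens.
corpus = [
    "NVIDIA swept the MLPerf training benchmarks",
    "H100 GPUs showed the fastest per-chip performance",
]

# Tokens seen in one full pass over the corpus.
tokens_per_pass = sum(len(tokenize(doc)) for doc in corpus)

# "Trained on N tokens" counts every token consumed, so repeated passes
# (epochs) over the same data all count toward the total.
epochs = 3
total_tokens_trained_on = tokens_per_pass * epochs

print(tokens_per_pass)          # 13
print(total_tokens_trained_on)  # 39
```

So the headline token count measures how much data the model has processed, not the size of its vocabulary or parameters.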
How much of this performance is just the usual cheap tricks, like software optimizations and task-specific hardware that doesn't generalize? If it were an honest test of pure hardware muscle, the AMD MI300 would come out on top.
The benchmarks test workloads people are actually interested in running.
Whether good performance is achieved by good hardware, a good software stack, or both doesn’t matter to the vast majority of buyers.