NVIDIA Blackwell Ultra Sets New Inference Records In MLPerf Debut
NVIDIA, Tuesday, September 9th, 2025
As large language models (LLMs) grow larger, they get smarter, with open models from leading developers now featuring hundreds of billions of parameters.
At the same time, today's leading models are also capable of reasoning, which means that they generate many intermediate reasoning tokens before delivering a final response to the user. The combination of these two trends (larger models that think using more tokens) drives the need for significantly higher compute performance.
Delivering the highest performance on production workloads takes a state-of-the-art technology stack, spanning chips, systems, and software, and an expansive developer ecosystem that is constantly building on that stack.