Volume 339, Issue 1 · IT Vendor News · NVIDIA

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

NVIDIA Technical, Thursday, June 4th, 2026

NVIDIA introduced Nemotron 3 Ultra, a 550B-parameter Mixture-of-Experts (MoE) model with 55B active parameters. It targets long-running agent workflows, pairing frontier-level reasoning with high throughput.

The architecture combines four elements: hybrid Mamba-Transformer layers for efficient long-context handling; NVFP4 quantization for deployment across GPU architectures, with a reported throughput gain of up to 5x; LatentMoE for expert routing; and multi-token prediction for faster generation.
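To put the parameter counts in perspective, the short Python sketch below works out the share of weights that is active and a rough weight-storage footprint at different precisions. The 550B and 55B figures come from this summary; the function names are illustrative, and the 4.5 bits-per-weight figure is an assumption (a 4-bit value plus one 8-bit scale shared by each 16-weight block), not a number stated by NVIDIA here. Activations, KV or state caches, and any layers kept at higher precision are ignored.

```python
# Back-of-the-envelope arithmetic for a sparse MoE model.
# Parameter counts are taken from the summary; everything else is illustrative.

TOTAL_PARAMS = 550e9   # total parameters (from the summary)
ACTIVE_PARAMS = 55e9   # active parameters (from the summary)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that is active."""
    return active / total


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    # 16-bit baseline, plain 4-bit, and an ASSUMED 4.5 effective bits
    # (4-bit values plus one 8-bit scale per 16-weight block).
    for label, bits in [("16-bit", 16), ("4-bit", 4), ("4-bit + block scales (assumed)", 4.5)]:
        print(f"{label}: {weight_gigabytes(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions, one tenth of the weights are active, and 4-bit storage shrinks the weights from roughly 1,100 GB at 16 bits to roughly 275 GB before scale overhead. This is sizing arithmetic only and says nothing about the reported throughput gain.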

The model was trained with Multi-Teacher On-Policy Distillation, drawing feedback from more than ten domain-specific teacher models, and is backed by a transparent, expansive data pipeline for pretraining and reinforcement learning. NVIDIA is releasing the recipes and weights with fully open licensing to enable broad adoption and fine-tuning for domain-specific applications.