Glean Waldo: An Agentic Search Model that Pairs with LLMs to Deliver Frontier Intelligence with ~50% Lower Latency and ~25% Fewer Tokens
Glean, April 28, 2026
Glean introduces Waldo, a specialized agentic search model that, when paired with frontier LLMs, reduces latency by roughly 50% and token usage by roughly 25%.
Glean has unveiled Waldo, an agentic search model designed to handle information-gathering tasks before frontier LLMs engage, separating search planning from deep reasoning.
Built on NVIDIA Nemotron 3 Nano and trained using DPO and reinforcement learning, Waldo achieves approximately 50% lower latency and uses approximately 25% fewer tokens than frontier models used alone, while maintaining quality.
The model intelligently decides how to break down queries, which tools to use, and when to hand off to a frontier model for synthesis and generation. Glean argues that by isolating well-defined, high-volume search tasks into a specialized model, enterprise AI systems can operate more efficiently while reserving expensive frontier models for complex reasoning and response generation.
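The division of labor described above can be pictured with a minimal sketch. It is an illustration only, not Glean's implementation or API: every name here (`SearchPlan`, `Handoff`, `plan_search`, `gather`, `answer`, and the tool names) is hypothetical, and the keyword-based planner is a toy stand-in for whatever the trained search model actually does. The point is the shape of the pipeline: a search stage plans and gathers evidence, then the frontier model is called once to synthesize.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SearchPlan:
    """Hypothetical plan produced by the search stage."""
    sub_queries: List[str]
    tool: str


@dataclass
class Handoff:
    """Hypothetical context passed from the search stage to the frontier model."""
    question: str
    evidence: List[str] = field(default_factory=list)


def plan_search(question: str) -> SearchPlan:
    # Toy stand-in for the search model: split the question on " and "
    # and choose a tool with a simple keyword rule.
    parts = [p.strip() for p in question.split(" and ") if p.strip()]
    words = question.lower().split()
    tool = "people_search" if "who" in words else "document_search"
    return SearchPlan(sub_queries=parts, tool=tool)


def gather(question: str, tools: Dict[str, Callable[[str], List[str]]]) -> Handoff:
    # Search stage: run the chosen tool on each sub-query and collect results.
    plan = plan_search(question)
    run_tool = tools[plan.tool]
    handoff = Handoff(question=question)
    for sub_query in plan.sub_queries:
        handoff.evidence.extend(run_tool(sub_query))
    return handoff


def answer(
    question: str,
    tools: Dict[str, Callable[[str], List[str]]],
    frontier_llm: Callable[[Handoff], str],
) -> str:
    # The frontier model is invoked once, after evidence gathering is done.
    return frontier_llm(gather(question, tools))
```

In this sketch the frontier model never sees the intermediate tool calls, only the collected evidence, which is one way the separation could reduce the tokens it has to process.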