DiffusionGemma: 4x Faster Text Generation
Google - The Keyword, Wednesday, June 10th, 2026
Google unveils DiffusionGemma, an experimental open model that generates text in parallel blocks for up to 4x faster inference.
DiffusionGemma is a 26B Mixture-of-Experts open model that generates entire blocks of text simultaneously rather than one token at a time, as traditional LLMs do.
Built on Gemma 4's architecture and combined with diffusion research, it delivers up to 4x faster token output on dedicated GPUs while activating only 3.8B of its 26B parameters at inference. It applies bi-directional attention across 256-token blocks, which suits speed-critical tasks such as in-line code editing and non-linear generation. However, its output quality is lower than standard Gemma 4's, so it is better suited to interactive local workflows than to production systems where output quality is the priority.
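To make the block-parallel idea concrete, here is a rough back-of-the-envelope sketch in Python. It compares how many forward passes a token-by-token decoder needs with how many a block-wise decoder needs, and computes the share of parameters active at inference. The 256-token block size and the 3.8B and 26B figures come from the article; the function names and the `steps_per_block` refinement count are hypothetical, chosen only for illustration, and do not describe DiffusionGemma's actual sampler.

```python
import math

BLOCK_SIZE = 256          # block length stated in the article
ACTIVE_PARAMS_B = 3.8     # parameters active at inference, in billions (from the article)
TOTAL_PARAMS_B = 26.0     # total parameters, in billions (from the article)


def autoregressive_passes(num_tokens: int) -> int:
    """A token-by-token decoder runs one forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens: int, steps_per_block: int,
                           block_size: int = BLOCK_SIZE) -> int:
    """A block-wise decoder refines every position of a block in each pass.

    steps_per_block is a HYPOTHETICAL number of refinement passes per block;
    the article does not state the real value.
    """
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * steps_per_block


def active_fraction(active_b: float = ACTIVE_PARAMS_B,
                    total_b: float = TOTAL_PARAMS_B) -> float:
    """Share of the model's parameters used for each token at inference."""
    return active_b / total_b


if __name__ == "__main__":
    tokens = 1024
    ar = autoregressive_passes(tokens)
    bd = block_diffusion_passes(tokens, steps_per_block=32)  # 32 is an assumed value
    print(f"autoregressive passes: {ar}")
    print(f"block-wise passes:     {bd}")
    print(f"active parameter share: {active_fraction():.1%}")
```

Pass counts are not wall-clock time: each block-wise pass processes a full 256-token block, so it costs more than a single-token pass. The sketch only shows why fewer sequential steps can translate into faster output on hardware with spare parallel capacity; it is not a measurement and does not reproduce the reported 4x figure.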