We've Been Measuring AI Wrong: Why Economically Valuable Work Is the New Benchmark
The New Stack, Thursday, June 18th, 2026
The article argues that traditional AI benchmarks such as Agent's Last Exam miss the mark and that economically valuable work is the better measure.
The New Stack examines the limits of academic AI benchmarks such as Agent's Last Exam, arguing that exam-style tests no longer capture what matters as models saturate them.
It proposes that the real measure of AI progress is the ability to perform economically valuable work in real-world settings.
The piece frames benchmarking as shifting from puzzle-solving toward practical, monetizable task completion by AI agents. This reframing has implications for how enterprises evaluate AI tools and their return on investment (ROI).