AWS AI Agent for Software Development Takes on More Complex Tasks
DevOps.com, Wednesday, September 25th, 2024
Amazon Web Services (AWS) has released an update to its Amazon Q Developer agent for software development that, according to benchmark tests, resolves 51% more tasks than the version first made available.
The results are based on SWE-bench, a benchmark that evaluates the ability of an artificial intelligence (AI) platform to resolve software development issues that a Python developer might encounter. The benchmark was developed by researchers at Princeton University, and OpenAI later helped produce a human-validated subset known as the verified dataset. Since the Amazon Q Developer agent was first made available, its share of tasks resolved has increased from 25.6% to 38.8% on the verified dataset and from 13.82% to 19.75% on the full SWE-bench dataset.
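The headline 51% figure appears to correspond to the relative gain on the verified dataset rather than a difference in percentage points. As a minimal sketch of that arithmetic, using only the scores quoted above (the function name `relative_gain` is a hypothetical helper, not part of any AWS or SWE-bench tooling):

```python
def relative_gain(before: float, after: float) -> float:
    """Return the relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Resolution rates quoted in the article, in percent of tasks resolved.
verified_gain = relative_gain(25.6, 38.8)   # verified dataset
full_gain = relative_gain(13.82, 19.75)     # full SWE-bench dataset

print(f"Verified dataset: {verified_gain:.1f}% more tasks resolved")
print(f"Full dataset:     {full_gain:.1f}% more tasks resolved")
```

Under this reading, the gain is roughly 51.6% on the verified dataset and roughly 42.9% on the full dataset, which corresponds to 13.2 and 5.93 percentage points respectively.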
Neha Goswami, director of engineering for Amazon Q Developer, said those results show that AI agents such as Amazon Q Developer continue to evolve over time in ways that, for example, take advantage of advances in the reasoning capabilities of large language models (LLMs) to resolve increasingly complex tasks.