SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents
Original article: https://arxiv.org/abs/2504.08703v3
Coding agents powered by large language models have shown impressive capabilities in software engineering tasks, but evaluating their performance across diverse programming languages and real-world scenarios remains chal...

This entry is part of the Top 50 AI Agent Articles curation.