
SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents

Coding agents powered by large language models have shown impressive capabilities in software engineering tasks, but evaluating their performance across diverse programming languages and real-world scenarios remains chal...

Ava Brooks

11 Apr 2025

Original article: https://arxiv.org/abs/2504.08703v3

This entry is part of the Top 50 AI Agent Articles curation.