NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software system...

Zoe Walker

14 Dec 2025 · 1 min read

Original article: https://arxiv.org/abs/2512.12730v2

This entry is part of the Top 50 AI Agent Articles curation.