WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games
Original article: https://arxiv.org/abs/2605.17637v2
Coding agents are increasingly used as application builders, yet many evaluations still focus on source code, repository-level tests, or intermediate traces rather than the delivered application. We introduce WebGameBenc...

This entry is part of the Top 50 AI Agent Articles curation.