A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation

Evaluation of repository-aware software engineering systems is often confounded by synthetic task design, prompt leakage, and temporal contamination between repository knowledge and future code changes. We present a time...

Liam Carter

Original article: https://arxiv.org/abs/2603.26137v1

This entry is part of the Top 50 AI Agent Articles curation.