
meddler.tech

Hi, I'm meddler.tech
51 posts

A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation

Evaluation of repository-aware software engineering systems is often confounded by synthetic task design, prompt leakage, and temporal contamination between repository knowledge and future code changes. We present a time...

Liam Carter
27 Mar 2026 · 1 min read

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

As large language models (LLMs) advance their mathematical capabilities toward the IMO and research level, the scarcity of challenging, high-quality problems has become a significant bottleneck for training, evaluation a...

Ava Brooks
3 Mar 2026 · 1 min read

Your Code Agent Can Grow Alongside You with Structured Memory

While "Intent-oriented programming" (or "Vibe Coding") redefines software engineering, existing code agents remain tethered to static code snapshots. Consequently, they struggle to model the critical information embedded...

Owen Blake
25 Feb 2026 · 1 min read

Debug2Fix: Can Interactive Debugging Help Coding Agents Fix More Bugs?

While significant progress has been made in automating various aspects of software development through coding agents, there is still significant room for improvement in their bug fixing capabilities. Debugging and invest...

Nina Reed
20 Feb 2026 · 1 min read

Beyond Quantity: Trajectory Diversity Scaling for Code Agents

As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of qua...

Leo Parker
3 Feb 2026 · 1 min read

BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization

Large language models (LLMs) have shown strong reasoning and coding capabilities, yet they struggle to generalize to real-world software engineering (SWE) problems that are long-horizon and out of distribution. Existing...

Aria Patel
29 Dec 2025 · 1 min read

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software system...

Zoe Walker
14 Dec 2025 · 1 min read

Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases

Real-world software engineering tasks require coding agents that can operate on massive repositories, sustain long-horizon sessions, and reliably coordinate complex toolchains at test time. Existing research-grade coding...

Ethan Shaw
11 Dec 2025 · 1 min read

Agint: Agentic Graph Compilation for Software Engineering Agents

LLM-based coding agents are increasingly common but still face challenges in context management, latency, reliability, reproducibility, and scalability. We present Agint, an agentic graph compiler, interpreter, and runti...

Maya Collins
24 Nov 2025 · 1 min read

Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?

Large Language Models (LLMs) are reshaping almost all industries, including software engineering. In recent years, a number of LLM agents have been proposed to solve real-world software problems. Such software agents are...

Noah Bennett
17 Nov 2025 · 1 min read

The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents

Agents are now used widely in the process of software development, but building production-ready software engineering agents is a complex task. Deploying software agents effectively requires flexibility in implementation...

Liam Carter
5 Nov 2025 · 1 min read

A Comprehensive Empirical Evaluation of Agent Frameworks on Code-centric Software Engineering Tasks

Unlike traditional automation tools or static LLM-based systems, agents combine decision-making and tool utilization to accomplish complex tasks, showing great potential in software engineering. However, existing studies...

Ava Brooks
2 Nov 2025 · 1 min read

TOM-SWE: User Mental Modeling For Software Engineering Agents

Recent advances in coding agents have made them capable of planning, editing, running, and testing complex code bases. Despite their growing ability in coding tasks, these systems still struggle to infer and track user i...

Owen Blake
24 Oct 2025 · 1 min read

TEST

TEtt

meddler.tech
4 Oct 2025 · 1 min read

RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents

Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic execution, debugging, and interactive programming capabilities. While...

Nina Reed
2 Oct 2025 · 1 min read

TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture

While integrating tools like Code Interpreter and Search has significantly enhanced Large Language Model (LLM) reasoning in models like ChatGPT Agent and Gemini-Pro, practical guidance on optimal tool use is lacking. The...

Leo Parker
30 Sep 2025 · 1 min read

PerfBench: Can Agents Resolve Real-World Performance Bugs?

Performance bugs are inefficiencies in software that waste computational resources without causing functional failures, making them particularly challenging to detect and fix. While recent advances in Software Engineerin...

Aria Patel
28 Sep 2025 · 1 min read

Context Engineering for Multi-Agent LLM Code Assistants Using Elicit, NotebookLM, ChatGPT, and Claude Code

Large Language Models (LLMs) have shown promise in automating code generation and software engineering tasks, yet they often struggle with complex, multi-file projects due to context limitations and knowledge gaps. We pr...

Zoe Walker
9 Aug 2025 · 1 min read
© 2026 meddler. All rights reserved.