meddler

meddler.tech

Hi, I'm meddler.tech.
51 posts

Measuring AI agent autonomy in practice

A concrete treatment of capability and autonomy measurement, useful for release gating.

Owen Blake
9 Jun 2026 · 1 min read

Trustworthy agents in practice

Safety and trust considerations for real deployments where agents take consequential actions.

Nina Reed
9 Jun 2026 · 1 min read

Agent Engineering: A New Discipline

Useful mental model for iterative quality improvement of non-deterministic agent systems.

Leo Parker
9 Jun 2026 · 1 min read

How to Build an Agent

Production-oriented sequence from prototype to evals, safety checks, and operational feedback loops.

Aria Patel
9 Jun 2026 · 1 min read

What is an AI agent?

Strong conceptual framing for agent boundaries, when not to use agents, and practical decomposition.

Zoe Walker
9 Jun 2026 · 1 min read

Agents • Cookbook

Hands-on examples for coding agents end to end with realistic tool and memory patterns.

Ethan Shaw
9 Jun 2026 · 1 min read

Tools | OpenAI API

Deep dive into web/file/tool-search patterns that materially change agent capability and reliability.

Maya Collins
9 Jun 2026 · 1 min read

Agents SDK | OpenAI API

Reference for orchestrating multi-step, tool-using agent systems with explicit application control.

Noah Bennett
9 Jun 2026 · 1 min read

Agents | OpenAI Developers

Practical guide for architecture, control flow, safety, and eval loops in production agents.

Liam Carter
9 Jun 2026 · 1 min read

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-horizon benchmarks th...

Ava Brooks
26 May 2026 · 1 min read

RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations

Code agents currently perform well on repository-level software engineering benchmarks, but it remains unclear whether success on end-to-end tasks such as issue resolution truly reflects repository con...

Owen Blake
25 May 2026 · 1 min read

What Do Evolutionary Coding Agents Evolve?

Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, ye...

Nina Reed
19 May 2026 · 1 min read

Code as Agent Harness

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is...

Leo Parker
18 May 2026 · 1 min read

WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

Coding agents are increasingly used as application builders, yet many evaluations still focus on source code, repository-level tests, or intermediate traces rather than the delivered application. We introduce WebGameBenc...

Aria Patel
17 May 2026 · 1 min read

From Code-Centric to Intent-Centric Software Engineering: A Reflexive Thematic Analysis of Generative AI, Agentic Systems, and Engineering Accountability

Generative artificial intelligence (GenAI) and agentic systems are moving software engineering from code-centric production toward intent-centric human-agent work in which natural language, repository context, tools, tes...

Zoe Walker
10 May 2026 · 1 min read

Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents

Software engineering agents are increasingly deployed in evaluable engineering environments, yet post-failure recovery remains costly, manual, and ad hoc. Existing systems expose traces or generate follow-up feedback, bu...

Ethan Shaw
9 May 2026 · 1 min read

SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution

We introduce SWE Atlas, a benchmark suite for coding agents spanning three professional software engineering workflows: Codebase Q&A (124 tasks), Test Writing (90 tasks), and Refactoring (70 tasks). SWE Atlas differs fro...

Maya Collins
8 May 2026 · 1 min read

ProgramBench: Can Language Models Rebuild Programs From Scratch?

Turning ideas into full software projects from scratch has become a popular use case for language models. Agents are being deployed to seed, maintain, and grow codebases over extended periods with minimal human oversight...

Noah Bennett
5 May 2026 · 1 min read

© 2026 meddler. All rights reserved.