An academic contribution to SWE-bench
On 2 April 2024, the Princeton NLP group, in collaboration with Stanford, released SWE-agent. The lead authors, John Yang, Carlos E. Jimenez and colleagues, come from the same research line that had produced, a few months earlier, the SWE-bench benchmark for evaluating software agents. The project is published under the MIT licence and was presented at NeurIPS 2024 with the paper “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering”.
The central contribution is conceptual: showing that the limiting factor in solving software engineering tasks is not so much the model’s capability as the design of the interface between agent and environment.
Agent-Computer Interface
The paper introduces the notion of the Agent-Computer Interface (ACI): the set of primitives through which the agent observes and modifies the system state. A raw shell offers maximum generality but forces the agent to handle verbose output, escape sequences and inconsistent formats. An interface designed for agents, with structured commands, compact output and explicit feedback, reduces the amount of irrelevant context the model has to process and, according to the paper, significantly improves performance.
SWE-agent implements a custom ACI that includes a file editor with a sliding-window view (which avoids loading the entire file into context), structured commands for filesystem navigation, a test runner with filtered output, and atomic patch operations. These primitives are exposed to the agent as tools, while operations that bypass the ACI are, in principle, avoided.
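The sliding-window idea can be illustrated with a minimal Python sketch. It is not SWE-agent's actual implementation: the class name WindowedViewer, its methods, the default window size and the output format are all assumptions made for illustration. It only shows how a viewer can expose a bounded slice of a file together with explicit feedback about what lies outside the window.

```python
from dataclasses import dataclass


@dataclass
class WindowedViewer:
    """Hypothetical viewer that shows a fixed-size window of a file's lines."""

    lines: list[str]
    window: int = 100  # assumed default: number of lines shown at once
    top: int = 0       # zero-based index of the first visible line

    def _clamp(self, top: int) -> int:
        # Keep the window inside the file, even when the file is short.
        last_start = max(0, len(self.lines) - self.window)
        return max(0, min(top, last_start))

    def render(self) -> str:
        # Return the visible lines plus explicit feedback about hidden ones.
        end = min(self.top + self.window, len(self.lines))
        header = f"[lines {self.top + 1}-{end} of {len(self.lines)}]"
        above = f"({self.top} more lines above)"
        body = [f"{i + 1}: {self.lines[i]}" for i in range(self.top, end)]
        below = f"({len(self.lines) - end} more lines below)"
        return "\n".join([header, above, *body, below])

    def scroll_down(self) -> str:
        self.top = self._clamp(self.top + self.window)
        return self.render()

    def scroll_up(self) -> str:
        self.top = self._clamp(self.top - self.window)
        return self.render()

    def goto(self, line_number: int) -> str:
        # Centre the window on a 1-based line number where possible.
        self.top = self._clamp(line_number - 1 - self.window // 2)
        return self.render()
```

For a 500-line file and a window of 10, calling goto(250) would return only ten numbered lines plus the counts of lines above and below, so the amount of text sent to the model stays constant regardless of file size.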
Results on SWE-bench
In the initial paper release, SWE-agent with GPT-4 Turbo solved about 12.5% of the issues in the full SWE-bench test set and 18% of those in SWE-bench Lite. The result was significant because, at publication time, many alternative solutions reported lower resolution rates while using comparable models. The paper attributes the difference mainly to ACI design, not to orchestration or prompting.
Relationship to OpenHands
SWE-agent and OpenHands (formerly OpenDevin) emerged in the same period and address the same problem, the automation of software engineering tasks, but come from different traditions. SWE-agent is an academic contribution focused on ACI design, with a minimalist architecture geared toward reproducible experiments. OpenHands is a broader platform with industrial ambitions.
The two projects are often compared on SWE-bench leaderboards and represent complementary approaches: SWE-agent as a research tool, OpenHands as an engineering platform.
Link: swe-agent.com
