OpenHands (formerly OpenDevin): autonomous agent for software engineering

On 3 September 2024 All Hands AI renamed OpenDevin to OpenHands: an SWE-agent with Docker sandboxing, the CodeAct pattern, and browser and editor tools, evaluated on the SWE-bench Verified benchmark and released under the MIT licence.

Tags: Open Source AI, Open Source, OpenHands, OpenDevin, SWE-Agent, Agentic AI

From community reaction to platform

In March 2024 the startup Cognition Labs unveiled Devin, an autonomous agent for software engineering, in a high-impact demo. Within a few weeks an open source community response emerged: the OpenDevin project, coordinated by Robert Brennan, Xingyao Wang and Graham Neubig (CMU), among others. On 3 September 2024 the project was renamed OpenHands, with All Hands AI established as its reference organisation. The code is released under the MIT licence.

The rebrand accompanies the project’s transformation from an ad-hoc initiative into a structured platform for software agent development.

Agent architecture

OpenHands is built around an autonomous agent that interacts with an isolated development environment. The agent operates through three tool categories: browser (web navigation via Playwright or Chrome DevTools Protocol), editor (reading, writing, patching project files) and shell (command execution in a Linux environment).

Execution takes place in a Docker sandbox, isolating filesystem, network and processes from the host. This choice is consistent with the risk profile of an agent that can execute arbitrary code: any side effect remains confined to the container and can be reverted at the end of the session.
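A sandbox of this kind can be launched with standard Docker isolation flags. The sketch below is an assumption about what such a launcher might look like: the image name, resource limits and mount layout are illustrative, not the exact OpenHands configuration.

```python
def sandbox_command(image: str = "sandbox:latest",
                    workspace: str = "/tmp/ws") -> list[str]:
    """Builds an illustrative `docker run` invocation for an isolated session.

    Flags are assumptions: a real agent runtime may relax network
    isolation (e.g. for the browser tool) or tune limits differently.
    """
    return [
        "docker", "run", "--rm",          # container discarded after the session
        "--network", "none",              # isolate from the host network
        "--memory", "2g",                 # cap memory use
        "--mount", f"type=bind,src={workspace},dst=/workspace",
        image, "sleep", "infinity",       # keep the container alive for tool calls
    ]
```

Because the container is started with `--rm`, any side effect of agent-executed code disappears when the session ends, which is exactly the revertibility property described above.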

CodeAct and the execution loop

The default agent implements the CodeAct pattern: at each iteration the model emits a block of code (bash or Python) that is executed in the sandbox; the execution output is returned as new context for the next step. This loop continues until the agent signals task completion or an iteration limit is reached.

The choice of expressing actions as code — rather than as structured JSON function calls — follows the approach of the paper “Executable Code Actions Elicit Better LLM Agents” and aligns with similar implementations such as SmolAgents.

SWE-bench Verified as a benchmark

OpenHands is systematically evaluated on the SWE-bench Verified benchmark, a curated subset of SWE-bench comprising real issues from open source Python projects (Django, Flask, scikit-learn and others) with associated verification tests. Results are published on a public leaderboard that compares different agent configurations and underlying models.

Top positions on the leaderboard are frequently held by OpenHands configurations paired with commercial models (Claude, GPT-4 class), establishing the platform as one of the main open source reference points among SWE-agents.

Link: all-hands.dev
