Preliminary notes
Content provided “as-is”, without warranty. Before running it on real datasets:
- Work on a copy of the dataset in a test directory, not the original.
- Back up the dataset before each session: Open Interpreter runs actual code and can overwrite or delete files.
- Never place secrets (DB connection strings, API tokens, license keys) into prompts.
- Interpreter logs (`~/.config/open-interpreter/` in current versions) include command output and data samples: treat them as you would a Jupyter notebook.
- For datasets under ethics approval or data use agreements, use only a local model via Ollama: cloud means third-party transmission.
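For sessions run under a data use agreement, it can make sense to clear those logs when the session ends. A minimal sketch, assuming the log location cited above (which may shift between versions); the helper name is illustrative:

```python
import shutil
from pathlib import Path

def purge_interpreter_logs(
    log_dir: Path = Path.home() / ".config" / "open-interpreter",
) -> int:
    """Delete Open Interpreter session logs; return how many files were removed."""
    if not log_dir.exists():
        return 0
    removed = sum(1 for p in log_dir.rglob("*") if p.is_file())
    shutil.rmtree(log_dir)
    return removed
```

Run it with an explicit path first to confirm it targets the right directory.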
What Open Interpreter is
Open Interpreter (openinterpreter.com) is a Python tool released on PyPI as v0.0.1 on 14 July 2023 by Killian Lucas, licensed AGPL-3.0. The idea is simple: a REPL where you describe a task in natural language and the LLM generates code (Python, shell, JavaScript) that is executed locally on the user’s machine. It is explicitly designed for data exploration, local automation and ad-hoc scripting.
For a research group, the immediate value is reducing the time spent recalling pandas, matplotlib or scipy syntax for exploratory tasks that do not justify writing a persistent script.
Use case: first exploration of a tabular dataset
A researcher receives a dataset from a collaborator: a 200 MB `measurements.parquet` with a loosely documented schema. Within an hour they want: counts by category, main distributions, obvious outliers, and a first documented cleaning step.
1. Isolated environment
python3 -m venv ~/.venvs/interpreter
source ~/.venvs/interpreter/bin/activate
pip install open-interpreter
interpreter --version
Never install Open Interpreter globally on the system Python: it runs code and can install dependencies on its own. The virtualenv caps the blast radius.
2. Working copy of the dataset
mkdir -p ~/research/measurements-work
cp /data/raw/measurements.parquet ~/research/measurements-work/
cd ~/research/measurements-work
# Make the original read-only for safety:
chmod 444 measurements.parquet
If the agent proposes an in-place rename or in-place type fix on the file, the read-only permission blocks it. The operator then deliberately produces a derived file (`measurements.cleaned.parquet`).
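The same discipline can be enforced from Python before a session starts: fail fast if the source dataset is writable. A small sketch (the helper name is illustrative):

```python
import stat
from pathlib import Path

def assert_read_only(path: Path) -> None:
    """Raise if the source dataset has any write bit set (the chmod 444 discipline)."""
    mode = path.stat().st_mode
    if mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH):
        raise PermissionError(f"{path} is writable; run chmod 444 on it first")
```

Calling `assert_read_only(Path("measurements.parquet"))` at the top of any cleanup script documents the invariant the read-only copy is meant to protect.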
3. Launch in local mode
# Ollama must already be running with a capable model:
ollama pull codellama:13b-instruct # or llama2:13b-chat, wizardcoder, etc.
interpreter --local
# In the provider selection: Ollama → codellama:13b-instruct
The --local option (name and syntax may vary between releases) points the agent to a local endpoint with no cloud keys.
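Before launching, it helps to confirm the Ollama daemon is actually up and has the model pulled. A quick stdlib-only check against Ollama's REST API (default port 11434; the helper name is an assumption):

```python
import json
from urllib.request import urlopen

def ollama_models(base: str = "http://localhost:11434") -> list[str]:
    """Return the names of models the local Ollama daemon has pulled."""
    with urlopen(f"{base}/api/tags", timeout=5) as resp:
        return [m["name"] for m in json.load(resp)["models"]]
```

If `codellama:13b-instruct` is missing from the returned list, `ollama pull` it before starting the interpreter session.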
4. Exploration prompts
In the REPL:
- “Read `measurements.parquet` with pandas into a dataframe `df`. Show `df.info()` and `df.head(10)`.”
- “For each numeric column compute min, max, mean, median, and the number of NaNs. Return a table.”
- “For the `category` column, show value counts and a bar chart saved as `category_counts.png` in the current directory.”
- “Identify rows where `temperature_c` is outside [-40, 150] and save them to `outliers.csv`.”
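A hand-written approximation of the pandas code the agent typically proposes for the summary and outlier prompts, using a toy in-memory frame so it runs without the real dataset (the chart-saving step is omitted here):

```python
import numpy as np
import pandas as pd

# Toy stand-in; in a real session the agent starts from
# df = pd.read_parquet("measurements.parquet")
df = pd.DataFrame({
    "category": ["a", "b", "a", "c", "a"],
    "temperature_c": [21.5, -55.0, 180.0, 19.9, np.nan],
})

# Value counts for the category column
counts = df["category"].value_counts()

# Per-numeric-column summary: min, max, mean, median, NaN count
num = df.select_dtypes("number")
summary = pd.DataFrame({
    "min": num.min(),
    "max": num.max(),
    "mean": num.mean(),
    "median": num.median(),
    "nan_count": num.isna().sum(),
})

# Rows with temperature outside the plausible [-40, 150] range
outliers = df[(df["temperature_c"] < -40) | (df["temperature_c"] > 150)]
```

Reviewing code like this line by line before confirming execution is exactly what the confirmation prompt is for.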
Each instruction produces a code block that the interpreter proposes to run; by default it asks for confirmation before executing. Keep confirmation on: `interpreter -y` (auto-yes) is not appropriate until you know what you are inspecting.
5. Persisting the session
At the end of the session, ask:
“Export all Python code run so far into a file exploration_2023-09-10.py, cleaned up, with comments summarising each block.”
The resulting file is a reproducible artifact. Commit it to a git repo (or to the paper’s companion data bundle) together with category_counts.png and outliers.csv: the agent’s work becomes a controllable exploratory notebook.
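One way to make that bundle verifiable later is to record a SHA-256 digest of each artifact next to the exported script. A minimal sketch; the manifest filename and helper are illustrative, not part of Open Interpreter:

```python
import hashlib
import json
from pathlib import Path

def write_manifest(artifacts: list[Path], out: Path) -> dict:
    """Record the SHA-256 of each session artifact so the bundle can be verified."""
    digests = {p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in artifacts}
    out.write_text(json.dumps(digests, indent=2))
    return digests
```

Committing `MANIFEST.json` alongside `exploration_2023-09-10.py`, `category_counts.png` and `outliers.csv` lets a reader check that the files were not altered after the session.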
Limits and caveats
- Open Interpreter runs real code: an ambiguous prompt can generate an `os.remove`, a `df.to_parquet` that overwrites the original, or an unwanted `pip install`. Virtualenv + read-only dataset are minimum defences, not exhaustive ones.
- The local model has limits: the open models available in mid-2023 (Code Llama, Llama 2, WizardCoder) are still far from a frontier model on niche data-science libraries. Complex statistical analyses warrant expert human review as well.
- Moving-target versions: the project has evolved quickly, with CLI flags, default providers and sandbox semantics shifting between releases. Pin the version (`pip install open-interpreter==<ver>`) for sessions that need to be reproducible later.
- Not a Jupyter replacement: for extensive or shared analyses the notebook remains the more readable artifact. Interpreter is a tool for a fast first look, not for reporting.
Link: openinterpreter.com — github.com/OpenInterpreter/open-interpreter
