Tutorial: Open Interpreter for local research dataset exploration

Using Open Interpreter in a research environment to explore CSV/Parquet datasets: installation in a virtualenv, --local mode via Ollama, a read-only working copy of the originals, and persistence of the generated session as a reproducible script.

Open Source · AI · Agentic · Tutorial · Open Interpreter · Research · Data

Preliminary notes

Content provided “as-is”, without warranty. Before running it on real datasets:

  • Work on a copy of the dataset in a test directory, not the original.
  • Back up the dataset before each session: Open Interpreter runs actual code and can overwrite or delete files.
  • Never place secrets (DB connection strings, API tokens, license keys) into prompts.
  • Interpreter logs (~/.config/open-interpreter/ in current versions) include command output and data samples: handle them with the same confidentiality you would apply to a Jupyter notebook full of outputs.
  • For datasets under ethics approval or data use agreements, use only a local model via Ollama: cloud means third-party transmission.

What Open Interpreter is

Open Interpreter (openinterpreter.com) is a Python tool released on PyPI as v0.0.1 on 14 July 2023 by Killian Lucas, licensed AGPL-3.0. The idea is simple: a REPL where you describe a task in natural language and the LLM generates code (Python, shell, JavaScript) that is executed locally on the user’s machine. It is explicitly designed for data exploration, local automation and ad-hoc scripting.

For a research group, the immediate value is reducing the time spent recalling pandas, matplotlib or scipy syntax for exploratory tasks that do not justify writing a persistent script.

Use case: first exploration of a tabular dataset

A researcher receives a dataset from a collaborator — a 200 MB measurements.parquet with loosely documented schema. Within an hour they want: counts by category, main distributions, obvious outliers, and a first documented cleaning step.

1. Isolated environment

python3 -m venv ~/.venvs/interpreter
source ~/.venvs/interpreter/bin/activate
pip install open-interpreter
interpreter --version

Never install Open Interpreter globally on the system Python: it runs code and can install dependencies on its own. The virtualenv caps the blast radius.

2. Working copy of the dataset

mkdir -p ~/research/measurements-work
cp /data/raw/measurements.parquet ~/research/measurements-work/
cd ~/research/measurements-work
# Make the working copy read-only so in-place edits fail:
chmod 444 measurements.parquet

If the agent proposes an in-place rename or type fix on the file, the read-only permission blocks it; the operator then deliberately asks for a derived file (measurements.cleaned.parquet) instead.
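The derived-file discipline can be sketched in plain Python. The `save_derived` helper and the placeholder bytes are illustrative (not part of Open Interpreter): the point is that a chmod-444 source forces output to a sibling file, never an overwrite.

```python
import stat
import tempfile
from pathlib import Path

# Stand-in for the working copy; real Parquet bytes are replaced by
# placeholders so the snippet stays self-contained.
workdir = Path(tempfile.mkdtemp())
original = workdir / "measurements.parquet"
original.write_bytes(b"raw bytes")
original.chmod(0o444)  # same effect as `chmod 444 measurements.parquet`

def save_derived(src: Path, data: bytes) -> Path:
    """Write cleaned output next to src, never over it."""
    if src.stat().st_mode & stat.S_IWUSR:
        raise RuntimeError("make the source read-only first (chmod 444)")
    dst = src.with_suffix(".cleaned.parquet")  # measurements.cleaned.parquet
    dst.write_bytes(data)
    return dst

cleaned = save_derived(original, b"cleaned bytes")
```

Checking the mode bits (rather than attempting a write) makes the intent explicit: the guard fails loudly if someone forgot the chmod step.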

3. Launch in local mode

# Ollama must already be running with a capable model:
ollama pull codellama:13b-instruct   # or llama2:13b-chat, wizardcoder, etc.

interpreter --local
# In the provider selection: Ollama → codellama:13b-instruct

The --local option (name and syntax may vary between releases) points the agent to a local endpoint with no cloud keys.

4. Exploration prompts

In the REPL:

  • “Read measurements.parquet with pandas into dataframe df. Show df.info() and df.head(10).”
  • “For each numeric column compute min, max, mean, median, and the number of NaNs. Return a table.”
  • “For the category column, show value counts and a bar chart saved as category_counts.png in the current directory.”
  • “Identify rows where temperature_c is outside [-40, 150] and save them to outliers.csv.”

Each instruction produces a code block that Open Interpreter proposes for execution. By default it asks for confirmation before running. Keep confirmation on: interpreter -y (auto-yes) is not appropriate until you know what kind of code the agent tends to generate on your data.
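For the outlier prompt above, the proposed block is typically a few lines of pandas along these lines (the DataFrame here is synthetic, standing in for pd.read_parquet("measurements.parquet"); column values are invented for illustration):

```python
import pandas as pd

# Synthetic stand-in for the real dataset loaded from Parquet.
df = pd.DataFrame({
    "sensor": ["a", "b", "c", "d"],
    "temperature_c": [21.5, -55.0, 180.2, 36.6],
})

# Rows where temperature_c falls outside the plausible [-40, 150] range.
mask = (df["temperature_c"] < -40) | (df["temperature_c"] > 150)
outliers = df[mask]
outliers.to_csv("outliers.csv", index=False)
```

Reviewing a block like this before confirming takes seconds and is exactly the checkpoint that the default confirmation step gives you.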

5. Persisting the session

At the end of the session, ask:

“Export all Python code run so far into a file exploration_2023-09-10.py, cleaned up, with comments summarising each block.”

The resulting file is a reproducible artifact. Commit it to a git repo (or to the paper’s companion data bundle) together with category_counts.png and outliers.csv: the agent’s work becomes a reviewable, reproducible exploratory record.

Limits and caveats

  • Open Interpreter runs real code: an ambiguous prompt can generate an os.remove, a df.to_parquet that overwrites the original, or an unwanted pip install. Virtualenv + read-only dataset are minimum defences, not exhaustive ones.
  • The local model has limits: the open models available in mid-2023 (Code Llama, Llama 2, WizardCoder) are still far from a frontier model on niche data-science libraries. Complex statistical analyses warrant expert human review as well.
  • Moving-target versions: the project has evolved quickly, with CLI changes, default providers and sandbox semantics shifting. Pin the version (pip install open-interpreter==<ver>) for sessions that need to be reproducible later.
  • Not a Jupyter replacement: for extensive or shared analyses the notebook remains the more readable artifact. Interpreter is a tool for a fast first look, not for reporting.

Links: openinterpreter.com · github.com/OpenInterpreter/open-interpreter


Stefano Noferi — Founder and CEO/CTO of noze
Tech Entrepreneur — AI Governance & Security Architect
