Tutorial: Continue.dev with on-prem Ollama for NDA-bound code

Continue.dev configured with a local Ollama backend: a code assistant in VS Code/JetBrains with no traffic to the cloud. config.json setup, models for autocomplete and chat, workspace isolation.

Tags: Open Source, AI, Agentic, Tutorial, Continue, Ollama, NDA, On-Prem

Preliminary notes

This tutorial is provided “as-is”. Before adopting:

  • Validate on an isolated machine or a test environment.
  • Back up the repo and branch where you run the tests.
  • Never place secrets into prompts, config files or IDE logs.
  • Check that the local runtime (Ollama) does not expose its port externally: by default the API listens on 127.0.0.1:11434 and is not network-reachable, but some setups override this (e.g. via the OLLAMA_HOST environment variable).
  • Continue and Ollama release cycles move fast: rely on the official docs for the config.json schema of your version.
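The port-exposure check in the list above can be partially automated. A minimal sketch (assuming Python 3 and relying on the fact that Ollama honors the OLLAMA_HOST environment variable) that flags a runtime configured to listen beyond loopback:

```python
import ipaddress
import os
from urllib.parse import urlsplit

def is_loopback(host_value: str) -> bool:
    """Rough check that an OLLAMA_HOST-style value stays on the local machine.
    Accepts '127.0.0.1:11434', 'http://127.0.0.1:11434' and bare hostnames."""
    # urlsplit needs a '//' prefix to parse scheme-less host:port strings
    value = host_value if "//" in host_value else "//" + host_value
    host = urlsplit(value).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # named hosts other than localhost: assume reachable

# e.g. OLLAMA_HOST=0.0.0.0 would expose the API on all interfaces
current = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
if not is_loopback(current):
    print(f"warning: Ollama may be reachable from the network: {current}")
```

This only inspects the configured address, not the actual sockets; on Linux, ss -tln remains the authoritative check.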

What Continue.dev is

Continue (continue.dev) is an open-source extension for VS Code and JetBrains adding an AI chat panel and AI completions inside the IDE. Unlike SaaS products, Continue leaves provider choice to the user via a configuration file. This allows pairing it with Ollama, a local runtime that serves open models (Code Llama, DeepSeek Coder, …) through an OpenAI-compatible API on 127.0.0.1:11434.

The result is a code assistant where no part of the open file, prompt or response leaves the machine. That is the scenario consultancies and integrators care about when client code is under strict NDA clauses.

Use case: consultancy with multiple NDA-bound repositories

A 4-developer team works on several client projects, some of which explicitly forbid sending code snippets to cloud services. Goal: autocomplete and contextual chat on the open code, with no outbound traffic.

1. Install Ollama and pull the models

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Pull two models: one for chat, one for autocomplete
ollama pull deepseek-coder:6.7b   # chat, ~3.8 GB
ollama pull deepseek-coder:1.3b   # autocomplete, ~0.8 GB
ollama list

Model size depends on available RAM and GPU. On a 16 GB laptop with an integrated GPU, combining 6.7B (chat) with 1.3B (autocomplete, low latency) is a reasonable compromise in early 2024. On Apple Silicon or with a dedicated NVIDIA GPU you can move up to codellama:13b-instruct or deepseek-coder:33b for chat.
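The sizing logic above can be sketched as a small helper. The thresholds are illustrative assumptions drawn from the tradeoffs just described, not benchmarks:

```python
def suggest_models(ram_gb: int, dedicated_gpu: bool) -> dict:
    """Map hardware to the model pairs discussed above.
    Thresholds are assumptions -- benchmark on your own machine."""
    if dedicated_gpu and ram_gb >= 32:
        chat = "deepseek-coder:33b"       # large chat model, GPU-backed
    elif dedicated_gpu or ram_gb >= 32:
        chat = "codellama:13b-instruct"   # mid-size chat model
    else:
        chat = "deepseek-coder:6.7b"      # 16 GB laptop baseline
    # a small model keeps autocomplete latency low in every case
    return {"chat": chat, "autocomplete": "deepseek-coder:1.3b"}

print(suggest_models(16, False))
# → {'chat': 'deepseek-coder:6.7b', 'autocomplete': 'deepseek-coder:1.3b'}
```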

2. Install the Continue extension

From VS Code: open Quick Open with Ctrl+P and run ext install Continue.continue. After installation a Continue icon appears in the sidebar.

On first launch, Continue creates a config file at ~/.continue/config.json (or config.yaml in recent versions). This file is the source of truth: no hidden settings, no ambiguous telemetry when configured properly.
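Since the file name varies between versions, a small helper can locate whichever one is present. This is a sketch: the JSON-before-YAML precedence is an assumption, not documented Continue behavior:

```python
from pathlib import Path
from typing import Optional

def find_continue_config(home: Optional[Path] = None) -> Optional[Path]:
    """Return the Continue config file under ~/.continue, or None.
    Checks config.json first, then config.yaml (used by recent versions);
    this precedence is an assumption."""
    base = (home or Path.home()) / ".continue"
    for name in ("config.json", "config.yaml"):
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None
```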

3. Reference config for local Ollama

{
  "models": [
    {
      "title": "DeepSeek Coder 6.7B (local)",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b",
      "apiBase": "http://127.0.0.1:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder 1.3B (local)",
    "provider": "ollama",
    "model": "deepseek-coder:1.3b",
    "apiBase": "http://127.0.0.1:11434"
  },
  "allowAnonymousTelemetry": false
}
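Before trusting the file, the privacy-relevant fields can be parsed and asserted. A quick sanity check over the reference config above (a standalone script, not part of Continue itself):

```python
import json

# the reference config from this tutorial, embedded for the check
CONFIG = """
{
  "models": [
    {
      "title": "DeepSeek Coder 6.7B (local)",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b",
      "apiBase": "http://127.0.0.1:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder 1.3B (local)",
    "provider": "ollama",
    "model": "deepseek-coder:1.3b",
    "apiBase": "http://127.0.0.1:11434"
  },
  "allowAnonymousTelemetry": false
}
"""

cfg = json.loads(CONFIG)  # raises ValueError on malformed JSON
entries = cfg["models"] + [cfg["tabAutocompleteModel"]]
assert cfg["allowAnonymousTelemetry"] is False, "telemetry must stay off"
assert all(e["apiBase"].startswith("http://127.0.0.1") for e in entries), \
    "every endpoint must stay on loopback"
print("config looks NDA-safe")
```

In practice, point json.loads at the real ~/.continue/config.json and run the same assertions.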

Notes on this configuration:

  • allowAnonymousTelemetry: false disables anonymous usage metadata being sent to the Continue project. Mandatory for NDA-bound environments.
  • Explicit apiBase on 127.0.0.1: no DNS, no default proxying.
  • No API key: Ollama does not require one, so there are no credentials to protect in config.json.

4. Verify no traffic escapes

Before handing real code to the extension, a network check on a VM is wise:

# In a separate terminal, filter outbound HTTPS during a session:
sudo tcpdump -n -i any 'tcp port 443 and not dst host 127.0.0.1' -c 50

When using chat and autocomplete in VS Code, the command above should not log related traffic (background system noise aside). If connections to continue.dev domains show up, recheck allowAnonymousTelemetry.

5. Everyday use in the IDE

  • Contextual chat: Ctrl+L opens the panel; @ lets you attach files or folders from the workspace. The operator controls what is attached.
  • Inline edit: Ctrl+I on a selection allows guided edits without leaving the editor.
  • Autocomplete: runs in the background, backed by the 1.3B model — faster but less accurate than the chat model.

Limits and things to know

  • Quality below a frontier cloud model: local 6.7B–13B open models are useful, but on complex refactors or subtle debugging the gap from a frontier model is noticeable. The choice is a deliberate tradeoff between confidentiality and capability.
  • Compute cost: on laptops without a dedicated GPU, a 1.3B autocomplete model is the threshold where latency is acceptable.
  • Model updates: ollama pull <model> must be scheduled. A model pinned early in the year will miss the fixes and improvements released upstream.
  • Not a complete NDA solution: on-prem Continue fixes code transfer to LLM providers. It does not replace machine-level policy (disk encryption, sudo audit, endpoint security) required by the clauses.

Links: continue.dev, ollama.com, github.com/continuedev/continue


Stefano Noferi — Founder and CEO/CTO of noze
Tech Entrepreneur — AI Governance & Security Architect
