Linux Services & Systems

Domains, hosting, PEC, email infrastructure, network services and Linux systems. Open Source infrastructure support and management.

Discover →

V4-Pro and V4-Flash: two sizes, one preview

DeepSeek publishes the V4 Preview family in two Mixture-of-Experts variants. DeepSeek-V4-Pro declares 1.6 trillion total parameters with 49 billion active per token; the lab positions it as competitive with leading closed-source models. DeepSeek-V4-Flash drops to 284 billion total and 13 billion active, designed for low latency and cost while keeping reasoning capability close to Pro. Both variants support thinking and non-thinking modes, and the API is reachable from the day of the announcement.

Sparse attention and a one-million-token context

The headline architectural change is DeepSeek Sparse Attention (DSA), combined with token-wise compression. This is the mechanism that brings the context window to 1 million tokens by default across every official surface — chat, API, open weights — with no intermediate tier. Compression cuts KV-cache overhead, while the sparse pattern preserves retrieval fidelity over long contexts without falling back to approximations or sliding windows.

Compatible APIs and migration of the older models

V4 endpoints stay compatible with OpenAI Chat Completions and Anthropic APIs, reducing migration friction for teams with existing integrations: in most cases it is enough to switch the base URL and the model name. Together with the announcement, DeepSeek confirms the retirement of deepseek-chat and deepseek-reasoner, which will be fully removed on 24 July 2026 at 15:59 UTC: production users have a three-month window to test V4 and adjust their prompts.

What this means in practice

V4 does not move the frontier on any single benchmark, but it ships in Open Source a sparse architecture designed for long contexts and a pipeline that, for the first time on a model of this size, treats the million-token window as default rather than as a premium tier. For teams building RAG systems or agents over large corpora, an extended context at the same price changes the balance between preprocessing and prompting.

Link: DeepSeek V4 Preview announcement · chat.deepseek.com · HuggingFace · deepseek-ai

Company

Actions

Links

Products

Solutions

Industries

DeepSeek V4 Preview: 1M-token default and sparse attention

Linux Services & Systems

V4-Pro and V4-Flash: two sizes, one preview

Sparse attention and a one-million-token context

Compatible APIs and migration of the older models

What this means in practice