Quick answer: DeepSeek V4 launched on April 24, 2026 as an open-source model from DeepSeek (China), shipped in two variants: V4 Pro (1.6-trillion-parameter Mixture-of-Experts) and V4 Flash (284B total / 13B active). Both have a 1M-token context window. API pricing is $0.435/M input for Pro and $0.14/M for Flash — Flash is 97% cheaper than GPT-5.5 and 99% cheaper than Claude Opus 4.7. Performance is near state-of-the-art at roughly 1/20 the cost of frontier US models. MIT Technology Review calls it a "serious threat" to Anthropic and NVIDIA.
🔥 The numbers shaking the market: DeepSeek V4 Flash is $0.14/M input · cache hits drop to $0.0028/M · vs Claude Opus 4.7 ($15/M input) = 107× cheaper. A $2 budget gets you 2-3 full days of coding-agent runs — a brand-new "price frontier" for AI.
DeepSeek went quiet after R2 underwhelmed in April 2025 and V3 faded mid-2025 — but the V4 Preview launch on April 24, 2026 hit the market with three points MIT Technology Review calls the most important: (1) fully open source, (2) trained on Huawei Ascend chips, not NVIDIA (signaling China can compete without CUDA), and (3) pricing that scares competitors. This is a complete review: specs, benchmarks, pricing, and how to access it.
What Is DeepSeek V4? — Two Variants Explained
DeepSeek shipped two models simultaneously rather than a single flagship — each tuned for different workloads.
| Spec | DeepSeek V4 Pro | DeepSeek V4 Flash |
|---|---|---|
| Total parameters | 1.6T (1.6 trillion) | 284B |
| Active parameters per token | Not disclosed (≈37B) | 13B |
| Architecture | Mixture-of-Experts (MoE) | Mixture-of-Experts (MoE) |
| Context window | 1M tokens | 1M tokens |
| Reasoning modes | 3 modes (fast / balanced / deep) | 2 modes |
| API price input | $0.435/M | $0.14/M |
| API price output | $0.87/M | $0.80/M |
| Cache hit input | $0.0036/M | $0.0028/M |
| License | Open source | Open source |
| Run locally? | Hard (1.6T = 800GB+ VRAM) | Yes (4-bit quantized 70GB) |
| Best for | Production heavy reasoning | Coding agents, high volume |
Bottom line: Pro = max horsepower for hard work (research, complex coding). Flash = fastest and cheapest for high-volume work (chatbots, automation, agent loops). Both share the 1M context window.

The Pricing Story — Why It's 97% Cheaper Than GPT-5.5
The numbers that triggered Reuters, SCMP, and MIT Technology Review to file stories within 24 hours:
| Model | Input ($/1M) | Output ($/1M) | vs DeepSeek V4 Flash |
|---|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.80 | Baseline |
| DeepSeek V4 Pro | $0.435 | $0.87 | 3.1× more |
| Gemini 2.5 Pro | $2.50 | $10 | 18× more |
| GPT-5.5 Standard | $5 | $30 | 36× more |
| Claude Opus 4.7 | $15 | $75 | 107× more |
| GPT-5.5 Pro | $30 | $180 | 214× more |
💰 Real-world cost from X user @quxiaoyin: "Tested coding agents with the DeepSeek V4 API — $2 lasted 2-3 full days." Compare with Claude Opus 4.7 where $2 buys 30-45 minutes of similar usage based on token consumption math.
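The arithmetic behind that claim checks out under a plausible workload. Here is a minimal sanity-check sketch; the tokens-per-hour figures are our assumption for illustration, not a published measurement:

```python
# Sanity-check of the "$2 lasts days on Flash, minutes on Opus" claim.
# The assumed workload (0.15M input + 0.02M output tokens per hour) is
# our illustration, not a published figure.
FLASH = (0.14, 0.80)    # $/M input, $/M output (DeepSeek V4 Flash)
OPUS = (15.00, 75.00)   # $/M input, $/M output (Claude Opus 4.7)

def runtime_hours(budget_usd, prices, in_mtok_hr=0.15, out_mtok_hr=0.02):
    """Hours of agent runtime a budget buys at the given per-hour token usage."""
    in_price, out_price = prices
    cost_per_hour = in_mtok_hr * in_price + out_mtok_hr * out_price
    return budget_usd / cost_per_hour

print(f"Flash: {runtime_hours(2, FLASH):.0f} h")        # ~54 h, i.e. 2-3 days of runs
print(f"Opus:  {runtime_hours(2, OPUS) * 60:.0f} min")  # ~32 min
```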
Architecture Highlight — MoE That Activates 13B of 284B
Mixture-of-Experts (MoE) is the architecture that makes V4 cheap and fast — the model has many expert networks, but only a small subset activates per token.
- Total parameters — full model size in memory (V4 Flash = 284B)
- Active parameters — actually used to compute each token (V4 Flash = just 13B)
- Routing gate — a network that decides which expert each token goes to
- Result: quality of a 284B-class model at 13B-class inference cost — about 22× cheaper inference
Why MoE matters: Claude Opus 4.7 and GPT-5.5 use dense architectures (all parameters active per token). DeepSeek V4 Flash uses sparse MoE (only 4.6% of parameters active). That's the engineering reason V4 is dramatically cheaper despite similar size.
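To make the routing idea concrete, here is a toy top-k MoE layer in PyTorch. The layer sizes, expert count, and gating scheme are illustrative only, not DeepSeek's actual design:

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy sparse MoE: many experts in memory, only top-k compute per token."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)   # routing gate
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                            # x: [tokens, d_model]
        scores = self.gate(x)                        # [tokens, n_experts]
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)            # mix the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # run only selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

x = torch.randn(5, 64)
print(ToyMoELayer()(x).shape)  # torch.Size([5, 64]) — only 2 of 8 experts per token
```

Every expert's weights must sit in memory (the "total parameters"), but each token's forward pass touches only the top-k experts (the "active parameters") — which is exactly where the inference savings come from.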

Three Reasoning Modes — V4 Pro's New Feature
V4 Pro introduces 3 modes to trade off quality vs speed vs cost.
1. Fast mode — short, fast answers for general chatbot use. Low tokens, low latency.
2. Balanced mode — the default. Good for general tasks needing moderate quality.
3. Deep mode — "thinking" mode similar to OpenAI's o1/o3 — burns more thinking tokens for the highest output quality. Designed for math, complex coding, and research.
Important caveat (per scaling01 on X): Deep mode scores slightly higher than V3.2 — but largely because it "thinks longer", not because reasoning is fundamentally better. The non-thinking version benches near V3.2. V4 wins on the cost frontier, not the reasoning frontier.
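DeepSeek has not published how a mode is selected in the request, so the sketch below is hypothetical: both the `reasoning_mode` field and the `deepseek-v4-pro` model ID are assumptions — check the official docs for the real request schema:

```python
# Hypothetical sketch: "reasoning_mode" and the Pro model ID are assumptions,
# not documented request fields — DeepSeek's docs define the real schema.
import os
import requests

resp = requests.post(
    "https://api.deepseek.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
    json={
        "model": "deepseek-v4-pro",        # assumed model ID
        "reasoning_mode": "deep",          # assumed field: fast | balanced | deep
        "messages": [{"role": "user",
                      "content": "Prove that the square root of 2 is irrational."}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```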
Benchmarks vs GPT-5.5 / Claude Opus 4.7
Official benchmark results from DeepSeek's technical paper, head-to-head with the flagship US models.
| Benchmark | DeepSeek V4 Pro | GPT-5.5 | Claude Opus 4.7 | Winner |
|---|---|---|---|---|
| MMLU-Pro (Knowledge) | 82.1% | 85.3% | 84.2% | GPT-5.5 |
| HumanEval+ (Coding) | 86.4% | 85.1% | 83.0% | DeepSeek V4 |
| SWE-Bench Verified | 62.3% | 66.5% | 63.2% | GPT-5.5 |
| FrontierMath | 44.8% | 51.7% | 43.8% | GPT-5.5 |
| GPQA Diamond | 78.9% | 81.3% | 80.1% | GPT-5.5 |
| AIME 2025 (Math) | 91.2% | 89.7% | 87.5% | DeepSeek V4 |
| LongContext (1M) | 92.5% | 91.0% | 93.8% | Opus 4.7 |
| Cost per typical task | $0.02-0.05 | $0.55 | $1.50 | DeepSeek V4 (-95%) |
Benchmark verdict: V4 Pro trails slightly on knowledge / SWE-Bench / FrontierMath (2-7 points behind), but wins on coding (HumanEval+) and AIME math. The decisive number is cost-per-task — 95% cheaper = "good-enough" quality at 1/20 the price.
How to Access DeepSeek V4 — 5 Channels
DeepSeek opens access via five routes — pick by need. (A minimal Python client sketch follows the list.)
```bash
# Option 1: Official API (recommended)
curl https://api.deepseek.com/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Option 2: Ollama (local)
ollama pull deepseek-v4-flash
ollama run deepseek-v4-flash
```

```python
# Option 3: HuggingFace (Python) — download and load the full weights
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V4-Pro",
    trust_remote_code=True,
)
```

1. Official API (api.deepseek.com) — cheapest direct path. $5 of credit gets you a long evaluation runway.
2. Ollama (`ollama pull deepseek-v4-flash`) — run locally for free. V4 Flash 4-bit quantized fits in ~70GB VRAM (NVIDIA RTX A6000 or Mac M3 Ultra 128GB).
3. HuggingFace (`deepseek-ai/DeepSeek-V4-Pro`) — download the full weights for enterprise infrastructure deployment.
4. OpenRouter — universal API gateway with caching; a free tier is sometimes available.
5. NVIDIA NIM endpoint — built-in support on NVIDIA Blackwell GPU instances for enterprise.
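DeepSeek's existing API is OpenAI-compatible, so — assuming V4 keeps that compatibility — the standard `openai` SDK should work against option 1 by just swapping `base_url`:

```python
# Assumes the V4 endpoint stays OpenAI-compatible, as DeepSeek's API has been.
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com/v1",
                api_key=os.environ["DEEPSEEK_API_KEY"])

reply = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain MoE routing in one sentence."}],
)
print(reply.choices[0].message.content)
```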
Why MIT Tech Review Calls It a "Threat" — 3 Big Reasons
MIT Technology Review (April 24, 2026) names V4 one of the year's three most important AI stories.
1. Open source you can actually use — Meta's Llama 4 is open but unwieldy. V4 Flash runs on a Mac M3 Ultra. "Frontier model on your laptop" became real for the first time (Salvatore Sanfilippo, Redis creator/antirez, said as much on X).
2. Trained on Huawei Ascend chips, not NVIDIA — a major signal that China can train frontier models without CUDA. The implications run beyond AI into semiconductor geopolitics.
3. Pricing that "breaks the market" — if V4 covers 80% of Claude Opus 4.7 use cases at 1/107th the cost, Anthropic must cut prices or lose share. @quxiaoyin's "RIP Anthropic" tweet exaggerates, but the pressure is real.
Real Developer Reviews — Reddit, X, YouTube
Real community impressions in the first 4 days post-launch:
- Reddit r/LocalLLaMA (1 day ago, 50+ comments): "Deepseek V4 Pro has honestly blown me away — better than other Chinese models like GLM by a wide margin"
- Salvatore Sanfilippo (antirez) on X: "V4 Flash with local inference, 24h in — even at 2-bit selective quantization GGUF, this is the FIRST time I feel I have a frontier model running on my computer. A bigger landscape change than Pro."
- WorldofAI YouTube (37K+ views, 4 days ago): "DeepSeek is BACK with V4 — possibly the best open-source model ever"
- Matthew Berman (LinkedIn): "DeepSeek V4 = a serious threat to Anthropic"
- xCreate YouTube (1 day ago): "Flash vs Pro tested — Flash much faster, quality is 80-90% the same on coding tasks"
- scaling01 on X (6 hours ago): "LisanBench results — Pro and Flash score slightly above V3.2, but mostly because they think longer. Reasoning isn't fundamentally improved. V4 is on the cost frontier, not the reasoning frontier."
Limitations + Caveats Before You Adopt
V4 isn't a silver bullet — 5 things developers should know before jumping in.
- Knowledge benchmarks trail GPT-5.5/Opus 4.7 — for deep-knowledge work or research-grade citation tasks, Claude/GPT-5.5 still lead by 2-7 points
- Reasoning isn't fundamentally better than V3.2 — Deep mode scores higher because it thinks longer. If you pay for thinking tokens, the value vs going straight to GPT-5.5 may be marginal
- Production reliability isn't proven yet — Preview release means bugs and edge cases. Hold off on production-critical workloads until the stable release
- Trained on Chinese chips — some enterprises and regulators have concerns about data sovereignty / supply chain — verify compliance before enterprise adoption
- API rate limits are unstable — DeepSeek's official API throttles at peak times — back up with OpenRouter or self-hosted Ollama (see the failover sketch below)
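A minimal failover sketch for that last caveat, assuming both endpoints are OpenAI-compatible (true of OpenRouter and of DeepSeek's API to date); the OpenRouter model slug is a hypothetical placeholder:

```python
# Minimal failover for peak-time throttling. Both endpoints are assumed
# OpenAI-compatible; the OpenRouter model slug is a hypothetical placeholder.
import os
from openai import OpenAI, RateLimitError

PRIMARY = (OpenAI(base_url="https://api.deepseek.com/v1",
                  api_key=os.environ["DEEPSEEK_API_KEY"]),
           "deepseek-v4-flash")
FALLBACK = (OpenAI(base_url="https://openrouter.ai/api/v1",
                   api_key=os.environ["OPENROUTER_API_KEY"]),
            "deepseek/deepseek-v4-flash")

def chat(messages):
    """Try the official API first; fall back to OpenRouter when throttled."""
    for client, model in (PRIMARY, FALLBACK):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            continue  # peak-time throttling: try the next provider
    raise RuntimeError("all providers throttled")
```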
Comparison vs Other Models — Which to Pick When
Answering the most common question: "Should I switch to V4?" (A minimal routing sketch follows the list.)
1. Cost-sensitive coding agent (high volume) → DeepSeek V4 Flash — cheapest, quality is sufficient
2. English-language customer chatbot → DeepSeek V4 Pro or Gemini 2.5 Pro — depends on whether you need Search Grounding
3. Production-critical agent → still GPT-5.5 or Claude Opus 4.7 — V4 is a preview release, not production-ready yet
4. Local/private deployment → DeepSeek V4 Flash via Ollama — frontier capability on a laptop
5. Research / math-intensive work → GPT-5.5 still leads on FrontierMath, but V4 Pro edges it on AIME — depends on the specific use case
6. Long context (>1M tokens) → Gemini 2.5 Pro (2M tokens) — V4 matches GPT-5.5/Claude at 1M but at much lower cost
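To make the hybrid idea concrete, here is a minimal routing sketch. The task labels and non-DeepSeek model IDs are illustrative assumptions, not vendor strings:

```python
# Minimal routing sketch for the decision list above. Task labels and
# non-DeepSeek model IDs are illustrative assumptions, not vendor strings.
ROUTES = {
    "high_volume_coding":  "deepseek-v4-flash",  # cheapest, quality sufficient
    "production_critical": "claude-opus-4.7",    # stability over cost (assumed ID)
    "math_research":       "gpt-5.5",            # leads on FrontierMath (assumed ID)
    "long_context_2m":     "gemini-2.5-pro",     # 2M-token window (assumed ID)
}

def pick_model(task_type: str) -> str:
    """Route each task to the cheapest model that meets its quality bar."""
    return ROUTES.get(task_type, "deepseek-v4-flash")  # default to the cheapest

print(pick_model("production_critical"))  # -> claude-opus-4.7
```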
CherCode — Using DeepSeek V4 in Thai Projects
At CherCode we've started piloting DeepSeek V4 Flash in client AI Chatbots on LINE OA and AI Automation Workflows — it's the best fit for internal tools + high-volume chatbots that are cost-sensitive. For customer-facing critical agents we still route through Claude or GPT-5.5 via an AI Router. Read more: GPT-5.5 vs Claude Opus 4.7 and GPT-5.5 vs Gemini 2.5 Pro. Free consultation.
Frequently Asked Questions
When was DeepSeek V4 released?
DeepSeek V4 Preview launched officially on April 24, 2026 via api-docs.deepseek.com and HuggingFace (deepseek-ai/DeepSeek-V4-Pro). It is a preview release, not a stable one; the stable release is expected within 4-8 weeks. Two variants shipped simultaneously: V4 Pro (1.6T) and V4 Flash (284B / 13B active), both with a 1M-token context window.
How does DeepSeek V4 differ from V3?
V4 improves on V3 in four key areas: (1) two variants (V3 had a single model; V4 has Pro + Flash), (2) a 1M-token context window (V3 = 128K), 8× larger, (3) three reasoning modes (V3 had none), (4) lower prices (V3 $0.27/M input → V4 Flash $0.14/M, about half). But reasoning quality improved only slightly (per scaling01): scores rose because the model thinks longer, not because it is genuinely smarter.
DeepSeek V4 Pro vs Flash — which should you pick?
Pro for: production heavy reasoning, complex coding agents, and research tasks that need maximum quality — $0.435/M input · Flash for: high-volume chatbots, customer support, simple coding agents, and automation workflows — $0.14/M input (3.1× cheaper than Pro). Real talk: xCreate's tests put Flash at 80-90% of Pro's quality on most coding tasks — start with Flash and upgrade to Pro only for the tasks that need it.
How much does the DeepSeek V4 API cost, and is it really cheaper than competitors?
Flash: $0.14/M input + $0.80/M output · Pro: $0.435/M + $0.87/M · Flash cache hits: $0.0028/M (50× cheaper than no-cache). Comparison: GPT-5.5 = $5/$30 (Flash is 36× cheaper), Claude Opus 4.7 = $15/$75 (107× cheaper), Gemini 2.5 Pro = $2.50/$10 (18× cheaper) — yes, it really is cheaper, and currently the cheapest in the frontier-model market.
Can DeepSeek V4 run on a local laptop?
V4 Flash can, via Ollama (4-bit quantized GGUF) using ~70GB of VRAM — you need an NVIDIA RTX A6000, a Mac M3 Ultra with 128GB, or a Mac Studio M3 Ultra. Salvatore Sanfilippo (antirez) said on X that "even 2-bit quantized it is still a real frontier model." V4 Pro is hard to run locally: 1.6T parameters demand heavy infrastructure (800GB+ VRAM) — use the API or a dedicated Ollama server instead.
Is DeepSeek V4 safe? Was it really trained on Chinese chips?
Yes — it was trained on Huawei Ascend chips (not NVIDIA), a major geopolitical signal that China can now train frontier models without relying on CUDA. On safety: the weights are open source and auditable. On data sovereignty: using the official API sends data to servers in China; if that is a concern, self-host via Ollama or spin up your own V4 instance on AWS / Azure. Enterprises with compliance requirements should check with their regulator before using the official API.
How good is DeepSeek V4's Thai-language support?
Good — V4 was trained on a larger multilingual dataset than V3, with Thai included in the training corpus. In real tests it understands spoken/written Thai at the same level as Claude Sonnet 4.6 (better than GPT-5.5 in some cases), but natural Thai content generation still trails Claude Opus 4.7 slightly — it fits chatbots/automation better than marketing copy.
Should you switch from GPT-5.5 or Claude to DeepSeek V4?
It depends on the use case. Switch to V4 Flash if: (1) you run high-volume API workloads where cost is the priority, (2) you run long, token-heavy coding agents, (3) you build non-critical personal/internal tools. Don't switch if: (1) you run production customer-facing work that needs stability (V4 is still a preview), (2) you need deep knowledge or research-grade citations, (3) you need Search Grounding — Gemini 2.5 Pro is better there. Hybrid strategy: use an AI Router to send each task to the right model — the best ROI.
Arm - CherCode
Full-Stack Developer & Founder
Software developer with 5+ years of experience in Web Development, AI Integration, and Automation. Specializing in Next.js, React, n8n, and LLM Integration. Founder of CherCode, building systems for Thai businesses.