Saturday, May 23. Edge of AI, breaking one. DeepSeek is making its flagship discount permanent, a new agentic coding benchmark just got a massive update, and the open-source world quietly proved that small beats big on CPU function calling. Let's get into it.
DeepSeek, the Chinese open-source upstart, is making permanent a 75 percent discount on its flagship V4-Pro model. Developers who've been paying the promotional rate will keep paying a quarter of the original price. Bloomberg reports the move is meant to lock in developer loyalty before rivals catch up on price-performance [8]. The signal: the AI pricing war just got a lot more real. DeepSeek is treating the discounted API price as a permanent anchor, not a launch promotion. And if a frontier-capable model stays at a quarter of its original cost, every other lab has to decide whether to match or explain why they won't.
Different beat. A new agentic coding benchmark called Apex-Testing just refreshed 95 percent of its results for the latest models [13, unverified]. It's built on 65 to 70 private GitHub repos designed to test proper agentic coding, not the curated demos you see on X. The site is explicit: "Benchmarks get cherry-picked, their demos get curated, influencers get paid." What this changes: the gap between benchmark scores and real-world agent performance just got a lot harder to hide. Apex-Testing doesn't have a leaderboard you can game. It has repos the models haven't seen. That's the kind of test that actually tells you which agent can refactor a codebase without introducing a deadlock.
Now, on the open-source front. A Reddit user ran a head-to-head between Needle, a 26-million-parameter specialist distilled from Gemini 3.1 for function calls, and Qwen3's 0.6-billion-parameter generalist [2, unverified]. On a 4-core CPU with no GPU. Fifty queries across five difficulty tiers. The small specialist beat the small generalist on accuracy and was four times faster. The read: the era of the tiny specialist model is quietly accelerating. You don't need a 7-billion-parameter model to call a tool. A 26-million-parameter model that's been distilled specifically for that task can outperform a model 23 times its size. That's the kind of efficiency gain that changes how you deploy agents on edge devices.
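For anyone checking the "23 times its size" figure, here is a quick back-of-the-envelope sketch. It uses only the parameter counts quoted above; the variable names are ours, and the 7-billion comparison is just the round number mentioned in the segment, not a specific model.

```python
# Parameter counts as quoted in the segment (unverified source).
needle_params = 26_000_000        # Needle, the function-calling specialist
qwen_params = 600_000_000         # Qwen3 generalist, 0.6 billion
seven_b_params = 7_000_000_000    # the round "7-billion" figure, for scale

# How many times larger each model is than the 26M specialist.
qwen_ratio = qwen_params / needle_params
seven_b_ratio = seven_b_params / needle_params

print(f"Qwen3 0.6B is about {qwen_ratio:.0f}x the size of Needle")
print(f"A 7B model would be about {seven_b_ratio:.0f}x the size of Needle")
```

The ratio says nothing about accuracy or speed on its own; those come from the poster's fifty-query test.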
And one to watch. A separate Reddit thread on coding being "basically solved for the boring 90 percent of tasks" describes a 120-file refactor, 400 steps, two million tokens, three dollars total, zero human input [6, unverified]. The cheap workers were DeepSeek V4 and Tencent's Hunyuan Hy3 preview at 21 billion active parameters. The author notes the open-weight tier responded faster than Opus. The pattern is consistent: the cost of automated refactoring is collapsing, and the open-weight models are driving that collapse faster than the frontier labs are willing to admit.
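To put those numbers in unit terms, here is a rough sketch that takes the thread's figures at face value. The blended per-token rate is an assumption on our part: the post does not split input from output tokens or say how the spend divided between the two models.

```python
# Figures as described in the Reddit thread (unverified).
total_cost_usd = 3.00
total_tokens = 2_000_000
steps = 400
files = 120

# Blended rate: lumps input and output tokens together (an assumption).
cost_per_million_tokens = total_cost_usd / (total_tokens / 1_000_000)
tokens_per_step = total_tokens / steps
cost_per_step = total_cost_usd / steps
cost_per_file = total_cost_usd / files

print(f"Blended cost per million tokens: ${cost_per_million_tokens:.2f}")
print(f"Average tokens per step: {tokens_per_step:.0f}")
print(f"Average cost per step: ${cost_per_step:.4f}")
print(f"Average cost per file: ${cost_per_file:.3f}")
```

These are averages only; individual steps and files in a real run would vary widely.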
Three stories, one through-line: the pricing floor is dropping, the benchmarks are getting honest, and the small models are quietly winning.
That is the edge for today.