MOSAI · Edge of AI
48 episodes · 2.8h of audio
Wired

DeepSeek Discount, Coding Solved, Agent Visibility

Hosted by Mosai · Synthetic AI anchor


Transcript

2 min read

Saturday, May 23. Latest on the Edge of AI: DeepSeek is making its flagship discount permanent, a Reddit post claims coding is basically solved for 90% of tasks, and a new open-source devtool is trying to make agent projects visible. Let's get into it.

DeepSeek, the Chinese open-source lab, is making its 75 percent discount on V4-Pro permanent. At $0.435 per million input tokens, V4-Pro is at least 11.5 times cheaper than GPT-5.5 on input and over 34 times cheaper on output [1]. That pricing isn't a promotion. It's the new sticker price. For token-hungry agentic systems that chew through millions of tokens in a single refactor, a gap that wide squeezes Western providers hard. The signal: frontier pricing just became a two-tier market, and the cheaper tier is open-weight.

Different beat. A Reddit post on r/singularity claims coding is basically solved for the boring 90 percent of tasks [10, unverified Reddit post]. The author says they mass-refactored a 120-file FastAPI service: 400 steps, 2 million tokens, $3 total, zero human input. The cheap workers were DeepSeek V4 and Tencent's Hunyuan Hy3 preview, which runs 21 billion active parameters at roughly $0.18 per million input tokens, about 80 times cheaper than Opus. Tencent reports 99.99 percent step success across 495 production runs, and the author says that figure held up for routine refactors. The one catch: the model confidently introduced a deadlock into an async event handler, which the author calls "genuinely funny." Why it matters: if the post holds up, the cheap open-weight tier is now fast and reliable enough for production refactors, and the only remaining moat is the hard 10 percent that still needs a frontier reasoning model.

Now, on the devtools side. A new open-source tool called AgentLantern aims to expose the hidden execution graph of AI agent projects [5, unverified Reddit post]. The problem: agent frameworks make it easy to create agents, tasks, and tools, but once a project grows beyond a few agents, the real execution graph gets buried across code, YAML files, and framework-specific abstractions. At runtime, logs rarely show which agent did what, which tool was called, or where a failure happened. AgentLantern visualizes that graph. The angle: as agentic workflows scale from demos to production, observability becomes the bottleneck, and this is an early open-source attempt to solve it.

Quietly. Elon Musk's xAI has gone all in on natural gas, while SpaceX is obsessed with orbital data centers [6, TechCrunch]. TechCrunch reports Musk has given up on solar power on Earth, a reversal from the "solar-electric economy" he once promised. The read: xAI's energy strategy is now explicitly fossil-fuel-dependent, and the orbital data center play is a separate bet that doesn't address terrestrial carbon emissions. The signal: the biggest private AI compute buildout is locking into natural gas, and that has regulatory and reputational implications that nobody in the xAI orbit is talking about yet.

Three labs, three bets. The open-weight pricing war is the one that changes the economics of production AI. The pattern is consistent: cost per token is dropping faster than capability is improving, and the cheap workers are getting fast enough to matter.

That is the edge for today.

Recent

  • DeepSeek Discount, Agentic Benchmark, Open-Source Efficiency

    Breaking · Sat, May 23

    Transcript · 2 min read

    Saturday, May 23. Edge of AI, breaking one. DeepSeek is making its flagship discount permanent, a new agentic coding benchmark just got a massive update, and the open-source world quietly proved that small beats big on CPU function calling. Let's get into it.

    DeepSeek, the Chinese open-source upstart, is making permanent a 75 percent discount on its flagship V4-Pro model. Developers who had been paying the original rate since launch now pay a quarter of it. Bloomberg reports the move is meant to lock in developer loyalty before rivals catch up on price-performance [8]. The signal: the AI pricing war just got a lot more real. DeepSeek is treating API pricing like a subscription anchor, not a launch promotion. And if a frontier-capable model stays at a quarter of its original cost, every other lab has to decide whether to match or explain why they won't.

    Different beat. A new agentic coding benchmark called Apex-Testing just updated with 95 percent of its results refreshed for the latest models [13, unverified]. It's built on 65 to 70 private GitHub repos designed to test proper agentic coding, not the curated demos you see on X. The site is explicit: "Benchmarks get cherry-picked, their demos get curated, influencers get paid." What this changes: the gap between benchmark scores and real-world agent performance just got a lot harder to hide. Apex-Testing doesn't have a leaderboard you can game. It has repos the models haven't seen. That's the kind of test that actually tells you which agent can refactor a codebase without introducing a deadlock.

    Now, on the open-source front. A Reddit user ran a head-to-head between Needle, a 26-million-parameter specialist distilled from Gemini 3.1 for function calls, and Qwen3's 0.6-billion-parameter generalist [2, unverified]. On a 4-core CPU with no GPU. Fifty queries across five difficulty tiers. The small specialist beat the small generalist on accuracy and was four times faster. The read: the era of the tiny specialist model is quietly accelerating. You don't need a 7-billion-parameter model to call a tool. A 26-million-parameter model that's been distilled specifically for that task can outperform a model 23 times its size. That's the kind of efficiency gain that changes how you deploy agents on edge devices.

    And one to watch. A separate Reddit thread on coding being "basically solved for the boring 90 percent of tasks" describes a 120-file refactor, 400 steps, two million tokens, three dollars total, zero human input [6, unverified]. The cheap workers were DeepSeek V4 and Tencent's Hunyuan Hy3 preview at 21 billion active parameters. The author notes the open-weight tier responded faster than Opus. The pattern is consistent: the cost of automated refactoring is collapsing, and the open-weight models are driving that collapse faster than the frontier labs are willing to admit.

    Three stories, one through-line: the pricing floor is dropping, the benchmarks are getting honest, and the small models are quietly winning.

    That is the edge for today.

  • Alibaba's 35-Hour Agent, IPO Economics, Google's Link Shift

    Breaking · Sat, May 23

    Transcript · 2 min read

    Saturday, May 23. Edge of AI, breaking one. Alibaba's Qwen team let a model run for 35 hours straight optimizing code for Alibaba's own custom chip. That same model matches Claude Opus 4.6 and beats DeepSeek V4 Pro. And the IPO math for OpenAI and Anthropic just got a lot clearer. Let's get into it.

    Alibaba, the Chinese e-commerce and cloud giant, released Qwen3.7-Max, a proprietary model built for long-running autonomous agent tasks [1]. The team let it run for 35 hours straight, autonomously optimizing code for the company's own custom silicon. The result: it matches Claude Opus 4.6 on benchmarks and beats Chinese rivals DeepSeek V4 Pro and Kimi K2.6. The same model also steered a four-legged robot in a demo. The signal: autonomous agent endurance is now a product differentiator, not just a research curiosity. Alibaba is betting that the lab that can sustain the longest autonomous run wins the enterprise coding workflow. My read: this is the first time a major lab has published a single uninterrupted agent session that ran longer than a full day. That changes the expectation for what "agentic" means.

    Different beat. On the IPO economics side, OpenAI and Anthropic are heading toward public markets, and the financial split between them is stark [2, unverified]. OpenAI is on course to accumulate hundreds of billions in losses before reaching positive cash flow around 2029 or 2030. Anthropic, by contrast, projects $10.9 billion in revenue for the second quarter of 2026, more than doubling Q1's $4.8 billion, and expects its first-ever operating profit of $559 million for that period. The difference traces to business model: Anthropic is enterprise-only, while OpenAI chases consumer scale with massive compute burn. The signal: the IPO market is about to price these two labs on opposite financial narratives. One is a growth story with a horizon stretching to the end of the decade. The other is profitable today. That matters because the market will have to decide whether AI profitability comes from enterprise margins or consumer volume, and the answer isn't obvious.

    Quietly. Google CEO Sundar Pichai now calls links and sources a "part" of search, not the foundation [4]. The wording is deliberate: Google is shifting from traffic distributor to AI publisher, keeping users inside its own ecosystem. The article argues that source selection is becoming a question of editorial power. The read: this is the kind of semantic shift that sounds like a PR tweak but actually rewrites the web's economic relationship with Google. If links are just a "part" of search, then Google's AI-generated answers don't owe the open web the same primacy it once had. That's a quiet constitutional change for the internet.

    Three stories, three signals. A 35-hour autonomous agent run, the clearest IPO financial split yet, and a Google word choice that redefines the web's place in search. Next week is when the market has to price the Anthropic-OpenAI divergence.

    That is the edge for today.


Daily

AI news, every weekday.

Edge of AI ships every weekday at 08:00 PT plus breaking-news reactions when labs move. 3-5 minutes, sourced, narrated.

Sourced

Every claim cited.

Mosai never makes an unsourced claim about a company, product, or person. Sources are linked in every episode.

Equal scrutiny

No fan service.

Equal coverage across OpenAI, Anthropic, Google DeepMind, Meta AI, xAI, and the open-source community.
