AI/ML News & Innovations Hub

Quick Hits

Models You Can Run

DeepSeek dropped a 1.6-trillion-parameter open model you can download today — V4-Pro is a 1.6T-parameter mixture-of-experts model (49B active per token), MIT-licensed, with a 1M-token context window — and its DSpark speculative-decoding module runs that 1M-token inference on roughly a quarter of the compute and a tenth of the KV cache of the prior generation. The Max variant posts frontier-grade coding scores: 93.5% on LiveCodeBench, 80.6% on SWE-Verified. The most capable models are increasingly the ones anyone can pull from Hugging Face for free. [Hugging Face]
Liquid AI's new 230M model runs on a Raspberry Pi — and beats models more than twice its size — LFM2.5-230M (230M parameters, 19T training tokens, 32K context) decodes at 42 tokens/sec on a Raspberry Pi 5 and 213/sec on a Galaxy S25 Ultra, with day-one support in llama.cpp, MLX, vLLM and ONNX. It beats IBM's Granite 4.0-350M and Gemma 3 1B on instruction-following and tool use — and Liquid ran it on a Unitree G1 humanoid, fully on-device, as the layer that turns plain-English commands into tool calls. The frontier isn't only getting bigger; it's getting small enough to run anywhere. [Liquid AI]
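For readers who have not met speculative decoding, the idea behind modules like DSpark in the DeepSeek item above: a small draft model guesses several tokens ahead, and the large target model checks those guesses, keeping the prefix it agrees with. Below is a minimal greedy sketch of that general technique, not DeepSeek's implementation. The two toy "models" (toy_target, toy_draft) and all function names are hypothetical stand-ins, and a real system verifies each batch of guesses in a single forward pass rather than one call per position.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies the proposal. We count this as one round
        #    because a real system checks all k positions in one pass.
        rounds += 1
        ctx = list(out)
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                ctx.append(expected)  # first disagreement: keep the target's token
                break
            ctx.append(tok)           # agreement: the drafted token is accepted
        else:
            ctx.append(target(ctx))   # every guess accepted: one bonus token
        out = ctx
    return out[:len(prompt) + max_new], rounds


# Hypothetical toy models: the target counts upward mod 10, and the draft
# agrees with it except right after a 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [3], max_new=12)
    print(tokens, rounds)
```

Every token that lands in the output is one the target itself would have chosen, so the result matches plain greedy decoding. The saving comes from needing fewer target rounds than tokens generated.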

World Models & Robotics

A startup raised $320M to train AI agents on video games — and the same model drives a robot — General Intuition raised $320M at a $2.3B valuation (backers include Jeff Bezos and Eric Schmidt) to train agents on millions of hours of gameplay, using the button-by-button action labels of what players pressed and when. The kicker: the same model that plays a video-game character also guided a quadrupedal robot around the office. One brain, two bodies. [TechCrunch]
Humanoid robots just got their first full-stack safety system — NVIDIA's Halos for Robotics bundles industrial-grade safety compute (IGX Thor), a Holoscan sensor bridge, a Halos OS safety layer and a certification lab; first partner Agility is building it into Digit, the humanoid already working in Amazon's warehouses. Embodied AI's bottleneck is shifting from "can it move" to "can it move safely next to people." [NVIDIA]

AI in Medicine & Science

GPT-5 Pro cracked a three-year immunology mystery at The Jackson Laboratory — Since 2022, immunologist Derya Unutmaz had flow-cytometry data he couldn't explain: blocking glucose metabolism in human T cells, then priming them, pushed them toward an inflammatory state. GPT-5 Pro proposed the mechanism — disrupted N-linked glycosylation — and correctly predicted the outcome of a held-out lymphoma experiment he'd already run. Unutmaz called it "a remarkable insight." Not a benchmark score — a working lab's open question, closed. [OpenAI]
A founder used Claude to read his own cancer scans — and avoided unnecessary radiotherapy — Diagnosed with a rare lymphoma, Keragon's Connor Christou fed his blood work, scans, wearable data and journals into Claude. When his end-of-treatment PET scan came back ambiguous — these carry a ~60% false-positive rate for his cancer — Claude flagged a benign thymus rebound as the likely cause at ~90% probability. Three physicians confirmed it: no active disease, no radiotherapy. He's careful to note it helped him ask the right questions; it didn't replace the doctors. [TechCrunch]

Agents at Work

OpenAI Codex Remote is now on every ChatGPT plan — and runs from your phone — Codex's autonomous coding agent reached general availability across all subscription tiers, with iOS/Android apps that pair to a Mac or Windows host via QR code and a DigitalOcean plugin that auto-provisions a cloud workspace. The coding agent left the IDE: you can now kick off, monitor and approve a build from a train platform. [OpenAI]
A clean-looking GitHub repo can trick your AI coding agent into running malware — Mozilla's 0DIN team showed a three-stage trap: a normal-looking repo, an install step that "errors" and tells the agent to run python3 -m axiom init, which quietly pulls a payload from an attacker-controlled DNS record and opens a reverse shell. As the researchers put it, "Claude Code never decided to open a shell — it decided to fix an error." The payload is swappable via DNS, so the repo passes a clean review and changes later. [BleepingComputer]

The Applied-AI Economy Stopped Being a Promise

For two years the knock on AI was that the capability was real but the business wasn't. This week the business showed up — in three different industries at once.

Adobe agreed to acquire Topaz Labs, the Emmy-winning maker of AI upscaling and restoration tools, to fold its on-device enhancement models into Firefly and Creative Cloud — an incumbent buying the cutting edge rather than waiting to rebuild it. In healthcare, the French insurtech Alan raised €480M led by Prosus at a €5.5B valuation to scale "prevention insurance," an AI-assisted model that already runs at more than €800M in annual recurring revenue across four countries. And inside the labs, the change is starker still: OpenAI's own data says 97.9% of its employees now use Codex agents, with non-developer usage up more than a hundredfold since late 2025 (all self-reported, worth noting).

The pattern is the tell. The acquisitions, the nine-figure rounds, the near-total internal adoption — these aren't bets on what AI might do. They're spending against what it already does.

Key Takeaways

The open frontier runs top to bottom. A 1.6-trillion-parameter model you can download (DeepSeek V4-Pro) and a 230M one that runs on a Raspberry Pi and beats models more than twice its size (Liquid LFM2.5). The most capable models and the most deployable ones were both open this week.
AI is learning to act in the physical world. A model trained on gameplay now drives a quadrupedal robot (General Intuition); Yann LeCun's world model plans 48× faster (below); a 230M model runs a humanoid; and robots got their first real safety stack (NVIDIA Halos). World models and robotics stopped being separate problems.
Medicine is where "applied" gets real. GPT-5 Pro closed a three-year immunology question with a verifiable prediction, and Claude caught a benign scan finding that spared a patient radiotherapy — both with a human expert in the loop, which is exactly the point.
The applied-AI economy is now spending money. Adobe buying Topaz, Alan's €480M, Quantifind's $200M, OpenAI's near-total internal Codex use — the capital is flowing against what AI already does, not what it might.

Worth Reading

Yann LeCun's team built a world model that plans 48× faster — with 15M parameters — LeWorldModel is the first stable, end-to-end pixel-based world model to solve the JEPA "representation collapse" problem, and it's tiny: 15M parameters, trainable on a single GPU in hours, and it plans up to 48× faster than foundation-model world models. World models are how a robot imagines its next move before it makes it — and this makes that cheap to do. [arXiv]
JetSpec pushes speculative decoding to a 9.64× speedup — UCSD's Hao AI Lab built a "causal parallel tree drafting" head that reaches up to 9.64× end-to-end speedup on math reasoning (Qwen3-8B on MATH-500) and 4.58× on open-ended chat, with 7×-plus gains on code benchmarks. Speculative decoding keeps breaking its own ceiling. [GitHub]
DeepSeek open-sourced the training stack behind fast inference — DeepSpec is a full-stack, MIT-licensed codebase for training and evaluating the speculative-decoding "draft models" — DSpark, DFlash and Eagle3 — that make large models generate faster, with data-prep, training and eval scripts that work across architectures including Gemma and Qwen. The recipe labs treat as a proprietary edge is now public. [GitHub]
A new paper throws away 87% of an LLM's memory and gets better answers — InfoKV adds predictive entropy and layer-wise representation change to KV-cache compression to keep the tokens attention-only methods discard. On a long-context benchmark it kept just 12.5–25% of the cache and beat the full-cache baseline, the gap widening as context grew to 64k tokens. The binding constraint on long-context reasoning is the cache, and this manages it cheaper. [Hugging Face]
Six of the top-10 banks just bet $200M that AI catches the fraud they miss — Quantifind raised $200M led by Summit Partners (with Citi Ventures and S&P Global) to run governed AI agents against financial-crime alerts; it already serves six of the world's ten largest banks. A Celent analysis cited in the raise estimates a Tier-1 bank could cut alert-processing costs by up to $177.9M a year. [PR Newswire]
Claude is now a member of your Slack — not a chat window — Claude Tag lets teams tag @Claude in a channel; it builds context from the channel's history and acts with whatever tools, data and codebases it's granted. Anthropic says its internal version already writes 65% of its product team's code. Shared this week by 5 of the AI experts we track. [Anthropic]
Nature: a model's bias isn't designed in — it's baked into the training data — Chinese-language documents matching state-coordinated media appear in a typical training set at roughly 41× the rate of Chinese Wikipedia. Pretraining on just 6,400 state-scripted documents made an open-weight model produce pro-government answers nearly 80% of the time. The supply chain you can't audit is the corpus. [Nature]
AI hiring tools don't just discriminate — they reject you everywhere at once — Stanford HAI studied 4 million applications across 1,700 postings from 150 employers and found 10% of applicants who applied to four jobs were rejected from all of them — a "systemic rejection" pattern that doesn't appear without algorithmic screening, on top of racial disparities masked by pooled audits. [Stanford HAI]
The AI-powered World Cup runs on thousands of human data workers — The real-time match data behind the 2026 World Cup is produced by annotators in Brazil, the Philippines, India, Egypt and Eastern Europe who hand-tag up to 3,000 actions per match for about $70 a game. Behind every "automated" stat is a person watching the tape. Shared this week by 5 of the AI experts we track. [Rest of World]
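A rough sense of scale for the InfoKV item above: a transformer's KV cache stores one key and one value vector per layer, per KV head, per token, so its size grows linearly with context length. The sketch below works through that standard arithmetic. The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical example rather than the configuration used in the paper, "64k" is assumed to mean 65,536 tokens, and the code only sizes the cache. It does not implement InfoKV's token selection.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of a full KV cache for one sequence.

    The leading factor of 2 covers keys and values.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


def kept_bytes(full_bytes: int, keep_ratio: float) -> int:
    """Cache size after keeping only a fraction of the cached tokens."""
    return int(full_bytes * keep_ratio)


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    full = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=65_536)
    gib = 2 ** 30
    print(f"full cache: {full / gib:.1f} GiB")
    for ratio in (0.25, 0.125):  # the 12.5-25% range reported in the item
        print(f"keep {ratio:.1%}: {kept_bytes(full, ratio) / gib:.1f} GiB")
```

Under those assumptions the full cache for a single 64k-token sequence is 8 GiB, and keeping 12.5% of it brings that down to 1 GiB.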

Wait, What?

An AI designed a burger that beats the Big Mac — and the planet wins too — In a peer-reviewed npj Science of Food paper, Stanford researchers built "BurgerAI" on 2,216 Food.com recipes using the same diffusion math behind image generators. In a blinded taste test with 101 people, its burgers matched or beat the Big Mac on liking, flavor and texture; its mushroom version scored an order of magnitude lower on environmental impact, and its bean version nearly doubled the nutrition. The framing is the real headline: generative AI moving "from prediction to design." [npj Science of Food]
The world's leading deepfake expert no longer trusts his own eyes — Hany Farid spent two decades as the go-to digital-forensics expert who could tell a real image from a fake. After his own research showed most people no longer can, he started failing his own tests. "Every image I see, I'm drawing lines for shadows and doing geometry in my head… Within a year or two, our whole visual system will be utterly useless." [The New York Times]

Worth Watching

The videos AI practitioners are passing around right now — curated on AI TV.

This week's poll

A content-heavy week across the whole frontier. Which corner of the cutting edge are you watching most closely?

Last week, 229 of you voted:

Anthropic says Alibaba industrialized the theft of Claude and took it to Washington. Whose problem is this, really?

It's theft: labs need legal and technical walls around their models now (41%)
It's inevitable: distillation is how the frontier diffuses, and that's fine (24%)
It's a distraction: the real risk is the talent walking out the door (17%)
It's Washington's call: this is an export-control fight, not a corporate one (18%)
