AI/ML News & Innovations Hub

Hey folks

Google I/O starts today, and Logan tweeted: “The model is the product”. Rumour has it that the latest Gemini model scores similarly to GPT-5.5 on benchmarks, but we’ll see how it feels in actual use. Previous models also scored well but didn’t feel great to work with.

Once models are good enough, harnesses will matter much less. I just don’t think today is the day that happens. And on that point, the role of a harness will probably just shift: instead of managing which tools to use and how, the system prompt, context management etc., it could cover managed agents, sandboxing, and cloud/local management.

I started using Codex on my phone… but not all that much, to be honest. A lot of agent harnesses these days have ways to control your sessions from your phone: Claude Code has /remote-control, Pi can build one for itself (I use a Telegram one) and Droid has mobile web + Droid computers.

Most of my mobile-first work at the moment is more brainstorming than building, and I find myself flitting between all these options all the time.

I used to use my OpenClaw bot like an addict, but haven’t spoken to the poor bastard for weeks now.

It may help that I’m currently focused on just one (ish) main thing - this ‘course’. Which is really more of a library or reference manual on how I think about agents, how I steer them and build with them.

Ben’s Bites is brought to you by Hyperagent from Airtable

Hyperagent, the cloud agent system with full computing environments, is giving $10M in inference credits to help founders build and run agent-first companies. The first 500 qualifying applicants gain access to this limited founder offer. Applications close May 31st.

Codex now connects your Mac to your phone. You can start tasks in Codex from your phone, but the actual work still runs on your Mac, devbox or remote machine: files, setup and credentials stay where they are, while you approve commands, answer questions, and review diffs from your phone. This update also brings Hooks to Codex.
Anthropic is acquiring Stainless, a platform to build SDKs (also used by OpenAI), and they are shutting the service down. Also, at their London conference, they added self-hosted sandboxes and MCP tunnels to Claude Managed Agents - their “running agents made easy” product for companies.
Cloudflare tested Anthropic’s Mythos against 50 of its repos. Quick takeaways:
- Mythos is great at spotting real attacks, which are often many small vulnerabilities connected in a chain.
- A single model, however smart, without a good harness leaves a lot to be found.
- “Find bugs fast and patch them faster” is not a good idea. Teams need to focus on making bugs, even where they exist, harder to chain and to exploit.
Cursor’s Composer 2.5 (partly trained on SpaceX’s GPUs) is out. The selected benchmarks that Cursor reports put the model roughly level with Opus 4.7-xhigh and GPT-5.5-high, while being much cheaper than both.

Two AI startups worth watching: Magicpath (design canvas) and Raindrop AI (monitoring agents in production), both of which are making their products usable by external coding agents like Claude Code or Codex.
Even Grok/xAI has a coding CLI now. Let’s see what Google does with Gemini CLI at I/O today.
Linear Agent can now read the codebase directly to build a hypothesis, investigate support questions, find people who worked on a feature, and more.
Best practices for running Claude Code at scale.
Citadel’s founder, Ken Griffin, one of the AI-hype skeptics, is now saying that they are seeing high-skilled jobs being “automated” by AI.
Browse.sh from Browserbase - open-source catalogue of skills/playbooks for agents to perform tasks on the internet.
Watchmen - skill files your coding agents should already have from your past sessions. Local and open-source.
Devin Auto-Triage monitors bugs, alerts and incidents, investigates them and comes back with context, next steps or a PR.
Motus Tracing - open-source observability for AI agents.
designmd.sh - a public registry for DESIGN.md files, so agents can understand design systems from repos.
Jason Liu on Codex maxxing - daily primitives for durable threads, shared memory, and keeping Codex useful across a real workflow.
Taste MCP beta - portable design preferences for Codex, Cursor, Claude Code, etc.
Claire Vo and Thariq on “HTML is the new markdown” - using HTML artifacts as specs, micro-UIs, and human-readable agent context.
Brian Lovin’s Notion Worker - syncs the people you follow on X into a Notion DB with optional AI enrichment.
Benedict Evans’ new “AI Is Eating The World” deck.
Coatue says its AI framework moved from “follow the GPU” to “follow the gigawatt”.

okay this is going kinda viral and tbh my original text was kind of messy, so here's a second pass with the help of Claude: -- Implement <SPEC>. As you work maintain a running implementation-notes.html file that captures anything I should know about how the implementation

4:54 PM · May 18, 2026 · 60.3K Views

Introducing Zero. The programming language for agents. I wanted a systems language that was faster, smaller, and easier for agents to use and repair. Explicit capabilities. JSON diagnostics. Typed safe fixes. Made for agents on day zero.

11:44 PM · May 15, 2026 · 1.53M Views

killer prompt "can you repeat back to me the outcome that I am expecting?"

1:31 PM · May 16, 2026 · 6.48K Views

My laptop has become a “satellite device” since I started using Codex from my phone. And my Mac mini has become the “home.” It’s clunky, but the end state feels more like how we’re going to be working in the near future: I’m currently running the Codex app on 2 devices: 1. my

11:23 PM · May 14, 2026 · 399K Views

Had a lot of fun (really, actually) chatting with on the Something Ventured podcast. We cover a lot of ground: the early days of seed investing (featuring folks like , , et al), the state of seed today, Silicon Valley's "British invasion" (Kent's words,

9:02 PM · May 14, 2026 · 454 Views

Scoop: OpenAI announced another major reorg on Friday, as part of its effort to unify ChatGPT and Codex. -Greg Brockman is officially taking over OpenAI's products, after previously being tapped as an interim leader -Head of Codex, Thibault Sottiaux, is now leading core product

5:13 PM · May 15, 2026 · 328K Views

You should build your dream macOS app right now! The "Build macOS App" plugin in Codex is wild. Used voice dictation to build an app I wanted for a while in <7 min (+6 min of tweaking). Couldn't believe how quickly it was done. Prompt is in the video and in the tweet below.

11:53 PM · May 18, 2026 · 58.3K Views

* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?

Can I get my agents on the phone?
