Hey folks

Google I/O starts today, and Logan tweeted: “The model is the product”. There have been rumours that the latest Gemini model scores similarly to GPT-5.5 on benchmarks, but we’ll see how it feels in actual use. Previous models also scored well but didn’t feel great to work with.

Once models are good enough, harnesses will matter much less. I just don’t think today is that day. And even then, the role of a harness will probably just shift: instead of managing which tools to use and how, the system prompt, context management etc., it could handle managed agents, sandboxing, and cloud/local management.

I started using Codex on my phone… but not all that much, to be honest. A lot of agent harnesses these days have ways to control your sessions from your phone: Claude Code has /remote-control, Pi can build one for itself (I use a Telegram one), and Droid has mobile web + Droid computers.

Most of my mobile-first work at the moment is brainstorming rather than building, and I find myself flitting between all these options constantly.

I used to use my OpenClaw bot like an addict, but haven’t spoken to the poor bastard for weeks now.

It may help that I’m currently focused on just one (ish) main thing: this ‘course’, which is really more of a library or reference manual on how I think about agents, how I steer them and how I build with them.


Ben’s Bites is brought to you by Hyperagent from Airtable

Hyperagent, the cloud agent system with full computing environments, is giving $10M in inference credits to help founders build and run agent-first companies. The first 500 qualifying applicants gain access to this limited founder offer. Applications close May 31st.


  • Codex now connects your Mac to your phone. You can start tasks in Codex from your phone, but the actual work still runs on your Mac, devbox or remote machine. Files, setup and credentials stay where they are, while you approve commands, answer questions and review diffs from your phone. This update also brings Hooks to Codex.

  • Anthropic is acquiring Stainless, a platform for building SDKs (also used by OpenAI), and is shutting the service down. Separately, at its London conference, Anthropic added self-hosted sandboxes and MCP tunnels to Claude Managed Agents, its “running agents made easy” product for companies.

  • Cloudflare tested Anthropic’s Mythos against 50 of its repos. Quick takeaways:

    • Mythos is great at spotting real attacks, which are often many small vulnerabilities connected in a chain.

    • A single model, however smart, still misses a lot without a good harness.

    • “Find bugs fast and patch them faster” is not a good strategy on its own. Teams need to focus on making bugs harder to chain (even when they exist) and harder to exploit.

  • Cursor’s Composer 2.5 (partly trained on SpaceX’s GPUs) is out. The benchmarks Cursor chose to report put the model roughly on par with Opus 4.7-xhigh and GPT-5.5-high, while being much cheaper to run.

* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?