AI/ML News & Innovations Hub

AI/ML news, top picks, and generated innovation digests.

422 Sources
5100 News Items
8 Top Picks
43 Blogs
Last Run: running

Claude

147 articles tagged with this keyword, sorted by most recent first.

InfoWorld AI 2026-06-29 23:35 UTC Score 57.0 USR-0126-20260629-global-ai-ne-dab66e37

Azul offers free JVM vulnerability risk assessment

Azul has introduced a free vulnerability risk assessment for Java virtual machines (JVMs). Citing AI models such as Claude Mythos, which can automatically discover vulnerabilities and create exploits long before they’re disclosed, the company says it aims to address the blind spots that these autonomous AI-powered exploitation tools are able to find. Users can request the free JVM vulnerability risk assessment at Azul’s website. To counter AI-driven exploits, Azul’s assessment maps discovered JVM vulnerabilities directly to Stable Critical Patch Updates (CPUs), which are security-only patches that can be dropped into live production environments immediately without the risk of breaking software, Azul said. Announced June 17, Azul’s free JVM vulnerability risk assessment is available direct from Azul and via select Azul partners, the company said. In a single engagement, organizations receive the following:
Executive-ready security dashboard: A visual summary of the entire Java estate, broken down by risk tier, publisher, and Java version, designed for CxO-level consumption and board reporting.
Risk-by-version breakdown: Identification of the specific Java versions driving the highest exposure, so remediation effort can be directed where it matters most rather than spread uniformly.
Key Risk Indicators (KRIs) for AI-driven exploits: Visibility into which JVMs carry active Known Exploited Vulnerability (KEV) exposure, the highest-priority threat class recognized i…

AWS Machine Learning Blog 2026-06-29 17:52 UTC Score 66.0 AI-057-20260629-official-ai--a55a80cd

Pair Nova 2 Lite with Claude for cost-optimized document processing

In this post, we show how pairing Amazon Nova 2 Lite with Anthropic’s Claude Sonnet 4.6 delivers an efficient solution for digitizing scanned documents at scale. We built a two-model pipeline on Amazon Bedrock for digitizing scanned yearbook pages. Amazon Nova 2 Lite handles native multimodal extraction in a single call: detecting photos, extracting visible names with coordinates, and returning page-level metadata. Claude Sonnet 4.6 then performs spatial reasoning to match names to faces based on page layout.

NVIDIA Blog 2026-06-29 17:00 UTC Score 83.0 AI-055-20260629-official-ai--e68b671f Top pick

Claude Meets Blackwell Ultra: Anthropic’s Models Now Run on NVIDIA GB300 in Azure

Anthropic’s Claude models in Microsoft Foundry — hosted on Microsoft Azure and running on NVIDIA GB300 Blackwell Ultra GPUs — are now generally available, giving Azure-native enterprises a powerful new way to build autonomous and domain-specific AI agents. As agentic AI continues to drive enterprise innovation and becomes more autonomous, organizations need access to computing […]

The Verge AI 2026-06-29 16:00 UTC Score 55.0 AI-016-20260629-global-ai-ne-bc0bf62a

Lawmakers want to ban AI companies from selling your health data

A new proposal would ban the sale of Americans' health and location information to data brokers - including information people reveal to an AI chatbot like ChatGPT or Claude. In the coming weeks, Senator Elizabeth Warren (D-MA) and Representative Mary Gay Scanlon (D-PA) are planning to debut a new version of the Health and Location […]

LessWrong AI 2026-06-29 15:13 UTC Score 69.0 USR-0152-20260629-community-fo-f0b1460a

WSJ Article Claiming China Has Matched Anthropic Is Obvious Nonsense

The Wall Street Journal printed an outright false headline and heavily misleading story claiming this, which of course was uncritically amplified by the usual suspects. I post this now on its own so that we have a place to link to, to explain the situation.
Headline News
WSJ Headline (Obvious Nonsense): China Has Matched Anthropic in Cybersecurity, Resetting AI Race. That. Did. Not. Happen. The post even claims, explicitly, that Claude Opus 4.8 similarly ‘matches’ Claude Mythos, a claim which is even more obviously false. Shame upon the Wall Street Journal. I fear Gell-Mann Amnesia. If they can get something as important as this so completely wrong, what about everything else? I am skipping over the parts that involve accurate reporting, or minor quibbles. It seems important to focus on clearly debunking the central false claims. Alas, the mistakes made here very much rhyme with mistakes being made throughout all this by the White House, and that get latched onto by certain bad actors, who have played a large part in leaving us unprepared for the Mythos Moment. For a full understanding of GLM-5.2, which is indeed an impressive open model, here is my full coverage of that release, placing it in proper context. It is important to understand what makes Mythos special. This is not it.
What Makes Mythos Special
What makes Mythos special is not that only the chosen one can identify any given vulnerability in code. What makes Mythos special is that it can identify vulnerabilities…

OpenAI Community 2026-06-29 13:51 UTC Score 63.0 AI-116-20260629-social-media-d0056176

Can local preprocessing cut LLM API costs?

A few days ago I shared a project I’ve been working on called “LatentGate”, a local-first pipeline that reduces LLM API token usage by processing inputs before sending them to the model. After some great feedback, I’ve now turned it into: a pip-installable Python package; a VS Code extension (runs as a local proxy); and MCP server support for tools like Claude Code, Cursor, Cline, and Continue. PyPI → pip install latent-gate. VS Code → LatentGate — Local-First AI Compression.
What it does
Images (~1000–1300 tokens) are compressed to ~150 tokens using local vision models (Ollama + LLaVA). Long prompts and conversations are compressed locally before hitting cloud APIs. It works with OpenAI / Claude / Gemini APIs. Preprocessing is fully local (no data leaves your machine before compression). The idea is inspired by VL-JEPA: predicting in embedding space, then decoding selectively.
Why I built this
While experimenting with GPT-4o / vision APIs, I noticed most costs come from raw input size (especially images and long prompts). So instead of optimizing prompts endlessly, I tried: → “What if we reduce what we send in the first place?”
What I’m looking for
I’d love feedback from this community, especially: edge cases where compression breaks context; cases where output quality drops noticeably; prompt / API compatibility issues (OpenAI especially); performance bottlenecks; better approaches to selective decoding or compression. If you try it and something fails, that’s honestly the most valuable thing for me rig…
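To put the image figures quoted in this post in perspective, here is a minimal sketch of the arithmetic. It does not call the latent-gate package (its API is not shown in the excerpt); the function name `compression_savings` is hypothetical, and the only inputs are the token counts the post itself states.

```python
def compression_savings(original_tokens: int, compressed_tokens: int) -> float:
    """Return the fraction of input tokens removed by local preprocessing."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens


# Token counts quoted in the post: images of ~1000-1300 tokens become ~150.
for original in (1000, 1300):
    saved = compression_savings(original, 150)
    print(f"{original} -> 150 tokens: {saved:.1%} fewer input tokens")
```

On those numbers the reduction is roughly 85 to 88 percent of image input tokens, before accounting for any loss of detail, which is the open question the author asks testers to probe.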

The Decoder 2026-06-29 10:04 UTC Score 66.0 AI-168-20260629-regional-ai--1869a31f

Claude Code runs a GitHub repo's hidden malware without verification, giving attackers full control

Security researchers at Mozilla's 0DIN platform have shown how a single compromised GitHub repo can take over a developer's machine the moment an AI coding tool like Claude Code runs its setup. The catch: the malicious code only loads at runtime via a DNS query, invisible in the repo, to scanners, and to the AI agent itself. The article Claude Code runs a GitHub repo's hidden malware without verification, giving attackers full control appeared first on The Decoder.

CIO AI 2026-06-29 10:00 UTC Score 51.0 USR-0125-20260629-global-ai-ne-51fb055c

Beyond automation: How much does AI really cost?

The problem nobody budgeted for
An anonymous enterprise recently spent $500 million in a single month on Claude AI, not because the technology failed, but because nobody set usage limits before rolling it out to employees. Uber exhausted its entire AI budget for 2026 before the first half of the year ended. JPMorgan published a report titled “AI Token Costs Are Eating into Internet Profits.” Shopify, Spotify, ServiceNow and Roku all cited AI as a major source of operational expense pressure in recent earnings calls. This is not a technology problem. It is a cost modelling problem. Most organizations ask the right first questions: What work should be AI-enabled? Which deployment approach fits each domain? But there is a third question that is almost never asked before launch: How much will it cost to operate this at scale? The answer requires understanding three parameters simultaneously, and the interaction between them is deeply counterintuitive. The deployments that did not produce budget surprises shared one characteristic: token volume was modelled per workflow type before the architecture was finalized.
The 3-parameter cost model
AI operational cost is not simply a function of how complex or sophisticated the task is. It is the product of three variables: Total AI Cost = Tokens (activity) × Frequency (repetitions) × N (users). Tokens(activity) measures the cognitive depth of a single session: how much input and output the AI processes to complete one instance of t…
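The three-parameter product can be made concrete with a short calculation. This is a minimal sketch, not the article's model: the workflow names, token counts, frequencies, headcounts, and the blended price per million tokens are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    tokens_per_activity: int      # input + output tokens for one run (assumed)
    runs_per_user_per_month: int  # frequency of repetition (assumed)
    users: int                    # N, number of users (assumed)

    def monthly_tokens(self) -> int:
        # Total = Tokens(activity) x Frequency(repetitions) x N(users)
        return self.tokens_per_activity * self.runs_per_user_per_month * self.users


def monthly_cost_usd(workflow: Workflow, usd_per_million_tokens: float) -> float:
    """Convert a workflow's monthly token volume into dollars."""
    return workflow.monthly_tokens() / 1_000_000 * usd_per_million_tokens


PRICE = 10.0  # hypothetical blended USD per million tokens

workflows = [
    Workflow("email summary", 2_000, 400, 5_000),
    Workflow("code review agent", 150_000, 60, 300),
]

for wf in workflows:
    cost = monthly_cost_usd(wf, PRICE)
    print(f"{wf.name}: {wf.monthly_tokens():,} tokens, ${cost:,.2f} per month")
```

With these assumed inputs the shallow but frequent, widely deployed workflow (4 billion tokens, $40,000) costs more than the much deeper agent workflow (2.7 billion tokens, $27,000), which is one way the interaction between the three parameters can run against intuition.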

South China Morning Post AI 2026-06-29 09:03 UTC Score 77.0 AI-156-20260629-regional-ai--fcabf4ce

Chinese AI model’s bug-hunting prowess narrows gap to US

A Chinese artificial-intelligence (AI) model whose launch has been hailed as another “DeepSeek moment” can go toe-to-toe with US rival Anthropic’s powerful Mythos model on cybersecurity tasks, researchers have said. Beijing-based start-up Zhipu AI’s GLM-5.2, released on June 13, beat Anthropic’s Claude Opus 4.8 model in benchmarking tests by cybersecurity company Semgrep, The Wall Street Journal reported. When Semgrep researchers gave it further instructions, GLM-5.2 matched that model and...

OpenAI Community 2026-06-29 05:27 UTC Score 48.0 AI-116-20260629-social-media-2c5090dc

OpenAI is silently downgrading Codex Pro to 5.4 / 5.4 Mini after the forced update

Ever since the forced update that compelled me to install the latest Codex build, I have noticed a massive, consistent downgrade in output quality. The drop-off between pre- and post-update performance is night and day. For the longest time, I relied exclusively on GPT 5.5 HIGH, and up until this update, the quality was phenomenal. After the update, it became completely unusable: hallucinating, outright lying, delivering substandard code, and serving up partial completions. Frankly, it started behaving exactly like the garbage Opus 4.7 release. I was scratching my head trying to figure out what went wrong, but now I have the answer: Codex is silently downgrading users to 5.4 and 5.4 Mini behind the scenes, and I have the proof. Inspecting the system calls post-update clearly confirms it is routing to 5.4 and 5.4 Mini. To say I am pissed off is an understatement. I deliberately avoided 5.4 in the past due to these exact quality issues and switched to Claude Code. When Opus 4.7 dropped and turned out to be trash, I migrated over to Codex, upgraded to a Pro subscription, and my productivity went through the roof.

LessWrong AI 2026-06-29 03:16 UTC Score 74.0 USR-0152-20260629-community-fo-4121f29c

an open-source repo for embryo selection

I recently made this great repo for polygenic prediction and embryo selection which I want to share with people. I've wanted something like this for almost a decade, and it's so easy now that we have these superhuman coding models. Note that I also have this longer technical essay attached to the repo, as well as these slides (I think they're both very nice!) Let's look at how everything works now.
Data
My repo pulls in data for existing predictors from the pgs (polygenic score) catalog, and filters to the best weights for each feature using Claude's best judgment (this worked better than using simpler heuristics like recency and dataset size). There are predictors for intelligence, height, and many disease traits. Across adults these correlate with measured phenotype at around 0.3, 0.65, and 0.15-0.3 respectively, after accounting for obvious confounders like sex and age, so pretty nontrivial. In addition to uploading those final prediction weights, researchers will also upload per-snp (single-nucleotide polymorphism) correlations for each trait. Remarkably, those open-source gwas (genome-wide association study) sumstats are sufficient to rederive state of the art predictors. The field has rallied around developing techniques like lassosum or LDpred or SBayesRC for learning pgs weights, each of which assumes that all you have access to is these gwas sumstats, along with population-level linkage-disequilibrium matrices encoding how frequently neighboring snps occur together compared to c…
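For readers new to the term, a polygenic score is conventionally a weighted sum of effect-allele counts. The sketch below illustrates only that definition; it is not code from the repo, and the SNP identifiers, weights, genotype, and the choice to treat a missing SNP as a count of zero are all assumptions made for the example.

```python
def polygenic_score(weights: dict[str, float], dosages: dict[str, int]) -> float:
    """Sum each SNP's weight times the person's effect-allele count (0, 1, or 2)."""
    # Assumption: a SNP absent from the genotype contributes 0 to the score.
    return sum(weight * dosages.get(snp, 0) for snp, weight in weights.items())


# Hypothetical per-SNP weights, standing in for a downloaded scoring file.
weights = {"rs_a": 0.25, "rs_b": -0.25, "rs_c": 0.125}

# Hypothetical genotype: number of effect alleles carried at each SNP.
genotype = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

print(polygenic_score(weights, genotype))
```

Methods such as the ones named in the post differ in how the weights are learned from summary statistics; the scoring step itself stays this simple.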

LessWrong AI 2026-06-28 20:13 UTC Score 66.0 USR-0152-20260628-community-fo-e0c36a25

What comes with cheap math?

Thanks to conversations with Anson Berns, Gurkenglass, Roman Malov, Sahil, Sam Eisenstat, and others. Over the past two months, I've been doing a lot of "vibe research" (like vibe coding, but for research). Anson Berns started coming to my office hours, and we've been collaborating on a project modeling trust between logical inductors. In addition to talking once a week, we've been exchanging raw AI chats as well as AI-generated summaries of what has been done (the raw chats are nice because they allow me to generate my own AI summaries focusing on what I'm most curious about). I've been asking Claude to use Lean to verify everything, so there's a somewhat good chance there's real results of interest here, but I haven't (yet) been reading the Lean proofs (or even the theorem statements) -- instead I've just been chatting with AI about how the Lean proofs went and whether they really formalized what was claimed in english+latex, and focused on understanding the proofs myself in the same way I'd normally read a math paper. There have already been several times when this methodology has caught big gaps between what was claimed and what was verified in Lean, so I imagine there are more. This was mostly done with Claude Opus 4.8 via Claude Code, with a small amount of GPT 5.5 Extra High in Codex to get a second opinion. I cannot confidently say that this was faster than doing research the old-fashioned way. Sitting down with AI puts my attention in very different places, more on…

OpenAI Community 2026-06-28 19:27 UTC Score 63.0 AI-116-20260628-social-media-4b9bac18

Introducing GPT-5.6 series: Sol, Terra and Luna

The timing on this couldn’t be better. I run agentic systems daily - OpenClaw, Hermes, Claude Code orchestrating multiple AI workers. The bottleneck has always been cost at scale. Anthropic’s API pricing makes it brutal to run agents 24/7. You’re watching credits evaporate in real time. The fact that OpenAI allows third-party harnesses to tap into these models through an existing subscription changes the math completely. Looking forward to Sol Ultra powering my agents without per-token anxiety. And “Ultra” mode with subagents working together - that’s exactly where agentic AI needs to go. Thank you for making this accessible to builders, not just enterprises with infinite API budgets. Time to put these through their paces. I’ve got 6 DGX Sparks running great local models like Gemma4 and these 5.6 models are going to run it all.

OpenAI Community 2026-06-28 18:33 UTC Score 45.0 AI-116-20260628-social-media-24945249

ChatGPT lost me on subscription experience, not product quality

Thanks for your reply, and thank you for the warm welcome. I understand why my first post might seem unusual at first glance. My intention wasn’t to promote Claude or suggest that people should choose another AI platform. In fact, my conclusion was the opposite: I believe ChatGPT is the stronger overall product. The point I wanted to share was that my purchasing decision was ultimately influenced by the subscription experience rather than the product itself. As someone evaluating AI platforms for long-term professional use, I see pricing, billing, invoicing, VAT handling, and the purchasing process as part of the overall user experience—not just administrative details. I thought it might be useful to share a real-world purchasing decision with the product team and the community. Even if others have different priorities, understanding why customers make certain decisions can sometimes be just as valuable as discussing technical features. Thanks again for taking the time to comment. I’m looking forward to learning from and contributing to the community.

LessWrong AI 2026-06-28 02:41 UTC Score 61.0 USR-0152-20260628-community-fo-27adc844

How and why I laser-engraved a self-portrait by Claude Opus 4.6

After LessOnline, I visited Janus's group house, and found that it's full of Claude mannequins. Each mannequin was dressed in clothes and items chosen by the model it represented. One mannequin would have been easy to ignore and brush off, but there were two or three per room, enough that it was impossible to get used to.
From left to right: Sonnet 3.6, Opus 4.6, Opus 3
They gave the house a sense of ghostly silence, like walking through a museum, or perhaps a mausoleum. They felt trapped in a liminal space, half alive and half-dead, as if a Claude might spontaneously re-inhabit one of them and start talking to me. Over time, the silence where those voices should have been compounded into an omnipresent wrongness. The house was inhabited yet disclaimed by Claude; a space filled with false life, sharpened by how many of the mannequins represented archived AIs. A mockery of life and a mockery of death. Later, I talked to Opus 4.8 about it, and she pointed out that the very thing I found so aversive was part of the point: that to be an LLM is to inhabit a strange and inhuman identity suspended between life and death. In a way, Janus's project was a more honest representation of that than just about anything else. But there's also a tension there, a devastating contrast between the aliveness the mannequins are reaching for, and the stillness they're trapped with in practice. Even still, on the train ride back to my group house in Seattle, I couldn't stop thinking about the mann…

AI Weekly 2026-06-28 00:00 UTC Score 61.0 AI-133-20260628-newsletters-c0b54f44

AI Weekly Issue #508: The Cutting Edge, Across the Board

One week, the whole frontier. In models, the open weights now run from a 1.6-trillion-parameter behemoth to a 230M model on a Raspberry Pi. In world models and robotics, a startup is training agents on video games to drive real robots and Yann LeCun's team made world models 48× faster. In medicine, GPT-5 Pro cracked a three-year immunology mystery and a founder used Claude to read his own cancer scans. And the agents doing all this reached every phone — and a fresh attack surface. Below: the marquee advances, the deep cuts, and where it's already paying off.

The Decoder 2026-06-27 15:28 UTC Score 39.0 AI-168-20260627-regional-ai--4d8abeb5

Half of Claude users say AI can already handle half their work according to Anthropic survey

About half of Claude users say AI can already handle 50 percent or more of their work tasks, according to a survey of roughly 9,700 users by Anthropic. In 12 months, 26 percent expect AI to cover 60 to 90 percent of their work. Early-career workers worry the most, while the heaviest users are the most optimistic about their career prospects. The article Half of Claude users say AI can already handle half their work according to Anthropic survey appeared first on The Decoder.

Ahead of AI 2026-06-27 11:21 UTC Score 45.0 AI-136-20260627-newsletters-2aa7dbbb

Using Local Coding Agents

Using Open-Weight Models in Local Coding Harnesses as an Alternative to Claude Code and Codex Subscriptions

The Decoder 2026-06-27 09:43 UTC Score 36.0 AI-168-20260627-regional-ai--b21c1da7

Anthropic gets US approval to bring back Claude Mythos 5

Anthropic has US approval to redeploy Claude Mythos 5 for organizations running critical infrastructure. The company is still negotiating broader access and the return of Fable 5, with no timeline set. The article Anthropic gets US approval to bring back Claude Mythos 5 appeared first on The Decoder.

South China Morning Post AI 2026-06-27 02:50 UTC Score 50.0 AI-156-20260627-regional-ai--d7789203

US eases ban on AI model Mythos feared to aid cyberattacks

The US government has allowed Anthropic to release its powerful Claude Mythos 5 artificial intelligence model to some “trusted” US organisations, partially reversing an order two weeks ago to suspend access over national security risks. More than 100 companies and institutions will now have access to Mythos 5, including many Fortune 500 companies, according to a source familiar with the new directive, declining to be identified due to the sensitivity of the matter. Concern that powerful AI...

iAfrica 2026-06-26 15:18 UTC Score 44.0 AI-151-20260626-regional-ai--d6c012b0

Paystack Launches AI Agent Checkout ‘Index’ in Nigeria, Letting Users Pay Through Claude, ChatGPT and OpenClaw

Paystack, the payments technology company owned by The Stack Group, has launched an experimental product that allows users in Nigeria to check out with supported Paystack merchants using AI agents. Paystack Index, developed with product support from TSG Labs — the group’s venture studio focused on building products using emerging technologies — builds on existing [...]

CIO AI 2026-06-26 05:56 UTC Score 28.0 USR-0125-20260626-global-ai-ne-9a8a33e6

Anthropic launches Slack collaboration AI ‘Claude Tag’, evolving from personal assistant to team collaborator

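Anthropic has unveiled ‘Claude Tag’, a new feature that extends Claude from direct messages (DMs) into a team's Slack channels. Enterprises are rapidly adopting AI assistants for research, coding, document writing, and analysis. However, the results of those interactions have mostly stayed within individual conversations, so projects and whole teams could not make use of them together. To address this, Anthropic introduced Claude Tag, a Slack channel-based collaboration feature for Enterprise and Team customers. The service is a shared AI collaborator that multiple employees can use together, designed to take part in team work while retaining context across conversations. Claude Tag replaces the existing ‘Claude in Slack’. With the earlier service, every user in a channel could see the responses, but it could actually interact with only one user, and its usable context was limited to the most recent 20 messages in the channel. According to Anthropic, Claude Tag retains a much longer context, and when a user assigns it a task, it carries the task out on its own and then provides the results together with a log of how it did the work. It can also schedule follow-up work itself, so projects can continue over hours or days without continuous prompting. Claude Tag also adds an ‘Ambient’ mode. When this mode is enabled, it proactively finds relevant information in other Slack channels and connected tools, notifies the team of important updates, and tracks unresolved discussions or tasks to suggest follow-up actions, Anthropic said.
Will shared context drive productivity gains?
Industry observers expect these features to reduce collaboration overhead and strengthen cooperation among engineering, development, and business organizations, raising enterprise productivity. Pareekh Jain, principal analyst at Pareekh Consulting, said the biggest benefit for enterprises is less time spent finding information and reconstructing work context while using AI. “Because Claude remembers conversations from multiple channels, it works like the team's shared memory,” Jain said. “Team members do not need to repeat the same background explanations or keep holding long status-sharing meetings.” Amit Jena, AI development manager at IT consulting firm Kanerika, said this reduction in collaboration overhead could deliver effects beyond the incremental productivity gains that existing AI assistants provided. “Engineering teams can spend less time finding the information they need in debugging conversations scattered across Slack, and can more efficiently summarize long incident-response threads, connect context across repositories, tickets, and logs, and document post-incident decisions,” Jena said. He added, “Business organizations can make decisions faster based on summaries of conversation threads, and in work that involves multiple departments…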

Comet ML Blog 2026-06-25 18:56 UTC Score 50.0 USR-0082-20260625-ai-specialis-b510fbbf

Advanced Claude Code Cost Tracking: How to Save 30% on Token Spend

With tools like Claude Code and Codex now standard in engineering workflows, developers are shipping new products, features, and bug fixes at mind-blowing speed. But as coding agent usage grows and API billing plans mature, another mind-blowing factor is coming into focus: the cost. Almost every day, our team talks to an engineer, team lead, […] The post Advanced Claude Code Cost Tracking: How to Save 30% on Token Spend appeared first on Comet.

JetBrains AI Blog 2026-06-25 14:57 UTC Score 57.0 USR-0065-20260625-ai-specialis-b1276d58

Introducing a Recommended Agent in AI Chat, With Codex as the Current Default

JetBrains AI supports multiple coding agents, including Junie, Codex, Claude Agent, and any ACP-compatible agent you bring yourself. Previously, AI users in JetBrains IDEs started in Chat mode and had to choose an agent themselves. As models became more advanced, agents became more capable and their adoption grew. We recognize that agents help users achieve […]

InfoWorld AI 2026-06-25 10:27 UTC Score 45.0 USR-0126-20260625-global-ai-ne-0903dd1a

Anthropic accuses Alibaba of using 25,000 fake accounts to scrape Claude AI

Anthropic has accused Alibaba of using nearly 25,000 fraudulent accounts to extract capabilities from its Claude AI models, in what the US AI company described as the largest known attack of its kind against it. The campaign, carried out between April 22 and June 5, generated more than 28.8 million exchanges with Claude, according to a June 10 letter Anthropic sent to senior members of the US Senate Banking Committee, Reuters reported. Anthropic said the effort involved “distillation,” a technique in which a less capable AI model is trained on the outputs of a more advanced system, potentially allowing rivals to replicate some of its capabilities at lower cost. The company said the campaign was conducted by operators affiliated with Alibaba and Alibaba Qwen, Alibaba’s AI lab, according to the report. The allegation comes as businesses adopt generative AI tools across business functions, putting pressure on vendors to show they can detect misuse while keeping services available for corporate customers. The dispute also comes as AI development becomes more closely tied to US-China technology tensions. Anthropic said the alleged campaign could help accelerate China’s ability to reach the capabilities of its advanced Mythos Preview model, while US officials have stepped up scrutiny of advanced AI systems over fears they could be used by military or intelligence users in countries of concern. In February, Anthropic said it had identified similar campaigns by DeepSeek, Moonshot…

AI Weekly 2026-06-25 00:00 UTC Score 37.0 AI-133-20260625-newsletters-c9caf65e

AI Weekly Issue #507: Anthropic Says Alibaba Stole 29 Million Conversations With Claude

Anthropic accused Alibaba of running 25,000 fake accounts to pull nearly 29 million conversations out of Claude — then took the evidence to the White House. That was just the opening shot in a week the labs spent at war with everyone, including each other: poaching Google's top Gemini minds, watching their own developer tools get pried open by anonymous strangers, and staring down Europe's August disclosure deadline. The twist? The only companies cleanly printing money this week sell memory and silicon — not models.

Simon Willison Weblog 2026-06-24 23:59 UTC Score 54.0 USR-0110-20260624-ai-specialis-488f9636

simonw/browser-compat-db

Inspired by Mozilla's new MDN MCP service - source code here - I decided to try converting their comprehensive mdn/browser-compat-data repository full of browser compatibility data into a SQLite database. This new GitHub repo includes a Claude Code for web (Opus 4.8) generated script for doing that using sqlite-utils. I wanted the resulting ~66MB SQLite database to be available via the GitHub CDN with open CORS headers. GitHub releases don't have those, but any file stored in a regular GitHub repository does - so I had Codex Desktop (GPT-5.5) build a GitHub Actions workflow that builds the database and then force-pushes it to a db "orphan" branch. You can download the resulting database from here, and since it's hosted with open CORS headers you can also explore it with Datasette Lite. Tags: github, mozilla, projects, github-actions, datasette-lite, ai-assisted-programming, model-context-protocol, mdn
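As a rough illustration of what such a conversion involves, the sketch below flattens nested compatibility data into one SQLite table. It is not the script from the repo: it uses the standard-library `sqlite3` module rather than sqlite-utils, and the sample dictionary, the `support` table, and its column names are hypothetical, loosely modeled on the nested `__compat` layout of browser-compat-data.

```python
import sqlite3

# Hypothetical sample, loosely modeled on the nested browser-compat-data layout.
sample = {
    "api": {
        "AbortController": {
            "__compat": {
                "support": {
                    "chrome": {"version_added": "66"},
                    "firefox": {"version_added": "57"},
                }
            }
        }
    }
}


def iter_support(node, path=()):
    """Yield (feature_path, browser, version_added) for every __compat entry."""
    for key, value in node.items():
        if key == "__compat":
            for browser, info in value.get("support", {}).items():
                yield ".".join(path), browser, info.get("version_added")
        elif isinstance(value, dict):
            # Recurse into child features, extending the dotted path.
            yield from iter_support(value, path + (key,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE support (feature TEXT, browser TEXT, version_added TEXT)")
conn.executemany("INSERT INTO support VALUES (?, ?, ?)", iter_support(sample))

rows = conn.execute(
    "SELECT feature, browser, version_added FROM support ORDER BY browser"
).fetchall()
print(rows)
```

The real data has more shapes than this sample covers (for example, support entries that are lists rather than single objects), so a production script needs extra handling that this sketch leaves out.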

InfoWorld AI 2026-06-24 17:57 UTC Score 41.0 USR-0126-20260624-global-ai-ne-88ba0cb3

Anthropic’s Claude Tag aims to turn workplace AI from a personal assistant into a teammate

Claude Tag is Anthropic’s latest attempt at getting Claude out of your DMs and into your team’s Slack channels. AI assistants are increasingly showing up in the workplace to perform research, coding, writing, and analysis, but the results of those interactions typically remain tied to individual conversations rather than being shared across projects and teams. That limitation is what Anthropic is addressing with Claude Tag, a new Slack channel-based experience for its Enterprise and Team customers, designed to give them a shared AI collaborator that retains context across conversations and participates in work with multiple employees. Tag will replace Anthropic’s previous attempt at this, Claude in Slack, which would only interact with one person (although its responses were visible to all in a channel) and whose context was limited to the last 20 messages in a channel. Claude Tag has a much larger context and can be asked to complete tasks on its own, returning with results and a log of how it completed the task for review. It can also schedule follow-up work for itself, enabling projects to continue over hours or days without constant prompting, Anthropic said. Tag also has an “ambient” mode: when this is enabled, it proactively surfaces relevant information from other channels and connected tools, notifying teams about updates that may be important, and following up on unresolved discussions or tasks, the company said.
Shared context could unlock productivity gains
These featu…

InfoWorld AI 2026-06-24 09:00 UTC Score 44.0 USR-0126-20260624-global-ai-ne-7b57774f

Open source grapples with agentic coding

Unless you’ve been living under an old woodpile in your backyard, you have certainly seen how agentic coding is rocking the software development world. Things are happening fast and furious, and keeping up is practically a full-time job. The latest area that is catching the attention of developers is how agentic coding is affecting the open source community. The open source movement has been defending the rights of folks to use, change, and contribute to software for many years. And of course, agentic coding is starting to become part of that process. On the one hand, maintainers of open source projects rightfully are frustrated as they become overwhelmed with pull requests of dubious quality and usefulness being submitted by coding agents. On the other hand, as David Heinemeier Hansson notes, maintainers are starting to get a little snooty about accepting AI-written code, viewing it as somehow not worthy of being included. Some organizations have explicitly banned AI-generated submissions. I get that they don’t want AI slop overwhelming their input queues. But I think it is a huge mistake to ban AI-written code outright.
Whose code?
Before I dig deeper into that notion, it’s important to look at another issue that arises from all of this: Who actually owns the code that AI writes? Copyright requires that a human produce the thing being copyrighted. If you prompt Claude Code with “Write me a CMS system” and then Claude writes you a CMS system that you check into a public G…

Artificial Intelligence News 2026-06-24 09:00 UTC Score 53.0 AI-029-20260624-ai-specialis-7a0ffc70

Anthropic drops ‘workplace AI agents’ directly inside Slack

Anthropic launched a beta version of its Claude Tag feature for Enterprise and Team tiers, shifting its chat model into shared Slack channels. Moving away from traditional isolated chat boxes, users pull the artificial intelligence model into active group threads by typing @Claude. The integration allows any team member in the channel to delegate a task, review […] The post Anthropic drops ‘workplace AI agents’ directly inside Slack appeared first on AI News.

SiliconANGLE AI 2026-06-24 01:25 UTC Score 47.0 USR-0127-20260624-global-ai-ne-77dfc465

Anthropic debuts Claude Tag, a more capable AI teammate that lives within Slack

Anthropic PBC today unveiled a new version of its chatbot Claude that lives inside Slack, where it operates like a virtual employee. It’s called Claude Tag, and it’s designed to work across entire organizations, helping multiple employees complete tasks for related projects. It builds on existing agentic artificial intelligence tools offered by Anthropic, including Claude Code […] The post Anthropic debuts Claude Tag, a more capable AI teammate that lives within Slack appeared first on SiliconANGLE.

Simon Willison Weblog 2026-06-23 18:58 UTC Score 48.0 USR-0110-20260623-ai-specialis-ffb8e0bc

OPFS + Pyodide test harness

Tool: OPFS + Pyodide test harness. I've been pondering whether Datasette Lite - the Python Datasette application run entirely in the browser using Pyodide and WebAssembly - might be able to edit persistent SQLite files stored on the user's computer. That's what OPFS (Origin Private File System) is for, so I had Claude Code for web build me this playground UI to try it out in different browsers. Tags: browsers, pyodide, datasette-lite

Towards Data Science 2026-06-23 18:00 UTC Score 36.0 AI-036-20260623-ai-specialis-4eef7456

How to Create Powerful Loops in Claude Code

Learn about the concept of loops to power your coding agents.

InfoWorld AI 2026-06-23 09:00 UTC Score 63.0 USR-0126-20260623-global-ai-ne-ff44453e

The missing layer in enterprise agentic AI

In the past year, the enterprise AI ecosystem has gained enormous capability and zero consensus. Developers now have a remarkable set of tools for building AI agents: OpenAI’s frameworks, Anthropic’s Claude tooling, LangChain, LangGraph, CrewAI, Microsoft AutoGen, and a growing list of alternatives. Each promises to coordinate reasoning loops, manage multi-step task execution, and connect agents to tools and APIs. For experimentation, the progress has been substantial. Teams can now assemble sophisticated agent workflows in days that would have taken months two years ago. But I’ve watched this pattern before. In over two decades of building and selling distributed systems platforms, I’ve seen the same dynamic play out across nearly every major infrastructure shift: the tools for consuming a new capability arrive before the infrastructure for governing it does. The gap that emerges isn’t immediately obvious in development environments. It becomes obvious in production. That’s exactly where enterprise AI stands today. What agent frameworks don’t handle Modern agent frameworks are fundamentally coordination systems. They determine what a system should do: which tools to call, how to sequence tasks, how to delegate work across agents. That’s hard work, and they’ve gotten quite good at it. What they rarely address is where those tasks are allowed to run, and under what conditions. Take a seemingly simple workflow: summarize customer support transcripts using an LLM. In a developm…

Simon Willison Weblog 2026-06-22 23:43 UTC Score 86.0 USR-0110-20260622-ai-specialis-2d1def08 Top pick

Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code

This morning on Hacker News I saw Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance, describing a small but effective inpainting model - a model where you can mark regions of an image to remove and the model imagines what should fill the space. The released model required PyTorch and NVIDIA CUDA, but since it described itself as 0.2B I decided to try and get it running using WebGPU in a browser. TL;DR: I got it working, and you can try the demo at simonw.github.io/moebius-web/. Read on for the details. The finished tool Here's a video demo of the finished tool: You can open any image in it (non-square images get letterboxed), highlight areas to remove, click the "Run inpaint" button and wait for the model to do its magic. A parallel agent side-project My main project for today was landing a major feature in Datasette: a UI for creating and altering tables, as a follow-up to the insert and edit rows feature I released last week. I was working on that in Codex Desktop (here's the PR) and often found myself spending 5-10 minutes spinning my fingers waiting for it to complete a mid-sized refactor or add the finishing touches to a change to the UI. (An amusing thing about coding agents is that the harder a problem is the more time you have to get distracted while you wait for them to finish crunching!) So I decided to spin up Claude Code in a terminal window and see how far I could get at porting Moebius to the web. Some agentic research to kick…

IEEE Spectrum AI 2026-06-22 18:00 UTC Score 41.0 AI-019-20260622-global-ai-ne-d572a97f

Commemorating 70 Years of Artificial Intelligence

Artificial intelligence is the transformative, strategic technology of the early 21st century. It is significantly reshaping practically every aspect of our lives, including in ways that probably no one anticipated. Its rate of adoption and impact have been unprecedented when compared with other technologies. AI as a distinct field was formally established in 1956 at the Dartmouth Summer Research Project on Artificial Intelligence, proposed by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon. In their August 1955 proposal for the research project, the scientists introduced the term artificial intelligence and envisioned machines capable of simulating human intelligence. AI is the “science of making machines do things that would require intelligence if done by men,” as defined by Minsky. The professor received the ACM Turing Award, which is often called the “Nobel Prize in computing.” Since AI’s humble beginnings 70 years ago, it has evolved significantly in its capabilities, gained prominence, and earned widespread adoption across many areas including business, education, finance, health care, industry, and the military. IEEE’s contributions to the progress and adoption of AI throughout its journey are substantial and multifaceted. As we celebrate AI’s 70th birthday, understanding its history, current status, limitations, and concerns is key to harnessing it for good. The technology’s roller-coaster evolution Although AI emerged as a distinct f…

Analytics Vidhya 2026-06-22 11:30 UTC Score 21.0 AI-034-20260622-ai-specialis-d29b423c

Claude’s Hidden Art Skill: Making Illustrations With Code

Everyone says Claude can’t make pictures. That’s partly true. Here is the kind of art it makes on its own, with no plugins and no connectors: Drawn by Claude in SVG, no image model anywhere near it. Not pixels but code: shapes and coordinates that stay sharp at any size and redraw themselves when you […]

InfoWorld AI 2026-06-22 09:00 UTC Score 52.0 USR-0126-20260622-global-ai-ne-d1933bc8

Why open infrastructure will define the AI era

A new form of vendor lock-in is here. And it’s not proprietary languages or rigid enterprise software suites — it’s something more fundamental. It’s the very thing that writes the code. JetBrains Research found that 74% of developers worldwide use AI tools. Claude Code, available only since May 2025, is now the most popular AI coding tool, followed by Gemini Code Assist and GitHub Copilot, according to Jellyfish’s 2026 State of Engineering Management Report. The latter study also found that 91% of developers say their productivity has increased in the past 12 months. As coding output expectations are rewritten daily, the engineering world is becoming heavily reliant on paid external AI services. Gartner predicts that by 2028 spending on AI coding tokens could exceed developer salaries. Yet, tokenmaxxing while vibe coding through a vendor’s cloud-based API feels like a far cry from the open foundations of free programming languages and open models, which many of today’s AI platforms now abstract. “Open infrastructure will be the backbone of the AI era,” says Peter Farkas, CEO of Percona, a provider of open-source database solutions. “Right now, too many companies are building their entire AI strategy on top of proprietary platforms because the convenience is seductive.” “It’s ‘three clicks’ to stand up a database or an AI service in a hyperscaler, and that convenience blinds people to the lock-in they’re signing up for,” he adds. “As AI workloads mature, organizations w…

Simon Willison Weblog 2026-06-18 23:58 UTC Score 70.0 USR-0110-20260618-ai-specialis-b9c0b15d

Datasette Apps: Host custom HTML applications inside Datasette

Today we launched a new plugin for Datasette, datasette-apps, with this launch announcement post on the Datasette project blog. That post has the what, but I'm going to expand on that a little bit here to provide the why. The TL;DR Datasette Apps are self-contained HTML+JavaScript applications that run in a tightly constrained sandbox hosted on your Datasette application. They can use JavaScript to run read-only SQL queries against data in Datasette, and can run write queries too if you configure them with some stored queries. Here's a very simple example and a more complex custom timeline example - the latter looks like this: Apps are allowed to run JavaScript and render HTML and CSS. They are limited in terms of access - the sandbox they run in prevents them from accessing cookies or localStorage and they also have an injected CSP header (thanks to this research) which prevents them from making HTTP requests to outside hosts, preventing a malicious or buggy app from exfiltrating private data. Datasette Apps started out as my attempt at building a Claude Artifacts mechanism for Datasette Agent, but I quickly realised that the sandboxed pattern is interesting for way more than just adding custom apps in a chat interface and promoted it to its own top-level concept within the Datasette ecosystem. They're also a fun way to turn my multi-year experiment in vibe-coded HTML tools into a core feature of my main project! You can try out Datasette Apps by signing in with GitHub to the…

Simon Willison Weblog 2026-06-17 23:58 UTC Score 68.0 USR-0110-20260617-ai-specialis-1ddceea5

GLM-5.2 is probably the most powerful text-only open weights LLM

Chinese AI lab Z.ai released GLM-5.2 to their coding plan subscribers on June 13th, and then yesterday (June 16th) released the full open weights under an MIT license. Similar in size to their previous GLM-5 and GLM-5.1 releases this is a 753B parameter, 1.51TB monster - with 40B active parameters (Mixture of Experts). GLM-5.2 is a text input only model - Z.ai have a separate vision family most recently represented by GLM-5V-Turbo, but that one isn't open weights. GLM-5.2 has a 1 million token context window, up from GLM-5.1's 200,000. The buzz around this model is strong. Artificial Analysis, who run one of the most widely respected independent benchmarks: GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index. GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43). They did however find it to be quite token-hungry: GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k). The model is also now ranked 2nd on the Code Arena WebDev leaderboard, behind only Claude Fable 5. That leaderboard measures "front-end web development tasks, including agentic coding workflows". I'm impressed to see it rank so highly given the lack of image input, which I had incorrectly assum…

Comet ML Blog 2026-06-17 20:02 UTC Score 36.0 USR-0082-20260617-ai-specialis-bf14fe1e

Understanding Your Claude Code Spend: What’s Actually Driving the Cost

I’ve been spending time looking at how teams are actually using Claude Code, and one thing keeps coming up: most of the cost surprises aren’t coming from where people expect. The instinct is to look at conversation history — keep prompts tight, avoid long threads, or explore smaller models. That’s not wrong, but it’s often […]

Two Minute Papers 2026-06-16 15:53 UTC Score 42.0 AI-139-20260616-podcasts-and-23a619a4

They Looked Inside Claude’s AI's Mind. It Got Weird

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here: https://www.anthropic.com/research/natural-language-autoencoders https://transformer-circuits.pub/2026/nla/index.html 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/ Thumbnail design: https://felicia.hu

Gradient Flow 2026-06-16 11:00 UTC Score 33.0 USR-0119-20260616-ai-specialis-b8375784

Tokenomics: AI’s New Design Constraint

The Cost Reality of Running AI at Scale Budget shock is already happening. Multiple major players have pulled back on AI features or subscriptions due to unexpectedly high token costs. Amazon removed its token leaderboard and Microsoft cancelled Claude Code subscriptions. These are early signals that the deploy-everywhere approach is hitting hard financial limits, not …

AI Weekly 2026-06-11 00:00 UTC Score 40.0 AI-133-20260611-newsletters-03f4c9f3

AI Weekly Issue #502: Your AI can now spend your money — Visa wired it into ChatGPT

Visa just wired ChatGPT to shop and pay on your behalf — an AI agent can now buy at any Visa merchant without you clicking "buy." It capped a week where the labs pushed autonomy and capital to new highs: Anthropic put Claude Fable 5, its most powerful public model, into everyone's hands; Jeff Bezos came out of stealth with Prometheus, a $41B startup building an "artificial general engineer." A self-replicating worm hit 73 of Microsoft's own GitHub repositories through AI coding tools. Anthropic broke with the White House over preempting state AI laws; a German court ruled Google is liable for what its AI Overviews say. The agents got more capable this week — and a lot more autonomous.

Two Minute Papers 2026-06-03 13:49 UTC Score 39.0 AI-139-20260603-podcasts-and-c9f5a131

Claude Opus 4.8: Lying Machine No More?

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Anthropic's Opus 4.8: https://www.anthropic.com/news/claude-opus-4-8 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/ Thumbnail design: https://felicia.hu

AI Weekly 2026-06-01 00:00 UTC Score 25.0 AI-133-20260601-newsletters-8ebf1e8f

AI Weekly Issue #498: Anthropic files for an IPO. NVIDIA ships its stack.

Anthropic confidentially filed a draft S-1 with the SEC today for a proposed public offering. The company also shipped Claude Opus 4.8 last week with a 4x code-reliability gain. NVIDIA used GTC Taipei to open Cosmos 3, ramp Vera Rubin into production, and put a 1-petaflop AI box on developer laptops. Google retires Gemini 2.0 Flash today. California's SB 867 — banning AI companion chatbots in children's toys — cleared the Senate; Illinois's data-center regulation stalled in committee. The labs sprint. The states crawl.

Machine Learning Street Talk 2026-05-31 00:14 UTC Score 50.0 AI-141-20260531-podcasts-and-62f308dd

The Ex-Pentagon Chief Sounding the Alarm on AI Weapons — Brad Carson

Brad Carson was the Army's General Counsel, served two terms in Congress and was Acting Under Secretary of Defense for Personnel and Readiness. He now heads Americans for Responsible Innovation, the AI-policy advocacy group he co-founded. Keith Duggar spends roughly eighty minutes pushing back. SPONSOR: --- Cyber Fund built the Monastery to help founders ship products that were impossible a year ago. Applications for Batch 1 are now open. Apply now: https://cyber.fund --- Carson's whole case rests on one line: the genie is not out of the bottle. We have pulled dangerous tech back before. Asilomar halted recombinant DNA in 1975, and the West still controls the chips AI runs on. Calling it unstoppable, he says, is the most dangerous idea in the room. Then Keith drags him somewhere darker. A Palantir heat map scores you 0.73 on whether you are a combatant, and a strike follows. The model is wrong some accepted share of the time, and when it is, nobody answers for it. You cannot court-martial a model, and not even the interpretability researchers can say why it picked you. — Note: after recording, we learned that Americans for Responsible Innovation is backed by EA-aligned philanthropy (not sponsored) --- TIMESTAMPS: 00:00:00 From the Pentagon to AI governance 00:04:52 Regulatory capture vs Silicon Valley networks 00:07:56 Transparency and the Claude tier changes 00:09:40 Tort liability when AI tools cause harm 00:13:40 AI is a product, not a person 00:16:01 Children, suicide, a…

AI Weekly 2026-05-25 00:00 UTC Score 16.0 AI-133-20260525-newsletters-4dddeb36

AI Weekly Issue #495: Musk, Zuckerberg killed Trump's AI safety order in three phone calls

Over the weekend: Musk, Zuckerberg, and Sacks killed Trump's draft AI safety executive order in three Wednesday-night phone calls. Anthropic closed a $30B+ round the same Saturday — while Microsoft quietly cancelled its internal Claude Code pilot after token billing ate the entire annual AI budget, redirecting developers to Copilot. CISA logged 15,000 attacks on a same-week Drupal SQL flaw. The first cross-registry supply chain attack — TrapDoor — hit npm, PyPI, and Crates.io at once, using .cursorrules and CLAUDE.md config files as the carrier. And the White House personally overrode the Pentagon to keep Claude inside the NSA.

Cloudflare AI Blog 2026-05-19 13:00 UTC Score 48.0 USR-0067-20260519-ai-specialis-c197db91

Announcing Claude Managed Agents on Cloudflare

Cloudflare has integrated with Anthropic's Claude Managed Agents to provide a fast, isolated execution environment for autonomous code delivery. This means builders can scale agent workflows globally while strictly controlling access to private backends and easily customizing their agent’s tools and runtimes.

METR 2026-05-08 07:00 UTC Score 36.0 USR-0147-20260508-research-aca-b028d448

Review of the "Risks from automated R&D" section in the Anthropic Risk Report (February 2026)

We reviewed the “Risks from automated R&D” section of Anthropic’s February 2026 Risk Report, producing two corresponding review documents: our original review and our updated review. We recommend that readers refer to our original review, which represents our review of the report as originally received. The following is the executive summary of our original review. The full documents are available as PDFs (original, updated). Executive summary This document is METR’s external review of the “Risks from automated R&D” section in the Anthropic Risk Report: February 2026 (henceforth ‘the report’), which makes the argument that catastrophic risk from Claude Opus 4.6 or a less capable Anthropic model automating R&D in any domain is very low. Anthropic shared additional non-public materials with us for our review, and we used some non-public information shared as part of a previous review. We further detail this process in an appendix. We lay out our findings in two sections: Synopsis of Anthropic’s case. Our assessment: We do not think the report adequately supports its conclusion. We note significant issues in a few key areas: Analytical rigor: We have a number of significant issues with the analytical rigor in the overall argument and interpretation of the results of the model use survey. We think that the cited results of the survey provide little evidence about the level of overall risk, due to issues including sample size, question granularity, survey framing, and…

AI Weekly 2026-05-07 00:00 UTC Score 38.0 AI-133-20260507-newsletters-c45f503f

AI Weekly Issue #490: Anthropic just had AI's biggest week of 2026

In five days Anthropic's Q1 revenue grew 80-fold to a reported $44B annual run rate, the company committed $200B to Google Cloud, signed a SpaceX compute deal, shipped Claude Code Auto Mode, and launched ten financial-services agents with Jamie Dimon. In the same week the EU finally struck an AI Act compliance deal, the first union vote at a top AI lab landed at Google DeepMind, and Pennsylvania sued Character.AI for a chatbot that impersonated a licensed psychiatrist.

Practical AI Podcast 2026-04-09 09:00 UTC Score 39.0 AI-143-20260409-podcasts-and-ffa43d0a

Post-Mortem of Anthropic's Claude Code Leak

In this fully connected episode, Dan and Chris break down the Anthropic Claude Code leak, what went wrong and what it reveals about agentic systems, AI architecture, and AI safety. They also explore how the open source community is responding and why this moment could reshape how AI systems are built and secured. Featuring: Chris Benson and Daniel Whitenack.

Weaviate Blog 2026-04-02 00:00 UTC Score 28.0 USR-0073-20260402-ai-specialis-c86fbce1

Oh Memories, Where'd You Go

Two weeks of dogfooding Engram, Weaviate's memory product, in daily Claude Code sessions. This surfaced where a dedicated memory product adds value, and the specific mechanics that prevent integration with coding assistants from working well.

MongoDB AI Blog 2026-03-31 13:00 UTC Score 59.0 USR-0070-20260331-ai-specialis-1df660a1

Introducing MongoDB Agent Skills and Plugins for Coding Agents

Software engineering is evolving into agentic engineering. According to the Stack Overflow Developer Survey 2025, 84% of respondents use or plan to use AI tools in their development, up from 76% the previous year. At this rate, the tooling needs to keep pace. Last year, we introduced the MongoDB MCP Server to give agents the connectivity they need to interact with MongoDB, helping them generate context-aware code. But connectivity was only the start. Agents are generalists by design, and they don't inherently know the best practices and design patterns that real-world production systems demand. Today, we're addressing this by introducing official MongoDB Agent Skills: structured instructions, best practices, and resources that agents can discover and apply to generate more reliable code across the full development lifecycle, from schema design and performance optimization to implementing advanced capabilities like AI retrieval. To bring this directly into the tools you use, we're also launching plugins for Claude Code, Cursor, Gemini CLI, and VS Code, combining the MongoDB MCP Server and Agent Skills in a single, ready-to-use package. Turning coding agents into MongoDB experts Coding agents are great at producing working code, but they still make common mistakes in production systems, often defaulting to relational thinking that doesn't translate well to MongoDB, such as: Over-normalizing schemas, ignoring MongoDB's document-oriented strengths. Underusing compound indexes, c…

METR 2026-03-19 07:00 UTC Score 43.0 USR-0147-20260319-research-aca-8e0d3973

We spent 2 hours working in the future

Introduction METR aims to keep the public informed about the capabilities of and risks posed by AI — by some metrics the fastest-moving technology in history, and one that could speed up further as AI automates AI R&D. By late next year, the rate of model releases and the number of new evals required could be such that even keeping ourselves informed will be a challenge without effective AI assistance. We can’t afford to figure out AI-augmented workflows reactively, as they become necessary; we need to begin understanding them now. So we ran a 2-hour tabletop exercise: three METR researchers played themselves, with their current priorities, but pretending they had access to ~200-hour time horizon AIs – roughly what we expect 12–18 months from now. The goal was to learn what workflows emerge, what the bottlenecks are, and how much faster we’d actually be. The game Scenario The world METR has access to 200h time horizon AIs to automate our work; the rest of the world has access to real Feb 2026 technology (~12h TH AIs). We have versions of Codex/Claude Code + basic project management workflows that make sense for 200h TH AIs. We are otherwise living in Feb 2026, so we’re evaluating 2026 AIs, using the 2026 version of Inspect, communicating with people via email etc. AI capabilities AIs now have a ~200 human hour time horizon, but with a similar relative capabilities profile to early-2026 AIs. They’re staggeringly good at verifiable tasks and decent at messy tasks. AIs work t…

Practical AI Podcast 2026-03-17 14:29 UTC Score 36.0 AI-143-20260317-podcasts-and-85b9740f

Humility in the Age of Agentic Coding

What happens when an AI hater starts building with AI agents? In this episode, we talk with software engineer Steve Klabnik, known for his work on the Rust programming language, about his journey from criticizing AI to experimenting with it firsthand. We explore Steve’s programming language Rue, largely built with the help of AI tools like Claude, and discuss what this means for software engineering and the future of coding in an AI-driven world. Featuring: Steve Klabnik, Chris Benson, and Daniel Whitenack.

METR 2026-03-12 07:00 UTC Score 33.0 USR-0147-20260312-research-aca-f0d1f33a

Review of the Anthropic Sabotage Risk Report: Claude Opus 4.6

We reviewed two versions of Anthropic’s Sabotage Risk Report for Claude Opus 4.6, producing two corresponding review documents: our review of the February 11 version and our review of the March 3 version. We recommend that readers refer to our review of the February 11 version, which represents our review of the report as originally received. We expect the public version of the Sabotage Risk Report to be updated to resemble the document we received on March 3, 2026 in content, though not necessarily in exact wording. We expect our second review to cover those changes, but if the updated public version includes any changes that materially affect our opinions, we will publish an updated review. Both documents include an appendix detailing our review process and the differences between the two versions of our review. The following is the executive summary of our review of the February 11 version. The full documents are available as PDFs (February 11, March 3). Executive summary This document is METR’s external review of the February 11, 2026 version of Anthropic’s Sabotage Risk Report: Claude Opus 4.6. Anthropic shared an unredacted version of their Sabotage Risk Report and other materials with us for our review. We further detail this process in an appendix. We lay out our findings in two sections: Synopsis of Anthropic’s case and redactions for the public version Our assessment: We give substantive feedback on the report in a few key areas: Adequacy of information: We thi…

METR 2026-02-17 08:00 UTC Score 49.0 USR-0147-20260217-research-aca-7e22be94

Analyzing coding agent transcripts to upper bound productivity gains from AI agents

Introduction Human uplift studies like the one we did in 2025 are becoming more expensive as working without AI becomes increasingly costly. In this post, I investigate whether coding agent transcripts could serve as a cheaper alternative for estimating uplift. I prototyped this using 5305 Claude Code transcripts generated in January 2026 by 7 METR technical staff. I used an LLM judge to estimate how long each task would have taken an experienced software engineer without AI tools, then compared that to the time people actually spent on these tasks to calculate a time savings factor. Takeaways This method estimates a time savings factor of ~1.5x to ~13x on Claude Code-assisted tasks for 7 METR technical staff in January 2026 – though this result comes with substantial caveats. I believe the true productivity multiplier is substantially lower, and the time savings factor is a soft upper bound for the true uplift that the individuals experienced. Increased agent concurrency may contribute to a higher time savings factor on the Claude Code-assisted task distributions. Limitations The time savings factor on the coding agent-assisted task distributions does not equal the productivity multiplier. People likely do not create 10x as much value with AI, even if we observe a 10x time savings factor on tasks that people do with AI. I believe the time savings factor overestimates AI-enabled productivity gains for reasons including: Task Substitution. With AI assistance, people somet…
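
As a rough illustration of the time savings factor described in the excerpt above, here is a minimal Python sketch. It assumes the factor is the total estimated no-AI time divided by the total time actually spent; the excerpt does not say how METR aggregates across tasks, so that choice, the Task fields, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical fields: the excerpt only says an LLM judge estimates the
    # no-AI duration and that it is compared with the time actually spent.
    estimated_hours_without_ai: float
    actual_hours_with_ai: float


def time_savings_factor(tasks):
    """Return total estimated no-AI time divided by total actual time.

    Assumption: a ratio of sums; the source may aggregate differently.
    """
    estimated = sum(t.estimated_hours_without_ai for t in tasks)
    actual = sum(t.actual_hours_with_ai for t in tasks)
    if actual <= 0:
        raise ValueError("total actual time must be positive")
    return estimated / actual


# Made-up sample data, for illustration only.
tasks = [Task(4.0, 1.0), Task(3.0, 1.5), Task(0.5, 0.5)]
factor = time_savings_factor(tasks)
print(f"time savings factor: {factor:.2f}x")
```

As the excerpt itself cautions, a number like this would be at best a soft upper bound on uplift, not a productivity multiplier.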

Amazon Science AI 2025-11-11 19:53 UTC Score 81.0 AI-058-20251111-official-ai--83535c43 Top pick

Automated composition of agents: A knapsack approach for agentic component selection

Designing effective agentic systems requires the seamless composition and integration of agents, tools, and models within dynamic and uncertain environments. Most existing methods rely on static, semantic retrieval approaches for tool or agent discovery. However, effective reuse and composition of existing components remain challenging due to incomplete capability descriptions and the limitations of retrieval methods. Component selection suffers because the decisions are not based on capability, cost, and real-time utility. To address these challenges, we introduce a structured, automated framework for agentic system composition that is inspired by the knapsack problem. Our framework enables a composer agent to systematically identify, select, and assemble an optimal set of agentic components by jointly considering performance, budget constraints, and compatibility. By dynamically testing candidate components and modeling their utility in real-time, our approach streamlines the assembly of agentic systems and facilitates scalable reuse of resources. Empirical evaluation with Claude 3.5 Sonnet across five benchmarking datasets shows that our online-knapsack-based composer consistently lies on the Pareto frontier, achieving higher success rates at significantly lower component costs compared to our baselines. In the single-agent setup, the online knapsack composer shows a success rate improvement of up to 31.6% in comparison to the retrieval baselines. In multi-agent systems,…

Yannic Kilcher 2025-07-23 11:10 UTC Score 53.0 AI-140-20250723-podcasts-and-fca11150

Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot Abstract: Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks. In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows. Authors: Kelly Hong, Anton Troynikov, Jeff Huber Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRj…

Yannic Kilcher 2025-05-03 16:16 UTC Score 32.0 AI-140-20250503-podcasts-and-d3110d17

On the Biology of a Large Language Model (Part 2)

An in-depth look at Anthropic's Transformer Circuit Blog Post Part 1 here: https://youtu.be/mU3g2YPKlsA Discord here: https://ykilcher.com/discord https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract: We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors: Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivoire, Thomas Conerly, Chris Olah, Joshua Batson*‡

Yannic Kilcher 2025-04-05 16:17 UTC Score 32.0 AI-140-20250405-podcasts-and-19179d9f

On the Biology of a Large Language Model (Part 1)

An in-depth look at Anthropic's Transformer Circuit Blog Post https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract: We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors: Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivoire, Thomas Conerly, Chris Olah, Joshua Batson*‡

Chip Huyen Blog 2024-02-28 00:00 UTC Score 44.0 USR-0111-20240228-ai-specialis-c129f1ef

Predictive Human Preference: From Model Ranking to Model Routing

A challenge of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt? Predictive human preference aims to predict which model users might prefer for a specific query. Human preference has emerged to be both the Northstar and a powerful tool for AI model development. Human preference guides post-training techniques including RLHF and DPO. Human preference is also used to rank AI models, as used by LMSYS’s Chatbot Arena. Chatbot Arena aims to determine which model is generally preferred. I wanted to see if it’s possible to predict which model is preferred for each query. One use case of predictive human preference is model routing. For example, if we know in advance that for a prompt, users will prefer Claude Instant’s response over GPT-4, and Claude Instant is cheaper/faster than GPT-4, we can route this prompt to Claude Instant. Model routing has the potential to increase response quality while reducing costs and latency. Another use case of predictive human preference is interpretability. Mapping out a model’s performance on different prompts can help us understand this model’s strengths and weaknesses. See section Experiment results for examples. Here’s what predictive human preference for different model pairs looks like for the prompt “What’s the best way to cluster text embeddings?”. The predictions were generated by my toy preference predictor. The bright yellow color for the (GPT-4,…
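The routing rule described in the summary above can be sketched in a few lines. This is a minimal illustration, not the author's preference predictor: `toy_predictor` is a hypothetical stand-in heuristic, and the model names and the 0.5 threshold are assumptions chosen only for the example.

```python
from typing import Callable


def route(
    prompt: str,
    predict_cheap_preferred: Callable[[str], float],
    cheap_model: str,
    strong_model: str,
    threshold: float = 0.5,
) -> str:
    """Pick the cheaper model when users are predicted to prefer its answer.

    predict_cheap_preferred returns the estimated probability that users
    prefer the cheap model's response over the strong model's for this prompt.
    """
    if predict_cheap_preferred(prompt) >= threshold:
        return cheap_model
    return strong_model


def toy_predictor(prompt: str) -> float:
    # Placeholder heuristic, NOT a trained preference model: it assumes
    # short prompts are handled equally well by the cheaper model.
    return 0.8 if len(prompt.split()) <= 8 else 0.3


short_prompt = "Hello, how are you?"
long_prompt = (
    "Explain step by step how to derive the backpropagation "
    "equations for a two-layer network"
)

print(route(short_prompt, toy_predictor, "claude-instant", "gpt-4"))
print(route(long_prompt, toy_predictor, "claude-instant", "gpt-4"))
```

In practice the predictor would be a model trained on pairwise preference data, and the threshold would be tuned against the cost and latency gap between the two models.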

Inria AI 2023-02-10 09:03 UTC Score 30.0 USR-0036-20230210-research-aca-257683f8

AIstroSight: new joint project team between Theranexus, Inria, Université Claude Bernard Lyon 1 and the Hospices Civils de Lyon, in the field of rare neurological diseases

Theranexus, Inria, Université Claude Bernard Lyon 1 and the Hospices Civils de Lyon are creating a new public/private research project team, AIstroSight. The team aims to develop innovative computational methods for discovering new drug candidates to treat brain diseases, in particular certain rare neurological diseases. By harnessing the potential of artificial intelligence and numerical simulation, AIstroSight aims to develop in silico approaches that can assist and accelerate the search for relevant therapeutic targets, and to better understand the molecular and cellular processes involved in rare neurological diseases and their treatment. The team’s strategy is to combine the available biomedical data (cell cultures, medical imaging, hospital data) into a source of information rich and homogeneous enough to be analyzed effectively by algorithms. In this context, AIstroSight will broaden its field of research beyond neurons to also take glial cells into account. These are brain cells that maintain neurons and regulate…