AI/ML News & Innovations Hub

AI/ML news, top picks, and generated innovation digests.

★ Visit ai-karthik.com
422Sources
5100News Items
8Top Picks
43Blogs
runningLast Run

AI Agents

200 articles tagged with this keyword, sorted by most recent first.

← All Keywords
Entrackr AI 2026-06-30 01:00 UTC Score 75.0 USR-0212-20260630-regional-new-b64fc3a2

Bajaj Finserv Ventures leads $10 Mn pre Series B round in Kapture CX

Verticalized full stack agentic AI platform Kapture CX has raised $10 million in a pre Series B funding round led by Bajaj Finserv Ventures (BFSV), part of Bajaj Finserv, with participation from its existing investors Cactus Venture Partners and India Alternatives. Prior to this, the Bengaluru based company had secured $4 million led India Alternatives extended Series A round in December 2023 and $4 million in a Series A round led by Cactus Venture Partners (CVP) in July 2023. The fresh proceeds will be utilized for expansion into multiple global markets and continued investment in R&D and product development, Kapture CX said in a press release. Co-founded in 2014 by Sheshgiri Kamath and Vikas Garg, Kapture CX is a verticalized, full stack agentic AI platform built to orchestrate high stakes workflows for large enterprises. Through its deep tech capabilities, it brings AI agents, operational intelligence, and human oversight into one system, allowing enterprises to run complex operations at scale. Kapture CX said that enterprises face a fragmented market with point products from multiple providers, making AI adoption a high effort exercise. According to the company, enterprises need a full stack agentic AI platform that understands industry specific requirements and delivers customized solutions for complex workflows. This is the gap Kapture aims to address. By owning and optimizing the full technology stack, from the models to the agentic layer and the user interface, Kaptu…

Microsoft Research Blog 2026-06-29 21:14 UTC Score 70.0 AI-053-20260629-official-ai--9e9f57b6

Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity

AI agents can't remember past conversations. They must constantly reload or retrieve context, which grows less efficient as tasks get longer and more complex. Memora solves this with a scalable memory system separating what’s stored from how it's retrieved. The post Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity appeared first on Microsoft Research .

MIT Technology Review AI 2026-06-29 18:00 UTC Score 66.0 AI-013-20260629-global-ai-ne-70b9fae8

AI agents are not your “coworkers”

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here. Imagine coming in to work to learn that a new underling will report to you. The worker is not a person but an AI tool—one that your company nonetheless calls Alex, an…

AWS Machine Learning Blog 2026-06-29 17:36 UTC Score 62.0 AI-057-20260629-official-ai--2cf71e63

Build an agentic AI healthcare claims pipeline with Amazon Bedrock and AWS HealthLake

In this post, we show you how to build an automated claims processing pipeline using two key Amazon Bedrock capabilities: Amazon Bedrock Data Automation for intelligent document extraction from healthcare claim forms, and Amazon Bedrock AgentCore for hosting an AI agent that validates and transforms the extracted data into FHIR (Fast Healthcare Interoperable Resources) resources in AWS HealthLake. You will learn how to combine these services to create an end-to-end workflow that reduces manual processing while maintaining accuracy through automated validation checks.

NVIDIA Blog 2026-06-29 17:00 UTC Score 83.0 AI-055-20260629-official-ai--e68b671f Top pick

Claude Meets Blackwell Ultra: Anthropic’s Models Now Run on NVIDIA GB300 in Azure

Anthropic’s Claude models in Microsoft Foundry — hosted on Microsoft Azure and running on NVIDIA GB300 Blackwell Ultra GPUs — are now generally available, giving Azure-native enterprises a powerful new way to build autonomous and domain-specific AI agents. As agentic AI continues to drive enterprise innovation and becomes more autonomous, organizations need access to computing […]

Simon Willison Weblog 2026-06-29 16:17 UTC Score 108.0 USR-0110-20260629-ai-specialis-0715a055 Top pick

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding This is an interesting new open weights (MIT licensed) model, the first model release from DeepReinforce. [...] with variants including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. Built on top of pretrained Gemma 4 and Qwen 3.5, it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks. As far as I can tell the licenses of those underlying models is compatible with being used in this way - Gemma 4 is Apache 2.0 licensed (and not bound by the janky additional Gemma Terms of Use that afflicted the previous Gemma models) and Qwen 3.5 is Apache 2.0 licensed as well. I've been running the model using LM Studio and the ornith-1.0-35b-Q4_K_M.gguf (20GB) GGUF, hooked up to Pi . Initial impressions are very good - it seems to be able to run the agent harness over many tool calls in a proficient way. Here's a terminal session where I asked it to "find the code that decodes the actor cookie" and then "find the code that opens the insert dialog when thebutton is clicked" against a Datasette checkout, which it handled with ease. I also had it draw this pelican , which came out at 103 tokens/second: It's a little bit mangled but the pelican is clearly a pelican. I couldn't find much information about DeepReinforce themselves. The earliest paper I could find from the was CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning from June 2025. Tags: ai , generative-ai , lo…

The Decoder 2026-06-29 15:14 UTC Score 57.0 AI-168-20260629-regional-ai--b92a3012

Deloitte tells its own consultants: AI is coming for the billable hour

An internal Deloitte presentation projects that the consulting industry's classic hourly billing model will shrink to a thin sliver of the total market by 2035, replaced by AI agents. "Our model is toast," one consultant summed up the message. McKinsey and BCG are already searching for alternative revenue models. The article Deloitte tells its own consultants: AI is coming for the billable hour appeared first on The Decoder .

MIT Technology Review AI 2026-06-29 14:44 UTC Score 63.0 AI-013-20260629-global-ai-ne-90763c99

Agent confidence on the technical frontier

Enterprise investment in AI is booming. Gartner is calling 2026 an “inflection year” for organizations to align their AI projects with strategic business objectives. As the pressure to prove ROI mounts, executives and technology leaders are looking to agentic AI to drive the measurable financial outcomes their businesses seek. A prime opportunity for AI agents…

LessWrong AI 2026-06-29 14:43 UTC Score 79.0 USR-0152-20260629-community-fo-a914a327

Human-Guided Agentic Research: A Research Agenda

tl;dr: As recursive self-improvement accelerates, we need a top-level agenda to research how to effectively keep humans in the loop. We need to study how humans can best interpret and guide research performed by autonomous agents when those agents lack taste, tacit knowledge or competence, or may try to reward hack, sandbag or sabotage such research. This is one attempt to define the problem and the shape of potential solutions. A Story About the Future of Research Imagine yourself a year or two in the future. Recursive self-improvement (RSI) is accelerating. Agents work in swarms independently for days or weeks at a time doing research. You work in a frontier lab doing AI safety research. You sit in front of your computer and click into the input box, ready to kick off a new project. What do you type? “Solve AI alignment”? Beware giving a magic genie vague wishes. Think about that again: what exactly do you type? How do you know what you type is the best way to prompt this agent swarm into doing your bidding? When the lead agent comes back a week later, what exactly does that output look like? How do you use that output to launch the next phase of the project? How will you validate that output to ensure the agent hasn’t reward hacked, sabotaged or incompetently explored the research space? How will you know what key decisions the agent made? Which research paths they explored? Which research paths they intentionally or unintentionally left unexplored? How will you know how…

Entrackr AI 2026-06-29 13:33 UTC Score 71.0 USR-0212-20260629-regional-new-069327ba

Healthcare startup MyKare.ai raises $3.2 Mn

Healthcare startup MyKare.ai has raised $3.2 million, including an additional $1 million in Series A funding round. The round saw participation from Andrew and Alfredo, founders of Papa.com, and a leading family office from the Middle East. The fresh funds will be used to enhance AI capabilities, accelerate product development, and support global expansion, MyKare said in a press release. Co-founded in 2021 by Senu Sam, Rahmathulla T M, and Joash Philipose, MyKare.ai develops an AI native healthcare operating system for clinics and hospitals. Its platform helps automate patient acquisition, appointment booking, follow ups, communication, feedback collection, and other administrative workflows through AI agents and voice AI. The startup aims to improve operational efficiency and patient experience by integrating these functions into a single platform. Its AI agents can also manage patient queries, identify intent, answer calls, update CRM records, and support patient retention. According to the company, it serves healthcare organizations across India, the Middle East, the United Kingdom, and the United States. It directly competes with the other notable players in this space such as Yellow.ai, Haptik, Senseforth.ai, Hyro, and Cognigy.

MarTech AI 2026-06-29 13:00 UTC Score 53.0 USR-0123-20260629-global-ai-ne-57956551

Agentic AI is rewriting martech economics and infrastructure

A single afternoon of tool-calling can eat a $20 monthly subscription. The fix isn't using fewer tools, it's changing where your data lives. The post Agentic AI is rewriting martech economics and infrastructure appeared first on MarTech .

Adweek AI 2026-06-29 12:24 UTC Score 48.0 USR-0124-20260629-global-ai-ne-2eeb0875

A More Intelligent Advertising Ecosystem Is on the Horizon

This post was created in partnership with Taboola Key takeaways From autonomizing tasks and overhauling workflow to taking over decision-making and transactions, organizations are betting big on agentic AI-powered systems […]

Adweek AI 2026-06-29 12:21 UTC Score 49.0 USR-0124-20260629-global-ai-ne-c73eb47f

Agentic AI, or Agentic BS?

At Cannes 2026, agentic was the buzzword that wouldn’t die as brands, agencies, media companies and adtech firms grappled with how AI agents are reshaping the way companies operate and work with one another.

The Decoder 2026-06-29 10:04 UTC Score 66.0 AI-168-20260629-regional-ai--1869a31f

Claude Code runs a GitHub repo's hidden malware without verification, giving attackers full control

Security researchers at Mozilla's 0DIN platform have shown how a single compromised GitHub repo can take over a developer's machine the moment an AI coding tool like Claude Code runs its setup. The catch: the malicious code only loads at runtime via a DNS query, invisible in the repo, to scanners, and to the AI agent itself. The article Claude Code runs a GitHub repo's hidden malware without verification, giving attackers full control appeared first on The Decoder .

InfoWorld AI 2026-06-29 09:00 UTC Score 51.0 USR-0126-20260629-global-ai-ne-3c180d43

When software developers and AI agents share the learning

Before Tobi Lütke ran Shopify, he learned programming through Germany’s apprenticeship system⁠, the way people have learned trades forever: in a shared workshop, watching people who already knew what they were doing. More recently, describing Shopify’s River , he reached for a related word: Lehrwerkstatt ⁠, a teaching workshop where “the whole shop floor is the classroom.” X has been agog by the numbers around River ⁠, Shopify’s Slack-native AI agent . In total, 5,938 Shopify employees worked with River across 4,450 different Slack channels, and River now coauthors roughly one in eight merged pull requests across the company. It’s a big deal, but understanding why it works that way is the most important part. River can read code, run tests, open pull requests, query the data warehouse, inspect production traces, and sometimes push back on a plan it thinks is bad. Great. Lots of companies will have clever coding agents someday soon. Some already do. The interesting part is that River doesn’t work alone; it works where everyone can see it. Betting on the workshop I’ve already argued that agents reward explicit, consistent, well-documented software . They like the “boring” stuff, such as schemas, tests, conventions, clean setup instructions, and codebases that don’t require a deep retrospective with the one engineer who remembers why the build script has to run twice. Dropping an agent into a messy repo is mostly an efficient audit of your engineering discipline. Agents hold up…

Medianama AI 2026-06-29 08:06 UTC Score 51.0 USR-0211-20260629-regional-new-8af9ce91

Why enterprises aren’t ready for AI Agents yet

Enterprise AI isn't scaling as fast as many expected. Here's what's really holding businesses back, and why it matters more than ever. The post Why enterprises aren’t ready for AI Agents yet appeared first on MEDIANAMA .

OpenAI Community 2026-06-29 05:27 UTC Score 48.0 AI-116-20260629-social-media-2c5090dc

OpenAI is silently downgrading Codex Pro to 5.4 / 5.4 Mini after the forced update

Ever since the forced update that compelled me to install the latest Codex build, I have noticed a massive, consistent downgrade in output quality. The drop-off between pre- and post-update performance is night and day. For the longest time, I relied exclusively on GPT 5.5 HIGH , and up until this update, the quality was phenomenal. After the update, it became completely unusable—hallucinating, outright lying, delivering substandard code, and serving up partial completions. Frankly, it started behaving exactly like the garbage Opus 4.7 release. I was scratching my head trying to figure out what went wrong, but now I have the answer: Codex is silently downgrading users to 5.4 and 5.4 Mini behind the scenes, and I have the proof. Inspecting the system calls post-update clearly confirms it is routing to 5.4 and 5.4 Mini. To say I am pissed off is an understatement. I deliberately avoided 5.4 in the past due to these exact quality issues and switched to Claude Code. When Opus 4.7 dropped and turned out to be trash, I migrated over to Codex, upgraded to a Pro subscription, and my productivity went through the roof.

South China Morning Post AI 2026-06-29 03:00 UTC Score 67.0 AI-156-20260629-regional-ai--d68d8cb5

AI agents that provide ‘economic value’ are next frontier, says Meta AI research chief

The next frontier of artificial intelligence will be agents that can perform “economically valuable” work across a broad range of real-world domains, according to Dawn Song, Meta Platforms’ new vice-president of AI research. “The goal is not to replace humans,” Song told the South China Morning Post last week on the sidelines of the World Economic Forum in Dalian, also known as Summer Davos, days before joining Meta. “But we want these AI agents to be more effective in these important real-world...

LessWrong AI 2026-06-29 00:50 UTC Score 61.0 USR-0152-20260629-community-fo-4257580e

A reading list for generalists

I, along with many others in AI safety, believe there is a shortage of generalists in the community and that there exist many projects and efforts that by default will not happen unless they are owned by a strong generalist [1] [2] [3] . As someone who is a reasonably good generalist, I decided to assemble a reading list of the essays and blog posts that have personally helped me the most. I would love others to comment with pieces they think should be on this list. The crux of this reading list is the idea that if you’re working hard as a generalist on a project you care a lot about, then by rigorously applying the lessons from these documents you will improve more quickly than you otherwise would. By the numbers: I’ve attached 18 documents to start this reading list. The authors cited more than once are Paul Graham (5), Ben Kuhn (4), Ethan Perez (2), and Greg Brockman (2). Sam Altman and Eliezer Yudkowsky also have their fingerprints over a lot of the content. The items are 15 blog posts, 1 blog comment, 1 interview transcript in blog post form, and 1 book. Dispositional What characteristics should you try to adopt? Paul Graham: "What We Look for in Founders" ( link ), "Relentlessly Resourceful" ( link ) Eliezer Yudkowsky: "Shut Up and Do the Impossible!" ( link ) Ben Kuhn: "Be impatient" ( link ) Cate Hall: "How to be more agentic" ( link ) Strategy How do you make good decisions with the information you have, and how can you get the additional information you need? Anna…

Simon Willison Weblog 2026-06-28 21:57 UTC Score 60.0 USR-0110-20260628-ai-specialis-e7b8495a

Quoting Jon Udell

Human Agent in the loop I dislike the phrase “human in the loop” because it cedes authority to the machines. Let’s flip the narrative. It’s our loop, we work the same way we always have, now we recruit agents to join the team. An agent-assisted process need not be a black box that takes in prompts and emits features. [...] Let’s do agentic software development like that. Not as a loop we’ve been excluded from, instead as one we invite agents into. — Jon Udell , “Doctor, it hurts when agents create unreviewable PRs.” “Don’t do that.” Tags: jon-udell , coding-agents , generative-ai , agentic-engineering , ai , llms

OpenAI Community 2026-06-28 19:27 UTC Score 63.0 AI-116-20260628-social-media-4b9bac18

Introducing GPT-5.6 series: Sol, Terra and Luna

The timing on this couldn’t be better. I run agentic systems daily - OpenClaw, Hermes, Claude Code orchestrating multiple AI workers. The bottleneck has always been cost at scale. Anthropic’s API pricing makes it brutal to run agents 24/7. You’re watching credits evaporate in real time. The fact that OpenAI allows third-party harnesses to tap into these models through an existing subscription changes the math completely. Looking forward to Sol Ultra powering my agents without per-token anxiety. And “Ultra” mode with subagents working together - that’s exactly where agentic AI needs to go. Thank you for making this accessible to builders, not just enterprises with infinite API budgets. Time to put these through their paces. I’ve got 6 DGX Sparks running great local model like Gemma4 and these 5.6 models are going to run it all.

Towards Data Science 2026-06-28 15:00 UTC Score 48.0 AI-036-20260628-ai-specialis-5cd29640

Tail Control: The Counterintuitive Engineering of Reliable Agentic Workflows

Behind a customer's API, a high-quality answer isn't enough. It has to be usable, which means on time. Delivering that consistently is a problem about variance, not speed, and the fixes are counterintuitive. The post Tail Control: The Counterintuitive Engineering of Reliable Agentic Workflows appeared first on Towards Data Science .

OpenAI Community 2026-06-28 14:48 UTC Score 53.0 AI-116-20260628-social-media-8889d4d6

Regression in multi-tool autonomous execution

I have an agent workflow using the n8n MCP integration. A week ago, ChatGPT could autonomously execute a chain of tools in a single response: Execute workflow Capture executionId Call get_execution(includeData=true) Inspect results Execute the next workflow Repeat until completion Return only the final result My workflow depends on sequential execution where each step consumes the previous step’s output. Currently, ChatGPT stops after the first or second tool invocation and returns control to the user, preventing autonomous orchestration, even though all required tools (execute_workflow, get_execution, etc.) are available. The exact same workflow and prompt continue to work in another LLM environment, suggesting a regression or runtime limitation rather than a prompt issue. It would be valuable to restore support for multi-step autonomous tool execution for agentic workflows.

Synced 2026-06-28 13:35 UTC Score 46.0 AI-041-20260628-ai-specialis-627190f5

Comment on Researchers from PSU and Duke introduce “Multi-Agent Systems Automated Failure Attribution by John Smith

As a college vocabulary club member, I've been looking for a reliable spelling bee free option, and this one finally delivered consistent puzzles without annoying limits. The interface loads fast and works well on mobile, which matters when you're squeezing in a quick round between tasks. If you're curious about how the scoring works, the spelling bee words by grade breakdown on the site explains it clearly. Definitely worth a look if daily word games are your thing. https://spellbees.us/

LessWrong AI 2026-06-28 13:20 UTC Score 86.0 USR-0152-20260628-community-fo-a4e4e87c Top pick

Evaluating Offline Monitoring of Internal AI Agents

This work was conducted during the GovAI Winter Fellowship 2026. Full report Executive Summary Frontier AI companies use offline monitoring to address risks from internally deployed AI agents. AI developers increasingly rely on AI agents for internal work, including for safety research and model training. At the same time, these companies are concerned that a misaligned model could exploit this access to take concerning actions, such as sabotaging efforts to understand the risks posed by AI. To identify such instances, AI companies have separate AI models called "monitors" that review transcripts of AI agents' actions and flag suspicious activity. Human reviewers examine activity flagged as suspicious by monitors, judge whether that activity is concerning, and decide on an appropriate response. This monitoring occurs offline, meaning that actions are reviewed after they have been executed rather than intercepted in real time. Companies currently assess the effectiveness of offline monitoring via synthetic attacks. To assess the effectiveness of offline monitoring, OpenAI and Anthropic use synthetic attacks – transcripts constructed to contain the kind of harmful actions a misaligned AI might take during deployment – and then check whether monitors flag them. Current reporting on assessments of effectiveness is insufficient. Given the information currently made public by Anthropic and OpenAI, external parties cannot assess the overall effectiveness of their offline monitoring…

The Decoder 2026-06-28 10:16 UTC Score 60.0 AI-168-20260628-regional-ai--8d3f58db

Only three AI models finished above starting capital in a 500-day startup survival test

Researchers at Princeton University built CEO-Bench, a test where AI agents have to run a fictional software company for 500 simulated days. Most current models go broke, and a simple rule-based heuristic with no AI beats nearly all of them. The article Only three AI models finished above starting capital in a 500-day startup survival test appeared first on The Decoder .

MarkTechPost 2026-06-27 08:38 UTC Score 54.0 AI-032-20260627-ai-specialis-5f02e1cd

Meta’s Astryx Brings a CLI and MCP Server to an Open-Source React Design System Agents Can Read

Meta released Astryx, an open-source React design system built on StyleX. It pairs a CSS-variable theme cascade with a CLI and MCP server, so both engineers and AI agents build using the same API. The project is in Beta, MIT-licensed, and grew inside Meta over eight years. The post Meta’s Astryx Brings a CLI and MCP Server to an Open-Source React Design System Agents Can Read appeared first on MarkTechPost .

MarkTechPost 2026-06-27 00:02 UTC Score 60.0 AI-032-20260627-ai-specialis-ad0ae3f2

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics

In this tutorial, we work with NVIDIA's Open-SWE-Traces dataset to study agentic software-engineering trajectories for fine-tuning. We stream the data directly from Hugging Face, so we can process it efficiently in Google Colab without downloading everything locally. We normalize multi-turn agent conversations, parse final code patches, and build an analysis DataFrame covering trajectory length, tool usage, patch size, language distribution, and resolution outcomes. We then curate a supervised fine-tuning subset using success labels, token limits, language filters, and patch availability. The post Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics appeared first on MarkTechPost .

OpenAI YouTube 2026-06-26 19:40 UTC Score 43.0 AI-146-20260626-podcasts-and-02bd5485

Builders Unscripted: Ep. 4 - Pietro Schirano

Pietro Schirano, Founder & CEO of MagicPath sits down with Romain Huet to talk about pushing the creative edges of GPT-5.5 and using Codex to turn ideas into software. 03:45 Images into sound 07:57 Multi-agent Codex workflows 14:34 Reviving hardware with Codex 25:27 From doing to directing

MarkTechPost 2026-06-26 19:31 UTC Score 49.0 AI-032-20260626-ai-specialis-53050502

Perplexity Launches Computer for Counsel: A Multi-Model Agentic Layer for Legal Workflows

Perplexity's Computer for Counsel extends Perplexity Computer to legal teams. It routes 20+ models across Midpage, MCP connectors, and Microsoft 365, with cited outputs lawyers can verify. The post Perplexity Launches Computer for Counsel: A Multi-Model Agentic Layer for Legal Workflows appeared first on MarkTechPost .

Simon Willison Weblog 2026-06-26 17:58 UTC Score 65.0 USR-0110-20260626-ai-specialis-602ff8e2

Incident Report: CVE-2026-LGTM

Incident Report: CVE-2026-LGTM Spectacular hypothetical incident report by Andrew Nesbitt. Day 2, 16:00 UTC --- Two AI review agents from competing vendors, both attached to a downstream pull request bumping foxhole-lz4 , enter a disagreement loop over whether the package is malicious. After 340 comments and $41,255 in inference spend, Finance revokes both API keys; one vendor's marketing team, cc'd on the cost anomaly alert, issues a press release citing "a 430% YoY increase in adversarial multi-agent security reasoning." The stock opens up 6%. Tags: security , ai , prompt-injection , generative-ai , llms , supply-chain , ai-security-research , andrew-nesbitt

Towards Data Science 2026-06-26 16:30 UTC Score 61.0 AI-036-20260626-ai-specialis-044daf0b

From Local LLM to Tool-Using Agent

Using Gemma 4, Ollama, OpenAI Agents SDK, and Tavily MCP to build a lightweight research agent The post From Local LLM to Tool-Using Agent appeared first on Towards Data Science .

iAfrica 2026-06-26 15:18 UTC Score 44.0 AI-151-20260626-regional-ai--d6c012b0

Paystack Launches AI Agent Checkout ‘Index’ in Nigeria, Letting Users Pay Through Claude, ChatGPT and OpenClaw

Paystack, the payments technology company owned by The Stack Group, has launched an experimental product that allows users in Nigeria to check out with supported Paystack merchants using AI agents. Paystack Index, developed with product support from TSG Labs — the group’s venture studio focused on building products using emerging technologies — builds on existing [...]

CIO AI 2026-06-26 14:42 UTC Score 44.0 USR-0125-20260626-global-ai-ne-e02832b3

pgEdge joins rush to merge OLTP and OLAP storage to support AI

For years, enterprises have maintained separate systems for processing transactional (OLTP) and analytical (OLAP) data, even if that meant moving data between them. However, the rise of autonomous agents and AI applications needing immediate access to data while generating volumes of operational data themselves, has exposed the cost and complexity of maintaining those separate systems. The industry’s response has been quick, with data warehouse and database vendors proposing a wave of competing approaches to collapsing those data silos. In the past few weeks Databricks unveiled LTAP and EDB introduced converged analytics , while late last year Snowflake launched pg_lake , all of which offer different blueprints for bringing transactional, analytical and AI workloads closer together. Now it’s the turn of distributed PostgreSQL provider pgEdge, which has introduced a beta version of ColdFront , a PostgreSQL-native hot-and-cold data tiering architecture that automatically moves older data into Apache Iceberg object storage while keeping PostgreSQL as the only database that applications need to interact with. In ColdFront’s architecture, hot and cold refer to newer and older data, respectively. The approach of keeping PostgreSQL as the primary interface is what sets ColdFront apart from the other architectures emerging in this space, differing in where the center of gravity for data lies, according to analysts. Databricks’ LTAP keeps operational applications connected to a lakeh…

AWS Machine Learning Blog 2026-06-26 14:38 UTC Score 55.0 AI-057-20260626-official-ai--c363810b

Production-grade AI agents for financial compliance: Lessons from Stripe

In this post, you learn how Stripe built a production-grade AI agent system for financial compliance. We cover the technical architecture of Stripe’s ReAct agent framework and the infrastructure decisions behind a dedicated agent service. We also discuss the role of human oversight in maintaining accountability, and key lessons about task decomposition, orchestration patterns, and cost optimization through prompt caching. By the end, you will understand how to design agentic systems that scale compliance operations without compromising quality or auditability.

InfoWorld AI 2026-06-26 14:36 UTC Score 44.0 USR-0126-20260626-global-ai-ne-76543213

pgEdge joins rush to merge OLTP and OLAP storage to support AI

For years, enterprises have maintained separate systems for processing transactional (OLTP) and analytical (OLAP) data, even if that meant moving data between them. However, the rise of autonomous agents and AI applications needing immediate access to data while generating volumes of operational data themselves, has exposed the cost and complexity of maintaining those separate systems. The industry’s response has been quick, with data warehouse and database vendors proposing a wave of competing approaches to collapsing those data silos. In the past few weeks Databricks unveiled LTAP and EDB introduced converged analytics , while late last year Snowflake launched pg_lake , all of which offer different blueprints for bringing transactional, analytical and AI workloads closer together. Now it’s the turn of distributed PostgreSQL provider pgEdge, which has introduced a beta version of ColdFront , a PostgreSQL-native hot-and-cold data tiering architecture that automatically moves older data into Apache Iceberg object storage while keeping PostgreSQL as the only database that applications need to interact with. In ColdFront’s architecture, hot and cold refer to newer and older data, respectively. The approach of keeping PostgreSQL as the primary interface is what sets ColdFront apart from the other architectures emerging in this space, differing in where the center of gravity for data lies, according to analysts. Databricks’ LTAP keeps operational applications connected to a lakeh…

South China Morning Post AI 2026-06-26 13:45 UTC Score 65.0 AI-156-20260626-regional-ai--0dea3d86

‘Digital ID cards’: China moves to regulate AI agents with unified identity system

China is establishing an identity system for artificial intelligence agents, as part of new national standards released on Friday to regulate the next frontier of autonomous technology. The State Administration for Market Regulation (SAMR) unveiled the standard for “Artificial Intelligence Agent Interconnection”, aiming to establish a “closed-loop system” with a unified identity management framework for all AI agents, according to a report from state broadcaster China Central Television...

OpenAI YouTube 2026-06-26 11:00 UTC Score 49.0 AI-146-20260626-podcasts-and-f146c0b4

Verso, l'entreprise qui ne dort jamais

Découvrez comment Verso développe une entreprise véritablement AI-native. Lors de cette session enregistrée lors de l'événement OpenAI France en juin 2026, Lydia Bellahouel, cofondatrice et CEO de Verso, partage la manière dont son équipe s'appuie sur les modèles OpenAI, Codex et des systèmes multi-agents pour automatiser ses opérations, accélérer la recherche consommateur et faire évoluer l'entreprise avec une structure particulièrement légère. Lydia explique comment fonctionne le « Verso Brain », un système conçu pour écouter, raisonner et agir de manière autonome, ainsi que les choix techniques et organisationnels qui permettent à Verso de livrer des études consommateurs jusqu'à dix fois plus rapidement et à moitié prix par rapport aux approches traditionnelles.

CIO AI 2026-06-26 10:00 UTC Score 58.0 USR-0125-20260626-global-ai-ne-6ce3e07f

Shaping a lasting AI strategy in a fast-changing world

AI is entering a phase of sustained enterprise adoption. As the technology rapidly advances, organizations are moving beyond isolated use cases and short-term efficiency gains and rethinking how they use AI to create value, meet changing customer expectations and evolve their operating models over the next several years. That requires a clear end goal, an honest assessment of current capabilities and a practical roadmap for moving from today’s reality to that end goal. Today, we are seeing five accelerating trends shaping how that transition is unfolding. LLMs are evolving into AgenticOS platforms Horizontal LLM providers like Anthropic and vertical AI companies like Harvey are moving beyond standalone AI models and building broader enterprise platforms. These platforms combine AI models with workflows, playbooks, integrations and governance tools inside a single environment, which are beginning to be described as an “AgenticOS.” As a result, the market is beginning to consolidate around a smaller number of platform providers that can simplify procurement, integration, spend management and data privacy compliance. Context windows have expanded by orders of magnitude Leading AI models can now process dramatically more information at once than they could just a few years ago, with the amount of information they can analyze in a single interaction expanding roughly 125× since 2023. That shift is making more complex, enterprise-scale work, like large-scale contract review, codeb…

CIO AI 2026-06-26 10:00 UTC Score 48.0 USR-0125-20260626-global-ai-ne-a06b9217

How AI is used as a key ingredient at Cosentino

The humble story of Cosentino starts in marble in southeastern Spain in 1945, and subsequent generations have gradually expanded into more diverse materials and color palettes, so now the company operates in more than 120 countries. And what also began in a small factory is now a vast complex exceeding 27 million square feet where machines, cranes, and robots move freely, loading pallets full of product destined for every corner of the globe. Together with partner Microsoft, Cosentino is tackling, like many others, how to most effectively adopt and maximize the potential of AI , and it will be the first industrial company in Spain to adopt the Microsoft Discovery platform. This technology, designed to accelerate scientific research, is particularly interesting to a company whose success is based on the discovery and validation of new materials for kitchens, facades, and interiors. width="1240" height="704" sizes="auto, (max-width: 1240px) 100vw, 1240px"> The Cosentino complex in Almería, Spain. GD | Foundry The research platform developed by Microsoft combines agentic AI, high-performance computing, and advanced KM to accelerate scientific and engineering processes by automating tasks such as literature reviews, hypothesis generation, simulations, and analyses, in order to integrate public and private data into a unified environment for researchers and engineers. For Cosentino, Discovery opens the door to anticipating optimal formulations before production, and reduces the n…

MarkTechPost 2026-06-26 08:00 UTC Score 57.0 AI-032-20260626-ai-specialis-e094029e

Build a Nanobot-Style AI Agent in Google Colab with Tool Calling, Session Memory, Skills, and MCP Servers

In this tutorial, we build a lightweight personal AI agent inspired by the architecture of nanobot, runnable entirely in Google Colab. We start from a provider abstraction, then add tool registration, session memory, lifecycle hooks, skills, and an MCP-style tool server. Rather than rely on an external framework, we recreate each building block ourselves to see how messages, tools, memory, and model responses fit together. The result is a provider-agnostic agent loop we can extend toward real LLM providers and production tools. The post Build a Nanobot-Style AI Agent in Google Colab with Tool Calling, Session Memory, Skills, and MCP Servers appeared first on MarkTechPost .

Machine Learning Mastery 2026-06-26 01:19 UTC Score 39.0 AI-039-20260626-ai-specialis-95517175

Comment on Agentic Workflow vs. Autonomous Agent: What’s the Difference? by Luis Fajardo

Shittu, great explanation, thank you. I was thinking about this today. Imagine this agentic workflow for a Label Creation process, multi-agent: Product Agent / Regulations Agent / Generation Agent / Verification Agent. If a new label is needed, the process pretty much goes in linear way from each of these steps. But then, imagine I put a chat interface on top of these agents, so users can do all kind of things, example: what regulations apply to this product ingredients (product and regulations agent involved), or, take this existing label and create a new label with this new regulation (regulations and generation agent). You get my idea…so question then. Is it fair to say that this is a agentic workflow, but also could be automatous agent when a chat interface is put in front for users to interact with its capabilities? Thank you for your feedback

GitHub Engineering 2026-06-25 22:59 UTC Score 61.0 USR-0062-20260625-ai-specialis-dea755c5

Evaluating performance and efficiency of the GitHub Copilot agentic harness across models and tasks

Explore how the GitHub Copilot agentic harness delivers strong results across multiple benchmarks and leading token efficiency, while maintaining flexibility to choose among more than 20 models. The post Evaluating performance and efficiency of the GitHub Copilot agentic harness across models and tasks appeared first on The GitHub Blog .

Simon Willison Weblog 2026-06-25 22:28 UTC Score 52.0 USR-0110-20260625-ai-specialis-fc5dac60

AI and Liability

AI and Liability Bruce Schneier and Nathan Sanders on the recent German ruling that Google be held liable for errors introduced in their AI overviews: AI agents are agents of the person or organization that deploys them—and should be treated by the law as such. If a company hired human writers to write its summaries, that company would be liable for inaccuracies in those summaries. [...] To allow businesses to hide behind the excuse of faulty AI in those same circumstances would be a massive handout to companies, and would introduce disastrous incentives for corporate misbehavior. Why hire human writers, lawyers or doctors when AIs are not only cheaper, but also absolve employers whenever they make a mistake? Tags: bruce-schneier , google , law , ai , generative-ai , llms , ai-ethics , hallucinations

Towards Data Science 2026-06-25 18:37 UTC Score 52.0 AI-036-20260625-ai-specialis-96bc9910

Vector RAG Isn’t Enough — I Built a Context Graph Layer for Multi-Agent Memory

I benchmarked raw chat history, vector-only RAG, and a context graph on the same multi-agent conversations. The results exposed a surprising weakness in relational retrieval. The post Vector RAG Isn’t Enough — I Built a Context Graph Layer for Multi-Agent Memory appeared first on Towards Data Science .

AWS Machine Learning Blog 2026-06-25 17:55 UTC Score 58.0 AI-057-20260625-official-ai--4d03d017

Retrofit, don’t rebuild: Agentic overlays for transforming legacy enterprise services

In this technical collaboration between AWS and the authors, we present a pragmatic solution: agentic overlays. Agentic overlays are thin wrapper layers that transform traditional REST-based services into agents capable of participating in A2A interactions. They also expose REST APIs as tools compatible with the Model Context Protocol (MCP). Together, they let enterprises add A2A capabilities to existing REST services without rewriting business logic, without duplicating code, and without running parallel infrastructures. This reduces agent sprawl in the infrastructure by reusing existing services as agents. We provide reference architectures and sample code that show how to build agentic overlays.

InfoWorld AI 2026-06-25 16:31 UTC Score 46.0 USR-0126-20260625-global-ai-ne-362bd1c3

Agentic AI security steals the spotlight at Confidential Computing Summit

For a decade, confidential computing has been chipping away at one of security’s hardest problems: data is well encrypted in transit and at rest, but when a processor works on it, that data sits in memory in the clear, exposed to anyone with privileged host access. “Confidential computing’s aim was to solve this with a trusted execution environment, a subset of the CPU that runs the encrypted workload and handles things like memory encryption,” said Marina Moore , lead security researcher at Edera . For years the field felt like post-quantum cryptography PhD research scientist types agreeing the work is essential, while waiting for it to reach mainstream practitioners. At the Confidential Computing Summit in San Francisco this week, the breakout use case came into focus: agentic AI. Like the web before HTTPS “I was in the really early days of HTTP, and then HTTPS came along pretty quickly,” said Mike Bursell , executive director of the Confidential Computing Consortium . He sees agentic AI where the web sat before certificate authorities and public key infrastructure brokered trust online. “The original agent specifications were not written by security architects,” Bursell said, and “some of it feels in need of refinement.” The gap confidential computing fills is attestation, which provides proof of what runs. The hardware hashes the memory and firmware of a protected execution environment and signs the result inside the chip, Bursell explained, producing a measurement a ver…

JetBrains AI Blog 2026-06-25 14:17 UTC Score 50.0 USR-0065-20260625-ai-specialis-08b9856a

Your AI Agent Keeps Missing The Real Bottleneck. JetBrains Rider Can Fix It Now.

Here’s a case worth pondering: your app freezes for ten seconds, and you ask an AI agent what’s wrong. What does it actually do? For a long time the honest answer was: it rummages through your code and takes a wild guess. A snapshot taken by a profiler tool is runtime evidence. It knows exactly […]

SiliconANGLE AI 2026-06-25 13:00 UTC Score 49.0 USR-0127-20260625-global-ai-ne-6fa4c101

Salesforce launches Help Agent to simplify AI customer service deployment

Salesforce Inc. is launching a new prepackaged artificial intelligence agent for customer service, enabling organizations to quickly build and deploy AI agents. Today Salesforce announced Help Agent, a prebuilt service agent set atop the Agentforce platform. It can be connected to company knowledge, actions and communication channels in minutes – including web, text and voice. […] The post Salesforce launches Help Agent to simplify AI customer service deployment appeared first on SiliconANGLE .

SiliconANGLE AI 2026-06-25 13:00 UTC Score 55.0 USR-0127-20260625-global-ai-ne-e09e350e

Exclusive: LucidLink launches MCP server to give AI agents shared access to distributed files

LucidLink Corp., the maker of a cloud network-attached storage system based on object storage technology, today extended its distributed file system technology into agentic artificial intelligence with the public beta release of a Model Context Protocol server that lets AI agents access shared files across clouds, on-premises systems and edge environments. The company said its […] The post Exclusive: LucidLink launches MCP server to give AI agents shared access to distributed files appeared first on SiliconANGLE .

InfoWorld AI 2026-06-25 11:21 UTC Score 51.0 USR-0126-20260625-global-ai-ne-12f322c7

New Linux Foundation project aims to bring DNS-style trust to AI agents

As enterprises deploy increasing numbers of AI agents across applications and organizations, the Linux Foundation on Wednesday announced plans to launch a new Agent Name Service framework designed to establish identity, ownership, and trust for these systems. The ANS framework , which is expected to allow systems and users to verify who an agent represents, what permissions it has, and whether its code and operational history remain authentic and unchanged, will be based on the existing Domain Name System (DNS) , the Foundation said in a statement. Just like DNS translates human-readable website names into internet addresses, ANS aims to create a standardized naming and discovery layer for AI agents, with the ability for enterprises to publish agent identities through domains they already control, enabling other agents and systems to verify who an agent represents and discover information about its capabilities and ownership before interacting with it, it added. This, the Foundation further added, creates a federated mechanism for agent discovery and verification without any reliance on any proprietary registry or centralized control. Growing demand for an agent identity framework ANS solves an emerging problem for enterprises, especially in scaling AI deployments, said Charlie Dai , principal analyst at Forrester, too. “The agent identity problem is already emerging in early production deployments, particularly where multiple agents interact across tools, APIs, and organiza…

Practical AI Podcast 2026-06-25 09:00 UTC Score 44.0 AI-143-20260625-podcasts-and-6fcc137b

AIUC-1: Building trust in AI agents

How do we build trust in AI agents before the AI hailstorm arrives? Emil Lassen from the Artificial Intelligence Underwriting Company (AIUC) joins the show to discuss how the enterprise flywheel of standards, certification, audit, and insurance is being applied to AI agents. They explore the AIUC-1 framework, the challenges of securing agentic AI systems, and why red teaming (based on standards) may be key to accelerating enterprise AI adoption. Featuring: Emil Lassen – LinkedIn Daniel Whitenack – Website , GitHub , X Links: Artificial Intelligence Underwriting Company Sponsors: Framer: The enterprise-grade website builder that lets your team ship faster. Get 30% off at framer.com/practicalai Prediction Guard: A self-hosted AI control plane for running agents in high impact environments. predictionguard.com/practicalai Upcoming Events: Register for upcoming webinars here ! Midwest AI Summit 2026

InfoWorld AI 2026-06-25 09:00 UTC Score 54.0 USR-0126-20260625-global-ai-ne-e86ad9d4

Building a state-of-the-art development platform with Backstage

Key takeaways Backstage solved the portal problem, not the platform problem. A portal organizes catalogs, documentation, and templates. A platform owns deployments, environments, policies, and runtime operations. Backstage assumes that the execution layer exists beneath it. Point-to-point integrations become a maintenance burden. Many organizations end up with a “messy middle” where Backstage is connected directly to CI/CD , GitOps , Kubernetes , and observability tools through custom wiring that’s fragile and hard to evolve. Abstractions are the interface between developers and infrastructure. Developers work with components, endpoints, and dependencies. Platform engineers work with environments, pipelines, and component types. The platform compiles both into Kubernetes resources. A control plane bridges the gap. It sits between the portal and runtime, compiling abstractions into infrastructure, enforcing policies consistently, reconciling drift, and aggregating runtime state back to the portal. Good abstractions enable advanced capabilities. Unified observability, automated guardrails, and AI agents that can reason about and act on your platform. All becomes possible when you have well-defined concepts and a control plane that understands both sides. … Start with Backstage If you’re building an internal developer platform , Backstage is certainly part of your architecture. It solved the discovery problem and became the default choice for developer portals. Before Backstage…

Stack Overflow AI Blog 2026-06-25 07:40 UTC Score 53.0 USR-0063-20260625-ai-specialis-c842909c

Code isn’t the only thing causing your production failures​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌…

Ryan sits down with Anish Agarwal, CEO and co-founder of Traversal, to chat about why AI coding agents have made writing code easier but running it safely in production harder, why production failures are really caused by interactions between systems and not just the code itself, and how teams can troubleshoot more effectively when traditional observability tools are not enough for agentic AI workflows.​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‍‌‌‍‍‌‌​‌‍‌‌‌‍‍‌‌​​‍‌‍‌‌‌‍‌​‌‍‍‌‌‌​​‍‌‍‌‌‍‌‍‌​‌‍‌‌​‌‌​​‌​‍‌‍‌‌‌​‌‍‌‌‌‍‍‌‌​‌‍​‌‌‌​‌‍‍‌‌‍‌‍‍​‍‌‍‍‌‌‍‌​​‌‌‍‌‍​‍​​​‌‍‌‌‌‍​‍​‌‌‌‍‌‍​​​​‍‌​​‌​​‍​​​‌​‍‌​‌​​‍​​‌‌‍‌‍​‍‌​‍​​‌​‌‍‌​​‍​​‍‌‌‍‌‍​‌‌‍​‌‌‍​‍‌‍‌‍​​‍​​​​‌​‍​‌‍​​​​‍‌​‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‌‍​‍‌‍​‌‌​‌‍‌‌‌‌‌‌‌​‍‌‍​​‌‌‍‍​‌‌​‌‌​‌​​‌​​‍‌‌​​‌​​‌​‍‌‌​​‍‌​‌‍​‍‌‌​​‍‌​‌‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‌‍‍‌‌‍‌​​‌‌‍‌‍​‍​​​‌‍‌‌‌‍​‍​‌‌‌‍‌‍​​​​‍‌​​‌​​‍​​​‌​‍‌​‌​​‍​​‌‌‍‌‍​‍‌​‍​​‌​‌‍‌​​‍​​‍‌‌‍‌‍​‌‌‍​‌‌‍​‍‌‍‌‍​​‍​​​​‌​‍​‌‍​​​​‍‌​‍‌‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‍‌‍‌​​‌‍‌‌‌​‍‌​‌​​‌‍‌‌‌…

NVIDIA Developer YouTube 2026-06-25 06:34 UTC Score 52.0 AI-144-20260625-podcasts-and-5a33373a

Spark Hack Toronto Winner Spotlight: Better Cities with Cracked City

NVIDIA Spark Hack Toronto brought developers together for a weekend challenge: build an agentic application that runs locally on DGX Spark using open models and Toronto Open Data. Teams tackled everything from small business forecasting and dementia care to city-scale traffic simulation — all on an ASUS Ascent GX10 powered by the NVIDIA GB10 Grace Blackwell Superchip. Cracked City took home Best Use of Nemotron for turning road and sidewalk damage reporting into a single step. Upload a photo, and their system analyzes the damage, estimates severity, and auto-generates a Toronto 311 report using image analysis, speech recognition, and NVIDIA Nemotron — all running locally on DGX Spark. Join us live to see a demo and chat with the winners. Bring your questions about building with Nemotron and local AI compute.

Analytics Vidhya 2026-06-25 06:25 UTC Score 36.0 AI-034-20260625-ai-specialis-c6b4964b

The Loop That Makes AI Agents Get Smarter on Their Own

Most AI agents are weirdly forgetful. They finish a task, wipe the slate clean, and show up tomorrow ready to repeat the same mistake. No memory, no growth. The self-improving loop breaks that cycle. The agent looks at its own results, learns what worked, and gets a little better each time. This guide explains the […] The post The Loop That Makes AI Agents Get Smarter on Their Own appeared first on Analytics Vidhya .

OpenAI News 2026-06-25 02:00 UTC Score 61.0 AI-044-20260625-official-ai--24d4c954

How agents are transforming work

A new OpenAI research paper shows how AI agents are transforming work, enabling longer, more complex tasks and expanding productivity across roles.

SiliconANGLE AI 2026-06-25 01:22 UTC Score 49.0 USR-0127-20260625-global-ai-ne-24e5954b

Agentic infrastructure startup Seltz raises $12.5M to help AI agents search the web for answers

Agentic search startup Seltz Inc. said today it has bagged $12.5 million in seed funding to build a more optimal infrastructure so that artificial intelligence agents can find their way around the web. The round was led by Speedinvest and B Capital. Also participating were Italian Founders Fund, Future Back Ventures, futurepresent, Arc Investors, Vento Ventures, Mango […] The post Agentic infrastructure startup Seltz raises $12.5M to help AI agents search the web for answers appeared first on SiliconANGLE .

InfoWorld AI 2026-06-25 00:48 UTC Score 53.0 USR-0126-20260625-global-ai-ne-a37d7604

AI coding token costs are on track to rival human payroll

Enterprises may soon be paying as much for their developers’ AI token usage as they do for their salaries. According to Gartner , these costs will meet, or even exceed, the typical software engineer’s monthly salary within the next two years. This is not only because developers are increasingly adopting generative AI and agentic tools , it reflects a trend toward consumption-based licensing models as vendors balance infrastructure investments with profitability. Rather than the flat per-seat SaaS model of the past, enterprises now pay for developer token use as well. Gartner senior principal analyst Nitish Tyagi explained that it’s important to note that Gartner’s prediction is based on a global average salary of $2,000 per month; it doesn’t mean AI token usage will exceed all salaries. For instance, in the US, yearly pay rates can be six digits or more. However, that kind of spend is not out of the realm of possibility, Tyagi emphasized. “I have heard scary numbers like ‘My developer consumed $20K last month,’ or ‘A business user consumed $32K’.” If these amounts sound shocking, that’s the point. “The goal is to alarm the industry about the impact of token cost if it is not governed and controlled,” he said. Lack of visibility, immature oversight Enterprises are quickly moving from experimentation to scaled deployment of AI coding agents , but many still underestimate token costs, Tyagi noted. This is because cost structures for software engineering workloads are “highly va…

AWS Machine Learning Blog 2026-06-24 18:20 UTC Score 49.0 AI-057-20260624-official-ai--d96c8f84

Build a healthcare appointment agent with Amazon Nova 2 Sonic

In this post, you will learn how to build a voice agent that handles appointment reminder conversations using Amazon Nova 2 Sonic and Amazon Bedrock AgentCore. The agent authenticates patients by voice, manages appointments (confirm, cancel, or reschedule), collects pre-visit health information, and escalates to human staff when needed. You handle routine calls at scale, which can help reduce no-show rates. This sample focuses on the agentic side of the problem: voice conversation and tool orchestration. A browser-based interface is included for testing. To connect the agent to actual phone lines for outbound dialing, you would integrate a telephony service such as Amazon Connect Customer.

SiliconANGLE AI 2026-06-24 16:30 UTC Score 47.0 USR-0127-20260624-global-ai-ne-fed375d1

HelloTwin launches ‘Digital Authority’ to bring governed AI agents to the enterprise

HelloTwin.ai GmbH today announced what it calls an accountable artificial intelligence AI twin that holds business intelligence and goals in a single source of truth. HelloTwin said it built its data model on a patent-pending compiler designed to pull answers from business context rather than generate them. This means that the agentic identity for the […] The post HelloTwin launches ‘Digital Authority’ to bring governed AI agents to the enterprise appeared first on SiliconANGLE .

Entrackr AI 2026-06-24 16:24 UTC Score 56.0 USR-0212-20260624-regional-new-9d91abfb

Former Infosys CEO Vishal Sikka's Hang Ten Systems raises $32 Mn Seed round

Enterprise AI services startup Hang Ten Systems has raised $32 million in a seed funding round led by Mayfield, with participation from Aramco Ventures and a group of angel investors. Founded by former Infosys CEO and SAP executive board member Vishal Sikka, the Palo Alto-based company helps large enterprises deploy artificial intelligence across business operations and software systems. The fresh capital will be used to expand the company's team and scale its engagements with global enterprises, the company said in a press release. Hang Ten focuses on building and operating enterprise software using AI-native approaches, including agentic code generation, reusable skills libraries, and domain-specific expertise. The company aims to help enterprises reduce the cost and time required for software development, customization, integration, and maintenance. In a LinkedIn post announcing the launch, Sikka said Hang Ten is already working with large enterprises, including Fresenius and Siemens entities, to help them deploy AI across business operations. He described the company as an enterprise AI services firm focused on helping organizations adopt AI at scale. "Every single enterprise will be transformed by AI. A few are already reaping massive benefits, building in days what used to take years. But most are stuck at the starting line, or worse, and the gap is widening every day," said Sikka. According to the company, it is currently working with customers across several industri…

SiliconANGLE AI 2026-06-24 15:31 UTC Score 36.0 USR-0127-20260624-global-ai-ne-3132f987

AI agents are changing work — and Dell’s John Roese says it’s just beginning

To gain a better understanding of the longer-term impact that autonomous agents will have on the nature of work, Dell Technologies Inc. has been taking a closer look at how AI is already changing how work gets done. This has been a central focus for John Roese (pictured), Dell’s global chief technology officer and chief […] The post AI agents are changing work — and Dell’s John Roese says it’s just beginning appeared first on SiliconANGLE .

KDnuggets 2026-06-24 10:00 UTC Score 48.0 AI-033-20260624-ai-specialis-15fbad34

Top 7 Coding Models You Can Run Locally in 2026

Explore the best local coding models for private AI coding, fast GGUF inference, agentic workflows, multimodal development, and running powerful open models on your own GPU.

InfoWorld AI 2026-06-24 09:00 UTC Score 44.0 USR-0126-20260624-global-ai-ne-7b57774f

Open source grapples with agentic coding

Unless you’ve been living under an old woodpile in your backyard, you have certainly seen how agentic coding is rocking the software development world. Things are happening fast and furious, and keeping up is practically a full-time job. The latest area that is catching the attention of developers is how agentic coding is affecting the open source community. The open source movement has been defending the rights of folks to use, change, and contribute to software for many years. And of course, agentic coding is starting to become part of that process. On the one hand, maintainers of open source projects rightfully are frustrated as they become overwhelmed with pull requests of dubious quality and usefulness being submitted by coding agents. On the other hand, as David Heinemeier Hansson notes , maintainers are starting to get a little snooty about accepting AI-written code, viewing it as somehow not worthy of being included. Some organizations have explicitly banned AI-generated submissions . I get that they don’t want AI slop overwhelming their input queues. But I think it is a huge mistake to ban AI-written code outright. Whose code? Before I dig deeper into that notion, it’s important to look at another issue that arises from all of this: Who actually owns the code that AI writes? Copyright requires that a human produce the thing being copyrighted. If you prompt Claude Code with “Write me a CMS system” and then Claude writes you a CMS system that you check into a public G…

Artificial Intelligence News 2026-06-24 09:00 UTC Score 53.0 AI-029-20260624-ai-specialis-7a0ffc70

Anthropic drops ‘workplace AI agents’ directly inside Slack

Anthropic launched a beta version of its Claude Tag feature for Enterprise and Team tiers, shifting its chat model into shared Slack channels. Moving away from traditional isolated chat boxes, users pull the artificial intelligence model into active group threads by typing @Claude. The integration allows any team member in the channel to delegate a task, review […] The post Anthropic drops ‘workplace AI agents’ directly inside Slack appeared first on AI News .

NVIDIA Developer YouTube 2026-06-24 07:02 UTC Score 77.0 AI-144-20260624-podcasts-and-1a7a6306

Nemotron Office Hours: The Nemotron 3 Model Family | Nemotron Labs

NVIDIA has released the full Nemotron 3 open model family — Ultra, Super, Nano, and Nano Omni. This office hours session covers each model in the series, and any questions you have about Nemotron 3 in general — what it's built for, when to use it, and what's available in open weights, training datasets, and fine-tuning recipes. What we'll cover: - Nemotron 3 Ultra — 550B MoE frontier reasoning model for long-running autonomous agents: 5x faster inference, up to 30% lower cost, hybrid Mamba-Transformer architecture, and MOPD training for consistent performance across agent harnesses - Nemotron 3 Super — mid-range 120B model targeting enterprise applications that need strong reasoning for multi-agent applications - Nemotron 3 Nano — 30B MoE with 3B active parameters, built for high-volume execution, highly accurate sub-agent accomplishing targeted tasks - Nemotron 3 Nano Omni — multimodal (text, image, audio, video) model purpose-built for targeted specialized agentic tasks - Open weights, training datasets, and fine-tuning recipes — what's available across the family and how to customize for your domain Building with or evaluating the Nemotron 3 family? Bring your questions — whether you're choosing between models, fine-tuning for your domain, or deploying at scale, the team will answer them live.

SiliconANGLE AI 2026-06-24 01:25 UTC Score 47.0 USR-0127-20260624-global-ai-ne-77dfc465

Anthropic debuts Claude Tag, a more capable AI teammate that lives within Slack

Anthropic PBC today unveiled a new version of its chatbot Claude that lives inside Slack, where it operates like a virtual employee. It’s called Claude Tag, and it’s designed to work across entire organizations, helping multiple employees complete tasks for related projects. It builds on existing agentic artificial intelligence tools offered by Anthropic, including Claude Code […] The post Anthropic debuts Claude Tag, a more capable AI teammate that lives within Slack appeared first on SiliconANGLE .

NVIDIA Developer YouTube 2026-06-24 00:22 UTC Score 52.0 AI-144-20260624-podcasts-and-05b3daa4

How NVIDIA Blackwell and NVIDIA Dynamo Scale AI Agents for Production

AI agents place new demands on inference infrastructure. Unlike a single chatbot response, an agentic workflow can involve many LLM calls, tool calls, long context windows, and repeated cache reuse across a task. NVIDIA Blackwell is designed to handle these production-scale agent workloads with high throughput, low latency, and improved energy efficiency. This livestream explains how NVIDIA Blackwell helps developers scale AI agents in production, using AgentPerf results as one example of its performance on real-world coding-agent workloads. We’ll also cover how NVIDIA Dynamo adds software-level optimizations for routing, scheduling, and KV cache management. What you’ll learn: Why AI agents require different infrastructure than standard chat applications. How NVIDIA Blackwell improves throughput and efficiency for concurrent agent workloads. What AgentPerf results show about Blackwell performance on realistic agentic coding tasks. How Dynamo optimizes inference with agent-aware routing, scheduling, and KV cache reuse. What developers should consider when deploying AI agents at production scale.

SiliconANGLE AI 2026-06-23 23:52 UTC Score 39.0 USR-0127-20260623-global-ai-ne-5f3f31f1

Nvidia bets on agentic AI to turbocharge biotech discovery

Artificial intelligence played a prominent role at this week’s Bio International Convention in San Diego, the largest biotech event with vendors spanning the full ecosystem of companies in this industry. Today in a special address, Kimberly Powell (pictured), vice president and general manager of healthcare and life sciences at Nvidia Corp., made the case that agentic AI […] The post Nvidia bets on agentic AI to turbocharge biotech discovery appeared first on SiliconANGLE .

SiliconANGLE AI 2026-06-23 20:52 UTC Score 38.0 USR-0127-20260623-global-ai-ne-1a50a0d0

9 ways AI is reshaping enterprise operations: Key insights from AWS Summit NYC

The conversations at last week’s AWS Summit NYC 2026 showed that AI evolution is entering a new phase. From physical robots tackling labor shortages to agentic systems reshaping enterprise operations, the focus is shifting from experimentation to practical deployment. TheCUBE’s host, Gemma Allen, captured candid discussions with Amazon Web Services Inc. executives, partners and customers who are turning […] The post 9 ways AI is reshaping enterprise operations: Key insights from AWS Summit NYC appeared first on SiliconANGLE .

InfoWorld AI 2026-06-23 16:45 UTC Score 36.0 USR-0126-20260623-global-ai-ne-cfbd4ad8

EDB converges analytics on Postgres to support AI agents

Separating transactional databases from analytical systems was, until recently, considered good architecture. Now, as enterprises adopt AI agents that continuously read, reason over, and act on business data, data warehouse and database vendors are increasingly deciding that separation has become a liability. Just weeks after Databricks unveiled its Lakehouse Transaction and Analytical Processing (LTAP) offering based on Neon Postgres to bring operational (OLTP) and analytical (OLAP) processing closer together, EnterpriseDB (EDB) has introduced converged analytics capabilities for its managed EDB Postgres AI database service with the same intent. Both vendors are responding to the same pressure of enabling AI agents for enterprises to operate on fresh operational data without waiting for pipelines and replicas, but EDB argues its approach starts from a fundamentally different place. “Databricks is building from the lakehouse outward, trying to pull transactional capability in through Lakebase,” said Max Romanenko , chief engineering officer at EDB, while “we’re building from the operational layer with Postgres , which is where enterprises already run their most critical workloads, and expanding from there.” In contrast to Databricks’ lakehouse-centric LTAP, EDB keeps Postgres as the operational source of truth and uses Apache Iceberg as a shared catalog layer connecting Postgres with ClickHouse , WarehousePG, and Spark compute engines, Romanenko said. In this way, operationa…

Google DeepMind YouTube 2026-06-23 15:48 UTC Score 61.0 AI-145-20260623-podcasts-and-6366ba2d

When millions of AI agents meet

The conversation of the moment is focused on one topic: AI agents. Unlike traditional language models that simply respond to a prompt, autonomous agents can execute multi-step plans and perform complex tasks on your behalf. But what happens when millions of these agents are not just working for us, but transacting, negotiating, and delegating to one another? Nenad Tomašev, Senior Staff Research Scientist at Google DeepMind, joins host Hannah Fry to discuss the theoretical framework of a future"agentic economy." Together, they discuss the operational shift from single systems to a cooperative "society of specialists," the psychological risk of human automation bias, and the complex cybersecurity landscape—from dynamic cloaking to agentic traps—required to keep distributed intelligence secure. Timecodes: 00:00 Intro 1:07 Defining AI agents 4:44 Agentic exploration in science and research 15:46 Delegation between agents 22:46 Agentic security and traps 29:31 Building an agentic economy 33:22 Cognitive monoculture 36:29 Distributed intelligence To read the research, search for: Distributional AGI Safety, May 2026 Intelligent AI Delegation, February 2026 Virtual Agent Economies, September 2025 Learn more about our AGI control roadmap: https://deepmind.google/blog/securing-the-future-of-ai-agents/ ___ Subscribe to our channel https://www.youtube.com/@googledeepmind Find us on X https://x.com/GoogleDeepMind Follow us on Instagram https://instagram.com/googledeepmind Add us on Linke…

MongoDB AI Blog 2026-06-23 15:27 UTC Score 69.0 USR-0070-20260623-ai-specialis-63fbc659

Build Trust in Agentic AI: From POC to Production

The enterprise adoption of artificial intelligence has reached an inflection point. Organizations are rapidly moving into the era of agentic AI, autonomous systems capable of executing complex reasoning and making operational decisions independently. Yet as executives attempt to transition agents from sandbox environments into mission-critical production channels, they inevitably collide with an AI trust gap. Unlike traditional applications, agentic solutions interpret intent and take autonomous action on behalf of your business. Traditional IT tools are not designed to manage dynamic solutions. To scale securely, organizations must deploy a proactive control plane that evaluates an agent's logic and employs strict governance. In this article, we outline a four-step approach for building safety and optimization into agentic solutions. The approach outlines a broad framework that can be tailored to an organization’s specific needs. The 4-step framework for agentic trust To close the AI trust gap, enterprises need to consider reliability, predictability, accountability, and optimization. To achieve this, we’ve outlined four steps with their critical capabilities. Figure 1. The Agentic Trust Framework: Four steps from reliability to optimized outcomes. Build Trust in Agentic Systems Blog - Image 1 media Customer refund example To anchor this framework, let us follow a common high-stakes workflow, a customer refund. On the surface, resolving a customer refund is a straightforwar…

Analytics Vidhya 2026-06-23 12:30 UTC Score 47.0 AI-034-20260623-ai-specialis-d20f90ab

Sakana Fugu: Multi-Agent System as a Model

For years, AI progress has centered on scaling individual foundation models: larger parameters, longer context windows, stronger reasoning, and better tool use. Sakana AI’s Fugu points elsewhere, behaving like one model from the outside while coordinating multiple expert agents internally. A single API call can trigger direct answering, specialist delegation, intermediate verification, and final synthesis, […] The post Sakana Fugu: Multi-Agent System as a Model appeared first on Analytics Vidhya .

InfoWorld AI 2026-06-23 09:00 UTC Score 63.0 USR-0126-20260623-global-ai-ne-ff44453e

The missing layer in enterprise agentic AI

In the past year, the enterprise AI ecosystem has gained enormous capability and zero consensus. Developers now have a remarkable set of tools for building AI agents: OpenAI’s frameworks, Anthropic’s Claude tooling, LangChain, LangGraph, CrewAI, Microsoft AutoGen, and a growing list of alternatives. Each promises to coordinate reasoning loops, manage multi-step task execution, and connect agents to tools and APIs. For experimentation, the progress has been substantial. Teams can now assemble sophisticated agent workflows in days that would have taken months two years ago. But I’ve watched this pattern before. In over two decades of building and selling distributed systems platforms, I’ve seen the same dynamic play out across nearly every major infrastructure shift: the tools for consuming a new capability arrive before the infrastructure for governing it does. The gap that emerges isn’t immediately obvious in development environments. It becomes obvious in production. That’s exactly where enterprise AI stands today. What agent frameworks don’t handle Modern agent frameworks are fundamentally coordination systems. They determine what a system should do: which tools to call, how to sequence tasks, how to delegate work across agents. That’s hard work, and they’ve gotten quite good at it. What they rarely address is where those tasks are allowed to run, and under what conditions. Take a seemingly simple workflow: summarize customer support transcripts using an LLM. In a developm…

NVIDIA Blog 2026-06-23 06:00 UTC Score 61.0 AI-055-20260623-official-ai--de8964e1

NVIDIA Brings Trusted, 24/7 AI Agents to Telecom Operations

Telecom operators have seen remarkable returns from using generative AI to automate network management, customer care and back-office operations. Most of that impact has been task‑based: automation that speeds up predetermined steps while people manually correlate insights and direct next steps. Automation is no longer the finish line — it’s the launchpad to autonomy. The […]

Simon Willison Weblog 2026-06-22 23:43 UTC Score 86.0 USR-0110-20260622-ai-specialis-2d1def08 Top pick

Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code

This morning on Hacker News I saw Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance , describing a small but effective inpainting model - a model where you can mark regions of an image to remove and the model imagines what should fill the space. The released model required PyTorch and NVIDIA CUDA , but since it described itself as 0.2B I decided to try and get it running using WebGPU in a browser. TL;DR: I got it working, and you can try the demo at simonw.github.io/moebius-web/ . Read on for the details. The finished tool Here's a video demo of the finished tool: You can open any image in it (non-square images get letterboxed), highlight areas to remove, click the "Run inpaint" button and wait for the model to do its magic. A parallel agent side-project My main project for today was landing a major feature in Datasette: a UI for creating and altering tables, as a follow-up to the insert and edit rows feature I released last week. I was working on that in Codex Desktop (here's the PR ) and often found myself spending 5-10 minutes spinning my fingers waiting for it to complete a mid-sized refactor or add the finishing touches to a change to the UI. (An amusing thing about coding agents is that the harder a problem is the more time you have to get distracted while you wait for them to finish crunching!) So I decided to spin up Claude Code in a terminal window and see how far I could get at porting Moebius to the web. Some agentic research to kick…

AWS Machine Learning Blog 2026-06-22 17:53 UTC Score 46.0 AI-057-20260622-official-ai--2854b398

Building pay-per-intelligence for AI agents: How Ampersend uses Amazon Bedrock AgentCore Payments

In this post, you will learn how Ampersend built a pay-per-intelligence routing layer on top of Amazon Bedrock AgentCore Payments. AI agents autonomously route tasks to the most effective model, pay per request, and operate within spending budgets. You will also see how the two-hop payment pattern works end-to-end and how to get started with your own implementation.

InfoWorld AI 2026-06-22 16:30 UTC Score 36.0 USR-0126-20260622-global-ai-ne-32b3a26c

AWS Continuum offers devs help with securing code

AI coding agents are making it easier than ever to produce software. Ensuring that software is secure before deployment is another matter — one that AWS thinks AI should help with too. As enterprises adopt agentic development workflows, the volume of first-party code being created and modified is rising rapidly. Yet the process of validating vulnerabilities, determining whether they are exploitable, and fixing them often still depends on developers and security teams working through findings manually. AWS is aiming to address that imbalance with Continuum, a new service designed to continuously discover, investigate, and remediate vulnerabilities in enterprise environments, whether the code is their own or from third parties. Rather than simply generating alerts, the service is intended to help enterprises move findings through the entire remediation lifecycle, AWS VP of Security and Observability Chet Kapoor wrote in a blog post . For first-party applications, Continuum can analyze code, validate whether vulnerabilities are exploitable, generate remediation recommendations, and propose fixes that can be reviewed through existing software development workflows, helping developers address security issues without requiring security teams to manually investigate every finding, Kapoor said. Once users think Continuum has learned enough about their environment and understands their guardrails, they can put it in what AWS calls “enforce mode” to autonomously fix any code lapses, K…

Artificial Intelligence News 2026-06-22 16:11 UTC Score 44.0 AI-029-20260622-ai-specialis-15ef7d1b

Mitigating vendor lock-in with Sakana AI Fugu multi-agent models

Sakana AI launched Fugu to orchestrate multi-agent operations and mitigate single-vendor dependency risks in enterprise deployments. Enterprises face operational vulnerabilities when relying entirely on monolithic AI APIs. Japanese AI firm Sakana AI designed Fugu as a response to these concentration risks by creating an orchestration language model that calls upon a pool of varied models […] The post Mitigating vendor lock-in with Sakana AI Fugu multi-agent models appeared first on AI News .

NVIDIA Blog 2026-06-22 13:00 UTC Score 48.0 AI-055-20260622-official-ai--2ace5aa8

NVIDIA Vera CPU Opens the Way for Agentic Scientific AI at Los Alamos National Laboratory

Mission, Vision and Veritas — new Los Alamos National Laboratory (LANL) supercomputers to be built with HPE and NVIDIA — are tapping NVIDIA Vera CPUs to accelerate scientific discovery, unlocking agentic AI for science. The supercomputers will use the HPE Cray Supercomputing GX5000 architecture with the NVIDIA Vera Rubin platform, combining NVIDIA Vera CPUs, NVIDIA […]

NVIDIA Blog 2026-06-22 13:00 UTC Score 51.0 AI-055-20260622-official-ai--5593704f

Eco Wave Power Turns Waves Into Watts With NVIDIA AI Infrastructure and Digital Twins

The next era of AI will not be defined by compute alone. Its growth will be determined by energy. As accelerated computing scales across AI factories, agentic AI, industrial AI, edge computing and physical AI — including robotics and autonomous systems — global electricity demand is rising at unprecedented speed. In many regions, expanding grid […]

Simon Willison Weblog 2026-06-21 22:01 UTC Score 46.0 USR-0110-20260621-ai-specialis-93e5f67a

Temporary Cloudflare Accounts for AI agents

Temporary Cloudflare Accounts for AI agents The announcement says this is "for AI agents" but (as is pretty common these days) the AI hook isn't really necessary, this is an interesting feature for everyone else as well. Short version: you can now create a Cloudflare Workers project and run this, without even creating a Cloudflare account: npx wrangler deploy --temporary Cloudflare will deploy the application to a new, ephemeral project which will stay live for 60 minutes. I had GPT-5.5 xhigh in Codex Desktop build this test application providing a tool for following HTTP redirects and returning the final destination. The temporary deployment worked as advertised. Running the deployment spits out the URL to a page for claiming the new project, for if you want it to last for more than 60 minutes. Here's what that claim screen looks like: Via Hacker News Tags: cloudflare

Two Minute Papers 2026-06-19 14:06 UTC Score 36.0 AI-139-20260619-podcasts-and-ae508afa

Scientists Found A Better Language For AI Agents

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper is available here: https://recursivemas.github.io/ https://github.com/RecursiveMAS/RecursiveMAS Brain reading video: https://www.youtube.com/watch?v=IUg-t609byg 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

Artificial Intelligence News 2026-06-19 14:02 UTC Score 31.0 AI-029-20260619-ai-specialis-0d45745f

SAP and Google Cloud deploy agentic commerce architecture

SAP and Google Cloud are deploying agentic commerce architecture to automate multi-agent marketing and retail operations at enterprise scale. SAP research indicates 78 percent of businesses consider AI essential for retaining customers in 2026. However, the same data reveals fewer than two in five companies share customer data across customer experience (37%) or CRM (39%) […] The post SAP and Google Cloud deploy agentic commerce architecture appeared first on AI News .

Stack Overflow AI Blog 2026-06-19 14:00 UTC Score 41.0 USR-0063-20260619-ai-specialis-e1221474

Dispatches from O'Reilly: From capabilities to responsibilities​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍…

Designing contract-bound AI agents for high-stakes execution.​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‍‌‌‍‍‌‌​‌‍‌‌‌‍‍‌‌​​‍‌‍‌‌‌‍‌​‌‍‍‌‌‌​​‍‌‍‌‌‍‌‍‌​‌‍‌‌​‌‌​​‌​‍‌‍‌‌‌​‌‍‌‌‌‍‍‌‌​‌‍​‌‌‌​‌‍‍‌‌‍‌‍‍​‍‌‍‍‌‌‍‌​​‌​​​​​‌‍​‍​‍‌​‍‌​‌‌‌‍‌‍​‌​‍‌​‌‌‍​‌‍​‍​‍‌​‍‌​‌​‌‍‌​‌‍‌​​‍​​‍‌‌‍​‍‌‍‌‍​‌​​‌​‍‌‌‍‌​​​​​‌‍​‍​​‌‌​‍‌​‌​​​​‌‍​‌​​​​‍‌‍​‍​‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‌‍​‍‌‍​‌‌​‌‍‌‌‌‌‌‌‌​‍‌‍​​‌‌‍‍​‌‌​‌‌​‌​​‌​​‍‌‌​​‌​​‌​‍‌‌​​‍‌​‌‍​‍‌‌​​‍‌​‌‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‌‍‍‌‌‍‌​​‌​​​​​‌‍​‍​‍‌​‍‌​‌‌‌‍‌‍​‌​‍‌​‌‌‍​‌‍​‍​‍‌​‍‌​‌​‌‍‌​‌‍‌​​‍​​‍‌‌‍​‍‌‍‌‍​‌​​‌​‍‌‌‍‌​​​​​‌‍​‍​​‌‌​‍‌​‌​​​​‌‍​‌​​​​‍‌‍​‍​‍‌‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‍‌‍‌​​‌‍‌‌‌​‍‌​‌​​‌‍‌‌‌‍​‌‌​‌‍‍‌‌‌‍‌‍‌‌​‌‌​​‌‌‌‌‍​‍‌‍​‌‍‍‌‌​‌‍‍​‌‍‌‌‌‍‌​​‍​‍‌‌

Cloudflare AI Blog 2026-06-19 13:00 UTC Score 43.0 USR-0067-20260619-ai-specialis-528d0e93

Temporary Cloudflare Accounts for AI agents

The moment an agent needs to deploy something, it slams face-first into a wall built for humans. Today we're rolling out Temporary Accounts on Cloudflare Workers. Any agent can now run wrangler deploy — temporary and get a live Worker in seconds.

AI Alignment Forum 2026-06-18 16:50 UTC Score 63.0 USR-0151-20260618-community-fo-ac147592

GDM AI Control Roadmap

GDM has published an AI Control Roadmap ! From the executive summary: We present the GDM AI Control Roadmap (v0.1) – our plan for implementing and adopting internal guardrails designed to catch potential adversarial behaviour by AI agents, even as they become increasingly harder to oversee and contain. We focus on system-level mitigations that limit the harm a misaligned AI system could cause. Specifically, this report provides: • Threat modelling : Taking inspiration from cybersecurity, we adopt a conservative, worst-case approach to threat modelling throughout this paper, and assume a hypothetical AI adversary pursuing undesirable goals in internal deployment. We introduce TRAIT&R, a taxonomy of tactics and techniques available to such a hypothetical AI adversary, building on the established security framework MITRE ATT&CK. We categorise new hypothetical threats into three core areas: loss of control (e.g., rogue internal deployments), work sabotage (e.g., intentionally flawed alignment and safety research), and direct harm (e.g., critical asset destruction or exfiltration). • Control invariants : We establish two defensive invariants that we aim to maintain, even as the hypothetical AI adversary becomes increasingly capable: i) reliable detection of misaligned intent or actions, and ii) effective prevention and response to attack attempts. For control to serve as an effective line of defence, our ability to detect and prevent attacks should exceed AI agents’ ability to ev…

Arize AI Blog 2026-06-18 14:00 UTC Score 51.0 USR-0079-20260618-ai-specialis-894e59db

What is an agent harness? Why harnesses are replacing agent frameworks

Agent harnesses are replacing frameworks as the real product surface for reliable AI agents, shifting the work from prompt tuning to loops, tools, traces, evals, and operational metrics. The post What is an agent harness? Why harnesses are replacing agent frameworks appeared first on Arize AI .

NVIDIA Blog 2026-06-18 06:00 UTC Score 62.0 AI-055-20260618-official-ai--4751d22d

France Advances Europe’s AI Future With NVIDIA Technologies

A year ago at NVIDIA GTC Paris at VivaTech, France laid out plans to advance local AI — from new AI factories and national compute capacity to open frontier models and industrial platforms. Now, that AI infrastructure is coming online. AI agents are running in production, startups are deploying applications and the French AI ecosystem […]

Simon Willison Weblog 2026-06-17 23:58 UTC Score 68.0 USR-0110-20260617-ai-specialis-1ddceea5

GLM-5.2 is probably the most powerful text-only open weights LLM

Chinese AI lab Z.ai released GLM-5.2 to their coding plan subscribers on June 13th, and then yesterday (June 16th) released the full open weights under an MIT license. Similar in size to their previous GLM-5 and GLM-5.1 releases this is a 753B parameter, 1.51TB monster - with 40 active parameters (Mixture of Experts). GLM-5.2 is a text input only model - Z.ai have a separate vision family most recently represented by GLM-5V-Turbo , but that one isn't open weights. GLM-5.2 has a 1 million token context window, up from GLM-5.1's 200,000. The buzz around this model is strong. Artificial Analysis, who run one of the most widely respected independent benchmarks: GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index . GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43) They did however find it to be quite token-hungry: GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k) The model is also now ranked 2nd on the Code Arena WebDev leaderboard , behind only Claude Fable 5. That leaderboard measures "front-end web development tasks, including agentic coding workflows". I'm impressed to see it rank so highly given the lack of image input, which I had incorrectly assum…

DeepLearning.AI YouTube 2026-06-17 15:00 UTC Score 51.0 AI-138-20260617-podcasts-and-73e9c00c

Voice for AI Agents and Applications

Learn more: https://bit.ly/4vPQ3HE Voice is one of the most natural human interfaces, but adding it to AI applications has historically forced a tradeoff: fast voice-to-voice models that sacrifice reliability, or accurate speech-to-text-to-LLM-to-speech pipelines that add latency. This course teaches you how to get both, using Vocal Bridge's architecture that pairs a real-time foreground agent with a reasoning background agent. Taught by Ashwyn Sharma, CEO and Co-Founder of Vocal Bridge (an AI Fund portfolio company), this course covers three practical integration patterns that meet you where you are: voice embedded in an application, voice layered onto an existing agent without touching its logic, and voice as a tool your LLM can call when it decides a conversation is the right modality. In detail, you'll survey the traditional voice stack and its tradeoffs, then explore three live integration patterns to understand when each one applies. Build a voice-interactive tic-tac-toe game where voice commands and mouse clicks work together over a single synchronized channel, then add a voice layer to an existing agent with minimal code, leaving your prompts, RAG pipeline, and tools untouched. Give your agent a make_phone_call tool so it can dial a real number, hold a conversation with a demo agent, and stream the transcript back live. Set up evaluation-driven development using Vocal Bridge's multimodal evaluator to score calls, catch regressions, and refine prompts before issues re…

Amazon Science AI 2026-06-17 14:32 UTC Score 65.0 AI-058-20260617-official-ai--64e19b4c

TRAJECT-Bench: A trajectory-aware benchmark for evaluating agentic tool use

Large language model (LLM)-based agents increasingly rely on tool use to complete real-world tasks. While existing works evaluate the LLMs' tool use capability, they largely focus on the final answers yet overlook the detailed tool usage trajectory, i.e., whether tools are selected, parameterized, and ordered correctly. We introduce TRAJECT-Bench, a trajectory-aware benchmark to comprehensively evaluate LLMs' tool use capability through diverse tasks with fine-grained evaluation metrics. TRAJECT-Bench pairs high-fidelity, executable tools across practical domains with tasks grounded in production-style APIs, and synthesizes trajectories that vary in breadth (parallel calls) and depth (interdependent chains). Besides final accuracy, TRAJECT-Bench also reports trajectory-level diagnostics, including tool selection and argument correctness, and dependency/order satisfaction. Analyses reveal failure modes such as similar tool confusion and parameter-blind selection, and scaling behavior with tool diversity and trajectory length where the bottleneck of transiting from short to mid-length trajectories is revealed, offering actionable guidance for LLMs' tool use.

Stack Overflow AI Blog 2026-06-17 14:00 UTC Score 41.0 USR-0063-20260617-ai-specialis-34750d73

AI agents are a confused deputy with the keys to your kingdom​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍…

How attackers took twenty thousand Instagram accounts by asking Meta's AI politely, and why that failure is about to become common.​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‍‌‌‍‍‌‌​‌‍‌‌‌‍‍‌‌​​‍‌‍‌‌‌‍‌​‌‍‍‌‌‌​​‍‌‍‌‌‍‌‍‌​‌‍‌‌​‌‌​​‌​‍‌‍‌‌‌​‌‍‌‌‌‍‍‌‌​‌‍​‌‌‌​‌‍‍‌‌‍‌‍‍​‍‌‍‍‌‌‍‌​​‌​‌‍​​‍‌‍​‌‌‍​‌‌‍​‍​​‌​‌‌‌‍​‌​‍‌​​​​​‌​‌‌‌‍​‌​‍‌​‌​‌‍​‍‌‍‌‍​​‍​‍‌​‍​‌‍​‍​​‌‍‌‌​‍‌‌‍​‍​‍‌​‌​‌‍​‌‍​​​​‌‌​‌‍​‍‌​‌​‌​​​​‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‌‍​‍‌‍​‌‌​‌‍‌‌‌‌‌‌‌​‍‌‍​​‌‌‍‍​‌‌​‌‌​‌​​‌​​‍‌‌​​‌​​‌​‍‌‌​​‍‌​‌‍​‍‌‌​​‍‌​‌‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‌‍‍‌‌‍‌​​‌​‌‍​​‍‌‍​‌‌‍​‌‌‍​‍​​‌​‌‌‌‍​‌​‍‌​​​​​‌​‌‌‌‍​‌​‍‌​‌​‌‍​‍‌‍‌‍​​‍​‍‌​‍​‌‍​‍​​‌‍‌‌​‍‌‌‍​‍​‍‌​‌​‌‍​‌‍​​​​‌‌​‌‍​‍‌​‌​‌​​​​‍‌‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‍‌‍‌​​‌‍‌‌‌​‍‌​‌​​‌‍‌‌‌‍​‌‌​‌‍‍‌‌‌‍‌‍‌‌​‌‌​​‌‌‌‌‍​‍‌‍​‌‍‍‌‌​‌‍‍​‌‍‌‌‌‍‌​​‍​‍‌‌

NVIDIA Developer YouTube 2026-06-17 06:54 UTC Score 62.0 AI-144-20260617-podcasts-and-91029169

Nemotron 3 Ultra and the Open Model Landscape | Nemotron Labs

Nemotron 3 Ultra is NVIDIA's latest frontier-intelligence open model — 5x faster inference, up to 30% lower cost, and fully open: weights, training datasets, and fine-tuning recipes included. In this livestream, we're joined by Nathan Lambert, ML researcher and open model advocate, to dig into what Ultra means for developers building on open models today. We'll cover what sets Ultra apart technically — the hybrid Mamba-Transformer backbone, Multi-Teacher On-Policy Distillation (MOPD), and how it fits into a system-of-models pattern. Nathan brings a researcher's perspective on post-training for agentic systems, and we'll get into where the open frontier model landscape is heading and what it takes to build models worth building on. What you'll learn: - How Ultra's post-training approach compares to what the open model ecosystem has seen at scale - What the hybrid Mamba-Transformer architecture means for long-context, multi-turn agent workflows - How open weights, datasets, and recipes enable domain-specific fine-tuning from day one - Where open frontier models are heading for agentic applications — and what tradeoffs matter most Have questions about Ultra, post-training, or the open model landscape? Drop them live — Nathan and the team will answer them in real time.

TWIML AI Podcast 2026-06-16 22:10 UTC Score 51.0 AI-148-20260616-podcasts-and-8979913e

Why AI Agents Break the GenAI Security Model with Devvret Rishi - #770

In this episode, Sam talks with Dev Rishi, GM of AI at Rubrik, about what happens when agents move beyond answering questions and start taking action across tools, systems, and business processes. We explore why the enterprise playbook of static guardrails plus human approval starts to break down in the agent era. Agents are useful because they can plan, call tools, update systems, write code, send messages, and operate across workflows at machine speed, but those same capabilities make them difficult to govern with rules written in advance or approval prompts reviewed one at a time. Dev explains why tool access increases blast radius, why agents can route around controls in surprising ways, and why human-in-the-loop review can become security theater when agents operate at scale. We also discuss what enterprises need instead: better visibility, runtime enforcement, policy-aware governance, agent observability, and recovery mechanisms for when something goes wrong. Along the way, we dig into MCP and tool sprawl, small language models for policy enforcement, defense in depth, agent rewind, and why AI may be needed to help secure AI. 🗒️ Full show notes: https://twimlai.com/go/770.

AI Alignment Forum 2026-06-16 19:55 UTC Score 67.0 USR-0151-20260616-community-fo-1b774dbe

Predicting LLM Safety Before Release by Simulating Deployment

Paper link Before releasing a new model, labs need to understand not just what it can do, but how it is likely to behave in real-world use, including where it might introduce new risks. This becomes even more important as capabilities increase. As part of our pre-deployment safety review, we leverage targeted evaluations, red-teaming, and other checks to understand model behavior. We’ve now started using a method for simulating model deployments before they happen, which adds a complementary signal: a deployment-like preview of how a candidate model may behave before it reaches users. Deployment Simulation is a method for simulating a future deployment before it happens. We do so by replaying previous conversations in a privacy-preserving manner with a new candidate model. By doing so, we can study how the new model responds in realistic contexts before release, including whether new undesired behaviors emerge and how often they may appear. In our GPT-5.4 study, these forecasts were informative. For categories whose production rates changed by at least 1.5x, deployment simulation predicted the direction of change 92% of the time, compared with 54% for a baseline built from challenging prompts. Simulated deployments also looked much closer to real production traffic on evaluation-awareness measures: traditional evals often visibly have stage lights; production prefixes mostly do not. The hardest case is agentic tool use, where realistic behavior depends on external state: fil…

NVIDIA Blog 2026-06-16 16:30 UTC Score 51.0 AI-055-20260616-official-ai--a8a99da0

HPE AI Factory With NVIDIA Expands for the Era of Agents

Enterprises are moving agentic AI from proof of concept to production — and the next generation of AI factories are built for the era of agents. At HPE Discover Las Vegas, running through Thursday, June 18, NVIDIA and HPE are expanding the HPE AI Factory with NVIDIA, including NVIDIA Vera CPU and NVIDIA Agent Toolkit […]

Stack Overflow AI Blog 2026-06-16 07:40 UTC Score 44.0 USR-0063-20260616-ai-specialis-949ca187

If context is king, architecture is the castle​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‍‌‌‍‍‌‌​‌…

Recorded live at the AI Agent Conference, Ryan sits down with Apollo GraphQL CEO Matt DeBergalis to discuss how enterprises can leverage GraphQL and MCP as a structured semantic architecture to feed clean data to autonomous agents, safeguard internal microservices against unprecedented "east-west" data exfiltration risks, and rein in skyrocketing token spend by explicitly querying only the exact context required.​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‍‌‌‍‍‌‌​‌‍‌‌‌‍‍‌‌​​‍‌‍‌‌‌‍‌​‌‍‍‌‌‌​​‍‌‍‌‌‍‌‍‌​‌‍‌‌​‌‌​​‌​‍‌‍‌‌‌​‌‍‌‌‌‍‍‌‌​‌‍​‌‌‌​‌‍‍‌‌‍‌‍‍​‍‌‍‍‌‌‍‌​​‌‌‍​​​‍​​‌‍​​​​​‍‌‌‍​‍​​‍​‍‌​‌​​​‌​​‌‍‌​​‍‌​‌​‌‍​‌​‌​​​​‍‌​‍‌‌‍​‌‌‍​‍‌‍​​‍‌‌‍‌‌​‌​​​‌‍‌​​​‌‌‍​‍​‌​​​​‌‍‌‍‌‍​‌‍​‌​‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‌‍​‍‌‍​‌‌​‌‍‌‌‌‌‌‌‌​‍‌‍​​‌‌‍‍​‌‌​‌‌​‌​​‌​​‍‌‌​​‌​​‌​‍‌‌​​‍‌​‌‍​‍‌‌​​‍‌​‌‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‌‍‍‌‌‍‌​​‌‌‍​​​‍​​‌‍​​​​​‍‌‌‍​‍​​‍​‍‌​‌​​​‌​​‌‍‌​​‍‌​‌​‌‍​‌​‌​​​​‍‌​‍‌‌‍​‌‌‍​‍‌‍​​‍‌‌‍‌‌​‌​​​‌‍‌​​​‌‌‍​‍​‌​​​​‌‍‌‍‌‍​‌‍​‌​‍‌‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‍‌‍‌​​‌‍‌‌‌​‍‌​…

NVIDIA Developer YouTube 2026-06-15 22:00 UTC Score 56.0 AI-144-20260615-podcasts-and-35153e06

Local Agents on Jetson: OpenClaw, NemoClaw, and AI You Can Build Into Daily Life

This session moves from running a local model to running a local autonomous agent. OpenClaw is a fully local AI assistant that runs on Jetson and connects to chat workflows, browser-based tools, and multi-step tasks. NemoClaw extends this with sandboxing, onboarding, inference routing, and policy controls for safer and more structured agent deployments. We'll show what changes when an AI system can take actions, use tools, and run privately on your own hardware — 24/7, at home, on the edge. Use cases include building dynamic browser-based games, prototyping smart computer vision apps, and running long research tasks without a cloud dependency. You will learn how to move from running a local model to running a fully local autonomous agent on NVIDIA Jetson. We'll cover: Building a local assistant with OpenClaw — extend the Episode 1 baseline into a full local assistant architecture that connects to chat workflows, browser-based tools, and multi-step tasks — running privately on your own hardware, 24/7. NVIDIA Orin Nano vs. AGX Orin vs. Thor — compare hardware paths side by side so you can make the right choice for your deployment constraints and performance needs. Why tool-calling models matter — see what changes when an AI system can take actions, use tools, and run autonomously, and what breaks when your model can't do it reliably. Safer local agents with NemoClaw — go further with sandboxing, onboarding, inference routing, and policy controls that make local agent deploymen…

AI Alignment Forum 2026-06-14 19:45 UTC Score 67.0 USR-0151-20260614-community-fo-49ef5cfc

Why Do Naive SFT Filters For Safety Properties Fail?

This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The third post can be found here . Since SFT is the cause for many safety relevant properties , a natural strategy is to filter out rollouts from SFT that have undesirable properties. However, as we show in this section (and in forthcoming MATS work), SFT data filtering frequently works surprisingly poorly. In this post, we investigate hypotheses for why SFT filtering fails. TL;DR: We discuss seven hypotheses for why SFT filtering works surprisingly poorly We analyze three hereditary traits that SFT-only Gemini has that other models do not: negative emotion, date confusion, and blackmail in the (highly contrived) agentic misalignment scenario We use a “post-training diffing pipeline” between Gemini and Olmo to show that the cause of date confusion and blackmail is largely surprising transfer of behaviors from the SFT teacher model. Notably, there exist small sets of prompts where switching the teacher model for the rollout removes date confusion and blackmail, but dropping the prompts does not. Negative emotion is less affected by the teacher model, but this may be because the Olmo prompt distribution we are SFTing on underspecifies the behavior. Takeaways: It’s hard to remove behaviors via filtering But if you can get a teacher model to have a behavior (e.g. via RL), then transferring that in the future is easier…

NVIDIA Developer YouTube 2026-06-13 00:28 UTC Score 53.0 AI-144-20260613-podcasts-and-6841f9b8

Spark Hack Toronto Winner Spotlight: Belong & City Flow

NVIDIA Spark Hack Toronto brought developers together for a weekend challenge: build an agentic application that runs locally on DGX Spark using open models and Toronto Open Data. Teams tackled everything from small business forecasting and dementia care to city-scale traffic simulation — all on an ASUS Ascent GX10 powered by the NVIDIA GB10 Grace Blackwell Superchip. Belong won the Public Services track with an AI companion designed for people living with dementia and their caregivers. Running entirely on DGX Spark, it helps users recognize family members, remember appointments, and find local services — keeping all conversations and memory data on-device using Nemotron, speech, and retrieval systems. CityFlow won the Economic Systems track with a real-time intelligence platform for small businesses. By combining transit disruptions, road closures, weather, events, and other Toronto data sources, it helps business owners answer practical questions like "Am I properly staffed for Friday night?" — using Nemotron to generate recommendations and NVIDIA cuOpt to turn forecasts into actual staffing plans. Join us live to see demos and chat with both teams. Bring your questions about building with local AI compute and Toronto Open Data.

Amazon Science AI 2026-06-12 12:40 UTC Score 80.0 AI-058-20260612-official-ai--0a894f67

AutoClimDS: Climate data science agentic AI — A knowledge graph is all you need

Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. These challenges limit participation, slow discovery, and reduce the reproducibility of scientific workflows. In this paper, we present a proof of concept for addressing these barriers through the integration of a curated knowledge graph (KG) with AI agents designed for cloud-native scientific workflows. The KG provides a unifying layer that organizes datasets, tools, and workflows, while AI agents—powered by generative AI services—enable natural language interaction, automated data access, and streamlined analysis. Together, these components drastically lower the technical threshold for engaging in climate data science, enabling non-specialist users to identify and analyze relevant datasets. By leveraging existing cloud-ready API data portals, we demonstrate that 'a knowledge graph is all you need' to unlock scalable and agentic workflows for scientific inquiry. The open-source design of our system further supports community contributions, ensuring that the KG and associated tools can evolve as a shared commons. Our results illustrate a pathway toward democratizing access to climate data and establishing a reproducible, extensible framework for human–AI collaboration in scientific research.

MongoDB AI Blog 2026-06-11 19:46 UTC Score 59.0 USR-0070-20260611-ai-specialis-3fe555ce

Production-Ready Agents Need A Production-Ready Data Platform

There’s a common theme to the conversations I’ve been having with AI teams lately: change. Constant, head-spinning change. Teams across industries are evaluating and re-evaluating model providers, agent frameworks, and harnesses on a continuous basis. At MongoDB, we believe that your choice of technology partner—specifically, your data platform—should simplify how you build with AI. It should deliver performance at scale, enable you to build and run anywhere, and it should allow you to choose your own providers and frameworks. This is exactly what MongoDB offers, and it’s why more than 67,000 customers rely on us for their most important applications. The organizations seeing the most AI success are the ones whose technology stacks are set up for the current pace of change. For example, DevRev’s AgentOS platform is powered by MongoDB Atlas. AgentOS handles billions of requests each month, for everything from AI-assisted insights and analytics to internal communications and development. Relying on MongoDB Atlas has helped DevRev get innovations to market faster, and enables the company to scale seamlessly as it grows. MongoDB is ideal for agentic AI in two key ways. First, an agent is only as smart as its context—which requires blending short-term memory, long-term knowledge, and enterprise data. Because this information is highly dynamic and unstructured, JSON is the ideal format. It provides the schema flexibility inherently needed by the data and allows attaching metadata…

Practical AI Podcast 2026-06-11 09:00 UTC Score 40.0 AI-143-20260611-podcasts-and-b6225d13

Zero Trust for AI Agents

As AI agents become more capable and autonomous, they also introduce new security challenges. In this 'Fully Connected' episode, Dan and Chris unpack Anthropic’s Zero Trust for AI Agents security framework and what it means for organizations deploying agentic systems. They examine the key security risks facing agentic systems and discuss how organizations can apply Zero Trust principles to deploy AI agents safely. Along the way, they break down practical security controls and discuss how traditional cybersecurity principles must evolve for the age of AI agents. Featuring: Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Links: Zero Trust for AI Agents OWASP GenAI Project Sponsors: Prediction Guard: A self-hosted AI control plane for running agents in high impact environments. predictionguard.com/practicalai Upcoming Events: Register for upcoming webinars here ! Midwest AI Summit 2026

AI Weekly 2026-06-11 00:00 UTC Score 40.0 AI-133-20260611-newsletters-03f4c9f3

AI Weekly Issue #502: Your AI can now spend your money — Visa wired it into ChatGPT

Visa just wired ChatGPT to shop and pay on your behalf — an AI agent can now buy at any Visa merchant without you clicking "buy." It capped a week where the labs pushed autonomy and capital to new highs: Anthropic put Claude Fable 5, its most powerful public model, into everyone's hands; Jeff Bezos came out of stealth with Prometheus, a $41B startup building an "artificial general engineer." A self-replicating worm hit 73 of Microsoft's own GitHub repositories through AI coding tools. Anthropic broke with the White House over preempting state AI laws; a German court ruled Google is liable for what its AI Overviews say. The agents got more capable this week — and a lot more autonomous.

Cornell AI Initiative 2026-06-10 19:06 UTC Score 44.0 USR-0014-20260610-research-aca-52e44e44

Amazon partnership establishes Cornell AI security initiative

Cornell computer scientists will lead the development of safety protocols to shore up AI agents and the code they produce. The post Amazon partnership establishes Cornell AI security initiative appeared first on Cornell AI Initiative .

JetBrains AI Blog 2026-06-10 03:31 UTC Score 41.0 USR-0065-20260610-ai-specialis-6fae1668

Agentic AI Governance: Designing for Accountability and Control

Many organizations are already deploying agentic workflows. Some are still experimental, while others are running in production. Once an AI agent can take action on behalf of a business, the question is no longer whether it’s useful, but what happens when something goes wrong. It’s tempting to focus on blame: the AI vendor, the manager, […]

Stack Overflow AI Blog 2026-06-09 07:40 UTC Score 41.0 USR-0063-20260609-ai-specialis-435ba2ff

Creating checkpoints by gaslighting a Postgres database​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‍‌…

Ryan welcomes Bryan Clark, director of product for Lakebase at Databricks, to discuss what happens when AI agents become the primary creators and users of databases; why agents are “sloppy” about cleaning up infrastructure; and how database branching, scale-to-zero, and centralized access control can help teams keep up with agent-driven development.​​​​‌‍​‍​‍‌‍‌​‍‌‍‍‌‌‍‌‌‍‍‌‌‍‍​‍​‍​‍‍​‍​‍‌​‌‍​‌‌‍‍‌‍‍‌‌‌​‌‍‌​‍‍‌‍‍‌‌‍​‍​‍​‍​​‍​‍‌‍‍​‌​‍‌‍‌‌‌‍‌‍​‍​‍​‍‍​‍​‍‌‍‍​‌‌​‌‌​‌​​‌​​‍‍​‍​‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‍‌‌‍‍‌‌​‌‍‌‌‌‍‍‌‌​​‍‌‍‌‌‌‍‌​‌‍‍‌‌‌​​‍‌‍‌‌‍‌‍‌​‌‍‌‌​‌‌​​‌​‍‌‍‌‌‌​‌‍‌‌‌‍‍‌‌​‌‍​‌‌‌​‌‍‍‌‌‍‌‍‍​‍‌‍‍‌‌‍‌​​‌​‍‌‌‍​‌‍​‍​​​‌‍​‌‍​‍​​​‍‌​‍‌‌‍‌‍‌‍​‌‍​​‍‌​‍‌​‌​‌‍​‍​​​​‍‌​‍‌​‍‌​​‍‌‍‌‌​‌‍​‍‌​‌‍‌‍​‌​‍​‌‍​‌‌‍‌​‌‍‌‌‌‍‌‍​‍​‌‍‌​‌‍​​​​​​‍​‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‌‍​‍‌‍​‌‌​‌‍‌‌‌‌‌‌‌​‍‌‍​​‌‌‍‍​‌‌​‌‌​‌​​‌​​‍‌‌​​‌​​‌​‍‌‌​​‍‌​‌‍​‍‌‌​​‍‌​‌‍‌‍​‌‍‌‌​​‍‍‌​‌‌​‌‍​‌‌‍​‌‍‍‌‍‌‌‍‌‍‌‌‌​‍‌‍‌‍‌‍​‌‍‌‌​‍‍‌‍​‌‍​‍‌‍‌‍‍‌‌‍‌​​‌​‍‌‌‍​‌‍​‍​​​‌‍​‌‍​‍​​​‍‌​‍‌‌‍‌‍‌‍​‌‍​​‍‌​‍‌​‌​‌‍​‍​​​​‍‌​‍‌​‍‌​​‍‌‍‌‌​‌‍​‍‌​‌‍‌‍​‌​‍​‌‍​‌‌‍‌​‌‍‌‌‌‍‌‍​‍​‌‍‌​‌‍​​​​​​‍​‍‌‍‌‌​‌‍‌‌​​‌‍‌‌​‌‌‍​‍‌‍​‌‍‌‍‌‌‌​​‌‍‌​‌‌​​‍‌‍‌​​‌‍​‌‌‌​‌‍‍​​‌‌‍‌‌‌‍​‌‍​‌‍‌‌‌​‍‌​​‌‌​​‍‌‍‌​​‌‍‌‌‌​‍‌​‌​​‌‍‌‌‌‍​‌‌​‌‍‍‌‌‌‍‌‍‌‌​‌‌​​‌‌‌‌‍​‍‌‍​‌‍‍‌‌​‌‍‍​‌‍‌‌…

Two Minute Papers 2026-06-06 06:20 UTC Score 29.0 AI-139-20260606-podcasts-and-ab8fb04a

AI Agents as "Games Masters"? 🎮🔥

Check the pinned comment for the link to the full interview. Could AI agents eventually become the "Games Master" driving your gaming storylines? We explore the concept of AI assisting players or creating dynamic, non-scripted narratives. Discover how AI is currently being tested inside immersive game environments to change how we play. 🧠 Hashtags: #aiingames #gaming #ai #gamedev #futuretech

AI Weekly 2026-06-04 00:00 UTC Score 29.0 AI-133-20260604-newsletters-0709d790

AI Weekly Issue #499: Microsoft proves it doesn't need OpenAI; Alphabet raises $85B

Microsoft used its own developer conference to show it can live without OpenAI, Florida's attorney general sued OpenAI and went after Sam Altman personally, researchers and a new Workday product made plain that nobody trusts AI agents yet, and Alphabet raised a record $85 billion the same week the Fed flagged AI as a systemic risk. The money is moving faster than the trust.

MongoDB AI Blog 2026-06-03 19:51 UTC Score 45.0 USR-0070-20260603-ai-specialis-5c2e80c2

Agentic Supplier Management with MongoDB Atlas, Voyage AI, and Multi-Modal Search

Retail supply chains are not a back-office logistics function; they are a high-stakes, board-level concern. Imagine learning suddenly that shipment rerouting surcharges have doubled due to new regional escalations; the impact on competitive differentiation and consumer trust is immediate. As a result, a long-standing focus on linear efficiency and lean inventory is being disrupted by a mandate for resilience and AI-driven responsiveness. To survive, retailers must move beyond the rigidity of legacy systems and embrace an AI-ready data platform that can pivot as fast as headlines change. Indeed, a 2026 study by KPMG reported that businesses are establishing new performance metrics, centered around post-disruption recovery time, supplier diversification, sourcing agility, revenue growth from improved experiences, cost savings, and employee engagement. Now, retailers are modernizing their supplier management capabilities. An effective supplier management application that boosts visibility, builds resilience, and delivers material business benefits must be underpinned by unified supplier data and AI copilots. To unlock these next-generation capabilities, retail leaders use MongoDB as a unified data foundation, enabling the high-velocity intelligence and material results required in today’s volatile landscape. However, the business agility of many organizations remains restricted by their enterprise resource planning (ERP) systems, which were designed for an era when stability wa…

Weaviate Blog 2026-06-03 00:00 UTC Score 36.0 USR-0073-20260603-ai-specialis-3c8faf5e

Engram is now Generally Available

Engram, Weaviate's managed memory and context service for agentic applications, is now generally available.

Two Minute Papers 2026-06-01 15:41 UTC Score 53.0 AI-139-20260601-podcasts-and-efe386f0

What Happens After A 1,000,000x AI Compute Leap? | Jeff Dean

Thank you to Google for the invite! 🙏 ❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/ Thumbnail design: https://felicia.hu Chapters: 00:00 Intro 02:07 Are We Running Out of AI Data? 06:22 The 90% Shift: Why Inference is Taking Over 09:34 The End of the Pre-Training and Post-Training Split 12:02 What Happens After a 1,000,000x Compute Leap? 15:03 How Distillation is Supercharging Open Models 16:17 The Quest for a "Lifetime AI" 17:25 Multi-Agent Workflows 18:40 AI Generating Operating Systems (and Running Doom) 20:15 Solving The Attention Problem 22:13 Data Center Disasters: Supernovas and Cosmic Rays 24:45 The Lightning Round: Jeff Dean Chuck Norris Jokes 25:40 The One Thing Jeff Dean Got Wrong (Healthcare AI) 26:50 The Ultimate Developer Debate: Vim vs. Emacs

Microsoft Research Blog 2026-05-28 16:00 UTC Score 45.0 AI-053-20260528-official-ai--fafc6a0c

Data Formulator 0.7: AI-powered data analytics for enterprise data

Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into actionable insights. The post Data Formulator 0.7: AI-powered data analytics for enterprise data appeared first on Microsoft Research .

Practical AI Podcast 2026-05-28 09:00 UTC Score 33.0 AI-143-20260528-podcasts-and-4920bb51

Rebooting Enterprise AI with MCP and Kubernetes

What happens when AI agents start acting less like chatbots and more like coworkers? In this episode, Dan and Chris sit down with Craig McLuckie, CEO of Stacklok to explore MCP, Kubernetes, ToolHive, enterprise AI, and the emerging infrastructure powering AI-native applications. From identity management to agent orchestration and system architecture, this conversation dives into how organizations may soon manage entire fleets of AI agents working behind the scenes. Featuring: Craig McLuckie – LinkedIn Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Links: Stacklok Toolhive Sponsors: Prediction Guard: A self-hosted AI control plane for running agents in high impact environments. predictionguard.com/practicalai Upcoming Events: Register for upcoming webinars here ! Midwest AI Summit 2026

Comet ML Blog 2026-05-27 21:12 UTC Score 46.0 USR-0082-20260527-ai-specialis-8c503c6e

The Best AI Observability Tools for Agentic Systems in 2026

AI applications used to rely on a handful of straightforward LLM calls. Now agents make hundreds of decisions in response to a single user input, calling tools, retrieving context, and compounding outputs. When something goes wrong, the failure can be six steps deep and invisible from the outside. Most AI observability tools were designed to […] The post The Best AI Observability Tools for Agentic Systems in 2026 appeared first on Comet .

JetBrains AI Blog 2026-05-27 08:53 UTC Score 53.0 USR-0065-20260527-ai-specialis-e07a5583

Koog 1.0 Is Out: Stable Core, Better Interop, and Multiplatform Observability

Last week at the KotlinConf 2026 keynote (watch the recording here), we announced Koog 1.0. Koog is JetBrains’ open-source framework for building AI agents in Kotlin and Java. It provides the core building blocks for agentic applications: tools, workflows, persistence, memory, observability, and integrations with existing JVM and Kotlin Multiplatform projects. We introduced Koog at […]

AI Weekly 2026-05-27 00:00 UTC Score 40.0 AI-133-20260527-newsletters-3a7abad9

AI Weekly Issue #496: Anthropic's Pentagon model is now everyone's model

Anthropic released Mythos to the public, collapsing the wall between cleared-contractor frontier AI and developer-grade frontier AI in a single press release. DeepMind's Demis Hassabis moved his AGI timeline from "five to ten years" to "a real possibility by 2029" and tied it explicitly to AlphaProof Nexus solving nine open Erdős problems for the cost of a steak dinner. Critical zero-days hit Starlette (a million AI agents on the wire) and CrowdStrike led a coordinated takedown of the Glassworm developer botnet across four C2 channels. BNP Paribas formalized a sovereign-AI security partnership with Mistral while Beijing froze overseas travel for top AI engineers at Alibaba and DeepSeek. And the AI-displaces-workforce arithmetic got honest: Uber burned its full-year AI token budget by April, ClickUp restructured to 1,000 humans alongside 3,000 internal agents, and Sam Altman publicly reversed his white-collar-apocalypse prediction.

DeepLearning.AI YouTube 2026-05-22 17:21 UTC Score 33.0 AI-138-20260522-podcasts-and-8471b5a6

AI Dev 26 x SF | Andi Partovi: Why Every Agent Needs a Simulation Sandbox

AI agents fail in unpredictable ways that traditional testing can't catch — hallucinations, wrong tool calls, policy violations, and more. Teams only discover these failures after users hit them in production. A simulation sandbox gives you a controlled environment with realistic users, tools, and workflows where you can run hundreds of scenarios against your agent before it ships, catching edge cases and adversarial inputs that would be impossible to test manually. This talk by Veris AI's Andi Partovi covers why simulation-driven development is becoming essential infrastructure for any team building production AI agents, and how it closes the gap between "works in demos" and "works at scale."

DeepLearning.AI YouTube 2026-05-22 16:55 UTC Score 33.0 AI-138-20260522-podcasts-and-8486cd5c

AI Dev 26 x SF | Luke Kim: The Agent Data Stack—Why Every AI Agent Needs Its Own Data Stack

From centralized to distributed: In the old world, organizations relied on one centralized data and AI platform. In the new world of AI agents, every agent needs its own sandboxed, secure, and modern data stack. In this 20-minute talk with live demo by Spice AI's Luke Kim, he explores why this architectural shift is critical and the key patterns required to give agents reliable, real-time data.

DeepLearning.AI YouTube 2026-05-22 16:52 UTC Score 37.0 AI-138-20260522-podcasts-and-d969cded

AI Dev 26 x SF | Manos Koukoumidis & Stefan Webb: VibeML: Build your AI model in hours, not months

The next major shift in enterprise AI is underway; enterprises are moving from generic AI they rent to specialized AI they own. The benefits are clear: higher quality, dramatically lower costs, full control, and a quality improvement flywheel while in production. But building specialized AI models has been prohibitively hard; each use case requires months of effort and deep AI expertise. Well, it used to. VibeML is enabling engineers to build specialized AI models automatically from a prompt, in minutes. An AI agent builds your AI model end-to-end; evaluation, data synthesis, training and repeat. This talk by OUMI's Manos Koukoumidis & Stefan Webb demonstrates how VibeML can give deep AI experts superpowers while enabling non-experts as well.

DeepLearning.AI YouTube 2026-05-22 16:42 UTC Score 52.0 AI-138-20260522-podcasts-and-fd6db35f

AI Dev 26 x SF | Or Dagan: Optimizing Accuracy, Cost, and Latency in Real-World Agents

Most agentic systems rely on hardcoded heuristics to navigate execution decisions (e.g. which models, tools, and test-time compute scaling approaches to use) leading to efficiency leakage across cost, latency and accuracy. AI21 Maestro optimizes agents by learning to predict success, cost and latency probabilities across diverse actions and contexts, and driving runtime orchestration that intelligently navigates the full agentic action space. In this session, AI21's Or Dagan demonstrated how this approach yields state-of-the-art results and Pareto frontier on challenging agentic benchmarks, as well as the process required to optimize production agents.

DeepLearning.AI YouTube 2026-05-22 15:29 UTC Score 25.0 AI-138-20260522-podcasts-and-f3378c99

AI Dev 26 x SF | Paul Everitt: The Shift to Agentic Engineering

More code, fewer staff — the industry is on a bender. But what about quality? At AI Dev 26 x San Francisco, Paul Everitt from JetBrains discussed the rise of agentic engineering and how old lessons can be adapted to build new professional practices.

TWIML AI Podcast 2026-05-21 19:38 UTC Score 56.0 AI-148-20260521-podcasts-and-830461d3

Relational Foundation Models for Enterprise Data with Jure Leskovec - #768

In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts of his work: AI for science and relational deep learning. We begin with AI Virtual Cell, a multiscale effort to learn data-driven representations from proteins to cells to patients using single-cell RNA-seq data, protein language models like ESM, and structure models like AlphaFold—without hand-encoding biology. Jure then dives into relational deep learning, reframing enterprise databases as graphs and training neural networks directly on raw multi-table data. He explains Kumo’s Relational Foundation Model (RFM2), which performs in-context learning over subgraphs to make accurate predictions on new databases and tasks with no training, and how this approach benchmarks against RelBench and other multi-table datasets. We also discuss real-world deployments at companies like Reddit, DoorDash, and Coinbase, explainability via attention over tables and columns, integration with agentic systems, deployment options, and practical limitations. The complete show notes for this episode can be found at https://twimlai.com/go/768.

Microsoft Research Blog 2026-05-21 17:00 UTC Score 48.0 AI-053-20260521-official-ai--7dff8125

MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models

MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on everyday tasks. The post MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models appeared first on Microsoft Research .

Practical AI Podcast 2026-05-21 09:00 UTC Score 49.0 AI-143-20260521-podcasts-and-3cd5023d

Hermes Agent: Agents that grow with you

Open Source AI is entering a new era, one shaped by self-improving AI Agents, recursive learning systems, and rapidly evolving AI Tools that blur the line between software and autonomous collaborators. In this episode, Daniel and Chris sit down with Nous Research co-founder and CTO Jeffrey Quesnelle to explore Hermes Agent. Along the way, they discuss models vs. harnesses, the changing role of developers, and one of the biggest questions facing the AI Future: what remains uniquely human as AI capabilities continue to accelerate? Featuring: Jeffrey Quesnelle – Website , LinkedIn Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Links: Nous Research Hermes Agent Sponsors: Framer: The enterprise-grade website builder that lets your team ship faster. Get 30% off at framer.com/practicalai Prediction Guard: A self-hosted AI control plane for running agents in high impact environments. predictionguard.com/practicalai Upcoming Events: Register for upcoming webinars here ! Midwest AI Summit 2026

Comet ML Blog 2026-05-20 16:47 UTC Score 41.0 USR-0082-20260520-ai-specialis-a1c86a19

What Held Up at 3 AM: One Engineer’s RAG Case Study

Most AI demos work. Most AI products don’t. This series is a collection of interviews with engineers who shipped AI agents to production, covering the stacks they chose, the architectures they regretted, and what actually held up at 3 am. This is an interview with Michael Maximilien, former CTO and Distinguished Engineer at IBM and […] The post What Held Up at 3 AM: One Engineer’s RAG Case Study appeared first on Comet .

METR 2026-05-19 18:00 UTC Score 58.0 USR-0147-20260519-research-aca-9d04d191

Frontier Risk Report (February to March 2026)

Assessment Window: Feb 16, 2026 – Mar 16, 2026 Download PDF Redaction summary statement: Except where explicitly noted in the report, there was no additional redacted information that was important to our conclusions from any of the participating companies. Executive summary and guide to the report Starting in February 2026, METR conducted a pilot exercise to assess misalignment risks from AI agents used inside frontier AI developers, with participation from Anthropic, Google, Meta, and OpenAI. We make three main contributions in this report, each detailed in a separate section. First, we motivate and outline the process we followed for this exercise. 1 Each participant provided: Access to their most capable internal model(s) at the time of assessment, including raw chains of thought. A wide range of non-public information about the capabilities of the shared model(s), how AI was used and monitored internally, and trends in the pace of progress. METR then prepared private reports for each participant, participants approved what non-public information could be disclosed, and METR wrote this public report. This exercise is entity-based rather than model-specific, and is designed to be repeated periodically rather than tied to public releases. Second, we present six key facts that inform our assessment, drawing on evaluations we conducted on the models that participants shared, 2 evaluations we conducted on public models, information shared by participants, 3 findings from a re…

Google DeepMind YouTube 2026-05-19 17:51 UTC Score 46.0 AI-145-20260519-podcasts-and-ecb209e4

Generating novel scientific hypotheses with Co-Scientist

In an era of information overload, the search for transformative scientific ideas has become a significant bottleneck for progress. Every great scientific breakthrough begins with a single, transformative idea. The spark of discovery relies on a researcher's ability to connect disparate facts and formulate the right hypothesis to test. We believe AI can help dramatically accelerate the pace of breakthroughs by serving as a dedicated partner in the generation and refinement of breakthrough scientific hypotheses. That’s why we’ve developed Co-Scientist, a Gemini-based multi-agent AI system that iteratively generates, debates, and evolves novel hypotheses for complex scientific problems. Read the Nature paper: https://www.nature.com/articles/s41586-026-10644-y and learn more at labs.google/science #googleio #ai #science ____ Subscribe to our channel https://www.youtube.com/@googledeepmind Find us on X https://x.com/GoogleDeepMind Follow us on Instagram https://instagram.com/googledeepmind Add us on Linkedin https://www.linkedin.com/company/deepmind/

Qdrant Blog 2026-05-19 00:00 UTC Score 43.0 USR-0074-20260519-ai-specialis-f80319c6

How GoPerfect Built an Agentic Recruiting Workforce with Qdrant Cloud

GoPerfect mission is to use an AI recruiting workforce that replaces the manual, low-leverage parts of recruiting. Instead, an agent decomposes recruiter intent and runs the work end to end to find top talent. Their agentic platform handles sourcing, scanning, reviewing, outreach, admin work as well as candidate conversations for recruiters, hiring managers, agencies, and CEOs who hire at volume. Recruiting is a needle-in-a-haystack problem with two complications: the haystack is massive (200M+ profiles enriched with 1B+ data points drawn from professional networks, code repositories, company data, and AI-derived signals), and the definition of the “needle” is more nuanced than any keyword filter can express. A product manager is not a product marketer, even though the two sit close together in any reasonable embedding space.

Comet ML Blog 2026-05-15 20:37 UTC Score 46.0 USR-0082-20260515-ai-specialis-6d8c1246

LLM Cost Tracking Solution: How to Monitor and Control AI Spend in Agentic Systems

The first sign of trouble isn’t always performance. Sometimes it’s the invoice. Your team ships a new agent that routes requests, calls tools, runs retrieval, and orchestrates multiple LLM calls to deliver high-quality answers. It looks like a win until the first full-month bill hits, and your LLM spend has quietly tripled. Finance wants answers, […] The post LLM Cost Tracking Solution: How to Monitor and Control AI Spend in Agentic Systems appeared first on Comet .

MongoDB AI Blog 2026-05-11 14:35 UTC Score 45.0 USR-0070-20260511-ai-specialis-e06ea08c

Fighting Tool Sprawl: The Case for AI Tool Registries

As enterprise AI agent adoption scales, the absence of centralized, organization-level tool infrastructure is producing compounding costs. When adoption is built around optimizing for deployment speed, enterprises expose themselves to a combination of risks: duplicated engineering effort, security exposure, and operational opacity. Every enterprise needs its own shared tool registry, one that reflects its specific regulatory environment, security posture, and operational conventions. To be clear, this is not an argument for a public package manager, something like npm, PyPI, or Maven. The infrastructure each enterprise needs is internal; scoped to its own teams, its own data, its own policies, its own domain. Trying to expand the scope beyond the confines of individual organizations would be premature standardization in a fast-moving, nascent space. A shared enterprise tool registry is not an optimization or a nice-to-have. It is foundational infrastructure as agent deployments scale beyond early experiments. The case for it rests on two pillars: reducing coordination cost and enabling risk management, both for the humans building with agents and for the agents themselves. AI agents depend on tools that retrieve data, write records, trigger workflows, and call external APIs. According to McKinsey, in most large organizations, these tools are built by individual teams in an ad hoc fashion: undocumented, ungoverned, and invisible to the rest of the organization. This pattern i…

JetBrains AI Blog 2026-05-11 13:16 UTC Score 48.0 USR-0065-20260511-ai-specialis-e72007ea

The ReSharper 2026.2 Early Access Program Begins: Bringing More AI Agents into Visual Studio

We’re excited to announce that the Early Access Program (EAP) for ReSharper and .NET Tools 2026.2 is now underway! While our EAP announcements usually cover a wide range of new features, performance updates, and bug fixes, this release is different. We are dedicating this first preview entirely to a singular, game-changing initiative: bringing true AI […]

Berkeley AI Research Blog 2026-05-08 09:00 UTC Score 58.0 USR-0004-20260508-research-aca-a8b82a19

Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Overview of adaptive parallel reasoning. What if a reasoning model could decide for itself when to decompose and parallelize independent subtasks, how many concurrent threads to spawn, and how to coordinate them based on the problem at hand? We provide a detailed analysis of recent progress in the field of parallel reasoning, especially Adaptive Parallel Reasoning. Disclosure: this post is part landscape survey, part perspective on adaptive parallel reasoning. One of the authors (Tony Lian) co-led ThreadWeaver ( Lian et al., 2025 ), one of the methods discussed below. The authors aim to present each approach on its own terms. Motivation Recent progress in LLM reasoning capabilities has been largely driven by inference-time scaling, in addition to data and parameter scaling ( OpenAI et al., 2024 ; DeepSeek-AI et al., 2025 ). Models that explicitly output reasoning tokens (through intermediate steps, backtracking, and exploration) now dominate math, coding, and agentic benchmarks. These behaviors allow models to explore alternative hypotheses, correct earlier mistakes, and synthesize conclusions rather than committing to a single solution ( Wen et al., 2025 ). The problem is that sequential reasoning scales linearly with the amount of exploration. Scaling sequential reasoning tokens comes at a cost, as models risk exceeding effective context limits ( Hsieh et al., 2024 ). The accumulation of intermediate exploration paths makes it challenging for the model to disambiguate amon…

Practical AI Podcast 2026-05-07 09:00 UTC Score 34.0 AI-143-20260507-podcasts-and-db3298dd

The Myth of Model Wars: Open vs Closed AI in 2026

In this fully connected episode, Dan and Chris break down one of the biggest questions in AI today: do open vs. closed models still matter? From the rise of physical AI and edge devices to the shifting landscape of open-source models like LLaMA, they explore whether the “model wars” are becoming irrelevant. The conversation then dives into a bigger transformation, the rise of agentic systems, workflows, and AI-driven infrastructure. Featuring: Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Upcoming Events: Register for upcoming webinars here ! Midwest AI Summit 2026

Vector Institute News 2026-05-06 13:30 UTC Score 44.0 USR-0017-20260506-research-aca-fdc9043a

Agentic AI evaluation strategies

Ethan Jackson and Tahniat Khan Part one: Capability evaluations Jump to part two AI agents are no longer a research curiosity. They are deployed on personal machines, integrated into enterprise […] The post Agentic AI evaluation strategies appeared first on Vector Institute for Artificial Intelligence .

Pinecone Blog 2026-05-04 07:01 UTC Score 41.0 USR-0072-20260504-ai-specialis-a63b1231

Pinecone Nexus: The Knowledge Engine for Agents

Pinecone Nexus is a knowledge engine for the agentic AI era, moving reasoning from retrieval to compilation — with KnowQL as the standard query language for agents.

Comet ML Blog 2026-04-21 13:43 UTC Score 46.0 USR-0082-20260421-ai-specialis-af1f3bdb

Introducing Opik Test Suites: Straightforward Unit & Regression Testing for AI Agents

One of the biggest challenges when it comes to agent development is quality. It’s getting easier every day to spin up an MVP or demo of an agent that accomplishes complex tasks through an array of tool calls, context retrieval steps, and system prompts. But it’s still hard to know whether that agent will perform […] The post Introducing Opik Test Suites: Straightforward Unit & Regression Testing for AI Agents appeared first on Comet .

METR 2026-04-21 07:00 UTC Score 63.0 USR-0147-20260421-research-aca-7d76dcc7

Evidence on AI R&D Progress from NanoGPT

I. Introduction We want to measure and understand how much AI agents can accelerate AI R&D and how this is changing over time. There are various sources of evidence we can look to here, including anecdotes about autonomous contributions ( AlphaEvolve and TTT-Discover speeding up a GPU kernels, autoresearch yielding speedups in nanochat), progress on benchmarks, and uplift measurement (see our recent post for a longer discussion). One interesting source of evidence is cumulative progress on publicly tracked challenges like the NanoGPT speedrun, where we can compare agent contributions to human progress over time. Such challenges and leaderboards of cumulative progress on a task are especially useful when: The task maps to real AI R&D (e.g., pretraining a language model) Many contributors have built up a rich history of progress, giving a rough sense of how much human effort went into it (a cost curve) Agents can compete under comparable conditions and potentially make new contributions Let’s look at one such leaderboard: the nanogpt speedrun . The goal is to train a language model to a target validation loss on FineWeb using 8×H100 GPUs as fast as possible . It’s a small-scale version of LLM pretraining with a public history of contributions, with four recent ones credited to AI agents as of April 2026. The optimization activities map to pretraining research such as architecture changes, writing kernels, and improving optimizers. Contributions, such as the Muon optimizer , ha…

Cloudflare AI Blog 2026-04-17 13:05 UTC Score 40.0 USR-0067-20260417-ai-specialis-a339f2da

Introducing the Agent Readiness score. Is your site agent-ready?

The Agent Readiness score can help site owners understand how well their websites support AI agents. Here we explore new standards, share Radar data, and detail how we made Cloudflare’s docs the most agent-friendly on the web.

Cloudflare AI Blog 2026-04-17 13:00 UTC Score 40.0 USR-0067-20260417-ai-specialis-508d19a5

Agents that remember: introducing Agent Memory

Cloudflare Agent Memory is a managed service that gives AI agents persistent memory, allowing them to recall what matters, forget what doesn't, and get smarter over time.

TWIML AI Podcast 2026-04-16 23:48 UTC Score 53.0 AI-148-20260416-podcasts-and-a9fd3267

How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765

In this episode, Rashmi Shetty, senior director of enterprise generative AI platform at Capital One, joins us to explore how the company is designing, deploying, and scaling multi-agent systems in a highly regulated environment. Rashmi walks us through Chat Concierge, a multi-agent chat experience for auto dealerships that handles intent disambiguation, tool invocation, and human handoffs to deliver safer, more personalized customer journeys. We discuss Capital One’s platform-centric approach to AI agents and how it separates design from runtime governance, embedding policies, guardrails, and cyber controls across agent threat boundaries. Rashmi shares how the team approaches the developer experience for agent builders, observability, and evals for stochastic, multi-agent workflows; and strategies for model specialization, including fine-tuning and distillation. We also cover standards and abstraction, closed-loop learning from production telemetry, and key lessons for enterprises building agentic systems. The complete show notes for this episode can be found at https://twimlai.com/go/765.

Google DeepMind YouTube 2026-04-13 09:30 UTC Score 34.0 AI-145-20260413-podcasts-and-8299c3c6

What’s new in Gemma 4?

Gemma 4 is our newest family of open models. You can now run advanced reasoning, native vision and audio, and agentic tool-use on anything from high-end workstations to mobile phones. Learn more → https://deepmind.google/models/gemma/gemma-4/

Practical AI Podcast 2026-04-09 09:00 UTC Score 39.0 AI-143-20260409-podcasts-and-ffa43d0a

Post-Mortem of Anthropic's Claude Code Leak

In this fully connected episode, Dan and Chris break down the Anthropic Claude Code leak, what went wrong and what it reveals about agentic systems, AI architecture, and AI safety. They also explore how the open source community is responding and why this moment could reshape how AI systems are built and secured. Featuring: Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Upcoming Events: Register for upcoming webinars here !

Practical AI Podcast 2026-04-02 09:00 UTC Score 30.0 AI-143-20260402-podcasts-and-2fae1038

Agentic Coding and the Economics of Open Source

AI is rapidly transforming how software is built, shifting economic incentives from open source code and collaboration toward on-demand, personalized development through agentic coding a.k.a. vibe coding. In this episode, Chris speaks with Miklós Koren of Central European University about how AI is reshaping open source and the software industry. They explore the economics of incentives, evolving collaboration patterns, and what this shift means for software development, the future of AI, and its broader impact on the technology sector. Featuring: Miklós Koren – LinkedIn Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Links: Vibe Coding Kills Open Source The Directions of Technical Change The Tailwind story Upcoming Events: Register for upcoming webinars here !

MongoDB AI Blog 2026-03-31 13:00 UTC Score 59.0 USR-0070-20260331-ai-specialis-1df660a1

Introducing MongoDB Agent Skills and Plugins for Coding Agents

Software engineering is evolving into agentic engineering. According to the Stack Overflow Developer Survey 2025, 84% of respondents use or plan to use AI tools in their development, up from 76% the previous year. At this rate, the tooling needs to keep pace. Last year, we introduced the MongoDB MCP Server to give agents the connectivity they need to interact with MongoDB, helping them generate context-aware code. But connectivity was only the start. Agents are generalists by design, and they don't inherently know the best practices and design patterns that real-world production systems demand. Today, we're addressing this by introducing official MongoDB Agent Skills: structured instructions, best practices, and resources that agents can discover and apply to generate more reliable code across the full development lifecycle, from schema design and performance optimization to implementing advanced capabilities like AI retrieval. To bring this directly into the tools you use, we're also launching plugins for Claude Code, Cursor, Gemini CLI, and VS Code, combining the MongoDB MCP Server and Agent Skills in a single, ready-to-use package. Turning coding agents into MongoDB experts Coding agents are great at producing working code, but they still make common mistakes in production systems, often defaulting to relational thinking that doesn't translate well to MongoDB, such as: Over-normalizing schemas, ignoring MongoDB's document-oriented strengths. Underusing compound indexes, c…

TWIML AI Podcast 2026-03-26 22:35 UTC Score 51.0 AI-148-20260326-podcasts-and-02c16b3f

The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764

Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs to discuss diffusion language models. We dig into how diffusion approaches—traditionally used for images—are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that can generate multiple tokens simultaneously and achieve inference speeds 5-10x faster than small frontier models, paving the way for latency-sensitive applications like voice interactions and fast agentic loops. We also cover the open research challenges in diffusion LLM training, serving infrastructure requirements, and post-training for diffusion-based systems. Finally, Stefano shares his perspective on whether diffusion models can rival or surpass autoregressive LLMs at scale, the advantages for highly controllable generation, and what the future of multimodal diffusion models might look like. The complete show notes for this episode can be found at https://twimlai.com/go/764.

Amazon Science AI 2026-03-24 23:10 UTC Score 46.0 AI-058-20260324-official-ai--65d36d36

Personality-driven AI agents: Operationalizing OCEAN traits for human-AI collaboration in the coding domain

As AI agents become collaborative partners in complex tasks, understanding how agent personality affects human-AI interaction becomes critical. While recent work explores personality customization in language models, little is known about how personality affects AI coding agents. We conducted the first exploratory study investigating: if OCEAN (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) personality traits can be operationalized in AI coding agents, if users detect these personality differences, and how different personalities affect user trust and adoption. Participants completed refactoring tasks with three agent profiles. Results show that personality traits successfully translated into distinguishable behaviors reliably detected by users. While no universal 'best' personality emerged, individual preferences diverged substantially. Conscientiousness produced more consistent trust, while openness and extraversion polarized users. Some users experienced trust collapse from overconfidence and others found excessive caution inefficient. Our findings provide initial empirical evidence that OCEAN personality traits can be operationalized in AI coding agents, producing distinguishable behaviors, with implications for designing adaptive systems.

Practical AI Podcast 2026-03-17 14:29 UTC Score 36.0 AI-143-20260317-podcasts-and-85b9740f

Humility in the Age of Agentic Coding

What happens when an AI hater starts building with AI agents? In this episode, we talk with software engineer Steve Klabnik, known for his work on the Rust programming language, about his journey from criticizing AI to experimenting with it firsthand. We explore Steve’s programming language Rue, largely built with the help of AI tools like Claude, and discuss what this means for software engineering and the future of coding in an AI-driven world. Featuring: Steve Klabnik – LinkedIn Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Links: The Rust Programming Language Rust Rue Daniel's RSA Meeting link for March 23, 2026 Daniel's RSA Meeting link for March 24-25, 2026 Upcoming Events: Register for upcoming webinars here !

Machine Learning Street Talk 2026-03-13 21:00 UTC Score 71.0 AI-141-20260313-podcasts-and-c52bdba8

When AI Discovers the Next Transformer — Robert Lange

Robert Lange, founding researcher at Sakana AI, joins Tim to discuss *Shinka Evolve* — a framework that combines LLMs with evolutionary algorithms to do open-ended program search. The core claim: systems like AlphaEvolve can optimize solutions to fixed problems, but real scientific progress requires co-evolving the problems themselves. GTC is coming, the premier AI conference, great opportunity to learn about AI. NVIDIA and partners will showcase breakthroughs in physical AI, AI factories, agentic AI, and inference, exploring the next wave of AI innovation for developers and researchers. Register for virtual GTC for free, using my link and win NVIDIA DGX Spark (https://nvda.ws/4qQ0LMg) In this episode: • Why AlphaEvolve gets stuck — it needs a human to hand it the right problem. Shinka tries to invent new problems automatically, drawing on ideas from POET, PowerPlay, and MAP-Elites quality-diversity search. • The *architecture* of Shinka: an archive of programs organized as islands, LLMs used as mutation operators, and a UCB bandit that adaptively selects between frontier models (GPT-5, Sonnet 4.5, Gemini) mid-run. The credit-assignment problem across models turns out to be genuinely hard. • Concrete results — state-of-the-art circle packing with dramatically fewer evaluations, second place in an AtCoder competitive programming challenge, evolved load-balancing loss functions for mixture-of-experts models, and agent scaffolds for AIME math benchmarks. • Are these systems act…

TWIML AI Podcast 2026-03-10 23:25 UTC Score 39.0 AI-148-20260310-podcasts-and-a23d20be

Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763

In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at enterprise scale. Sid contrasts AI-assisted coding with end-to-end autonomy, arguing that “code is a commodity” and acceptance is the real metric—security, standards, tests, and maintainability included. We explore Blitzy’s hybrid graph-plus-vector approach, which grounds agents and combines semantic signals with keyword search to navigate large repositories efficiently. Sid breaks down context and agent engineering, how effective context windows have plateaued, and why dynamic agent personas, tool selection, and model-specific prompting matter at scale. He details their orchestration of large swarms of AI agents to collaboratively analyze codebases, plan tasks, and execute complex tasks in parallel. We also dig into why Agents.md and flat memories break down, storing feedback in the knowledge graph, and building real-world evals beyond leaderboards to choose the right model for each task. The complete show notes for this episode can be found at https://twimlai.com/go/763.

Machine Learning Street Talk 2026-03-03 14:50 UTC Score 62.0 AI-141-20260303-podcasts-and-aa1fcba5

The Dangerous Illusion of AI Coding? - Jeremy Howard

Dive into the realities of AI-assisted coding, the origins of modern fine-tuning, and the cognitive science behind machine learning with fast.ai founder Jeremy Howard. In this episode, we unpack why AI might be turning software engineering into a slot machine and how to maintain true technical intuition in the age of large language models. GTC is coming, the premier AI conference, great opportunity to learn about AI. NVIDIA and partners will showcase breakthroughs in physical AI, AI factories, agentic AI, and inference, exploring the next wave of AI innovation for developers and researchers. Register for virtual GTC for free, using my link and win NVIDIA DGX Spark (https://nvda.ws/4qQ0LMg) Jeremy Howard is a renowned data scientist, researcher, entrepreneur, and educator. As the co-founder of fast.ai, former President of Kaggle, and the creator of ULMFiT, Jeremy has spent decades democratizing deep learning. His pioneering work laid the foundation for modern transfer learning and the pre-training and fine-tuning paradigm that powers today's language models. Key Topics and Main Insights Discussed: - The Origins of ULMFiT and Fine-Tuning - The Vibe Coding Illusion and Software Engineering - Cognitive Science, Friction, and Learning - The Future of Developers RESCRIPT: https://app.rescript.info/public/share/BhX5zP3b0m63srLOQDKBTFTooSzEMh_ARwmDG_h_izk https://app.rescript.info/api/public/sessions/62d06c0336c567d6/pdf Jeremy Howard: https://x.com/jeremyphoward https://www.answer.…

METR 2026-03-03 08:00 UTC Score 38.0 USR-0147-20260303-research-aca-7bd4bcdb

Observations from two CLI game reimplementation runs with Opus 4.6

Summary: Opus 4.6 can, with a simple agent scaffold, create mostly-playable but somewhat broken CLI versions of Slay the Spire and Balatro 1 . Intro Last weekend I was trying to think of really difficult tasks we could give to AI agents to upper-bound their capabilities. I thought of two examples: Recreating a basic version of the video game Slay the Spire in the CLI Recreating a basic version of the video game Balatro in the CLI Both of these video games have a few properties that make it especially easy for AI systems to implement them: They already exist, so the AI doesn’t have to come up with new game ideas and do the enormous amount of work necessary to make it a fun game to play. Most player-relevant information is conveyed through text. They have well-defined rules and interactions between game mechanics. They are turn-based and don’t rely on reaction times or on-screen movement at all. They have well-documented wikis and appear on the internet a lot. Nevertheless, I expected that AI systems are currently far from being able to pull these tasks off. My best guess is that it would take an experienced software engineer a few months to do these tasks. To test my hypothesis, I created simple versions of these tasks where only the core game mechanics need to be present. Also, instead of creating a full video game with graphics and animations, I only requested that the game be playable in a terminal. This significantly lowers the difficulty of the task. I tasked Opus 4.6 wi…