Latest AI/ML News
👀 The AI backlash is the only thing growing faster than AI revenues
Humans are angry
Advice on Dataset Choice for Two-Way Sign Language App in Flutter
I am developing a Flutter app called Talk to Deaf, which aims to enable real-time two-way communication between deaf and hearing users. The app will let hearing users input text or voice, and the deaf user will respond in sign language, which the app will convert back into text or speech. I am unsure which type of dataset to use for training my machine learning model: a dataset of individual letters (A-Z) or a dataset of complete words/phrases. I want to ensure accurate and smooth communication. Which type of dataset would be more suitable for building a robust real-time sign language interpreter, and what are the trade-offs of each approach? Any guidance on dataset selection or best practices for training a model for this type of two-way communication app would be highly appreciated.
AI Dev 26 x SF | Ara Khan: Evals Are Broken Use Them Anyway
In this talk, Cline's Ara Khan explains why they went from "evals are useless" to using evals as a core part of their agent improvement loop. The talk shares practical heuristics for interpreting, running, and creating evals, and argues that doing them anyway is better than pure "vibes".
Did Google’s AI agents really build an operating system for $916?
The importance of independent evaluation
Semantic Search Starts With Embeddings
“Budget” and “financials” are different words, but embeddings understand they’re related. That’s the foundation behind semantic search and one of the core building blocks of modern multimodal systems. Learn how embeddings power retrieval across text, audio, images, and video in Building Multimodal Data Pipelines: https://hubs.la/Q04hJ9w10
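A minimal sketch of the idea, using hand-made three-dimensional vectors rather than a real embedding model. The vectors, the `toy_embeddings` table, and the `search` helper are all hypothetical, chosen only so that related words point in similar directions; real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, invented for illustration (not model output).
toy_embeddings = {
    "budget": [0.9, 0.8, 0.1],
    "financials": [0.8, 0.9, 0.2],
    "giraffe": [0.1, 0.0, 0.9],
}

def search(query, k=2):
    """Rank every other entry by cosine similarity to the query's vector."""
    q = toy_embeddings[query]
    scored = [
        (word, cosine_similarity(q, vec))
        for word, vec in toy_embeddings.items()
        if word != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

results = search("budget")
for word, score in results:
    print(f"{word}: {score:.3f}")
```

With these toy vectors, "financials" ranks well above "giraffe" for the query "budget", even though the two strings share no characters. That is the property semantic search relies on.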
Nuclear Ethics
This "Values & Interests" panel discussion, held in partnership with PBS and moderated by acclaimed journalist Ann Curry, is available to view in full.
How AI Changes the Role of Applied Scientists
By Levi Boxell, Tilman Drerup, and Alexandr Lenk
The Economics Team at Instacart is an applied science team that operates at the intersection of machine learning engineering and economics. Similar to other applied science teams, our work involves a good chunk of engineering, steeped in statistics, math, theory, and strategy. And while that is still at the heart of what we do today, the surprisingly rapid emergence of artificial intelligence has also fundamentally altered our work in ways that we did not see coming. With this post, we want to provide a brief check-in and share an analysis of the patterns we are seeing from a distinctly economic perspective. To do so, we analyze the empirical dynamics of our project portfolio between 2023 and today, looking at the evolution of both the nature and quantity of our work over time. To start, let’s have a quick refresher of what economists at Instacart do and provide a theoretical framework to think about the impact of technological change through AI.
Background & Theoretical Framework
At Instacart, economists spend their day-to-day on a diverse portfolio of tasks and activities. Similar to other applied science teams within the company, our work relies on a blend of skills, including economics, statistics, math, machine learning, data manipulation, coding, and AI. Due to this versatility in tasks, the team’s work provides a particularly rich testing ground for predictions derived from economic theories concerning the impact of technologi…
AI Dev 26 x SF | Andi Partovi: Why Every Agent Needs a Simulation Sandbox
AI agents fail in unpredictable ways that traditional testing can't catch — hallucinations, wrong tool calls, policy violations, and more. Teams only discover these failures after users hit them in production. A simulation sandbox gives you a controlled environment with realistic users, tools, and workflows where you can run hundreds of scenarios against your agent before it ships, catching edge cases and adversarial inputs that would be impossible to test manually. This talk by Veris AI's Andi Partovi covers why simulation-driven development is becoming essential infrastructure for any team building production AI agents, and how it closes the gap between "works in demos" and "works at scale."
AI Dev 26 x SF | João Moura: Building Recurring, Governed, and Embedded Enterprise Workflows
Modern enterprises don't struggle to experiment with AI — they struggle to operationalize it reliably. In this talk, CrewAI's CEO outlines how leading organizations are moving beyond one-off automations to build recurring, governed, and deeply embedded workflows that drive real business outcomes. Drawing on lessons from production deployments, João explores how to design systems that are auditable, scalable, and aligned with enterprise controls — without sacrificing speed.
AI Dev 26 x SF | Luke Kim: The Agent Data Stack—Why Every AI Agent Needs Its Own Data Stack
From centralized to distributed: In the old world, organizations relied on one centralized data and AI platform. In the new world of AI agents, every agent needs its own sandboxed, secure, and modern data stack. In this 20-minute talk with a live demo, Spice AI's Luke Kim explores why this architectural shift is critical and the key patterns required to give agents reliable, real-time data.
AI Dev 26 x SF | Manos Koukoumidis & Stefan Webb: VibeML: Build your AI model in hours, not months
The next major shift in enterprise AI is underway: enterprises are moving from generic AI they rent to specialized AI they own. The benefits are clear: higher quality, dramatically lower costs, full control, and a quality improvement flywheel while in production. But building specialized AI models has been prohibitively hard; each use case required months of effort and deep AI expertise. Well, it used to. VibeML is enabling engineers to build specialized AI models automatically from a prompt, in minutes. An AI agent builds your AI model end-to-end: evaluation, data synthesis, training, and repeat. This talk by OUMI's Manos Koukoumidis & Stefan Webb demonstrates how VibeML can give deep AI experts superpowers while enabling non-experts as well.
AI Dev 26 x SF | Daniel Beutel: Flower SuperGrid Agents
At AI Dev 26 x San Francisco, Flower Lab's Daniel Beutel talked about Flower SuperGrid, the industry standard for Federated AI. With SuperGrid Agents, you can now build and run context-rich agents that learn from interactions, access sensitive data and (soon) collaborate with other SuperGrid Agents.
AI Dev 26 x SF | Or Dagan: Optimizing Accuracy, Cost, and Latency in Real-World Agents
Most agentic systems rely on hardcoded heuristics to navigate execution decisions (e.g., which models, tools, and test-time compute scaling approaches to use), leading to efficiency leakage across cost, latency, and accuracy. AI21 Maestro optimizes agents by learning to predict success, cost, and latency probabilities across diverse actions and contexts, and by driving runtime orchestration that intelligently navigates the full agentic action space. In this session, AI21's Or Dagan demonstrated how this approach yields state-of-the-art results and a Pareto frontier on challenging agentic benchmarks, as well as the process required to optimize production agents.
AI Dev 26 x SF | Andrew Filev: Multi Model Pipelines—How to Get Better AI Results for Less
In this talk by Zencoder's Andrew Filev, attendees learned how decomposing tasks into pipelines and dynamically routing them across models improves quality, reduces cost, and makes AI systems more reliable.
AI Dev 26 x SF | Diamond Bishop: The Next 100 Agents. Building the Agent Native Office
Building your first agent is exciting. Building a platform that can evolve into an office where dozens of teams can safely deploy their own agents is a different beast entirely. In this talk, Diamond Bishop from Datadog shared lessons learned from building production agents and then turning that work into an agent office/platform built to power the next-gen enterprise with diverse agent workloads.
Latin American journalists invited to apply for 2026 JournalismAI Skills Lab
The 2026 JournalismAI Skills Lab is a 14-week, free, virtual program designed for professionals to learn how to practically implement LLMs, GenAI and agents in their work. The programme helps individuals upskill in using AI technologies in a hands-on manner. It equips participants to develop their own AI-based tools, prototypes or proofs-of-concept. The ultimate outcome […] The post Latin American journalists invited to apply for 2026 JournalismAI Skills Lab appeared first on LatAm Journalism Review by the Knight Center.
AI Dev 26 x SF | Paul Everitt: The Shift to Agentic Engineering
More code, fewer staff — the industry is on a bender. But what about quality? At AI Dev 26 x San Francisco, Paul Everitt from JetBrains discussed the rise of agentic engineering and how old lessons can be adapted to build new professional practices.
AI’s Public Relations Emergency
A generation is being told AI is their enemy. And they’re starting to believe it.
Remote agents in Vibe. Powered by Mistral Medium 3.5.
Introducing Mistral Medium 3.5, remote coding agents in Vibe, plus new Work mode in Le Chat for complex tasks.
Connect the dots: Build with built-in and custom MCPs in Studio
Connect enterprise data to your AI applications with reusable connectors, direct tool calling, and human-in-the-loop approval controls.
Believe It Or Not, The Government Is Adopting AI to Make Your Life Easier
The public sector moves slowly by design. That might actually help it get AI right.
Moving Fast Doesn’t Have to Break Things: The U.S. Must Stop Compromising Critical Infrastructure with Patchwork AI Security Approaches
PETs offer U.S. critical-infrastructure AI a path beyond patchwork security. Why Attribution-Based Control should be the standard. The post Moving Fast Doesn’t Have to Break Things: The U.S. Must Stop Compromising Critical Infrastructure with Patchwork AI Security Approaches appeared first on OpenMined.
Rationale for StandardScaler over MinMaxScaler in spatiotemporal tree-based ensemble models with SHAP interpretability
I am developing a spatiotemporal tree-based ensemble framework (utilizing LightGBM, XGBoost, and CatBoost) to forecast dengue outbreaks based on climate variables (temperature, precipitation, humidity) and lagged historical case counts. While tree-based algorithms are theoretically invariant to monotonic feature scaling, I am implementing scaling primarily because:
(1) I am calculating SHAP (Shapley Additive Explanations) values for post-hoc model interpretability and global feature importance.
(2) I am applying forward aggregation across temporal slices to prevent data leakage, meaning the range and variance of features dynamically shift across training validation windows.
I am debating between StandardScaler (Z-score normalization) and MinMaxScaler (0-1 normalization). Given the spatiotemporal and epidemiological nature of the data, StandardScaler appears to behave more robustly, but I want to ensure my architectural justification is sound. Here is a minimal visualization of how the choice impacts extreme climate outliers (e.g., a massive monsoon rainfall anomaly):
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler, StandardScaler
    # Simulating a climate feature with a severe anomaly (monsoon spike)
    np.random.seed(42)
    weekly_rainfall = np.random.normal(loc=150, scale=30, size=100)
    weekly_rainfall = np.append(weekly_rainfall, [650])  # Extreme outlier event
    df = pd.DataFrame({"Rainfall": weekly_rainfall})
    # Applying both scalers
    df["MinMax"] = MinMa…
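The outlier effect described in the question can be sketched without scikit-learn. This is not the poster's truncated code: it is a separate illustration, assuming the default formulas the two scalers apply (min-max to [0, 1], and z-score using the population standard deviation). The helper names `min_max_scale` and `z_score_scale` are hypothetical.

```python
import numpy as np

def min_max_scale(x):
    """(x - min) / (max - min): the default MinMaxScaler formula."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def z_score_scale(x):
    """(x - mean) / std with population std, as StandardScaler computes it."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(42)
typical = rng.normal(loc=150, scale=30, size=100)  # ordinary weeks
rainfall = np.append(typical, 650.0)               # one extreme monsoon week

mm = min_max_scale(rainfall)
zs = z_score_scale(rainfall)

# Share of the [0, 1] range left for the 100 ordinary weeks once the
# single outlier has been pinned to 1.0.
typical_mm_spread = mm[:-1].max() - mm[:-1].min()
print(f"min-max spread of ordinary weeks: {typical_mm_spread:.3f}")
print(f"z-score of the outlier week:      {zs[-1]:.2f}")

# Both transforms are monotonic, so the ordering that tree splits depend on
# is identical under either scaler.
order = np.argsort(rainfall)
assert np.all(np.diff(mm[order]) >= 0) and np.all(np.diff(zs[order]) >= 0)
```

Because both transforms preserve rank order, the split structure of a tree model should not depend on which one is used. The practical difference is in the scaled values themselves: one extreme week squeezes every ordinary week into a small part of the min-max range, while the z-score has no fixed upper bound.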
DeepSeek’s New AI Is A Game Changer
The paper is available here: https://github.com/ailuntx/Thinking-with-Visual-Primitives https://huggingface.co/datasets/NodeLinker/deepseek-ai-Thinking-with-Visual-Primitives-deleted-repo/blob/main/Thinking_with_Visual_Primitives.pdf
Dependency prefixes are a supply chain risk: let's fix them
Dependency prefixes like ^ and ~ make updates easy, but the version ranges they create widen the path a compromised package can take into production.
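To make the width of those ranges concrete, here is a rough sketch that assumes npm-style semver semantics and plain `major.minor.patch` versions (no pre-release tags or partial versions). The helpers `parse`, `upper_bound`, and `satisfies` are hypothetical names, not part of any package manager.

```python
def parse(version):
    """Turn 'major.minor.patch' into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

def upper_bound(spec):
    """Exclusive upper bound of a '^x.y.z' or '~x.y.z' range (npm-style)."""
    prefix = spec[0]
    major, minor, patch = parse(spec[1:])
    if prefix == "~":
        return (major, minor + 1, 0)        # patch updates only
    if prefix == "^":
        if major > 0:
            return (major + 1, 0, 0)        # any minor or patch update
        if minor > 0:
            return (0, minor + 1, 0)        # 0.y.z: patch updates only
        return (0, 0, patch + 1)            # 0.0.z: exact version only
    raise ValueError(f"unsupported prefix in {spec!r}")

def satisfies(version, spec):
    """True if `version` would be accepted by `spec`."""
    if spec[0] not in ("^", "~"):
        return parse(version) == parse(spec)  # pinned: exact match only
    return parse(spec[1:]) <= parse(version) < upper_bound(spec)

for spec in ("1.2.3", "~1.2.3", "^1.2.3"):
    accepted = [v for v in ("1.2.3", "1.2.9", "1.9.0", "2.0.0") if satisfies(v, spec)]
    print(f"{spec:>7} accepts {accepted}")
```

Under these assumptions, a pinned `1.2.3` accepts only itself, `~1.2.3` also accepts later patch releases, and `^1.2.3` accepts any later 1.x release. Each wider range is a larger set of future versions that could be installed without anyone editing the manifest.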
VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
Streaming vision-language models (VLMs) continuously generate responses given an instruction prompt and an online stream of input frames. This is a core mechanism for real-time visual assistants. Existing VLM frameworks predominantly assess models in offline settings. In contrast, the performance of a streaming VLM depends on additional metrics beyond pure video understanding, including proactiveness, which reflects the timeliness of the model’s responses, and consistency, which captures the robustness of its responses over time. To address this limitation, we propose VSAS-Bench, a new…
Relational Foundation Models for Enterprise Data with Jure Leskovec - #768
In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts of his work: AI for science and relational deep learning. We begin with AI Virtual Cell, a multiscale effort to learn data-driven representations from proteins to cells to patients using single-cell RNA-seq data, protein language models like ESM, and structure models like AlphaFold—without hand-encoding biology. Jure then dives into relational deep learning, reframing enterprise databases as graphs and training neural networks directly on raw multi-table data. He explains Kumo’s Relational Foundation Model (RFM2), which performs in-context learning over subgraphs to make accurate predictions on new databases and tasks with no training, and how this approach benchmarks against RelBench and other multi-table datasets. We also discuss real-world deployments at companies like Reddit, DoorDash, and Coinbase, explainability via attention over tables and columns, integration with agentic systems, deployment options, and practical limitations. The complete show notes for this episode can be found at https://twimlai.com/go/768.
MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models
MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on everyday tasks. The post MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models appeared first on Microsoft Research.
Access Now urges the Ninth Circuit to protect encryption from NSO’s spyware
Yesterday, Access Now and ten other civil society organizations filed an amicus brief with the U.S. Ninth Circuit Court of Appeals, calling on the court to protect encryption from NSO Group’s Pegasus spyware and to keep the lower court’s permanent injunction forbidding NSO from ever again targeting WhatsApp or its customers’ devices. The post Access Now urges the Ninth Circuit to protect encryption from NSO’s spyware appeared first on Access Now.
Director of Operations - Expressions of Interest | GovAI Blog
Submission deadline: rolling
Vega: Zero-knowledge proofs for digital identity in the age of AI
Vega turns a full credential into a single proof, sharing only what is needed and nothing more, with performance that works in real apps. The post Vega: Zero-knowledge proofs for digital identity in the age of AI appeared first on Microsoft Research.
Do AI Risks Require Extraordinary Government Intervention?
Let’s not skip the hard work of AI governance
Google's take on openclaw
it's Anthropic's time for the mandate of heaven
Xi's summits with Trump and Putin + China's economy loses momentum + Hong Kong dissidents
MERICS China Essentials, May 21, 2026. Top Story: Xi’s summits with Trump and Putin project Beijing as a hub of global diplomacy. By hosting US President Donald Trump and Russian President Vladimir Putin in back-to-back summits in Beijing, Xi Jinping was able to project China’s unprecedented global influence and advance its preferred worldview: building what he calls “constructive strategic stability” with the US, while enlisting Russia to push for a multipolar world order through the doctrine of “a new type of international relations.” Xi was helped by his guests appearing keen to impress. After Xi called Taiwan “the most important issue in China-US relations,” Trump said he was “not looking to have somebody go independent” – and more generally seemed willing to finally treat China as a peer major power. After Xi implicitly criticized the US by noting that “unilateral hegemonic currents are running rampant,” Putin said China-Russia relations had reached “unprecedentedly high levels” and were “key stabilizing factors on the international stage.” Xi treated Trump with generous courtesy, managing to project confidence rather than deference. Compared with Trump’s 2017 visit to the imposing Fo…
Hermes Agent: Agents that grow with you
Open Source AI is entering a new era, one shaped by self-improving AI Agents, recursive learning systems, and rapidly evolving AI Tools that blur the line between software and autonomous collaborators. In this episode, Daniel and Chris sit down with Nous Research co-founder and CTO Jeffrey Quesnelle to explore Hermes Agent. Along the way, they discuss models vs. harnesses, the changing role of developers, and one of the biggest questions facing the AI Future: what remains uniquely human as AI capabilities continue to accelerate? Featuring: Jeffrey Quesnelle, Chris Benson, and Daniel Whitenack.
Building accessibility tools on a truly open foundation
PointCheck, an independent project, uses Molmo, MolmoWeb, and Olmo 3 to test web accessibility the way a keyboard user would—by navigating real pages and inspecting what's actually on screen.
Build a Coding Assistant with Weaviate MCP: RAG over Code & Docs
Use Weaviate's built-in MCP server to give Claude Code, Cursor, and VS Code hybrid search over your codebase and docs. No glue code.
How Sunny Health Built an AI Healthcare Concierge with Qdrant
Most people don’t read their insurance pamphlet. The benefits are there: deductibles, copays, in-network providers, what dental covers, what dermatology covers, when an optometry visit is included in the medical plan. But the document is dense, the website is worse, and the result is that patients pay for plans they barely understand and delay care because finding an in-network provider with availability takes more energy than they have. Sunny Health is building a healthcare concierge that insurance companies and care providers offer to their members as part of the existing plan experience. When a member signs in (typically through SSO from their payer), Sunny Health already knows who they are and what their plan covers. They land in a chat experience where they can ask “show me dermatologists nearby,” get matched to in-network options, and have Sunny Health book the appointment on their behalf. Three things on one retrieval layer: benefits navigation, provider matching, and appointment booking.
Modal's Series C: Raising $355M at a $4.65B valuation
We've raised $355M at a $4.65B valuation to continue building the production cloud for AI.
AI Weekly Issue #494: SpaceX wants $80 billion. OpenAI wants a trillion.
For nine years the AI boom has been a private bet, priced by a small circle of venture funds and sovereign wealth in rounds most people could never touch. This week it started going public. SpaceX filed an $80 billion IPO prospectus on Wednesday, the largest in history, with a chatbot company and $6.4 billion in AI losses folded inside it. OpenAI is days from filing its own, aiming for a trillion-dollar debut by September. The public markets are about to answer the question private investors kept waving away: at what price?
Enterprise AI Security with ClearML: A Complete Series Summary
By Adam Wolf & Damian Erangey. Across seven posts and videos, ClearML’s Enterprise AI Security series covered every layer of securing an AI platform in production, from who gets in to what gets recorded. This post brings it all together in one place: what each layer does, why it matters, and how […]
Deep Learning Indaba Impact Report 2025
Our mission to Strengthen African AI, for Africans, by Africans remains as necessary and as valued as ever. This impact report sets out how the Deep Learning Indaba continues to deliver on that mission, and the change we are enabling across Africa’s AI ecosystem. As always, we are deeply grateful to our funders, partners, and […] The post Deep Learning Indaba Impact Report 2025 appeared first on Deep Learning Indaba.
Magnificent Humanity – The Pope’s First Encyclical Concerns AI
Everything you need to know about the upcoming encyclical on AI.
What Held Up at 3 AM: One Engineer’s RAG Case Study
Most AI demos work. Most AI products don’t. This series is a collection of interviews with engineers who shipped AI agents to production, covering the stacks they chose, the architectures they regretted, and what actually held up at 3 am. This is an interview with Michael Maximilien, former CTO and Distinguished Engineer at IBM and […] The post What Held Up at 3 AM: One Engineer’s RAG Case Study appeared first on Comet.
Partnership on AI Announce New Series of Short Films on AI and Society
The post Partnership on AI Announce New Series of Short Films on AI and Society appeared first on Partnership on AI.