AI/ML News & Innovations Hub

WIRED AI 2026-06-29 21:49 UTC Score 52.0 AI-015-20260629-global-ai-ne-3c610c75

Meta Contractors Posed as Teens to Prompt Rival Chatbots About Suicide, Sex, and Drugs

Hundreds of contractors working on a project for Meta pretended to be kids in order to see how other chatbots like Gemini and ChatGPT would respond to high-risk subjects, WIRED found.

Read article →

Roboflow Blog 2026-06-29 20:44 UTC Score 56.0 USR-0088-20260629-ai-specialis-3c176e4e

Tarmac Safety AI

Build an automated airport FOD detection workflow using RF-DETR, Roboflow Workflows, and Gemini 2.5 Pro.

Read article →

Techcrunch 2026-06-29 20:12 UTC Score 43.0 USR-0001-20260629-global-ai-ne-6e26addc

Gemini’s personalized AI image generation is now free for US users

Google is expanding Gemini’s personalized AI image generation to eligible free users in the U.S., allowing the chatbot to create images based on your interests and data from connected Google apps.

Read article →

Roboflow Blog 2026-06-29 17:39 UTC Score 52.0 USR-0088-20260629-ai-specialis-0b522ea3

Injection Molding Defect Detection for Medical Components

Use RF-DETR and Gemini 2.5 Pro to detect defects on molded components and generate automated inspection observations.

Read article →

ZDNET AI 2026-06-29 16:38 UTC Score 55.0 AI-022-20260629-global-ai-ne-10ddd54f

I changed these Android Auto settings to limit what Gemini learns about me - here's why

Google's AI offers a lot of convenience in your car, but you're offering up a lot of sensitive information. Here's how to put a lock on it.

Read article →

OpenAI Community 2026-06-29 13:51 UTC Score 63.0 AI-116-20260629-social-media-d0056176

Can local preprocessing cut LLM API costs?

A few days ago I shared a project I’ve been working on called “LatentGate” — a local-first pipeline that reduces LLM API token usage by processing inputs before sending them to the model. After some great feedback, I’ve now turned it into: A pip-installable Python package A VS Code extension (runs as a local proxy) MCP server support for tools like Claude Code, Cursor, Cline, Continue PyPI → pip install latent-gate VS Code → LatentGate — Local-First AI Compression What it does Images (~1000–1300 tokens) → compressed to ~150 tokens using local vision models (Ollama + LLaVA) Long prompts / conversations → compressed locally before hitting cloud APIs Works with OpenAI / Claude / Gemini APIs Fully local preprocessing (no data leaves your machine before compression) The idea is inspired by VL-JEPA — predicting in embedding space, then decoding selectively. Why I built this While experimenting with GPT-4o / vision APIs, I noticed most costs come from raw input size (especially images and long prompts). So instead of optimizing prompts endlessly, I tried: → “What if we reduce what we send in the first place?” What I’m looking for I’d love feedback from this community, especially: Edge cases where compression breaks context Cases where output quality drops noticeably Prompt / API compatibility issues (OpenAI especially) Performance bottlenecks Better approaches to selective decoding or compression If you try it and something fails — that’s honestly the most valuable thing for me rig…

Read article →

Medianama AI 2026-06-29 11:46 UTC Score 43.0 USR-0211-20260629-regional-new-ff8c5bc7

Explained: Why Google moved Gemini to token-based limits

Google’s Gemini limits show free AI access giving way to compute-metered tiers, as capacity shortages squeeze enterprises, reshape consumer plans, and raise questions over who controls AI infrastructure. The post Explained: Why Google moved Gemini to token-based limits appeared first on MEDIANAMA .

Read article →

Heise AI 2026-06-29 10:41 UTC Score 55.0 USR-0217-20260629-regional-new-4aff7fbb

KI-Engpass: Google kann Metas Nachfrage nach Gemini nicht mehr decken

Die Nachfrage nach KI-Rechenleistung übersteigt selbst bei den größten Tech-Konzernen das Angebot. Meta ist wohl besonders betroffen und muss intern umsteuern.

Read article →

Synced 2026-06-29 03:05 UTC Score 45.0 AI-041-20260629-ai-specialis-71f860d6

Comment on Microsoft’s Fully Pipelined Distributed Transformer Processes 16x Sequence Length with Extreme Hardware Efficiency by logo color game

Interesting article about distributed transformers! The efficiency improvements in processing long sequences could have big implications for AI research.

Read article →

Gulf News AI 2026-06-28 13:00 UTC Score 41.0 AI-172-20260628-regional-ai--49e06e4d

New Google update: What UAE Gmail, Android and Gemini users must know

Read article →

Synced 2026-06-28 10:27 UTC Score 44.0 AI-041-20260628-ai-specialis-f4028b2f

Comment on NVIDIA’s Global Context ViT Achieves SOTA Performance on CV Tasks Without Expensive Computation by gemini music

The discussion about nvidia’s global context vit achieves sota performance on cv tasks without expensive computation raises some really valid points. This perspective is refreshing. gemini music

Read article →

Heise AI 2026-06-26 15:08 UTC Score 37.0 USR-0217-20260626-regional-new-2dc04798

Xcode 26.6: Google Gemini zieht als Programmierassistent in Apples IDE ein

Mit Xcode 26.6 bringt Apple Google Gemini direkt in die Entwicklungsumgebung. Die Integration war bereits in Xcode 27 beta aufgetaucht, das noch weiter geht.

Read article →

South China Morning Post AI 2026-06-25 21:30 UTC Score 28.0 AI-156-20260625-regional-ai--4df51f9d

Why Hong Kong’s bilingualism is uniquely indispensable in the AI era

Last week, while preparing a lecture on the visual culture of the Global South, I caught Google’s Gemini in a double hallucination. Cross-referencing a historical event between English and Chinese data sets, I found the English AI to be authoritative but inventing citations. In Chinese, the fabrications vanished but so did global context, replaced by an insular perspective. Disturbingly, the system cloaked Chinese content in English citations, creating a deceptive authenticity that made the...

Read article →

KDnuggets 2026-06-25 16:00 UTC Score 25.0 AI-033-20260625-ai-specialis-18be4e6f

Using Gemini to Create Google Sheets

In this tutorial, we will show you how to use Gemini to create Google Sheets, build a useful table, generate formulas, analyze data, and improve the spreadsheet with follow-up prompts.

Read article →

AI Weekly 2026-06-25 00:00 UTC Score 37.0 AI-133-20260625-newsletters-c9caf65e

AI Weekly Issue #507: Anthropic Says Alibaba Stole 29 Million Conversations With Claude

Anthropic accused Alibaba of running 25,000 fake accounts to pull nearly 29 million conversations out of Claude — then took the evidence to the White House. That was just the opening shot in a week the labs spent at war with everyone, including each other: poaching Google's top Gemini minds, watching their own developer tools get pried open by anonymous strangers, and staring down Europe's August disclosure deadline. The twist? The only companies cleanly printing money this week sell memory and silicon — not models.

Read article →

Towards Data Science 2026-06-23 16:30 UTC Score 25.0 AI-036-20260623-ai-specialis-45a90777

I Spent an Hour on a Data Preprocessing Task Before Asking Gemini

How Gemini solved my Pandas problem in seconds, and why data science fundamentals still matter to spot suboptimal solutions The post I Spent an Hour on a Data Preprocessing Task Before Asking Gemini appeared first on Towards Data Science .

Read article →

Roboflow Blog 2026-06-22 17:20 UTC Score 38.0 USR-0088-20260622-ai-specialis-c2d52762

Building an AI-Powered Robotic Welding Defect Detection System

Use RF-DETR and Gemini 2.5 Pro to identify welding defects and generate automated quality inspection reports.

Read article →

InfoWorld AI 2026-06-22 09:00 UTC Score 52.0 USR-0126-20260622-global-ai-ne-d1933bc8

Why open infrastructure will define the AI era

A new form of vendor lock-in is here. And it’s not proprietary languages or rigid enterprise software suites — it’s something more fundamental. It’s the very thing that writes the code. JetBrains Research found that 74% of developers worldwide use AI tools. Claude Code , available only since May 2025, is now the most popular AI coding tool, followed by Gemini Code Assist and GitHub Copilot , according to Jellyfish’s 2026 State of Engineering Management Report . The latter study also found that 91% of developers say their productivity has increased in the past 12 months. As coding output expectations are rewritten daily , the engineering world is becoming heavily reliant on paid external AI services. Gartner predicts that by 2028 spending on AI coding tokens could exceed developer salaries. Yet, tokenmaxxing while vibe coding through a vendor’s cloud-based API feels like a far cry from the open foundations of free programming languages and open models, which many of today’s AI platforms now abstract. “Open infrastructure will be the backbone of the AI era,” says Peter Farkas , CEO of Percona , a provider of open-source database solutions. “Right now, too many companies are building their entire AI strategy on top of proprietary platforms because the convenience is seductive.” “It’s ‘three clicks’ to stand up a database or an AI service in a hyperscaler, and that convenience blinds people to the lock-in they’re signing up for,” he adds. “As AI workloads mature, organizations w…

Read article →

Roboflow Blog 2026-06-17 17:10 UTC Score 30.0 USR-0088-20260617-ai-specialis-162f9995

Surface Defect Detection on Machined Metal Medical Parts

Use RF-DETR and Gemini 2.5 Pro to detect surface defects on machined medical components and generate automated inspection observations.

Read article →

Roboflow Blog 2026-06-16 19:13 UTC Score 30.0 USR-0088-20260616-ai-specialis-cb93260d

IV Bag Fill-Level and Leak Detection

Build an automated IV bag fill-level and leak detection system using RF-DETR and Gemini 2.5 Pro.

Read article →

AI Alignment Forum 2026-06-16 00:04 UTC Score 53.0 USR-0151-20260616-community-fo-11f053f4

Synthetic document finetuning for instilling positive traits

This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The fourth post can be found here . TLDR: Via adapting the methods of Marks et al and Li et al , we train Gemini 3 Flash to have certain traits/values by midtraining it on documents about how Gemini has those properties, followed by finetuning it on synthetic chat data where it demonstrates those properties. The chat finetuning is effective for instilling the traits robustly, working OOD. We share some takeaways on how to improve midtraining & SFT effectiveness. Introduction This work closely follows Li et al (model spec midtraining, or MSM), who show that by training a model on synthetic documents before chat finetuning starts, they can shape how the model generalizes. Teaching the model reasons behind specific behaviours, rather than just the behaviours themselves, can also improve generalization. Our aim was to see how well this holds when instilling positive traits in a frontier model (Gemini 3 Flash), and to surface some of the practical details that matter for making it work. Our motivation is deep alignment : we want to train principles into the model which guide behaviour even in highly OOD behaviours. Our MVP pipeline used a "traits document" (a short bullet-pointed list of positive traits we wanted the model to exhibit) as our universe context, with a checkpoint of Gemini 3 Flash post-trained only on the F…

Read article →

AI Alignment Forum 2026-06-14 19:45 UTC Score 67.0 USR-0151-20260614-community-fo-49ef5cfc

Why Do Naive SFT Filters For Safety Properties Fail?

This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The third post can be found here . Since SFT is the cause for many safety relevant properties , a natural strategy is to filter out rollouts from SFT that have undesirable properties. However, as we show in this section (and in forthcoming MATS work), SFT data filtering frequently works surprisingly poorly. In this post, we investigate hypotheses for why SFT filtering fails. TL;DR: We discuss seven hypotheses for why SFT filtering works surprisingly poorly We analyze three hereditary traits that SFT-only Gemini has that other models do not: negative emotion, date confusion, and blackmail in the (highly contrived) agentic misalignment scenario We use a “post-training diffing pipeline” between Gemini and Olmo to show that the cause of date confusion and blackmail is largely surprising transfer of behaviors from the SFT teacher model. Notably, there exist small sets of prompts where switching the teacher model for the rollout removes date confusion and blackmail, but dropping the prompts does not. Negative emotion is less affected by the teacher model, but this may be because the Olmo prompt distribution we are SFTing on underspecifies the behavior. Takeaways: It’s hard to remove behaviors via filtering But if you can get a teacher model to have a behavior (e.g. via RL), then transferring that in the future is easier…

Read article →

AI Alignment Forum 2026-06-13 15:31 UTC Score 70.0 USR-0151-20260613-community-fo-4b2c7ccf

SFT Drives Gemini’s Safety Properties

This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The second post can be found here . In this short post, we describe a surprising finding: most safety relevant properties in Gemini seem to be caused by the combination of pretraining and SFT, not other training stages like RL. We do not want to overstate this claim as applying to other model families, and we also note that this may change in future Gemini versions. Nevertheless, this result was counter to our initial expectations and will inform future safety work on our team, and so we felt that it was important to share with the broader safety community. Experiment We perform SFT using the Gemini mixture on the pre-training only versions of Gemini 3.1 Pro and Gemini 3 Flash. We then compare these Post-SFT models to the production versions of Gemini 3.1 Pro and Gemini 3 Flash on different safety relevant benchmarks: Error bars are 95% confidence intervals on the evals. The main result is that the blue bars (SFT-only models) and orange bars (production models) are remarkably similar across evals . An important implication is that for Gemini, SFT is a high leverage place to intervene for model safety and behavior, and we plan to try to intervene here in the future. Brief Descriptions of Each Set of Benchmarks: ODCV refers to the benchmark in https://arxiv.org/abs/2512.20798 Alignment evals refer to a version of Petr…

Read article →

Ars Technica AI 2026-06-12 16:34 UTC Score 30.0 AI-023-20260612-global-ai-ne-16270b9a

Google sues Chinese cybercrime network that used Gemini to automate scams

The fraudsters allegedly targeted hundreds of thousands of people with Gemini-coded scams sites.

Read article →

Analytics Vidhya 2026-06-12 07:30 UTC Score 35.0 AI-034-20260612-ai-specialis-c15b6022

Gemini Omni: AI Video Generation Inside Gemini

Gemini models have always kept up with AI advancements. From text-based chatbots in 2023, Gemini has evolved into a multimodal system capable of understanding and generating text, audio, images… and now videos. AI video generation is no longer a standalone tool. With Gemini Omni, video creation becomes mainstream. Gemini Omni isn’t important because it generates […] The post Gemini Omni: AI Video Generation Inside Gemini appeared first on Analytics Vidhya .

Read article →

AI Weekly 2026-06-01 00:00 UTC Score 25.0 AI-133-20260601-newsletters-8ebf1e8f

AI Weekly Issue #498: Anthropic files for an IPO. NVIDIA ships its stack.

Anthropic confidentially filed a draft S-1 with the SEC today for a proposed public offering. The company also shipped Claude Opus 4.8 last week with a 4x code-reliability gain. NVIDIA used GTC Taipei to open Cosmos 3, ramp Vera Rubin into production, and put a 1-petaflop AI box on developer laptops. Google retires Gemini 2.0 Flash today. California's SB 867 — banning AI companion chatbots in children's toys — cleared the Senate; Illinois's data-center regulation stalled in committee. The labs sprint. The states crawl.

Read article →

JetBrains AI Blog 2026-05-29 13:46 UTC Score 33.0 USR-0065-20260529-ai-specialis-58dba76c

How We Use AlphaEvolve to Make Complex IDE Algorithms Faster

AlphaEvolve is a Google DeepMind algorithm-discovery system that uses Gemini to generate, test, and refine possible algorithm improvements. Its job is not to answer questions; it searches for faster ways to solve complex algorithmic problems. We tried it on a narrow but important part of IntelliJ-based IDEs: indexing, the background work that makes navigation, search, […]

Read article →

Ars Technica AI 2026-05-28 18:30 UTC Score 30.0 AI-023-20260528-global-ai-ne-30bdfc94

Apple working to cram massive Gemini model into iPhone to power new Siri

As Apple tries to shrink Gemini for the iPhone, a cloud component is probably inevitable.

Read article →

Last Week in AI 2026-05-27 07:50 UTC Score 36.0 USR-0103-20260527-ai-specialis-a64e304d

Last Week in AI #341 - Musk loses to OpenAI, Google's IO updates, OpenAI solves Erdős

Elon Musk Loses $150 Billion Suit Against OpenAI and Sam Altman, Google updates its Gemini app to take on ChatGPT and Claude at IO 2026, and more!

Read article →

Google DeepMind YouTube 2026-05-26 16:17 UTC Score 13.0 AI-145-20260526-podcasts-and-44bc6804

Gemini for Science is here. 🧬

Read article →

Interconnects 2026-05-26 15:39 UTC Score 27.0 USR-0104-20260526-ai-specialis-11e918bd

Some ideas for what comes next, May 2026

Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.

Read article →

Last Week in AI 2026-05-26 05:10 UTC Score 49.0 USR-0103-20260526-ai-specialis-e8227e1e

LWiAI Podcast #246 - Gemini 3.5 + Omni, Musk Loses, OpenAI vs Erdős

Google unveils AI model Gemini 3.5 and AI agent Gemini Spark, Omni turns images, audio, and text into video, Musk loses OpenAI court battle

Read article →

Two Minute Papers 2026-05-25 17:49 UTC Score 39.0 AI-139-20260525-podcasts-and-06d4fba0

Demis Hassabis On What AI Will Do Next

Thank you to Google DeepMind for the invite. 🙏 ❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/ Thumbnail design: https://felicia.hu 00:00 Intro 00:40 Gemini Health Scans and Gemma 4 01:30 AI as a Brainstorming Partner 02:30 Second Order Nobel 03:15 DeepMind Co-Scientist 05:00 Curing All Diseases 06:30 Exponential Growth in Drug Discovery 07:45 Regulatory Bottlenecks 09:45 Accelerating Clinical Trials 11:15 EVE Online Partnership 13:15 The Einstein Test 15:30 Recursive Self-Improvement 18:15 Lightning Round 19:30 The Badge of Honor 20:10 Behind the Scenes

Read article →

Google DeepMind YouTube 2026-05-20 20:10 UTC Score 13.0 AI-145-20260520-podcasts-and-3b1550b7

Gemini 3.5 Flash has landed.

Read article →

Google DeepMind YouTube 2026-05-20 00:21 UTC Score 15.0 AI-145-20260520-podcasts-and-18efefeb

Build your next story with Gemini Omni.

Read article →

Google DeepMind YouTube 2026-05-19 17:51 UTC Score 46.0 AI-145-20260519-podcasts-and-ecb209e4

Generating novel scientific hypotheses with Co-Scientist

In an era of information overload, the search for transformative scientific ideas has become a significant bottleneck for progress. Every great scientific breakthrough begins with a single, transformative idea. The spark of discovery relies on a researcher's ability to connect disparate facts and formulate the right hypothesis to test. We believe AI can help dramatically accelerate the pace of breakthroughs by serving as a dedicated partner in the generation and refinement of breakthrough scientific hypotheses. That’s why we’ve developed Co-Scientist, a Gemini-based multi-agent AI system that iteratively generates, debates, and evolves novel hypotheses for complex scientific problems. Read the Nature paper: https://www.nature.com/articles/s41586-026-10644-y and learn more at labs.google/science #googleio #ai #science ____ Subscribe to our channel https://www.youtube.com/@googledeepmind Find us on X https://x.com/GoogleDeepMind Follow us on Instagram https://instagram.com/googledeepmind Add us on Linkedin https://www.linkedin.com/company/deepmind/

Read article →

Google DeepMind YouTube 2026-05-19 17:51 UTC Score 31.0 AI-145-20260519-podcasts-and-e3fa51b6

Using AI to outsmart drug-resistant bacteria

Globally recognized as a silent pandemic, antimicrobial resistance continues to rise as bacteria outpace the development of new antibiotics. When patients stop responding to standard treatments, routine infections can quickly become life-threatening. At the University of Cambridge, Ben Luisi and his team are combining structural biology with advanced AI tools like AlphaFold, Gemini, and Co-Scientist to decode these hidden defense mechanisms. By compressing a process that once took years into just minutes, they are uncovering the critical insights needed to outsmart bacterial evolution. Learn more about science at Google DeepMind: https://deepmind.google/science/ #googleio #ai #science ___ Subscribe to our channel https://www.youtube.com/@googledeepmind Find us on X https://x.com/GoogleDeepMind Follow us on Instagram https://instagram.com/googledeepmind Add us on Linkedin https://www.linkedin.com/company/deepmind/

Read article →

Google DeepMind YouTube 2026-05-19 17:23 UTC Score 13.0 AI-145-20260519-podcasts-and-044a7752

This is Gemini Omni ♾️ #GoogleIO

Read article →

Weaviate Blog 2026-04-23 00:00 UTC Score 30.0 USR-0073-20260423-ai-specialis-ff8f396f

Weaviate 1.37 Release

This release introduces the built-in MCP Server, Extensible Tokenizers, Diversity Search (MMR), and Query Profiling as previews, along with Incremental Backups, Gemini audio support for multi2vec-google, and the new BlobHash property type.

Read article →

Big Technology 2026-04-20 18:24 UTC Score 25.0 USR-0107-20260420-ai-specialis-34a1ecd9

Google Cloud’s NEXT Big Moment

Google's once-forgotten Cloud division is making a run on the strength of Gemini. Here's what it needs to continue its ascent.

Read article →

Weaviate Blog 2026-04-01 00:00 UTC Score 36.0 USR-0073-20260401-ai-specialis-1ac34032

Multimodal Embeddings and RAG: A Practical Guide

Multimodal embeddings allow AI systems to search and reason across text, images, audio, and video in their native formats. This blog covers the key intuitions behind how this all works and walks through three practical implementations using Weaviate and Gemini.

Read article →

MongoDB AI Blog 2026-03-31 13:00 UTC Score 59.0 USR-0070-20260331-ai-specialis-1df660a1

Introducing MongoDB Agent Skills and Plugins for Coding Agents

Software engineering is evolving into agentic engineering. According to the Stack Overflow Developer Survey 2025, 84% of respondents use or plan to use AI tools in their development, up from 76% the previous year. At this rate, the tooling needs to keep pace. Last year, we introduced the MongoDB MCP Server to give agents the connectivity they need to interact with MongoDB, helping them generate context-aware code. But connectivity was only the start. Agents are generalists by design, and they don't inherently know the best practices and design patterns that real-world production systems demand. Today, we're addressing this by introducing official MongoDB Agent Skills: structured instructions, best practices, and resources that agents can discover and apply to generate more reliable code across the full development lifecycle, from schema design and performance optimization to implementing advanced capabilities like AI retrieval. To bring this directly into the tools you use, we're also launching plugins for Claude Code, Cursor, Gemini CLI, and VS Code, combining the MongoDB MCP Server and Agent Skills in a single, ready-to-use package. Turning coding agents into MongoDB experts Coding agents are great at producing working code, but they still make common mistakes in production systems, often defaulting to relational thinking that doesn't translate well to MongoDB, such as: Over-normalizing schemas, ignoring MongoDB's document-oriented strengths. Underusing compound indexes, c…

Read article →

Machine Learning Street Talk 2026-03-13 21:00 UTC Score 71.0 AI-141-20260313-podcasts-and-c52bdba8

When AI Discovers the Next Transformer — Robert Lange

Robert Lange, founding researcher at Sakana AI, joins Tim to discuss *Shinka Evolve* — a framework that combines LLMs with evolutionary algorithms to do open-ended program search. The core claim: systems like AlphaEvolve can optimize solutions to fixed problems, but real scientific progress requires co-evolving the problems themselves. GTC is coming, the premier AI conference, great opportunity to learn about AI. NVIDIA and partners will showcase breakthroughs in physical AI, AI factories, agentic AI, and inference, exploring the next wave of AI innovation for developers and researchers. Register for virtual GTC for free, using my link and win NVIDIA DGX Spark (https://nvda.ws/4qQ0LMg) In this episode: • Why AlphaEvolve gets stuck — it needs a human to hand it the right problem. Shinka tries to invent new problems automatically, drawing on ideas from POET, PowerPlay, and MAP-Elites quality-diversity search. • The *architecture* of Shinka: an archive of programs organized as islands, LLMs used as mutation operators, and a UCB bandit that adaptively selects between frontier models (GPT-5, Sonnet 4.5, Gemini) mid-run. The credit-assignment problem across models turns out to be genuinely hard. • Concrete results — state-of-the-art circle packing with dramatically fewer evaluations, second place in an AtCoder competitive programming challenge, evolved load-balancing loss functions for mixture-of-experts models, and agent scaffolds for AIME math benchmarks. • Are these systems act…

Read article →

Last Week in AI 2026-03-13 05:38 UTC Score 46.0 USR-0103-20260313-ai-specialis-d005e1ce

LWiAI Podcast #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk

OpenAI launches GPT-5.4 with Pro and Thinking versions, Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro, Where things stand with the Department of War Anthropic

Read article →

Last Week in AI 2026-03-05 08:42 UTC Score 38.0 USR-0103-20260305-ai-specialis-94cadf05

LWiAI Podcast #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon

Anthropic releases Sonnet 4.6, Google Rolls Out Gemini 3.1 Pro, Anthropic CEO Amodei says Pentagon’s threats ‘do not change our position’ on AI

Read article →

Last Week in AI 2026-02-24 11:43 UTC Score 41.0 USR-0103-20260224-ai-specialis-e6e12a71

Last Week in AI #336 - Sonnet 4.6, Gemini 3.1 Pro, Anthropic vs Pentagon

Anthropic releases Sonnet 4.6, Google Rolls Out Latest AI Model Gemini 3.1 Pro, Pentagon threatens to cut off Anthropic in AI safeguards dispute

Read article →

Last Week in AI 2026-02-16 02:00 UTC Score 28.0 USR-0103-20260216-ai-specialis-1493b020

Last Week in AI #335 - Opus 4.6, Codex 5.3, Gemini 3 Deep Think, GLM 5, Seedance 2.0

A crazy packed edition of Last Week in AI! Plus some small updates.

Read article →

Last Week in AI 2026-02-06 05:06 UTC Score 35.0 USR-0103-20260206-ai-specialis-dfc8c153

LWiAI Podcast #233 - Moltbot, Genie 3, Qwen3-Max-Thinking

Google adds Gemini AI-powered ‘auto browse’ to Chrome, Users flock to open source Moltbot for always-on AI, Qwen3-Max-Thinking debuts, and more!

Read article →

TWIML AI Podcast 2025-12-17 19:24 UTC Score 56.0 AI-148-20251217-podcasts-and-50308e98

Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759

Today, we're joined by Aakanksha Chowdhery, member of technical staff at Reflection, to explore the fundamental shifts required to build true agentic AI. While the industry has largely focused on post-training techniques to improve reasoning, Aakanksha draws on her experience leading pre-training efforts for Google’s PaLM and early Gemini models to argue that pre-training itself must be rethought to move beyond static benchmarks. We explore the limitations of next-token prediction for multi-step workflows and examine how attention mechanisms, loss objectives, and training data must evolve to support long-form reasoning and planning. Aakanksha shares insights on the difference between context retrieval and actual reasoning, the importance of "trajectory" training data, and why scaling remains essential for discovering emergent agentic capabilities like error recovery and dynamic tool learning. The complete show notes for this episode can be found at https://twimlai.com/go/759.

Read article →

One Useful Thing 2025-11-18 16:55 UTC Score 34.0 USR-0105-20251118-ai-specialis-c0a3bae7

Three Years from GPT-3 to Gemini 3

From chatbots to agents

Read article →

Yannic Kilcher 2025-07-23 11:10 UTC Score 53.0 AI-140-20250723-podcasts-and-fca11150

Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot Abstract: Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks. In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows. Authors: Kelly Hong, Anton Troynikov, Jeff Huber Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRj…

Read article →

Aider LLM Leaderboards 2025-05-07 00:00 UTC Score 35.0 USR-0170-20250507-ai-specialis-77e37718

Gemini 2.5 Pro Preview 03-25 benchmark cost

The $6.32 benchmark cost reported for Gemini 2.5 Pro Preview 03-25 was incorrect.

Read article →

Yannic Kilcher 2025-01-26 14:03 UTC Score 50.0 AI-140-20250126-podcasts-and-3a78dbd5

[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

#deepseek #llm #grpo GRPO is one of the core advancements used in Deepseek-R1, but was introduced already last year in this paper that uses a combination of new RL techniques and iterative data collection to achieve remarkable performance on mathematics benchmarks with just a 7B model. Paper: https://arxiv.org/abs/2402.03300 Abstract: Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO. Authors: Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhan…

Read article →

Google Research Blog — Score 45.0 AI-047-nodate-official-ai--b0b89f14