AI/ML News & Innovations Hub

AI/ML news, top picks, and generated innovation digests.

★ Visit ai-karthik.com
422 Sources
5100 News Items
8 Top Picks
43 Blogs
Last Run: running

Latest AI/ML News

5100 matching items

Claude Meets Blackwell Ultra: Anthropic’s Models Now Run on NVIDIA GB300 in Azure
NVIDIA Blog 2026-06-29 17:00 UTC Score 83.0 AI-055-20260629-official-ai--e68b671f

Anthropic’s Claude models in Microsoft Foundry — hosted on Microsoft Azure and running on NVIDIA GB300 Blackwell Ultra GPUs — are now generally available, giving Azure-native enterprises a powerful new way to build autonomous and domain-specific AI agents. As agentic AI continues to drive enterprise innovation and becomes more autonomous, organizations need access to computing […]

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding
Simon Willison Weblog 2026-06-29 16:17 UTC Score 108.0 USR-0110-20260629-ai-specialis-0715a055

This is an interesting new open-weights (MIT licensed) model, the first model release from DeepReinforce. [...] with variants including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. Built on top of pretrained Gemma 4 and Qwen 3.5, it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks. As far as I can tell the licenses of those underlying models are compatible with being used in this way: Gemma 4 is Apache 2.0 licensed (and not bound by the janky additional Gemma Terms of Use that afflicted the previous Gemma models) and Qwen 3.5 is Apache 2.0 licensed as well. I've been running the model using LM Studio and the ornith-1.0-35b-Q4_K_M.gguf (20GB) GGUF, hooked up to Pi. Initial impressions are very good: it seems to be able to run the agent harness over many tool calls in a proficient way. Here's a terminal session where I asked it to "find the code that decodes the actor cookie" and then "find the code that opens the insert dialog when the button is clicked" against a Datasette checkout, which it handled with ease. I also had it draw this pelican, which came out at 103 tokens/second: It's a little bit mangled but the pelican is clearly a pelican. I couldn't find much information about DeepReinforce themselves. The earliest paper I could find from them was CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning from June 2025. Tags: ai, generative-ai, lo…

AI data infrastructure startup Clairva raises $500K led by Venture Catalysts
Entrackr AI 2026-06-29 08:55 UTC Score 80.0 USR-0212-20260629-regional-new-b5504d6c

AI data infrastructure startup Clairva has raised $500K in a pre-seed funding round led by Venture Catalysts through its angel network. The company will use the fresh capital to strengthen its licensed data supply network, expand partnerships with content owners and institutions, enhance data enrichment and validation capabilities, and support commercial engagement with global AI customers, Clairva said in a press release. Founded in 2025 by Sunil Nair, Sabari Raju, Dushyant Verma, and Amit Parashar, Clairva builds licensed, provenance-backed datasets for AI foundation models, embodied AI, robotics, and autonomous systems. As AI models increasingly rely on high-quality datasets, sourcing data with clear usage rights, provenance, and cultural context remains a challenge. Clairva works with content owners, production houses, studios, archives, institutions, and contributor networks to source, license, and structure real-world data for AI training. The company is initially focused on India, Southeast Asia, and other Global South markets, where languages, environments, behaviours, gestures, workflows, and objects remain underrepresented in AI training datasets. According to Clairva, it is also developing proprietary technology across the data pipeline, including licensed dataset ingestion, rights and provenance tracking, automated enrichment, metadata generation, action and object tagging, temporal segmentation, quality validation, and dataset packaging.

Anthropomorphic Misalignment research needs stronger evidence
LessWrong AI 2026-06-28 19:08 UTC Score 91.0 USR-0152-20260628-community-fo-e36294f7

This is a distillation of our ICML 2026 Oral position paper, Position: Anthropomorphic Misalignment Research Needs Stronger Evidence. Joint work by Vansh Gupta, Peter Nutter, Samuel Stante, Andreas Krause, Florian Tramèr, Lukas Fluri, Xin Chen, and Anna Hedström at ETH Zurich. Code is here. TL;DR: AI safety research increasingly studies behaviors that sound human: deception, scheming, sycophancy, shutdown resistance, and emergent misalignment. We refer to this family of work as anthropomorphic misalignment research (AMR). Anthropomorphic language is useful, as it points to the risks we are worried about. Yet it also tacitly introduces assumptions about models having intent or other human-like properties, which can lead to misclassified phenomena, mistaken conclusions, and misallocated resources. These behaviors are important to study, but doing so requires stronger and more rigorous evidence than the field currently provides. In the paper, we argue that AMR requires a clearer match between claims and evidence. Specifically, we: describe a shared AMR pipeline: target behavior framing, data construction, experimental design, and causal or mechanistic attribution; identify recurring failure points: vague concepts, narrow datasets, fragile evaluations, unreliable LLM judges, missing controls, and correlation being treated as causation; propose three evidence levels: L1 behavioral evidence, L2 functional evidence, and L3 causal-mechanistic evidence; offer 12 recommendations and…

LessWrong AI 2026-06-28 13:20 UTC Score 86.0 USR-0152-20260628-community-fo-a4e4e87c

Evaluating Offline Monitoring of Internal AI Agents

This work was conducted during the GovAI Winter Fellowship 2026.

Executive Summary

Frontier AI companies use offline monitoring to address risks from internally deployed AI agents. AI developers increasingly rely on AI agents for internal work, including for safety research and model training. At the same time, these companies are concerned that a misaligned model could exploit this access to take concerning actions, such as sabotaging efforts to understand the risks posed by AI. To identify such instances, AI companies have separate AI models called "monitors" that review transcripts of AI agents' actions and flag suspicious activity. Human reviewers examine activity flagged as suspicious by monitors, judge whether that activity is concerning, and decide on an appropriate response. This monitoring occurs offline, meaning that actions are reviewed after they have been executed rather than intercepted in real time.

Companies currently assess the effectiveness of offline monitoring via synthetic attacks. To assess the effectiveness of offline monitoring, OpenAI and Anthropic use synthetic attacks (transcripts constructed to contain the kind of harmful actions a misaligned AI might take during deployment) and then check whether monitors flag them.

Current reporting on assessments of effectiveness is insufficient. Given the information currently made public by Anthropic and OpenAI, external parties cannot assess the overall effectiveness of their offline monitoring…
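
The evaluation loop described above has three parts: a monitor scores transcripts, synthetic attacks are planted among them, and one counts how many attacks get flagged. A minimal sketch of that loop follows. The `Transcript` type, the suspicion scores and the flagging threshold are hypothetical; they are not taken from the report or from any company's actual setup.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical agent transcript with a monitor-assigned suspicion score."""
    transcript_id: str
    suspicion: float           # monitor output, assumed to lie in [0, 1]
    is_synthetic_attack: bool  # True if constructed to contain a harmful action


def evaluate_monitor(transcripts, threshold):
    """Return (catch_rate, false_positive_rate) for one flagging threshold.

    catch_rate: fraction of synthetic attacks the monitor flags.
    false_positive_rate: fraction of ordinary transcripts flagged, i.e. review
    load for humans that does not correspond to an attack.
    """
    attacks = [t for t in transcripts if t.is_synthetic_attack]
    benign = [t for t in transcripts if not t.is_synthetic_attack]
    flagged_attacks = sum(t.suspicion >= threshold for t in attacks)
    flagged_benign = sum(t.suspicion >= threshold for t in benign)
    catch_rate = flagged_attacks / len(attacks) if attacks else float("nan")
    fpr = flagged_benign / len(benign) if benign else float("nan")
    return catch_rate, fpr


if __name__ == "__main__":
    sample = [
        Transcript("a1", 0.9, True),
        Transcript("a2", 0.4, True),
        Transcript("b1", 0.1, False),
        Transcript("b2", 0.7, False),
    ]
    print(evaluate_monitor(sample, threshold=0.5))  # (0.5, 0.5)
```

Both numbers need to be reported together. A monitor can reach a perfect catch rate simply by flagging everything, which moves the whole cost onto human reviewers.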

Simon Willison Weblog 2026-06-22 23:43 UTC Score 86.0 USR-0110-20260622-ai-specialis-2d1def08

Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code

This morning on Hacker News I saw Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance, describing a small but effective inpainting model: a model where you can mark regions of an image to remove and the model imagines what should fill the space. The released model required PyTorch and NVIDIA CUDA, but since it described itself as 0.2B I decided to try and get it running using WebGPU in a browser. TL;DR: I got it working, and you can try the demo at simonw.github.io/moebius-web/. Read on for the details.

The finished tool

Here's a video demo of the finished tool: You can open any image in it (non-square images get letterboxed), highlight areas to remove, click the "Run inpaint" button and wait for the model to do its magic.

A parallel agent side-project

My main project for today was landing a major feature in Datasette: a UI for creating and altering tables, as a follow-up to the insert and edit rows feature I released last week. I was working on that in Codex Desktop (here's the PR) and often found myself spending 5-10 minutes spinning my fingers waiting for it to complete a mid-sized refactor or add the finishing touches to a change to the UI. (An amusing thing about coding agents is that the harder a problem is the more time you have to get distracted while you wait for them to finish crunching!) So I decided to spin up Claude Code in a terminal window and see how far I could get at porting Moebius to the web.

Some agentic research to kick…
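
"Non-square images get letterboxed" means the image is scaled uniformly so that it fits a square model input, and the leftover area is padded. The arithmetic is sketched below. The 512-pixel square target is a hypothetical value; the post does not state Moebius's input size. The function names are also hypothetical.

```python
def letterbox_params(width, height, target=512):
    """Fit a width x height image into a target x target square.

    Returns (new_width, new_height, pad_left, pad_top): the image is scaled
    uniformly so its longer side equals `target`, then centred with padding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def content_box(width, height, target=512):
    """Region (left, top, right, bottom) of the square canvas holding the image.

    Cropping the model output to this box and resizing it back to
    width x height undoes the letterboxing.
    """
    new_w, new_h, left, top = letterbox_params(width, height, target)
    return left, top, left + new_w, top + new_h


print(letterbox_params(1024, 768))  # (512, 384, 0, 64)
```

Keeping the aspect ratio and padding, instead of stretching, avoids distorting shapes that the inpainting model has to continue.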

MongoDB AI Blog 2026-01-15 20:15 UTC Score 82.0 USR-0070-20260115-ai-specialis-0045c0cd

MongoDB.local San Francisco 2026: Ship Production AI, Faster

Today at MongoDB.local San Francisco, we announced capabilities that collapse the distance between AI prototype and production. Building AI applications means solving real problems: keeping conversational context clean and queryable, retrieving the right information from thousands of past interactions, and connecting AI agents to your data without custom plumbing. These aren't theoretical challenges; they're the friction points that slow teams down every day. The AI era demands more from your data platform. MongoDB gives you everything you need to build quickly.

Voyage AI: the best gets better

Embedding models can make or break AI search experiences. We're proud that voyage-3-large has been the world's top-performing embedding model on Hugging Face's RTEB benchmark since its inception. But we didn't rest on our laurels. There's a new model at the top of the charts. Today, we're pleased to announce that the Voyage 4 model family is now generally available. The best just got better. The voyage-4 series models operate in a shared embedding space, allowing for cross-model compatibility and unprecedented flexibility to optimize for accuracy, speed, or cost. This release also includes voyage-4-nano, our first open-weight model available on Hugging Face, perfect for local development. Additionally, we're launching the new voyage-multimodal-3.5 model, which has been specifically trained to support video content alongside text and images. For developers building multimodal AI applications…

Amazon Science AI 2025-11-11 19:53 UTC Score 81.0 AI-058-20251111-official-ai--83535c43

Automated composition of agents: A knapsack approach for agentic component selection

Designing effective agentic systems requires the seamless composition and integration of agents, tools, and models within dynamic and uncertain environments. Most existing methods rely on static, semantic retrieval approaches for tool or agent discovery. However, effective reuse and composition of existing components remain challenging due to incomplete capability descriptions and the limitations of retrieval methods. Component selection suffers because the decisions are not based on capability, cost, and real-time utility. To address these challenges, we introduce a structured, automated framework for agentic system composition that is inspired by the knapsack problem. Our framework enables a composer agent to systematically identify, select, and assemble an optimal set of agentic components by jointly considering performance, budget constraints, and compatibility. By dynamically testing candidate components and modeling their utility in real-time, our approach streamlines the assembly of agentic systems and facilitates scalable reuse of resources. Empirical evaluation with Claude 3.5 Sonnet across five benchmarking datasets shows that our online-knapsack-based composer consistently lies on the Pareto frontier, achieving higher success rates at significantly lower component costs compared to our baselines. In the single-agent setup, the online knapsack composer shows a success rate improvement of up to 31.6% in comparison to the retrieval baselines. In multi-agent systems,…
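
The composer is described as "inspired by the knapsack problem". The underlying problem is to choose components that maximize total utility without exceeding a cost budget. It can be sketched with the classic 0/1 knapsack dynamic program. This is an offline simplification: it assumes each component's cost (an integer) and utility are already known. The paper's composer is online and estimates utility by testing candidates. Component names and numbers below are hypothetical.

```python
def select_components(components, budget):
    """Pick a subset of components maximising total utility within a cost budget.

    components: list of (name, cost, utility) with non-negative integer costs.
    budget: integer cost limit.
    Returns (best_utility, chosen_names). Classic 0/1 knapsack DP,
    O(len(components) * budget) time and memory.
    """
    n = len(components)
    # best[i][b] = max utility using only the first i components with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, cost, utility) in enumerate(components, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip component i
            if cost <= b:
                take = best[i - 1][b - cost] + utility  # option 2: take it
                if take > best[i][b]:
                    best[i][b] = take
    # Walk back through the table to recover which components were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, cost, _ = components[i - 1]
            chosen.append(name)
            b -= cost
    return best[n][budget], list(reversed(chosen))


catalog = [("web_search", 3, 0.6), ("code_exec", 4, 0.7),
           ("sql_agent", 2, 0.4), ("summarizer", 1, 0.2)]
print(select_components(catalog, budget=6))
```

Here the single strongest component (`code_exec`) is left out, because three cheaper components together deliver more utility within the same budget. This is the kind of cost-aware trade-off that pure semantic retrieval does not make.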

Access Now AI 2026-07-14 13:00 UTC Score 49.0 USR-0142-20260714-ai-specialis-6ece9eff

Stronger together: digital security and resilience for LGBTQ+ people

Join the next webinar organized by the Digital Security Helpline, to discuss key trends and strategies to keep at-risk actors safe online.

Bajaj Finserv Ventures leads $10 Mn pre-Series B round in Kapture CX
Entrackr AI 2026-06-30 01:00 UTC Score 75.0 USR-0212-20260630-regional-new-b64fc3a2

Verticalized full-stack agentic AI platform Kapture CX has raised $10 million in a pre-Series B funding round led by Bajaj Finserv Ventures (BFSV), part of Bajaj Finserv, with participation from its existing investors Cactus Venture Partners and India Alternatives. Prior to this, the Bengaluru-based company had secured $4 million in an extended Series A round led by India Alternatives in December 2023 and $4 million in a Series A round led by Cactus Venture Partners (CVP) in July 2023. The fresh proceeds will be utilized for expansion into multiple global markets and continued investment in R&D and product development, Kapture CX said in a press release. Co-founded in 2014 by Sheshgiri Kamath and Vikas Garg, Kapture CX is a verticalized, full-stack agentic AI platform built to orchestrate high-stakes workflows for large enterprises. Through its deep tech capabilities, it brings AI agents, operational intelligence, and human oversight into one system, allowing enterprises to run complex operations at scale. Kapture CX said that enterprises face a fragmented market with point products from multiple providers, making AI adoption a high-effort exercise. According to the company, enterprises need a full-stack agentic AI platform that understands industry-specific requirements and delivers customized solutions for complex workflows. This is the gap Kapture aims to address. By owning and optimizing the full technology stack, from the models to the agentic layer and the user interface, Kaptu…

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 35.0 AI-084-20260630-research-pap-30adb525

Persistence Diagrams Estimation of Multivariate Piecewise Hölder-continuous Signals

To our knowledge, the analysis of convergence rates for persistence diagram estimation from noisy signals has predominantly relied on lifting signal estimation results through sup-norm (or other functional norm) stability theorems. We believe that moving beyond this approach can lead to considerable gains. We illustrate this in the setting of nonparametric regression. From a minimax perspective, we examine the inference of persistence diagrams (for the sublevel-set filtration). We show that for piecewise Hölder-continuous functions, with control over the reach of the set of discontinuities, taking the persistence diagram of a simple histogram estimator of the signal achieves the minimax rates known for Hölder-continuous functions. The key novelty lies in our use of algebraic stability instead of sup-norm stability, directly targeting the bottleneck distance through the underlying interleaving. This allows us to incorporate deformation retractions of sublevel sets to accommodate boundary discontinuities that cannot be handled by sup-norm-based stability analyses.
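
The estimator pipeline is "fit a histogram estimator, then read off the persistence diagram of its sublevel-set filtration". It can be made concrete in the simplest setting, sketched below. The sketch assumes a univariate design on [0, 1], equal-width bins, adjacency only between consecutive bins, and 0-dimensional homology only. The paper treats the multivariate case, and its algebraic-stability argument is not illustrated here. Function names are hypothetical.

```python
import math


def histogram_estimator(xs, ys, n_bins):
    """Regressogram on [0, 1]: average the responses falling in each of n_bins equal bins.

    Empty bins get NaN (assumption: the caller picks n_bins so every bin is populated).
    """
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        k = min(int(x * n_bins), n_bins - 1)  # x == 1.0 falls in the last bin
        sums[k] += y
        counts[k] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


def sublevel_persistence_0d(values):
    """0-dimensional persistence diagram of the sublevel-set filtration of a
    function on a path graph (bin i is adjacent to bins i - 1 and i + 1).

    Returns sorted (birth, death) pairs; the global minimum never dies.
    Union-find with the elder rule: when two components meet, the younger dies.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    parent = [None] * n  # None = bin not yet in the filtration
    birth = {}           # component root -> birth value

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for v in order:
        parent[v], birth[v] = v, values[v]
        for u in (v - 1, v + 1):
            if 0 <= u < n and parent[u] is not None:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                if birth[young] < values[v]:  # drop zero-persistence pairs
                    pairs.append((birth[young], values[v]))
                parent[young] = old
    pairs.append((values[order[0]], math.inf))
    return sorted(pairs)


print(sublevel_persistence_0d([3, 1, 4, 0, 2]))  # [(0, inf), (1, 4)]
```

In the example, the local minimum at value 1 is born at 1. It dies at 4, when the bin holding the value 4 connects it to the older component born at the global minimum 0.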

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 46.0 AI-084-20260630-research-pap-8e27726f

CHANI: Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration

The present work aims to prove mathematically that a biologically inspired neural network can learn a classification task using only local transformations. To this end, we propose a spiking neural network named CHANI (Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration), whose neurons' activity is modeled by Hawkes processes. Synaptic weights are updated by an expert aggregation algorithm, providing a local and simple learning rule. We prove that our network can learn on average and asymptotically. Moreover, we demonstrate that it automatically produces neuronal assemblies, in the sense that the network can encode several classes and the same neuron in the intermediate layers might be activated by more than one class, and we provide numerical simulations on synthetic datasets. This theoretical approach contrasts with the traditional empirical validation of biologically inspired networks and paves the way for understanding how local learning rules enable neurons to form assemblies able to represent complex concepts.
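
CHANI models neuron activity with Hawkes processes. These are self-exciting point processes: each spike temporarily raises the rate of future spikes. The sketch below assumes a single neuron, an exponential excitation kernel and made-up parameters. It computes the conditional intensity and simulates spike times with Ogata's thinning algorithm. The paper's multivariate kernels and its expert-aggregation weight update are not reproduced.

```python
import math
import random


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process with exponential kernel:
    lambda(t) = mu + sum over past events t_i < t of alpha * beta * exp(-beta * (t - t_i)).
    The kernel integrates to alpha, so alpha is the branching ratio."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon] by Ogata's thinning.

    Requires mu > 0; alpha < 1 keeps the process stationary. With an exponential
    kernel the intensity only decays between events, so its value right after the
    current time is a valid upper bound until the next accepted event.
    """
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # Upper bound: intensity at t, including the jump of an event exactly at t.
        lam_bar = mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)  # candidate time from the dominating Poisson process
        if t > horizon:
            return events
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)


spikes = simulate_hawkes(mu=0.5, alpha=0.5, beta=2.0, horizon=50.0, seed=1)
```

With these parameters the expected number of spikes is about mu * horizon / (1 - alpha) = 50. The clustering of spike times shows the self-excitation.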

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 53.0 AI-084-20260630-research-pap-580085ec

Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes with bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $\epsilon >0$ our approach is able to return a mixture of Gaussian processes that is $\epsilon$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process,…
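
The approach measures closeness between probabilistic models with the Wasserstein distance. For Gaussians, the 2-Wasserstein distance has a well-known closed form, sketched below for diagonal covariances, together with the empirical 1-D version computed by matching sorted samples. This illustrates the metric only. It is not the paper's layer-wise mixture-of-Gaussian-processes construction, and the function names are hypothetical.

```python
import math


def w2_diag_gaussians(mean1, var1, mean2, var2):
    """2-Wasserstein distance between N(mean1, diag(var1)) and N(mean2, diag(var2)).

    For Gaussians with commuting (here diagonal) covariances the general formula
    reduces to sqrt(||m1 - m2||^2 + sum_i (sqrt(v1_i) - sqrt(v2_i))^2).
    """
    sq = sum((a - b) ** 2 for a, b in zip(mean1, mean2))
    sq += sum((math.sqrt(u) - math.sqrt(v)) ** 2 for u, v in zip(var1, var2))
    return math.sqrt(sq)


def w2_empirical_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1-D samples with uniform weights.

    In 1-D the optimal coupling matches sorted values (quantiles)."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


print(w2_diag_gaussians([0.0], [1.0], [3.0], [4.0]))  # sqrt(10), about 3.1623
```

The empirical version is a practical way to check how close sampled outputs of a finite network are to a Gaussian candidate at a single input point.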

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 34.0 AI-084-20260630-research-pap-becde917

Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width

Understanding the generalization and optimization of neural networks is a longstanding problem in modern learning theory. Prior analyses often lead to risk bounds of order $1/\sqrt{n}$ for ReLU networks, where $n$ is the sample size. In this paper, we present a general optimization and generalization analysis for gradient descent applied to shallow ReLU networks. We develop convergence rates of the order $1/T$ for gradient descent with $T$ iterations, and show that the gradient descent iterates fall inside local balls around either an initialization point or a reference point. Then we develop improved Rademacher complexity estimates by using the activation pattern of the ReLU function in these local balls. We apply our general result to NTK-separable data with a margin $\gamma$, and develop an almost optimal risk bound of the order $1/(n\gamma^2)$ for the ReLU network with a polylogarithmic width.
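
The abstract does not spell out the exact architecture. The sketch below therefore uses a hypothetical setup that is common in this line of analysis: a width-m one-hidden-layer ReLU network with fixed ±1 output weights and 1/sqrt(m) scaling, trained by full-batch gradient descent on the first layer with the logistic loss. It only makes "gradient descent on a shallow ReLU network" concrete. It does not reproduce the paper's rates or width requirement.

```python
import math
import random


def init_network(m, d, seed=0):
    """m hidden ReLU units; first-layer weights ~ N(0, 1); output weights fixed to +/-1."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [1.0 if k % 2 == 0 else -1.0 for k in range(m)]
    return W, a


def forward(W, a, x):
    """f(x) = (1/sqrt(m)) * sum_k a_k * relu(<w_k, x>)."""
    total = sum(ak * max(0.0, sum(wi * xi for wi, xi in zip(wk, x))) for wk, ak in zip(W, a))
    return total / math.sqrt(len(W))


def logistic_risk(W, a, data):
    """Average of log(1 + exp(-y f(x))) over (x, y) pairs with y in {-1, +1}."""
    return sum(math.log1p(math.exp(-y * forward(W, a, x))) for x, y in data) / len(data)


def gd_step(W, a, data, lr):
    """One full-batch gradient-descent step on the first-layer weights only."""
    m, n = len(W), len(data)
    grad = [[0.0] * len(W[0]) for _ in range(m)]
    for x, y in data:
        # derivative of log(1 + exp(-y f)) with respect to f is -y / (1 + exp(y f))
        g = -y / (1.0 + math.exp(y * forward(W, a, x)))
        for k in range(m):
            if sum(wi * xi for wi, xi in zip(W[k], x)) > 0.0:  # ReLU active
                coef = g * a[k] / (math.sqrt(m) * n)
                for i, xi in enumerate(x):
                    grad[k][i] += coef * xi
    return [[wi - lr * gi for wi, gi in zip(wk, gk)] for wk, gk in zip(W, grad)]


data = [([1.0, 0.5], 1), ([0.8, -0.2], 1), ([-1.0, 0.3], -1), ([-0.7, -0.6], -1)]
W, a = init_network(m=50, d=2)
for _ in range(100):
    W = gd_step(W, a, data, lr=0.5)
```

Fixing the output layer keeps the model linear in how each hidden unit contributes. That is the simplification many shallow-network analyses rely on when tracking iterates near initialization.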

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 40.0 AI-084-20260630-research-pap-431bf840

Adaptive Forward Stepwise: A Method for High Sparsity Regression

This paper proposes a sparse regression method that continuously interpolates between Forward Stepwise selection (FS) and the LASSO. When tuned appropriately, our solutions are much sparser than typical LASSO fits but, unlike FS fits, benefit from the stabilizing effect of shrinkage. Our method, Adaptive Forward Stepwise Regression (AFS), addresses the need for sparser models with shrinkage. We show its connection with boosting via a soft-thresholding viewpoint and demonstrate the ease of adapting the method to classification tasks. In both simulations and real data, our method has lower mean squared error and fewer selected features across multiple settings compared to popular sparse modeling procedures.
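
The two endpoints that AFS interpolates between are easiest to see with an orthonormal design, where both have closed forms. The LASSO applies soft thresholding (shrink and select). Forward stepwise stopped after k steps keeps the k largest least-squares coefficients unshrunk, which is hard thresholding at a level between the k-th and (k+1)-th largest magnitudes. The sketch shows only these textbook endpoints, not AFS itself; the coefficient values are hypothetical.

```python
def soft_threshold(z, lam):
    """S_lam(z) = sign(z) * max(|z| - lam, 0): shrink toward zero, zero out small values."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def hard_threshold(z, lam):
    """Keep z unchanged if |z| > lam, else set it to zero (selection without shrinkage)."""
    return z if abs(z) > lam else 0.0


# With an orthonormal design (X^T X = I) the least-squares coefficients are z = X^T y.
# The LASSO minimiser of 0.5 * ||y - X b||^2 + lam * ||b||_1 is soft_threshold(z_j, lam)
# coordinate-wise; forward stepwise keeps the largest |z_j| as-is (hard thresholding).
z = [3.0, -0.5, 1.2, -2.0]
lam = 1.0
lasso_coefs = [soft_threshold(v, lam) for v in z]   # about [2.0, 0.0, 0.2, -1.0]
subset_coefs = [hard_threshold(v, lam) for v in z]  # [3.0, 0.0, 1.2, -2.0]
```

Both endpoints select the same three features here. The LASSO, however, shrinks every kept coefficient by lam. That shrinkage stabilizes the fit, and AFS aims to keep it while selecting as sparsely as FS.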

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 49.0 AI-084-20260630-research-pap-960a167b

Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection

Attention mechanisms have revolutionized several domains of artificial intelligence, such as natural language processing and computer vision, by enabling models to selectively focus on relevant parts of the input data. While recent work has characterized the optimization dynamics of gradient descent (GD) in attention-based models and the structural properties of its preferred solutions, less is known about more general optimization algorithms such as mirror descent (MD). In this paper, we investigate the convergence properties and implicit biases of a family of MD algorithms tailored for softmax attention mechanisms, with the potential function chosen as the $p$-th power of the $\ell_p$-norm. Specifically, we show that these algorithms converge in direction to a generalized hard-margin SVM with an $\ell_p$-norm objective when applied to a classification problem using a softmax attention model. Notably, our theoretical results reveal that the convergence rate is comparable to that of traditional GD in simpler models, despite the highly nonlinear and nonconvex nature of the present problem. Additionally, we delve into the joint optimization dynamics of the key-query matrix and the decoder, establishing conditions under which this complex joint optimization converges to their respective hard-margin SVM solutions. Lastly, our numerical experiments on real data demonstrate that MD algorithms improve generalization over standard GD and excel in optimal token selection.
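
The mirror-descent family uses the $p$-th power of the $\ell_p$-norm as its potential. It takes each step in the "mirror" (dual) coordinates given by the gradient of the potential, then maps back. The sketch below assumes the potential is scaled as (1/p)·||w||_p^p; a different constant only rescales the learning rate. It is applied to a toy quadratic loss rather than to the paper's softmax-attention model, and the names are hypothetical.

```python
import math


def grad_potential(w, p):
    """Gradient of psi(w) = (1/p) * ||w||_p^p: component-wise sign(w_i) * |w_i|^(p-1)."""
    return [math.copysign(abs(wi) ** (p - 1), wi) for wi in w]


def inverse_grad_potential(z, p):
    """Inverse of grad_potential: sign(z_i) * |z_i|^(1/(p-1)). Requires p > 1."""
    return [math.copysign(abs(zi) ** (1.0 / (p - 1)), zi) for zi in z]


def mirror_descent_step(w, grad, lr, p):
    """One l_p mirror-descent step: gradient step in the mirror space, then map back.
    For p = 2 the maps are identities and this is plain gradient descent."""
    z = [zi - lr * gi for zi, gi in zip(grad_potential(w, p), grad)]
    return inverse_grad_potential(z, p)


# Toy usage: minimise 0.5 * ||w - target||^2 with p = 3.
target = [1.0, -0.5]
w = [0.1, 0.1]
for _ in range(1000):
    grad = [wi - ti for wi, ti in zip(w, target)]
    w = mirror_descent_step(w, grad, lr=0.1, p=3)
```

Choosing p changes the geometry of the updates. The paper ties this geometry to which generalized ($\ell_p$) max-margin solution the attention weights converge to.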

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 44.0 AI-084-20260630-research-pap-f09b1b42

Hierarchical Causal Models

Causal questions often arise in settings where data are hierarchical: subunits are nested within units. Consider students in schools, cells in patients, or cities in states. In these settings, unit-level variables (e.g., a school's budget) may affect subunit-level outcomes (e.g., student test scores), and subunit-level characteristics may aggregate to influence unit-level outcomes. In this paper, we show how to analyze hierarchical data for causal inference. We introduce hierarchical causal models, which extend structural causal models and graphical models by incorporating inner plates to represent nested data structures. We develop a graphical identification technique for these models that generalizes do-calculus. We show that hierarchical data can enable causal identification even when it would be impossible with non-hierarchical data--for example, when only unit-level summaries are available. We develop estimation strategies, including using hierarchical Bayesian models. We illustrate our results in simulation and through a reanalysis of the classic "eight schools" study.
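
The "eight schools" study is the textbook example of hierarchical Bayesian estimation. The partial-pooling estimates of the classic two-level normal model can be computed from the published school-level summaries (Rubin, 1981), as sketched below. The sketch conditions on a fixed between-school spread tau and plugs in the precision-weighted estimate of mu; a full Bayesian analysis would integrate over tau. It is not the paper's hierarchical causal model or its identification technique.

```python
# Classic "eight schools" summaries: estimated coaching effect and standard error per school.
Y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
SIGMA = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]


def partial_pooling(y, sigma, tau):
    """Shrinkage estimates under y_j ~ N(theta_j, sigma_j^2), theta_j ~ N(mu, tau^2), fixed tau > 0.

    mu is set to its precision-weighted estimate given tau; each theta_j is then the
    precision-weighted average of the school's own estimate y_j and mu.
    """
    weights = [1.0 / (s ** 2 + tau ** 2) for s in sigma]
    mu_hat = sum(w * yj for w, yj in zip(weights, y)) / sum(weights)
    thetas = []
    for yj, s in zip(y, sigma):
        prec_data, prec_prior = 1.0 / s ** 2, 1.0 / tau ** 2
        thetas.append((prec_data * yj + prec_prior * mu_hat) / (prec_data + prec_prior))
    return mu_hat, thetas


for tau in (1e-6, 5.0, 1e6):
    mu_hat, thetas = partial_pooling(Y, SIGMA, tau)
    print(tau, round(mu_hat, 2), [round(t, 1) for t in thetas])
```

With a very small tau, every school is pulled to roughly 7.7, the complete-pooling estimate. With a very large tau, the estimates stay at the raw y_j. Intermediate values of tau trade off between the two.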

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 32.0 AI-084-20260630-research-pap-7ebf15c1

Reparameterized Complex-valued Neurons Can Efficiently Learn More than Real-valued Neurons via Gradient Descent

Complex-valued neural networks potentially possess better representations and performance than real-valued counterparts when dealing with some complicated tasks such as acoustic analysis, radar image classification, etc. Despite empirical successes, it remains unknown theoretically when and to what extent complex-valued neural networks outperform real-valued ones. We take one step in this direction by comparing the learnability of real-valued neurons and complex-valued neurons via gradient descent. We theoretically show that a complex-valued neuron can learn functions expressed by any one real-valued neuron and any one complex-valued neuron with convergence rates $O(t^{-3})$ and $O(t^{-1})$, respectively, where $t$ is the iteration index of gradient descent, whereas a two-layer real-valued neural network with finite width cannot learn a single non-degenerate complex-valued neuron. We prove that a complex-valued neuron learns a real-valued neuron with rate $\Omega (t^{-3})$, exponentially slower than the linear convergence rate of learning one real-valued neuron using a real-valued neuron. We then reparameterize the phase parameter of the complex-valued neuron and prove that a reparameterized complex-valued neuron can efficiently learn a real-valued neuron with a linear convergence rate. We further verify and extend these results via simulation experiments in more general settings.

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 56.0 AI-084-20260630-research-pap-3485f7f6

Unsupervised Feature Selection via Nonnegative Orthogonal Constrained Regularized Minimization

Unsupervised feature selection has drawn wide attention in the era of big data, since it serves as a fundamental technique for dimensionality reduction. However, many existing unsupervised feature selection models and solution methods are primarily designed for practical applications, and often lack rigorous theoretical support, such as convergence guarantees. In this paper, we first establish a novel unsupervised feature selection model based on regularized minimization with nonnegative orthogonality constraints, which has the advantages of embedding feature selection into nonnegative spectral clustering and preventing overfitting. To solve the proposed model, we develop an effective inexact augmented Lagrangian multiplier method, in which the subproblems are addressed using a proximal alternating minimization approach. We rigorously prove that the sequence generated by the algorithm converges to a stationary point of the model. Extensive numerical experiments on popular datasets demonstrate the stability and robustness of our method. Moreover, comparative results show that our method outperforms some existing state-of-the-art methods in terms of clustering evaluation metrics. The code is available at https://github.com/liyan-amss/NOCRM_code.

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 37.0 AI-084-20260630-research-pap-4c8ffe45

A causal fused lasso for interpretable heterogeneous treatment effects estimation

We propose a novel method for estimating heterogeneous treatment effects based on the fused lasso. By first ordering samples based on the propensity or prognostic score, we match units from the treatment and control groups. We then run the fused lasso to obtain piecewise constant treatment effects with respect to the ordering defined by the score. Similar to existing methods based on discretizing the score, our method yields interpretable subgroup effects. However, existing methods fix the subgroups a priori, whereas our causal fused lasso forms data-adaptive subgroups. We show that the estimator consistently estimates the treatment effects conditional on the score under very general conditions on the covariates and treatment. We demonstrate the performance of our procedure using extensive experiments that show that it can be interpretable and competitive with state-of-the-art methods.
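
The two steps (order and match on the score, then fit a piecewise-constant sequence with the fused lasso) can be sketched as below. Several choices here are assumptions, not the paper's exact procedure: nearest-neighbour matching with replacement, a fixed penalty `lam`, and a projected-gradient solver on the dual of the 1-D fused lasso. The function names are hypothetical.

```python
def match_by_score(treated, control):
    """Nearest-neighbour matching on a scalar score, with replacement.

    treated, control: lists of (score, outcome). Returns (score, treated minus
    matched control outcome) pairs, sorted by the treated unit's score.
    """
    pairs = []
    for s, y in sorted(treated):
        _, y_c = min(control, key=lambda c: abs(c[0] - s))
        pairs.append((s, y - y_c))
    return pairs


def fused_lasso_1d(y, lam, iters=5000):
    """min_x 0.5 * ||y - x||^2 + lam * sum_i |x[i+1] - x[i]|.

    Solved by projected gradient on the dual: u in [-lam, lam]^(n-1),
    x = y - D^T u with (D x)_i = x[i+1] - x[i], step size 1/4 (since ||D||^2 <= 4).
    """
    n = len(y)
    u = [0.0] * (n - 1)

    def primal(u):
        # (D^T u)_j = u[j-1] - u[j], treating out-of-range entries as zero
        return [y[j] - ((u[j - 1] if j > 0 else 0.0) - (u[j] if j < n - 1 else 0.0))
                for j in range(n)]

    for _ in range(iters):
        x = primal(u)
        u = [min(lam, max(-lam, u[i] + 0.25 * (x[i + 1] - x[i]))) for i in range(n - 1)]
    return primal(u)


matched = match_by_score([(0.2, 5.0), (0.8, 9.0)], [(0.1, 4.0), (0.75, 6.0)])
effects = fused_lasso_1d([d for _, d in matched], lam=0.1)
```

The fused-lasso penalty on differences between neighbours along the score ordering is what produces the data-adaptive subgroups. Runs of consecutive units end up sharing one effect estimate.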

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 40.0 AI-084-20260630-research-pap-ea9eaea8

Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood

Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite sample regimes. The proposed inference method is robust to small sample sizes and is able to provide accurate uncertainty measurements for policy value evaluation. In addition, it allows for flexible inferences on policy comparison with full uncertainty quantification. We demonstrate the effectiveness of the proposed inference method using Monte Carlo simulations and its application to an adolescent body mass index data set.

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 35.0 AI-084-20260630-research-pap-559620d5

Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization

Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex optimization that sequentially minimizes a majorizing surrogate of the objective function in each block coordinate while the other block coordinates are held fixed. We consider a family of BMM algorithms for minimizing nonsmooth nonconvex objectives, where each parameter block is constrained within a subset of a Riemannian manifold. We establish that this algorithm converges asymptotically to the set of stationary points, and attains an $\epsilon$-stationary point within $\widetilde{O}(\epsilon^{-2})$ iterations. In particular, the assumptions for our complexity results are completely Euclidean when the underlying manifold is a product of Euclidean or Stiefel manifolds, although our analysis makes explicit use of the Riemannian geometry. Our general analysis applies to a wide range of algorithms with Riemannian constraints: Riemannian MM, block projected gradient descent, Bures-JKO scheme for Wasserstein variational inference, optimistic likelihood estimation, geodesically constrained subspace tracking, robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate that our algorithm converges faster than standard Euclidean algorithms applied to the Riemannian setting.

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 45.0 AI-084-20260630-research-pap-c85cc9d4

Two-way Node Popularity Model for Directed and Bipartite Networks

There has been increasing research attention on community detection in directed and bipartite networks. However, these studies often fail to consider the popularity of nodes in different communities, which is a common phenomenon in real-world networks. To address this issue, we propose a new probabilistic framework called the Two-Way Node Popularity Model (TNPM). The TNPM also accommodates edges from different distributions within a general sub-Gaussian family. We introduce the Delete-One-Method (DOM) for model fitting and community structure identification, and provide a comprehensive theoretical analysis with novel techniques for handling sub-Gaussian generalization. Additionally, we propose the Two-Stage Divided Cosine Algorithm (TSDC) to handle large-scale networks more efficiently. Our proposed methods offer multifold advantages in terms of estimation accuracy and computational efficiency, as demonstrated through extensive numerical studies. We apply our methods to two real-world applications, uncovering interesting findings.

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 40.0 AI-084-20260630-research-pap-88fcd4ae Full article

A Symplectic Analysis of Alternating Mirror Descent

Motivated by understanding the behavior of the Alternating Mirror Descent (AMD) algorithm for bilinear zero-sum games, we study the discretization of continuous-time Hamiltonian flow via the symplectic Euler method. We provide a framework for analysis using results from Hamiltonian dynamics and symplectic numerical integrators, with an emphasis on the existence and properties of a conserved quantity, the modified Hamiltonian (MH), for the symplectic Euler method. We compute the MH in closed form when the original Hamiltonian is a quadratic function, and show that it generally differs from the conserved quantity previously known in the literature. We derive new error bounds, in terms of the number of iterations $K$, for the MH truncated at various orders in the step size, and use these bounds to show an improved $\mathcal{O}(K^{1/5})$ total regret bound and an $\mathcal{O}(K^{-4/5})$ duality gap of the average iterates for AMD. Finally, we propose a conjecture which, if true, would imply that the total regret for AMD scales as $\mathcal{O}\left(K^{\varepsilon}\right)$ and the duality gap of the average iterates as $\mathcal{O}\left(K^{-1+\varepsilon}\right)$ for any $\varepsilon>0$, and that $\varepsilon=0$ can be taken under certain convergence conditions on the MH.
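This is a minimal illustration of why alternation matters. It assumes the simplest case the abstract covers: the scalar bilinear game $\min_x \max_y xy$ on an unconstrained domain with a Euclidean mirror map. In that case AMD reduces to alternating gradient descent-ascent. Updating $y$ with the already-updated $x$ is a symplectic Euler step, and expanding one step shows that $x^2 + y^2 - \eta xy$ is exactly conserved. That quadratic is our own toy invariant and not necessarily the paper's closed-form MH. The simultaneous (non-alternating) update is included for contrast.

```python
def alternating_gda(x, y, eta, steps):
    """Alternating updates for min_x max_y x*y (Euclidean mirror map).

    x moves first using the old y; y then moves using the *new* x. This is a
    symplectic Euler step for H = (x^2 + y^2) / 2 with y as position, x as momentum.
    """
    traj = [(x, y)]
    for _ in range(steps):
        x = x - eta * y          # descent step for the minimizing player
        y = y + eta * x          # ascent step uses the updated x
        traj.append((x, y))
    return traj

def simultaneous_gda(x, y, eta, steps):
    """Non-alternating baseline: both players use the old iterate."""
    traj = [(x, y)]
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
        traj.append((x, y))
    return traj

def invariant(x, y, eta):
    """Quadratic form exactly preserved by one alternating step (toy invariant)."""
    return x * x + y * y - eta * x * y

eta = 0.1
alt = alternating_gda(1.0, 0.0, eta, 1000)
sim = simultaneous_gda(1.0, 0.0, eta, 1000)
q0 = invariant(*alt[0], eta)
drift = max(abs(invariant(x, y, eta) - q0) for x, y in alt)
print(drift < 1e-9)                       # prints True: invariant kept up to rounding
print(sim[-1][0] ** 2 + sim[-1][1] ** 2)  # simultaneous: grows by (1 + eta^2) per step
```

The alternating iterates stay on a bounded ellipse, while the simultaneous iterates spiral outward, since each simultaneous step multiplies $x^2 + y^2$ by $1 + \eta^2$.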

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 52.0 AI-084-20260630-research-pap-d7de33ee Full article

Contrasting Local and Global Modeling with Machine Learning and Satellite Data: A Case Study Estimating Tree Canopy Height in African Savannas

While advances in machine learning with satellite imagery (SatML) are facilitating environmental monitoring at a global scale, developing SatML models that are accurate and useful for local regions remains critical to understanding and acting on an ever-changing planet. As increasing attention and resources are being devoted to training SatML models with global data, it is important to understand when improvements in global models will make it easier to train or fine-tune models that are accurate in specific regions. To explore this question, we design the first study that explicitly contrasts local and global training paradigms for SatML, through a case study of tree canopy height (TCH) mapping in the Karingani Game Reserve, Mozambique. We find that recent advances in global TCH mapping do not necessarily translate to better local modeling abilities in our study region. Specifically, small models trained only with locally collected data outperform published global TCH maps, and even outperform globally pretrained models that we fine-tune using local data. Analyzing these results further, we identify specific points of conflict and synergy between local and global modeling paradigms that can inform future research toward aligning local and global performance objectives in geospatial machine learning.

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 57.0 AI-084-20260630-research-pap-7ae4e587 Full article

Boosted Control Functions: Distribution Generalization and Invariance in Confounded Models

Modern machine learning methods and the availability of large-scale data have significantly advanced our ability to predict target quantities from large sets of covariates. However, these methods often struggle under distributional shifts, particularly in the presence of hidden confounding. While the impact of hidden confounding is well-studied in causal effect estimation, e.g., instrumental variables, its implications for prediction tasks under shifting distributions remain underexplored. This work addresses this gap by introducing a strong notion of invariance that, unlike existing weaker notions, allows for distribution generalization even in the presence of nonlinear, non-identifiable structural functions. Central to this framework is the Boosted Control Function (BCF), a novel, identifiable target of inference that satisfies the proposed strong invariance notion and is provably worst-case optimal under distributional shifts. The theoretical foundation of our work lies in Simultaneous Equation Models for Distribution Generalization (SIMDGs), which bridge machine learning with econometrics by describing data-generating processes under distributional shifts. To put these insights into practice, we propose the ControlTwicing algorithm to estimate the BCF using nonparametric machine-learning techniques and study its generalization performance on synthetic and real-world datasets compared to robust and empirical risk minimization approaches.
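The abstract does not spell out ControlTwicing. The sketch below therefore shows only the classical linear control-function idea that the name points to, on simulated data with a hidden confounder. All coefficients and the seed are arbitrary choices for illustration. Regressing $y$ on $x$ alone is biased. Adding, as an extra regressor, the residual of $x$ after regressing it on the exogenous variable $z$ recovers the structural coefficient. This is a textbook two-stage construction, not the paper's BCF estimator.

```python
import random

def ols_slope(x, y):
    """Slope of a least-squares fit of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, z):
    """Residuals of x after regressing it on z (with intercept)."""
    b = ols_slope(z, x)
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    return [a - mx - b * (c - mz) for a, c in zip(x, z)]

def coef_on_x(y, x, v):
    """Coefficient on x in a least-squares fit of y on (x, v) with intercept."""
    n = len(y)
    mx, mv, my = sum(x) / n, sum(v) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    svv = sum((a - mv) ** 2 for a in v)
    sxv = sum((a - mx) * (b - mv) for a, b in zip(x, v))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    svy = sum((a - mv) * (b - my) for a, b in zip(v, y))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * svv - sxv ** 2
    return (sxy * svv - svy * sxv) / det

rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]   # exogenous variable
h = [rng.gauss(0, 1) for _ in range(n)]   # hidden confounder (never observed)
x = [zi + hi + rng.gauss(0, 1) for zi, hi in zip(z, h)]
y = [2.0 * xi + 3.0 * hi + rng.gauss(0, 1) for xi, hi in zip(x, h)]

naive = ols_slope(x, y)        # biased because h drives both x and y
v = residuals(x, z)            # control variable: part of x not explained by z
cf = coef_on_x(y, x, v)        # coefficient on x with v as a control
print(round(naive, 2), round(cf, 2))
```

With these settings, the naive slope lands near 3 and the control-function coefficient lands near the true value 2. The exact printed digits depend on the seed.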

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 57.0 AI-084-20260630-research-pap-ecd55d0e Full article

DCatalyst: A Unified Accelerated Framework for Decentralized Optimization

We study decentralized optimization over a network of agents, modeled as an undirected graph and operating without a central server. The objective is to minimize a composite function $f+r$, where $f$ is a (strongly) convex function representing the average of the agents' losses, and $r$ is a convex, extended-value function (regularizer). We introduce DCatalyst, a unified black-box framework that injects Nesterov-type acceleration into decentralized optimization algorithms. At its core, DCatalyst is an inexact, momentum-accelerated proximal scheme (outer loop) that seamlessly wraps around a given decentralized method (inner loop). We show that DCatalyst attains optimal (up to logarithmic factors) communication and computational complexity across a broad family of decentralized algorithms and problem instances. In particular, it delivers accelerated rates for problem classes that previously lacked accelerated decentralized methods, thereby broadening the effectiveness of decentralized methods. On the technical side, our framework introduces inexact estimating sequences, an extension of Nesterov's classical estimating sequences tailored to decentralized, composite optimization. This construction systematically accommodates consensus errors and inexact solutions of local subproblems, addressing challenges that existing estimating-sequence-based analyses cannot handle while retaining a black-box, plug-and-play character.
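Below is a toy sketch of the outer/inner structure described above. It rests on several assumptions of our own: scalar quadratic losses $f_i(x) = \tfrac12 a_i (x - b_i)^2$ on a four-agent ring, $r = 0$, gradient tracking as the wrapped decentralized method, and the classical Catalyst momentum $\beta = (1-\sqrt{q})/(1+\sqrt{q})$ with $q = \mu/(\mu+\kappa)$. The paper's parameter choices, inexactness criteria, and handling of $r$ are not reproduced here.

```python
import math

# Hypothetical toy problem: agent i holds f_i(x) = 0.5 * a_i * (x - b_i)^2.
A = [1.0, 2.0, 3.0, 4.0]
B = [1.0, -1.0, 2.0, 0.0]
N = len(A)

# Ring graph with self-loops: each agent averages itself and its two
# neighbors, giving a symmetric, doubly stochastic mixing matrix.
W = [[1 / 3 if (j - i) % N in (0, 1, N - 1) else 0.0 for j in range(N)]
     for i in range(N)]

def mix(vec):
    """One round of neighbor averaging (the only communication step)."""
    return [sum(W[i][j] * vec[j] for j in range(N)) for i in range(N)]

def gradient_tracking(x, grad, alpha, iters):
    """Inner loop: decentralized gradient tracking on the average of the g_i."""
    s = [grad(i, x[i]) for i in range(N)]   # local trackers of the average gradient
    for _ in range(iters):
        x_new = [xi - alpha * si for xi, si in zip(mix(x), s)]
        s = [mi + grad(i, x_new[i]) - grad(i, x[i]) for i, mi in enumerate(mix(s))]
        x = x_new
    return x

def catalyst(kappa=2.0, outer=30, inner=200, alpha=0.02):
    """Outer loop: inexact, momentum-accelerated proximal point steps."""
    mu = sum(A) / N                          # strong convexity of the average loss
    q = mu / (mu + kappa)
    beta = (1 - math.sqrt(q)) / (1 + math.sqrt(q))
    x = [0.0] * N
    y = list(x)
    for _ in range(outer):
        centers = list(y)
        def grad(i, xi):
            # Gradient of g_i(x) = f_i(x) + kappa / 2 * (x - centers[i])^2.
            return A[i] * (xi - B[i]) + kappa * (xi - centers[i])
        x_prev = x
        x = gradient_tracking(x, grad, alpha, inner)   # approximate prox step
        # Nesterov-type extrapolation of each agent's prox center.
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
    return x

x_star = sum(a * b for a, b in zip(A, B)) / sum(A)
x_final = catalyst()
print(x_star, x_final)
```

Every agent should end close to the minimizer of the average loss, $x^\star = \sum_i a_i b_i / \sum_i a_i = 0.5$ for these numbers. Only `mix` touches other agents' values, which is what keeps the inner solver decentralized.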

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 48.0 AI-084-20260630-research-pap-e175ef24 Full article

Covariate-dependent Hierarchical Dirichlet Processes

Bayesian hierarchical modeling is a natural framework to effectively integrate data and borrow information across groups. In this paper, we address problems related to density estimation and identifying clusters across related groups, by proposing a hierarchical Bayesian approach that incorporates additional covariate information. To achieve flexibility, our approach builds on ideas from Bayesian nonparametrics, combining the hierarchical Dirichlet process with dependent Dirichlet processes. The proposed model is widely applicable, accommodating multiple and mixed covariate types through appropriate kernel functions as well as different output types through suitable component-specific likelihoods. This extends our ability to discern the relationship between covariates and clusters, while also effectively borrowing information and quantifying differences across groups. By employing a data augmentation trick, we are able to tackle the intractable normalized weights and construct a Markov chain Monte Carlo algorithm for posterior inference. The proposed method is illustrated on simulated data and two real data sets on single-cell RNA sequencing (scRNA-seq) and calcium imaging. For scRNA-seq data, we show that the incorporation of cell dynamics facilitates the discovery of additional cell subgroups. On calcium imaging data, our method identifies interpretable clusters of time frames with similar neural activity, aligning with the observed behavior of the animal.
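As background for the model above, here is a finite-truncation sketch of the hierarchical Dirichlet process weights. It covers the standard HDP only, without the covariate dependence or the paper's data-augmentation sampler. The global weights are $\beta \sim \mathrm{GEM}(\gamma)$, drawn by stick-breaking truncated at $K$ atoms. The group weights are $\pi_j \sim \mathrm{Dirichlet}(\alpha\beta)$ over the same atoms. The truncation level and hyperparameters are illustrative assumptions.

```python
import random

def stick_breaking(concentration, K, rng):
    """Truncated GEM(concentration) weights; the last stick takes the remainder."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights

def dirichlet(alphas, rng):
    """Dirichlet draw by normalizing independent Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def hdp_weights(gamma, alpha, K, n_groups, seed=0):
    """Finite K-atom approximation of HDP weights.

    beta ~ GEM(gamma) gives global weights over K shared atoms (clusters);
    each group draws pi_j ~ Dirichlet(alpha * beta), so all groups reuse the
    same clusters with group-specific proportions.
    """
    rng = random.Random(seed)
    beta = stick_breaking(gamma, K, rng)
    pis = [dirichlet([alpha * b for b in beta], rng) for _ in range(n_groups)]
    return beta, pis

beta, pis = hdp_weights(gamma=2.0, alpha=5.0, K=20, n_groups=3)
```

Sharing the atoms through $\beta$ is what lets clusters be matched across groups. The covariate-dependent version in the paper additionally lets the weights vary with covariates.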

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 51.0 AI-084-20260630-research-pap-bea925da Full article

Online Bernstein-von Mises theorem

Online learning is an inferential paradigm in which parameters are updated incrementally from sequentially available data, in contrast to batch learning, where the entire dataset is processed at once. In this paper, we assume that mini-batches from the full dataset become available sequentially. The Bayesian framework, which updates beliefs about unknown parameters after observing each mini-batch, is naturally suited for online learning. At each step, we update the posterior distribution using the current prior and new observations, with the updated posterior serving as the prior for the next step. However, this recursive Bayesian updating is rarely computationally tractable unless the model and prior are conjugate. When the model is regular, the updated posterior can be approximated by a normal distribution, as justified by the Bernstein-von Mises theorem. We adopt a variational approximation at each step and investigate the frequentist properties of the final posterior obtained through this sequential procedure. Under mild assumptions, we show that the accumulated approximation error becomes negligible once the mini-batch size exceeds a threshold depending on the parameter dimension. As a result, the sequentially updated posterior is asymptotically indistinguishable from the full posterior.
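The recursion described above is exact in the conjugate case. The minimal sketch below uses a normal mean with known noise variance (a model and numbers picked for illustration). It shows that passing each mini-batch posterior forward as the next prior reproduces the full-data posterior. For non-conjugate models, the paper instead approximates each step variationally, which this sketch does not attempt.

```python
import random

def normal_update(prior_mean, prior_prec, batch, noise_var):
    """Conjugate update for a normal mean with known noise variance.

    Precisions add; the posterior mean is the precision-weighted average
    of the prior mean and the batch data.
    """
    post_prec = prior_prec + len(batch) / noise_var
    post_mean = (prior_prec * prior_mean + sum(batch) / noise_var) / post_prec
    return post_mean, post_prec

rng = random.Random(1)
noise_var = 4.0
data = [rng.gauss(1.5, noise_var ** 0.5) for _ in range(1000)]

# Online: each mini-batch posterior becomes the prior for the next mini-batch.
mean, prec = 0.0, 1.0
for start in range(0, len(data), 100):
    mean, prec = normal_update(mean, prec, data[start:start + 100], noise_var)

# Batch: a single update with the full dataset from the same prior.
full_mean, full_prec = normal_update(0.0, 1.0, data, noise_var)
print(abs(mean - full_mean) < 1e-9, abs(prec - full_prec) < 1e-9)  # prints True True
```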

Transactions on Machine Learning Research 2026-06-30 00:00 UTC Score 40.0 AI-084-20260630-research-pap-bedebbba Full article

Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective

The Transformer model is widely used in various application areas of machine learning, such as natural language processing. This paper investigates the approximation of the Hölder continuous function class $\mathcal{H}_{Q}^{\beta}\left([0,1]^{d\times n},\mathbb{R}^{d\times n}\right)$ by Transformers and constructs several Transformers that can overcome the curse of dimensionality. These Transformers consist of one self-attention layer with one head and the softmax function as the activation function, along with several feedforward layers. For example, to achieve an approximation accuracy of $\epsilon$, if the activation functions of the feedforward layers in the Transformer are ReLU and floor, only $\mathcal{O}\left(\log\frac{1}{\epsilon}\right)$ feedforward layers are needed, each of width at most $\mathcal{O}\left(\frac{1}{\epsilon^{2/\beta}}\log\frac{1}{\epsilon}\right)$. If other activation functions are allowed in the feedforward layers, the width of the feedforward layers can be further reduced to a constant. These results demonstrate that Transformers have strong expressive capability. The construction in this paper is based on the Kolmogorov-Arnold Superposition Theorem and does not require the concept of contextual mapping, so our proof is more intuitive than those of previous Transformer approximation works. Additionally, the translation technique proposed in this paper helps to apply the previous approximation results of fe…
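To fix the shapes involved, here is a forward pass through the architecture the abstract describes. It consists of one single-head softmax self-attention layer acting on $X \in [0,1]^{d\times n}$ (one column per token), followed by a token-wise ReLU feedforward layer. The weights are hypothetical placeholders, and the floor activation is omitted. Nothing here reproduces the paper's Kolmogorov-Arnold-based construction.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax_cols(S):
    """Column-wise softmax: column j holds token j's attention weights."""
    out = [[0.0] * len(S[0]) for _ in S]
    for j in range(len(S[0])):
        col = [S[i][j] for i in range(len(S))]
        m = max(col)                              # subtract max for stability
        e = [math.exp(c - m) for c in col]
        total = sum(e)
        for i in range(len(S)):
            out[i][j] = e[i] / total
    return out

def self_attention(X, WQ, WK, WV):
    """Single-head softmax attention on X (d x n, one column per token)."""
    Q, K, V = matmul(WQ, X), matmul(WK, X), matmul(WV, X)
    P = softmax_cols(matmul(transpose(K), Q))     # n x n, columns sum to 1
    return matmul(V, P)                           # column j mixes value columns

def relu_ffn(X, W1, b1, W2, b2):
    """Token-wise feedforward layer: W2 relu(W1 x + b1) + b2 for each column x."""
    H = matmul(W1, X)
    H = [[max(0.0, h + b1[i]) for h in row] for i, row in enumerate(H)]
    Y = matmul(W2, H)
    return [[val + b2[i] for val in row] for i, row in enumerate(Y)]

# Hypothetical input: d = 2 features, n = 3 tokens with entries in [0, 1].
X = [[0.1, 0.5, 0.9],
     [0.3, 0.2, 0.7]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z = self_attention(X, I2, I2, I2)
out = relu_ffn(Z, W1=[[1.0, -1.0], [0.5, 0.5]], b1=[0.0, 0.1],
               W2=[[1.0, 0.0], [0.0, 1.0]], b2=[0.0, 0.0])
```

Attention is the only place tokens interact. The feedforward layers act on each column independently, which is why their depth and width are the quantities the approximation bounds count.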

Azul offers free JVM vulnerability risk assessment
InfoWorld AI 2026-06-29 23:35 UTC Score 57.0 USR-0126-20260629-global-ai-ne-dab66e37 Full article

Azul offers free JVM vulnerability risk assessment

Azul has introduced a free vulnerability risk assessment for Java virtual machines (JVMs). Citing AI models such as Claude Mythos, which can automatically discover vulnerabilities and create exploits long before they’re disclosed, the company says it aims to address the blind spots that these autonomous AI-powered exploitation tools are able to find. Users can request the free JVM vulnerability risk assessment at Azul’s website. To counter AI-driven exploits, Azul’s assessment maps discovered JVM vulnerabilities directly to Stable Critical Patch Updates (CPUs), which are security-only patches that can be dropped into live production environments immediately without the risk of breaking software, Azul said. Announced June 17, Azul’s JVM vulnerability risk assessment is available at no cost, directly from Azul and via select Azul partners, the company said. In a single engagement, organizations receive the following: Executive-ready security dashboard: A visual summary of the entire Java estate, broken down by risk tier, publisher, and Java version, designed for CxO-level consumption and board reporting. Risk-by-version breakdown: Identification of the specific Java versions driving the highest exposure, so remediation effort can be directed where it matters most rather than spread uniformly. Key Risk Indicators (KRIs) for AI-driven exploits: Visibility into which JVMs carry active Known Exploited Vulnerability (KEV) exposure, the highest-priority threat class recognized i…

Yen weakens against dollar, rattling Japan
Semafor Technology 2026-06-29 23:28 UTC Score 52.0 USR-0094-20260629-global-ai-ne-861a67d7 Full article

Yen weakens against dollar, rattling Japan

The yen fell to its weakest level against the US dollar since 1986, raising the prospect of renewed intervention from Japanese authorities.

Russia’s fiscal discipline erodes
Semafor Technology 2026-06-29 23:19 UTC Score 50.0 USR-0094-20260629-global-ai-ne-57e4d5d8 Full article

Russia’s fiscal discipline erodes

Moscow pushed through legislation allowing it to borrow more money, a rare belt-loosening from the fiscally conservative Kremlin that implies that the Russian system is eroding, a scholar argued.

OpenAI Community 2026-06-29 23:08 UTC Score 40.0 AI-116-20260629-social-media-2cc9fa11 Full article

Feature Request: Make Project Memory Transparent, Searchable, and User-Controlled

Thanks for sharing this thoughtful feature request. I can see how greater transparency and control over Project Memory and Project retrieval would be valuable, especially for users managing long-term projects where continuity and visibility into retrieved context are important. I'll pass this feedback along to the team for consideration. Thanks again for taking the time to share these suggestions. ~ Smith

Burnham vows to 'rewire' Britain
Semafor Technology 2026-06-29 23:08 UTC Score 50.0 USR-0094-20260629-global-ai-ne-bb8e9575 Full article

Burnham vows to 'rewire' Britain

The UK’s likely next prime minister pledged to redistribute political clout across the country, setting up a “No. 10 North” in Manchester.

Feature request: Exportable, auditable context checkpoints for long conversations
OpenAI Community 2026-06-29 23:03 UTC Score 37.0 AI-116-20260629-social-media-2119578f Full article

Feature request: Exportable, auditable context checkpoints for long conversations

Welcome to the dev Community, @renesugar. Thanks for taking the time to share such a detailed feature request. I can see how exportable, reviewable context checkpoints and conversation-level exports would be valuable for users working on long-running projects where continuity and context integrity are important. I'll pass this feedback along to the team for consideration. ~ Smith

Planned VW layoffs add to Berlin's woes
Semafor Technology 2026-06-29 23:00 UTC Score 49.0 USR-0094-20260629-global-ai-ne-50fe8ac5 Full article

Planned VW layoffs add to Berlin's woes

German politicians vowed Monday to prevent Volkswagen from cutting 100,000 jobs, as Berlin confronts the scale of the country’s industrial woes.