Highlights from Git 2.55
The open source Git project just released Git 2.55. Here is GitHub’s look at some of the most interesting features and changes introduced since last time. The post Highlights from Git 2.55 appeared first on The GitHub Blog.
Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding This is an interesting new open weights (MIT licensed) model, the first model release from DeepReinforce. [...] with variants including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. Built on top of pretrained Gemma 4 and Qwen 3.5, it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks. As far as I can tell the licenses of those underlying models are compatible with being used in this way - Gemma 4 is Apache 2.0 licensed (and not bound by the janky additional Gemma Terms of Use that afflicted the previous Gemma models) and Qwen 3.5 is Apache 2.0 licensed as well. I've been running the model using LM Studio and the ornith-1.0-35b-Q4_K_M.gguf (20GB) GGUF, hooked up to Pi. Initial impressions are very good - it seems to be able to run the agent harness over many tool calls in a proficient way. Here's a terminal session where I asked it to "find the code that decodes the actor cookie" and then "find the code that opens the insert dialog when the button is clicked" against a Datasette checkout, which it handled with ease. I also had it draw this pelican, which came out at 103 tokens/second: It's a little bit mangled but the pelican is clearly a pelican. I couldn't find much information about DeepReinforce themselves. The earliest paper I could find from them was CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning from June 2025. Tags: ai, generative-ai, lo…
Starting from IntelliJ IDEA 2026.2, JetBrains will sunset Kotlin Notebook as a product and will no longer maintain it. The plugin will remain available on an open-source model so the community can continue its development. Below, we explain why we’re making this change, how it affects current Kotlin Notebook users, what comes next, and how […]
Showcasing the importance of open source innovation in American AI, Palantir’s new intelligent engine — introduced today — uses NVIDIA Nemotron open models to serve the needs of U.S. government agencies. Open source software has long been a pillar of U.S. technology leadership. In 1969, DARPA connected four university computers — from UCLA, Stanford, UCSB […]
The Linux Foundation and tech giants are launching Akrites to fix open-source security vulnerabilities centrally and confidentially.
I recently made this great repo for polygenic prediction and embryo selection which I want to share with people. I've wanted something like this for almost a decade, and it's so easy now that we have these superhuman coding models. Note that I also have this longer technical essay attached to the repo, as well as these slides (I think they're both very nice!) Let's look at how everything works now. Data My repo pulls in data for existing predictors from the pgs (polygenic score) catalog, and filters to the best weights for each feature using claude's best judgment (this worked better than using simpler heuristics like recency and dataset size). There are predictors for intelligence, height, and many disease traits. Across adults these correlate with measured phenotype at around 0.3, 0.65, and 0.15-0.3 after accounting for obvious confounders like sex and age, so pretty nontrivial. In addition to uploading those final prediction weights, researchers will also upload per-snp (single-nucleotide polymorphism) correlations for each trait. Remarkably, those open-source gwas (genome-wide association study) sumstats are sufficient to rederive state of the art predictors. The field has rallied around developing techniques like lassosum or LDpred or SBayesRC for learning pgs weights, each of which assumes that all you have access to is these gwas sumstats, along with population-level linkage-disequilibrium matrices encoding how frequently neighboring snp's occur together compared to c…
A game engine that might be interesting for newcomers is Godot. I’ve just started using it, so I can’t say much yet, but it’s interesting. I wish the engine had existed a few years earlier. I’d like to hear from developers who have already worked with Godot and can share their opinions about the engine. (When Unity betrayed its customers, I left. That was very painful because I had invested a lot of time. Unfortunately, chatbots didn’t exist back then and setting up took a lot of time. If there had been chatbots at that time, I probably would have had at least one completed game. After these experiences, I decided to use only open-source tools from now on. I’m finally switching to Blender as well. Unfortunately there are limitations regarding programming languages for me, because I absolutely hate C. But let’s see what happens when I actually find the time and motivation to start such a demanding project again.) I hope some team will take up Mono development again, since it’s now much easier to code… And I hope chatty helps to port some of the Unity code. This was a test of an object generator for Blender: I spent months on such things in the past, and ported it in only a few days with GPT. (Yes, not a game, but an object for one.) I think it is easily possible to create a storytelling game with an LLM, but I would use a fast offline model for this. (I still like old-fashioned jump-and-run games with a story the most. I am not much of an online gamer, so not a zombie killer, and not enough…
China's Zhipu AI (Z.ai) released its open-weight GLM-5.2, and some researchers have claimed that it matches Mythos in certain bug-finding and cybersecurity scenarios. While GLM lags behind models from Anthropic and OpenAI in other, more general tasks, it seems that China has dramatically reduced the gap in the capabilities between its models and those of […]
TL;DR It would make sense to briefly skim through our previous post that introduces our experiments on refusal in LLMs. There we explain how it started; here we’ll tell how it’s going. The primary goal of this text is to try and structure the list of whack-a-mole research questions. The secondary goal is to get some outside perspective, so if you do similar research or have seen similar research, please lend us a hand. Feel free to jump straight to the section that looks most appealing. We recommend skimming through “The Main Question” as this section provides a broader perspective. Then we list all the other questions that arose during the research. You’ll find them under the headers “Another Question: …” and “Wording Also Matters”. The first one discusses how refusal is represented in different layers and what it might mean. The second one is dedicated to two parts of refusal: its wording and the actual detection of a potentially harmful request. “The Main Question” is split into two parts: in “Our suggestion” we outline our main hypothesis and the evidence we found during our experiments; in “An Alternative Suggestion” we highlight the opposing point of view and the evidence behind it. The Main Question (MQ) We experiment on open-weight small (~9B) instruct models trying to understand what exactly happens when they refuse to provide an answer given different contexts. One of the core observations is that refusal looks different for different categories of potential harm (for example, a request…
Liquid AI released LFM2.5-230M, its smallest model yet. The 230M-parameter, open-weight model runs on-device at 213 tok/s on a Galaxy S25 Ultra and 42 tok/s on a Raspberry Pi 5. Built on the LFM2 architecture, it targets tool use and data extraction, beating larger models like Qwen3.5-0.8B and Gemma 3 1B on instruction following. The post Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference appeared first on MarkTechPost.
One week, the whole frontier. In models, the open weights now run from a 1.6-trillion-parameter behemoth to a 230M model on a Raspberry Pi. In world models and robotics, a startup is training agents on video games to drive real robots and Yann LeCun's team made world models 48× faster. In medicine, GPT-5 Pro cracked a three-year immunology mystery and a founder used Claude to read his own cancer scans. And the agents doing all this reached every phone — and a fresh attack surface. Below: the marquee advances, the deep cuts, and where it's already paying off.
DeepSeek open-sourced DSpark, a speculative decoding framework that attaches a draft module to existing DeepSeek-V4 weights. It pairs a parallel draft backbone with a lightweight Markov head to cut suffix decay, then adds confidence-scheduled verification that tailors how many tokens get checked to real-time GPU load. Offline, accepted length rises 16–31% over DFlash and Eagle3; in production it speeds per-user generation 57–85% over the MTP-1 baseline, losslessly. The training repo, DeepSpec, ships under MIT. The post DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1 appeared first on MarkTechPost.
Using Open-Weight Models in Local Coding Harnesses as an Alternative to Claude Code and Codex Subscriptions
Meta released Astryx, an open-source React design system built on StyleX. It pairs a CSS-variable theme cascade with a CLI and MCP server, so both engineers and AI agents build using the same API. The project is in Beta, MIT-licensed, and grew inside Meta over eight years. The post Meta’s Astryx Brings a CLI and MCP Server to an Open-Source React Design System Agents Can Read appeared first on MarkTechPost.
Corgi became embroiled in controversy when Papermark accused it of stealing its software. Corgi says it did not, raising new questions about vibe coding.
AI has really changed the game around software development. More people are leveraging AI than ever to contribute patches to projects they use. To me, this is a good thing as more folks will contribute patches rather than fork or not fix them. The main problem is that AI has made generating code fast but there has been very little improvement in maintaining code bases. In this post, we will highlight the ways the Kubernetes community is adapting to the world of AI assisted coding. The first step of this journey was to develop an AI policy. This seems mundane and bureaucratic but there were many PRs that derailed into discussions around AI usage. The AI policy helps steer the conversation around the project's stance on AI and provides a clear signal to contributors on how to use these tools responsibly. Kubernetes AI policy The Kubernetes project has established clear guidelines for AI-assisted contributions that balance innovation with accountability. These policies are designed to maintain code quality and ensure human oversight while acknowledging that AI tools can be valuable aids in the development process. Transparency first Contributors must disclose when AI tools have been used to assist with a pull request. A simple statement in the PR description such as "This PR was written in part with the assistance of generative AI" is sufficient. This transparency helps reviewers understand the context and apply appropriate scrutiny. Human accountability While AI tools can assi…
GitHub joined the United Nations Development Programme in Ghana to explore how open source governance can support one of West Africa's most ambitious digital reform efforts. The post GitHub and UNDP team up to advance development priorities in Ghana with open source appeared first on The GitHub Blog.
Open source advocates remain concerned over lack of binding commitments
Short note on trying local open-weight LLMs across Qwen-Code, Codex, and Claude Code harnesses.
Apple released container 1.0, an open-source Swift tool running Linux containers as lightweight virtual machines on Apple silicon. The post Meet container: Apple’s Open-Source Swift Tool for Running Linux Containers as Lightweight VMs on Apple Silicon appeared first on MarkTechPost.
Nearly a year and a half after China’s DeepSeek shook Silicon Valley with its powerful yet affordable artificial intelligence model, Beijing-based Zhipu AI has delivered another jolt to the US tech industry. American entrepreneurs and researchers are praising the coding performance and cost-effectiveness of Zhipu’s new flagship model, GLM-5.2. Released earlier this month, the model is being hailed by some as a new “DeepSeek moment”, with users calling it the first-ever open-weight...
Headlamp is an open-source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources directly from a browser. Cluster API (CAPI) is a Kubernetes sub-project that brings declarative, Kubernetes-style APIs to cluster lifecycle management. It lets platform teams provision, upgrade, and manage the lifecycle of Kubernetes clusters using standard Kubernetes objects stored and reconciled in a management cluster. Managing Cluster API resources has historically required raw kubectl commands and deep familiarity with ownership hierarchies. The Headlamp Cluster API plugin brings visual clarity, faster debugging, and simplified operations for platform teams, directly inside Headlamp. What this plugin provides The Cluster API plugin adds a dedicated Cluster API section to Headlamp and brings full visibility into core CAPI resources through consistent list and detail views. Feature Description Cluster overview View clusters with live control plane and worker replica status. Machine visibility Inspect MachineDeployments, MachineSets, Machines, and MachinePools with status and conditions. Cluster API dashboard Get a centralized view of Cluster API resource health, active condition issues, provider information, and remediation guidance. Control plane monitoring Track KubeadmControlPlane replicas, versions, and associated Machines. Scale from the UI Scale MachineDeployments and MachineSets directly from Headlamp. Owned resource hierarchy Trace rela…
Headlamp is an open-source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources. Knative brings serverless workloads to Kubernetes, handling traffic routing, autoscaling, and revision management so teams can deploy and iterate without fighting infrastructure. But operating Knative workloads day-to-day can be difficult: there's still a lot of jumping between the kn CLI, kubectl, and the Kubernetes UI to get a full picture of what's running. We built the Headlamp Knative plugin to bridge that gap, allowing operators to inspect, understand, and act on their workloads all from a single place. This plugin was built as part of the LFX mentorship. Here's a tour of what we shipped. Here is a short walkthrough of the Knative plugin for Headlamp: Integrating Knative resources with Headlamp's map view Headlamp's resource mapping works for Knative CRDs too. You can see how KServices, Revisions, and DomainMappings relate to each other in a single graph view. KService management: edit traffic splits, restart pods, and view logs A KService is the top-level resource in Knative: it manages the lifecycle of Routes, Configurations, Revisions, and everything needed to run and expose your application. The plugin gives KServices a full detail view with an Edit Mode toggle for making live changes to traffic splits, autoscaling annotations, and more. Common actions like viewing the YAML, opening logs, triggering a redeploy, or restarting backin…
In this post, we show you how to build Chaplin (Customer Health and Planned Lifecycle Intelligence Nexus), an open source solution that uses AI agents exposed through the Model Context Protocol (MCP) to provide self-service health event analytics.
Let’s walk through the NVIDIA cuFOLIO Developer Example. This open source, customizable notebook enables GPU accelerated portfolio optimization by constructing an optimal portfolio from the S&P 500 universe and then backtesting against customizable parameters and portfolios. ➡️ Start now: https://build.nvidia.com/nvidia/quantitative-portfolio-optimization 📥 Code: https://github.com/NVIDIA-AI-Blueprints/quantitative-portfolio-optimization/ 📝 Tech blog: https://developer.nvidia.com/blog/accelerate-large-linear-programming-problems-with-nvidia-cuopt 00:00 Interactive Backtesting Intro 00:11 Quantitative Portfolio Optimization 00:26 Deploy on Cloud (Brev) 00:57 Launchable Setup 01:50 Github 01:57 Run Notebook 02:42 2. CVaR Formulation 03:00 3. Data and Model Setup 04:26 4. Solve CVaR Optimization 07:15 5. Backtest Portfolio 08:09 6. GPU v CPU 09:40 7. Appendix 10:05 Outro #quantfinance #portfoliooptimization #algorithmictrading
TL;DR The TokenSpeed-kernel is a standalone, open-source subsystem designed to solve backend complexity in LLM inference. It introduces a clean, layered API and registry system that decouples the high-level runtime...
Take a practical look at multimodal, any-to-any systems for vision-language reasoning, speech interaction, document intelligence, real-time assistants, local deployment.
Talos was built to help resolve a major bottleneck in genomic medicine: human review time. The open-source system recovered 90% of in-scope diagnoses while surfacing just 1.3 candidate variants per patient for expert review. The post Talos: Scaling rare disease diagnosis with automated, iterative genomic reanalysis appeared first on Microsoft Research.
Unless you’ve been living under an old woodpile in your backyard, you have certainly seen how agentic coding is rocking the software development world. Things are happening fast and furious, and keeping up is practically a full-time job. The latest area that is catching the attention of developers is how agentic coding is affecting the open source community. The open source movement has been defending the rights of folks to use, change, and contribute to software for many years. And of course, agentic coding is starting to become part of that process. On the one hand, maintainers of open source projects rightfully are frustrated as they become overwhelmed with pull requests of dubious quality and usefulness being submitted by coding agents. On the other hand, as David Heinemeier Hansson notes, maintainers are starting to get a little snooty about accepting AI-written code, viewing it as somehow not worthy of being included. Some organizations have explicitly banned AI-generated submissions. I get that they don’t want AI slop overwhelming their input queues. But I think it is a huge mistake to ban AI-written code outright. Whose code? Before I dig deeper into that notion, it’s important to look at another issue that arises from all of this: Who actually owns the code that AI writes? Copyright requires that a human produce the thing being copyrighted. If you prompt Claude Code with “Write me a CMS system” and then Claude writes you a CMS system that you check into a public G…
NVIDIA has released the full Nemotron 3 open model family — Ultra, Super, Nano, and Nano Omni. This office hours session covers each model in the series, and any questions you have about Nemotron 3 in general — what it's built for, when to use it, and what's available in open weights, training datasets, and fine-tuning recipes. What we'll cover: - Nemotron 3 Ultra — 550B MoE frontier reasoning model for long-running autonomous agents: 5x faster inference, up to 30% lower cost, hybrid Mamba-Transformer architecture, and MOPD training for consistent performance across agent harnesses - Nemotron 3 Super — mid-range 120B model targeting enterprise applications that need strong reasoning for multi-agent applications - Nemotron 3 Nano — 30B MoE with 3B active parameters, built for high-volume execution, highly accurate sub-agent accomplishing targeted tasks - Nemotron 3 Nano Omni — multimodal (text, image, audio, video) model purpose-built for targeted specialized agentic tasks - Open weights, training datasets, and fine-tuning recipes — what's available across the family and how to customize for your domain Building with or evaluating the Nemotron 3 family? Bring your questions — whether you're choosing between models, fine-tuning for your domain, or deploying at scale, the team will answer them live.
Upbound Inc. today released Modelplane, a new open-source tool for managing artificial intelligence inference clusters. San Francisco-based Upbound is backed by $69 million from Alphabet Inc.’s GV fund, Intel Capital and others. It’s best known as the creator of Crossplane, an open-source infrastructure management engine. It’s an upgraded version of the Kubernetes control plane, a […] The post Upbound open-sources Modelplane to optimize inference clusters appeared first on SiliconANGLE.
We’re calling for targeted amendments to resolve conflicts with open source licensing and align with international transparency frameworks while preserving regulatory intent. The post GitHub joins coalition advocating for fixes to California AI Transparency Act to protect open source appeared first on The GitHub Blog.
OpenAI has launched a program with cybersecurity firm Trail of Bits to use AI to find and fix vulnerabilities in widely used open-source software, as enterprises face growing risks from flaws buried deep in their software supply chains. The initiative, called Patch the Planet, uses AI-assisted vulnerability research alongside human review to help turn security findings into tested fixes that can be disclosed through existing project channels. Initial participants include Python, Go, cURL, Sigstore, NATS Server, aiohttp, freenginx, pyca/cryptography, and python.org. These projects support software development, networking, cryptography, and supply chain infrastructure used across a wide range of enterprise applications and services. OpenAI said each engagement will begin with consultation with maintainers to identify where security support is most needed. Researchers will then investigate potential vulnerabilities, validate meaningful issues, develop or refine patches, support testing, and coordinate disclosure through the project’s existing channels. Participating security researchers will use the company’s models and Codex Security to analyze code and help move fixes toward release. Trail of Bits engineers will review findings before they are sent to maintainers, a step meant to filter out false positives and duplicate reports before they add to the workload of open-source projects. The company is also working with HackerOne and Calif to support vulnerability triage, coordi…
Learn about the progress we’ve made toward our accessibility goals and how you can help make open source more inclusive. The post From pledge to practice: Building a more inclusive open source ecosystem appeared first on The GitHub Blog.
OpenAI introduces Patch the Planet, a Daybreak initiative helping open-source maintainers find, validate, and fix vulnerabilities with AI and expert review.
A new form of vendor lock-in is here. And it’s not proprietary languages or rigid enterprise software suites; it’s something more fundamental. It’s the very thing that writes the code. JetBrains Research found that 74% of developers worldwide use AI tools. Claude Code, available only since May 2025, is now the most popular AI coding tool, followed by Gemini Code Assist and GitHub Copilot, according to Jellyfish’s 2026 State of Engineering Management Report. The latter study also found that 91% of developers say their productivity has increased in the past 12 months. As coding output expectations are rewritten daily, the engineering world is becoming heavily reliant on paid external AI services. Gartner predicts that by 2028 spending on AI coding tokens could exceed developer salaries. Yet, tokenmaxxing while vibe coding through a vendor’s cloud-based API feels like a far cry from the open foundations of free programming languages and open models, which many of today’s AI platforms now abstract. “Open infrastructure will be the backbone of the AI era,” says Peter Farkas, CEO of Percona, a provider of open-source database solutions. “Right now, too many companies are building their entire AI strategy on top of proprietary platforms because the convenience is seductive.” “It’s ‘three clicks’ to stand up a database or an AI service in a hyperscaler, and that convenience blinds people to the lock-in they’re signing up for,” he adds. “As AI workloads mature, organizations w…
This post was originally an op-ed co-authored with Kevin Xu of Interconnected for a general, non-technical audience.
Short note on GLM-5.2, an open-weight GLM update that keeps the GLM-5 sparse MoE backbone and adds IndexShare for cheaper 1M-token DSA inference.
AI is transforming how we build, deploy, and operate technology. Open source is making it possible. KubeCon + CloudNativeCon + OpenInfra Summit + PyTorch Conference China will take place September...
Chinese AI lab Z.ai released GLM-5.2 to their coding plan subscribers on June 13th, and then yesterday (June 16th) released the full open weights under an MIT license. Similar in size to their previous GLM-5 and GLM-5.1 releases, this is a 753B parameter, 1.51TB monster - with 40B active parameters (Mixture of Experts). GLM-5.2 is a text input only model - Z.ai have a separate vision family most recently represented by GLM-5V-Turbo, but that one isn't open weights. GLM-5.2 has a 1 million token context window, up from GLM-5.1's 200,000. The buzz around this model is strong. Artificial Analysis, who run one of the most widely respected independent benchmarks: GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index. GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43) They did however find it to be quite token-hungry: GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k) The model is also now ranked 2nd on the Code Arena WebDev leaderboard, behind only Claude Fable 5. That leaderboard measures "front-end web development tasks, including agentic coding workflows". I'm impressed to see it rank so highly given the lack of image input, which I had incorrectly assum…
OpenAI and Anthropic are going public while still capturing much of the money spent on foundation-model usage. But deployment patterns are starting to tell a more complicated story. Companies are building hybrid model portfolios, using proprietary models where convenience, support, and frontier capability matter, while turning to open-weights models where cost, privacy, customization, and deployment […] The post The Hybrid AI Stack Is Coming for the Pricing Power of OpenAI and Anthropic appeared first on Gradient Flow.
Nemotron 3 Ultra is NVIDIA's latest frontier-intelligence open model — 5x faster inference, up to 30% lower cost, and fully open: weights, training datasets, and fine-tuning recipes included. In this livestream, we're joined by Nathan Lambert, ML researcher and open model advocate, to dig into what Ultra means for developers building on open models today. We'll cover what sets Ultra apart technically — the hybrid Mamba-Transformer backbone, Multi-Teacher On-Policy Distillation (MOPD), and how it fits into a system-of-models pattern. Nathan brings a researcher's perspective on post-training for agentic systems, and we'll get into where the open frontier model landscape is heading and what it takes to build models worth building on. What you'll learn: - How Ultra's post-training approach compares to what the open model ecosystem has seen at scale - What the hybrid Mamba-Transformer architecture means for long-context, multi-turn agent workflows - How open weights, datasets, and recipes enable domain-specific fine-tuning from day one - Where open frontier models are heading for agentic applications — and what tradeoffs matter most Have questions about Ultra, post-training, or the open model landscape? Drop them live — Nathan and the team will answer them in real time.
More of the iOS app loop, now inside Codex. The Build iOS Apps plugin lets Codex view and test your iOS app in the in-app browser, open SwiftUI previews, and hot reload edits without leaving Codex. Shoutout to the open source projects behind this: • Serve-sim powers the streaming simulator by @Baconbrix https://github.com/EvanBacon/serve-sim • SnapshotPreviews extracts SwiftUI previews by Sentry https://github.com/getsentry/SnapshotPreviews
This opening session builds the foundation for running popular OSS models such as Gemma and Qwen directly on Jetson, with no cloud required. We cover when to use Ollama for rapid local prototyping versus vLLM for higher-throughput serving, show how the same workflow applies to both engines to power different OSS models, and walk through the real decisions behind model choice, containers, quantization, and performance tuning on edge hardware. We close with a teaser of OpenClaw and a bonus take-home challenge to kick off community building. You will learn how to deploy open-source AI models on NVIDIA Jetson, with no cloud required, from first launch to production-ready serving. We'll cover: Getting models running on NVIDIA Jetson: spin up popular OSS models (LLMs and VLMs such as Gemma and Qwen) using Ollama or vLLM on Jetson hardware and verify they're working end-to-end. Choosing the right inference engine: understand the practical tradeoffs between Ollama for rapid local prototyping, vLLM for higher-throughput serving, and llama.cpp, so you can pick the right tool for your use case. NVIDIA Jetson-specific serving strategies: walk through the real decisions behind model choice, containers, and performance tuning tailored for Orin and Thor, including what works, what doesn't, and why. Performance fundamentals: get introduced to quantization and speculative decoding: what they are, how they work, and when to reach for them on edge hardware. Real-world appl…
Short note on North Mini Code, Cohere's 30B total and 3B active open-weight MoE model for agentic coding tasks.
Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. These challenges limit participation, slow discovery, and reduce the reproducibility of scientific workflows. In this paper, we present a proof of concept for addressing these barriers through the integration of a curated knowledge graph (KG) with AI agents designed for cloud-native scientific workflows. The KG provides a unifying layer that organizes datasets, tools, and workflows, while AI agents—powered by generative AI services—enable natural language interaction, automated data access, and streamlined analysis. Together, these components drastically lower the technical threshold for engaging in climate data science, enabling non-specialist users to identify and analyze relevant datasets. By leveraging existing cloud-ready API data portals, we demonstrate that 'a knowledge graph is all you need' to unlock scalable and agentic workflows for scientific inquiry. The open-source design of our system further supports community contributions, ensuring that the KG and associated tools can evolve as a shared commons. Our results illustrate a pathway toward democratizing access to climate data and establishing a reproducible, extensible framework for human–AI collaboration in scientific research.
I'm building a web app with FastAPI + async/await Python backend. Users upload leaf photos via API and the server should return: 1) plant species, 2) disease label or "healthy". Constraints: Generalization: Must handle multiple crops. Users can upload "any" plant leaf, not just tomato/corn. Target 15+ species. Server inference: Runs on GPU server, not mobile. Latency 1-2s is acceptable, so model size isn't a bottleneck. Pre-trained + 100% free: Need open-source weights for transfer learning. No paid APIs. License must allow commercial use. Dataset: Starting with PlantVillage dataset + ~2,000 custom field images. Lab images vs real field images is a domain shift issue. Tech stack: PyTorch + timm library. Inference runs in async endpoints, so I use run_in_executor to avoid blocking. What I tried: Fine-tuned ResNet50 on PlantVillage. 95% accuracy on lab images, but it drops to ~62% on field images. Overfitting to clean backgrounds. Questions: For multi-crop + multi-disease, is a 2-stage approach better: Model A for species ID, Model B for disease per species? Or one multi-label model? Between ConvNeXt-Base, Swin-Base, and ViT-Base, which fine-tunes best on PlantVillage + field data for accuracy in 2025? Are there plant-specific foundation models/checkpoints better than ImageNet pre-training for this domain? I'm looking for architecture + dataset + fine-tuning strategy advice, not code.
Open-source Python library for fast simulation of fermionic quantum circuits enables efficient prototyping and benchmarking for real quantum hardware.
I am considering buying GPUs for my project using open-source text-to-video models like ltx-2-19b (lightricks) or wan-v2.2-a14b. I read online that the same configuration/quantization and seed will give results of similar quality, and that the only difference is the speed/latency of generation. Is this true, or will there be a difference?
Why traditional vulnerability disclosure fails for open-weight models, and how we are building a new standard for AI evaluation. The post The patch model is breaking. AI evaluation needs a new way to disclose what it finds. appeared first on MLCommons.
Introducing stable-worldmodel, an open-source platform for reproducible world model research, evaluation, and benchmarking under visual and physical distribution shifts.
VoidZero, the team behind Vite, Vitest, Rolldown, Oxc, and Vite+, is joining Cloudflare. Vite stays open source, vendor-agnostic, and built for everyone.
Learn more: https://bit.ly/3RtV5Lk Introducing Fast & Efficient LLM Inference with vLLM, a short course built in partnership with Red Hat and taught by Cedric Clyburn, Senior Developer Advocate at Red Hat. Serving open-source LLMs efficiently, for many users at low latency and reasonable cost, comes down mostly to memory management. Two things compete for that memory: the model weights and the KV cache. A 70-billion-parameter model takes around 140 GB of memory just for the weights, while the KV cache grows with every request you serve. In this course, you'll learn to shrink the weights through quantization, and serve the model with vLLM, the widely adopted open-source serving system, taking advantage of the memory management techniques it provides like PagedAttention and prefix caching. You'll run the full optimize-deploy-benchmark workflow on a real model: compressing an open-source Qwen model with LLM Compressor, serving it with vLLM, and benchmarking your deployment under realistic traffic using GuideLLM and lm-eval. By the end, you'll have run the full optimize-deploy-benchmark workflow on a real model and built the intuition to navigate the tradeoffs between accuracy, speed, and cost. Enroll now: https://bit.ly/3RtV5Lk
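The 140 GB figure follows from simple arithmetic if the weights are stored at 16-bit precision (2 bytes per parameter). The sketch below works that out and shows how quantization shrinks it. The KV cache estimate uses the standard "keys plus values per layer, per KV head, per token" accounting, and the model shape passed to it (80 layers, 8 KV heads, head dimension 128, 32,768 cached tokens) is a hypothetical example, not a number from the course.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                num_tokens: int, bytes_per_value: int = 2) -> float:
    """Rough KV cache size: one key and one value vector
    per layer, per KV head, per cached token."""
    values = 2 * num_layers * num_kv_heads * head_dim * num_tokens
    return values * bytes_per_value / 1e9


PARAMS = 70e9  # 70-billion-parameter model, as in the text

fp16_gb = weight_memory_gb(PARAMS, 2)    # 16-bit weights
int8_gb = weight_memory_gb(PARAMS, 1)    # 8-bit quantized weights
int4_gb = weight_memory_gb(PARAMS, 0.5)  # 4-bit quantized weights

# Hypothetical model shape and total number of cached tokens (assumptions).
cache_gb = kv_cache_gb(num_layers=80, num_kv_heads=8, head_dim=128,
                       num_tokens=32_768)

print(f"weights fp16: {fp16_gb:.0f} GB, int8: {int8_gb:.0f} GB, int4: {int4_gb:.0f} GB")
print(f"KV cache for 32,768 tokens: {cache_gb:.1f} GB")
```

The weights are a fixed cost, while the cache term scales linearly with the number of tokens held across all in-flight requests, which is why the two compete for the same memory.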
China’s AI competition strategy: Wide dispersion, cheap tokens. Comment by Linda_Heyer, Jun 03, 2026, 2 min read. Image: picture alliance / Bildagentur-online | Tetra Images-Erik Isakson. China’s flagship AI company DeepSeek released its V4 model in April, with a promotional price that puts it at a mere fraction of the cost of its North American competitors’ models. This reflects a wider trend in China’s AI sector: Instead of competing directly with companies like OpenAI, Anthropic and Google, who offer state-of-the-art services at a premium, Chinese companies are pursuing a strategy of wide diffusion and cheap tokens to gain market share across the world. For Europe, this may pose the risk of forming a quick dependency on Chinese models as the basis for AI development, plus European talent being funneled to enhance Chinese systems. Many Chinese AI companies have followed the DeepSeek model. They are building models that are decent, but not cutting-edge, in performance and instead are focused on high compute efficiency that lowers costs for users. They have also made their models available via open-source platforms, meaning anyone can use, fine-tune and host them for free, as opposed to proprietary models like those of current Western leaders. Downloads of Chinese models on open-source platform Hugging Face have surpassed US models since late 2025. Of the top ten open-weight models by performance, the top seven a…
Trained from scratch and designed for practical deployment, Mellum2 is built for routing, Q&A, sub-agents, and private AI use in software engineering systems. Today, we’re open-sourcing Mellum2, a 12B model engineered to solve the hardest parts of production AI: latency, throughput, and cost. Built from scratch and released under the Apache 2.0 license, Mellum2 offers […]
What we've seen helping teams run Reinforcement Learning at scale on Modal. Plus an open-source library to skip the scaffolding.
Last week at the KotlinConf 2026 keynote (watch the recording here), we announced Koog 1.0. Koog is JetBrains’ open-source framework for building AI agents in Kotlin and Java. It provides the core building blocks for agentic applications: tools, workflows, persistence, memory, observability, and integrations with existing JVM and Kotlin Multiplatform projects. We introduced Koog at […]
The Kubernetes project relies on transparency to empower cluster administrators and security researchers. One important way we do that is by publishing CVE records into the Common Vulnerabilities and Exposures database. As part of our ongoing effort to mature the official Kubernetes CVE Feed, we have identified some discrepancies. CVE records for a few older, unfixed issues incorrectly include a fixed version field. The Kubernetes Security Response Committee (SRC) will correct the affected CVE records on June 1, 2026. This may result in vulnerability scanners identifying these vulnerabilities in places where they were previously not detected. To help reduce confusion, this post provides a technical update on three vulnerabilities that were disclosed in previous years but remain unfixed: CVE-2020-8561, CVE-2020-8562, and CVE-2021-25740. Why we are updating these records now While these vulnerabilities have been public for several years, the recent work to generate official Open Source Vulnerabilities (OSV) files revealed that their corresponding CVE records did not accurately reflect their status. Specifically, some records suggested a fixed version existed, when in reality, these issues are architectural design trade-offs that cannot be fully remediated through code without breaking fundamental Kubernetes functionality. Correcting these records is vital for the community for: Automation Fidelity: Modern vulnerability scanners depend on precise version ranges. Inaccurate…
Gemini Flash 3.5, Mythos, open-closed balance, America's open-source surge, emerging power struggles and more.
We present HELM Arabic Enterprise, a leaderboard for transparent, reproducible evaluation of large language models on Arabic-language benchmarks designed around enterprise use cases. The leaderboard was developed in collaboration with Arabic.AI and builds on the HELM evaluation methodology: standardized prompting, fully logged requests and responses, and reproducible scoring through the open-source HELM framework.
Open Source AI is entering a new era, one shaped by self-improving AI Agents, recursive learning systems, and rapidly evolving AI Tools that blur the line between software and autonomous collaborators. In this episode, Daniel and Chris sit down with Nous Research co-founder and CTO Jeffrey Quesnelle to explore Hermes Agent. Along the way, they discuss models vs. harnesses, the changing role of developers, and one of the biggest questions facing the AI Future: what remains uniquely human as AI capabilities continue to accelerate? Featuring: Jeffrey Quesnelle – Website , LinkedIn Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Links: Nous Research Hermes Agent Sponsors: Framer: The enterprise-grade website builder that lets your team ship faster. Get 30% off at framer.com/practicalai Prediction Guard: A self-hosted AI control plane for running agents in high impact environments. predictionguard.com/practicalai Upcoming Events: Register for upcoming webinars here ! Midwest AI Summit 2026
From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs
Short note linking a talk on implementing LLM architectures from scratch and comparing new open-weight model implementations against references.
mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up to OS primitives), bounded space overhead, low internal fragmentation, and minimal contention by relying almost exclusively on atomic operations. The post mimalloc: A new, high-performance, scalable memory allocator for the modern era appeared first on Microsoft Research .
Data from 1,281 agent runs across 40+ large open source repos reveals five repeatable failure patterns in coding agents, and the infrastructure fixes for each.
In this fully connected episode, Dan and Chris break down one of the biggest questions in AI today: do open vs. closed models still matter? From the rise of physical AI and edge devices to the shifting landscape of open-source models like LLaMA, they explore whether the “model wars” are becoming irrelevant. The conversation then dives into a bigger transformation, the rise of agentic systems, workflows, and AI-driven infrastructure. Featuring: Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Upcoming Events: Register for upcoming webinars here ! Midwest AI Summit 2026
A learning-oriented workflow for understanding new open-weight model releases
Autonomous driving is not just a big tech or closed-source game, it's becoming accessible through open innovation and real-world deployment. Dan and Chris sit down with Harald Schäfer, CTO at Comma AI, to explore how OpenPilot is bringing self-driving to everyday vehicles using open source AI. We dive into the intersection of machine learning, robotics, and simulation, including how world models are enabling training at scale and shaping the future of autonomy. Featuring: Harald Schäfer – LinkedIn Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Links: Comma Upcoming Events: Register for upcoming webinars here !
Another dance around fears of open-source.
In this fully connected episode, Dan and Chris break down the Anthropic Claude Code leak, what went wrong and what it reveals about agentic systems, AI architecture, and AI safety. They also explore how the open source community is responding and why this moment could reshape how AI systems are built and secured. Featuring: Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Daniel Whitenack – Website , GitHub , X Upcoming Events: Register for upcoming webinars here !
AI is rapidly transforming how software is built, shifting economic incentives from open source code and collaboration toward on-demand, personalized development through agentic coding a.k.a. vibe coding. In this episode, Chris speaks with Miklós Koren of Central European University about how AI is reshaping open source and the software industry. They explore the economics of incentives, evolving collaboration patterns, and what this shift means for software development, the future of AI, and its broader impact on the technology sector. Featuring: Miklós Koren – LinkedIn Chris Benson – Website , LinkedIn , Bluesky , GitHub , X Links: Vibe Coding Kills Open Source The Directions of Technical Change The Tailwind story Upcoming Events: Register for upcoming webinars here !
We are excited to announce our transition to a community-driven open source project. While making this change, we reaffirm our deep commitment to remaining active members of the community.
MLPerf Inference v6.0 expands open-weight LLM coverage with a new GPT-OSS 120B benchmark and a latency-constrained interactive scenario for DeepSeek-R1 — the first MLPerf standard for speculative decoding. The post A new GPT-OSS benchmark and DeepSeek R1 updates for latency-optimized reasoning appeared first on MLCommons .
Voxtral TTS: A frontier, open-weights text-to-speech model that’s fast, instantly adaptable, and produces lifelike speech for voice agents.
Summary: We find that roughly half of test-passing SWE-bench Verified PRs written by mid-2024 to mid/late-2025 agents would not be merged into main by repo maintainers, even after adjusting for noise in maintainer merge decisions. Since the agents are not given a chance to iterate on their solution in response to feedback the way a human developer would, we do not claim that this represents a fundamental capability limitation. Rather, our results indicate that a naive interpretation of benchmark scores may lead one to overestimate how useful agents are without more elicitation or human feedback. Introduction It is often unclear how to translate benchmark scores into real-world usefulness. For example, if a model’s SWE-bench Verified score is 60%, does that mean it can resolve 60% of real-world open-source issues? One reason to doubt this is that benchmarks are clean and verifiable in ways the real world is not. To study this quantitatively, we take SWE-bench Verified and zoom in on one such difference — it uses an automated grader rather than the real-world standard of maintainer review. To study how agent success on benchmark tasks relates to real-world usefulness, we had 4 active maintainers from 3 SWE-bench Verified repositories review 296 AI-generated pull requests (PRs). We had maintainers (hypothetically) accept or request changes for patches as well as provide the core reason they were requesting changes: core functionality failure, patch breaks other code or code qua…
First results in a project developing next-generation open-source language models to advance European AI capabilities.
A Round Up And Comparison of 10 Open-Weight LLM Releases in Spring 2026
METR previously published a paper which found the use of AI tools caused a 20% slowdown in completing tasks among experienced open-source developers, using data from February to June 2025. To understand how AI is impacting developer productivity over time, we started a new experiment in August 2025 with a larger pool of developers using the latest AI tools. Unfortunately, given participant feedback and surveys, we believe that the data from our new experiment gives us an unreliable signal of the current productivity effect of AI tools. The primary reason is that we have observed a significant increase in developers choosing not to participate in the study because they do not wish to work without AI, which likely biases downwards our estimate of AI-assisted speedup. We additionally believe there have been selection effects due to a lower pay rate (we reduced the pay from $150/hr to $50/hr), and that our measurements of time-spent on each task are unreliable for the fraction of developers who use multiple AI agents concurrently. Based on conversations with study participants, we believe it is likely that developers are more sped up from AI tools now — in early 2026 — compared to our estimates from early 2025. However, because of the selection effects in our experiment, our data is only very weak evidence for the size of this increase. Our raw results show some evidence for speedup. Our early 2025 study found the use of AI causes tasks to take 19% longer, with a confidence inte…
This guest blog post is from Arek Borucki, Machine Learning Platform & Data Engineer for Hugging Face - a collaboration platform for the machine learning community. The Hugging Face Hub works as a central place where anyone can share, explore, discover, and experiment with open-source ML. HF empowers the next generation of machine learning engineers, scientists, and end users to learn, collaborate and share their work to build an open and ethical AI future together. With the fast-growing community, some of the most used open-source ML libraries and tools, and a talented science team exploring the edge of tech, Hugging Face is at the heart of the AI revolution. Traditional movie search relies on filtering by genre, actor, or title. But what if you could search by how you feel? Imagine typing: "something uplifting after a rough day at work" "a movie that will make me cry" "I need adrenaline, can't sleep anyway" "something to watch with grandma who hates violence" This is mood-based semantic search: matching your emotional state to movie plot descriptions using AI embeddings. In this tutorial, you will build a mood-based movie recommendation engine using three powerful technologies: voyage-4-nano (a state-of-the-art open-source embedding model), Hugging Face (for model and dataset hosting), and MongoDB Atlas Vector Search (for storing and querying embeddings at scale). Why mood-based search? Genre tags are coarse. A "drama" can be heartwarming or devastating. A "comedy" can be…
Peter Steinberger is the creator of OpenClaw, an open-source AI agent framework that’s the fastest-growing project in GitHub history. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep491-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript: https://lexfridman.com/peter-steinberger-transcript CONTACT LEX: Feedback – give feedback to Lex: https://lexfridman.com/survey AMA – submit questions, videos or call-in: https://lexfridman.com/ama Hiring – join our team: https://lexfridman.com/hiring Other – other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS: Peter’s X: https://x.com/steipete Peter’s GitHub: https://github.com/steipete Peter’s Website: https://steipete.com Peter’s LinkedIn: https://www.linkedin.com/in/steipete OpenClaw Website: https://openclaw.ai OpenClaw GitHub: https://github.com/openclaw/openclaw OpenClaw Discord: https://discord.gg/openclaw
Google adds Gemini AI-powered ‘auto browse’ to Chrome, Users flock to open source Moltbot for always-on AI, Qwen3-Max-Thinking debuts, and more!
China’s Moonshot releases a new open source model Kimi K2.5 and a coding agent, Google Brings Genie 3’s Interactive World-Building Prototype to AI Ultra Subscribers, and more!
Today at MongoDB.local San Francisco, we announced capabilities that collapse the distance between AI prototype and production. Building AI applications means solving real problems: keeping conversational context clean and queryable, retrieving the right information from thousands of past interactions, connecting AI agents to your data without custom plumbing. These aren't theoretical challenges, they're the friction points that slow teams down every day. The AI era demands more from your data platform. MongoDB gives you everything you need to build quickly. Voyage AI: the best gets better Embedding models can make or break AI search experiences. We're proud that voyage-3-large has been the world's top-performing embedding model on Hugging Face's RTEB benchmark since its inception. But we didn’t rest on our laurels. There’s a new model at the top of the charts. Today, we're pleased to announce that the Voyage 4 model family is now generally available. The best just got better. The voyage-4 series models operate in a shared embedding space, allowing for cross-model compatibility and unprecedented flexibility to optimize for accuracy, speed, or cost. This release also includes voyage-4-nano, our first open-weight model available on HuggingFace, perfect for local development. Additionally, we're launching the new voyage-multimodal-3.5 model, which has been specifically trained to support video content alongside text and images. For developers building multimodal AI applications…
Understanding How DeepSeek's Flagship Open-Weight Models Evolved
A Detailed Look at One of the Leading Open-Source LLMs
Announcing Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
DeepSeek AI releases DeepSeek-Prover-V2, an open-source LLM for Lean 4 theorem proving. It uses recursive proof search with DeepSeek-V3 for training data and reinforcement learning, achieving top results on MiniF2F. The post DeepSeek Unveils DeepSeek-Prover-V2: Advancing Neural Theorem Proving with Recursive Proof Search and a New Benchmark first appeared on Synced .
Zhipu.AI open-sources faster GLM models (8x speedup), launches Z.ai, aiming for global expansion, potentially ahead of IPO. The post Zhipu.AI’s Open-Source Power Play: Blazing-Fast GLM Models & Global Expansion Ahead of Potential IPO first appeared on Synced .
In the OpenEuroLLM project, Europe's leading AI companies and research institutions combine their expertise to develop next-generation open-source language models that advance European AI capabilities.
Open source LLMs are becoming very powerful, but pay attention to how you (or your provider) are serving the model. It can affect code editing skill.
The Holistic Evaluation of Language Models (HELM) framework is an open source framework for reproducible and transparent benchmarking of language models that is widely adopted by academia and industry. To meet HELM users’ needs for more powerful benchmarking features, we are proud to announce our collaboration with Unitxt, an open-source community platform developed by IBM Research for data preprocessing and benchmark customization. The integration of Unitxt into HELM gives HELM users access to the vast Unitxt catalog of benchmarks, and allows users to run sharable and customizable evaluation pipelines with greater ease.
Building and evaluating an open-source pipeline for auto-interpretability
[Hacker News discussion, LinkedIn discussion, Twitter thread] Update (Feb 2026): The full list of open source AI repos is hosted at Good AI List, updated daily. It’s ballooned to 15K repos, and you can submit missing repos. You can also find some of them on my cool-llm-repos list on GitHub. Four years ago, I did an analysis of the open source ML ecosystem. Since then, the landscape has changed, so I revisited the topic. This time, I focused exclusively on the stack around foundation models. Data I searched GitHub using the keywords gpt, llm, and generative ai. If AI feels so overwhelming right now, it’s because it is. There are 118K results for gpt alone. To make my life easier, I limited my search to the repos with at least 500 stars. There were 590 results for llm, 531 for gpt, and 38 for generative ai. I also occasionally checked GitHub trending and social media for new repos. After MANY hours, I found 896 repos. Of these, 51 are tutorials (e.g. dair-ai/Prompt-Engineering-Guide) and aggregated lists (e.g. f/awesome-chatgpt-prompts). While these tutorials and lists are helpful, I’m more interested in software. I still include them in the final list, but the analysis is done with the 845 software repositories. It was a painful but rewarding process. It gave me a much better understanding of what people are working on, how incredibly collaborative the open source community is, and just how much China’s open source ecosystem diverges from the Western one. The Ne…
I have the following time-series data with two value columns (t: time, v1: time-series values 1, v2: time-series values 2):

 t | v1 | v2
---+----+----
 1 |  1 |  0
 2 |  2 |  2
 3 |  3 |  4
 4 |  3 |  6
 5 |  3 |  6
 6 |  4 |  6
 7 |  5 |  8
(7 rows)

I am trying to discover (or approximate) the correlation between $v1$ and $v2$, and use that approximation for next-step predictions. Note that the most obvious correlation is $v2(t) = 2 \cdot v1(t-1)$. My question is: which algorithms can be employed for such approximations, and are there any open-source implementations of those algorithms for SQL/Python/JavaScript?
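One simple way to approach the question is to scan a few candidate lags and, for each, fit a scale factor by least squares, keeping the lag with the smallest error. The sketch below does this in plain Python on the seven rows from the question. It assumes a relation of the form v2(t) ≈ a · v1(t − lag) with no intercept, which is only one possible model. The helper name `fit_lagged_scale` and the lag range 0 to 3 are illustrative choices, not part of any library.

```python
def fit_lagged_scale(x, y, lag):
    """Least-squares fit of y[t] ~ a * x[t - lag] (no intercept).

    Returns the slope a and the mean squared error of the fit.
    """
    pairs = [(x[i - lag], y[i]) for i in range(lag, len(y))]
    sxx = sum(xv * xv for xv, _ in pairs)
    sxy = sum(xv * yv for xv, yv in pairs)
    a = sxy / sxx
    mse = sum((yv - a * xv) ** 2 for xv, yv in pairs) / len(pairs)
    return a, mse


# Data from the question (t = 1..7).
v1 = [1, 2, 3, 3, 3, 4, 5]
v2 = [0, 2, 4, 6, 6, 6, 8]

# Try lags 0..3 and keep the one with the lowest error.
results = {lag: fit_lagged_scale(v1, v2, lag) for lag in range(4)}
best_lag = min(results, key=lambda lag: results[lag][1])
scale, mse = results[best_lag]

# With lag >= 1 the next v2 depends only on v1 values already observed.
next_v2 = scale * v1[-best_lag] if best_lag > 0 else None

print(f"best lag = {best_lag}, scale = {scale}, mse = {mse}")
print(f"predicted v2 at t = 8: {next_v2}")
```

On this data the scan recovers lag 1 with scale 2, matching the relation stated in the question. For noisier series, an intercept term or several lagged inputs at once would be the natural next step.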
Introduction In 2019, Stanford entered the Alexa Prize Socialbot Grand Challenge 3 for the first time, with its bot Chirpy Cardinal, which went on to win 2nd place in the competition. In our previous post, we discussed the technical structure of our socialbot and how developers can use our open-source code to develop their own. In this post we share further research conducted while developing Chirpy Cardinal to discover common pain points that users encounter when interacting with socialbots, and strategies for addressing them. The Alexa Prize is a unique research setting, as it allows researchers to study how users interact with a bot when doing so solely for their own motivations. During the competition, US-based Alexa users can say the phrase “let’s chat” to speak in English to an anonymous and randomly-selected competing bot. They are free to end the conversation at any time. Since Alexa Prize socialbots are intended to create as natural an experience as possible, they should be capable of long, open-domain social conversations with high coverage of topics. We observed that Chirpy users were interested in many different subjects, from current events (e.g., the coronavirus) to pop culture (e.g., the movie Frozen 2) to personal interests (e.g., their pets). Chirpy achieves its coverage of these diverse topics by using a modular design that combines both neural generation and scripted dialogue, as described in our previous post. We used this setting to study three quest…
I find blockchain fascinating because it extends open source software development to open source + state. This seems to be a genuine/exciting innovation in computing paradigms; We don’t just get to share code, we get to share a running computer, and anyone anywhere can use it in an open and permissionless manner. The seeds of this revolution arguably began with Bitcoin, so I became curious to drill into it in some detail to get an intuitive understanding of how it works. And in the spirit of “what I cannot create I do not understand”, what better way to do this than implement it from scratch? We are going to create, digitally sign, and broadcast a Bitcoin transaction in pure Python, from scratch, and with zero dependencies. In the process we’re going to learn quite a bit about how Bitcoin represents value. Let’s get it. (btw if the visual format of this post annoys you, see the jupyter notebook version, which has identical content). Step 1: generating a crypto identity First we want to generate a brand new cryptographic identity, which is just a private, public keypair. Bitcoin uses Elliptic Curve Cryptography instead of something more common like RSA to secure the transactions. I am not going to do a full introduction to ECC here because others have done a significantly better job, e.g. I found Andrea Corbellini’s blog post series to be an exceptional resource. Here we are just going to write the code but to understand why it works mathematically you’d need to go through th…
By visualizing the hidden state between a model's layers, we can get some clues as to the model's "thought process". Figure: Finding the words to say. After a language model generates a sentence, we can visualize how the model arrived at each word (column). Each row is a model layer. The value and color indicate the ranking of the output token at that layer. The darker the color, the higher the ranking. Layer 0 is at the top. Layer 47 is at the bottom. Model: GPT2-XL. Part 2: Continuing the pursuit of making Transformer language models more transparent, this article showcases a collection of visualizations to uncover the mechanics of language generation inside a pre-trained language model. These visualizations are all created using Ecco, the open-source package we're releasing. In the first part of this series, Interfaces for Explaining Transformer Language Models, we showcased interactive interfaces for input saliency and neuron activations. In this article, we will focus on the hidden state as it evolves from one model layer to the next. By looking at the hidden states produced by every transformer decoder block, we aim to glean information about how a language model arrived at a specific output token. This method is explored by Voita et al. Nostalgebraist presents compelling visual treatments showcasing the evolution of token rankings, logit scores, and softmax probabilities for the evolving hidden state through the various layers of the model.
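The "ranking of the output token at each layer" can be illustrated with a toy calculation. In the sketch below, each layer is represented by a list of scores over a five-token vocabulary. In a real model these scores would come from projecting that layer's hidden state onto the vocabulary, but here the numbers are made up for illustration, and the code does not use Ecco's API.

```python
def token_rank(scores, token_id):
    """Rank of token_id among the scores (1 = highest-scoring token)."""
    target = scores[token_id]
    return 1 + sum(1 for s in scores if s > target)


# Hypothetical per-layer scores over a 5-token vocabulary (toy values).
layer_scores = [
    [0.1, 0.9, 0.3, 0.2, 0.5],  # layer 0
    [0.2, 0.8, 0.7, 0.1, 0.4],  # layer 1
    [0.1, 0.3, 0.9, 0.2, 0.4],  # layer 2 (final layer)
]

# The output token is the highest-scoring token at the final layer.
final = layer_scores[-1]
output_token = max(range(len(final)), key=final.__getitem__)

# Rank of that same token at every layer, top to bottom.
ranks = [token_rank(scores, output_token) for scores in layer_scores]

for layer, rank in enumerate(ranks):
    print(f"layer {layer}: output token {output_token} has rank {rank}")
```

Each entry of `ranks` corresponds to one cell in a column of the figure described above: the token starts at a lower rank in early layers and reaches rank 1 by the final layer.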
Pharo masterclass: Inria's open source computing expertise reaches an international audience decarpig Wed, 10/21/2020 - 14:40 Computer science researchers at Inria Lille are involved in the strategy for disseminating free software to digital companies. The Pharo programming language drew the attention of a broad scientific community during a masterclass led by Stéphane Ducasse, an internationally recognized computer science expert, and organized by Inria Academy and Inria in Chile. © Inria / Photo C. Morel Communicating and exchanging information or objects faster, producing and distributing energy more efficiently, offering new services targeted to each person's needs: digital technology is disrupting many sectors of activity and offering them promising prospects for innovation and development. Hidden behind the algorithms and lines of code that computing and digital technology put to work, software and programming languages are the essential building blocks of this digital transformation. This is the case for Pharo, presented last July to more than 80 participants at a masterclass jointly organized by Inria Academy and Inria Chile (see box). Led by Stéphane Ducasse, research director in computer science and head of the Rmod team, it highlighted the tool's qualities, illustrated by concrete uses. Pharo, an evolving and versatile language…
[…] One Simple Chart: how open source projects interact with users […]
MIT simulator lets users design wide range of functional soft robots. Adam Conner-Simons, MIT CSAIL, June 10, 2019. To get robots to do things, computer scientists often use systems called physics simulators that reflect how a robot’s actions will impact the real world. These simulators don’t work particularly well, however, when it comes to soft robots made of flexible, deformable materials. This is because the underlying physical laws of deformable objects are much more complicated, requiring a lot more computational power to simulate. But in a new paper, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a new simulator made specifically for soft robots, and has shown that it can realistically simulate an eclectic mix of robotic designs, from a crawling robot to a four-legged running robot. The simulator doesn’t just efficiently evaluate robot designs, but also provides feedback on how designs can be improved. (The system’s feedback is computed based on something called “the chain rule,” and so the team has dubbed the simulator “ChainQueen”.) The team developed a high-performance GPU implementation of the simulator that they hope to eventually make open-source. “We believe this system has the potential to dramatically accelerate the development of soft robots,” says PhD student Andrew Spielberg, one of the co-authors of the…
Khanh Nguyen, Benjamin Plaut, Tu Trinh, and Mohamad Danesh introduce a fundamental coordination problem called Learning to Yield and Request Control (YRC), where the objective is to learn a strategy that determines when to act autonomously and when to seek expert assistance. They build an open-source benchmark featuring diverse domains, propose a novel validation approach, and investigate the performance of various learning methods across diverse environments, yielding insights that can guide future research.