AI/ML News & Innovations Hub

AI/ML news, top picks, and generated innovation digests.

422 Sources
5100 News Items
8 Top Picks
43 Blogs
Last Run: running

Hugging Face

16 articles tagged with this keyword, sorted by most recent first.

MarkTechPost 2026-06-28 07:02 UTC Score 59.0 AI-032-20260628-ai-specialis-e4ec4fcf

Building a Stable Fable 5 Traces Workflow in Colab: Parsing Tool Calls, Auditing Data, and Training Baselines

In this tutorial, we build a stable workflow around the Fable 5 Traces dataset from Hugging Face. We avoid fragile dependencies and manually parse the merged JSONL file to keep Colab reliable. We inspect repository files, normalize tool calls, audit structure, redact secrets, and visualize key distributions. We also export safe no-CoT chat datasets and train pure-Python Naive Bayes baselines on the traces.
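The manual JSONL parsing, tool-call normalization, and secret redaction described above could look roughly like the sketch below. It is not the tutorial's code: the field names (`name`, `tool`, `arguments`, `args`) and the secret patterns are assumptions made for illustration, since the summary does not give the dataset schema or the redaction rules.

```python
import json
import re

# Hypothetical secret patterns; the tutorial's actual rules are not stated.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),  # API-key-like strings
    re.compile(r"hf_[A-Za-z0-9]{16,}"),  # access-token-like strings
]


def redact(text):
    """Replace anything matching a secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def parse_jsonl(lines):
    """Parse JSONL line by line, skipping blanks and counting malformed lines."""
    records, bad = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1
    return records, bad


def normalize_tool_call(call):
    """Map a raw tool call to {"name", "arguments"}; input keys are assumed."""
    name = call.get("name") or call.get("tool") or "unknown"
    args = call.get("arguments", call.get("args", {}))
    if isinstance(args, str):
        # Arguments are sometimes stored as a JSON string; decode when possible.
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            args = {"raw": args}
    return {"name": name, "arguments": args}


sample = ['{"tool": "bash", "args": "{\\"cmd\\": \\"ls\\"}"}', "", "not json"]
records, bad = parse_jsonl(sample)
calls = [normalize_tool_call(r) for r in records]
```

Counting malformed lines instead of raising keeps a long notebook run alive while still surfacing how much of the file failed to parse.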

MarkTechPost 2026-06-27 00:02 UTC Score 60.0 AI-032-20260627-ai-specialis-ad0ae3f2

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics

In this tutorial, we work with NVIDIA's Open-SWE-Traces dataset to study agentic software-engineering trajectories for fine-tuning. We stream the data directly from Hugging Face, so we can process it efficiently in Google Colab without downloading everything locally. We normalize multi-turn agent conversations, parse final code patches, and build an analysis DataFrame covering trajectory length, tool usage, patch size, language distribution, and resolution outcomes. We then curate a supervised fine-tuning subset using success labels, token limits, language filters, and patch availability.
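The curation step (success labels, token limits, language filters, patch availability) can be sketched as a generator that filters rows one at a time, which also suits a streamed dataset. The row keys `resolved`, `language`, `patch`, and `messages` are hypothetical, and the whitespace token count is a stand-in for a real tokenizer; neither is taken from the tutorial.

```python
def approx_tokens(text):
    """Rough token estimate via whitespace split; a real tokenizer will differ."""
    return len(text.split())


def curate(rows, max_tokens=4096, languages=frozenset({"python"})):
    """Yield only rows that pass all four filters. Key names are assumed."""
    for row in rows:
        if not row.get("resolved"):  # success label
            continue
        if row.get("language") not in languages:  # language filter
            continue
        if not (row.get("patch") or "").strip():  # patch availability
            continue
        total = sum(
            approx_tokens(m.get("content") or "")
            for m in row.get("messages", [])
        )
        if total > max_tokens:  # token budget
            continue
        yield row


rows = [
    {"id": 1, "resolved": True, "language": "python", "patch": "diff --git a b",
     "messages": [{"content": "fix the bug"}]},
    {"id": 2, "resolved": False, "language": "python", "patch": "diff",
     "messages": []},
    {"id": 3, "resolved": True, "language": "go", "patch": "diff",
     "messages": []},
    {"id": 4, "resolved": True, "language": "python", "patch": "  ",
     "messages": []},
    {"id": 5, "resolved": True, "language": "python", "patch": "diff",
     "messages": [{"content": "word " * 10}]},
]
kept = [r["id"] for r in curate(rows, max_tokens=5)]
```

Because `curate` never materializes the full input, it can wrap any iterable of dicts, including a streaming iterator.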

MERICS China AI 2026-06-03 08:01 UTC Score 70.0 USR-0207-20260603-research-aca-5119adc6

China’s AI competition strategy: Wide dispersion, cheap tokens

Linda_Heyer, Wed, 06/03/2026 - 10:01. Comment, Jun 03, 2026, 2 min read.

China’s flagship AI company DeepSeek released its V4 model in April, with a promotional price that puts it at a mere fraction of the cost of its North American competitors’ models. This reflects a wider trend in China’s AI sector: Instead of competing directly with companies like OpenAI, Anthropic and Google, which offer state-of-the-art services at a premium, Chinese companies are pursuing a strategy of wide diffusion and cheap tokens to gain market share across the world. For Europe, this may pose the risk of quickly becoming dependent on Chinese models as the basis for AI development, and of European talent being funneled into improving Chinese systems.

Many Chinese AI companies have followed the DeepSeek model. They are building models that are decent, but not cutting-edge, in performance, and are instead focused on high compute efficiency that lowers costs for users. They have also made their models available via open-source platforms, meaning anyone can use, fine-tune and host them for free, unlike the proprietary models of current Western leaders. Downloads of Chinese models on open-source platform Hugging Face have surpassed those of US models since late 2025. Of the top ten open-weight models by performance, the top seven a…

Two Minute Papers 2026-05-22 00:47 UTC Score 36.0 AI-139-20260522-podcasts-and-98bdc664

DeepSeek’s New AI Is A Game Changer

📝 The paper is available here:
https://github.com/ailuntx/Thinking-with-Visual-Primitives
https://huggingface.co/datasets/NodeLinker/deepseek-ai-Thinking-with-Visual-Primitives-deleted-repo/blob/main/Thinking_with_Visual_Primitives.pdf

My research: https://cg.tuwien.ac.at/~zsolnai/

Two Minute Papers 2026-05-13 16:07 UTC Score 47.0 AI-139-20260513-podcasts-and-156232e5

NVIDIA New AI Is An Efficiency Monster

📝 The paper is available here:
https://arxiv.org/abs/2604.24954
https://developer.nvidia.com/blog/nvidia-nemotron-3-nano-omni-powers-multimodal-agent-reasoning-in-a-single-efficient-open-model/
https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

My research: https://cg.tuwien.ac.at/~zsolnai/

MongoDB AI Blog 2026-02-17 15:30 UTC Score 61.0 USR-0070-20260217-ai-specialis-685171d4

Building a Movie Recommendation Engine with Hugging Face and Voyage AI

This guest blog post is from Arek Borucki, Machine Learning Platform & Data Engineer for Hugging Face - a collaboration platform for the machine learning community. The Hugging Face Hub works as a central place where anyone can share, explore, discover, and experiment with open-source ML. HF empowers the next generation of machine learning engineers, scientists, and end users to learn, collaborate and share their work to build an open and ethical AI future together. With the fast-growing community, some of the most used open-source ML libraries and tools, and a talented science team exploring the edge of tech, Hugging Face is at the heart of the AI revolution.

Traditional movie search relies on filtering by genre, actor, or title. But what if you could search by how you feel? Imagine typing:

"something uplifting after a rough day at work"
"a movie that will make me cry"
"I need adrenaline, can't sleep anyway"
"something to watch with grandma who hates violence"

This is mood-based semantic search: matching your emotional state to movie plot descriptions using AI embeddings. In this tutorial, you will build a mood-based movie recommendation engine using three powerful technologies: voyage-4-nano (a state-of-the-art open-source embedding model), Hugging Face (for model and dataset hosting), and MongoDB Atlas Vector Search (for storing and querying embeddings at scale).

Why mood-based search? Genre tags are coarse. A "drama" can be heartwarming or devastating. A "comedy" can be…
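The core idea of mood-based semantic search, ranking plot embeddings by similarity to a query embedding, can be illustrated with a small pure-Python sketch. The three-dimensional vectors and movie titles below are invented toy data, not output from voyage-4-nano, and the brute-force ranking stands in for what a vector index such as Atlas Vector Search would do at scale.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, catalog, k=2):
    """Return the k titles whose vectors are most similar to the query."""
    scored = [(cosine(query_vec, vec), title) for title, vec in catalog.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


# Toy embeddings (hypothetical): axes loosely mean uplifting, tense, bleak.
catalog = {
    "Feel-Good Road Trip": [0.9, 0.1, 0.0],
    "Heist Thriller": [0.1, 0.9, 0.2],
    "Bleak War Drama": [0.0, 0.2, 0.9],
}

# Stand-in vector for a query like "something uplifting after a rough day at work".
query = [0.8, 0.2, 0.1]
top = rank(query, catalog, k=2)
```

In a real pipeline the query and plot vectors would come from the same embedding model, so that distances between them are meaningful.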

MongoDB AI Blog 2026-01-15 20:15 UTC Score 82.0 USR-0070-20260115-ai-specialis-0045c0cd Top pick

MongoDB.local San Francisco 2026: Ship Production AI, Faster

Today at MongoDB.local San Francisco, we announced capabilities that collapse the distance between AI prototype and production. Building AI applications means solving real problems: keeping conversational context clean and queryable, retrieving the right information from thousands of past interactions, connecting AI agents to your data without custom plumbing. These aren't theoretical challenges; they're the friction points that slow teams down every day. The AI era demands more from your data platform. MongoDB gives you everything you need to build quickly.

Voyage AI: the best gets better

Embedding models can make or break AI search experiences. We're proud that voyage-3-large has been the world's top-performing embedding model on Hugging Face's RTEB benchmark since its inception. But we didn’t rest on our laurels. There’s a new model at the top of the charts. Today, we're pleased to announce that the Voyage 4 model family is now generally available. The best just got better.

The voyage-4 series models operate in a shared embedding space, allowing for cross-model compatibility and unprecedented flexibility to optimize for accuracy, speed, or cost. This release also includes voyage-4-nano, our first open-weight model available on Hugging Face, perfect for local development. Additionally, we're launching the new voyage-multimodal-3.5 model, which has been specifically trained to support video content alongside text and images. For developers building multimodal AI applications…