AI companions in games have long been constrained by fixed dialogue. PUBG Ally is a different kind of system. Built by KRAFTON for PUBG: BATTLEGROUNDS, this AI teammate is powered by NVIDIA ACE and its suite of efficient models and tooling. 

PUBG Ally uses automatic speech recognition, a 2B-parameter small language model, and text-to-speech to understand player voice, reason through game context and dynamic events, and respond in real time. KRAFTON is calling the result a co-playable character (CPC): a new category distinct from non-playable characters (NPCs), designed to cooperate, adapt, and remember across sessions. 

PUBG Ally entered public beta on June 17 and will be available in PUBG: BATTLEGROUNDS Arcade Mode until June 30.

We sat down with Hyunseung Kim, research lead, and Yujeong Son, project manager, at KRAFTON to go deep on the architecture, the latency challenges, the multilingual support, and what they learned building and tuning an AI system whose outputs aren’t deterministic.


Video 1: PUBG Ally in action, as shown at the Seoul GeForce Gamer Festival

KRAFTON has framed PUBG Ally as a co-playable character, distinct from a traditional NPC. What does that distinction mean in practice, and why was PUBG: BATTLEGROUNDS the right title to introduce it?

Powered by NVIDIA ACE, Ally understands player intent through natural voice interaction, interprets the current gameplay situation, and responds dynamically in real time. Ally can cooperate with players, adapt its behavior based on combat or looting situations, and communicate naturally during gameplay. 

The goal was to create an AI companion that feels closer to a real squadmate than a conventional bot.

PUBG: BATTLEGROUNDS was the ideal title to introduce this concept because PUBG’s gameplay is heavily driven by teamwork, communication, and unpredictable emergent situations. In a battle royale environment, players constantly make tactical decisions under pressure: sharing items, coordinating movement, reviving teammates, or reacting to sudden combat encounters. These dynamic interactions give a CPC a strong environment in which to explore meaningful cooperation and contextual understanding in ways that traditional NPC systems can’t.


Video 2: Watch the full replay of our GDC 2026 session, where KRAFTON deep learning engineers and NVIDIA experts dive deep into PUBG Ally, a real-time AI companion that talks, jokes, reacts to player commands, and brings extra fun and personality to every match

Walk us through the technical architecture of PUBG Ally. The interactive pipeline combines automatic speech recognition, a small language model, and text-to-speech through NVIDIA ACE. How is that pipeline structured, and how do those components work together in a live match?

At a high level, the pipeline turns two inputs, the player’s voice and the live game state, into two outputs: Ally’s speech and its in-game actions. NVIDIA ACE and custom AI models run on-device: NVIDIA automatic speech recognition (ASR), an NVIDIA small language model (SLM), and a custom in-house text-to-speech (TTS) model. Around them sit our own agent harness and the game-side integration.

When the player speaks, ASR transcribes the utterance. In parallel, the game engine exposes the live match state, which the agent reads through observation tools as plain-text descriptions. So on each turn the SLM works from a transcribed request together with a textual view of what is happening in the match.

The SLM is the decision core. It runs an event-driven loop, triggered either by the player speaking or by a game event, observes what it needs, and produces two kinds of output. Speech is passed to TTS and played back as Ally’s voice; game actions are handed to the game side, where a behavior tree executes them and handles fast, reactive gameplay that shouldn’t wait on language reasoning. All of this runs locally and continuously through the match, so the player can talk to Ally naturally while Ally also speaks and acts on its own when the situation calls for it.
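The turn structure described above can be sketched in a few lines of Python. This is a simplified illustration, not KRAFTON’s harness: `Event`, `Decision`, `run_turn`, and the `transcribe`, `observe`, `decide`, `speak`, and `act` callables are hypothetical stand-ins for ASR, the observation tools, the SLM, TTS, and the behavior-tree hand-off. It also collapses the agent’s selective tool use into a single `observe` call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Event:
    """A trigger for one agent turn (hypothetical structure)."""
    kind: str      # "player_speech" or "game_event"
    payload: str   # audio placeholder for speech, text description for game events


@dataclass
class Decision:
    """What the language model decided to say and do this turn."""
    speech: Optional[str] = None
    actions: List[str] = field(default_factory=list)


def run_turn(
    event: Event,
    transcribe: Callable[[str], str],        # stand-in for ASR
    observe: Callable[[], str],              # stand-in for the observation tools
    decide: Callable[[str, str], Decision],  # stand-in for the SLM
    speak: Callable[[str], None],            # stand-in for TTS playback
    act: Callable[[str], None],              # hand-off to the game-side behavior tree
) -> Decision:
    # Player speech is transcribed; game events already arrive as text.
    if event.kind == "player_speech":
        request = transcribe(event.payload)
    else:
        request = event.payload
    # The model reasons over the request plus a plain-text view of the match.
    decision = decide(request, observe())
    if decision.speech:
        speak(decision.speech)
    for action in decision.actions:
        act(action)
    return decision
```

In this sketch, a game event such as a teammate being knocked down enters through the same function with `kind="game_event"`, which is how an agent built this way can speak and act without being addressed first.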

Why a small language model rather than a larger LLM? What did running Mistral-NeMo-Minitron-2B on-device give you in terms of latency, hardware reach, and player experience?

For PUBG Ally, our priority was to create an AI teammate that could respond at the speed of gameplay. In real-time games, even a small delay can change how natural or useful an interaction feels. When we tested cloud-hosted LLM approaches, the combination of network latency and model inference latency often made responses feel too slow for live squad communication.

Running an NVIDIA ACE Mistral-NeMo-Minitron-2B SLM locally removed the network round-trip and gave us far more predictable response times during gameplay. This mattered in user testing: players consistently valued the SLM’s responsiveness and sense of presence. PUBG Ally felt immediately available in the moment, which in this context matters more than broad general-purpose reasoning alone.

On-device deployment also came with a hard constraint. PUBG is a graphically rich title that already consumes a significant share of GPU memory, so the VRAM available to an AI companion is limited. At 2B parameters, Mistral-NeMo-Minitron-2B was already a compact starting point with strong instruction-following quality, and we further quantized it for client-side deployment. The quantized model fits within the VRAM headroom that remains after PUBG, allowing PUBG Ally to run on GPUs with as little as 8GB of total VRAM.  
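As rough arithmetic for why quantization matters here, the memory for model weights alone scales with parameter count times bits per parameter. The sketch below uses the nominal 2B figure from the interview; the 16-, 8-, and 4-bit widths are illustrative assumptions, since the interview does not say which scheme KRAFTON used, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GiB (excludes KV cache and activations)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Nominal parameter count from the interview; bit widths are illustrative assumptions.
NOMINAL_PARAMS = 2e9
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(NOMINAL_PARAMS, bits):.2f} GiB")
```

This prints about 3.73, 1.86, and 0.93 GiB respectively, so each halving of precision halves the weight footprint the game has to share its VRAM with.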

Latency is everything in a battle royale. What specific techniques did your team apply to keep the round trip from voice input to spoken response fast enough for real-time combat, and where did you spend the most engineering effort?

Latency was one of the biggest design constraints for PUBG Ally: both conversation and in-game actions have to happen in real time. Because the experience runs on-device, we focused on reducing the small language model’s inference cost and on separating fast actions from language reasoning.

On the model side, we designed the prompt structure to make the best use of the KV cache. Stable instructions and gameplay context are kept as consistent as possible across turns, while only the most relevant real-time information is updated each turn. This reduces redundant computation and makes on-device response times more predictable.
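The effect of that ordering can be shown with a toy model of prefix reuse. Inference runtimes that cache attention keys and values can skip recomputing the longest prefix a new request shares with the previous one. The sketch below uses whitespace-split words as stand-in tokens, and `build_prompt` and `reusable_prefix_len` are hypothetical names; it counts how many tokens that shared prefix covers and is not ACE’s caching implementation.

```python
from typing import List


def build_prompt(stable: List[str], dynamic: List[str]) -> List[str]:
    # Stable instructions and context go first so they form an identical prefix each turn.
    return stable + dynamic


def reusable_prefix_len(prev: List[str], curr: List[str]) -> int:
    """Count leading tokens shared with the previous prompt (their KV entries are reusable)."""
    shared = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        shared += 1
    return shared


# Toy "tokens": whitespace-split words stand in for a real tokenizer.
stable = "You are Ally , a PUBG teammate on Sanhok . Use tools to observe .".split()
turn1 = build_prompt(stable, "HP 78% , 24/30 ammo".split())
turn2 = build_prompt(stable, "HP 52% , 11/30 ammo".split())
print(reusable_prefix_len(turn1, turn2), "of", len(turn2), "tokens reusable")
```

Putting the per-turn values first instead would make the two prompts diverge at the second token, forcing almost the entire context to be recomputed every turn.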

On the architecture side, we didn’t want every in-game reaction to wait on language reasoning. We treat this as a System 1 / System 2 problem, analogous to the fast, instinctive responses and the slower, deliberate reasoning in human decision-making. A System 1 layer, implemented as a behavior tree, handles fast, reactive gameplay such as movement, aiming, and immediate combat responses at game tick rate. A System 2 layer, the language model, handles the deliberate work: interpreting player intent, coordinating with the player, and generating natural speech. Reflex-level actions never have to wait for the model.
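One way to picture the split is a per-tick System 1 function that checks reflexes first and only polls, never waits on, a queue of intents that System 2 fills whenever its slower reasoning finishes. This is a hypothetical Python sketch: the queue, the `threading.Event` gate that stands in for model latency, and the action names are illustrative, not the game’s behavior tree.

```python
import queue
import threading

# Intents produced by the slow System 2 layer, consumed by the fast System 1 layer.
intents: "queue.Queue[str]" = queue.Queue()


def system2_worker(request: str, inference_done: threading.Event) -> None:
    """Stand-in for language-model reasoning; blocks until 'inference' finishes."""
    inference_done.wait()
    intents.put(f"move_to:{request}")


def system1_tick(state: dict) -> str:
    """Called every game tick; must never block on System 2."""
    if state.get("enemy_visible"):
        return "aim_and_fire"                 # reflexes take priority
    try:
        state["goal"] = intents.get_nowait()  # adopt a new intent if one is ready
    except queue.Empty:
        pass                                  # nothing yet: keep the current goal
    return state.get("goal", "idle")
```

Because `get_nowait` returns immediately, a tick that arrives while System 2 is still thinking simply continues with the current goal.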

A significant amount of engineering effort went into defining the boundary between those two layers, deciding what should be handled immediately by the System 1 gameplay layer and what should be routed through the System 2 language model. Getting that division right was essential to making PUBG Ally feel responsive enough for combat while preserving the flexibility and naturalness of an AI teammate.
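A boundary like this can be written down as an explicit routing rule. The sketch below is hypothetical: the event names, latency budgets, and the assumed 400 ms language-model turn time are illustrative numbers chosen for the example, not measurements from PUBG Ally. An event goes to System 2 only when it needs language and its budget can absorb a model turn.

```python
from enum import Enum


class Layer(Enum):
    SYSTEM1 = "behavior_tree"
    SYSTEM2 = "language_model"


# Illustrative latency budgets per event type, in milliseconds (hypothetical values).
LATENCY_BUDGET_MS = {
    "enemy_spotted": 50,
    "taking_damage": 50,
    "teammate_downed": 800,
    "player_speech": 1500,
    "zone_closing": 3000,
}
ASSUMED_SLM_TURN_MS = 400  # hypothetical on-device model turn time
DEFAULT_BUDGET_MS = 1000


def route(event_type: str, needs_language: bool) -> Layer:
    """Send an event to the language model only if it needs language and can wait for it."""
    budget = LATENCY_BUDGET_MS.get(event_type, DEFAULT_BUDGET_MS)
    if needs_language and budget >= ASSUMED_SLM_TURN_MS:
        return Layer.SYSTEM2
    return Layer.SYSTEM1
```

A fuller design might send some events to both layers, with System 1 reacting at once and System 2 commenting afterward; the single-destination rule keeps the sketch short.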

PUBG Ally understands PUBG-specific terminology, maps, items, and weapon attributes. How did you approach that domain adaptation, and how do you keep Ally’s responses grounded in actual game state?

We approached this in two steps. The first was simply deciding what not to handle. Rather than asking the model to cope with every possible PUBG scenario, we constrained the world: a single map, Sanhok, and a single mode, AI Duo, with a fixed item taxonomy that defines what Ally can use, what it can recognize but not use, and what doesn’t exist in this context. Working within a closed world made everything downstream far more tractable.
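The three-way item taxonomy maps naturally onto a closed lookup in which anything unlisted is treated as nonexistent. The entries below are hypothetical placeholders (the interview does not publish the actual taxonomy), and `ItemStatus` and `classify` are illustrative names.

```python
from enum import Enum


class ItemStatus(Enum):
    USABLE = "usable"                    # Ally can pick up and use it
    RECOGNIZED_ONLY = "recognized_only"  # Ally can talk about it but not use it
    OUT_OF_SCOPE = "out_of_scope"        # does not exist in this closed world


# Hypothetical entries; which bucket a real item falls into is illustrative only.
TAXONOMY = {
    "M416": ItemStatus.USABLE,
    "Beryl M762": ItemStatus.USABLE,
    "Med Kit": ItemStatus.USABLE,
    "Flare Gun": ItemStatus.RECOGNIZED_ONLY,
}


def classify(item: str) -> ItemStatus:
    # Closed-world default: anything not in the taxonomy is treated as nonexistent.
    return TAXONOMY.get(item, ItemStatus.OUT_OF_SCOPE)
```

The closed-world default is the design choice that matters here: the model never has to guess about items outside the list.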

From there, we built a deterministic PUBG specification and placed it in a larger teacher model’s system prompt: Sanhok landmarks, item and weapon knowledge, and a clear account of what Ally can actually do. The teacher used it to produce grounded responses, demonstrating both the right PUBG terminology and style and its own limits, including when to call a knowledge-lookup tool backed by a curated dictionary of weapons, attachments, items, and rules. We distilled all of that into the on-device student model, so those behaviors became the student’s own rather than a set of instructions.
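The key move in that distillation is that the specification appears only in the teacher’s prompt, so the student must absorb its content into its weights. A minimal sketch of building one training pair follows. It assumes simple fine-tuning on teacher outputs, one common form of distillation, since the interview does not detail the exact recipe; `make_distillation_example`, `TrainingExample`, and the prompt layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrainingExample:
    prompt: str  # what the on-device student sees at inference time
    target: str  # the teacher response the student learns to reproduce


def make_distillation_example(
    spec: str,
    observation: str,
    player_turn: str,
    teacher: Callable[[str], str],
) -> TrainingExample:
    context = f"[observation]\n{observation}\n\n[player]\n{player_turn}"
    # The larger teacher gets the full PUBG specification in its system prompt.
    response = teacher(f"[system]\n{spec}\n\n{context}")
    # The student prompt omits the spec, so the knowledge must be learned, not read.
    return TrainingExample(prompt=context, target=response)
```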

Keeping Ally grounded in live game state is a separate problem, and we handle it agentically. Ally decides what it needs to know and fetches only that through tool calls. The agent has a set of observation tools: its own status, the teammate, nearby items, the combat situation. Each returns a plain-text view of its slice of authoritative engine data, such as: “You are armed with M416, 24/30 bullets in the magazine. HP 78%, inside the safe zone.” 
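An observation tool of this kind is essentially a formatter over engine data. The sketch below reproduces the example string quoted above from a hypothetical `SelfStatus` snapshot; the field names and `observe_self` are assumptions, not the game’s actual data model.

```python
from dataclasses import dataclass


@dataclass
class SelfStatus:
    """Hypothetical snapshot of authoritative engine data for Ally itself."""
    weapon: str
    ammo_in_mag: int
    mag_capacity: int
    hp_pct: int
    in_safe_zone: bool


def observe_self(s: SelfStatus) -> str:
    """Render Ally's own status as the plain text the language model reads."""
    zone = "inside" if s.in_safe_zone else "outside"
    return (
        f"You are armed with {s.weapon}, {s.ammo_in_mag}/{s.mag_capacity} bullets "
        f"in the magazine. HP {s.hp_pct}%, {zone} the safe zone."
    )


print(observe_self(SelfStatus("M416", 24, 30, 78, True)))
```

Because the text is generated from engine values at call time, anything said about ammo or health in a design like this can be traced to a specific tool result.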

The system prompt establishes that these tool results are the only ground truth for the current match, and during distillation the teacher reinforced the habit of re-observing through tools rather than recalling from earlier turns, a discipline the student model absorbed as its own. So even as the game changes fast, Ally’s factual claims trace back to a value it deliberately pulled from the engine moments earlier, not to something the model guessed.

PUBG Ally supports English, Korean, and Chinese. What did multilingual support add to the engineering challenge, and how did you balance quality across languages?

Multilingual support adds complexity because PUBG communication isn’t just standard language. Players use short commands, slang, abbreviations, item names, map callouts, and culturally specific expressions, especially in fast combat situations.

To address this, we built language-specific data by researching how PUBG players actually communicate in English, Korean, and Chinese communities. We also worked with language experts to review quality, naturalness, and gameplay appropriateness across each language.

At CES 2026, KRAFTON showed PUBG Ally with long-term memory. What did adding memory unlock for the gameplay experience, and how is it implemented?

Ally’s memory is structured, and it operates on two timescales. Long-term memory carries across matches. It holds a profile of who the player is — their name, preferred weapons, favorite drop locations, and personal details they have shared with Ally — alongside a record of prior matches with that player, including final placements and notable moments. Short-term memory is the within-match context: the player’s recent speech and what has just happened in the current game.
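The two timescales can be sketched as two data structures: a persistent profile plus match history, and a bounded within-match buffer that resets each game. The class names, fields, buffer sizes, and `end_match` helper below are hypothetical; the interview describes what is stored, not how.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class PlayerProfile:
    name: Optional[str] = None
    preferred_weapons: List[str] = field(default_factory=list)
    favorite_drops: List[str] = field(default_factory=list)
    personal_details: List[str] = field(default_factory=list)


@dataclass
class MatchRecord:
    placement: int
    notable_moments: List[str]


@dataclass
class LongTermMemory:
    """Persists across matches."""
    profile: PlayerProfile = field(default_factory=PlayerProfile)
    history: List[MatchRecord] = field(default_factory=list)


@dataclass
class ShortTermMemory:
    """Within-match context; bounded so old entries fall off (sizes are illustrative)."""
    recent_speech: Deque[str] = field(default_factory=lambda: deque(maxlen=8))
    recent_events: Deque[str] = field(default_factory=lambda: deque(maxlen=16))


def end_match(long_term: LongTermMemory, placement: int, moments: List[str]) -> ShortTermMemory:
    """Record the finished match, then start the next one with fresh short-term memory."""
    long_term.history.append(MatchRecord(placement, moments))
    return ShortTermMemory()
```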

The clearest signal came from player comments after playtests. One player asked Ally to look out for a Beryl in the first match, and from the next match onward Ally started finding it without being asked. Another told Ally their name once, and Ally was already using it in the next match. Several players described Ally remembering their preferred weapons and favorite drop locations across matches and bringing them up on its own. These were the moments that turned Ally from a per-match assistant into a teammate players treated as ongoing.

What did the iteration loop look like? How do you test and tune an AI teammate whose outputs aren’t deterministic, especially in a competitive multiplayer game where consistency matters?

Testing an AI teammate is very different from testing a chatbot or a standalone model. A good teammate is one that players feel is useful, timely, and reliable during a real match. User experience is the most important signal, and it is also the hardest thing to evaluate directly.

That is why our tuning process combined multiple evaluation layers. We used automated evaluations to check whether Ally followed the expected interaction protocol, used available tools properly, and kept its speech and actions consistent. We then compared candidate models through live playtests and A/B tests, using surveys and free-form feedback to understand what players actually noticed and cared about. Finally, we validated selected models through large-scale playtests, refining Ally with play feedback from over a thousand real players.
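The first, automated layer can be as simple as rule checks over logged turns. The sketch below flags calls to unknown tools and cases where the spoken line promises something the actions do not deliver. The tool names, keyword-to-action rules, and `check_turn` are hypothetical, and a production evaluator would likely be more robust than substring matching.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Turn:
    speech: str
    actions: List[str]
    tools_called: List[str]


# Hypothetical tool registry and speech-to-action consistency rules.
KNOWN_TOOLS: Set[str] = {
    "observe_self", "observe_teammate", "observe_items", "observe_combat", "lookup_knowledge",
}
PROMISE_TO_ACTION: Dict[str, str] = {
    "heal": "use_heal_item",
    "follow": "follow_player",
    "drop": "drop_item",
}


def check_turn(turn: Turn) -> List[str]:
    """Return a list of protocol or consistency problems found in one logged turn."""
    problems = []
    for tool in turn.tools_called:
        if tool not in KNOWN_TOOLS:
            problems.append(f"unknown tool: {tool}")
    text = turn.speech.lower()
    for keyword, action in PROMISE_TO_ACTION.items():
        # Crude substring check: the speech mentions an action the turn never performs.
        if keyword in text and action not in turn.actions:
            problems.append(f"said '{keyword}' but no {action}")
    return problems
```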

Player feedback helped us understand where Ally failed, as well as what players liked, disliked, and valued in an AI teammate. We used those preferences to refine our evaluation criteria and guide further tuning. 

What advice would you give to other studios looking to build their first AI-powered companion or teammate using NVIDIA ACE?

The goal is the player’s experience, which boils down to whether the companion feels like a good teammate. That is hard to judge from offline metrics or a small internal team alone. So the loop that mattered most was simple: build a fast prototype, put it in front of a large number of real players, improve the model from their feedback, and repeat. Each turn through that cycle taught us more about what “a good teammate” actually means than any amount of upfront design could. If we could give just one piece of advice, it would be to get that loop running early, and to keep it cheap enough to run often.

Looking ahead, where do you see CPCs going next? More autonomy, multi-agent cooperation between Allies, richer sensory inputs?

For us, the most important next step is discovery. Through Arcade Mode, we want to learn how players actually want to enjoy the game when an AI teammate like Ally is playing alongside them, and use what we find to prepare genuinely new kinds of fun for the game.

A CPC can understand a player and adapt to them in ways a scripted system never could, so the space of what might be possible is broad. The arcade playtest is our first real chance to see which of those possibilities players gravitate toward, and to let that shape what we build next.

Get started with NVIDIA ACE

We’re excited to collaborate with KRAFTON on PUBG Ally and see the CPC concept enter public beta. If you’re building AI-driven game characters, get started with NVIDIA ACE, a suite of on-device and cloud AI models for speech, intelligence, and animation.
