Key lessons include: 

  • Tagging components with the model, harness, and issues found when scanning.

  • Allocating hardware and token budgets for finding vulnerabilities, developing fixes, and building and testing them.

  • Managing change volume (and engineer hours) by favoring more frequent, smaller updates where possible, each with a good rollout plan to de-risk the change.
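To make the tagging lesson concrete, here is a minimal sketch of what a scan tag could look like. The `ScanTag` record, its field names, and the sample model and harness names are all hypothetical, chosen only for illustration; they are not a description of any internal Google schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ScanTag:
    """Hypothetical record attached to a component after a scan."""
    component: str
    model: str      # which AI model produced the findings
    harness: str    # which harness drove the scan
    issues: List[str] = field(default_factory=list)


def issues_by_model_and_harness(tags: List[ScanTag]) -> Dict[Tuple[str, str], int]:
    """Count reported issues per (model, harness) pair to compare scan setups."""
    counts: Dict[Tuple[str, str], int] = {}
    for tag in tags:
        key = (tag.model, tag.harness)
        counts[key] = counts.get(key, 0) + len(tag.issues)
    return counts


# Illustrative data only.
tags = [
    ScanTag("parser", "model-a", "harness-1", ["overflow in read_header"]),
    ScanTag("parser", "model-b", "harness-1",
            ["overflow in read_header", "unchecked length"]),
    ScanTag("netstack", "model-a", "harness-2", []),
]
summary = issues_by_model_and_harness(tags)
```

Keeping the model and harness on every record is what lets you compare setups later, instead of only knowing that an issue was found.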

2. Scan and prioritize: We continuously scan our code across products — Search, Ads, Android, Chrome, and Google Cloud — managing tens of thousands of packages.

First, we kicked off scanning and centrally tracked our progress, integrating the same tools into our pipelines. We learned early on that the best scanning results come from a combination of an expert in the specific product plus the harness plus the AI model. The combination is crucial, because results will be markedly different without all three.

It’s worth noting that if you can only pick two, we recommend expertise and harness. A less capable model with a good harness and good expert is more powerful than the best model without a good harness or good experts. We also advise using more than one model.

It’s important to track your data and iterate on it. Because the technology is evolving fast, that data is critical for revising and refining your processes.

Second, look carefully at your software supply chain, and engage your key suppliers. Reachability remains a key criterion for fixes, as does streamlining and simplifying the areas you work on.

Third, because there are so many vulnerabilities that can show up, it’s important to have the right methodology to prioritize them. Normally, when you’re rolling out a change you prioritize the smallest blast radius to make incremental change. Here, we recommend flipping that model: Begin with foundational code with the biggest blast radius to tackle the hardest problems first.
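One way to picture this flipped ordering is a simple sort, sketched below. The `Finding` fields are assumptions made for illustration: `reachable` stands in for whatever reachability analysis you have, and `dependents` is a rough proxy for blast radius.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    component: str
    reachable: bool   # assumed flag: can external input reach the code?
    dependents: int   # assumed proxy for blast radius


def prioritize(findings: List[Finding]) -> List[Finding]:
    """Reachable code first, then the largest blast radius first."""
    return sorted(findings, key=lambda f: (not f.reachable, -f.dependents))


# Illustrative data only.
findings = [
    Finding("internal-tool", False, 3),
    Finding("tls-library", True, 5000),
    Finding("image-codec", True, 120),
    Finding("legacy-batch-job", False, 900),
]
ordered = [f.component for f in prioritize(findings)]
```

In this sketch the foundational, externally reachable library lands at the top of the queue, which is the opposite of a smallest-first rollout order.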

AI models can do a good job of developing proofs of concept to rapidly test accuracy. The harness and the models both play a significant role in reducing the false-positive rate. Adapting your harness to do validation and using a different agent or model to validate results are both very valuable.
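The idea of validating with a different agent or model can be sketched as follows. The validators here are plain Python callables standing in for real model calls or proof-of-concept runs, and the `source` and `poc_crashes` fields are hypothetical.

```python
def validate_findings(findings, validators):
    """Keep findings confirmed by at least one validator other than their source."""
    confirmed, rejected = [], []
    for finding in findings:
        # Never let the model that reported a finding be its only judge.
        others = [check for name, check in validators.items()
                  if name != finding["source"]]
        if any(check(finding) for check in others):
            confirmed.append(finding)
        else:
            rejected.append(finding)
    return confirmed, rejected


# Stand-in validators: real ones would call a model or run a proof of concept.
validators = {
    "model-a": lambda f: True,              # accepts everything it sees
    "model-b": lambda f: f["poc_crashes"],  # accepts only a reproducing PoC
}
findings = [
    {"id": "F1", "source": "model-a", "poc_crashes": True},
    {"id": "F2", "source": "model-a", "poc_crashes": False},
]
confirmed, rejected = validate_findings(findings, validators)
```

Excluding the reporting model from its own validation is the design choice that matters here; everything else is scaffolding.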

Another key to AI-powered triage is to use your harness and tools to state vulnerability confidence as well as severity. Of course, developing a patch is only part of the problem.

3. Remediate: Fixing vulnerabilities at Google scale required a fundamental shift in strategy. We developed a new approach centered on three lessons.

First, how you roll out patches matters. We adopted a risk-based approach that prioritized code that was reachable from the outside and had the largest blast radius, such as critical applications like BoringSSL and gVisor. We also learned that providing the model with context was the key to faster, more trusted remediation.

Second, we learned you cannot fix what you cannot track. To manage remediation at scale, we built a central system to track every vulnerability, from discovery to resolution, with every finding labeled in a central repository. This single source of truth allowed us to enforce service-level objectives (SLOs) for patching, and enabled us to deploy constant autonomous patching with human review. Coupled with robust roll-back capabilities, our teams got better at fixing things quickly and safely.
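As a rough sketch of tracking findings against patching SLOs, consider the following. The `TrackedFinding` record and the SLO windows in `SLO_DAYS` are assumptions for illustration, not actual Google values.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumed SLO windows in days per severity; illustrative values only.
SLO_DAYS = {"critical": 7, "high": 30, "medium": 90}


@dataclass
class TrackedFinding:
    finding_id: str
    severity: str
    discovered: date
    resolved: Optional[date] = None  # None while the finding is still open

    def due(self) -> date:
        """Deadline implied by the severity's SLO window."""
        return self.discovered + timedelta(days=SLO_DAYS[self.severity])


def breaching(findings: List[TrackedFinding], today: date) -> List[str]:
    """Return IDs of open findings whose SLO deadline has passed."""
    return [f.finding_id for f in findings
            if f.resolved is None and today > f.due()]


# Illustrative data only.
tracker = [
    TrackedFinding("V-1", "critical", date(2025, 1, 1)),
    TrackedFinding("V-2", "high", date(2025, 1, 1)),
    TrackedFinding("V-3", "critical", date(2025, 1, 1), resolved=date(2025, 1, 5)),
]
overdue = breaching(tracker, today=date(2025, 1, 15))
```

With every finding in one place from discovery to resolution, an SLO check like this becomes a simple query rather than a cross-team investigation.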

Finally, we learned to build resilience directly into the system. The ultimate goal was to create an inherently resilient system that can also patch vulnerabilities, not the other way around. We don't just fix the code; we harden the entire system around it.

These changes helped us rethink our approach to securing open-source software with a three-R’s strategy: Refresh, remove, and rewrite. 

  1. First, we refresh what is foundational — finding and fixing vulnerabilities in the code. This is about being good network citizens and protecting the core.

  2. Second, we remove what is peripheral. We are removing dependencies and replacing them with custom code. This is about both efficiency and reducing the attack surface, moving from a broad base of trust to a narrow, controlled one.

  3. Third, we rewrite what is critical. For everything in between, we are transitioning legacy logic and critical capabilities into modern, memory-safe languages, using AI to automate the transition and eliminate entire classes of vulnerabilities from that software.

This evolution is a deliberate approach to reducing complexity, shrinking the attack surface, and building a more resilient, autonomous, and secure-by-design foundation for everything we do.

4. Monitor: Our work doesn’t stop there, and neither should yours. The security landscape is always changing, and the monitor phase is where our approach comes alive by creating a perpetual feedback loop to ensure we stay secure — and get stronger over time.

We learned three key lessons in this phase. First, security demands a constant feedback loop. We built one to monitor the entire ecosystem for two things: system strain and vulnerability hotspots.

Second, we invested in tracking our long-term remediation health. You can only improve what you measure. We built a comprehensive asset inventory to track our overall security posture and the completeness of our remediation efforts. Here’s where we hold ourselves accountable to product-level SLOs for vulnerability management. 

This system allows us to deploy rolling patches that can update even our data center hardware continuously and use AI agents to verify patch efficacy at a scale no human team could manage.
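Measuring remediation health against product-level SLOs can be sketched as a per-product compliance rate. The record layout (product, days to fix, SLO window in days) and the sample values are hypothetical.

```python
from collections import defaultdict


def slo_compliance(records):
    """Fraction of findings per product that were fixed within the SLO window."""
    totals = defaultdict(int)
    within = defaultdict(int)
    for product, days_to_fix, slo_days in records:
        totals[product] += 1
        # An unresolved finding (days_to_fix is None) counts as a miss.
        if days_to_fix is not None and days_to_fix <= slo_days:
            within[product] += 1
    return {product: within[product] / totals[product] for product in totals}


# Illustrative data only: (product, days_to_fix, slo_days).
records = [
    ("product-a", 3, 7),
    ("product-a", 10, 7),
    ("product-b", 20, 30),
    ("product-b", None, 30),
    ("product-b", 5, 7),
]
rates = slo_compliance(records)
```

Counting unresolved findings as misses keeps the rate honest: a product cannot look healthy simply because its hardest fixes are still open.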

Third, we planned for the future by using AI agents for both coding and monitoring. You have to assume that at some point, the attackers' models will become more advanced. We need to evolve our operating model and build for that reality.

We use AI agents to automate and standardize our response playbooks, enabling instantaneous containment when an issue is found. We move beyond just finding bugs by feeding key libraries into Gemini to improve its pattern recognition, creating security-aware coding agents. Meanwhile, our AI-assisted red teamers are continuously stress-testing our core infrastructure, ensuring our defenses are always evolving.

The outcome of this constant monitoring is a living, measured program that we can trust.

This is how we protect billions of users every day, and it provides a framework that any team can use to build a defense that learns, adapts, and hardens itself against the threats of tomorrow.

To learn more about AI Threat Defense, you can watch our recent Security Talks online event.