Application hosting: Compute to host your application. Your application can use Google Cloud's client libraries and SDKs to talk to different Cloud products.
Model hosting: Scalable and secure hosting for a generative model.
Model: Generative model for text, chat, images, code, embeddings, and multimodal use cases.
Grounding solution: Anchor model output to verifiable, updated sources of information.
Database: Store your application's data. You might reuse your existing database as your grounding solution by augmenting prompts with the results of a SQL query, by storing your data as vector embeddings using an extension like pgvector, or both.
Storage: Store files such as images, videos, or static web frontends. You might also use Storage for the raw grounding data (e.g., PDFs) that you later convert into embeddings and store in a vector database.
The sections below walk through each of those components, helping you choose which Google Cloud products to try.
Application hosting infrastructure
Choose a product to host and serve your application workload, which makes calls out to the generative model.
Model hosting infrastructure
Google Cloud provides multiple ways to host a generative model, from the flagship Vertex AI platform, to customizable and portable hosting on Google Kubernetes Engine.
Grounding and RAG
To ensure informed and accurate model responses, ground your generative AI application with real-time data. This is called retrieval-augmented generation (RAG).
If you want to generate content that's grounded in up-to-date information from the internet, Gemini models can evaluate whether the model's own knowledge is sufficient or whether grounding with Google Search is required.
You can implement grounding by indexing your data with a search engine. Many search engines now store embeddings in a vector database, a format well suited to operations like similarity search. Google Cloud offers multiple vector database solutions for different use cases.
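To make the similarity search idea concrete, here is a minimal sketch in plain Python. It assumes tiny hand-written 3-dimensional vectors in place of real embeddings, and the names INDEX, cosine_similarity, and top_k are hypothetical helpers for illustration, not part of any Google Cloud API. A real vector database would use model-generated embeddings and an approximate nearest-neighbor index rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Brute-force search: score every document, return the k closest IDs."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy index: document ID -> made-up embedding (assumption).
INDEX = {
    "returns-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "store-hours": [0.0, 0.2, 0.9],
}

# Made-up embedding standing in for an embedded user question (assumption).
query_embedding = [0.8, 0.3, 0.0]
nearest = top_k(query_embedding, INDEX, k=2)
print(nearest)
```

The documents returned by the search would then be added to the model prompt as grounding context.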
Note: You can ground using non-vector databases by querying an existing database like Cloud SQL or Firestore, and you can use the result of the query in your model prompt.
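The approach in the note above might look like the following sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for an existing database such as Cloud SQL. The orders table, its columns, and the helper names fetch_order_facts and build_grounded_prompt are assumptions made for illustration. The call that sends the prompt to a model is intentionally left out.

```python
import sqlite3


def fetch_order_facts(conn, customer_id):
    """Query the existing database for rows relevant to the user's question."""
    rows = conn.execute(
        "SELECT order_id, status FROM orders WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),  # parameterized to avoid SQL injection
    ).fetchall()
    return [f"Order {order_id}: {status}" for order_id, status in rows]


def build_grounded_prompt(question, facts):
    """Place the query results in the prompt so the model answers from them."""
    context = "\n".join(f"- {fact}" for fact in facts) or "- (no matching records)"
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical schema and sample rows (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "c42", "shipped"), (2, "c42", "processing"), (3, "c7", "delivered")],
)

facts = fetch_order_facts(conn, "c42")
prompt = build_grounded_prompt("Where is my latest order?", facts)
print(prompt)
```

Only the rows for the requesting customer reach the prompt, which keeps the model's context limited to data relevant to the question.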
Do you want a fully managed, optimized solution that supports most data sources and prevents direct access to the underlying embeddings?
Vertex AI Search
You are building a search engine for RAG
You might use a reference architecture to build a tailor-made search engine and a vector database for RAG use cases.
Is your data accessed programmatically (OLTP)? Already using a SQL database?
Want to use Google AI models directly from your database? Require low latency?
Have a large analytical dataset (OLAP)? Require batch processing, and frequent SQL table access by humans or scripts (data science)?
BigQuery
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-06-18 UTC.