
Google has tightened how Gemini’s free and paid tiers work. Since May 17, 2026, Gemini Apps have applied compute-based usage limits that depend on prompt complexity, the model chosen and the length of the chat; allowances refresh every five hours, up to a weekly cap. The change is the consumer-facing edge of a compute shortage that has squeezed Google’s biggest enterprise customers for months, and it is reshaping how much free AI ordinary users get.

Who got hit first? Google capped Meta’s use of its Gemini models in March after Meta sought more capacity than Google could supply, the Financial Times reported. The restriction, still in place, disrupted some of Meta’s internal AI projects and led the company to tell staff to use AI tokens (the units that measure AI usage) more efficiently. The constraints also hit several other Google clients, though less severely than Meta. That Google capped one of its largest customers shows how serious the shortage already was.

Has Google admitted the problem? At its first-quarter earnings in April, CEO Sundar Pichai said Google was “compute-constrained in the near term,” and that cloud revenue would have been higher had it been able to meet demand, per the FT. Cloud revenue crossed $20 billion for the first time, while signed-but-undelivered cloud contracts nearly doubled to more than $460 billion. That backlog is the tell: customers are signing up for compute faster than Google can deliver it.

How are providers coping? Google signed a $920 million-a-month deal to lease computing capacity from Elon Musk’s SpaceX, and Anthropic, which makes Claude, struck a similar arrangement with SpaceX. These leasing deals suggest the shortage has become a structural feature of the market, so the limits are unlikely to loosen soon.

What is driving the cost? The highest cost now comes from inference, the work of running models after training, the FT noted. Every prompt a user sends consumes compute, so the more people use AI for everyday tasks, the heavier that running cost becomes.

Does Google link the limits to capacity? Google’s Gemini Apps help page ties the limits to capacity directly:

If capacity changes, Google may cut limits for free users before paying subscribers.
During periods of high demand, Google may withhold compute-heavy features such as Deep Research from free users.
Usage limits may change without notice, including due to capacity constraints, and Google may tighten them when activity spikes.

By Google’s own account, capacity sets the limits.

Where do consumers fit in? Token-based limits let a provider ration finite compute across its user base. People who never touch an API use Gemini Apps as mass-market tools to summarise, brainstorm and generate images, and as that everyday usage scales, it puts more strain on the same constrained infrastructure. Metering by compute lets Google count heavier tasks for more of a user’s allowance and steer demand towards lighter models, so that what each user draws down tracks what they actually consume.
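To make the metering idea concrete, here is a minimal Python sketch of how a compute-weighted limit with a five-hour refresh and a weekly cap could work. It is an illustration under stated assumptions, not Google’s implementation: the model weights, the cost formula (tokens processed multiplied by a model weight), the budgets and the choice of a rolling five-hour window are all hypothetical, since only the inputs (prompt complexity, model and chat length) are described, not the formula.

```python
from collections import deque
from dataclasses import dataclass, field

FIVE_HOURS = 5 * 60 * 60       # seconds in the short, refreshing window
ONE_WEEK = 7 * 24 * 60 * 60    # seconds in the weekly window

# Hypothetical relative compute weights per model; not published by Google.
MODEL_WEIGHT = {"light": 1.0, "heavy": 4.0}


def request_cost(model: str, prompt_tokens: int, chat_tokens: int) -> float:
    """Hypothetical cost: tokens processed (new prompt plus chat history) times a model weight."""
    return MODEL_WEIGHT[model] * (prompt_tokens + chat_tokens)


@dataclass
class ComputeMeter:
    window_budget: float   # compute allowed within any rolling five-hour window
    weekly_budget: float   # hard cap across the past week
    history: deque = field(default_factory=deque)  # (timestamp, cost) pairs, oldest first

    def _spent_since(self, now: float, span: float) -> float:
        # Sum the cost of requests made within the last `span` seconds.
        return sum(cost for t, cost in self.history if now - t < span)

    def try_spend(self, now: float, cost: float) -> bool:
        # Requests older than a week no longer count towards either limit.
        while self.history and now - self.history[0][0] >= ONE_WEEK:
            self.history.popleft()
        if self._spent_since(now, FIVE_HOURS) + cost > self.window_budget:
            return False  # five-hour allowance used up; wait for it to roll forward
        if self._spent_since(now, ONE_WEEK) + cost > self.weekly_budget:
            return False  # weekly cap reached
        self.history.append((now, cost))
        return True


if __name__ == "__main__":
    meter = ComputeMeter(window_budget=10_000, weekly_budget=30_000)
    cost = request_cost("heavy", prompt_tokens=500, chat_tokens=1_500)  # 8,000 units
    print(meter.try_spend(0, cost))               # True
    print(meter.try_spend(60, cost))              # False: 16,000 exceeds the window budget
    print(meter.try_spend(FIVE_HOURS, cost))      # True: the first request has left the window
    print(meter.try_spend(2 * FIVE_HOURS, cost))  # True: 24,000 used this week
    print(meter.try_spend(3 * FIVE_HOURS, cost))  # False: 32,000 would exceed the weekly cap
```

In this sketch the same conversation sent to the lighter model would cost 2,000 units rather than 8,000, which is the sense in which metering by compute can steer demand towards lighter models: users stretch the same allowance further by choosing them.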

What does each tier get? What a user gets now scales with what they pay:

Context window: 32,000 tokens on the free tier, 128,000 on AI Plus, one million on AI Pro and AI Ultra.
Usage limits: standard on the free tier, 2x on Plus, 4x on Pro, and up to 20x on the top Ultra plan.
Gated features: video generation, scheduled actions and the Daily Brief sit behind paid plans.
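The tier figures above are mostly multiplication, and a few lines of Python make the gaps concrete. This is a sketch under assumptions: BASE_ALLOWANCE is a hypothetical unit, because the free tier’s limit is not published as a single number, and Ultra’s “up to 20x” is treated here as its maximum.

```python
# Context windows and usage multipliers taken from the tier list above.
# BASE_ALLOWANCE is a hypothetical unit for the free tier's usage limit.
BASE_ALLOWANCE = 100

TIERS = {
    # tier: (context window in tokens, usage multiplier relative to the free tier)
    "Free": (32_000, 1),
    "AI Plus": (128_000, 2),
    "AI Pro": (1_000_000, 4),
    "AI Ultra": (1_000_000, 20),  # "up to 20x"; treated here as the maximum
}


def compare_tiers(base: int = BASE_ALLOWANCE) -> dict[str, tuple[int, float]]:
    """Return each tier's usage allowance and its context window relative to the free tier."""
    free_ctx = TIERS["Free"][0]
    return {name: (base * mult, ctx / free_ctx) for name, (ctx, mult) in TIERS.items()}


if __name__ == "__main__":
    for name, (allowance, ctx_ratio) in compare_tiers().items():
        print(f"{name:9} allowance={allowance:5} units  context={ctx_ratio:g}x free")
```

On these numbers, AI Pro gives four times the free tier’s usage but a context window about 31 times larger (1,000,000 / 32,000 = 31.25), so the gap between free and paid is widest for long documents and long chats.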

What does this mean for India? Google built its Indian user base on free access, and that access is now governed by compute-based usage limits. A wave of free giveaways onboarded Indian users to AI, with three deals landing close together in late 2025:

Google-Jio: 18 months of free Google AI Pro, a plan worth about Rs 35,100, for eligible subscribers.
OpenAI: ChatGPT Go offered free to Indian users.
Perplexity-Airtel: a free Perplexity Pro subscription for the telco’s customers.

MediaNama founder Nikhil Pahwa read the wave as a “scale first, monetise later” play, the same free-access land grab Jio used to capture the telecom market. Compute-based limits are what the monetise phase looks like in practice.

Can India supply its own compute? India’s flagship IndiaAI Mission set aside roughly Rs 4,563 crore for compute capacity, against more than $52 billion in private AI infrastructure commitments from Amazon and Microsoft alone, MediaNama reported. At a MediaNama roundtable, Ajay Kumar of Triumvir Law argued that India should incentivise the private sector to build domestic compute capacity so that “one day we’re not at the mercy of foreign data centers” for running public infrastructure. When the global providers ration capacity, the markets that depend on them feel it first.

Why it matters. The Gemini change tells consumers something the enterprise story already showed: AI economics are tightening for everyone. More generous free access is giving way to tighter metered limits, because the underlying resource is genuinely scarce and providers are paying heavily to secure it.

For users, more of what AI can do will sit behind paid tiers. The deeper question is who controls the compute on which all of this runs, who can afford access to it, and on what terms.