Blog·AI Sovereignty·No. 021 / 132

Inference is the New Manufacturing

Training a frontier model costs on the order of $100M and benefits a few labs. Owning national-scale inference benefits 1.4 billion people every day. The public conversation has these priorities inverted.

For most of the AI hype cycle, training has been the protagonist. The lab announces a new model. The model required ten thousand GPUs and several months. The press reports it as a breakthrough. The country whose lab built it is described, briefly, as winning. The cycle repeats. This is not a useless story; training is genuinely hard and the labs that do it well deserve real credit. But the story has hidden the more important infrastructure question, which is not where the model was trained but where it is being served. Training a frontier model is the lab. Inference at scale is the factory. The factory is where economies are built.

The analogy with industrial manufacturing is exact. The country that designed the integrated circuit was not the country that won the chip industry. The country that designed the automobile was not the country that won the car industry. In both cases, the design was the lab; the manufacturing at scale was the factory; and the factory is where the durable economic value accrued. The same pattern will hold for AI. The countries that built the first frontier models will be remembered. The countries that own the inference infrastructure of the next decade will be the ones whose economies are reshaped.

What inference at scale actually is

Inference is the work of taking a trained model and running it, again and again, for users in production. Every time you ask an AI assistant a question, an inference happens: a forward pass through the model, on some piece of hardware, producing an answer. At the scale of an entire country using AI in daily work, the number of inferences per day is in the tens of billions. Each of those inferences costs energy, compute cycles, and time. The cost per inference, multiplied by the inferences per day, equals the daily operating cost of the country's cognitive layer.
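The arithmetic in that last sentence can be sketched in a few lines. This is an illustration only: the per-inference cost and the daily volume below are placeholder assumptions chosen to show the shape of the calculation, not measured figures, and the function names are hypothetical.

```python
def daily_operating_cost(cost_per_inference: float, inferences_per_day: float) -> float:
    """Daily operating cost = cost per inference x inferences per day."""
    if cost_per_inference < 0 or inferences_per_day < 0:
        raise ValueError("inputs must be non-negative")
    return cost_per_inference * inferences_per_day


def annual_operating_cost(
    cost_per_inference: float, inferences_per_day: float, days_per_year: int = 365
) -> float:
    """Scale the daily figure to a year, assuming constant daily volume."""
    return daily_operating_cost(cost_per_inference, inferences_per_day) * days_per_year


if __name__ == "__main__":
    # ASSUMPTIONS (hypothetical, for illustration only):
    assumed_cost_per_inference_usd = 0.0005  # placeholder unit cost
    assumed_inferences_per_day = 20e9        # "tens of billions" per day

    daily = daily_operating_cost(assumed_cost_per_inference_usd, assumed_inferences_per_day)
    yearly = annual_operating_cost(assumed_cost_per_inference_usd, assumed_inferences_per_day)
    print(f"Daily operating cost:  ${daily:,.0f}")
    print(f"Annual operating cost: ${yearly:,.0f}")
```

The point of the sketch is sensitivity rather than the specific totals: because the volume term is so large, even a small change in the unit cost moves the annual figure by a large absolute amount.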

This is a very different problem from training. Training is bursty, periodic, and tolerant of latency. Inference is constant, low-latency, and unforgiving. Training cares about flops. Inference cares about utilization, network, locality, and energy. Training can happen, in principle, in any country with a few GPU clusters. Inference at the scale of a national economy has to happen close to where the users are, on infrastructure that can be relied on, under terms that the local economy can sustain.

Training and inference are not the same problem, and the same country need not win both.

Why national-scale inference matters strategically

The strategic stakes of inference are easy to miss until you trace them through. The inferences a country runs today are training the habits, vocabularies, and defaults of its professional class tomorrow. If those inferences are running on foreign cloud infrastructure under foreign terms, then the cognitive substrate of the country is, in a strict supply-chain sense, an imported substrate. The supply can be priced up. It can be throttled. It can be conditioned on policy. None of this requires malice; it is the unavoidable consequence of depending on critical infrastructure controlled outside the country's jurisdiction.

The alternative is national-scale inference infrastructure: clusters of GPUs in Indian data centres, serving Indian users on Indian terms, with the capacity to scale with the country's actual usage. This does not require India to build the chips. It does require India to operate the inference clusters, to negotiate the long-term contracts with chip suppliers, and to be a sufficiently large buyer that the chip supply line treats India as a strategic customer.

The economics are surprisingly favourable

The economics of inference, unlike those of frontier training, are forgiving. Inference is well suited to incremental capacity additions rather than heroic single projects. The break-even on an inference cluster, given Indian usage patterns and Indian power costs, is reasonable on a horizon of a few years. The technical workforce required is large but trainable. The business case is not exotic; it is the same case as cloud infrastructure, with AI-specific hardware at the centre.
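A break-even claim like this one can be checked with a simple payback calculation. The sketch below is a minimal illustration under stated assumptions: every figure is a hypothetical placeholder rather than data about any real cluster, the function name is invented for this example, and the model ignores financing costs, hardware depreciation, and utilization ramp-up.

```python
from typing import Optional


def payback_years(capex: float, annual_revenue: float, annual_opex: float) -> Optional[float]:
    """Simple payback period: upfront cost divided by annual operating margin.

    Returns None when the margin is zero or negative, i.e. the cluster
    never breaks even under the given assumptions.
    """
    annual_margin = annual_revenue - annual_opex
    if annual_margin <= 0:
        return None
    return capex / annual_margin


if __name__ == "__main__":
    # ASSUMPTIONS (hypothetical, in arbitrary currency units):
    assumed_capex = 30.0           # chips, data centre build-out
    assumed_annual_revenue = 20.0  # income from inference served
    assumed_annual_opex = 10.0     # power, staff, maintenance

    years = payback_years(assumed_capex, assumed_annual_revenue, assumed_annual_opex)
    if years is None:
        print("No break-even under these assumptions")
    else:
        print(f"Payback period: {years:.1f} years")
```

Because capacity can be added incrementally, the same calculation can be rerun per tranche of hardware instead of once for a single large project.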

What is required, mostly, is the strategic decision to treat inference as infrastructure rather than as a service to be procured. Once that decision is made, the rest is execution: securing the chips, building the data centres, training the operations workforce, negotiating the long-term power, integrating with the existing cloud market. None of this is exotic. All of it is unglamorous in the way that real infrastructure decisions usually are.

The procurement choice nobody is talking about

The single largest leverage point on Indian inference sovereignty in the next three years is procurement. The country buys AI services through a thousand procurement channels: government departments, public sector banks, hospitals, schools, public universities. Each of those procurements is, today, defaulting to foreign cloud-hosted AI, in part because no domestic alternative is operationally cheap and easy. If the procurement defaults shift toward inference run on Indian infrastructure, even at a modest premium for a transition period, the domestic inference industry will have the volume to invest seriously. If they do not shift, the foreign clouds will own the public-sector AI workload by 2028 and the cost of unwinding the dependence will be much higher.

This is the move India made successfully with payments: the procurement defaults around UPI created the volume that allowed a domestic stack to thrive. The same procurement-led approach can work for inference, if the policy attention turns to it in time.

What the community can do here

This is the kind of structural problem where a national community of practitioners is not a substitute for policy but is a necessary condition for policy to land well. A community of evaluators can certify which Indian inference providers meet the quality bar for which use cases. A community of practitioners can publish honest cost and performance comparisons that procurement officers can act on. A community of domain experts can articulate why specific use cases, such as health, legal, and education, require domestic inference for reasons beyond cost.

Without that community layer, the policy conversation gets captured by the foreign cloud lobby on one side and the most aggressive domestic vendor on the other. With it, the conversation has a third voice: the practitioners who are actually using AI every day and who have an interest in honest evaluation. Bharath.CLUB, AI.Bharath.CLUB, and Eval.qa together are pieces of that third voice. The voice is necessary. Without it, the country will end up importing its factories the same way it imports its oil.

The window

The window for getting national-scale inference right closes around 2028. By then, the procurement defaults will have hardened, the dependence will be entrenched, and the cost of changing course will rise sharply. The right time to make the investment is now. The right architecture is procurement-led, community-supported, and infrastructure-first. The right metric is not how many models India trains; it is how many Indian inferences run on Indian infrastructure. Track that number and most of the rest of the AI sovereignty conversation will sort itself out.
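The metric proposed above is straightforward to compute once workloads are tagged by where they are served. The sketch below assumes a hypothetical record format (`WorkloadRecord` and its fields are invented for illustration); the sample figures are placeholders, not real usage data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WorkloadRecord:
    """One reporting row: a workload, its volume, and where it is served."""
    workload: str
    inferences: int
    hosted_in_india: bool


def domestic_inference_share(records: Iterable[WorkloadRecord]) -> float:
    """Fraction of all inferences that ran on infrastructure hosted in India."""
    total = 0
    domestic = 0
    for record in records:
        total += record.inferences
        if record.hosted_in_india:
            domestic += record.inferences
    if total == 0:
        raise ValueError("no inferences recorded")
    return domestic / total


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        WorkloadRecord("public-bank-assistant", 300, hosted_in_india=True),
        WorkloadRecord("hospital-triage", 100, hosted_in_india=False),
    ]
    print(f"Domestic inference share: {domestic_inference_share(sample):.0%}")
```

Weighting by inference count rather than by number of contracts keeps the metric tied to actual usage, which is the quantity the essay argues should be tracked.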
