A doctor in a district hospital who is confidently wrong about a diagnosis harms one patient. The harm is real, sometimes terrible. It is also, structurally, bounded. The doctor sees a few dozen patients a day. The error pattern, if she has one, has to be discovered through the slow accumulation of cases, peer observation, and eventually a review. The mechanism is imperfect, but it exists, and it has been refined over centuries.
Now consider a model deployed as a clinical decision support tool across two hundred Indian district hospitals, used by three thousand clinicians, and advising on perhaps a hundred thousand patient encounters a month. If that model is confidently wrong about a diagnostic pattern (say, it mis-weights a lab value in the context of a comorbidity profile common in Indian populations), it repeats the error in every one of those encounters that fits the pattern. In identical language. With identical certainty. With no peer to disagree, because every clinician using the tool is hearing the same wrong answer, in the same authoritative voice, at the same time.
We have not, as an industry, as regulators, or as a community, priced this in.
The Asymmetry Is Not New, But Its Scale Is
The general shape of this problem is not novel. Anything mass-produced (pharmaceuticals, automobiles, infant formula) can produce concentrated harm at scale when something is wrong with the master copy. We developed regulatory regimes for these industries precisely because the asymmetry between individual and mass production demanded a new kind of oversight.
AI is now a mass-production technology for cognitive output. A single model can generate hundreds of millions of confidently stated outputs per day. The cost of producing a confidently wrong sentence is identical to the cost of producing a correct one. The model does not pause, does not hedge unless trained to, and does not know when it is wrong.
The Indian Surface Area Is Larger
Several features of the Indian deployment landscape make the cost of confident-wrong more acute here than elsewhere.
First, the user base is often less able to push back. A farmer receiving an AI-generated recommendation about pesticide use is in a worse position to second-guess the system than a Western consumer receiving an AI recommendation about a streaming service. The asymmetry of expertise and authority is steeper in many Indian contexts, which means the model's confident wrongness is more likely to translate into action.
Second, the languages and contexts are under-represented in the training data of the major models. The probability of subtle systematic error is higher. This is not a moral failing of the labs. It is a structural feature of where the data has historically been. The prior probability of confident-wrong output in an Indian deployment is higher than the published benchmarks would suggest.
Third, the institutional capacity to detect, redress, and recover from large-scale AI errors is thinner in India than in more mature regulatory regimes. The RBI has begun engaging seriously with AI risk in financial services. SEBI is paying attention in markets. State and central health regulators are beginning to ask questions about AI in clinical settings. None of this yet adds up to the kind of mature recall, redress, and post-market surveillance infrastructure that exists for pharmaceuticals.
What Pricing It In Looks Like
Pricing in confident-wrong means treating each AI deployment as carrying a contingent liability equal to the harm at scale of the worst-case scenario, discounted by its probability. This is not a number anyone wants to write down. It is the number that ought to drive evaluation investment, deployment caution, and the design of human-in-the-loop fallbacks.
For a clinical decision support tool deployed across two hundred hospitals, this number is enormous, and it justifies investing in the kind of behavior logs, red-teaming, eval-driven development, and community quality signal that the rest of this body of work has argued for. None of those activities are cheap. All are cheaper than the contingent liability.
For a customer support chatbot at a fintech, the number is smaller but non-zero. Confident-wrong financial advice produces measurable harm, and at scale produces measurable regulatory exposure. The RBI's increasing scrutiny of AI in credit decisions is the early warning signal of this asymmetry being recognised at the institutional level.
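The contingent-liability framing above can be sketched in a few lines of Python. This is a rough illustration, not a valuation method: the class name, the field names, and every number except the hundred thousand encounters a month are hypothetical placeholders, and harm is expressed in an arbitrary unit that the deployer would have to define.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One named confident-wrong failure mode (all fields are illustrative)."""
    name: str
    probability: float          # assumed chance the failure occurs at all
    affected_fraction: float    # assumed share of encounters matching the pattern
    encounters_per_month: int   # deployment volume
    harm_per_case: float        # harm per affected case, in any consistent unit
    months_to_detect: float     # assumed time the error runs before it is caught


def contingent_liability(fm: FailureMode) -> float:
    """Harm at scale if the failure occurs, discounted by its probability."""
    harm_at_scale = (
        fm.encounters_per_month
        * fm.affected_fraction
        * fm.harm_per_case
        * fm.months_to_detect
    )
    return fm.probability * harm_at_scale


# Hypothetical inputs for the clinical example; only the encounter volume
# comes from the essay, the rest are made-up placeholders.
lab_value_error = FailureMode(
    name="mis-weighted lab value under a common comorbidity profile",
    probability=0.05,
    affected_fraction=0.02,
    encounters_per_month=100_000,
    harm_per_case=1.0,
    months_to_detect=6.0,
)
print(f"{lab_value_error.name}: {contingent_liability(lab_value_error):.0f} harm units")
```

The useful part is less the output than the act of filling in the fields: a deployer who cannot estimate the detection time or the affected fraction, even crudely, has learned where the evaluation investment should go first.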
The Calibration Problem
A model that says "I am not sure" when it is not sure is enormously more useful than a model that says "the answer is X" with identical confidence whether it knows or not. Calibration, the alignment between expressed confidence and actual probability of correctness, is one of the harder things to achieve. Most production deployments ship with poorly calibrated confidence, because calibration is hard to measure and harder to fix, and saying "I don't know" feels like a worse product.
The product instinct is wrong. The deployer's loyalty has to be to the user, not to the demo. A model that admits uncertainty is a model that has not yet produced a confident-wrong output, and not producing the confident-wrong output is, in expected-value terms, almost always the right answer.
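One widely used way to put a number on calibration is expected calibration error (ECE): group answers by the confidence the model expressed, then compare each group's average confidence with how often it was actually right. The essay does not prescribe this metric, so the sketch below is one possible measurement under stated assumptions: the model exposes a confidence between 0 and 1 for each answer, and a labelled evaluation set exists. The function name and the sample data are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean gap between accuracy and confidence."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Group each (confidence, was_correct) pair into an equal-width bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by the share of predictions that fall in it.
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Hypothetical eval set: the model claims 90% confidence on ten answers
# but only five are right, so it is overconfident.
claimed = [0.9] * 10
was_correct = [True] * 5 + [False] * 5
print(f"ECE = {expected_calibration_error(claimed, was_correct):.2f}")
```

In this made-up case the gap is about 0.4: the model claims 90% confidence and is right half the time. A well-calibrated model would score close to zero. Note that ECE needs ground-truth labels from the deployment context, which is exactly the evaluation investment the essay argues for.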
The Action
If you deploy AI, name the worst confident-wrong failure mode your system can produce. Write it down. Estimate, even crudely, the harm at scale if it occurs. Then ask whether your current investment in evaluation, calibration, and human fallback is consistent with that estimate. For most Indian deployments today, the honest answer is no. The gap between the contingent liability and the evaluation investment is the bill being written, in silence, that no one has yet sent. Send yourself the bill, voluntarily, and start paying it now. The alternative is to wait for someone else to send it, on terms you will not get to negotiate.