A doctor in Aurangabad consults an AI assistant about a patient with persistent unexplained fever. The AI confidently lists a differential diagnosis built around viral causes common in temperate climates. It does not surface scrub typhus, which is the single most likely diagnosis in that district in monsoon season. The doctor knows this. The AI does not, because the corpus it was trained on did not weight Indian infectious disease surveillance data appropriately.
A young lawyer in Indore drafts a tenancy notice using an AI tool. The notice cites a precedent that was relevant in 1991 but was overturned in 2003. The AI did not know about the overturning because the Indian case law archive it had access to stops at the year a particular paywall went up.
A farmer in Marathwada asks, through a voice interface in Marathi, whether to plant tur or soyabean this season. The AI gives a confident answer based on global commodity price models. It does not know that the local market yard is a four-hour bullock-cart ride from his village and that the closest tur procurement centre has not paid farmers on time for two seasons.
None of these failures are model failures in the narrow sense. They are commons failures. The reference material that an Indian-aware model would need to consult does not exist as accessible, structured, curated knowledge.
What a commons actually is
A knowledge commons is not a dataset. It is a shared, maintained, governed body of domain knowledge with clear provenance, accountable curators, and rules for how it gets updated. The Indian Penal Code, in its raw textual form, is not a commons; it is a document. A commons would be the IPC, plus authoritative commentary, plus the structured graph of which sections have been amended, repealed, or judicially reinterpreted, plus the canonical citations, plus the working understanding of district-level prosecutors about how each section is applied in practice.
We have document repositories. We do not have commons. The difference is curation, governance, and the willingness to say which version is authoritative.
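To make the distinction concrete, here is a minimal sketch of what one entry in a legal commons might look like, as opposed to a bare document. Every name in it (Section, Event, Status, status_on) is hypothetical, and the sample data is a placeholder rather than a real statute. The point is only that each change carries a date, a source, and a named curator, so the record can say which version is authoritative on a given day.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class Status(Enum):
    IN_FORCE = "in_force"
    AMENDED = "amended"
    REPEALED = "repealed"


@dataclass
class Event:
    """One change to a section, with the provenance a curator would record."""
    on: date
    status: Status
    source: str   # hypothetical citation for the amending act or ruling
    curator: str  # hypothetical name of whoever vouched for this entry


@dataclass
class Section:
    code: str
    number: str
    text: str
    history: List[Event] = field(default_factory=list)

    def status_on(self, day: date) -> Status:
        """Return the section's status as of the given day."""
        current = Status.IN_FORCE
        for event in sorted(self.history, key=lambda e: e.on):
            if event.on <= day:
                current = event.status
        return current


# Placeholder data only: not a real section, amendment, or curator.
example = Section(
    code="EXAMPLE-CODE",
    number="000",
    text="(section text)",
    history=[
        Event(date(2010, 1, 1), Status.AMENDED, "placeholder amendment", "curator-a"),
        Event(date(2020, 1, 1), Status.REPEALED, "placeholder repeal", "curator-b"),
    ],
)
```

A plain document repository stores only the text field. The history, with its sources and curators, is the part that takes governance to maintain.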
The sectors that are screaming for one
Health is the most urgent. The ICMR, the National Centre for Disease Control, AIIMS, the state medical councils, and the various national programmes have generated immense amounts of clinical guidance. It sits in PDFs, behind logins, and in formats that no AI model has been trained on with appropriate weight. An Indian health knowledge commons would be the ICMR treatment guidelines, the National Formulary, the essential medicines list, the AYUSH-allopathy interaction notes, and the field-officer manuals from the National Health Mission, all structured, versioned, and queryable.
Law is the second. The Supreme Court reports, the High Court archives, the law journals, and the working notes of the law commissions are the canonical Indian legal commons. Today they exist as separate, partially digitized, mostly unstructured archives. A curated commons would let any AI distinguish current law from overturned law, district-specific procedure from national procedure, and statutory text from judicial commentary.
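The tenancy-notice failure described earlier suggests what such a check could look like. The sketch below is illustrative only: Precedent, check_citation, and the coverage_end parameter are hypothetical names, and CASE-A is a placeholder identifier, not a real citation. It assumes the archive records the year each case was overruled and, just as importantly, the last year the archive itself covers, so a tool can say "unverified" instead of silently treating stale law as current.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Precedent:
    case_id: str                        # hypothetical identifier
    decided: int                        # year decided
    overruled_in: Optional[int] = None  # year overruled, if the archive records one


def check_citation(archive: Dict[str, Precedent], case_id: str,
                   coverage_end: int, as_of: int) -> str:
    """Classify a citation as 'unknown', 'overruled', 'unverified', or 'current'."""
    case = archive.get(case_id)
    if case is None:
        return "unknown"
    if case.overruled_in is not None and case.overruled_in <= as_of:
        return "overruled"
    # The archive cannot vouch for anything after its own coverage ends.
    if as_of > coverage_end:
        return "unverified"
    return "current"


# A stale archive that stops in 2000 and a curated one that is kept current.
stale = {"CASE-A": Precedent("CASE-A", decided=1991)}
curated = {"CASE-A": Precedent("CASE-A", decided=1991, overruled_in=2003)}

stale_result = check_citation(stale, "CASE-A", coverage_end=2000, as_of=2024)
curated_result = check_citation(curated, "CASE-A", coverage_end=2024, as_of=2024)
```

Under these assumptions the stale archive yields "unverified" and the curated one yields "overruled". Neither result reports the 1991 precedent as current law, which is the minimum a lawyer needs from the tool.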
Agriculture is the third, and the most multilingual. Each Krishi Vigyan Kendra holds local crop calendars, soil reports, and extension notes in the regional language. ICAR research stations publish in English. The state agricultural universities publish in a mix of the two. None of this is currently a unified, accessible commons. The model that knows when to plant tur in Latur and when to plant it in Khargone is the model that has consulted a real commons; today no such model exists.
Education, urban administration, judicial procedure, MSME compliance, civil registration: every sector has the same shape. Documents exist. Commons do not.
Why nobody has built them
Three reasons. First, building a commons is institutionally awkward. It requires governments, professional bodies, and private experts to agree on what is authoritative, and Indian institutions are not always quick to align. Second, the work is unglamorous. Nobody gets a Padma award for curating the National Formulary into a structured graph. Third, the funding model is unclear. A commons is a public good; the returns accrue to everyone who builds on top of it, not specifically to the builder.
These are real obstacles. They are not unsolvable. The Wikipedia model showed that volunteer curation can produce serious reference material. The OpenStreetMap model showed that geographic commons can be built without state involvement. The Aadhaar and UPI experience showed that India can ship hard digital public infrastructure when there is alignment.
The work of the next three years
A serious Indian knowledge commons project needs three things. A clear scope: pick one sector, not all of them. A governance model: who decides what is authoritative, who can contribute, and who arbitrates disputes. And a sustainability plan: who pays for the curation labour, on what cadence, and how the commons stays current.
The first commons that ships well (say, an Indian primary-care medical commons, governed by a coalition of public health institutions and professional bodies and queryable through a stable API) will be the reference template for every other sector. It will also be the substrate on which the next ten useful Indian AI products are built.
If you are a domain expert with twenty years of practice, the most valuable thing you can do this decade is help build the commons for your field. If you are a founder, stop trying to be the next foundation-model company and start being the company that ships the first useful Indian commons. The compute layer is solved. The knowledge layer is not. That is where the work is.