If you walked into any Indian tech company today and asked who their lead AI evaluator was, in most cases nobody would have a name to give you. The role barely exists as a profession. There are, by generous estimate, fewer than five thousand people in India who can credibly evaluate the behaviour of a modern AI system in the strict sense: design test suites that catch hallucinations, measure bias across demographic slices, detect drift over time, write red-team prompts that find the model's weaknesses, and produce evidence-grade reports that hold up under audit. The country has four million people writing code. The asymmetry is staggering.
This is not a minor staffing issue. It is the single largest bottleneck for trustworthy AI in India, and it is invisible because evaluators do not generate news. The model that gets released gets a press cycle. The team that quietly rejected three previous versions of the same model because the evaluation flagged unsafe behaviour gets no press cycle, because nothing happened. The most important AI workforce in the country is the one whose job is to make sure things don't happen.
Why this profession barely exists
Several reasons explain why evaluation has stayed small even as model-building has exploded. The first is incentive: companies that ship models get rewarded. Companies that evaluate them get cost-allocated. The second is visibility: a model launch is a marketing event; an evaluation report is an internal document. The third is the strange composition of the work itself, which sits between machine learning, statistics, domain expertise, and quality engineering, and which almost no degree program teaches as a coherent discipline.
The fourth, and most important, is that until very recently, evaluation was not strictly necessary. Companies could ship models, watch what happened, and iterate. As long as the failure modes were minor (a wrong product recommendation, a fumbled translation), the cost of skipping serious evaluation was low. AI's move into high-stakes domains has changed this calculus. A wrong medical diagnosis, a wrong legal interpretation, or a wrong agricultural recommendation can have consequences that no amount of post-hoc patching can undo. Evaluation has become structurally necessary in a way it was not three years ago.
What an AI evaluator actually does
A working evaluator does several things that most engineers do not. They write structured prompt sets that probe specific failure modes. They design behavioural tests that distinguish "the model says the right thing on the easy case" from "the model holds up on the adversarial case." They construct demographic and contextual slices (different ages, languages, registers, locations) and measure how the model performs across each. They track drift over time as the underlying model or its environment changes. They write up the findings in language that engineering, product, and policy stakeholders can each act on. They escalate when the model should not ship.
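To make the slicing and drift-tracking steps concrete, here is a minimal sketch in Python. It assumes each test case has already been graded as pass or fail and tagged with slice attributes such as language. The function names, the record layout, and the thresholds (a 10-point gap between slices, a 5-point drop between runs) are all hypothetical choices made for illustration, not a standard.

```python
from collections import defaultdict


def pass_rate_by_slice(results, slice_key):
    """Return the fraction of passed test cases for each value of slice_key.

    `results` is a list of dicts, each with a boolean "passed" field and
    one or more slice attributes (hypothetical layout, for illustration).
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for record in results:
        value = record[slice_key]
        totals[value] += 1
        if record["passed"]:
            passes[value] += 1
    return {value: passes[value] / totals[value] for value in totals}


def flag_gaps(rates, max_gap=0.10):
    """List slices whose pass rate trails the best slice by more than max_gap."""
    best = max(rates.values())
    return sorted(k for k, rate in rates.items() if best - rate > max_gap)


def flag_drift(baseline_rates, current_rates, tolerance=0.05):
    """Return the change in pass rate for slices that dropped by more than tolerance.

    Only slices present in both runs are compared.
    """
    return {
        k: current_rates[k] - baseline_rates[k]
        for k in baseline_rates
        if k in current_rates and baseline_rates[k] - current_rates[k] > tolerance
    }


# Illustrative, made-up graded results for one evaluation run.
sample_results = [
    {"language": "en", "passed": True},
    {"language": "en", "passed": True},
    {"language": "hi", "passed": True},
    {"language": "hi", "passed": False},
    {"language": "ta", "passed": False},
    {"language": "ta", "passed": False},
]

rates = pass_rate_by_slice(sample_results, "language")
lagging_slices = flag_gaps(rates)
```

In practice the slices would be far larger than two cases each, and a careful evaluator would report confidence intervals alongside the raw rates. The point of the sketch is only that "measure across slices" and "track drift" reduce to comparisons that can be written down, rerun, and audited.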
Done well, this is one of the most intellectually serious roles in any AI company. It requires deep familiarity with the model, fluency in the failure modes, and the discipline to write down what was tested, what was found, and what was decided. Done badly, it is bureaucratic theatre. The difference between the two is almost entirely about whether the evaluator has the standing and the budget to actually delay a launch.
India is structurally well-positioned, if it chooses to be
India has, on paper, every structural advantage for becoming the world's evaluation hub. A large, English-fluent, technically skilled workforce. A culture of detail-oriented quality assurance work, built up over thirty years of IT services. A multilingual context that exposes models to a wider distribution of behaviour than any single-language country can. A growing internal market that needs evaluation for its own AI deployments. A regulatory environment that is starting to take evaluation seriously in critical sectors.
The missing pieces are training pipelines and economic recognition. There is no widely recognized "AI evaluator" qualification. The job postings are inconsistent. The compensation, when the role exists at all, is often below that of a junior model engineer despite the seniority and judgment the work requires. Until evaluators are paid like senior engineers and trained like serious practitioners, the workforce will stay small.
What the training pipeline should look like
A serious AI evaluator pipeline would borrow from three traditions. From software QA, the discipline of structured test design, regression suites, and bug triage. From statistics and survey research, the rigour of slicing, controlling, and reporting. From domain expertise, the deep familiarity with the field where the model is being deployed, whether medicine, law, agriculture, or education. The evaluator is, in effect, a polyglot: not the world's best statistician, ML engineer, or domain expert, but fluent enough in all three to translate between them.
This kind of training does not happen by accident. It requires deliberate program design, mentorship from senior evaluators, and access to real systems with real stakes. The reason fewer than five thousand Indians can do this work today is that fewer than five hundred Indians have had the apprenticeship that produces it. The first step is to fund the apprenticeship.
The opportunity for the next generation
For anyone in India looking at the AI hype cycle and wondering where to apply themselves, evaluation is the unambiguously underpopulated, structurally important, increasingly valuable seat. The job is harder than it looks. The compensation is rising. The work is, in a deep sense, important: every unsafe AI behaviour caught before it is deployed at scale is a small public good produced by an evaluator most people will never hear about.
This is one of several reasons that Eval.qa exists as part of the ecosystem alongside Bharath.CLUB and Sarasvat.ai. Eval.qa is the workplace and tooling layer for the evaluation profession. AI.Bharath.CLUB is the community where the profession can grow into itself. The combination is meant to do something that no government program could do alone: take a serious, underpopulated, important profession and give it a working culture in India.
The Bharath bet on evaluation
The bet is simple. The country that builds the world's strongest evaluation discipline will, by 2035, be the country whose AI deployments other countries trust. Trust, in AI, comes from evidence. Evidence comes from evaluation. Evaluation comes from evaluators. The whole chain runs through a profession that, today, India has barely started training. Closing that gap is the single highest-leverage workforce decision in front of the country. Bharath.CLUB is on the side of making that decision deliberately.