Govt making AI-ready data for LLMs; 288 datasets standardised to fix silos: MoSPI Secy Saurabh Garg

MoSPI Secretary Saurabh Garg said the government is undertaking a data harmonisation exercise across ministries, standardising 288 priority datasets that are important from an economic and social perspective.

Published on: 05 Jun 2026, 12:48 pm

New Delhi: To ensure that AI models do not rely on non-credible sources for government data, the Ministry of Statistics and Programme Implementation (MoSPI) has upgraded its official data portal to be directly readable by large language models (LLMs), a senior official said on Friday.


MoSPI Secretary Saurabh Garg said the government is undertaking a data harmonisation exercise across ministries, standardising 288 priority datasets that are important from an economic and social perspective.

Speaking on the transition towards an "intelligence infrastructure", he said the ministry has recently added a Model Context Protocol (MCP) wrapper layer around its portal. MCP is an open standard that lets AI applications connect to external data sources and tools, and this upgrade allows LLMs to directly access and process official statistics.

"If the models don't get easy access to credible data, there'll be some other data filling up the gap," Garg said at an NCAER event here, noting that the ministry is among the first globally to implement MCP on government data to ensure AI models have access to trustworthy information.

However, Garg highlighted that the foundational challenge for AI in India is semantic interoperability: ensuring that AI systems can understand the context and classifications of data across different departments.

Illustrating the problem of siloed information, he pointed out that five different ministries use five different definitions of what constitutes a "pakka" (permanent, solidly built) house.

"I think where we need to work more is on the semantic interoperability, so that AI systems can understand the context of the definitions and the classifications. And this is extremely important because if a definition of any concept in two systems is different, then those two systems cannot talk to each other," Garg explained.


To resolve these discrepancies, the government has identified 288 datasets across ministries and is standardising their metadata. Officials are using 38 different types of identifiers and 88 international classifications to ensure the data is FAIR: Findable, Accessible, Interoperable and Reusable.


Garg emphasised that the ultimate aim of putting harmonised government data in the public domain is to improve public service delivery. He noted that integrated datasets are already enabling state governments to identify beneficiaries and roll out welfare programmes within weeks of their announcement, a process that previously took a year or more, while significantly reducing leakages.

(PSU Watch is India's business news centre that places the spotlight on PSUs, bureaucracy, defence and public policy.)
