Why India’s health data needs a booster shot

The search was futile. And not much has changed even today. As we grapple with the third wave of the covid-19 pandemic and the Omicron variant, the gaps are just as frustrating.

Today, the data on health indicators come from multiple sources. There is the National Sample Survey Organization, the National Family Health Surveys (NFHS), the Sample Registration System, disease registries, surveillance reports, the Annual Health Surveys in select states and a few others. However, this plethora of sources just does not give us the data we are looking for in most cases. What we have is a number of opaque data systems that operate in watertight compartments, making their interoperability difficult. These datasets are not shared even between ministries and certainly are not open sourced for use by analysts and commentators.

Consider this: the pandemic has impacted doctors, nurses and policemen disproportionately—they are the ones waging an ongoing battle against the virus. The health ministry, shockingly, declared it had no data on how many of our frontline health workers died. Oxygen supplies ran out and the entire country went through a harrowing time during the second wave. The answer, yet again, was sadly the same. There was, by the government’s own admission, no information on those who died waiting for oxygen supplies at intensive care units and hospital beds.

The paucity of information has several implications. Firstly, health officials are unable to determine the spread of covid-19 infections among different age groups, and therefore are not able to focus attention on the most vulnerable. Secondly, when looking at the death data, for example, it is just not possible to get age-wise or gender-wise numbers. This has resulted in a completely avoidable controversy. While the government declared 400,000 deaths due to covid-19, some very credible analysts, including the former chief economic advisor, Arvind Subramanian, in July 2021, declared that the deaths could be 10 times that number. The other fallouts are drug stockouts, vaccine supply shocks, overused ventilators and overcrowding at covid centres.

Why do we continue to suffer the lack of reliable, transparent, and integrated data? It is ironic given the changing nature of governance where we rely more and more on evidence to make new policy. While this is true of all sectors, nowhere is evidence more important than in healthcare.

The correct numbers would have helped in managing supplies, mortuaries and even cremation facilities during the second wave. In this third wave, where we are all expecting far fewer deaths, reliable data would have lessened the pressure on the healthcare staff and frontline workers while enabling our medical facilities to be better prepared when the peak arrives.

A design problem

There have been efforts made at data collection; money and resources have indeed been allocated. The problem is more in terms of design, transparency and confidence in releasing the information collected.

The Aarogya Setu is a great example of technology that should have given us all the data we needed to trace, track and monitor the spread of covid-19. When asked, the National Informatics Centre first replied that it had no information on who had built the app, denting the credibility of what could have been a great resource for epidemiologists. The government later clarified that it was built through a public-private partnership and was indeed well designed and protected. Even now, the data that it collects is not available to government departments, not even in an aggregated manner.

The government has declared its policy on using open-source software and has announced knowledge sharing protocols. However, all these systems with their complicated architectures are almost always inflexible in addition to being proprietary and expensive.

Then, there is the Health Management Information System, started 13 years ago. It collects enormous amounts of information but uses only a tenth of this to generate health indicators. In almost all sheets, half the fields are blank or are marked ‘not applicable’.

The big problem arises even after this data is collected as very often the denominators, such as age and gender, are not available. For example, it’s not enough to simply provide the numbers of those who have tested positive—like 10 out of 500 tested. What we need to know is the population at risk and how many of those are positive. If a large proportion of the 500 are young and vaccinated, 10 is a seriously high number.
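The denominator point can be made concrete with a small, purely illustrative calculation. The group names and counts below are hypothetical, invented for this sketch and not drawn from any surveillance data; they only show how a headline figure of 10 positives out of 500 tested can conceal very different rates once the tests are broken down by who was tested.

```python
# Hypothetical counts, invented for illustration only.
tests = {
    "young_vaccinated": {"tested": 450, "positive": 2},
    "elderly_unvaccinated": {"tested": 50, "positive": 8},
}


def positivity(positive, tested):
    """Share of tests that came back positive, as a percentage."""
    return 100.0 * positive / tested


# Headline figure: all groups pooled together.
total_tested = sum(g["tested"] for g in tests.values())
total_positive = sum(g["positive"] for g in tests.values())
overall = positivity(total_positive, total_tested)

# The same data, split by the group that was actually tested.
by_group = {
    name: positivity(g["positive"], g["tested"]) for name, g in tests.items()
}

print(f"overall: {overall:.1f}%")
for name, rate in by_group.items():
    print(f"{name}: {rate:.1f}%")
```

With these made-up numbers the pooled rate looks low, while the rate in the smaller, more vulnerable group is many times higher. Without the breakdown by age or vaccination status, that difference cannot be seen at all.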

Erroneous data is another serious concern, particularly with the data collected through surveys or administrative means. If a data set shows a far greater proportion of geriatric people in an area populated by youngsters, it needs to be corrected. However, in most cases, these errors are corrected at the central level and not at the point of data collection, leading to huge shifts in results. The numbers must be corrected at the hospital itself or by the enumerator who is collecting the field data, not by an analyst looking at millions of data points in Delhi.
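A check at the point of entry could be as simple as the sketch below. It is an assumption-laden illustration, not a description of any existing system: the function name `check_age_share`, the cut-off of 60 years, the tolerance and the expected share are all hypothetical, and a real check would use the local population profile.

```python
def check_age_share(ages, expected_share_60_plus, tolerance=0.15):
    """Return a warning string if the share of people aged 60+ in this batch
    differs from the expected share by more than `tolerance`, else None.

    All thresholds here are hypothetical and chosen only for illustration.
    """
    if not ages:
        return "empty batch: nothing to check"
    share = sum(1 for a in ages if a >= 60) / len(ages)
    if abs(share - expected_share_60_plus) > tolerance:
        return (
            f"share aged 60+ is {share:.0%}, expected about "
            f"{expected_share_60_plus:.0%}: please re-check before submitting"
        )
    return None


# Hypothetical entries keyed in by an enumerator in a mostly young area.
batch = [67, 71, 65, 80, 23, 62, 75, 69, 30, 64]
print(check_age_share(batch, expected_share_60_plus=0.10))
```

The design choice is that the warning goes back to the person who entered the figures, who can still verify them on the spot, rather than being patched silently at the centre.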

The issue of collecting lots of irrelevant data is compounded by collecting the same data more than once. When the same data is collected multiple times on different platforms, it confuses the health workers, the data collection agents and the surveyed population itself. Different sources give out varying numbers, and allocating budgets becomes that much more confusing. The best example is India’s TB data sets. There are multiple organizations that collect TB data. While by one estimate India has 10 million cases, a second study pegs the number at 3 million.

Meanwhile, the private sector plays a major role in healthcare today: nearly 75% of all illnesses are treated in the private sector, in both rural and urban centres. The same is the case for all outpatient care. Even for inpatient care, the proportion that the private sector treats is close to 70%. This means that at least two thirds of all data is with non-state actors. However, barring a few minuscule exceptions, none of this data is ever notified or reported.

Old, new demands

How have our surveys done? Surveys in a large population suffer from sampling issues, and the NFHS-4 (2015-16) is a great example of a survey that suffered because, in some states, the sample used was too small. The same is the case with the National Sample Surveys.
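Why a small state sample hurts can be shown with the textbook margin-of-error formula for a proportion. The sketch below assumes simple random sampling, which is a simplification: large household surveys use clustered, multi-stage designs, so their true margins are generally wider. The sample sizes are hypothetical and are not the NFHS figures.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from
    n respondents, assuming simple random sampling (a simplification)."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample sizes; p = 0.5 is the worst case for a proportion.
for n in (100, 400, 1600, 6400):
    moe = margin_of_error(0.5, n)
    print(f"n={n}: +/- {100 * moe:.2f} percentage points")
```

Because the margin shrinks only with the square root of the sample size, halving the uncertainty requires roughly four times as many respondents, which is why an undersized state sample yields estimates too imprecise to guide policy.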

We need to collect and disseminate routine administrative data, the cheaper and more reliable form of collecting information, right near the point of data collection. Imagine all that we could have achieved if we had regular sets of data provided by the health ministry. This is data that is available in most countries.

The first is an old demand. Weight at birth for all children must be recorded and entered into the birth certificate. It will allow us to see what happens to our children as they grow up and would lead to a lowering of our under-five mortality rate, the highest in the world today among developing nations. Similarly, the cause of death must be clearly mentioned in all death certificates.

In the context of covid-19, most researchers, and now most of the citizenry, would like to see regular, periodic results of the genomic sequencing that is being done in a small number of cases. We should be able to get a daily update on tests done and on hospitalization rates. We also need daily information on infections and reinfections in hospitals. All this would allow us to project the number of beds required and hospitalizations needed over the next week and more.

The way forward

The roll-out of the National Digital Health Mission (NDHM), in September 2021, was indeed a step in the right direction—NDHM started with a vision to improve the efficiency, effectiveness, and transparency of health service delivery. It may enable an integrated digital database for healthcare in India; the data disseminated may allow public policy to be shaped. However, the NDHM will be successful only if the system allows for a transparent collection and disbursal of the data collected. It needs to include private care and community based hospital services—they would provide most of the information given their reach.

All stakeholders must know that data is being collected to be used for policy purposes. That was the secret behind the success of data collection when it came to the National Rural Employment Guarantee Act (NREGA), which aims to guarantee the ‘right to work’. All users realized that the data would be used for budgeting and monitoring of performance. The collection and dissemination became real time, and the database was made accessible to all: researchers, panchayat heads, state governments and the central government.

Similarly, health data should be viewed as useful. The Health Management Information System (HMIS), a web-based information system started by the ministry of health and family welfare, captures service delivery data (reproductive, maternal and child health related, immunization, family planning, etc) on a monthly basis. However, for the data enumerator, the end result is not clear and therefore the collection is done lackadaisically. It is important that the entire healthcare system sees this as a useful exercise that feeds into decision making. If the data remains hidden behind various firewalls and is inaccessible, its usefulness will be questioned.

The other way forward is to decentralize the ownership of data. State governments must take pride in collecting and spreading information, as is done with casual workers and card holders under NREGA. The panchayat then also starts taking pride in keeping the data ready.

Meanwhile, private hospitals and diagnostic centres should be incentivized and encouraged to share information, register cases and report infections. Some private data aggregators could also be used. This data must be available in the public domain, open to researchers and to all those who are interested in the subject.

A great example of this is the way in which the NFHS-3 (2005-06) data was disseminated openly, despite causing embarrassment to the incumbent government. NFHS-3 showed how India’s dizzying economic growth had left a large number suffering from malnutrition. The data then forced the union and the state governments to take nutritional planning seriously and correct the many mistakes made in food policy.

Now, we need a new data policy that enables our public funded information to be accessed easily. We have enough examples of these from Israel, the UK and most of Europe, where real time data helped avert deaths. This policy must also ensure that data privacy is respected and theft is simply not tolerated. Here, it is important to point out that the Digital Information Security in Healthcare Act (DISHA) has been passed and needs to be tightened further.

Lastly, robust health data collection mechanisms are now possible with the easy availability of technology and the spread of bandwidth across the country. Real time data monitored by GPS tools can be collected and verified immediately. That may pave the way for quick decision making. For example, in India’s procurement policy, it will ensure we don’t ever run out of drug supplies, oxygen cylinders and personal protective equipment, or PPE kits.

(Amir Ullah Khan teaches at the MCRHRDI and the Indian School of Business, and Saleema Razvi is a senior economist at the Copenhagen Consensus Centre.)
