Digital health measures have the potential to transform medicine from reactively treating acute illness to proactively managing disease. This discussion will explore how existing real-world and digital health data can be reused to develop and validate digital biomarkers. It will also highlight regulatory and clinical considerations, and patient privacy issues.
Geoff: The first challenge when implementing digital health technologies is achieving the adoption and acceptance of digital biomarkers, which can determine whether someone is getting better or worse and if medical intervention is required. A significant part of the problem is that researchers lack some of the data needed to develop and validate those biomarkers. We believe this gap can be addressed by reusing clinical data that is being generated already and by connecting it to real-world evidence.
Daniel: For digital biomarker development and validation to progress, three core tensions need to be addressed.
Reality is too complex for human brains to understand, so we employ heuristics and comprehensible models as tools to help us interact with reality in meaningful ways. Our understanding of human biology is necessarily based on reductive models.
Arguably, the richness of human experience, and therefore the underlying basis for the fields of psychology and psychiatry, is partially contained in all the novels in human history. While one could gain an incredibly deep and rich understanding of people by reading novels, that level of richness does not make for useful diagnostic ontologies.
On the opposite end of the taxonomic spectrum are systems like the Diagnostic and Statistical Manual of Mental Disorders (DSM): highly specified, and therefore reusable, but without much in the way of descriptive richness. We must strike a similar balance with data collection strategies and remember that excessive focus on reusability sacrifices richness.
Clinical measures always need to be validated against something established to be clinically meaningful.
Concurrent criterion validation is the easiest to achieve because it requires only that an accepted, clinically meaningful measure of disease already exists, and that the new measure reaches the same conclusion when assessing the same person at the same time.
But there is a much more critical type of validation – predictive clinical validation – which takes much longer to achieve. With predictive clinical validation, there is an established clinical state, and the new measure provides data about the likelihood of that clinical state occurring. Essentially it pushes state identification backward in time, ultimately defining a novel clinical state that is predictive of a known clinical state.
For example, by measuring blood pressure repeatedly in a standardized way in clinical settings in many people many times, and subsequently watching what happened to them clinically, researchers could recognize and define the clinical state of hypertension. Then by reducing a person’s blood pressure—treating their hypertension—it was possible to reduce the likelihood of negative clinical outcomes, now newly known to be consequences of hypertension, predominantly damage to end organs: the heart, the brain, the kidneys, etc.
However, with the more recent introduction of high-quality at-home blood pressure monitoring equipment, it became apparent that episodic, in-office sphygmomanometry was not as reliable or valid a measure of real-world blood pressure as initially thought. That episodic approach had inadvertently introduced measurement error, or measurement bias. As a result, hypertension was likely both under- and over-diagnosed. In essence, some people who did not need treatment were treated, and some who did were not.
For digital health data to be predictively clinically validated, researchers must determine how to take the same measurement in the same way in many people and then share that data between them in a pre-competitive or non-competitive environment.
Geoff: Can you use raw or minimally processed sensor data?
James: We cannot directly access totally raw sensor data. Software that controls sensors typically filters the data to improve its stability or to remove outliers. This can happen at the firmware level, or sometimes the sensor data goes through a summative process before it is transmitted to the host computer.
That said, the ideal scenario with any type of measurement is for the data not to have a filter applied to it, because filtering reduces the granularity of the data. Sometimes the filtering process creates summary information, reduces outliers, and tries to improve the data, to make our lives as consumers a bit easier, and to make the sensor data easier to work with.
In doing so, filtering reduces the data's usefulness. The sensors become functionally standalone units: we cannot compare the data from one sensor against another because we do not know what algorithms have been applied to each. Comparing data across future research studies becomes impossible.
We want to get as close to the raw information as possible. We do not want anything to be applied to the sensor data, and for it to be transmitted as cleanly as possible.
We employ Euclidean Norm Minus One (ENMO), a standard metric of physical activity, which we compute with the open-source GGIR algorithm. But we also use the six-minute walk test, and are beginning to look at maximum six-minute activity.
Measuring the maximum amount of movement that happens within a specific time, whether that is an hour, a day, or a week, is shaping up to be a better way of detecting light and sedentary movement.
One of the advantages of using an open-source algorithm, such as GGIR, is that everyone can process their data with the same algorithm, so results can be independently verified no matter what sensor or device is being used.
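GGIR itself is an R package, but the ENMO metric it computes is simple to state. As a minimal, illustrative sketch (not the GGIR implementation itself), ENMO takes the Euclidean norm of the three accelerometer axes in g units, subtracts the 1 g contribution of gravity, truncates negative values at zero, and averages within fixed epochs:

```python
import numpy as np

def enmo(acc, fs, epoch_s=5):
    """Epoch-averaged ENMO from raw tri-axial accelerometer data.

    acc: (n, 3) array of acceleration in g units
    fs: sampling frequency in Hz
    epoch_s: epoch length in seconds
    """
    # Euclidean norm of the three axes, minus 1 g of gravity,
    # truncated at zero (a common convention for this metric)
    magnitude = np.sqrt((acc ** 2).sum(axis=1))
    e = np.maximum(magnitude - 1.0, 0.0)
    # average within fixed, non-overlapping epochs
    samples_per_epoch = int(fs * epoch_s)
    n_epochs = len(e) // samples_per_epoch
    return e[: n_epochs * samples_per_epoch].reshape(n_epochs, -1).mean(axis=1)

# a perfectly still sensor reads ~1 g on one axis, so ENMO is ~0
still = np.tile([0.0, 0.0, 1.0], (1000, 1))
print(enmo(still, fs=100))  # -> [0. 0.]
```

Because the metric is defined on near-raw acceleration values, any group with access to the unfiltered samples can reproduce it independently of the device vendor.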
Being able to reuse data is very helpful. Whenever we use any type of hardware, we need to validate its accuracy and reliability. If we could gain access to a global repository of raw data that could save us a lot of time and effort in validation.
As it is, we must spend months applying for ethical approval, comparing our device with the state-of-the-art device, then validating the data, and measuring what is an acceptable level of error before we can begin our tests.
Why could we not use existing inertial measurement unit data from a repository, whether it be from a chronic obstructive pulmonary disease study or from patients who have had a stroke and are being tested for rehab?
William: While I was Chief Scientific Officer at OptumLabs, we built a massive real-world database containing more than 100 million covered lives of claims data and 85 million lives of electronic medical record data. These data were also linked to patient mortality, social demographics, and county-level data from the Area Health Resource File from the Agency for Healthcare Research and Quality (AHRQ).
Academics around the country were keen to analyze that data to generate evidence. Real-world datasets can generate evidence much faster than randomized clinical trials (RCTs).
RCTs are valuable but observational data can generate very credible, useful information when appropriate methods are used to analyze it.
There is also a lot of interest from regulators, especially since the passage of the 21st Century Cures Act which requires the FDA to develop a framework for incorporating real-world evidence into regulatory decision-making about new indications for previously approved treatments, as well as safety surveillance. The FDA has been involved in safety surveillance for a long time with the Sentinel Network, which is a massive national claims database that it uses to evaluate the safety of existing products on the market. But using real-world data to assess new indications for previously approved products is new.
The FDA has shifted its mindset from focusing solely on evidence from randomized trials to considering what role real-world data can play in regulatory decision making about new indications and safety surveillance for previously approved products. The Agency is now considering whether it is possible to generate causal inference like it would have with a randomized trial from observational data.
There is a very large national study out of Harvard called RCT Duplicate, which is emulating 37 randomized trials, 30 of which have been completed. The study uses claims data to emulate the inclusion/exclusion criteria, endpoints, and follow-up periods of those trials.
Randomized trials are also conducted of products that are on the market to test their real clinical effectiveness. Observational studies are being run to predict what the results will be from those trials. It has been remarkable to look at the evidence and the degree of agreement between those observational studies and randomized trials.
William: With FDA-approved products, such as glucose monitors, there is a time lag between regulatory approval and the entry of an approved Centers for Medicare & Medicaid Services (CMS) code into the databases.
Without that code, researchers cannot identify what device was used to treat the patient. That initial coding issue places a major limitation on the analysis of digital device data by academic researchers.
Geoff: How do we improve the system? Without access to raw or minimally processed sensor data, we do not have digital biomarkers, hence the product cannot be used as a medical device, so it is not going to get a CMS code, which means we can’t use the data.
William: There are two sides to that issue. The FDA is a strong proponent of accessing data as close to the source and in as original a form as possible. The FDA wants to know the source and provenance of data before it is included in commercially available databases.
But in cases where a device is not going to be used as a treatment but as a source of data on patient outcomes, we need to link the device data to other data to put the results into context.
Researchers face challenges with linking datasets due to HIPAA regulations, which are designed to minimize the reidentification risk to patients. The more granular the data gets and the rarer the condition the patient has, the more information researchers must relinquish to continue to mask that patient’s identity.
Dan: Pharmaceutical companies are involved in several collaborative, pre-competitive projects, including WATCH-PD, a Parkinson’s disease study, and TransCelerate’s digital biomarker effort. CTTI is also developing policy papers and research best practices to make data more interoperable and integrated. The Digital Medicine Society (DiMe) is working on a large number of collaborative measure development projects, including a nighttime itch-and-scratch quantification project that uses wrist-worn accelerometry to analyze scratching behavior. DiMe is also involved in direct risk factor research and establishing best practices across several domains in digital medicine.
William: Life sciences companies have been conducting secondary analyses of claims data for a long time to understand the burden of an illness better and support the commercialization of their products. Pharmaceutical companies analyze large claims datasets to determine the comparative effectiveness of their products and support their value propositions around pricing.
Geoff: Researchers should use minimally processed data, which is as close to the sensor as they can get, and can be independently verified. They should also identify a mechanism upfront that will allow them to share that data.
What advice would you give researchers who want to reuse real-world data?
James: They need to understand how the data was collected. For example, if they are going to do a 40-meter walk test, then they need to define the parameters they will use to collect that data. If we do not all use the same protocol, the data collected is going to be different, even though the test being administered is the same. For example, the 40-meter walk test could be run as 1 x 40 meters or 8 x 5 meters, and that choice changes the shape of, and underlying patterns in, the data.
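To make the protocol point concrete, here is a toy sketch with hypothetical parameters (the walking speed and per-turn cost are assumptions, not values from any study): the total distance is identical under both protocols, but each 180-degree turn in the 8 x 5 m layout adds a deceleration and re-acceleration that changes both the completion time and the accelerometer trace.

```python
# hypothetical parameters, purely for illustration
WALK_SPEED_MPS = 1.3   # assumed steady walking speed, m/s
TURN_COST_S = 1.5      # assumed time lost per 180-degree turn, s

def walk_time(total_m: float, lap_m: float) -> float:
    """Estimated completion time for a walk test split into laps of lap_m."""
    turns = int(total_m / lap_m) - 1  # 1 x 40 m has 0 turns; 8 x 5 m has 7
    return total_m / WALK_SPEED_MPS + turns * TURN_COST_S

print(round(walk_time(40, 40), 1))  # 1 x 40 m: no turns
print(round(walk_time(40, 5), 1))   # 8 x 5 m: 7 turns, so a longer time
```

Two datasets labeled "40-meter walk test" can therefore disagree systematically if the lap structure was never recorded alongside the data.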
William: The 21st Century Cures Act has really changed the game in terms of the FDA’s interest in real-world data. Regulatory authorities around the world are interested in this now, as are health technology assessment groups that make drug coverage, pricing, and reimbursement decisions.
There is a lot of existing data infrastructure on general patient populations in the United States and elsewhere. But we need sensor and device data to be brought together in one place where we can link it to those existing databases and bring in information about the patients, their diseases, and comorbidities to provide context.
From a technical standpoint, it is an easy problem to solve. If the same identifiers are present in both datasets, we can link them without exchanging names, addresses, or Social Security numbers. We can simply hash the identifiers in both places and join on the hashed IDs. The result is a deidentified dataset that includes the sensor and device data.
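A minimal sketch of that linkage pattern, with entirely hypothetical identifiers and values. A keyed HMAC is used rather than a bare hash so that the small space of plausible patient IDs cannot be brute-forced without the shared key; note that real-world deidentification also has to satisfy HIPAA requirements, which this sketch does not address.

```python
import hashlib
import hmac

# hypothetical shared secret; both data holders must use the same key
KEY = b"study-specific-secret"

def hash_id(patient_id: str) -> str:
    # one-way keyed hash: the linked dataset never carries the raw identifier
    return hmac.new(KEY, patient_id.encode(), hashlib.sha256).hexdigest()

# hypothetical device data held by one party ...
device = {hash_id(pid): steps for pid, steps in [("A17", 5400), ("B02", 9100)]}
# ... and hypothetical claims data held by another
claims = {hash_id(pid): dx for pid, dx in [("A17", "COPD"), ("C33", "stroke")]}

# the deidentified join: only hashed IDs are ever exchanged
linked = {h: (device[h], claims[h]) for h in device.keys() & claims.keys()}
print(len(linked))  # -> 1 (only patient "A17" appears in both datasets)
```

The same join works at scale with any database engine; the essential design choice is that hashing happens independently at each party before any data leaves its source.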
The first big challenge is to centralize the device and the sensor data so that we can link it to other datasets.
James: We also need a reliable software system to manage the connections between those datasets. Amalgamating data from several sensors can be a nightmare, and we need to ensure that every amalgamation happens accurately.
Dan: Be proactive regarding informed consent in research, and get permission to share data, even at an identifiable level. Go as far as your IRB will let you because I have not yet met a research participant who says “no” when asked, "Can we share your data to advance science as much as possible from your participation?"
Pressure test your assumptions about the boundaries which you consider to be the edges of a research question. Stop doing concurrent criterion validation against crappy legacy standard measures. If you find yourself replacing something low burden, quick to complete, not very useful, and free, such as the Patient Health Questionnaire-9 (PHQ-9), with something expensive, proprietary, and time-consuming, you are probably making a mistake.