Using Existing Data to Create and Validate Digital Biomarkers

Geoffrey Gill, MS, Founder and CEO, Verisense Health

Daniel R. Karlin, MD, MA, Chief Medical Officer, MindMed

James Connolly, PhD, Lecturer in Computing at Atlantic Technological University, Co. Donegal

William Crown, PhD, MA, Distinguished Scientist at Brandeis’ Heller School for Social Policy and Management

Digital health measures have the potential to transform medicine from reactively treating acute illness to proactively managing disease. This discussion will explore how existing real-world and digital health data can be reused to develop and validate digital biomarkers. It will also highlight regulatory and clinical considerations, and patient privacy issues.

Digital Biomarkers

Geoff: The first challenge when implementing digital health technologies is achieving the adoption and acceptance of digital biomarkers, which can determine whether someone is getting better or worse and if medical intervention is required. A significant part of the problem is that researchers lack some of the data needed to develop and validate those biomarkers. We believe this gap can be addressed by reusing clinical data that is being generated already and by connecting it to real-world evidence.

The Challenge

Daniel: For digital biomarker development and validation to progress, three core tensions need to be addressed.

1. Reality Versus Models of Reality

Reality is too complex for human brains to understand, so we employ heuristics and comprehensible models as tools to help us interact with reality in meaningful ways. Our understanding of human biology is necessarily based on reductive models.

2. Richness Versus Reusability

Arguably, the richness of human experience, and therefore the underlying basis for the fields of psychology and psychiatry, is partially contained in all the novels in human history. While one could get an incredibly deep and rich understanding of people by reading novels, that level of richness does not make for useful diagnostic ontologies.

On the opposite end of the taxonomic spectrum are things like the Diagnostic and Statistical Manual of Mental Disorders (DSM), a highly specified and therefore reusable ontological system but without much in the way of descriptive richness. We must consider a similar balance with data collection strategies and remember that excessive focus on reusability sacrifices richness.

3. Validated Against What?

Clinical measures always need to be validated against something established to be clinically meaningful.

Concurrent criterion validation is the easiest to achieve because it requires only that an accepted, meaningful measure of disease exists and that the new measure comes to the same conclusion when assessing the same person at the same time.
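
As a rough illustration of concurrent criterion validation, the sketch below compares paired readings from an established measure and a new measure taken on the same people at the same time. The readings are invented, and summarizing agreement with a Pearson correlation plus a mean bias (as in a Bland-Altman analysis) is an assumption made here, not a method described by the panel.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of readings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_bias(reference, candidate):
    """Average difference (candidate minus reference), as in a Bland-Altman analysis."""
    diffs = [c - r for r, c in zip(reference, candidate)]
    return sum(diffs) / len(diffs)

# Hypothetical paired scores from the same participants at the same visit.
reference_scores = [10.0, 14.0, 9.0, 20.0, 16.0]
candidate_scores = [11.0, 15.0, 9.5, 21.0, 17.5]

r = pearson_r(reference_scores, candidate_scores)
bias = mean_bias(reference_scores, candidate_scores)
print(f"correlation={r:.3f}, mean bias={bias:.2f}")
```

A high correlation with a small, stable bias would suggest the two measures agree at a single point in time. It says nothing about whether the new measure predicts a later clinical state.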

But there is a much more critical type of validation – predictive clinical validation – which takes much longer to achieve. With predictive clinical validation, there is an established clinical state, and the new measure provides data about the likelihood of that clinical state occurring. Essentially it pushes state identification backward in time, ultimately defining a novel clinical state that is predictive of a known clinical state.

For example, by measuring blood pressure repeatedly in a standardized way in clinical settings in many people many times, and subsequently watching what happened to them clinically, researchers could recognize and define the clinical state of hypertension. Then by reducing a person’s blood pressure—treating their hypertension—it was possible to reduce the likelihood of negative clinical outcomes, now newly known to be consequences of hypertension, predominantly damage to end organs: the heart, the brain, the kidneys, etc.

However, with the more recent introduction of high-quality at-home blood pressure monitoring equipment, it became apparent that episodic, in-office sphygmomanometry was not as reliable or valid a measure of real-world blood pressure as initially thought. That episodic approach had inadvertently introduced a measurement error or bias. As a result, it is likely that hypertension was both under- and over-diagnosed. In essence, some people who didn’t need treatment were treated, and some who did weren’t.

For digital health data to be predictively clinically validated, researchers must determine how to take the same measurement in the same way in many people and then share that data between them in a pre-competitive or non-competitive environment.

Standardizing Measurement

Geoff: Can you use raw or minimally processed sensor data?

James: We cannot directly access totally raw sensor data. Software that controls sensors typically filters the data to improve its stability or to remove outliers. This can happen at the firmware level, or sometimes the sensor data goes through a summative process before it is transmitted to the host computer.

That said, the ideal scenario with any type of measurement is for the data not to have a filter applied to it, because filtering reduces the granularity of the data. Sometimes the filtering process creates summary information, reduces outliers, and tries to improve the data, to make our lives as consumers a bit easier, and to make the sensor data easier to work with.

But in doing so, filtering reduces the usefulness of the data. The sensors become functionally standalone units. We cannot compare the data from one sensor against another because we do not know what algorithms have been applied to each. Comparing data across future research studies is no longer possible.

We want to get as close to the raw information as possible. We want nothing applied to the sensor data, and we want it transmitted as cleanly as possible.
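
To make the comparability problem concrete, here is a minimal sketch in which two imaginary devices apply different smoothing to the same raw signal. The signal values, the moving-average filter, and the window sizes are all assumptions chosen for illustration. Real firmware filters are usually undocumented, which is the point.

```python
def moving_average(samples, window):
    """Smooth samples with a trailing moving average of the given window size."""
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        chunk = samples[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical raw acceleration magnitudes (in g) with one brief spike.
raw = [1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0]

# Two imaginary devices apply different, undocumented smoothing windows.
device_a = moving_average(raw, window=2)
device_b = moving_average(raw, window=4)

print("raw peak:     ", max(raw))
print("device A peak:", max(device_a))
print("device B peak:", max(device_b))
```

The same physical event produces a different peak on each device, and neither matches the raw value. Without knowing the filter, a researcher cannot reconcile the two outputs.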

We employ Euclidean Norm Minus One (ENMO), a typical measure of physical activity, computed with the open-source GGIR algorithm. But we also use the six-minute walk test, and are beginning to look at maximum six-minute activity.
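
For readers unfamiliar with the metric, the following sketch shows the ENMO calculation itself: the vector magnitude of the three acceleration axes minus 1 g, with negative results set to zero. This is not GGIR (which is an R package). It assumes the samples are already calibrated and expressed in g, and the sample values are invented.

```python
import math

def enmo(x, y, z):
    """ENMO for one sample: vector magnitude minus 1 g, with negatives set to 0."""
    magnitude = math.sqrt(x * x + y * y + z * z)
    return max(magnitude - 1.0, 0.0)

def mean_enmo(samples):
    """Average ENMO over an epoch of (x, y, z) samples, all in units of g."""
    values = [enmo(x, y, z) for x, y, z in samples]
    return sum(values) / len(values)

# Hypothetical samples: a device at rest, then during movement.
at_rest = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8), (0.0, 0.0, 0.9)]
moving = [(0.0, 0.0, 1.5), (1.2, 0.0, 1.6)]

print("rest:  ", mean_enmo(at_rest))
print("moving:", mean_enmo(moving))
```

Subtracting 1 g removes the constant contribution of gravity, so a device lying still reads close to zero regardless of its orientation.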

Measuring the maximum amount of movement that happens within a specific time, whether that is an hour, a day, or a week, is shaping up to be a better way of detecting light and sedentary movement.
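
One way to read "maximum six-minute activity" is the highest total activity in any contiguous six-minute window of a recording. That reading is an assumption, as are the per-minute values below. The sketch finds the window with a simple sliding sum.

```python
def max_window_activity(per_minute, window_minutes=6):
    """Return (start_index, total) for the most active contiguous window."""
    if len(per_minute) < window_minutes:
        raise ValueError("not enough data for one full window")
    window_total = sum(per_minute[:window_minutes])
    best_start, best_total = 0, window_total
    for start in range(1, len(per_minute) - window_minutes + 1):
        # Slide the window one minute: drop the oldest value, add the newest.
        window_total += per_minute[start + window_minutes - 1] - per_minute[start - 1]
        if window_total > best_total:
            best_start, best_total = start, window_total
    return best_start, best_total

# Hypothetical per-minute activity values (arbitrary units).
activity = [2, 3, 1, 8, 9, 7, 8, 9, 6, 2, 1, 0]
start, total = max_window_activity(activity)
print(f"most active window starts at minute {start} with total {total}")
```

The same function works for an hour, a day, or a week by changing the window length and the resolution of the input.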

One of the advantages of using an open-source algorithm, such as GGIR, is that everyone can process data with the same algorithm, so results can be independently verified no matter what sensor or device is being used.

Being able to reuse data is very helpful. Whenever we use any type of hardware, we need to validate its accuracy and reliability. If we could gain access to a global repository of raw data, that could save us a lot of time and effort in validation.

As it is, we must spend months applying for ethical approval, comparing our device with the state-of-the-art device, then validating the data, and measuring what is an acceptable level of error before we can begin our tests.

Why could we not use existing inertial measurement unit data from a repository, whether it be from a chronic obstructive pulmonary disease study or from patients who have had a stroke and are being tested for rehab?

Using Real-world Data

William: While I was Chief Scientific Officer at OptumLabs, we built a massive real-world database containing more than 100 million covered lives of claims data and 85 million lives of electronic medical record data. These data were also linked to patient mortality, social demographics, and county-level data from the Area Health Resource File from the Agency for Healthcare Research and Quality (AHRQ).

Academics around the country were keen to analyze that data to generate evidence. Real-world datasets can generate evidence much faster than randomized clinical trials (RCTs).

RCTs are valuable but observational data can generate very credible, useful information when appropriate methods are used to analyze it.

There is also a lot of interest from regulators, especially since the passage of the 21st Century Cures Act, which requires the FDA to develop a framework for incorporating real-world evidence into regulatory decision-making about new indications for previously approved treatments, as well as safety surveillance. The FDA has been involved in safety surveillance for a long time with the Sentinel Network, which is a massive national claims database that it uses to evaluate the safety of existing products on the market. But using real-world data to assess new indications for previously approved products is new.

The FDA has shifted its mindset from focusing solely on evidence from randomized trials to considering what role real-world data can play in regulatory decision-making about new indications and safety surveillance for previously approved products. The Agency is now considering whether observational data can support causal inference comparable to what a randomized trial would provide.

There is a very large national study out of Harvard called RCT DUPLICATE, which is emulating 37 randomized trials, 30 of which have been completed. This study is using claims data to emulate the inclusion/exclusion criteria, endpoints, and follow-up periods for those trials.

Randomized trials are also conducted on products that are already on the market to test their real clinical effectiveness. Observational studies are being run to predict what the results of those trials will be. It has been remarkable to look at the evidence and the degree of agreement between those observational studies and randomized trials.

Regulatory Perspective

William: With FDA-approved products, such as glucose monitors, there is a time lag between when regulatory approval occurs and when an approved Centers for Medicare & Medicaid Services (CMS) code is entered into the databases.

Without that code, researchers cannot identify what device was used to treat the patient. That initial coding issue places a major limitation on the analysis of digital device data by academic researchers.

Geoff: How do we improve the system? Without access to raw or minimally processed sensor data, we do not have digital biomarkers, hence the product cannot be used as a medical device, so it is not going to get a CMS code, which means we can’t use the data.

William: There are two sides to that issue. The FDA is a strong proponent of accessing data as close to the source and in as original a form as possible. The FDA wants to know the source and provenance of data before it is included in commercially available databases.

But in cases where a device is not going to be used as a treatment but as a source of data on patient outcomes, we need to link the device data to other data to put the results into context.

Researchers face challenges with linking datasets due to HIPAA regulations, which are designed to minimize the reidentification risk to patients. The more granular the data gets and the rarer the condition the patient has, the more information researchers must relinquish to continue to mask that patient’s identity.
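
The trade-off between granularity and reidentification risk can be sketched with a simple group-size check in the style of k-anonymity. This is only an illustration: it is not the legal test that HIPAA applies, and the records, field names, and ten-year age bands are all invented.

```python
from collections import Counter

def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same values for keys."""
    groups = Counter(tuple(record[k] for k in keys) for record in records)
    return min(groups.values())

def coarsen_age(record, band=10):
    """Return a copy of the record with age replaced by the start of its band."""
    coarse = dict(record)
    coarse["age"] = (record["age"] // band) * band
    return coarse

# Hypothetical records; no real patient data.
records = [
    {"age": 41, "zip3": "021", "condition": "rare"},
    {"age": 44, "zip3": "021", "condition": "rare"},
    {"age": 47, "zip3": "021", "condition": "rare"},
    {"age": 52, "zip3": "021", "condition": "common"},
    {"age": 58, "zip3": "021", "condition": "common"},
]

keys = ["age", "zip3", "condition"]
granular_k = smallest_group(records, keys)
coarse_k = smallest_group([coarsen_age(r) for r in records], keys)
print(f"exact ages: smallest group = {granular_k}")
print(f"age bands:  smallest group = {coarse_k}")
```

With exact ages every record is unique, so each person is potentially identifiable. Coarsening age to a band hides individuals in larger groups, at the cost of the detail a researcher may have wanted.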

Precompetitive Collaboration

Daniel: Pharmaceutical companies are involved in several collaborative, pre-competitive projects, including WATCH-PD, a Parkinson’s disease study, and TransCelerate’s digital biomarker effort. The Clinical Trials Transformation Initiative (CTTI) is also developing policy papers and research best practices to make data more interoperable and integrated. The Digital Medicine Society (DiMe) is working on a large number of collaborative measure development projects, including a nighttime itch-and-scratch quantification project using wrist-worn accelerometry to analyze scratching behavior. DiMe is also involved in direct risk factor research and establishing best practices across several domains in digital medicine.

William: Life sciences companies have been conducting secondary analyses of claims data for a long time to understand the burden of an illness better and support the commercialization of their products. Pharmaceutical companies analyze large claims datasets to determine the comparative effectiveness of their products and support their value propositions around pricing.

Closing Advice for Researchers

Geoff: Researchers should use minimally processed data that is as close to the sensor as they can get and that can be independently verified. They should also identify a mechanism upfront that will allow them to share that data.

What advice would you give researchers who want to reuse real-world data?

James: They need to understand how the data was collected. For example, if they are going to do a 40-meter walk test, they need to define the parameters they will use to collect that data. If we do not all use the same protocol, the data collected will differ even though the test is nominally the same. The 40-meter walk test could be 1 x 40 meters or 8 x 5 meters, and that choice changes the shape of, and the underlying patterns in, the data.
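
One lightweight way to act on this is to store the protocol alongside the data so that a later user can check it before pooling datasets. The sketch below is hypothetical: the class, its fields, and the strict equality rule are assumptions for illustration, not an existing standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WalkTestProtocol:
    """Hypothetical description of how a walk test was run."""
    segment_length_m: float
    repetitions: int

    @property
    def total_distance_m(self):
        return self.segment_length_m * self.repetitions

    @property
    def turns(self):
        # Each repetition after the first requires the participant to turn around.
        return self.repetitions - 1

def directly_comparable(a, b):
    """Treat two datasets as comparable only if the protocols match exactly."""
    return a == b

straight = WalkTestProtocol(segment_length_m=40.0, repetitions=1)
shuttle = WalkTestProtocol(segment_length_m=5.0, repetitions=8)

print(straight.total_distance_m, shuttle.total_distance_m)
print(straight.turns, shuttle.turns)
print(directly_comparable(straight, shuttle))
```

Both protocols cover 40 meters, but the shuttle version includes seven turns, each of which adds deceleration and acceleration that would appear in the sensor signal.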

William: The 21st Century Cures Act has really changed the game in terms of the FDA’s interest in real-world data. Regulatory authorities around the world are interested in this now, as are health technology assessment groups that make decisions about drug coverage, pricing, and reimbursement.

There is a lot of existing data infrastructure on general patient populations in the United States and elsewhere. But we need sensor and device data to be brought together in one place where we can link it to those existing databases and bring in information about the patients, their diseases, and comorbidities to provide context.

From a technical standpoint, it is an easy problem to solve. If this data contains identifiers that are also present in the other datasets, we can link them without having to exchange names, addresses, and social security numbers. We can just hash the identifiers in both places and link the hashed IDs together. Then we have a deidentified dataset that we can use that has the sensor and the device data in it.
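
The linkage idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: the field names, the sample rows, and the shared key are hypothetical, and a keyed hash (HMAC-SHA-256) is used here instead of a plain hash because unkeyed hashes of short identifiers can be guessed by brute force. In practice each data holder would hash its own identifiers before anything leaves its environment. The sketch does both in one place for brevity.

```python
import hashlib
import hmac

# Hypothetical shared secret agreed between the two data holders.
SHARED_KEY = b"example-key-not-for-production"

def hash_id(identifier):
    """Keyed SHA-256 hash of a normalized identifier."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(SHARED_KEY, normalized, hashlib.sha256).hexdigest()

def link(sensor_rows, claims_rows):
    """Join two datasets on the hashed ID, keeping no raw identifier."""
    claims_by_hash = {hash_id(row["member_id"]): row for row in claims_rows}
    linked = []
    for row in sensor_rows:
        key = hash_id(row["member_id"])
        if key in claims_by_hash:
            linked.append({
                "hashed_id": key,
                "mean_enmo": row["mean_enmo"],
                "diagnosis": claims_by_hash[key]["diagnosis"],
            })
    return linked

# Hypothetical rows; the identifiers differ only in case and whitespace.
sensor_rows = [
    {"member_id": "A123", "mean_enmo": 0.031},
    {"member_id": "B456", "mean_enmo": 0.054},
]
claims_rows = [
    {"member_id": "a123 ", "diagnosis": "COPD"},
    {"member_id": "C789", "diagnosis": "stroke"},
]

linked = link(sensor_rows, claims_rows)
print(len(linked), "linked record(s)")
```

Normalizing the identifier before hashing matters: a stray space or a change of case would otherwise produce a different hash and silently drop the match.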

The first big challenge is to centralize the device and the sensor data so that we can link it to other datasets.

James: We also need a reliable software system to control the connection between them. That can be a nightmare when it comes to trying to amalgamate data from several sensors. We need to ensure that all the amalgamations are happening accurately.

Daniel: Be proactive regarding informed consent in research, and get permission to share data, even at an identifiable level. Go as far as your IRB will let you because I have not yet met a research participant who says “no” when asked, “Can we share your data to advance science as much as possible from your participation?”

Pressure test your assumptions about the boundaries that you consider to be the edges of a research question. Stop doing concurrent criterion validation against crappy legacy standard measures. If you find yourself replacing something low burden, quick to complete, not very useful, and free, such as the Patient Health Questionnaire-9 (PHQ-9), with something expensive, proprietary, and time-consuming, you are probably making a mistake.

--Issue 02--

Author Bio

Geoffrey Gill, MS, is Founder and CEO of Verisense Health, a digital health software and data company, and Co-founder of the Open Wearables Initiative (OWEAR). Geoff’s goal is to accelerate the adoption of digital health technologies industry-wide by standardizing digital biomarkers and making them usable for providers. He joined Verisense Health from Shimmer Research, the global wearable technology provider, where he served as President of Shimmer Americas. He received his MS in Management of Technology from the MIT Sloan School of Management.

Daniel R. Karlin, MD, MA, is Chief Medical Officer of MindMed. Dan joined MindMed in February 2021 when it acquired HealthMode, the company he co-founded and led as CEO. Prior to HealthMode, Dan built and led clinical, informatics and regulatory strategy for Pfizer’s Digital Medicine and Innovation Research Lab. He also served as Global Clinical Lead for psychiatry clinical compounds at Pfizer. He is a strategic advisor to several pharmaceutical, biotech and health technology companies. Dan is board-certified in psychiatry, addiction medicine, and clinical informatics. He is also an Assistant Professor of Psychiatry at Tufts University School of Medicine.

James Connolly, PhD, is a lecturer and researcher in computing at Atlantic Technological University in County Donegal, Ireland. James’ areas of expertise include digital healthcare and connected health, big data analytics, and artificial intelligence. James has a PhD focused on wearable technology and computing from Ulster University.

William Crown, PhD, MA, is a Distinguished Scientist at Brandeis’ Heller School for Social Policy and Management and an internationally recognized expert in real-world data analysis. He specializes in research designs and statistical methods for drawing causal inferences from transactional health care datasets such as medical claims and electronic health records. Bill began his career at the Heller School in the early 1980s focusing on the demography and health economics of aging, and he taught a series of statistics courses in the Heller PhD program. He left Brandeis to lead health economics consultancies at Truven and Optum. He received his PhD in regional economic modeling from MIT, and an MA in Economics from Boston University.
