"A Pharma Focus America initiative fostering exclusive and open discussion on industry trends"
Welcome to our panel discussion on "Big Data Analytics - A Revolution in the Pharmaceutical Industry." Our esteemed panelists are leading experts driving transformative change in pharmaceuticals through data analytics.
The Discussion Ahead:
In this panel discussion, our panelists will explore how big data analytics is reshaping drug discovery, clinical trials, patient care, and regulatory decisions. Join us as we uncover the challenges, opportunities, and ethical dimensions of leveraging data to advance healthcare.
Let's begin the conversation!
Michael N. Liebman: I would actually rephrase this question and put the goal first: to achieve faster and more cost-effective drug development, how can we best use real-world patient data and big data analytics? When we do this, we are better able to focus on the current challenges and then see where these approaches can be applied to reduce them. Faster and smaller trials could be achieved by recognizing that most diseases, as defined in clinical trial inclusion/exclusion criteria and, even earlier, as used for initial target selection, are really complex disorders under an umbrella diagnosis and code, which contributes to their high rate of failure for lack of efficacy. Real-world patient data, i.e. EMR data rather than claims data, should be used to stratify the disease to enhance target selection and to establish the potential patient population size, so that commercial potential can be evaluated much earlier and more accurately. Furthermore, clinicians recognize the differences between real-world patients and clinical trial patients; aligning patient recruitment in trials more closely with real-world patients could enhance the commercial opportunity beyond development cost savings.
Catherine Hall: I think to answer this question you first have to consider that some things have not changed when it comes to data security and privacy. When patients volunteer to participate in a clinical trial, they rely heavily on the trial investigator, sponsor, and subprocessors to treat their personal information with confidentiality and to do everything possible to protect it from harm. Personalized medicine, decentralized trials, and electronic capture of patient outcomes now make complete de-identification insufficient for protecting trial data. However, today’s technology is able to isolate and obscure data based on multiple parameters, so it is not a matter of needing to strike a balance but rather a need for thoughtful intent and planning focused on the data itself. With careful mapping of data across the various e-clinical systems used in the trial, appropriate technical and operational measures can be put in place to protect the data from potential threats and to restrict its access to only those who must know the information, from the point of data collection to data archive. The industry is thus best positioned when, at the foundation of any trial, there is a cross-functional risk assessment that identifies what data truly needs to be collected and ensures that appropriate, tested measures are in place to secure that data end to end across the e-clinical network of computerized systems.
Christopher Bouton: I believe there are a number of ways in which the pharma industry is already attempting to strike this balance. That said, more and more data will continue to be generated, and that data, handled correctly, can help to develop better, more specific, and more efficacious medicines at a personalized level. As a result, the pharma industry will need to continue to work with patient communities and individuals to properly protect patient-level data and adhere to evolving regulations.
Michael N. Liebman: We have to understand that RWE typically comprises real-world clinical data, e.g. EMRs and lab results, and real-world business data, e.g. claims data and pharmacy records. The latter commonly reflects clinicians’ need to address reimbursements and minimize denials of claims, and frequently (more than just frequently) does not reflect the true clinical picture of the patient, which is more comprehensively captured in the EMR and clinician notes. This does not invalidate either data source; it only differentiates how their analysis should be interpreted and presented. Current coding systems, e.g. ICD-10, attempt to bridge this gap but are not adequate to reflect the true complexity of the patient and the disease, and they frequently challenge the physician by requiring them to place the patient and their evaluation/diagnosis into a “box” in which they do not really fit.
Catherine Hall: I think the key to accepting any data is to understand not only the data itself, from origin to report, but also the context in which the data was collected, transformed, and stored. For example, the average blood pressure of an adult female easily becomes muddled when one data set was taken from patients participating in an early pregnancy trial and another comes from a diabetes trial in geriatric patients. Good data analytics platforms, at their core, help to retain the contextual metadata in sync with the data itself, so that the final data is well understood. In this way, data previously collected from other sources can be effectively utilized, given assurances that it accurately represents a data set that would otherwise have been collected within the parameters of the trial it will be compared to.
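The idea of keeping contextual metadata in sync with the data itself can be sketched in a few lines of Python. This is an illustrative sketch with invented study names and values, not any particular platform's API:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class BPReading:
    """A systolic blood pressure reading that carries its collection context."""
    systolic_mmhg: float
    study: str        # hypothetical study identifier
    population: str   # e.g. "early-pregnancy adults", "geriatric diabetes"

def mean_bp(readings, population):
    """Average only readings whose contextual metadata matches the target population."""
    matched = [r.systolic_mmhg for r in readings if r.population == population]
    if not matched:
        raise ValueError(f"no readings for population {population!r}")
    return mean(matched)

readings = [
    BPReading(108, "TRIAL-A", "early-pregnancy adults"),
    BPReading(112, "TRIAL-A", "early-pregnancy adults"),
    BPReading(142, "TRIAL-B", "geriatric diabetes"),
]

# Pooling everything blindly would mix incompatible contexts;
# filtering on the attached metadata keeps the comparison valid.
print(mean_bp(readings, "early-pregnancy adults"))  # 110.0
```

Because each reading carries its own context, an analysis that reaches for the wrong population fails loudly instead of silently producing a muddled average.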
Christopher Bouton: We’re actually just at the very beginning of a wide range of novel advanced analytics approaches that can be applied to clinical trial design, recruitment, and monitoring. Fundamentally, these approaches have to do with better pattern detection at multiple levels. For example, we can now use GPT / large language model (LLM) driven AI approaches to do a better job of clinical trial outcomes aggregation and associated model-based meta-analysis. There are also faster and more efficient ways to apply generative AI to all of the documentation necessary for trials and regulatory filings. These approaches of course need to be overseen by experts in these fields to make sure that the technologies are generating the correct outputs. This is the very beginning of a decade’s worth of these types of advancements in trial design and implementation, which will hopefully yield better outcomes and more efficacious medicines for patient populations.
Michael N. Liebman: There is increasing recognition of the importance of considering and incorporating social determinants of health (SDOH) into patient management, from engagement and diagnosis to treatment adherence. SDOH is also being applied to understand how to make access to care more equitable. While this is a critical step towards patient-centricity, it is, to a certain extent, missing some critical perspectives, namely patient culture and trust. SDOH tends to establish criteria and guidelines and makes efforts to close gaps that may exist, but it does not adequately address the reality that an individual, or a population group, may hold significant cultural differences in prioritization that can limit such SDOH approaches. An additional level of complexity comes from the patient’s perspective on trust: trust in what the physician says, in what the drug “promises”, and largely in what the “historical” experience has been. It is critical to recognize this in efforts around “DEI”, which need to focus more on the patient perspective and not solely on “checking the box”. We need to do much more listening to patients and to their trusted physicians to understand how to achieve these goals.
Christopher Bouton: We first need to refer back to the first question and state that, of course, the use of big data in this context must be conducted with the utmost focus on patient data privacy and protection. That said, the “voice of the patient” is much louder and more prevalent now. Multiple forums, including social media, telehealth, doctor/patient secure messaging, and other means, allow the healthcare system to learn much more about what is important to patients and what their challenges are. In a similar manner to the last question, the core additional activity that we can now apply to these forms of data is better, AI-powered pattern recognition. Instead of having to build ontologies or heuristics to find particular patterns in the data, we can take all of that messy data and use GPT / LLM models to sift through the content to find important patterns and unexpected discoveries. To do so, novel architectures for running these technologies on data in a secure, behind-the-firewall manner are going to become critical. These systems also need to be specialized to these areas so that they can find the most salient patterns in the data.
Catherine Hall: I am going to separate bias from algorithmic decision-making and address the latter first. Whether information is human generated or computer generated, what we all want to know is how the decision was arrived at, as being able to reconstruct the way a decision was made is a core principle of Good Clinical Practice. The issue is not the idea that an algorithm made the decision so much as that, many times, it is not obvious how the AI came to its decision. For example, if you ask an AI to cluster trials into groups, you may expect to see the trials arranged by phase of development, therapeutic area, or patient population, but instead you might find the AI grouped the studies in a way that makes no sense to you at all. Does this result imply AI is not yet reliable, or does it mean there is yet another way to organize the data that you had not thought about? The reality is that if you cannot understand how the AI arrived at its result, you are more likely than not to discount it. This applies to bias as well. Bias is based on the experiences we have had and a developed preference for outcomes based on those experiences. If an AI is trained on a particular set of data, it is going to apply its experiences to arrive at similar outcomes. Just as we value diversity and a difference of opinion in debating a problem, AI, to be more trusted, will need to be exposed to a diversity of data to ensure that the problem has been analyzed from a variety of experiences. It is only through a variety of experiences, treated equally, that bias can truly be eliminated. The question is, if humans are choosing which data to expose the AI to, will the AI be biased toward that data? Most likely it will, and just as we apportion our trust in others based on our knowledge of their experience, AI too will be judged on the diversity it has been exposed to.
Catherine Hall: If we are to make bigger strides toward holistic data sets, we must come together as an industry on the adoption of true data standards. More often than not, the issue with data integration and the compilation of large data sets is that the data must be transformed to one standard or another. When working with highly precise data, these transformations can introduce inconsistencies depending on the data mapping or other processes employed. Even something as simple as time can be difficult to align, considering all of the various time zone rules, not to mention the precision of the tools used to collect it. While CDISC has gone a long way toward beginning the process of standardization in the industry, plenty of exceptions to the rules still exist. The standards also cannot apply just to the data points themselves but must extend to the metadata associated with the data. As stated before, to reliably trust in the data, one must have assurances of the context around the data. If that context is not clearly associated, it cannot be well understood and may not be considered reliable.
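The time alignment problem mentioned above can be made concrete with Python's standard `zoneinfo` module. The site locations and timestamps below are invented for illustration: two sites record what look like different wall-clock times, yet normalizing both to UTC before integration shows they describe the same instant:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

# Two hypothetical sites record a visit in local wall-clock time.
tokyo = datetime(2024, 3, 10, 9, 30, tzinfo=ZoneInfo("Asia/Tokyo"))       # UTC+9
new_york = datetime(2024, 3, 9, 19, 30, tzinfo=ZoneInfo("America/New_York"))  # EST, UTC-5

# Normalizing both to UTC before integration makes them directly comparable.
utc = ZoneInfo("UTC")
print(tokyo.astimezone(utc))      # 2024-03-10 00:30:00+00:00
print(new_york.astimezone(utc))   # 2024-03-10 00:30:00+00:00
```

Without carrying the zone information as metadata alongside each timestamp, the two records would appear to be a day apart, which is precisely the kind of transformation inconsistency standards need to prevent.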
Christopher Bouton: What we’re seeing is a move away from traditional data integration strategies built on centralized storage of data from across an organization (e.g., data warehouses and/or data lakes). Instead, more flexible, modern architectures such as data fabrics allow an organization’s data to remain distributed: instead of trying to bring the data to the architecture, we can bring the architecture to the data. We’ve seen significant benefits from these kinds of approaches both for data integration and for the application of advanced AI analytics against the data stored in those systems. Furthermore, AI analytics approaches can help with the derivation of insights from messy data, as well as with cleaning and organizing data even when entity naming and other attributes of the content aren’t controlled. This is because these novel analytics approaches are far more robust to variation than traditional data cleaning and harmonization/normalization approaches.
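As a minimal sketch of the "bring the architecture to the data" idea (the class names are hypothetical, not the API of any real data fabric product), each source answers queries in place, and a thin fabric layer fans the query out and merges the results rather than copying everything into a central store first:

```python
class DataSource:
    """A data set that stays where it is and answers queries locally."""
    def __init__(self, name, records):
        self.name = name
        self._records = records  # stand-in for a remote store

    def query(self, predicate):
        return [r for r in self._records if predicate(r)]

class DataFabric:
    """Fans a query out across registered sources and merges the results."""
    def __init__(self):
        self.sources = []

    def register(self, source):
        self.sources.append(source)

    def query(self, predicate):
        # The query travels to each source; the raw data never has to move.
        return {source.name: source.query(predicate) for source in self.sources}

fabric = DataFabric()
fabric.register(DataSource("ehr", [{"patient": "p1", "age": 67}]))
fabric.register(DataSource("claims", [{"patient": "p2", "age": 45}]))

over_60 = fabric.query(lambda r: r["age"] > 60)
print(over_60)  # {'ehr': [{'patient': 'p1', 'age': 67}], 'claims': []}
```

In a real deployment, each `DataSource` would wrap a remote system behind its own access controls; the design choice being illustrated is simply that computation moves to the data rather than the reverse.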
Michael N. Liebman: These collaborations are potentially invaluable…but! There is an absolute need for data security, and there is increasing use of federated data and learning models that can help maintain and comply with both privacy and provenance requirements. A greater concern is that these efforts focus on “more is better”, whereas Mies van der Rohe stated that “less is more”. In the effort to collect more data, i.e. to apply AI/ML, etc., there is a tendency to emphasize interoperability (among databases) over potential quality issues. Most, if not all, databases inadequately annotate the data within a data field, especially in clinical data where the testing/laboratory modality and/or the specific algorithm used to measure or compute a value is not included in the annotation. The example I use is glomerular filtration rate, which is estimated, not measured, as eGFR. There are five different equations to generate eGFR, each developed for a different reason and with a different population, and two incorporate a “race adjustment factor”, which is incorrect. Incorporating eGFR into typical data analytics without recognizing these differences can only increase the noise, i.e. the inaccuracy, of the results.
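The eGFR point can be illustrated with two of the commonly published equations. The coefficients below follow the widely cited CKD-EPI 2021 (race-free) and IDMS-traceable 4-variable MDRD forms as generally reported in the literature; they are shown for illustration only and should be verified against primary sources before any real use:

```python
def egfr_ckd_epi_2021(scr_mg_dl, age, female):
    """CKD-EPI 2021 (race-free) creatinine equation, as commonly published."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (142
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age)
    return egfr * 1.012 if female else egfr

def egfr_mdrd(scr_mg_dl, age, female):
    """IDMS-traceable 4-variable MDRD equation (race factor omitted here)."""
    egfr = 175 * scr_mg_dl ** -1.154 * age ** -0.203
    return egfr * 0.742 if female else egfr

# Same patient, same serum creatinine -- two different "eGFR" values.
a = egfr_ckd_epi_2021(1.2, 60, female=True)
b = egfr_mdrd(1.2, 60, female=True)
print(round(a, 1), round(b, 1))
```

For the same patient and the same laboratory result, the two equations return noticeably different values, which is exactly why the equation used must travel with the data as annotation.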
Catherine Hall: Data being an asset is the key mentality to focus on here. So often data is taken for granted as a necessary item sought for other ends, but data is not as often treated as a true asset in itself. For many pharmaceutical companies, the value of data rarely extends beyond a single use: did this trial succeed or not?
The technology industry, however, understands the value of data as a reusable asset. Through partnership and collaboration, the technology sector can help the pharmaceutical industry refocus its vision of data and structure it in a way that maximizes its value. By moving further toward data standards and cross-network data management practices, the measures needed to protect and secure data can also be more reliably applied. The relationship between people, process, and technology is often referred to, but as data has grown in complexity through growing volume, variety, and velocity, the gaps between these three factors have grown as well. It is only through true partnership that expertise can be better shared, processes can better align, and technology can better support ensuring that the quality controls are in place that allow data to be verified and trusted to hold its value.
Michael N. Liebman: Predictive analytics, if based on enhanced disease and patient stratification as noted earlier in this discussion, can be much more successfully applied in post-market surveillance, because a more accurately “labelled” patient/market will have been identified for the product. Also, as noted, earlier stratification of the disease and its application in target selection will lead to better definition of the potential patient population in terms of size and specific characteristics. These could naturally lead to better resource allocation to patient groups in these target populations and to the physicians who treat these patients. This approach could dramatically reduce the preponderance of mis- and missed diagnoses, inappropriate testing, and inappropriate prescribing that currently exists.
Christopher Bouton: As noted in the responses to other questions on the panel, most of this comes down to the data being analyzed, how one is bringing that data together, what sorts of patterns one is looking for, and what kinds of analytics approaches one is using to identify those patterns. This is no different in the post-market space, where a better understanding of how a therapeutic is performing in the market can help to inform areas from adverse event reporting for safety all the way to therapeutic product demand for supply chains and resource allocation. We see the advent of data fabrics, as a novel, more flexible, and modular form of data integration, along with AI analytics for robust pattern detection, as two critical advances in helping to extract signal from noise for better decision making in these spaces.
Michael N. Liebman: I believe that the increasing cost and inefficiency of drug development, and their impact on the cost of drugs to patients, present a significant opportunity to apply new approaches and technologies to the benefit of all: the patients, the physicians, pharma/biotech, and payers. This will require greater collaboration among these groups and an emphasis on understanding both their “surface needs” and carrying out root cause analysis to identify the real core issues. I believe that many efforts accept what we have and know today as the “gold standard” rather than simply the “standard of care”, and as a result they typically drive the use of new technologies to reproduce these standards rather than question them. An example is using evolving digital technologies not simply to monitor alerts against existing standards but to enable new ways to stratify disease as a process, not a state. There is a need to incorporate true “critical thinking” and “systems thinking” into identifying these root cause issues in medicine and then to apply the appropriate technology to address the question (or modify existing methods or create new ones) rather than reach for the next shiny object. More cost-effective and disease/patient-specific drug development will follow from improving the accuracy of medicine.
Catherine Hall: I am not sure we are within five years of this yet, but I have often thought a great advancement in our industry would be the development of smart contracts within a life sciences blockchain. The ability for companies to share select data sets that help to advance our industry and stimulate innovation would be priceless. If supply chains exchanged temperature data across international shipment lanes, transport companies might readily adapt to climate changes or innovate new shipping containers that, in turn, help the supply chain reduce product losses. If hospitals exchanged more real-time data regarding the demand for saline, suppliers could readily adapt to changes in demand and help hospitals avoid shortages. There are mountains of data that have been collected for such limited purposes and that, if shared, could universally help make a difference in how we operate as an industry. To get to that point, however, we need to handle the basics of data exchange and step up our practices in how we collect and store data, so that it could one day be shared more readily for these kinds of advances. Acceptance of real-world data in clinical trials by regulators is a good first step that will help motivate the industry to think of data as a true reusable asset.
Christopher Bouton: I would point to the development of the COVID-19 vaccines as a seminal moment in the pharma industry. We saw the entire industry come together to apply novel technologies, modalities, and business approaches to accomplish something that was practically unimaginable not many years earlier. Between novel AI analytics, more flexible data integration architectures, and greater volumes of data available, I believe that we’re hitting a tipping point in technological progress that will help to fuel paradigm-shifting advances in the speed and effectiveness with which we can address our most pressing therapeutic and healthcare challenges. At Certara we are focused on applying these types of approaches across the entire pharma pipeline, from early basic research to regulatory filings and post-market access. It’s a very exciting time to be working in the industry, and I’m personally very excited to see what the next 5 to 10 years bring.
As we bring this insightful discussion to a close, we extend our sincere gratitude to our esteemed panelists, Michael N. Liebman, Catherine Hall, and Dr. Christopher Bouton. Your expertise has shed light on the transformative power of big data analytics in the pharmaceutical realm.
Looking ahead, the fusion of data and pharmaceuticals promises an exciting era of innovation.