Data Governance in Life Sciences
Catherine Hall, Head of GxP Quality Assurance, Egnyte
In life sciences, data governance is critical to ensuring data integrity, security, and compliance from research through post-market surveillance. This article explores the core principles, challenges, and best practices of data governance in the life sciences, and offers insight into how organizations can harness their data's potential while navigating an intricate regulatory landscape.

In today's rapidly evolving digital landscape, data has become a critical asset for organizations in every industry, and especially in the life sciences. The true value of data, whether structured or unstructured, can only be realized when it is properly managed and governed. Data governance provides a framework for ensuring the availability, quality, usability, integrity, reliability, and security (AQUIRS) of data, which in turn enables organizations to make better-informed decisions, improve operational efficiency, and maintain regulatory compliance. Despite the pace of technological change and the focus on data, organizations still struggle to implement data governance, often citing either the existence or the absence of regulatory requirements as the obstacle. Regulators, however, have only reinforced its importance, as the latest revision of Good Clinical Practice shows. Now being adopted into law by health authorities around the world, ICH E6(R3) makes clear that data governance is not only a linchpin of good data management practice but also essential for GCP compliance.
Fundamentals of Data Governance
To understand what data governance really means and why it is essential, it helps to start with a clear definition. So, what is data governance? Data governance is a collection of strategic practices that enable an organization to manage the key characteristics of its data so that the data is findable, accessible, interoperable, and reusable (FAIR) throughout its lifecycle. It involves defining the policies and procedures applied across the data lifecycle, as well as who is accountable, responsible, consulted, and informed within those processes. In this way, establishing and maintaining data governance is much like establishing and maintaining a quality management framework. The similarity extends beyond policy: both require oversight and continuous improvement. To succeed, data governance cannot be a one-time project. It must be an ongoing part of corporate strategy, shaping how an organization works with data from the point it is created, discovered, or collected to the point it is archived or destroyed.

When executed effectively and maintained, a robust data governance framework supports better-informed decision-making and helps structure operational planning and execution to accelerate timelines. Historically, however, many organizations have struggled to implement effective data governance. This stems partly from the volume, variety, and velocity of data [2] across divergent systems, and partly from the need to collaborate across multiple internal and external partners. As a result, data is very often fragmented, siloed, and easily lost in data sprawl. The sense that data is in a state of chaos can paralyze efforts to put data governance in place, despite the risk of going without it (Figure 1). To surmount these challenges, it helps to break data governance into parts and to remember that it is a journey of continuous improvement. Wondering where to start? The following list identifies the key principles of a good data governance framework, along with some initial steps to take.
• Data Discovery: Identify and classify data across the organization’s ecosystem, including its risk according to intended use. Taking an inventory of the systems in your ecosystem and creating a library of the data within them establishes a foundational reference tool. It helps identify the authoritative sources of data, how data is transmitted throughout the ecosystem, and where there is potential risk of data conflict or loss of traceability.
• Data Definition: Focus on key data points and develop standards for them, including standardized metadata that gives the data proper context and structure so that it can always be found and later reused. Then begin to create standards for data formatting, as well as access rights and content-sharing controls that ensure data privacy and security without compromising timely access to the data people need to collaborate effectively. Once you know where your data is, having access controls in place also helps to outline oversight needs.
• Data Remediation: Determine how issues related to data quality and integrity will be addressed and resolved. This is where organizational silos must be dissolved, as system issues often require technical expertise that many business process owners do not have. Collaboration between IT and business units is a must when handling data issues. Data issue remediation should also be addressed within any Quality by Design (QbD) approach, so that potential data issues affecting critical quality factors have mitigation and action plans prepared in advance.
• Data Alerting: Identify how administrators and/or owners of the data will be notified of potential issues in real time, and develop communication channels and escalation pathways to enable prompt action. Here it is important to understand such pathways not only within your own organization but also within any data processors that are involved in your overall data ecosystem.
• Data Reporting: Develop mechanisms that can provide insights into data governance progress and regulatory compliance. Written policies and procedures are not enough. Any auditor will expect to see evidence that they are being applied and that issues of non-conformance are being addressed, but that evidence isn’t just for auditors. Insight into the effectiveness of your data governance policies and procedures forms the basis for improvement.
• Data Retirement: Any good data governance framework should be designed with the end of the data lifecycle in mind. Data retirement, however, is often ignored until a trial is completed and suddenly no one knows where to put all of the data so that it remains accessible, readable, and unalterable throughout the long retention period. Securely deleting or archiving expired data is an essential element of good data hygiene and should be planned for as early as possible when developing a data governance framework.
Taken together, these principles form the bedrock of robust data management practices that ensure data is traceable across an ecosystem, secure from unauthorized access or use, compliant with regulatory expectations, and ultimately maintained with the quality, reliability, and integrity needed for accurate, complete, and consistent data-driven decision making.
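As an illustration of how the discovery and retirement principles might begin in practice, the following Python sketch models a very small data inventory. It is a minimal sketch under stated assumptions, not a prescribed implementation: the class and field names, the three-level risk scale, and the sample assets are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Risk(Enum):
    """Hypothetical three-level risk scale based on intended use."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class DataAsset:
    name: str           # hypothetical dataset label
    system: str         # system holding the authoritative copy
    owner: str          # accountable data owner
    risk: Risk          # risk according to intended use
    retain_until: date  # end of the retention period


def due_for_retirement(inventory, today):
    """Return assets whose retention period has ended, highest risk first."""
    expired = [a for a in inventory if a.retain_until <= today]
    return sorted(expired, key=lambda a: a.risk.value, reverse=True)


def systems_at_or_above(inventory, minimum):
    """Return the sorted names of systems holding data at or above a risk level."""
    return sorted({a.system for a in inventory if a.risk.value >= minimum.value})


# Sample inventory; every value here is invented for illustration.
inventory = [
    DataAsset("adverse-event reports", "safety-db", "pharmacovigilance",
              Risk.HIGH, date(2040, 1, 1)),
    DataAsset("site training logs", "lms", "clinical-ops",
              Risk.LOW, date(2024, 6, 30)),
    DataAsset("lab results", "lims", "data-management",
              Risk.HIGH, date(2024, 12, 31)),
]
```

Even a simple inventory like this can answer two questions that otherwise tend to surface late: which systems hold high-risk data, and which data has reached the end of its retention period and needs an archive or deletion decision.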
Best Practices for Data Governance in Life Sciences
Having explored the fundamental principles and challenges of data governance in life sciences, it's crucial to understand how to put theory into practice. The following best practices offer practical strategies to overcome common obstacles and establish a robust data governance framework:
Establish a GxP-Compliant Data Governance Framework: Develop a comprehensive framework that aligns with GxP requirements, including regulations such as 21 CFR Part 11 [3] and GDPR [4], in addition to Good Clinical Practice (ICH E6(R3)) [1] for clinical trials.
Implement Risk-Based Data Classification: Categorize data based on its criticality to patient safety, product quality, and regulatory submissions. Based on this classification, apply appropriate security and management measures.
Automate Compliance and Data Integrity Processes: Leverage technology to automate compliance checks and notifications, verify data integrity, and apply ALCOA+ principles.
Foster a Data Integrity Culture: Promote a culture in which data integrity is recognized as crucial to patient safety and product efficacy. Provide regular training on data governance principles and their impact on research outcomes and regulatory compliance.
Conduct Regular Audits and Regulatory Readiness Reviews: Perform systematic audits of data practices and of data audit trails to maintain a state of continuous inspection readiness.
Implement Robust Version Control and Change Management: Establish strict version control for all critical documents, including research protocols, regulatory submissions, and Standard Operating Procedures (SOPs). Implement a change control process that maintains data integrity throughout the product lifecycle.
Develop Comprehensive Audit Trail Systems: Implement systems that capture and retain detailed audit trails for all GxP-relevant data, enabling the reconstruction of research activities and supporting regulatory inspections.
Develop a Data Lifecycle Management Strategy: Create policies and procedures for managing data from creation to archival or destruction, ensuring compliance with retention requirements and facilitating efficient data retrieval for regulatory purposes.
Establish Clear Data Ownership and Stewardship: Assign clear ownership and stewardship responsibilities for different data assets, ensuring accountability for data quality and appropriate use throughout the organization.
Implement Metadata Management Practices: Develop robust metadata management practices to provide context, improve data findability, and support long-term data usability, particularly for historical clinical trial data.
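To make the audit trail practice more concrete, the sketch below shows one common technique for making a trail tamper-evident: each entry is hashed together with the hash of the entry before it, so any later alteration breaks the chain. This is an illustrative assumption, not a mechanism prescribed by the regulations cited above, and the class, method, and field names are hypothetical. A production GxP system would also need validated storage, access control, and trusted timestamps.

```python
import hashlib
import json


def _digest(entry, previous_hash):
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps(entry, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only, hash-chained log (hypothetical illustration)."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self._records = []  # list of (entry, hash) tuples

    def append(self, user, action, record_id, timestamp):
        """Record who did what, to which record, and when."""
        entry = {
            "user": user,
            "action": action,
            "record_id": record_id,
            "timestamp": timestamp,
        }
        previous = self._records[-1][1] if self._records else self.GENESIS
        self._records.append((entry, _digest(entry, previous)))

    def verify(self):
        """Recompute the chain; return False if any entry was altered."""
        previous = self.GENESIS
        for entry, stored_hash in self._records:
            if _digest(entry, previous) != stored_hash:
                return False
            previous = stored_hash
        return True
```

Calling `verify()` during a periodic audit gives a quick check that the recorded sequence of activities has not been changed since it was written, which supports the reconstruction of activities that inspectors expect.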
For data governance to be successful, however, it must be more than mere best practice, because the consequences of bad data can be catastrophic. In today’s drug development programs, data steers critical decisions that affect both patient safety and product efficacy, and poor decisions can harm the future of any company. Beyond data-driven decision making, good data governance can be a key part of a company's competitive advantage, providing a means to create innovative approaches and mitigate future challenges regarding intellectual property. When properly anchored, effective data governance helps guide organizations in managing their data assets responsibly and efficiently.
Building Data Governance Maturity
Data governance is a journey, not a destination. Life science companies need to continuously improve their data governance practices to keep pace with evolving business needs and regulatory requirements. A maturity model can help organizations assess their current state of data governance and identify areas for improvement.
A typical data governance maturity model includes the following stages:
• Initial: Data governance is ad-hoc and inconsistent.
• Developing: Data governance policies and procedures are being developed.
• Managed: Data governance policies and procedures are in place and being followed.
• Optimized: Data governance is fully integrated into business processes and continuously improved.
The path from one stage to another is best traveled by a cross-functional team that broadly understands how data is used and accessed within the current data ecosystem and is free to envision and plan for something more. Like any continuous improvement initiative, it helps to begin with a plan. The team should develop a plan that outlines the goals, scope, and approach of the data governance strategy. A network of Data Owners should then be engaged to put the strategy into practice. Similarly, Data Stewards should be identified who can take responsibility for following the processes, measuring their effect on data quality and compliance, and reporting successes and failures so that improvements can be applied quickly. As with any good pilot program, test early and often to help ensure that the theoretical is practical and effective. This also means that measures, such as metrics and key performance indicators (KPIs), should directly target potential areas of future improvement so that it is easy to decide whether to keep the strategy or amend it. Finally, as an organization progresses, it is important to remember that, just like a Quality Management System, data governance must have a change control mechanism to continuously refine practices based on feedback and lessons learned.
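The maturity stages and the keep-or-amend decision can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the rule that the weakest principle caps the overall stage, the per-principle scores, and the KPI values are all hypothetical, chosen only to show how a team might make the assessment explicit.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """The four stages of the maturity model described above."""
    INITIAL = 1
    DEVELOPING = 2
    MANAGED = 3
    OPTIMIZED = 4


def overall_maturity(scores):
    """Assumed convention: the weakest principle caps the overall stage."""
    return min(scores.values())


def kpi_decision(measured, target):
    """Return 'keep' if the KPI meets its target, otherwise 'amend'."""
    return "keep" if measured >= target else "amend"


# Hypothetical self-assessment, one stage per governance principle.
scores = {
    "discovery": Maturity.MANAGED,
    "definition": Maturity.DEVELOPING,
    "remediation": Maturity.MANAGED,
    "alerting": Maturity.INITIAL,
    "reporting": Maturity.DEVELOPING,
    "retirement": Maturity.DEVELOPING,
}

# Hypothetical KPI: share of records with complete metadata, target 95%.
decision = kpi_decision(measured=0.92, target=0.95)
```

In this example the overall stage is Initial because alerting lags the other principles, and the metadata KPI falls short of its target, so the decision is to amend the approach rather than keep it unchanged.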
Conclusion
For life sciences companies, the governance of data has never been more important. Data steers critical decisions, and the consequences of compromised data are detrimental not only to business reputation and continuity but, worse, to the safety of patients. The significance of data governance frameworks is highlighted by their addition to ICH E6(R3), now being adopted into law across the world. But data governance is not merely a regulatory checkbox; it is a strategic imperative for life sciences organizations, because it is the cornerstone of trust in the veracity of data, without which the value of data as an asset cannot be realized. As the industry turns toward the application of Artificial Intelligence, learning models demand clean, trusted data. A robust data governance framework therefore not only ensures the strength of the data behind informed decision-making but can also be instrumental in developing operational efficiency and fostering innovation through means such as AI.
Companies can effectively begin their data governance journey, and better manage their assets across the entire data lifecycle, by adhering to key principles such as data discovery, definition, remediation, alerting, reporting, and retirement. Further, by applying best practices, including risk-based data classification and automated compliance processes, a data governance framework can mature over time and help support inspection readiness. To reap the full benefits of data governance, organizations should strive for continuous improvement by integrating data governance not only into their business processes but ultimately into their culture. With a core mission to expedite the delivery of safe and effective treatments for people in need, data integrity, together with reliability, quality, and security, is a critical quality factor in the success of any life sciences organization.
References:
1. ICH E6(R3), (2025) ICH Harmonised Guideline for Good Clinical Practice E6(R3), Available at https://database.ich.org/sites/default/files/ICH_E6%28R3%29_Step4_FinalGuideline_2025_0106.pdf
2. SCDM Topic Brief, March 2022, The 5Vs of Clinical Data, Available at https://scdm.org/wp-content/uploads/2024/03/SCDM-The-5Vs-of-Clinical-Data-FINAL.pdf
3. FDA 21 CFR Part 11, Guidance for Industry Part 11, Electronic Records; Electronic Signatures — Scope and Application (2003) Available at https://www.fda.gov/regulatory-information/search-fda-guidance-documents/part-11-electronic-records-electronic-signatures-scope-and-application
4. GDPR (2016), Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), Available at https://gdpr.eu/tag/gdpr