Data and research governance – it’s a complex issue

18 January 2025

A FAIRytale by Liz Merrifield – a personal story about good data governance and data controllership from someone who understands the researcher’s perspectives and frustrations.

My name is Liz Merrifield, and I have been working within the field of Research and Information Governance for almost two decades. During this time, I have had first-hand experience of seeing the evolution of what data means to both researchers and patients, and how great data practices can lead to truly innovative treatments, benefitting patient outcomes.

The need for good governance and good practice

For those of us who work in governance, it is often not the individual research projects that interest us, but the process by which high-quality data is generated and processed, who takes responsibility for the data throughout the project lifecycle, and how researchers demonstrate trustworthiness to the public, funders and regulators. Furthermore, the need for transparency is even more important as we enter a new era of research in which the community looks to integrate and interrogate data like never before. For me, this is the most fundamental part of research.

Researchers can often get frustrated by “the governance” and its nuances, but these structures are there for a reason. I would like to start with a simple analogy about trust and trustworthiness, which I often find helpful when encouraging colleagues to understand why data protection rules exist and to think more objectively: I wouldn’t lend the keys to my car to a colleague and simply hope for the best. It would be reasonable, and indeed advisable, for me to ask a series of questions beforehand. Can you drive? Do you have insurance? Why do you need my car and what are you going to use it for? How long will you need it for, and will you bring it back to me once you have finished? Will anyone else use it? And, most crucially, what are you going to do if there are any accidents and if anyone gets hurt – will you mend it and tell me the truth about what happened? For the research community to be trusted when accessing and processing sensitive healthcare data, they must be able to answer the same set of questions, provide credible evidence that these concerns are addressed, assess and mitigate any risks, and have steps in place to respond should anything go wrong and to prevent recurrence. If they can’t, then should we trust them with our precious commodity?

If the research community is to apply the FAIR data principles, we must first ensure that the data are generated, curated, analysed and stored in a way which can support this, and that the actions taken throughout these processes are documented. Without the fundamentals applied at the outset, researchers can’t reliably produce data which conforms to the FAIR principles, and the question of trust reappears. The analogy extends to the onward use of data, for example in secondary analysis, data integration or the production of multimodal datasets: buying a second-hand car can be risky, but safeguards can be put in place to document activity and reliability (e.g. checking the MOT certificates and having access to a full service history), which in turn assure partners that the asset is reliable and robust enough for onward use.

Questioning the status quo through healthcare research is how progress is made in developing new treatments for patients, and it is what researchers are good at. In fact, researchers excel at being disruptive. However, for research to be credible, it must be conducted within the parameters permitted by current legislation and ethical guidelines. Within this article I hope to provide a high-level introduction to the requirements with which researchers must comply, and to signpost useful resources to help you understand this area better. In order to perform more interesting and challenging science, researchers must understand what these parameters or requirements are for their particular research interests, particularly where the research involves accessing or processing sensitive data. It is also important that researchers acknowledge when they are at the limit of their understanding and engage with governance colleagues who have the necessary expertise to advise on the specifics of legislation and how it applies to a particular research question.

Data protection and the GDPR

The UK Data Protection Act 2018 is the UK implementation of the General Data Protection Regulation (GDPR), and both became effective in 2018. It is extremely important that researchers adhere to the data protection principles when handling personal data. There are strict rules around how researchers handle personally identifiable data, to ensure that it is:

  • Used transparently, fairly and with a lawful basis
  • Used for specified, explicit purposes
  • Used in a way which is adequate, relevant to the research being conducted and limited to what is necessary for that research
  • Accurate and kept up to date
  • Retained only for the purposes stipulated, and for no longer than necessary
  • Handled in a way which ensures security, including protection from unlawful or unauthorised processing, access, loss, destruction or damage.

Failure to comply with the data protection rules means that researchers may not be able to access specific datasets and perform their analyses.

Lawful basis

In order to process personal data, the law requires that there is a lawful basis. The Information Commissioner’s Office (ICO) website provides helpful information to help you understand what your lawful basis should be. Many companies rely on consent for handling data, whereas academic institutions typically rely on “public task” as their lawful basis, because the research is performed as part of a task in the public interest that is set out in law. The Data Protection Officer for your institution should be able to offer appropriate guidance on lawful bases.

Privacy notice

This document is required by law to explain why personal data is being processed. This information should be written in the right tone. Again, the ICO has guidance on how to produce this document, and a really good training video too. It is important that this document is an accurate and contemporaneous reflection of what activities are being performed on the data, and how the data is being handled.

There is a lot to get your head around here. It’s not something which can be easily condensed into a few lines of text, and it may be challenging depending on the research question being asked. However, I have outlined below some of the key considerations when obtaining data, to ensure that processing may go ahead. Again, links to key documentation and further reading for each of the points below can be found at the end of this article.

For researchers to utilise personal data, it is important that consent is given by the donors of this data. The Health Research Authority (HRA) offers a suite of resources to assist researchers in establishing whether a particular question constitutes research. Furthermore, this incredible resource provides guidance and information on research protocols, patient information sheets and consent forms, along with templates for each of these documents. The HRA are a group who understand the iterative nature of research, and that ethics isn’t a static concept – more that it evolves with time. Where there are particularly challenging research questions which push the current guidelines and rules, researchers should always seek to engage with their colleagues within the HRA to ensure that the research is being conducted robustly and compliantly.

There are additional considerations for children taking part in research, including how their participation is managed at the point a child is treated as an adult, when they reach the age of 16. Effective management of their data is required, and re-consenting the patient may be necessary to continue to hold or process the data. Where re-consent is not possible, it may be necessary to gain additional approvals to retain this data. Again, maintaining good oversight of data throughout the course of its custodianship is incredibly important in preventing issues of non-compliance and loss of trust.

Common law duty of confidentiality

As previously mentioned, there must be a legal (or lawful) basis for access to personally identifiable information, and consent from a patient if researchers wish to access a healthcare record for purposes other than that patient’s direct care. The NHS recognises the important role research plays in advancing healthcare treatments, and that consent cannot always be obtained. There are many reasons why consent or re-consent is not always possible. In addition, whereas the requirements of the GDPR and the Data Protection Act cease to apply at the point of a patient’s death, the common law duty of confidentiality still remains.

Therefore, the HRA, through its Confidentiality Advisory Group (CAG), has devised a process by which researchers can apply to have access to specific data for patient populations of interest. A CAG approval is an approval made under section 251 of the NHS Act 2006 and its current regulations, the Health Service (Control of Patient Information) Regulations 2002. This means that the common law duty of confidentiality can be lifted temporarily so that confidential patient information can be disclosed to a third party (i.e. the researcher). Again, there are strict governance processes to be followed to obtain CAG approval, and these must be adhered to throughout the duration of data access.

In the same way consent can be given, it can also be taken away. To respect an individual’s wishes, we must ensure that there are procedures in place to stop processing data relating to that individual. This in turn means understanding what happens to the data at the point of capture, and at each of the touchpoints right up to final archival. Should a particular data donor (e.g. a patient) no longer wish their data to be analysed, the responsibility rests with the organisation to enact these requests or provide a robust reason why this cannot be achieved. To return to my previous analogy (this is the last time, I promise), being able to unpick this flow of data provides the necessary assurances to the public and again demonstrates trustworthiness.

Anonymisation

There are circumstances where consent from an individual to conduct research does not need to be sought, and it is possible to look at multiple data modalities for a group of individuals. The predominant method of conducting research in this manner is by accessing the data through a Trusted Research Environment (TRE). These are tightly controlled environments which usually provide linkage from different data providers (NHS, ONS, screening services etc.) to anonymised patient-level data, and which adhere to standards such as the Five Safes and the information security standard ISO 27001. Privacy enhancing technologies are used to separate individual identifiers (e.g. patient name, address, date of birth, NHS number) from the data itself, so that the resulting data fall outside the terms of the GDPR, which states (in Recital 26):

“…The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.”

More information can be found on this topic on the ICO website.
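
The separation of identifiers from the data described above can be illustrated with a short Python sketch. It is not how any particular TRE works: the field names, the `split_record` function and the use of a keyed hash as a linking token are all assumptions made for illustration. It shows only the separation step, which on its own is not sufficient to anonymise a dataset.

```python
import hashlib
import hmac

# Hypothetical field names, chosen for illustration only.
IDENTIFIER_FIELDS = {"name", "address", "date_of_birth", "nhs_number"}


def split_record(record, secret_key):
    """Split one record into an identifier part and a research part.

    Both parts carry the same keyed token so that a data provider holding
    the secret key could re-link them; the research part holds no direct
    identifiers.
    """
    token = hmac.new(
        secret_key, record["nhs_number"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    identifiers = {k: v for k, v in record.items() if k in IDENTIFIER_FIELDS}
    research = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    identifiers["token"] = token
    research["token"] = token
    return identifiers, research
```

In this sketch the identifier part and the secret key would stay with the data provider, and only the research part would be made available for analysis.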

By providing a platform which allows access to anonymised, population-level, linked data (without needing to gain individual-level consent), whilst minimising the risks of re-identification, TREs are becoming the go-to environment for research involving sensitive patient health and care data. They enable researchers to perform their analysis within this controlled environment, and export aggregate summary-level data. Again, the robustness of this anonymisation provides assurances to patients, the public and regulators that their confidential and sensitive data is being handled in a trustworthy manner. The Goldacre Review, published in 2022, focuses on the vital role of TREs in the future of integrated research, and offers a number of recommendations on how to make these systems more innovative and safe to enable research for patient benefit.
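
The export of aggregate, summary-level data mentioned above may be accompanied by a check that no group is small enough to risk re-identification. The Python sketch below illustrates that idea under assumptions of my own: the `aggregate_counts` function and the threshold of 5 are hypothetical and do not come from any specific TRE policy.

```python
from collections import Counter


def aggregate_counts(rows, group_key, min_count=5):
    """Count rows per group, suppressing any group smaller than min_count.

    Suppressed groups are reported as None rather than as a number.
    The default threshold of 5 is an illustrative assumption.
    """
    counts = Counter(row[group_key] for row in rows)
    return {
        group: (n if n >= min_count else None)
        for group, n in counts.items()
    }
```

For example, with six rows for one region and two for another, the first count would be reported and the second suppressed.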

Within this article, I have supplied some links so you can read more about these topics in your own time, and I would like to take this final opportunity to encourage you to reach out to the governance colleagues within your own institution, as they can offer you the guidance you may need. You will find that there are numerous individuals who will be happy to talk to you about the subjects outlined here, the local policies and procedures, and how all of this relates to your research interests. I am confident in saying that if you need their help and support, they will gladly assist you in producing and evidencing high-quality, robust and compliant research.

Many thanks,

Liz

Data protection

Data controllers and data processors

Trusted Research Environments


Funding contributors include Health and Care Research Wales, Cardiff ECMC and Roche. Affiliations: Wales Gene Park and Digital Transformation Innovation Institute, Cardiff University.