3 Open code in the NHS
3.1 Introduction
There are a number of organisations and publications advocating open coding in its various guises within the NHS and more generally across the public sector and academia. This page is intended as a place to collate relevant literature and proposed or tried approaches to implementing open code policies in the NHS, from individual teams up to entire organisations.
There is also a significant crossover between open code and software engineering best practice, which means that they are often promoted together as an effective means to improve reproducibility.
3.2 References
The following is a list of some of the material available discussing and supporting open coding and software engineering approaches to code development in the NHS:
Better, broader, safer using health data for research and analysis (also known as The Goldacre Report) - a systematic and far-reaching report, written on behalf of the Department of Health and Social Care, advocating for open coding and Reproducible Analytical Pipelines (RAP) in the NHS,
Government Analysis Function Reproducible Analytical Pipelines (RAP) Strategy - released in June 2022, this is a comprehensive strategy for Government, summarising the findings on RAP since 2017 and the plan for future work,
Data saves lives: reshaping health and social care with data - a broad document that covers many aspects of “improving trust in the health and care system’s use of data”, which extends to how data is used and references Better, broader, safer using health data for research and analysis (Goldacre),
Office for Statistics Regulation – Overcoming Barriers to Adoption of RAP - a report written in support of RAP for adoption by all government departments doing analytics, describing the challenges and recommending solutions that cover organisational, team-level and individual barriers,
Open sourcing analytical code - from the Analysis Function and first published in March 2023, this is guidance for anyone working in government who is involved in or owns analysis pipelines,
Quality assurance of code for analysis and research - also known as the Duck Book, this is a living document for Analysis Standards and Pipelines for the Office for National Statistics,
NHS England Open Source Programme - can be found in the Digital Transformation pages,
Government GitHub Community - links to UK Central and Council GitHub repositories amongst many others across the world,
NHS England R Reporting - includes tutorials and an example A&E Attendance Report.
3.2.1 From organisations/teams that no longer operate
NHSx (now NHS England) Open Source Policy - a comprehensive description of why and how open source should be implemented in the NHS, including statements about best practice and a check list for open sourcing code. An issue was opened in Nov 2022 about rebranding to NHS England from NHSx,
NHS Digital (now NHS England) RAP Community of Practice - a wealth of material pertaining to setting up and running RAP. The team also write blogs,
Making source code open and reusable - from the Gov.uk Service Manual for the Government Digital Service,
CDU Data Science Blog - the Clinical Development Unit was a team in Nottinghamshire Healthcare NHS Foundation Trust that coded openly and wrote about some of their approaches on a {distill} blog site. This also features in the NHS England playbook which is a series of case examples, and which was itself referred to in the Data Saves Lives document:
Our commitments We will publish a digital playbook on how to open source your code for health and care organisations. Guidance on where to put the code, how to license and maintain it, and best practice for working with suppliers will be published in addition to case studies of teams who have done this – completed May 2022.
3.3 Specific challenges
3.3.1 Not feeling psychologically safe
This section started off by addressing the technical and information governance issues around sharing code in the open, but the greatest challenge, regardless of whether these are resolved, is that of feeling safe to share.
Quite often people may think or say:
- my code is not good enough
- I’ll get into trouble for sharing code
- I’ll make a mistake and accidentally share something I shouldn’t
- no one will be able to run the code as they don’t have the same data
Jonny Pearson gave a lightning talk on this subject at the NHS-R Community Conference 2022.
There is no magic solution to this. It takes time, support and practice to feel “comfortable” with sharing in the open, but it is achievable.
NHS-R Community has grown into a learning community where it’s possible to connect with others who have been through the stages required to share, have made the mistakes and have learned the solutions. It’s also a place to try out the technical skills using existing projects which don’t have any connection to sensitive data, as there is no better way to learn than trying (and making mistakes with no reputational risk to yourself or your team!). Our GitHub repositories are open to contributions and are full of mistakes to correct or to just feel ok with adding to. Ultimately, NHS-R Community is a space to challenge those common anxieties.
Another resource, built for code reviews but also tremendously helpful for sharing in the open, is the Code Review Anxiety Workbook, which details what anxiety looks like and how to challenge it. Anxiety about code reviews or sharing in the open isn’t different from anxiety in any other situation, but the workbook may help to challenge the expectation that experienced developers never feel it (not true), and to show that this is a recognised thing and that we are not alone in these feelings.
3.3.2 Opening code without exposing data/‘secrets’
In coding terms, exposing secrets often means sharing passwords or keys, but for the data analysis that occurs in the NHS and other public sector organisations it will also mean the sharing of personal, clinical or other sensitive data. In practice there are a number of approaches to avoid this happening. See the specific NHS approaches:
- NHSx Open Source Policy (note NHSx is now NHS England but this hasn’t been adopted as an NHS England Policy),
- NHS England’s R Reporting,
- CDU Data Science Team blogs about GitHub,
- Making source code open and reusable (provides broad requirements like storing secret keys and credentials separately),
- NHS North Central London ICB Analytics Team, who use a template repository for new analytical projects, with further information detailed in the slides for Intro to Git.
These include setting up repositories so that there are automated restrictions to prevent and reduce the risk of unintentional disclosure of any sensitive data.
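One form such an automated restriction can take is a check, run before code is shared, that flags files which look like data rather than code. The following is a minimal sketch in base R and is not taken from any of the approaches listed above; the function name `find_risky_files()` and the list of file extensions are illustrative assumptions that would need adapting locally.

```r
# Hypothetical helper: list files under `path` whose extension suggests
# they hold data rather than code. The extension list is an assumption.
find_risky_files <- function(path = ".",
                             extensions = c("csv", "xlsx", "xls", "rds",
                                            "rdata", "parquet")) {
  # Relative paths of every file below `path`
  files <- list.files(path, recursive = TRUE)
  # Compare extensions case-insensitively
  is_risky <- tolower(tools::file_ext(files)) %in% tolower(extensions)
  files[is_risky]
}

# Report anything that should be checked before the project is shared
risky <- find_risky_files()
if (length(risky) > 0) {
  message("Check these files before sharing: ", paste(risky, collapse = ", "))
}
```

A check like this complements, rather than replaces, listing data files in `.gitignore`, and it cannot detect sensitive values typed directly into a script.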
Nevertheless, we do have to rely upon people following these recommendations and setting up their systems accordingly, which is very similar to our use of email or social media accounts. As an example, with email we rely upon senders using blind copy if the recipients include patients or the public, as there is no automated way to prevent a breach otherwise. Consequently, mistakes have happened and do happen, and there are mechanisms in place to report these breaches, but this does not mean that we prevent people from using email or social media. Systems like GitHub do offer the possibility to set up processes to prevent, reverse and delete accidental sharing (breaches), but near misses and incidents are still subject to reporting when they occur. However, it may be that these are only reported locally and do not, currently, get shared more widely.
3.3.3 Opening code without giving away commercial/business/clinical proprietary information (and Intellectual Property)
The issue of intellectual property is a very difficult one to argue against, particularly when analysts and data scientists are not necessarily trained in or familiar with this area of law. However, it is often given as a reason not to share specific code, like SQL, without any formal detail as to how sharing would break any laws. It is often assumed that publishing SQL code could inadvertently reveal the structure of the databases, which is owned by a company. However, there are examples of clinical database schemas that have been shared, like that of TPP, who own SystmOne, as part of the OpenSAFELY secure analytics platform for NHS electronic health records. It could also be said that discussions about sharing code should be considered as part of the procurement process. Government guidance, When code should be open or closed, from the Central Digital & Data Office, also states:
Unless your database schema contains credentials it is not sensitive and can be open. You should store any credentials separately and keep them closed.
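One common way to keep credentials separate from shared code is to read them from environment variables at run time, so that the script itself contains no secret values. A minimal sketch in base R follows; the helper `get_secret()` and the variable name `DB_PASSWORD` are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: fetch a credential from an environment variable and
# stop with a clear error if it has not been set.
get_secret <- function(name) {
  value <- Sys.getenv(name)  # returns "" when the variable is not set
  if (!nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Example use (commented out so the script runs without the variable set):
# password <- get_secret("DB_PASSWORD")
```

The value itself can then live outside the repository, for example in a user-level `.Renviron` file that is never committed.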
3.3.4 Concerns over code being “sold” back to the NHS at a profit
This argument is often made, though rarely with any examples of when it has occurred. Nevertheless, it is worth exploring what would happen if it did, as the end result is not necessarily a reason to avoid sharing publicly.
3.3.4.1 Adding value
The courses from NHS-R Community are all open to the public and can be used by anyone to learn, teach or amend. Nevertheless, they are being used by organisations, like the Strategy Unit (part of NHS Midlands and Lancashire Commissioning Support Unit), who use the materials and offer training sessions at a daily rate for the workshops. This means that, although the materials are free, the services of the person running the course are being paid for, and that’s an added service.
The same may be the case for code that is built upon, so if a model is taken and enhanced, then that’s an added service. Of course, that subsequent code may not be made available to the public, but the original will be, so it is up to the purchaser to judge whether a service that leads to closed code is a good thing to purchase. Our suggestion is that making code open is always considered an important part of procurement.
3.3.4.2 Selling back the same code
Taking the idea of adding value further, what if a company got hold of freely shared code and sold it back to the NHS? That would be unfortunate, but the mistake made here is in not looking for the code in the open in the first place, and that could be related to how much staff know about open code. A better remedy would be promoting the use of open code and supporting staff in this professional field to use and contribute to open code areas like GitHub.
3.3.4.3 Taking ideas and selling them back
If there were a situation where something written, for example a proposal, is taken up by a consultancy and sold back verbatim, this is again unfortunate as there is an unnecessary cost to public funds. However, it does suggest that the sharing was beneficial because the idea was worthwhile. Many consultancies already work on the basis of talking to staff when given a project and gaining information from people at all levels, something that may not have happened internally. Ideas from people “doing the work” do not always go as high as the board but can get there more easily when filtered through management consultancies.
If we are concerned about being sold our own work then, by putting it out in the open, we are also allowing discovery of it by everyone.
3.3.5 Coding in the open
There is currently no single policy or procedure that can be referred to in terms of coding in the open. Whilst it has been recommended strongly in many of the important reviews available to the NHS, like Data Saves Lives and Better, broader and safer using health data for research and analysis (Goldacre), there continues to be little guidance on what this looks like in day to day work. Some teams have published in the open, like the CDU Data Science Team, which features as a case study example. However, the team changed in approach and name as of 2023 and the blog site is no longer being updated. The NHSx Open Source policy recommended that all development/analytics work done in the NHS be coded in the open unless there is good reason not to. However, NHSx moved into NHS England in 2022 and the GitHub repository where the policy is published continues to state:
Current status: Version 1.0 - This policy is currently going through internal review as part of the adoption process.
It is worth noting, in terms of open coding, this document also says:
An internal code review should be conducted for all open source projects and specific responsibilities must be met within or close to the development team.
Reviews could be deemed to slow the process of sharing code openly and so need to be considered within the context of software code maintenance, which is a subject in itself. Books on code review principles from the Tidyteam at Posit can be a really good place to start.
The Making source code open and reusable guidance suggests making your code open from the start and gives a short check list on what to consider when making existing code open.
3.4 Licences/Licenses
3.4.1 Language
To clarify on spelling: license is both a noun and a verb in US English, but in UK English licence is used for the noun and license for the verb. This becomes important for searches, which is why the header title here has both spellings - just in case.
The GDS Way says:
When using a proper name, use the appropriate spelling for that thing (e.g. the MIT License.)
but also:
Each repository should include a licence file. This should be called LICENCE or LICENCE.md.
3.4.2 How to use licences
Although documents like Data Saves Lives state there was an intention to:
publish a digital playbook on how to open source your code for health and care organisations. Guidance on where to put the code, how to license and maintain it, and best practice for working with suppliers will be published in addition to case studies of teams who have done this – completed May 2022.
this hasn’t led to any particular document explaining when or how to use licences. The reference to a digital playbook, where the requirement is to:
create a project that can be used by other healthcare organisations
links to case study scenarios which may have information, but which will require reading through to check and reference. One site that is listed is the CDU Data Science Team and, whilst this team used licences, there was never any explicit mention of how or when to use them.
Government documents like the Duck Book do have sections on copyright and licences and their recommendations, along with those of the Analysis Function, are to use MIT and OGL (the Open Government Licence).
The Making source code open and reusable guidance states:
You should publish your code under an Open Source Initiative compatible licence. For example, GDS uses the MIT licence.
All code produced by civil servants is automatically covered by Crown Copyright.
However, many people who are part of the NHS-R Community are not considered civil servants:
… civil servants are defined much more narrowly than public sector workers: police, teachers, NHS staff, members of the armed forces and local government officers are not counted as civil servants.
Even where a person’s work is covered because they are a civil servant, it is still really good practice to include the corresponding licence with any published work, as it cannot be assumed that people know the work is covered by the licence or that the person was a civil servant.
Also, whilst some organisations, like the Department of Health, are considered Crown bodies, NHS trusts and NHS foundation trusts are not:
Subsection (7) provides that, like NHS trusts, NHS foundation trusts are not Crown bodies: see paragraph 18 of Schedule 2 to the National Health Service and Community Care Act 1990 (“the 1990 Act”).
Health and Social Care (Community Health and Standards) Act 2003 Explanatory Notes.
3.4.3 NHS-R Community use of licences
The use of licences by NHS-R Community for the GitHub repositories will be detailed in the NHSR-Way book under Style Guide for code.
3.4.4 Setting up licences
There are two ways to start a repository for sharing on GitHub. Firstly, you can create a repository directly on the GitHub website and select the appropriate licence as you do that.
If, however, you create the project locally, using R for example, you can add a licence to it using a package called {usethis} and typing in the R console:
# Either: CC0 (public domain dedication)
usethis::use_cc0_license()
# Or: MIT
usethis::use_mit_license()
OGL, though, does not appear in the selections for {usethis} or GitHub. Crown Copyright sometimes appears as a separate licence, as in the {a11ytables} package, and is sometimes combined, as in pay-adminusers, an example referred to in the GDS Way.
Dual licences, like MIT and CC0, are not so commonly combined, and there is more information on the GitHub site about dual licensing.
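Where a licence is not offered by {usethis} or GitHub, one option is to write the licence file by hand. The following is a minimal sketch in base R; the helper `write_licence_notice()` is hypothetical, and the notice wording is a placeholder to be replaced with the text agreed by your organisation, not authoritative licence text.

```r
# Hypothetical helper: write a short licence notice to a file, named
# LICENCE by default (the spelling the GDS Way asks for).
# The wording below is a placeholder, not the official licence text.
write_licence_notice <- function(holder,
                                 year = format(Sys.Date(), "%Y"),
                                 path = "LICENCE") {
  notice <- c(
    paste0("Copyright (c) ", year, " ", holder),
    "",
    "Licensed under the Open Government Licence v3.0."
  )
  writeLines(notice, path)
  invisible(path)
}

# Example use (commented out so that nothing is written by accident):
# write_licence_notice("Example NHS Organisation")
```

Writing the file from a script keeps the holder name and year consistent across repositories, but the content should still be checked by whoever is responsible for licensing locally.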
3.5 Useful blogs
GDS The benefits of coding in the open
GOV.uk Don’t be afraid to code in the open: here’s how to do it securely