Good afternoon, everyone. My name is Francesco Giannoccharo, and I am head of high performance computing and infrastructure at Public Health England. Today I will take about 15 minutes of your time to talk about how we are using open source technology to deliver modern public health services.

First, very quickly, what does Public Health England do? It is an executive agency of the Department of Health in the UK, which provides local government and the public with evidence-based professional and scientific expertise and support. The organisation was established in 2013 by bringing together public health scientists from about 70 organisations into one single body. Currently PHE has 6,000 employees, most of them scientists and public health professionals.

The mission of Public Health England is ambitious and inspiring at the same time: to protect and improve the nation's health and to reduce health inequalities. We put effort into delivering that mission through world-leading science, knowledge and intelligence, partnership, and the delivery of specialist public health services. The organisation came together in 2013, and from the beginning the effort from a technology point of view was focused on supporting the scientific community within the organisation.

PHE delivers a wide range of public health services. These span from research and scientific publication based on statistical and mathematical models, such as spatial metapopulation models for transmissible diseases like seasonal flu but also more aggressive pathogens like Ebola or the coronavirus that is unfortunately in the news these days for the outbreak in China, to predictive models applied to anthrax, as well as inference problems able to infer the likely size of an outbreak, the location of its source, its spatial extent, and so on.
Another area of service that PHE delivers is pathogen genomics, which we provide to hospitals based on whole genome sequencing, essentially for pathogen identification, pathogen typing, surveillance and outbreak investigation. PHE receives biological samples from hospitals and uses genome sequencing technology to analyse them and identify the pathogen that may be affecting a patient, which can of course be an aggressive one. Between 2014 and 2018 PHE analysed more than 100,000 bacterial and viral genomes. In addition, we deliver services directly to the public through campaigns to increase awareness around cancer, obesity, smoking and other well-being behaviours.

The technology ecosystem initially used in PHE was very much focused on a restricted set of proprietary technologies, primarily to support the business-as-usual type of IT. In this effort of supporting the scientific community, we started looking into a new set of technologies, and we wanted to stay focused on open source rather than proprietary software, because we see open source as very much in line with the mission of the organisation: making the science we work with as open as possible, and keeping to open standards so that results can be easily shared with the scientific community around the world. I will talk briefly about those technologies.

The requirements, as I said, were initially about improving scalability and cost efficiency for high performance computing workloads, which had been used from the beginning in Public Health England primarily in three departments. The first is bioinformatics, which has been using high performance computing to process and analyse DNA for diagnostics and surveillance of infectious disease. This type of workload is very I/O intensive: the workloads manage large amounts of data, so the environment has to be capable of sustaining high throughput.
Another area making use of high performance computing is statistical modelling and economics, where we run real-time models and simulations to predict expected pandemic disease dynamics, supporting national vaccination policy and the control of antimicrobial resistance, essentially understanding how mutations in bacteria and viruses are developing resistance to antibiotics. The third area is emergency response, which has been using high performance computing to run simulations to better understand, ahead of time, epidemiological and social behaviour and how this can potentially increase the risk posed by infectious disease threats, including bioterrorism.

The open source technology we identified to support this specific niche of high performance computing was OpenStack. OpenStack is a cloud technology, specifically an infrastructure-as-a-service type of cloud technology, and it was the first new technology we introduced, but we also scoped out a larger programme covering a number of different areas. As I said, PHE produces a very large amount of data; it is a data-driven organisation, so this was very important for us. One of the projects aimed to facilitate data discoverability, browsability and shareability, essentially giving scientists the ability to attach as much metadata as they want to each file and dataset they work with. For that project we deployed an open source technology called iRODS, which makes use of an on-premise cloud storage layer based on S3. The third requirement and project was related to improving automation and cross-platform orchestration.
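The idea behind that metadata project can be illustrated with a small sketch. iRODS attaches arbitrary attribute-value metadata to files and lets you query on it; the in-memory catalogue below is a hypothetical stand-in for that behaviour, not the iRODS API, with made-up paths and attribute names.

```python
# Minimal sketch of metadata-driven data discovery, in the spirit of
# iRODS attribute-value metadata. All names and paths are hypothetical;
# this is not the iRODS API.
from collections import defaultdict

class Catalogue:
    def __init__(self):
        # path -> {attribute: value}
        self._meta = defaultdict(dict)

    def tag(self, path, attribute, value):
        """Attach an arbitrary metadata attribute to a file."""
        self._meta[path][attribute] = value

    def query(self, **criteria):
        """Return every path whose metadata matches all criteria."""
        return sorted(
            path for path, meta in self._meta.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        )

cat = Catalogue()
cat.tag("/phe/run42/sample1.fastq", "pathogen", "Salmonella")
cat.tag("/phe/run42/sample1.fastq", "year", 2017)
cat.tag("/phe/run43/sample2.fastq", "pathogen", "Ebola")

print(cat.query(pathogen="Salmonella"))  # -> ['/phe/run42/sample1.fastq']
```

Because scientists can add any attribute they like, discoverability does not depend on a fixed schema, which is the point of the project described above.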
So in addition to the traditional virtualisation platforms we had been using from the beginning, VMware, oVirt and RHV, and then OpenStack, we also started using public cloud environments a little, like AWS, Azure and Google Compute Engine. The idea was to provide a single pane of glass allowing users to deploy systems into each of those underpinning environments from a single set of APIs and a single web front end. For that project we chose an open source technology called ManageIQ. ManageIQ is the upstream name, and CloudForms is the version we use, which is supported by Red Hat.

The last area and project was related to deploying a platform as a service to support containerised applications, and for that project we chose to use OKD, the community distribution of OpenShift, and that is one of the topics of this presentation.

I think it is important to emphasise again the amount of data the organisation manages, because clearly our technology decisions have been dictated by the amount of storage. When we talk about using private cloud technologies versus public cloud, hybrid cloud or multi-cloud, you are driven towards one or the other on the basis of the requirements. Clearly, if you have to move a petabyte of storage into a public cloud environment there are constraints, and the cost models are slightly different from the case where your workload is CPU intensive. The amount of data in the life sciences is constantly growing: about 25 petabytes of data are produced worldwide every year, the amount of data related to DNA sequencing is doubling every seven months, and it is overtaking other scientific fields like astronomy. The migration to and introduction of open source cloud technology, whether at the infrastructure-as-a-service or the platform-as-a-service level, can be challenging, especially if you don't have the right skill set from the beginning.
So we approached the introduction of these technologies through this specific use case related to high performance computing. High performance computing is itself a technology that, from the beginning, was designed to allocate resources in an elastic way: the software stack in an HPC environment already has a job scheduler capable of looking at the available resources in the cluster and allocating jobs to the nodes that are available. So OpenStack was very easy to deploy in support of this type of requirement, because the three bare-metal clusters that Public Health England had from the beginning, instead of relying only on their bare-metal compute nodes, were able, when those nodes were fully saturated, to burst additional compute capacity onto this shared on-premise cloud environment running on OpenStack. Once the jobs had executed and completed, the cloud instances that had been deployed as part of the cluster were released and made available for other workloads.

But the requirements solved with an infrastructure-as-a-service type of cloud were not the only requirements we had in Public Health England. After that, we started looking into how to make more cost-efficient use of the hardware and resources consumed by our legacy applications, essentially the web applications we use to present to the public, and to share with other organisations, the results of the research and analysis that we do. Those results are of course shared over the internet, through web applications. PHE had about 100 web applications considered business critical, essentially the results of projects commissioned and delivered in previous years. The majority of them were designed as monolithic applications.
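The bursting behaviour described a moment ago comes down to a simple sizing rule: when queued jobs exceed the free bare-metal slots, request the difference from the cloud, and release instances once the queue drains. The sketch below illustrates that rule only; the function name, the one-job-per-node assumption and the cap are all hypothetical, not the actual scheduler configuration.

```python
# Illustrative sketch of HPC cloud bursting: when the job queue outgrows
# the bare-metal cluster, extra instances are requested from the
# on-premise OpenStack cloud and released when no longer needed.
# One job per node is assumed for simplicity; all names are hypothetical.
def cloud_nodes_needed(queued_jobs, bare_metal_free,
                       cloud_nodes_running, max_cloud_nodes=16):
    # Jobs that cannot be placed on free bare-metal nodes.
    overflow = max(0, queued_jobs - bare_metal_free)
    # Never ask the shared cloud for more than our quota.
    target = min(overflow, max_cloud_nodes)
    # Positive -> launch instances, negative -> release, zero -> hold.
    return target - cloud_nodes_running

# Cluster saturated: 40 jobs queued, no free bare-metal nodes.
print(cloud_nodes_needed(40, 0, 0))   # -> 16 (burst, capped at quota)
# Queue drained: release all 16 cloud instances back to the pool.
print(cloud_nodes_needed(0, 12, 16))  # -> -16
```

In practice the scheduler makes this decision continuously, which is what lets the same OpenStack capacity serve the three clusters as a shared pool.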
So one piece of work we were looking to do was to take those legacy applications and, wherever possible, move them to an environment that is more efficient in terms of resource usage. This is of course very much related to the versions the application needs to run against. There are applications that rely on legacy libraries and legacy programming language runtimes, and keeping those legacy systems in virtual machines sometimes poses the risk of the operating system also falling behind: if the machine is updated, the update process will update libraries and therefore break the application that depends on the legacy versions. The use of containers was identified as the way to solve this issue.

There are similarities between the requirements of our web applications and of the applications we use in HPC, in the sense that in the HPC environment we also run different versions of pipelines and workloads which rely on different versions of libraries. In HPC we traditionally use environment modules to manage different library versions, but we started looking at containers in that space as well, and container engines designed specifically for HPC are currently being developed by the open source community, one of them being Singularity.

Building containers is a process that can be automated, but it requires several steps. Automation of the container lifecycle is one of the most important aspects of managing the lifecycle of containerised applications. The container engine itself does not provide a mechanism to automate that lifecycle, so a number of other technologies around the container engine, Docker for instance, need to be leveraged in order to automate the process.
We found that technologies we were already using elsewhere were very useful in that process, and OpenShift integrates many of them into one single, well-integrated platform. For instance, we were already using GitLab not only for versioning code but also to trigger operations when code is moved into staging or production branches, and so on. The learning curve we faced was therefore reduced by adopting OpenShift, because in addition to the container engine you get, from the beginning, a number of other tools that are already integrated.

That set of tools made it possible to automate the application lifecycle, essentially the building of the container: when a new version of the code is pushed into our GitLab, a webhook triggers an operation on OpenShift, which then takes care of rebuilding the image, publishing the image to the registry, and updating the running application with the very latest version of the image that has been built.

Security is also very well handled by the platform. In an organisation like Public Health England, where sensitive data are managed, it is impossible to think of using a public registry like Docker Hub, because the risk of having vulnerabilities in the containers is too high. Having a registry that is constantly scanned for security vulnerabilities is another capability of this platform that made its use possible within PHE. So you have in one single environment the automation of the application lifecycle, which we started by applying to legacy applications, migrating them wherever possible into sets of containers, but it of course also supports completely new web applications designed from the start with a cloud-native, microservices architecture.
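The push-to-deploy flow just described can be sketched as a pipeline of stages: code push, webhook, image rebuild, registry push, rolling update. Everything below is a simplified stand-in for what OpenShift's build and deployment objects do for us; the function names, the branch name and the app name are hypothetical.

```python
# Simplified sketch of the webhook-driven lifecycle described above:
# a code push triggers an image rebuild, the image is published to a
# registry, and the running deployment is rolled to the new tag.
# All names are hypothetical stand-ins for OpenShift's build/deploy objects.

registry = {}      # image name -> most recently pushed version
deployments = {}   # app name -> image tag currently running

def build_image(app, commit):
    # In OpenShift a build would produce this image from source.
    return f"{app}:{commit[:7]}"

def push_image(tag):
    name, version = tag.split(":")
    registry[name] = version

def rollout(app, tag):
    deployments[app] = tag

def on_webhook(payload):
    """Handle a Git push event end to end."""
    if payload["ref"] != "refs/heads/production":
        return None  # only the production branch triggers a rebuild
    tag = build_image(payload["app"], payload["commit"])
    push_image(tag)
    rollout(payload["app"], tag)
    return tag

print(on_webhook({"ref": "refs/heads/production",
                  "app": "surveillance-api",
                  "commit": "a1b2c3d4e5"}))  # -> surveillance-api:a1b2c3d
```

The value for us is that no human sits between the push and the updated application: the platform owns every stage of that chain.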
So developers are empowered and supported in maintaining the application lifecycle, but the infrastructure team, the operators, also have their work significantly simplified by the level of integration and automation provided in this platform. When you think about deploying an application, you have to think about the networking aspects: assigning a set of IP addresses, assigning DNS to that application, providing that isolated environment with shared volumes for all the container pods that need to read the same information. All of these operations are deployed and orchestrated in a very integrated way within the platform, and this gives us the ability to deploy on premise as well as off premise.

At the moment we are using the platform for essentially two projects. One is in the space of surveillance, meaning the surveillance of outbreaks, and this type of environment may need to scale very quickly across a large set of machines, so the level of portability that containers offer can clearly make the difference in this situation. We currently run those workloads on premise, but the freedom to redeploy OpenShift off premise, in a public cloud, gives us a degree of flexibility that is difficult to have with other solutions.

That, I think, is the powerful aspect of these technologies, and it is made possible by the effort of the open source community. This is the main message I would like to leave you with: the ability to deliver these services is made possible by the effort of many people around the world who work on open source technology, and we take this for granted far too often. If you think about what our everyday life would have been if the World Wide Web had been patented, or if the human genome had been the intellectual property of a single company, which almost happened, clearly the entire society we live in today would be different. So a big thanks to the open source community, and to Red Hat for supporting these technologies.

And thank you for listening.