How do planets form? Was Einstein right about his theory of relativity? These are the kinds of questions that the SKA Observatory is trying to answer with its next-generation radio astronomy facility, which will revolutionize our understanding of the universe and the laws of fundamental physics. This kind of research brings its own peculiarities when it comes to software delivery. Our first speaker today is Piers Harding, a software quality engineer at the SKAO with a keen focus on DevOps. He will talk about the project and its adoption of DevSecOps and cloud-native technologies in aid of the SKAO's vision. Piers is an open-source advocate and a beekeeper with a particular interest in building large-scale data processing infrastructures. Take it away, Piers. Hi, I'm Piers Harding and I am the software quality engineer for the SKAO, the Square Kilometre Array Observatory. Thank you for attending this session. This presentation is about how the SKAO is attempting to adopt cloud-native principles and tools in order to build the computing infrastructure to support our mission to explore the universe with the world's largest radio telescope. The SKA Observatory is a truly global research endeavour dedicated to radio astronomy. It covers three continents, 20 time zones and 14 member countries, with over 100 organisations involved. Being a shared research facility makes it somewhat analogous to a factory that generates data for the radio astronomy community on a continuous basis to support their diverse and extraordinary scientific endeavours. There are an impressive number of observatories coming online early this century, covering a large portion of the electromagnetic spectrum. These include telescopes in the optical, near-infrared, X-ray, radio, millimeter and sub-millimeter, and gamma-ray regions. 
The SKAO is one of these, and it will cover a wide frequency range, with the LFAA, the low-frequency aperture array, working from 50 megahertz up to 350 megahertz, and the mid array picking up from the 350 megahertz range up to 14 gigahertz. All at approximately 50 times greater sensitivity than previous observatories have been able to achieve for radio astronomy. So what are we doing? The SKAO will provide science as a service to astronomers and physicists around the world to help answer big questions. Things like: how do planets form? Are we alone? Was Einstein right with the theory of general relativity? What is the role of magnetism in galaxy evolution and the structure of the cosmic web? How do normal galaxies form and grow? What are fast radio bursts, and what haven't we discovered about them yet? Then there are dark energy, dark matter, and the cosmic dawn and the epoch of re-ionization. All of these things we're going to try and provide answers to. The SKAO will enable novel approaches to solving these questions. One of my most favorite things is the technique of using a network of pulsars distributed over the sky as clocks and beacons to detect distortions in space and ascertain the directions in which these distortions travel. The pulsars emit incredibly reliable pulses, so any variation in transmission will indicate interference. And when we use thousands of these sources spread over the sky, we will be able to observe coordinated patterns in this disruption; things like gravitational waves come to mind. So what is the SKAO and how is it built? The SKAO consists of three sites and two telescopes, which are all combined into one observatory. While pre-construction activities have been underway for a number of years now, the official launch of construction is due to commence this year, and we have an estimated completion date for bringing things online in 2028. 
The mid-frequency telescope, which consists of 197 15-meter dishes, is based in the Karoo Desert in South Africa. The low-frequency telescope uses approximately 132,000 dipole antennas; these things look like short little Christmas trees, and they are based in the Murchison area of Western Australia. Both locations have great radio-quiet characteristics, which is key to a successful site location. When the dishes of the mid telescope are combined into subarrays, they can achieve baselines of up to 150 kilometers long. In the low telescope, subarrays can attain baselines of up to 65 kilometers long. These very long baselines are a key part of the innovation of the SKAO and how it will achieve a new degree of sensitivity in radio astronomy. Finally, our headquarters are based at Jodrell Bank in the UK, on the UNESCO World Heritage Site where the Lovell Mark 1 telescope is. It is an amazing site and well worth a visit; when the Mark 1 moves, it is like something straight out of the Thunderbirds, and I feel enormously privileged to have an office that overlooks this. Because of the complexity of the international collaboration and the timescales involved, with an expected operational lifetime of over 50 years, the SKAO has chosen the route of working under the umbrella of a treaty-level IGO. To put this into perspective, this puts us in the same company as a handful of other large-scale collaborative projects, such as ESO and CERN. Switching to look at our computing requirements, the data flow challenge is immense. This is broadly divided into observational control, data acquisition and image processing characteristics. The SKAO is to operate as a service and can theoretically achieve 24x7 observations. This means that the image processing pipeline capabilities must outrun the visibility data ingest rates. 
If a single observation utilizing the whole of the mid telescope runs for 6 hours at 7 terabits per second, then this is approximately 19 petabytes of data that must be stored, iteratively processed and reduced to the final image products. These image products are in the order of 5 to 7 terabytes in size. Creating observational image products is not enough; we also need to meet the demand for distributing the data so that scientists can do the real science. This includes a program of upgrading global networks and establishing SKAO regional science centres that act as distribution and processing hubs for the scientists. From a software development perspective, our software project is truly global. We traverse countries, time zones, languages and cultures to meet the needs of the domain specialists involved. We underpin this effort with the SAFe Agile methodology to guide the development stream, and then back this up with management of platform and tooling through DevSecOps principles. Presently we have 19 SAFe Agile teams with 150-odd participants, with most teams spanning institutions, countries and time zones, all in one program. The management of this has been challenging, but the agile approach, combined with our software development lifecycle, tooling and processes, has been pivotal in our success to date. The SKAO certainly has its fair share of peculiarities. We are not only globally distributed, but involve a mixture of different research institutes, universities and private companies. We have a constantly changing developer profile: contributors naturally join and leave the project according to their point of contribution. Firmware and software development are managed under the same umbrella, and we have hybrid firmware and software delivery systems. And to top it all, it's research: we contract effort, not a finished product, reflecting the fact that some of what we are attempting to do has never been done before. 
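The data volumes quoted above follow directly from the ingest rate and observation length; a quick back-of-envelope check in Python:

```python
# Sanity check of the figures quoted in the talk:
# a 6-hour observation ingested at 7 terabits per second.

INGEST_RATE_TBITS = 7   # terabits per second
OBS_HOURS = 6

total_bits = INGEST_RATE_TBITS * 1e12 * OBS_HOURS * 3600
petabytes = total_bits / 8 / 1e15   # bits -> bytes -> petabytes

print(f"Raw visibility data: ~{petabytes:.1f} PB")  # ~18.9 PB, i.e. roughly 19 PB
```

That 19 PB must then be reduced by more than three orders of magnitude to reach the 5-7 TB image products.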
And when we design and build, we have to consider a 50-year operational lifetime. Throughout all of this, we are attempting to adopt industry software development best practices and apply them to the research world. Against this backdrop, the context is all about collaboration. We are open: we are open source and open protocols. We provide a framework and tools for contribution based on open tools and standards. We work as a distributed set of teams. We apply quality feedback cycles to really improve our delivery, especially according to DevSecOps principles. And we stand on the shoulders of giants: by this, I mean we learn from the teachings of the research and commercial software sectors and hopefully apply this to our own great benefit. For our software development life cycle, we follow the SAFe Agile methodology, with features and capabilities defining our units of work, managed through JIRA. The development process is based around using merge requests, working on feature branches, where the default branch is always the tip of development and tags forming releases are pinned off this. If we need to patch production, then we create release branches on demand, cut straight from the tags. We have a fully automated build, test, publish and deploy process, with centrally managed and curated artifacts housed in Nexus. We are then able to use Nexus and other solutions like Harbor as caches and deployment accelerators. GitLab is the glue for us. It's our social coding platform that binds together developers, our quality framework and standards, integration and our delivery, and it articulates the life cycle processes and standards that we promote to our distributed developer community. So what has cloud native offered us? It provides a target environment and framework for the developers to aim for. 
It enables the greatest possible portability from desktop to production for delivered software and insulates developers from production platform concerns. It gives us the latest practical hardware commitment: by this I mean that, because of the abstraction, we can delay the choices about what hardware we need to run on, so we can defer cost and vendor decisions and access later hardware advancements than we otherwise could if we had to make a hardware commitment up front. It lowers the barrier to entry and reduces the time to share: because containers are immutable, you can share your code and your data with a high degree of certainty that it will work for whoever picks it up at the other end. It also enables integration and testing in a completely virtual environment, which again is the abstraction of the developers from the platform. Our CI/CD platform factory is Kubernetes-centric. We use the Kubernetes GitLab runners for launching pipelines and use the GitLab native Kubernetes environments integration for managing our deployment environments. Pipelines build artifacts on every branch and publish intermediate artifacts to the GitLab registry and final artifacts to Nexus, where further pipelines run on a continuous basis to exercise extended artifact testing. This extended testing covers things like source code analysis, dependency checking, license checking, security checking and so on. We also want to do advanced integration testing, because we have an array verification process at the end for productionization. We also get great benefit from the native support provided by Elastic Stack and Prometheus integration for monitoring and logging. Overall, it took us three months for the first iteration of adoption of GitLab. We do 1,600-odd deployments in a 90-day program increment cycle, and we have realized a 100% cost saving. GitLab has kindly added us to their GitLab for Open Source program, so we receive great benefit from that. 
Looking at our testing strategy, our minimum viable product takes a longitudinal slice through the entire system, from data acquisition to image product delivery. We keep things common as much as possible between the two telescopes so as to maximize component reuse. Our two telescopes have regional data processing centres, one in South Africa and one in Western Australia. Taking a look at how we are attempting to map the telescope control and science data processing to cloud-native principles and tools: within the MVP, the telescope manager, or monitoring and control, is a common component of our design. Looking downwards, it projects a control hierarchy from the operations control room, where the scientists are, down through the two telescopes to the antenna resources on the ground. This tree structure is fundamentally based on the Tango Controls framework and written in Python, where each node of the tree structure, a logical device application, controls a portion of the telescope. These devices are numbered in their thousands and ultimately are components like chillers, motors, thermostats, tuners, beamformers and so on. For the cloud-native implementation we have been able to neatly translate this application structure to Kubernetes resources, where the Tango device servers, which are nodes in the hierarchy, are deployed as StatefulSets. Each of the major sub-trees of the control hierarchy is encapsulated and deployed in its own Helm chart, and these in turn are composed into different super deployments via nested Helm chart dependencies. Each of the super deployments, whether for desktop, integration or production-scale deployment, can be controlled from the same chart template using different values files. These values files are used to inject and inherit different behaviors and characteristics into each type of deployment. 
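The values-file mechanism described above works because Helm layers each supplied values file over the chart's defaults, roughly like a recursive dictionary merge. A minimal sketch in Python of that merging behaviour (the chart keys and sizes here are illustrative, not the SKA's actual values files):

```python
def deep_merge(base, override):
    """Recursively merge `override` into `base`, the way Helm layers a
    values file on top of a chart's default values.yaml."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical chart defaults (production scale) and a desktop override.
chart_defaults = {"replicaCount": 3, "resources": {"cpu": "4", "memory": "16Gi"}}
values_desktop = {"replicaCount": 1, "resources": {"memory": "2Gi"}}

result = deep_merge(chart_defaults, values_desktop)
print(result)
# {'replicaCount': 1, 'resources': {'cpu': '4', 'memory': '2Gi'}}
```

The same chart template thus yields a one-replica desktop deployment or a full production deployment purely through the values file chosen at install time.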
Switching to look at the image processing and the science data processor, the data processing workload challenges can largely be managed through divide and conquer. A complete observation's data ingest must be almost finished before the processing can start, as the data must be iterated over in its entirety many times. For a six-hour observation at seven terabits per second we have 19 petabytes of data. This can be divided into time-frequency mosaic partitions and processed independently for some large portion of the processing loop. Tango Controls is used for the job scheduling control and the compute and storage resource management. At launch time, the executor calculates the job placement and then submits jobs using Helm. These jobs could contain almost any execution engine, such as Dask, MPI or Spark, or something completely custom; the idea is not to limit the technology choice here. Our target environment is a scalable custom deployment of Kubernetes. To support our software development life cycle, our target platform provides us with a universal abstraction for our developers to aim for and a fully integrated continuous integration and deployment solution. Our cloud infrastructure layer is based on OpenStack, but can be AWS, GCP or any other provider, even bare metal. The software-defined infrastructure is driven by Ansible. The universal storage layer is based on Ceph, which provides file, block and object storage. Calico provides the pod networking layer and Rook provides the storage-level abstraction layer. All our storage types are abstracted by storage classes, such as NFS, block or SSD. This enables us to swap the storage layer out based on the infrastructure provider but retain the logical naming scheme for developers to address the particular storage characteristics that they need, whether that's high speed or low speed or long-term storage or whatever. 
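The time-frequency partitioning described above can be sketched as tiling the observation into independent (time, frequency) blocks, each of which becomes a separately schedulable job. The block counts and tile sizes below are hypothetical, purely to illustrate the divide-and-conquer shape:

```python
from itertools import product

def partition(n_time_blocks, n_freq_channels, time_step, freq_step):
    """Yield (time_range, freq_range) tiles covering the whole observation.
    Each tile can be processed as an independent job."""
    for t0, f0 in product(range(0, n_time_blocks, time_step),
                          range(0, n_freq_channels, freq_step)):
        yield ((t0, min(t0 + time_step, n_time_blocks)),
               (f0, min(f0 + freq_step, n_freq_channels)))

# Hypothetical observation: 360 time blocks x 65,536 frequency channels,
# cut into 60-block time slices and 8,192-channel frequency bands.
tiles = list(partition(n_time_blocks=360, n_freq_channels=65536,
                       time_step=60, freq_step=8192))
print(f"{len(tiles)} independent jobs")  # 6 time slices x 8 bands = 48 jobs
```

In the real system each such tile would be handed to the executor for placement and submitted via Helm as its own job.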
Kubernetes and the standard resources are our universal application abstraction layer, with Prometheus and Elastic Stack providing universal monitoring and logging. GitLab provides the glue for binding this together for our complete software development life cycle. Cloud native at the SKA: why custom-built, and what is the value for us? Well, it enables us to shift the deployment from infrastructure provider to provider whilst keeping a consistent target interface for our developers. We can swap the underlying resource implementations with minimal impact on our developers. Each of the layers can be swapped out interchangeably to suit the hosting provider, so that we can take advantage of future innovations; for example, a new storage or networking solution or monitoring and logging option could be swapped in. It scales from the desktop to production, and we have used this in anger, having implemented it on two independent OpenStack deployments, a bare-metal cluster and AWS EKS, all with little to no change required by our developers who have worked on these platforms. So what is next for the SKA? Well, we want to automate extended quality checks: source code analysis, bill of materials and dependency checking, extended application testing, SAST, security vulnerability and licensing checks that we have to do, and also code quarantine facilities. We would like to have a look at custom operators, and we could potentially create first-class resources for things like the Tango devices themselves, or Dask clusters or MPI clustered applications, for example. We also have exotic deployments, so we can containerise firmware deployments and use that as a launching mechanism. There is also the possibility of managing exotic devices through custom device plugins within Kubernetes, so that we have specialist APIs for things like tile processing modules. 
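To make the custom-operator idea concrete: a first-class Tango device resource would be a Kubernetes custom resource that an operator reconciles into running device servers. The sketch below is purely hypothetical; the API group, kind and spec fields are invented for illustration and are not an existing SKA or Tango API:

```python
# Hypothetical shape of a "TangoDevice" custom resource, built as a plain
# manifest dict. An operator watching this kind would reconcile it into a
# StatefulSet running the named device server.

tango_device_cr = {
    "apiVersion": "tango.example.org/v1alpha1",   # invented API group/version
    "kind": "TangoDevice",                        # invented kind
    "metadata": {"name": "mid-dish-042-thermostat"},
    "spec": {
        "deviceClass": "Thermostat",   # Tango device class to instantiate
        "server": "ds-thermostat",     # device server binary
        "replicas": 1,
    },
}

print(tango_device_cr["kind"], tango_device_cr["metadata"]["name"])
```

Declaring devices this way would let the control hierarchy be expressed and versioned as Kubernetes resources, the same as any other workload.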
So that is cloud native for the SKA. With that, I'd like to finally give thanks to all my colleagues who have assisted in putting this presentation together, and to GitLab for all their support of the SKA, and I do hope that you all remain safe and well in these unprecedented times and the global difficulties that we are facing. Find out more about the SKA by following the links in the speaker notes that come with the slides of this presentation, and you can also find my contact details in there if you have any further questions. So thank you.