My name is Rachael Wonnacott and I'm from Fidelity International. We manage around £315.6bn on behalf of clients all over the world, including pension funds, insurers and private corporate banks, in regions such as Asia-Pacific, Europe, South America and even Africa. So that's a bit about us.

We started our Cloud Foundry journey about four years ago, as a technology-driven organisation that was looking to reduce its time to market. It was really as simple as that. The appeal of Cloud Foundry lay in the fact that it lets app developers care only about their applications and their data, without caring about the underlying infrastructure. Additionally, we really liked that Cloud Foundry is governed by a foundation, which would give us the opportunity to go open source at a later date should we choose to do so. Now, in general, as a team and as a company, we were really excited about the idea of going open source to share ideas and to share code. But we didn't have any experience of managing a PaaS.

In those four years, we've come a really long way. Last year, the disruptive principles that underpin our PaaS service were nominated for an award at the Financial Services Technology Awards. We presented a keynote at Summit two years ago on how we put our product team together; we're a DevOps team. And last year our really wonderful product owner, Emma Hammond, who sadly isn't with us today, was awarded the Cloud Now Women in Cloud Innovation Award 2017, a little bit of a mouthful. We're really proud of what we've achieved, and I'm personally really proud to be here today to represent my team and tell you all about our journey.

In the beginning, it was really difficult for us to know what the future of the platform would be and what adoption would look like. We were introducing a very new concept into what I would argue was a very traditional landscape, and it was difficult to estimate how many applications would want to migrate and, of those that did, at what rate. What would the traditional landscape mean for us? How would we transform from traditional deployment methodologies to cloud-native ones? Ultimately, our entire journey has been underpinned by one theme, and that theme is capability. So when I say capability, what do I mean? I mean two things: the capability of the platform team and the capability of the developers. As I said, in the beginning it was difficult to confirm what it would mean for us to run a Cloud Foundry platform, so we wanted to ask ourselves some questions. How highly available would this platform be in practice? What monitoring should we put in place? How would we manage the dependencies between services? How would we upgrade the platform? How would we assure and secure reliable new services? And perhaps most importantly, this one at the bottom: how efficient would our app devs be in housekeeping their instances?
Now, all of these questions are valid questions for any cloud platform, but what brings them together in a common theme is that these are the things a vendor would usually assure for you. Without any concrete estimates for the answers to these questions, we also could not predict the peak load on the platform. And at this point in our journey the maturity of the platform team was quite low: we were brand new to the technology and we didn't have any experience. The result of both of these factors was that our risk appetite was very low, and so a supported commercial offering was simply the most comfortable option for us.

Over the first two years, it became very apparent that the PaaS was very popular. Many existing business-critical applications had migrated to the platform, and it was also a really popular choice for greenfield projects. So we had a bit of a challenge, and this challenge is something that I will affectionately call the bloat, hence the puffer fish. By the bloat, I mean a swell of platform utilisation. When our legacy apps first began to move to the cloud, they were not optimised to be run in the cloud, for a couple of reasons. We had monolithic architectures. We were utilising memory-intensive languages. We had long release cycles. We even had separate development and testing teams. And people were quite uncomfortable with the concept of spinning their applications, and their dev environments, up and down on demand. This meant that we had this really big peak load, this swell on the platform.

Over time, with experience and education, people started to split out their applications into microservices, which was fantastic. However, with this came a demand to split out the larger orgs that had previously represented whole apps into smaller orgs and spaces representing shared microservices that could be consumed by multiple applications, which is great. But the problem was that, under our original licensing model, this was very expensive. So how could we encourage good behavioural change when it came at a price for the app developers?

Over those first two years I've just been talking about, we'd built great confidence in ourselves as a platform team and in our ability to manage the platform, and the demand for the PaaS was showing absolutely no signs of slowing down. So we needed a solution that would scale more favourably with our organisation, and we started to investigate what it would mean to fully adopt open source Cloud Foundry. How did we do it? Well, our team likes to eat biscuits, but unfortunately this problem was a little bit bigger than a few gingerbreads. So really, how did we do it? A move to open source was a huge risk, and this risk needed to be evaluated. From an enterprise perspective, as an FCA-regulated company with very little appetite for risk, we were traditionally accustomed to using vendors and making use of support contracts, so open source felt very uncomfortable. The two main concerns we had were, one, how would we take code from the internet and package it for internal use, and two, how would we support this platform as a standalone team? Our platform hosts AAA+ rated applications, and therefore we really had to ask the security questions about what it would mean to work with code in the open. That involved conversations with both legal and security teams, and they gave us a list of things that we would have to be able to do. There could be no degradation in the service and no introduction of risk.
We needed to be able to maintain our security assurance. There could be no change in headcount and no downtime during the transition. So how did we persuade the company that we could do all of those things with a team of about six to eight people? Well, the persuader came in the form of the benefits of open source themselves. When we first went open source, we were a little bit worried that it was going to be like leaving the comfort of the commercial cruise liner, getting into a small speedboat and driving off by ourselves. But actually, being part of the open source community means that, yes, you might be a small team in a small boat, but there are lots of other teams in lots of other boats and you're all supporting each other. Open source code, hopefully you agree, is robust due to the volume and the enthusiasm of the contributors. Projects are all consumer-driven, and we can add features that meet customer demand. So by being a member of the open source community, we also had the opportunity to drive focus, which was exciting. Additionally, as a team and as a company, we were keen to give back to the community. We are silver members of the foundation. In the early days, when we were looking at open source, we were making customisations locally that we weren't able to share with the community through upstream pull requests. That is something we're doing now, so if you want to check us out on GitHub, by all means please do.

Right. So it was about more than just the open source community; it was about trust. And we, the platform team, trusted the Cloud Foundry code. If you consider this from an architectural perspective, the open source platform would really be no different to the commercial one. The only difference we could see was that the responsibility to package and release the platform would now be ours. Initially people were quite nervous about this, as they thought it would bring change to the platform, but from the perspective of the developers they should see no change whatsoever.

So what about security? How would we secure taking code from the internet? A lot of people have asked us about this today, so this is a point I want to highlight. Actually, I would argue that our approach to security is infinitely more flexible now that we're on open source. This new responsibility to package our own releases makes it possible for us to choose to upgrade individual components. Therefore, if we hear about or notice a security fix for a vulnerability in a component upstream, we are able to upgrade just that component. We don't have to wait, as a team, for a vendor to package a release; we can update just that component, and this reduces the time it takes us to secure our platform. Equally, we can make the executive decision on which components to upgrade and when. Do we want to do a large release and upgrade all components in one go, or do we prioritise individual components that have known defects or vulnerabilities? Equally, if there's customer demand or developer demand for a particular component, we can upgrade that to meet their needs. We are now autonomous in our ability to prioritise and schedule.

So hopefully some of you will recognise this. In addition to demonstrating that there were no architectural differences between the commercial and the open source platform, we needed to be able to demonstrate that we could deploy and operate the platform ourselves.
A commercial Cloud Foundry will have an assured upgrade path, so we needed to be able to replicate this, i.e. we needed to be able to reliably deploy, monitor and upgrade the platform. At this point, we were already using another open source tool called Concourse to run automated monitoring jobs against our platform. We could already see the benefits of Concourse as a tool and the benefits of automation, and we could see the potential to use it for what it's really intended for, which is CI/CD pipelines. When we started writing our pipelines from scratch, we had a look and assessed our current environment. Back in 2015, at the Frankfurt Summit, we gave a presentation about how we deploy Cloud Foundry to two data centres and keep it highly available across multiple data centres. We're now actually deploying to six different data centres internationally. And because we do this, we are able to take a data centre offline temporarily so that we can make changes to it, be that maintenance or upgrades, and reroute all app traffic to the other side, so there's no interruption to the service for our customers.

Right, this is my favourite slide. There are only two things in our world, and those two things are configuration and code. Both of these things should be version controlled, peer reviewed and stored in Git. Everything we do as a team follows on from this principle. If you take anything away from my talk, please remember this slide. Our deployment pipeline is the core unit of everything that we do day to day. This pipeline uses Concourse, a CI/CD system in which the pipeline itself is declarative configuration. We practise pair programming, so any change is written and reviewed by multiple engineers: a pair to write the code and a pair to review the pull request. So in theory, that's four sets of eyes. And by storing everything in Git, we're provided with a full audit trail: what was merged, when it was merged and by whom. So we are confident both that any change we've made has gone through multiple checks and that, if we needed to, we could recover to any known state at any time.

Okay, so there are several prerequisites that you need to meet to deploy a Cloud Foundry platform. Some of them are here: IP ranges, VM folders, storage assignments, perhaps a CI/CD system. When we were running a commercial platform, we were meeting these prerequisites manually, mostly using a GUI, which unfortunately left us open to making mistakes and also did not provide us with that audit trail. So we really wanted to find a way to automate this and make everything either configuration or code. We did that by building our own automation tooling and then using Concourse (you can see some of the boards here) as a CI/CD mechanism to deploy it. This means we now have a set of configuration which provides us with both a higher level of deployment confidence and that previously mentioned audit trail. As an FCA-regulated company, this is something that's really crucial for us from a security perspective. Besides the obvious benefits of end-to-end automation in terms of freeing up time for your engineers to pursue other, hopefully more creative, work, it also gives us predictability and assurance. We have something I hope you're familiar with, which is idempotency: the concept that making multiple identical requests has the same effect as making a single request.
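To make that concrete, here is a minimal sketch in Go (not the team's actual tooling) of what an idempotent prerequisite step might look like. The directory name is just a placeholder; the point is that the step converges on a desired state, so re-running it introduces no new change.

```go
// Sketch of an idempotent "ensure" step: running it once or many times
// leaves the system in the same end state.
package main

import (
	"fmt"
	"os"
)

// ensureDir makes sure a directory exists. It is safe to call any number
// of times: if the directory is already there, it does nothing.
func ensureDir(path string) error {
	if info, err := os.Stat(path); err == nil {
		if info.IsDir() {
			return nil // already in the desired state
		}
		return fmt.Errorf("%s exists but is not a directory", path)
	} else if !os.IsNotExist(err) {
		return err
	}
	return os.MkdirAll(path, 0o755)
}

func main() {
	// Three identical requests, one resulting state.
	for i := 0; i < 3; i++ {
		if err := ensureDir("./deployment-workspace"); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	fmt.Println("prerequisite satisfied; repeated runs changed nothing")
}
```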
So any time we run this pipeline, we know that it's following exactly the same process, executing exactly the same tests and therefore producing exactly the same platform. And by writing our pipeline as a series of asynchronous jobs, we can rerun any individual step without affecting the end state of the platform. So to go back to my previous slide about the things we had to meet to run open source: we're not introducing any unexpected change, and so by rerunning a job there's no introduced risk. This is a sort of inbuilt error recovery method. With any system something might fail, and we must engineer for that failure and be confident that we can recover quickly. So idempotency, combined with this ability to rerun individual steps, means we have effective inbuilt error recovery.

Okay, so this is our deployment pipeline. It's very big, so you probably can't read anything; that's partly deliberate, but I am going to go through what's happening on screen. The very first step of our pipeline is a lock, which means that only a single execution can happen at any one time, and that's helping us to ensure that idempotency I've just mentioned. Only one set of configuration will be running through at any one time, so if we do rerun any individual job, we're not introducing any new state or any new risk. We then have a step that checks for configuration changes for Concourse; that's all happening automatically. And we send notifications to our users by email to alert them of any maintenance or upgrade work; that's the first step along. Our data centre is then marked offline in the load balancer configuration, going back to the point I mentioned about operating out of two data centres. We then need to deploy BOSH, because of course we're using BOSH to deploy Cloud Foundry. We've also recently started making use of CredHub to dynamically generate our credentials. And then we provide BOSH with the cloud config so we can generate the manifests for the CF databases. Now, by listing all of these things, what I'm hoping you'll take away is that a lot of stuff has to happen before we can even consider deploying Cloud Foundry, so by automating this we've made our jobs a lot easier. At this point a lot of stuff starts to happen in parallel, which saves a lot of time, and that includes, but is not limited to, a myriad of tests. That's what I'd like to focus on now.

Okay. When we were using a commercial Cloud Foundry, we were relying on their acceptance tests, into which we didn't have any visibility. Now that we're using open source and the open source acceptance tests, we've actually gained a greater understanding of the platform: for any failing test I can look at what's going on and at the error message, and thereby gain a greater understanding of the behaviour of that particular component. For each service on our platform we have three synchronous steps, which I have summarised here: interpolate, deploy and then test. We have known state at all times in the form of these controlled configuration files (hopefully you're sensing a theme), and the interpolate step reads this config so that each service can be deployed in its required state.
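As an illustration only, and not the team's real pipeline code, that per-service "interpolate, deploy, test" sequence could be sketched like this in Go, shelling out to the BOSH CLI. The deployment name, file names and test script are placeholders, and in practice each step would be a separate pipeline job rather than one program.

```go
// Sketch of the per-service interpolate -> deploy -> test sequence,
// assuming the BOSH CLI is on the PATH. Names are illustrative.
package main

import (
	"log"
	"os"
	"os/exec"
)

func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// 1. Interpolate: check that the manifest renders cleanly from the
	//    version-controlled config (bosh interpolate prints the result).
	if err := run("bosh", "interpolate", "manifest.yml",
		"--vars-file", "vars.yml"); err != nil {
		log.Fatalf("interpolate failed: %v", err)
	}
	// 2. Deploy the manifest, non-interactively, to a named deployment.
	if err := run("bosh", "-d", "my-service", "deploy", "manifest.yml",
		"--vars-file", "vars.yml", "-n"); err != nil {
		log.Fatalf("deploy failed: %v", err)
	}
	// 3. Test: run whatever acceptance or smoke tests cover this service.
	if err := run("./run-acceptance-tests.sh"); err != nil {
		log.Fatalf("tests failed: %v", err)
	}
	log.Println("interpolate, deploy, test: all green")
}
```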
Each service can then be redeployed to a known state, should it be required. So if at some point, say, the internet wasn't working, we could just re-run the job and everything would be fine. For confidence in our deployment, we actually push lightweight test applications to perform simple integration tests and validate that we are still able to push an app and bind a service to that app. I know it sounds really basic, but you're never too good for the basics. Any example apps and services are torn down afterwards to make sure that we're not introducing extra load to the platform. Additionally, we have our own custom acceptance tests, which check the behaviour of components such as Doppler in the Loggregator system, and also test components that are unique to our business requirements. We're currently operating a shared logging solution, and while it is known that not all logs are guaranteed to make it to their destination, it is important that a certain threshold do. So we monitor our Loggregator system end to end by deploying an application that emits known logs, and then we check to see how many reach the other side. It's also worth noting that Doppler resources require scaling to accommodate overall log volume, so it's actually crucial that, as our platform expands, we know whether we need to scale our components alongside it. I'm not going to spend too much time talking about it, but it is worth mentioning that we're also utilising Prometheus to capture our platform metrics.

So how can we safely make and test changes? Well, all our configuration is managed in Git, so we can do all of our testing on branches. This means we can utilise that same deployment pipeline for testing our code and checking the behaviour before we even raise any pull requests, so we are already very highly confident before we raise a pull request that things are going to work. When we're happy with the code changes that we've tested in one of our three dev environments, and the change has made its way through PR review (which is typically done in a different dev environment), the merged change is automatically picked up by our CI environment. If it makes it through CI successfully, it's automatically picked up by staging one, and if it's successful in staging one, it's automatically picked up by staging two. At each stage, our deployment is tagged to say that it was successful. The entire release, including the commit references for every single component that has been deployed, is tagged and can then be fed into subsequent pipelines, and that way we ensure consistency from pipeline to pipeline. If everything makes it through CI, staging one and staging two, we are then highly confident for it to pass into non-production, which is part of our runtime environments represented just here. However, to minimise disruption, we do often choose to make releases by grouping a set of changes; but again, because we run the platform, we can choose how and when we do that. Something I would like to call out is this GPG signing up here. We are using GPG commit signing to indicate a release version for deployment into the runtime environments. It's essentially our product owner's seal of approval and an extra level of security: only certain members of the team have the privileges to be able to sign.
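To give a flavour of that gate, here is an assumption about how such a check could be wired up rather than the team's exact implementation: a promotion job might verify the signature on the release tag before anything is deployed to a runtime environment. The tag name, environment variable and use of `git tag -v` (rather than, say, `git verify-commit`) are all illustrative.

```go
// Sketch of a pre-deployment gate that refuses to promote a release
// unless its Git tag carries a valid GPG signature.
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	tag := os.Getenv("RELEASE_TAG") // placeholder, e.g. "release-2018-07-31"
	if tag == "" {
		log.Fatal("RELEASE_TAG not set")
	}
	// `git tag -v` verifies the GPG signature of a signed, annotated tag
	// and exits non-zero if the signature is missing or untrusted.
	cmd := exec.Command("git", "tag", "-v", tag)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatalf("refusing to deploy: tag %s failed signature verification: %v", tag, err)
	}
	log.Printf("tag %s is signed and verified; safe to promote", tag)
}
```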
So why do we tag successful deployments? The reason is that, for the code we deploy into the higher-level environments, i.e. non-prod and prod, we need to be completely confident that it has been tested and that it works. Nothing goes into a higher-level environment without having already succeeded in a lower one. Tagging also provides a complete audit trail. I know that I've mentioned audit a few times, but it's very important: if anything were to go wrong, we can go back and see exactly what code was running at that time, so we can replicate whatever was happening. We can also see who merged the change, so if we did have to hold anybody accountable for anything, that is possible.

Okay. So what you saw on the other slide isn't quite accurate; things actually look a little bit more like this. If you're good at counting, you'll notice 24 CF foundations here; we actually have 26. We've got all of our dev environments on the left, we have that CI, staging one, staging two scenario that I've just walked you through, and then you've got a bit more of an accurate picture of our runtime environments on the right. The clocks represent a two-day wait, which is just an extra level of security to make sure that we're happy that what happened in non-prod can then go into production. These 26 Cloud Foundry foundations are spread across the UK and Asia, and they're all deployed using that same deployment pipeline. We are making use of the stopover resource from our friends at EngineerBetter (he's on his phone) to automatically pin our Concourse pipelines to specific versions of resources, and this allows us to reuse the same pipeline YAML for all our different environments. We don't like to add code, we always want to minimise code, and this way we can basically ensure consistency across everything.

Okay. So you've heard about how we're doing the deployment, but how did we make the transition? We chose to run two Cloud Foundries in parallel: the commercial one and the open source one. This of course meant an increased cost of running, and we also needed to invest in some new infrastructure, since we run our CF on virtualised infrastructure internally. Despite this overhead, deploying two Cloud Foundries allowed us to directly compare the behaviour of both platforms and to ensure confidence that there would be no degradation of service or introduction of risk; running them both at the same time basically gave us the time to do that. How did we ensure that we were capable of doing this direct comparison? Well, when we built our open source platform, we chose components that matched the build of the current version of the commercial one where possible. We ran exactly the same test suite against both platforms, and we also utilised the open source Cloud Foundry acceptance tests to get even more coverage than we originally had on the commercial platform. Additionally, we asked our lovely app developers to push to both platforms during a migration period so that we could do a direct comparison of performance, and so that all the developers were happy there would be no change to their experience, or the customer experience, when using their applications. This is meant to show the migration period.
So, due to our architecture already being set up across two data centres, apps were already written to be deployed to separate Cloud Foundry foundations, which was really useful for us. The network routing that was already there enabled us to push to two different platforms, so we didn't need to engineer anything fresh. It was therefore also reasonable for us to ask our developers to push to another platform in addition to where they were already pushing. And we were quite prepared for moving our security groups, as we had an existing plugin that means our developers can push their security group configuration directly alongside their apps. What I really want to highlight here is that we didn't do a lot of engineering for the migration. Our app teams were mostly using some form of CI or CD workflow to deploy their apps, and we're not prescriptive about what they choose to use, so in general it was as simple as asking them to add the new foundation to their workflows. The key to the migration was actually communication, not technology. We needed to ensure that our developers knew well in advance what they would need to do to deploy their apps, and to educate them on the use of open source. There was a lot of nervous energy, and we needed to make sure that everyone was happy with the move. We have numerous different product teams, and it's quite hard to get them to agree to do everything all at the same time; in general, coordinating a large number of people is not easy, especially when they're spread across the globe, operating to different schedules and in different time zones. So what we wanted was a mechanism whereby we would still be able to complete the switch-over even if one product team wasn't ready. We had this really, really simple Golang app, aptly named the Redirector app, and this would basically send any traffic to the right place even if the app hadn't been migrated; there's a rough sketch of the idea just below.

So how did we migrate security and service settings? That hasn't rendered very nicely, I apologise. It would have been an unreasonable customer experience to ask developers to request all of their services again, not to mention a large volume of work for us to recreate them, and thankfully we were able to automate this quite simply. We wrote scripts that queried the UAA database and then repopulated all of that in the open source platform. All the previous configuration had an audit trail and product owner approval, so it was really just a copy-and-paste exercise, and the only real complexity was making sure we could automate it by writing the plugin. As a way of ensuring that everything had been copied, we checked that all the commands went through successfully; anything that didn't was recorded in a retries file and simply retried, and as a very final pass we checked a few things manually. But for the most part everything was automated end to end.

Okay, I imagine the question that you're all asking is: how long did this take us? Well, not actually that long. Our inception period started at the end of January, early February, and by the end of October we'd completely switched off our commercial platform. The go-live for the platform was really at the end of July, and between July and September we were encouraging all of our app developers to push to both sides, with all of our app developers expected to have moved their apps across by the beginning of September.
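For illustration, here is a minimal sketch of the idea behind a redirector like that. This is not the team's actual Redirector app; the environment variable and target domain are placeholders. It simply forwards any request it receives to the same path on the new platform's domain.

```go
// Sketch of a tiny redirector: anything still hitting a route on the old
// platform gets redirected to the same path and query on the new one.
package main

import (
	"log"
	"net/http"
	"os"
)

func main() {
	target := os.Getenv("NEW_PLATFORM_DOMAIN") // placeholder, e.g. "apps.new-cf.example.com"
	if target == "" {
		log.Fatal("NEW_PLATFORM_DOMAIN not set")
	}
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Preserve the path and query string, swap in the new host.
		u := *r.URL
		u.Scheme = "https"
		u.Host = target
		http.Redirect(w, r, u.String(), http.StatusTemporaryRedirect)
	})
	port := os.Getenv("PORT") // Cloud Foundry injects PORT for pushed apps
	if port == "" {
		port = "8080"
	}
	log.Fatal(http.ListenAndServe(":"+port, handler))
}
```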
We left that month-long period there essentially because we predicted there would be some stragglers; I think in any organisation you're going to have people running late, so we had it there as a failsafe, and any of those people were caught by the Redirector app. So, long story short, what I'd want to highlight, if you're thinking about going open source and you're currently using a commercial platform, is that we were already in a really good place. We were very comfortable with running the platform and very comfortable with the software. There was a lot of trust: trust in the Cloud Foundry code and trust in ourselves to run the platform. We were very open that it was a big risk, and we weren't deluding ourselves in any way, shape or form, but we could see those huge benefits, and the fact that I'm here today to tell you about them means that it paid off, and I'm really grateful to be here. We now have a total of 26 foundations across three different geographical locations. Thank you so much for listening. If you want to hear anything more about what we've done, we have a booth in the sponsor arena and two lovely team members down at the front supporting me, so come and talk to us; I'd love to hear from you.