Hello, OpenStack. Yeah, the mic works. Cool. Okay, so in this presentation we're going to take you through some of the lessons learned from doing a thousand deployments a day on OpenStack. We'll walk through each of the scenarios where we hit a problem and show you how we resolved it. What we really want to do is share everything we're doing on the project, and hopefully some of you can learn from the lessons we've learned along the way. So over to Dave.

Okay, so I guess first we have the obligatory slide about our company. Paddy Power Betfair formed last year as a merger between Paddy Power and Betfair, surprisingly. Historically, Paddy Power is one of the largest bookmakers in the world, and Betfair is unique because it's the largest betting exchange, an online platform which enables people to bet against each other. The combined company is now on the FTSE 100 on the London Stock Exchange. We have a few offices spread across Europe, now branching out into the US and Australia, and over a thousand engineers spread across those locations. To give you some figures, we do 135 million daily transactions, which involves 30 billion calls to our API, which is more than, I think, the London and New York Stock Exchanges combined. And why are we at the OpenStack Summit? Because we've built an OpenStack private cloud to run all of this on, and we're getting towards 100,000 cores and about two petabytes of storage.

So, building this OpenStack architecture: we've called this internally the i2 project, for Infrastructure 2, which is a very creative name. These were some of the initial aims of the project. One of the big things was to have immutable infrastructure. We deploy lots of VMs, a thousand a day is the figure we're going with, so we have short-lived VMs and we don't do any in-place patching. And, like everyone here I suppose, we wanted to automate as much as possible by delivering applications through continuous delivery pipelines. We wanted a simple architecture that allows us to troubleshoot any issues easily, and we want to maintain everything in source control so that any changes are verified.

One of the things with our infrastructure as a service is that we want it to leverage the APIs of all the different infrastructure components, so we basically create Ansible playbooks which carry out common workflow actions across the infrastructure; there's a minimal sketch of what one of those playbooks might look like a little further down. OpenStack is obviously very good for this, and using the OpenStack APIs also lets us avoid vendor lock-in. One of the big things for Betfair at least was that we'd historically run out of one data center. With this platform we're moving into two DCs and trying to make all the applications active-active, which means the developers have to design the apps for failure and we have to design the infrastructure for failure as well. And we want to utilize a common tool chain for continuous delivery, so just one tool for each operation.

So this is a high-level overview of the journey we've gone through over the last two years. It started off with a four-week proof of concept, which I think was about the time we joined the company.
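(To make the idea of a common workflow action concrete, here is a minimal, hypothetical Ansible sketch of the kind of playbook described above, using the standard os_nova_flavor and os_server modules. The cloud name, flavor sizes, image and network names are illustrative assumptions, not the actual Paddy Power Betfair playbooks.)

```yaml
# Hypothetical sketch of a "common workflow action" playbook: create a flavor
# and launch a VM against the OpenStack APIs. Names and values are illustrative.
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Ensure the application flavor exists
      os_nova_flavor:
        cloud: i2-dc1            # cloud entry in clouds.yaml (assumed)
        name: app-small
        vcpus: 2
        ram: 4096                # MB
        disk: 40                 # GB
        state: present

    - name: Launch an application VM onto the tenant network
      os_server:
        cloud: i2-dc1
        name: dc1-qa-app-001     # would follow the VM naming standard
        image: centos7-base      # Packer-built base image (assumed name)
        flavor: app-small
        network: app-qa-net      # network created via the Nuage/Neutron integration
        availability_zone: qa
        meta:
          pipeline_status: inprogress   # tagged for later pipeline stages
        wait: yes
        state: present
```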
So not a bad time to join the company. This was with Red Hat, and we use Nuage Networks, building out the reference architecture and basically proving we could deliver an application in the way that we wanted to. At that point we were running on the Juno release. Then in September 2015 we moved on to the Kilo release for the pilot phase of the project. The end goal of that was to have two customer-facing applications on the platform serving production traffic, and in that six-month period we did the bulk of the initial work to set up the self-service workflow that the developers use to interact with the platform. We'll talk about it in more detail, but they self-serve their infrastructure, their load balancing and their network requirements.

After that we went into a long phase of migration. This was about the time of the merger, so we got set a very aggressive goal of onboarding 100 applications to the platform by Christmas last year, which I think we pretty much bang on achieved, which was very cool. It took a lot of hard work from everybody. Then, moving into 2017, we still have the ongoing migration effort, and we've also had a few other major infrastructure milestones. We've done an upgrade of the Nuage component to the 3.2 R10 release, and we also got the kit to build out a test lab. That was something we'd lacked for a while, but once we'd proven the platform we were given the resources to build out a test lab where we can try out new OpenStack projects and new functionality. And here we are today in May, looking over the next month or so to upgrade from Kilo to Newton, straight to Newton, and also to upgrade Nuage again to the 4.2 release. There was a session earlier on that Jan and Philly did on immutable OpenStack infrastructure, so you can check out the video for that to see how we're going to do that upgrade.

Okay, so this is the reference architecture that we've built. As Dave touched upon, we have active-active data centers. To start with, before we were actually building out OpenStack, we needed core services to be available, so we have a minimal libvirt cluster that hosts our LDAP, NTP and DNS servers. We needed to do this because it was a full greenfield project where we didn't want to bridge back to our native network at all; we wanted to build this in isolation and then migrate applications onto it.

Our architecture is based on the leaf-spine architecture from Arista. Basically we have our spine switches, and they build a BGP fabric with each leaf switch sitting top of rack. We have two leaf switches at the top of each rack configured in MLAG mode, and we integrate that with our SDN controller, which advertises all of the routing into the Nuage SDN solution. The SDN solution plugs into each of our OpenStack clouds, and we run two per DC. The original premise for this was that we wanted to separate our delivery tooling, so things like Jenkins, ThoughtWorks Go and GitLab would sit in one of the OpenStacks, and then we would have the infrastructure OpenStack for test environments and production workloads.
We've since re-evaluated that, and we're going to collapse that down to one OpenStack per DC, because the maintenance overhead of managing four OpenStack clouds is quite high and we're a lot more mature now with the Newton release, so we'll just scale out more hypervisors in the one region. The way that we provision hardware is that we configure the RAID configuration using HP OneView, via a series of Ansible playbooks, and then we turn it over to the Red Hat OpenStack director, which scales out our compute nodes.

At the top we have our global load balancer, which feeds into our SRX firewalls. We then have two tiers of Citrix NetScaler: the first tier is used for SSL offloading in hardware, and then we use the Citrix NetScaler SDX to route traffic down to each of our microservice applications in each DC. We have dark fiber between the data centers. The way that we bridge external networks into the overlay bubble that's created by Nuage and OpenStack is by using the Nuage VSG. The VSG connects external networks, such as our native legacy network, because we're not doing a big bang where we switch all of this on at once; we're migrating specific workloads over to the Nuage OpenStack platform and then bridging back for the application dependencies. As we migrate more and more applications onto the platform there will be less bridging back to the legacy network, and eventually we'll drain the previous network down and move everything into OpenStack. It's completely mirrored in the second DC, as you can see, and we're going to walk you through the next steps.

So, as I touched on before, we have delivery tooling installed in the tooling OpenStack. For continuous integration we use Jenkins, which packages all of our continuous integration builds, and we build lots of RPMs for that. We also use Jenkins to tag specific repositories for our everything-is-code mandate, so in GitLab we tag our load balancing config, our networking config, the developers' code and the common workflow actions we touched on before. For source control management we use GitLab, for our repositories we use JFrog Artifactory, and for security scanning we use Qualys. Then we give developers the option of using Chef or Ansible to install their application. When we're orchestrating against the OpenStack APIs we wrap everything in an Ansible playbook, but we had a lot of heritage with Chef, one of the creators of Chef actually used to work at Betfair, so a lot of the applications were written in that. It would be too much upheaval for the development teams to rewrite everything in Ansible, so we give the teams the option of Ansible or Chef to install their applications.

Okay, so now we're going to talk briefly about some of the design decisions that were taken along the way. First off we'll tackle the issue of permissions. We're building this new infrastructure, we want it to be infrastructure as a service, and we want the developers to be able to interact with the infrastructure as freely and as safely as possible. Probably the most important thing here is that the only way people can change infrastructure is through a deployment pipeline. There's no logging into boxes or logging on to NetScalers or whatever and doing stuff manually.
As this is all done through our GitLab source control, any changes can be verified by the appropriate teams before they go in, and if we do bump into issues it's easy to trace how things have gone wrong. Our deployment pipelines use dedicated service accounts to interact with all the different infrastructure components, and because we're leveraging the OpenStack APIs to do all the interaction, we basically just set up read-only access for the developers. I'm not sure how much they really log into Horizon, but it is useful if you want to see your tenant's usage or view your hypervisors and VMs. We use Keystone v3 for this, with a dedicated LDAP domain that plugs straight into the legacy Betfair LDAP server, so that's seamless. What we did have to do was create a bespoke read-only role in Keystone, which means editing all the different policy.json files on the OpenStack controllers for all the different services. That's a bit of a pain, but once you've got it working it's just there and works forever; there's a small sketch of the role setup itself after this passage. I think one thing that would really benefit people is having an out-of-the-box read-only role, because if you're doing an infrastructure-as-code, everything-is-code model, you really don't want people going into the GUI and changing things, but they still need the ability to view it.

Okay, so some of the design decisions we made were around team setup, so I'll just take you through how we arranged our team to achieve this. In the pilot phase we needed to create an unimpeded team that was free to create the self-service automation. What we did was ring-fence particular resources and move them into a cross-functional team: we had network engineers, database guys, infrastructure guys and some development guys coming into that team. We wanted feedback from the community on what their frustrations had been with the previous platform we'd put in, so we took all of that data and made a series of design decisions based on it. We wanted to set up a model where we continually improved and iterated, so we created T-shaped teams. If you don't know what T-shaped teams are: you have deep-dive specialist knowledge in one particular area, then a breadth of knowledge, and you share that breadth of knowledge with other team members. So someone who didn't particularly know networking could work closely with someone who was a specialist in networking, and that would bring them on to understand the network conventions. That expands the knowledge within the team and makes for a better, more well-rounded team.

The way we set this up was to break out all of the self-service workflow actions that we wanted each team to work on. We had 12 engineers in the core team and did a daily stand-up so that they were all collaborating together, so in the truest sense this was really a DevOps model. We then had each team member focus on a particular piece of the puzzle, and we tried to get team members from different locations to work with each other, maybe people that hadn't worked together before. So, for instance, we had people that looked after the OS image creation, using Packer for that.
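(As a rough illustration of the read-only role mentioned earlier, not the actual Paddy Power Betfair playbook, creating the role and granting it to a developer might look like the following with Ansible's OpenStack modules, assuming os_keystone_role and os_user_role are available in your Ansible version. The per-service policy.json edits that actually make the role read-only are a separate, manual step and aren't shown.)

```yaml
# Hypothetical sketch: create a "readonly" Keystone role and grant it to a
# developer on a project. The policy.json changes on each controller that
# enforce read-only behaviour are done separately.
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create the bespoke read-only role
      os_keystone_role:
        cloud: i2-dc1
        name: readonly
        state: present

    - name: Grant the read-only role to a developer on their project
      os_user_role:
        cloud: i2-dc1
        user: jane.developer          # illustrative LDAP user
        role: readonly
        project: sports-app-qa        # illustrative tenant
        state: present
```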
On the image side, we needed CentOS 6, CentOS 7 and Windows 2012 R2 images, so they worked on creating the base images and doing all the automated patching. We also had people working on Nuage, which was new for us. We had to orchestrate all of the common workflow actions, such as creating ACL policies and setting up subnets from scratch, and do all of that integration work with Ansible and the OpenStack and Nuage APIs, so two people went away and did that. We also had people looking at load balancing on the NetScalers. What we wanted to do there was take the huge monolithic config file and break it down so it was application-centric, to demystify it, so that when teams created load balancer config it would be split out per application. That was a massive project for us as well, because we wanted to automate everything; if anything in this cycle was manual, it wouldn't do, because of the speed we wanted to operate at. We also had our system reliability engineering team go away and work with some other guys to work out and put in Sensu, which was our tool of choice for monitoring. On the OpenStack side, we worked on common workflow actions such as creating flavors, creating host aggregates, creating virtual machines and sorting out all of the identity services we were integrating with, so that was a huge piece of work as well, and it was all done in this model.

Then we had our delivery tooling. We wanted to treat the tools we use every day the same way we treat customer-facing applications. That meant deployment pipelines for everything; before, people would just log on to a box, install Jenkins, and it would live on that box forever. What we wanted was immutable infrastructure for our delivery tooling, and the same for our internal services: we wanted to be able to redeploy DNS, NTP and LDAP each day. So the workflow that we've used for OpenStack we've actually applied to libvirt as well, and the same workflow is used to build out that tooling. That also means that if we went to a public cloud platform, we'd be able to substitute it in and burst into public cloud at a later date.

Another design decision we made was the team setup for the migration. The cross-functional team was fine while we were building out the main building blocks of the project, but when we went into the migration phase we had to re-org to support the onboarding and the ongoing maintenance of OpenStack. So this is the way that we set it out. We have a team of around six engineers to look after the core infrastructure, because we've automated everything and built those building blocks. Sorry, this slide looks like something out of a war poster, doesn't it? I've just noticed that. Anyway, yeah, your country needs you. So, six engineers look after the core infrastructure: doing the upgrades, automating everything, and making sure that we can do upgrades of OpenStack and upgrades of Nuage. Then, at the same time, we have different locations that need to onboard onto the platform, so we have six engineers that assist with the self-service automation and teach the teams to self-serve. We don't want to be a blocker, so we teach them how to fill in their self-service config files and basically use the platform.
So all that they're really doing is filling in YAML files for their networking, virtual machines and so on, and we'll take you through that later.

The next design decision was the logical division of network kit and hardware. OpenStack, as you know, is quite open in terms of how you can set up availability zones and so on, so one of the decisions we needed to make was how to segregate hardware in OpenStack. That means: how do we split up host aggregates and availability zones, do we use multi-region OpenStack or a single region, and how do the Nuage layer 3 domains map in? Because Nuage, for those that haven't used it, is fairly flexible as well; you can arrange it in different ways. So we needed to make a decision on that during the pilot phase on how we would segregate the platform.

This is how we segregated the hardware and infrastructure. We use Citrix NetScaler, so for test environments we have a physical MPX and SDX. Within Nuage we then have a layer 3 domain mapped to a particular availability zone in OpenStack. The way that we use availability zones in OpenStack is for test environments: we have a quality assurance availability zone, NXT, which is our integration environment, and then a perf environment and a prod environment. Mapping that to Nuage, you get segregation between each of those environments based on the layer 3 domain. Underneath it we have our Pure Storage flash arrays, which sit across all the environments. We've since introduced more flash storage arrays with Pure Storage, but we didn't have time to update our slides, evidently. And again, this is completely mirrored in the second DC. So your clicker is working, Dave. You probably thought it wasn't.

So how did we split up our Nuage configuration? The way it works is that each time we create a particular microservice application, it's mapped to a particular zone in Nuage. Off that zone you have an ingress and an egress policy which holds all of the ACL rules for that particular microservice application, and you have a particular subnet that comes off that zone. So, for instance, from a security perspective, if you looked at that zone and the policies associated with it, you would be able to see the application topology for that application and completely audit it. One of the challenges we had with software-defined networking was how you manage security policies. Without the micro-segmentation of the Nuage firewalls, there were concerns originally over whether this would work for security: previously there were big allow rules in the firewalls and nobody could work out what their application topology actually was. Now they can look in source control, see a YAML file, see what particular ports an application is using, and also look in Nuage and see the corresponding policies (there's a rough sketch of one of those files after this passage). This has helped massively with security audits, where we can show auditors precisely what we're doing.

So we've seen what we want to do with Nuage, we know what we want to do with OpenStack; how do we actually deliver this? This is where we built our continuous delivery pipeline, and we've got some pretty cool pictures of pipelines here. In the beginning we were a small cross-location team.
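(Here's a rough sketch of what one of those self-service ACL files could look like. The actual schema isn't shown on the slides, so the field names below are purely illustrative, but the shape is the same: ingress and egress rules with ports, and a reference to the enterprise network entity for bridging, all kept in GitLab where it can be reviewed and audited.)

```yaml
# Hypothetical self-service ACL file for one microservice zone in Nuage.
# Field names are illustrative; rules are written from the point of view
# of the VMs being deployed.
application: demo-app
ingress:
  - description: frontend tier calls the app over HTTP
    protocol: tcp
    port: 8080
    source_zone: demo-frontend
egress:
  - description: app talks to its database on the legacy network
    protocol: tcp
    port: 5432
    destination: enterprise-network/legacy-db   # bridged via the enterprise network entity
  - description: app calls a downstream microservice in the same domain
    protocol: tcp
    port: 8443
    destination_zone: demo-pricing
```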
With that small cross-location team, what we didn't want was each location going off and building their own thing, because that just becomes completely unmaintainable. We wanted a consistent pipeline which would allow the developers across all the different locations to carry out the same workflow actions with ease. What we wanted, basically, was a common way to create networks and set up the ACL rules in Nuage, a common way to create VMs in OpenStack, a common way to install the software, aside from the whole Chef-versus-Ansible thing, the same way to set up the load balancing on the NetScalers, and a consistent way to onboard the applications. Part of this is that we wanted to take feedback from people in all the locations so that we could keep improving how the workflow worked.

So this is the pipeline we came up with; I'll give you a quick overview of the different stages. We start with get prerequisites: this goes to GitLab and gets all the playbooks and the configuration files we require, at the versions that are specified, pulling them down to the ThoughtWorks Go agent, so that's basically all the Ansible playbooks. We then set up prerequisites in OpenStack, creating the flavors and host aggregates that are defined. We have a capacity check to ensure there's enough RAM and disk on the hypervisors to actually carry out the deployment. Create L3 network uses the Nuage APIs to create the subnet in Nuage, apply the ACL rules and create the corresponding entities in OpenStack. Once that network is there, we can launch the VMs onto it, and at that stage we tag a bunch of metadata onto them which is used by the later stages in the pipeline to make a whole series of decisions. Once we have our VMs, we run Ansible or Chef, depending on your preference, to install the software on the box, and then we do the interaction to create the load balancer entry for the application. At this point we have the rolling update. We're doing A/B deployments here: the A deployment is live and we want to roll the B deployment into production. For a lot of the applications we just take one box out of the load balancer and put one box in, but we've made this stage customizable for things like stateful applications where there's more complexity involved and you may have to handle the rolling update in a more bespoke way. At that point we run a test job, which is defined by the developers, and if it all passes successfully we clean up the previous version: destroy the old VMs, destroy the old network. Then, if this is your QA pipeline, you promote to your next testing environment, NXT, and go all the way through, eventually, to production.

So we have our pipelines defined. As we said, there's probably going to be a minimum of four environments per data center, so that's eight pipelines for each application, and we need a way to create those quickly and in a repeatable, consistent fashion. So we embarked on a project to automate the creation of the pipelines, which we called the Go pipeline builder. It's basically just a YAML file, because that's really easy, and a script that interacts with the Go API to create those pipelines. In there you can define all the different environments you have and pass in some other relevant parameters.
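(As a sketch of the kind of input file the Go pipeline builder might take, again the real schema isn't shown, so these field names and the repository URL are illustrative, the idea is a small YAML file that the script turns into one pipeline per environment per data center via the Go API.)

```yaml
# Hypothetical input for the internal Go pipeline builder script.
# One pipeline is generated per environment per data center.
application: demo-app
team: sports-platform
git_repo: git@gitlab.local:sports-platform/demo-app.git   # illustrative URL
data_centers: [dc1, dc2]
environments:
  - name: qa
    availability_zone: qa
  - name: nxt
    availability_zone: nxt
  - name: perf
    availability_zone: perf
  - name: prod
    availability_zone: prod
config_method: ansible       # or chef
```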
And this is what it ends up looking like in Go: your different pipelines and your different environments built out.

One of the other decisions we made was to make YAML files the main source of truth. Each application, as Dave said, will have a minimum of four pipelines per data center, so eight pipelines in total going through each of the availability zones. The way we've set this up is to make it simple for developers to use. Our VM naming standard is here, so you can see it's named this way, and we use yamllint to make sure that people fill this in to that standard. Then, moving on, developers specify their flavor: the vCPU, RAM and disk space that they want for their particular application. Each development team has one of these inventory files per application, and it has the multiple different environments specified in it; you can see that a lot of developers use all the environments. Alongside this, as Dave said before, you have the host aggregate and hypervisor that it's going to land on. What we wanted to do was give each team a hypervisor allocation and let them control how their applications land on the hypervisors. Behind the scenes we tag the host aggregate, and when you spin up virtual machines with that particular flavor, it knows to land on that particular hypervisor. This also means we design our data center for failure: a production application will be mapped across two hypervisors or more, which means that if you have a failure of a hypervisor it only takes down a percentage of the stack and there's no customer impact. And, as we touched upon before, this is the role that they will use; this one's called app, which isn't real, obviously.

Okay, so these are the self-service files we talked about. This particular one is for the Nuage interaction, basically defining the ACL rules we want, much like the sketch shown earlier. Here we have the ingress rules, which are from the point of view of the VM you're deploying, with some examples of rules using TCP on various ports. We also have the enterprise network entity, which is the entity in Nuage that allows you to bridge to external networks, so you can bridge back to the native network, or to the other data center, or even between different domains in Nuage. And again we have some egress rules on various ports. The idea is that it's a simple format which allows the user to define the access rules they want in a really easy way.

And similarly for the load balancer; there's a rough sketch of this file at the end of this passage. At the top we have domains, which corresponds to the internal and external entities that exist on the NetScaler. Below that we have the lbvservers, which define your load balancing method and the properties that will be associated with your service on the NetScaler. Below that we define the monitor, which is just a health check for your services. And at the bottom there's the roll percentage: when we do the rolling update stage of the pipeline, this is how many boxes we act on in turn, so we put one in or take one out, or more if we desire.

Yeah, so along the way we obviously had some speed bumps. We called them speed bumps; we didn't like to say problems. Okay, so the first big one was that we had issues launching VMs once we got to a certain scale.
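(Before getting into those speed bumps, here's a rough sketch of the load balancer self-service file just described. As with the other sketches, the exact field names are illustrative rather than the real schema.)

```yaml
# Hypothetical load balancer self-service file for one application.
domains:
  internal: demo-app.internal.example.com
  external: demo-app.example.com
lbvservers:
  - name: demo-app-http
    lb_method: LEASTCONNECTION     # load balancing method on the NetScaler
    service_port: 8080
    protocol: HTTP
    monitor:                       # health check for the services
      type: http
      path: /health
      expected_code: 200
roll_percentage: 25                # how many boxes are swapped in/out per step
                                   # of the rolling update stage
```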
With that VM launch issue, at first we didn't really know what was happening: some VMs would just come up fine and some we'd have a problem with. We started to get some clues from the OpenStack logs; the issue seemed to be something to do with requesting ports from Neutron, or creating ports in Neutron. What we discovered was that we were hitting a timeout. When a new virtual machine comes up, the ACL rules have to be applied to its vPort on the Open vSwitch, and when that happens it queries all the ACL rules in the domain to see which ones it has to apply. As we deployed more and more apps onto the platform, you get more and more rules, so this query took longer and longer and we eventually hit the timeout. The short-term fix was to make changes in some config files to increase the timeouts in a cascading manner. In the short term that got us over the pain and we were able to launch VMs again, which is pretty critical to being able to do any releases whatsoever. But that's obviously an unsustainable solution, because VM launch becomes really slow and the platform is going to scale much further. In the 3.2 R10 release this query had been refactored and just worked a lot better, so we went from something like 90 seconds in some cases to launch a VM down to milliseconds, from Nuage's point of view.

Another issue we had when we built this into our pipelines: when you have failures, there are multiple different ways that things can fail, and building in logic for all of those particular scenarios is very difficult. That was the premise we started out with, so we wanted to develop a solution, and we used metadata for this (there's a small sketch of the tagging after this passage). Essentially we introduced a pipeline status field on the metadata of the VMs. When we are launching the virtual machines they go into an in-progress state on the metadata, which we tag using the OpenStack APIs. Then, when we get to the rolling update stage, we put the old boxes that are moved out of service into an old state and the new boxes that are rolled into service into the live state. This may sound very simple, but it was very powerful for us. If you get a breakage at any point in the pipeline, the set up prerequisites stage, when the next pipeline run kicks off, will clean up any boxes that are still in the in-progress state so that they don't block the next run. And because we use A/B deployments, you have an A deployment or a B deployment that's live, and if you have a broken pipeline on the A or the B deployment, this will clean it up first.

Okay, we've probably only got a couple of minutes, but this is the last issue we were going to talk about. When we were trying to scale out our cloud, we were running into problems where, on certain hypervisors, we'd lose contact with all the virtual machines on the hypervisor during a scale-out, which is not ideal. What this turned out to be was essentially a bug, a bad bit of code, in the version of Heat we were using.
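(A minimal sketch of that pipeline-status tagging, assuming the os_server module's meta parameter at boot time and the standard openstack CLI to flip the value later. The VM names and values are illustrative, and the real pipelines presumably do this inside their existing playbooks rather than as a standalone play.)

```yaml
# Hypothetical sketch of the pipeline_status metadata flow.
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    # New VMs are born "inprogress" so a broken run can be identified and
    # cleaned up by the next run's setup prerequisites stage.
    - name: Launch VM tagged as in progress
      os_server:
        cloud: i2-dc1
        name: dc1-qa-demo-app-b-001
        image: centos7-base
        flavor: app-small
        network: demo-app-qa-net
        meta:
          pipeline_status: inprogress
        state: present

    # At the rolling update stage the new box goes live and the old one is
    # marked old; shown here with the openstack CLI for simplicity
    # (requires the usual OS_* auth environment variables).
    - name: Mark new box as live
      command: openstack server set --property pipeline_status=live dc1-qa-demo-app-b-001

    - name: Mark old box as old
      command: openstack server set --property pipeline_status=old dc1-qa-demo-app-a-001
```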
So, with that Heat bug: on a deployed node you'll have the Heat config, which in our version of Heat is mounted on a temporary file system. So on any server that got rebooted for any reason, say because it had a hardware issue, that folder would just be flushed and there'd be nothing there. In our Heat templates, because of the Nuage install, there's a step where you uninstall the native Open vSwitch and install the Nuage component. On these nodes that were rebooted, from Heat's point of view this is a new node that needs redeploying, so it uninstalls Open vSwitch, so you don't have a virtual switch, so you lose all connectivity to the VMs. As I said, in later versions of Heat this is fixed; the problem is that it isn't a fix you can just apply on the undercloud, you need to apply it to the actual overcloud image that you used to deploy your overcloud nodes, and changing that image once you already have a cloud is a kind of scary thing to do. So at the moment we've put in a workaround where we populate that Heat config folder, so we don't bump into this issue anymore.

Okay, so one of the other design decisions we made was to run one OpenStack, as I touched on before. This is the target reference architecture that we're going to get to: when we do the upgrade to the Newton release we will run one infrastructure OpenStack per DC, which will have all of our workloads, whether that's delivery tooling or customer-facing applications and test environments, all residing under one region. We're looking at 650 compute nodes per DC for this.

So the overall benefits of what we've done: we've reduced time to market, we do over a thousand code deployments a day to test and production environments, we churn through around 3,000 virtual machines, and we've also lowered the mean time to recover from failure. With those ThoughtWorks Go pipeline templates you can go to any application and see what's going on, because teams aren't writing completely custom things each time; the only things that are customizable are the Ansible role or Chef recipe they're using and the rolling update phase. That makes it scalable and easy to track, and gives a repeatable deployment process for apps.

Moving on, what we're doing next is the Kilo to Newton upgrade to get us to that new reference architecture; we're going to be doing that in the next 30 days. Some of the things that are coming as well are the OpenStack Ironic and Nuage integration. Nuage have a bit of custom code for Ironic which means that when you deploy your bare metal server on the provisioning network, in an overcloud implementation, then when you do a reboot it will re-IP that particular server off the provisioning network and move it onto the tenant network, which allows you to ring-fence it in the ACL policies and gives you multi-tenancy. The other thing we're looking at is that some of our teams have a wish for containers, like everyone does today, so essentially we want to offer bare metal servers, virtual machines or containers. With the Nuage upgrade we can actually plug in and use container networking integrated with Nuage, which means that you don't have that double networking layer where you run your subnet and then essentially have your container network on top. Also, for backups, we're looking at the Freezer project, which will allow us, through the OpenStack APIs, to
orchestrate backups rather than use third-party backup agents. I'm not sure we've got much time, but this was Dave meeting the local Red Sox guy; I don't think he liked him much. And we also have our i2 private cloud whitepaper that we're publishing. We just want other users who want to go on a similar journey to have something like this; we would have benefited greatly from it when we were starting this journey, so we're keen to share our experiences there, and you can look at it in more detail.

Do we have time for questions, or will I get off the stage? One question. Okay, one question. Sixteen parts. Cool.

Just two main questions in one: I'm wondering what kind of uptime you're getting on the customer side of the application, in terms of the uptime that you're measuring and seeing with these architectures that you've been following and delivering on to date?

So we're a 24/7 business; if we don't have 100% uptime then we lose business. Generally, what we've done with this model is design it for failure. We have infrastructure failures now and again, we have boxes that go down in DCs, but what we've done with this design is make sure that doesn't impact the customer, so they don't see it. We'll get paged out, and we've actually had scenarios where we've lost a percentage of an application, the development teams get paged, and they say leave it until Monday, until we come in, because it's over the weekend. That's really the state we want to get to. We don't want a situation where you've lost a hypervisor and 10 apps are down; we've got away from that completely. So you have percentage failures in your DC, but you don't affect customers.