What's orange and sounds like a parrot? A carrot. A carrot is what's orange and sounds like a parrot. So let's kick this off on the theme of comic relief. Yes, here we go. Cool, thanks Marie.

Hey, welcome to the last session of the conference, and thank you for sticking it out with us; hopefully we'll make it a fun time. We're here talking about Cloud Foundry and Comic Relief's Red Nose Day, with representatives from the customer side with Comic Relief, from the IaaS side with GCP, from the developer perspective, and from the operator perspective. The things we're going to hit on are: running a mission-critical app that took massive scale over the course of the event, moving from a monolith to a microservices architecture, and the general good outcomes that came about for Comic Relief through this experience. Let's introduce our panel, or a little fireside chat here, maybe starting with you.

I'm Zenon Hannick, the CTO of Comic Relief.

Hey, my name is Ben Dodd. I'm from Armakuni; we help people deploy and use Cloud Foundry, and we also run some of the training here.

My name is Marie Cosgrove-Davies. I'm a PM on Pivotal's CloudOps team.

I'm Jay Marshall from Google Cloud Platform. I've been working on a lot of the integrations between Google Cloud Platform and Pivotal Cloud Foundry.

And I'm Evan Willey. I run program management for Pivotal Cloud Foundry.

So, kicking it off: Zenon, could you tell us a little background on Comic Relief and on Red Nose Day, for the American audience that isn't as familiar?

Yeah. Comic Relief is a charity that was started 30 years ago by Richard Curtis, the director behind Four Weddings and a Funeral and Love Actually, in response to the famine in Ethiopia in the 80s. Essentially we have a mission to bring about a just world free from poverty, and we believe in creating change through the power of entertainment. What we've created is this institution called Red Nose Day, which happens once every two years in the UK. It's a six-to-eight-week campaign that culminates with one night of TV where we show some great comedy, show some powerful videos of how the money we spend has changed people's lives, and basically get the whole country to partake in donating.

Great. And this is a pretty big event. What's the kind of scale that you generally run at, at this point?

We scale to take donations via SMS, online, and via about 10,000 call centre agents, and we build to cope with peaks of about 500 donations a second, and that's completed donations a second. The challenge we have is that, as a yearly event, we have a yearly feedback loop: we have absolutely zero traffic year round, we get a little bit of traffic in the six to eight weeks before the campaign, and then we get a once-a-year spike. I always used to feel quite stressed by that difficult challenge, but then I was talking to someone from the British Office for National Statistics, and they have a ten-year cycle, because they run a census once every ten years. That made me feel slightly better.

That's a long feedback loop. How did you tackle that challenge prior to working with Cloud Foundry as your platform?

So originally, when we moved online, we built an online donations application in Java, and it was a real monolith. The thing about it was that, with the yearly feedback loop, we'd just put it away in the cupboard, it would gather dust, and a year later we'd bring it out and dust it off.
The challenge was that we had some great technology partners, one of whom would lend us a load of kit. It would get delivered; it was usually due around November, it wouldn't come, it would come in January, and then we only had a few months before our event, which happens in March. Then we'd have to get together 10 to 12 technology partners, you know, the database providers, the networking people, and get 30 or 40 people working together intensely for a period of weeks to try and pull this thing together. It was essentially a snowflake, all held together with the human glue of the experience of the people who had put it together the year before. You'd get problems, and people would remember what they did a year ago and fix it. It was really not very agile, not very responsive. The feedback loop to scale it up, build it, and test it was slow; if something needed to change, it would be another two or three days of going through that whole test cycle. So it was very, very challenging. We couldn't make a lot of changes to it, because there was a lot of risk around it, so we made the minimum amount of changes, crossed our fingers, and hoped. Then in 2011 we had a massive campaign, and the response rate was really huge. We'd been doubling the number of people coming online and donating between each campaign, and we really hit the edges of it. It came very close to failing, simply because of the scale of response that we got. So we realized that we needed to rebuild, and we went back to the drawing board.

And at that point you went to the cloud, right? What was the impetus for making that change?

I mean, the main impetus was not having to wait around for trucks to deliver servers that someone has lent you. But really, with our use case, the cloud was almost built for us: we've got nothing year round, and then we have that one spike we need to cope with, so the ability to scale up was just perfect for us.

I believe you also use multiple clouds at this point. What's the motivation behind that?

When we rebuilt the app we had a couple of motivations. One was to ultimately be unlimited by technology: we wanted technology to be able to scale to whatever level of traffic the campaign built up to. The other was to avoid vendor lock-in. With the old platform, I mean, it was great: you had 10 to 12 really large-scale software and hardware companies supporting us, giving their time for free because of who we were, and that's fantastic, but then you're always relying on that carrying on. And as the market evolved and companies bought each other, where a company used to do one part of it, they all wanted to do all of the parts, and it became a very complex arrangement to manage. So we wanted to be completely untied from any technology provider, able to move around as we needed. We also wanted to minimize our exposure to PCI, and really we wanted to adopt modern development patterns, to be agile and able to move quickly.

So Ben, your team and company helped with that transformation and with building that out. Can you talk a little bit about that experience?

Yeah, sure. It's been an interesting ride, that's for sure. Back in, I think, June 2012, we decided we were going to form a team and pitch to Comic Relief to try and improve on what they currently had, and we were awarded that work. It was a very uncertain time for Comic Relief: they'd had this platform that had worked for a long time, and it was taking a huge amount of their revenue every year. So I think one of the really important first things we did was arrange with the board of Comic Relief that we were going to come back and demo what we had done every two
weeks or every month. Not a PowerPoint deck of what it could do: we would actually go in front of the board and demonstrate what we'd built, and the choice of technology was a very important part of doing that.

Then, looking at the existing app, and I think this is also borne out by the experience we have going into other enterprises, we didn't accept the premise of the current application. We didn't look at the current application and say: it worked like this, it had a central data store, it had these apps deployed in this configuration. We said: we have this problem, how can we best solve it? One of the key parts of that was that we chose availability over consistency. We realized it wasn't important to Comic Relief to have a very up-to-date, accurate, ACID-compliant view of how many people had donated and what the current state of play was. The most important thing for them was that they'd taken all the money; if we came back an hour later, or a day later, and reported on that progress, that would be fine, as long as we took all the money. Obviously we don't want that situation either, but really the first thought has always got to be availability.

Then we went on to say that we wanted to build everything on the premise that it will fail. For every component of the stack we wanted more than one: more than one payment provider, more than one IaaS, ideally more than one PaaS, more than one programming language. Through the whole stack we ended up with a sort of matrix: there were 16 major elements, and 15 of those could fail, payment providers, regions, infrastructure providers, all of that could happen, and we would still take all of that money.

I think another key thing is that we chose microservices to do this, and we actually chose true microservices. A lot of the time, when people try to replicate this experience and say they've done something similar, you ask: what is a microservice? And I don't think a lot of the time people are really implementing true microservices. For us it means things like being able to deploy individual microservices, so when we want to make changes we can change the system very quickly, and we can do continuous delivery, all these sorts of things. It also means getting our test pyramid right. A lot of the time, and I learned this on this project, end-to-end testing is not the only way to test. With these sorts of complex systems, distributed over multiple IaaSes, if your testing approach says the only way to assert that this works is end-to-end testing of everything, that becomes really difficult when you've got lots of payment providers. An end-to-end test in every single environment means testing every single route, call centre and live public, all the way to the payment provider, every time, and that means your pipeline is going to be six hours long instead of the 20 or 30 minutes we want to end up with.

In terms of the platform, we have multiple foundations. We tend to call them shards, but they're just foundations. We have three front-end foundations, and each of those foundations could take all of the traffic: they can each take 500 donations a second. If all the others fall away, we'll still be able to satisfy Comic Relief's need to take all of that money, and that's extremely important. All the data from those shards feeds into another foundation, which is our management shard. We use deltas and event sourcing to feed data into that management shard, where it's reconstituted so Comic Relief can get a view of what's happening. So we can lose all those front-end shards, and we can also lose the management shard, because all we do at that point is set all the queues back to the beginning on the shards and replay them to
a new management shard, and at that point we get a consistent view of the data. The platform is made up of 28 microservices, distributed globally and event-sourced.

What was also important was that we were always optimizing for synchronous payments. We wanted people to be able to give their money and get instant feedback, because at nine o'clock on a Friday, when the show is on, people are drunk, and they're drunk in lots of different ways. A few people are probably actually drunk, but they've also been watching some really powerful films, and there's been a campaign going on for a month or two, so when they're there actually typing in their credit card details, people are generally quite emotional. We see a really high failure rate on people being able to type in 16 numbers in a row, so it's really important that they get that feedback straight away: oh, that didn't work. Because if we send them an email on Monday morning saying please come back, they're going to go, well, fifty dollars, or fifty pounds, seems like a lot of money now; maybe I won't do that.

Cool, awesome, that sounds amazing. And you're doing this on PCF today. What brought you to that as your platform of choice for the event?

I think initially one of the huge drivers was developer velocity, and that's the same with everyone. When we did that initial pitch, we knew we were going to have to come back in two weeks or a month and demonstrate to the board that we'd made some real progress. It meant that on day one we could use a public implementation of Cloud Foundry; at that time it was cloudfoundry.com, now it's run.pivotal.io. So very quickly we could do that, and we also got multi-cloud for free. I think we were doing multi-cloud before the marketing people told us it was a thing: very quickly we had it deployed to vSphere, and we had it deployed to public cloud providers, so that was great.

Also, we've been running this platform for, it'll be our sixth year next time, and the contract that we have with Cloud Foundry has changed by just a few characters: originally it was vmc push, now it's cf push. Genuinely, that is our only change over that entire period, so managing this platform over that whole period has been an incredibly easy thing to do. We've also reduced our head count dramatically: the original platform took 50 or so people, as Zenon described, and now it takes six, and that's a huge thing. I think what's also important is that PCF and BOSH allow us to continuously deliver everything: we deliver the apps, we deliver the platforms, and that's a really huge part for us.

You were mentioning, when we were outside a minute ago, the tests that you run when you do that continuous delivery. Would you mind describing the security and test suite that you're running with every push of the platform?

Yeah, absolutely. We do continuous deployment, so every commit we make is a release candidate, and it can flow to production, and it can flow to production on the night. When we need to make changes, when we get feedback, that can happen. We have a load of tests that assert it's working with the payment providers, but we also have security testing in that pipeline, and load testing in that pipeline, so the whole end-to-end experience is tested every time we commit. And the ability to scale up platforms, run those tests, and scale them down again without costing everyone a lot of money is a really huge win.

Awesome. And Marie, from the operator's perspective: this time around, the last Red Nose Day ran on PWS EMEA, which your team helps host and put together. Can you talk a little bit about setting up the production environment and working with the teams?
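As a rough illustration of the commit-gating Ben describes, where every commit is a release candidate that only reaches production if the functional, security, and load-test stages all pass, here is a minimal sketch in Python. The stage names and the promote function are hypothetical, invented purely for illustration; they are not Comic Relief's actual pipeline, which a real setup would express in its CI tooling rather than application code.

```python
# Minimal sketch of a "every commit is a release candidate" gate.
# All names here are illustrative, not taken from the real pipeline.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage passes

def promote(commit: str, stages: List[Stage]) -> bool:
    """A commit flows to production only if every stage passes."""
    for stage in stages:
        if not stage.run():
            print(f"{commit}: blocked at {stage.name}")
            return False
    print(f"{commit}: deployed to production")
    return True

# Hypothetical stages; a real pipeline would shell out to the test
# suite, a security scanner, and a load-test environment that is
# scaled up for the run and scaled back down afterwards.
pipeline = [
    Stage("unit-and-contract-tests", lambda: True),
    Stage("security-scan", lambda: True),
    Stage("load-test-at-500-donations-per-second", lambda: True),
]

promote("abc1234", pipeline)
```

The point of the sketch is the ordering guarantee: a failing stage stops promotion immediately, which is what lets a commit made on the night itself still flow safely to production.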
Sure. As Ben mentioned, there are three separate PCFs stood up for this project, and we ran two of them on GCP, one out of Belgium and one out of South Carolina. Ben mentioned the app-developer side of the cf push contract and how it makes things very easy for developers because nothing changes; it also made it really easy for us, because we didn't have to do any negotiating with the app developers about the architecture of their app or everything they needed. We did a little bit of interaction with them initially to get their load tests, to make sure we knew how to run them and could run them effectively, and then we were able to focus on scaling the platform, making sure it was configured correctly, and running those load tests repeatedly to confirm that everything would go smoothly on the night. So it was a really smooth experience for us.

And from the IaaS provider side: Jay, can you talk a little bit about PCF on GCP and how those have worked together, in this event and generally?

Yeah, absolutely. The event itself was a great moment for us, first of all to be associated with such a program, but secondly because by that point we'd been working for well over a year with Pivotal, and also contributing to the open-source community, with an amazing engineering team at Google focused on the PCF-on-GCP integrations: price performance, compute, storage, networking, service brokers. Hopefully you've attended some of our sessions this week; there's been a lot of meaningful work. But specific to this event, take our global load balancer. We love to talk about this, because the global load balancer you can use on GCP is the same thing that powers YouTube and powers Gmail, and you get that when you're running your applications on PCF. Now, in the context of an event like this, why does that matter? 75 million pounds, for us Americans in the room, is a lot of money; I think it's like 95 million bucks. So if you think about the number of transactions and the need to be able to scale instantly from zero, and what if it was a hundred million, or what if it was, unfortunately, 15 million? What are you doing to plan for what that front-end load balancer does? On GCP it's literally no ops, no infrastructure: one API call and you go from zero requests per second up to a million requests per second, in seconds. These are some of the things we've been embedding in the PCF-on-GCP implementation that let you do those kinds of things when you're running on Google. It was exciting, because we obviously knew the load and the scale that Red Nose Day would bring, so we were sitting back like everybody else: okay, here we go, it's go time. But luckily it was a pretty boring night.

That's a good segue: how did the event go? We love boring nights.

Yeah, thanks again for working on it. Boring are the best releases. 2011 was the first year that I joined Comic Relief and worked on Red Nose Day, and I could say it was probably the single most exciting work experience I've ever had. After weeks of high stress and lack of sleep, when this event happens, even the smallest thing that went wrong, I just flew off the handle, I didn't know what was going on, and then eventually calmed down. It was very, very exciting. Now it's really boring, and that is very, very good. Anyone who works in software development or operations knows how slightly dissatisfying it is when everything is really boring, but also how great it is, because it's undoubtedly the best thing.

I'd also like to mention something Ben alluded to: the fact that the implementation with Cloud Foundry hasn't changed over six years. That's an amazing thing. I'm a bit of a cynic, and when they first brought me Cloud Foundry and said, oh, you know, it'll run
across multiple infrastructure providers, I was like: yeah, of course it will, with loads and loads of work. But it just does work. It really does what it says on the tin. I think it's very easy, at the pace technology moves, to take stuff for granted, but to be able to say that we built an application, never had to change anything about its integration with the platform, and that it just works across multiple infrastructure providers, is an amazing thing that's very easy to take for granted.

Just to close it out: in general, what has this meant for Comic Relief, and what are you looking forward to going forward?

I think, like Ben said, the most important thing was that we take all the money, and we took all the money. The moment the total approached 77 million pounds, it took the total of what we've raised over our lifetime to over 1.2 billion pounds, and that money has a real effect on people's lives. I'm incredibly proud of the team and what they've done, and the difference that's made to people's lives in the UK and across the world. What this really proved to me is that by using the PaaS you can let your teams focus their efforts where the value is being added, on the app development, and it allows you to move so much quicker, to really innovate on top of it, knowing that you've got this rock-solid foundation you're working with. After the success of the donations platform, we took our fundraising platform, which is a slightly different thing, converted it from a PHP monolith, started pulling out some microservices, and moved it to Cloud Foundry, and that just worked as well. That was running on PWS US, and it worked for our campaign; we put a few million pounds through that as well. So really, from my perspective as CTO, the strategic direction is to have Cloud Foundry as our baseline foundation for all our development, and we're slowly moving other apps over to it.

Awesome. That basically takes us to the end of what we wanted to talk about, but if you have questions for us, we have a few more minutes. Any questions? Tony?

[Audience question]

Sure. So we wound up building a set of tooling. We knew we would need to create and tear down a number of production instances for our load-test process, so the first step for us was actually not to stand up a Cloud Foundry; it was to build tooling to bring up a Cloud Foundry, so that we had that available, we knew the configuration was consistent, and we could delete it at will, since we didn't want a production-scale environment for the entire six-month lead-up to the project. I think we had enough lead time that we had a pretty clear picture of where we needed to be, and a sense of what we would need to do to get there, and we were able to plan effectively to get to that point. It's actually been, not subsumed, that's not the right word, but pre-empted by the Customer Zero pipelines that are being made available for a very similar purpose. We were building ours in parallel with Customer Zero, since we needed them before those would become available, but that's the canonical source for this type of work.

Any other questions from the audience?

[Audience question]

I guess it depends; it depends on your components. Some components have slightly longer test phases, but I'd say it's about 30 minutes per environment, and we tend to have two, so a commit would probably get to production within two hours, I would say. And like I said, everything is continuous deployment, so every commit will get to production if it doesn't fail any of the testing.

[Audience question]

So we've got offices in Vauxhall in London, right next to the MI6 building,
so for those of you who have watched any of the James Bond movies, the building that blew up? We're right next door to that, which might be an apt description of some of our earlier campaigns, from a technical point of view. More recently they've been better. We have the Armakuni guys, and this year we had the Pivotal folks as well, David Laing's team, in the office with us. We set up a room with lots of boards, with loads of charts and graphs going on. We also have a dashboard which shows the amount of money coming in, where it's coming from, and overlays the previous years and all that, so we're watching that constantly, and we use it to provide feedback to the broadcast show: this particular film got this response, you might want to consider rerunning it at 11:30, all those kinds of things. So we have a constant feedback loop. We have a large operations team, and I mean operations generically, finance teams and so on, in one room, then we have the people running the donations platform in the other room, and everyone's got their screens up, watching particular graphs and particular bits of information: payment service provider response times, what's happening with those, any outliers happening that they need to go and investigate. We have a person who's directly in contact with the call centre, in case people are phoning up and having particular issues. We've got around 120 companies who give us their time for free and run call centres where people phone up to give money, so we have contacts from them coming in to us if they notice any particular issues or problems. So yeah, it's quite an intense evening.

And I guess, from an application point of view, a really key part is that all those external third-party dependencies are wrapped in metrics, so we see fluctuations in how they're behaving before anyone else does, and we can start to load-balance away from certain providers, or upscale ones that are performing particularly quickly, just so we can keep that feedback loop as short as possible and get users giving money as quickly as possible.

It's also worth saying that we've built great relationships with those payment service providers and the other third-party suppliers we rely on, and we usually have open calls with them, so if we need to jump on with them, they've got support teams supporting us during the event.

I think it's fine that you say that, because David Laing, one of Marie's counterparts, actually had a quote; it was kind of funny, more the way he said it: "it just worked," and that was kind of it. When I mentioned the load balancer example of zero to a million requests per second: even as huge as the transaction counts were, that's what was nice about it. We knew up front it wasn't going to risk any kind of interesting latency issues or anything like that. So yeah, like I said, it was a very calm night.

And from the PaaS side, for the load balancers, I gather there was some pre-warming necessary on other IaaSes that wasn't necessary on GCP, so we did some validation, it was great, and we didn't need to do any additional work there.

I guess just one more point: we have some really great partners, and we have really great support, and they're on the phone when we need it, but the whole platform is optimized not to need that support. Whenever we're offered it, can we do something special, can we do something behind the scenes, some tweaking, the answer is generally no; we actively stay away from that. We just want to hit an API; we just want the same service that everyone else gets, and if we can't achieve what we need to achieve with that, then we need to rethink it. We don't want people snowflaking something behind the scenes, doing some pre-warming, doing a little bit of config change. We don't want that. We want reproducible continuous deployment; everything is code.

Great. Any other questions? No? Well, thanks again for sticking it out with us, y'all. Hope you had a great conference, and we'll see you next time.