Hello everybody, and welcome to another tech and talk. Today we have Sarah Wells from the Financial Times, one of my favorite newspapers. She's been leading the efforts at the Financial Times around getting their new practices, shall we say, DevOps and agile and all of that goodness, incorporated into FT.com and the operations side of it. So I'm going to let Sarah introduce herself and talk a little bit, and then we'll have a little Q&A conversation at the end. With that, I'll let Sarah get started.

Hi there. I've been working at the Financial Times for six years. Up until fairly recently I was the tech lead for our content platform, so I've been very involved in the transformations we've done over that time, and we have made a lot of changes. I've recently moved into a new role as technical director for operations and reliability, a role where I'll expand on some of this.

I just wanted to share some slides that I put together to present to our business, to try to explain why we made the kinds of transformations we've made over the last few years. I think it's easy as technical people to get quite focused on the reasons we're doing it from a technical point of view, but it's good to stop and think and sell it to the business as well. So these were slides for a non-technical audience, but I'll expand as I go.

The transformation I'm talking about is really around the move to DevOps, the adoption of the cloud, using containers, and the move to use all the things from cloud native. Doing that kind of transformation costs time and money.
So you have to be able to explain it.

When I joined the FT in 2011, all our software ran on machines in our own data centers. We had developers and we had operations, and there was very much a throw-it-over-the-wall approach to releasing stuff. We had very limited automation, so setting up a new server was all done by hand. If you wanted to build a new product, you had to buy that server and configure it, and on average it took about four months. If you're going to take four months to set up a server, you have to be pretty confident in your idea, because you're placing a pretty sizable bet that it's worth the investment.

And when we did have that server, we weren't able to release new software onto it without system downtime. For the old FT website, which was the website running up until 2016, we had to do releases outside normal working hours because nothing could be published while the release was happening. The application was a monolith and it sat on a relational database, so schema changes were the real killer.

What that meant is that releases happened rarely: we did one a month, on a Saturday. You can't try something out unless you have lots of opportunities to do that, so when you only have 12 releases a year you obviously can't experiment. And because you're doing it so rarely, every release is terrifying, which meant we put a lot of process around those releases to try to make sure we didn't break anything. This is genuinely a real diagram of the process we went through to release stuff, and a lot of the steps in it are about validating that we can recover if things go wrong.

Now there's been a complete transformation of the organization over the space of maybe two or three years.
So first of all, we can provision a server in minutes. I worked it out: I think it's eight hundredths of a percent of the time it used to take to get a server set up. That's a huge difference, because it means you can try something out today. You don't have to wait for someone to buy something and configure it.

In terms of releases, we do a lot more. Until recently I was in charge of the content publishing and delivery platform at the FT, and we did around 2,200 releases last year. Just for fun, I created a graph with the same scale for our releases in 2014, and as you can see it's barely visible. If you add in the FT.com website (we now have two separate things, the platform and the website, where we used to have one), it's roughly 500 times as many releases as we used to do for the same functionality. That means you can experiment, because you're releasing things maybe 20 times a day and you can try out individual changes.

What made this happen? Well, obviously, automation. It's the first thing anyone does when they start thinking about DevOps or about improving things, and obviously computers are better at doing this. When everything is automated you don't have to do it manually, so you're less likely to make mistakes.

From a technical point of view, there have been several iterations of this at the FT. The first thing we did was run in our own data centers using Puppet. We deployed one service per VM, and we had things like monitoring and log aggregation as part of that. That was a massive improvement on what we had before, but one service per VM is quite wasteful: you tend not to be very cost effective. And you don't really want to be running your own data center, so the next iteration involved just moving onto AWS.
We're pretty much all in on AWS here. It's a lot cheaper, and lots of things become someone else's problem. After that, the next thing we did was look at containerization.

But the FT is actually quite diverse in terms of our approaches, since we want teams to be empowered to choose their own solutions. So in fact we have some teams that are still using Puppet and running on AWS with one service per VM, we have several teams that are running Kubernetes, so basically running containers, and we have quite a few teams using Heroku, where you're passing even more stuff off to be someone else's problem.

So, obviously: cattle, not pets. We definitely used to have pets. We had servers at the FT that had a riverside view in the center of London, and they were all different. That's not the case anymore. We can basically trust most of our VMs; we understand what's there and what's installed.

The next thing is "if it hurts, do it more often." Jez Humble is right. This comes from the Continuous Delivery book. It is counterintuitive, but you do benefit from doing the painful stuff more often, because you have to solve the reasons why it's painful. Once we decided that we wanted to be able to release code at any time, we had to architect our systems so that we could release that code without affecting people currently using the system.

And because we do lots and lots of releases, we know what changes in each of them. You can understand the change and you can measure the difference. Also, if something goes wrong, it's much easier to work out what it actually is. When we did those monthly releases, if something went wrong you had to work out which of probably hundreds of changes had actually had the impact. Now we don't have to do that.

Obviously, when you do something all the time, it gets easier. So it's just useful.
We can release, and we can look at the process and say, here's a small tweak we can make.

The thing that's really important here, and I think Anne Currie mentioned this in one of your earlier podcasts, is the feedback loops. The ability to do things faster is the crucial thing. There's research that suggests that only 20 percent of the features in a custom software program get used, and the other 80 percent is just a waste of time and money, but you don't know up front which is the 80 percent. So you actually need to get it out there in front of people to tell, and we can do that because we release things many times a day and get them out there as soon as they're available. We don't go off down a rabbit hole building something we think people will like but they don't.

An example of this: we have film reviews on the FT, and we thought, what if we put the score, the star rating, on the index page with the list of reviews? That'll be a nice thing to do. But it turns out that if you do that, people don't necessarily go and read the film review, and that's what we want them to do. We were able to measure that really quickly.

And there's a bonus to doing releases more frequently, which is the failure rate. When we did 12 a year, one or two of them would fail pretty much every year, so that's about a 15% failure rate. Now it's probably less than 1%, and when a release does fail, it's so much quicker for us to solve the problem.

So, I mentioned DevOps.
I think "you build it, you run it" is a central thing about doing DevOps. When you have separate teams doing development and operations, they have conflicting goals, because developers want to get things out there and operations want to keep things stable, and clearly the most stable system is one that never changes. We don't have those two separate teams anymore, and the people who choose to do the releases are the same people who built the code. They can decide what the risk is of making a change. And it turns out that developers do understand the risk. There aren't many code releases that happen at five o'clock on a Friday, and when there are, it's because people understand that it's not going to cause a problem.

So lots of this is around DevOps, which is basically a cultural change, not a process change. It's about deciding that you can trust your people and getting them to collaborate.

But it's also about cloud native. I took this list from Anne Currie's podcast, where the first three are what the Cloud Native Computing Foundation defines as cloud native and the other two are ones Anne identifies as important. I think we are absolutely doing the last three of these as standard: we build things as microservices, everything's automated, and it's all in the cloud. The first two, we either do ourselves or we hand off to someone else. So we're using Heroku.
We're letting them do the management and the packaging, and those things work for us. So that's really a quick summary of some of the changes we've made, and just a few stats to show the impact.

It's great to hear it from a media perspective too, because we expect so much from our media websites and the folks who are serving up the news. Even that little story about taking off the star ratings so that people actually read the review: we as end-user consumers expect so much variation and so much content to be accessible that we forget that the underpinnings of that are sometimes technically difficult to achieve. So it's interesting to hear it from your perspective, because what we often hear, at least what I hear in my Red Hat role, is from large enterprises that are manufacturers or financial services or the like, and they have other issues to deal with, like security and privacy, that are really high up. The experimentation bit, and the ability that comes when you're doing so many different releases, is just phenomenal and a huge game changer for most of the industry as well. You mentioned...

Sorry. When our website team started to build the new version of FT.com a few years ago, they built in from the very beginning the ability to do A/B testing of things. So everything that we build has a hypothesis and a measurement, so we can say whether we achieved what we were trying to do or not. It's incredibly powerful, because it's amazing how often your hypothesis is actually wrong.

What's curious to me too, and what I really like, is the mix. You guys are probably the epitome of a hybrid operation: still doing some stuff with VMs, still doing stuff on AWS with Puppet, still doing some stuff with Kubernetes, this whole mix. You are sort of the poster child for hybrid cloud deployments in the way that you talk and describe
your situation. I'm wondering if you've had to grow your operations team, if the size of the team has changed, or how big the team is behind the FT.

I think over the last few years we've probably ended up with a slightly smaller operations team than we had before, but that's because we expect a lot of things to be picked up by delivery teams. That's partly because the freedom to choose lots of different technologies means it's very hard for a central team to understand all of the things, and partly because our new architectures are actually pretty reliable. They're built to be resilient, so we expect the things that do happen to be things where we might have to get people in with specialist knowledge.

The fact that we have so much variety is great, and it's let people move really quickly. In my new role, one of the things I need to look at is how that impacts an operations team, and to try to find some things that we can make common across the FT. Because one of the downsides of everyone being empowered to do their own thing is that everyone solves the same problem in a subtly different way.

Yeah, we've run into this. I remember the promise in the early days of platform as a service was that developers could use whatever bespoke framework or language or database they wanted, so there was a bit of an underlying theme that people could go off and build a container with anything in it. Have you done anything along those lines?
Like standardizing the frameworks that you're using, or the tools that you're using, at all? I can't imagine that it's a complete free-for-all.

There's a fairly lightweight standardization. We probably feel we could do a little bit more. But, for example, pretty much everyone sends logs to our log aggregation setup, because it's useful to be able to look at things across all of our applications. We have a standard for a health check on applications, so we expect that on a particular endpoint, __health, you will return JSON in a particular format so that we can easily plug that into monitoring. We've had really quite a lot of success with things that say you need to do this, but we don't really mind how you do it.

And then we find that with some things, people just start using them and it spreads around the organization because it's so obviously a useful tool. I think that's where we want to go. We want to be showing people things and saying, here is something that we think you'd find useful, and the proof is whether they then adopt it.

Another thing that was interesting for me in your slides: you were saying it was as recently as 2016 that you were on the old FT.com site, and that's only two years ago. That's a very rapid amount of change in under two years. What sort of impact has moving that quickly had on the team and the culture?

Yeah, so I guess the new website went live two years ago, and it had probably been under development for about a year.
So there'd been some overlap. But yeah, I'd say there's been a fairly big transformation for most people. It's so obviously much nicer, and a lot of your frustrations went away. It's so frustrating to say, I have to go to a meeting on a Tuesday to decide that I can release on the Saturday, or something like that. That's a frustrating thing for a developer. So mostly people welcome the fact that they're trusted to make the right decision. The crucial thing is having the right sort of people. Having people who are interested in trying new things, who are quite flexible and willing to just learn something new, has been really important.

No, I think that's right. For me, it's been a lot of cultural shift within organizations like the FT and others, and you're spot on with the description of people who are flexible and able to take on new technologies and changes. And frankly, the stuff is a bit of a mind bend, but it's not that much different from the coding we did before, and it almost gets us back to being more of coders and gives us a lot more freedom.
For me, the one thing that really got me out of being a full-time coder was that I had to wait four months before I could see my stuff released. Now, with a containerized, cloud native universe, I can get stuff up and running in minutes, as you described, and that is almost a joyful thing: to be able to see the fruits of your labor out there in the real world and get the feedback. It's like instant gratification in some ways. And from what I've seen, it really changes the demeanor of people who are developers, and their willingness to work in large organizations as those organizations shift, as opposed to all focusing on wanting to work at little startups in Shoreditch or wherever. I think it makes it easier for big enterprises to recruit fresh, flexible, new-technology people, as opposed to if you were going to sit them in front of, and I'm trying not to disparage any old systems, a large enterprise ERP or publishing system or monolith and ask them to work on that for the rest of their natural lives. It's a new world, and I think it makes it much more interesting for people who are developers.

I agree completely. We've definitely been able to recruit people who would otherwise be working in banks, you know, for more money, because we can say, well, you won't have that frustration.

It's really interesting to me, the difference between the first hack day and the latest one. At the FT we started our hack days probably in 2011. In the original hack days you'd build something, you'd be running it locally, you'd be mocking things up. At the last hack day, people actually put some stuff live behind feature flags on our website. They've set up DNS entries.
They've created a runbook, they've just done everything, and they've done it in hours, whereas you used to be really desperately trying to get things to work. So things like Heroku, things like having your platform set up, are really good.

But I did want to say one thing about the idea that you get to go back to being a coder, because one thing I've really seen with this is that we are less coders than we are all sorts of things. The amount of time that people on my team spend, or that I spend, coding versus making a choice about technology, setting up infrastructure, or doing operational-type things has changed. The code is definitely the simple bit now. In a microservice architecture, it's all the stuff in between that is really complicated.

Oh, see, I'm seeing a different aspect too, which is things that I couldn't do before because I didn't have the compute resources. I'm thinking of things like machine learning and predictive analysis, where the compute resources were difficult to acquire, or, as you said, it took four months to see something in production. So I kind of see new workloads being available for me to code on. But you're right about the other side of things: it does make us more beholden to and responsible for where our code runs and what we do. There is no more throwing it over the wall. If you build it, you really do have to run it as well. So it's an interesting new world.

Yeah, you are absolutely right. The ability to just say, hey, there's a new library, I can try it out, and I can have something live, is so powerful. What we find then is that the "I now need to make this something that's scalable and secure" and all of that stuff becomes a much bigger part of your effort. The actual putting it together has become simpler.

Definitely. So in your new role, what's your biggest challenge?
As you roll this out further into the FT?

It's not so much that I'm rolling this out, because actually it's been done all over the FT. My challenge is partly to try to bring us back to having a little bit more common stuff between us. I think we've gone too far: the diversity of approach makes it difficult for first-line operations to support things. So the challenge is to build tooling that helps us support our applications, and to convince delivery teams that they want to use it. It's to build things that people actively want to use.

Yeah, we see that too. We have a set of tooling called Source-to-Image, which helps us build images that are shareable, reproducible, and manageable for the different stacks that people are using, and we see it in lots of other aspects as well. It goes back to what I was saying earlier: the promise of PaaS was that I could use anything I wanted, but the reality is that you need some sort of standardization on the images and the image catalogs that we use and reuse, so that we're not building new images every time we push out a new app, and so that we can use layered image approaches and things of that nature. I also think the work that the service broker folks are doing in Kubernetes, allowing you to create service catalogs so that you can share microservices among different applications and incorporate them, is going to make a big difference for people like yourselves.

Yeah, with the wider-scale move within the organization to a microservice architecture, and the fact that people can build tools and applications to help in very short amounts of time, we've definitely found that we've got so many things. Knowing who owns them, how to fix them, and where they're running is a challenge.
So we're looking very carefully at that. We have a central database for our systems, and we're looking at how we make that into a really high-quality thing that we can use in a lot of ways.

So take a look, if you get a chance, at some of the stuff that's going on in the Open Service Broker initiative. I think you'll find it really helpful. Creating service catalogs to use across the entire enterprise, so that teams have more shared resources and an easier way of cataloging and maintaining them, has been very helpful for a lot of other folks as well.

It sounds really interesting.

So, I know you mentioned that you're using Heroku, and I don't think Heroku is on Kubernetes yet. How are you finding running Kubernetes in-house and managing that? Has that been a big learning curve for you guys, or is it just something that's now part and parcel of your infrastructure team?

It's been a learning curve. It's been implemented by two separate delivery teams. I'd say the challenge for one of the teams was that this was their first containerized stack. For the other team, which was my team, it was a bit more challenging, because two years ago we built our own stack, and moving 150 microservices across just takes time. You're basically saying, okay, we need to have a Helm chart for each of them, we need to set the pod affinity, we need to set memory limits. There's a whole bunch of work. So I think, having built it yourself, you look at Kubernetes and you go, great.
This will be much better than what we do. But then you have to get there.

Yeah, you have to get there. It's always interesting to hear how the DIY Kubernetes deployments shape up, and how much work and effort it takes to get them out and deployed. What'll be interesting is to see where you're at maybe a year from now in terms of standardization, and what approach you've managed to take with having so many: you have VMs, you have Kubernetes, and you're using Heroku. So I'll be curious to hear more on how you manage to implement that approach to standardization.

It's definitely going to be interesting.

Yeah, I think it is. It's a challenge. I work on the OpenShift project, and one of the things we have is the ability to run OpenShift on any cloud, on AWS or GCP or anywhere, or on bare metal. So it's the same interface and the same service catalog across all of those platforms, and that makes it easier because you're all sharing the same images and tooling. So it'll be interesting to see how you mix and match and how you grow that. I'm sure there's lots more to talk about around that.

The other thing that I like to do on these tech and talks, and I'll ask you as well: if there was one other person in the industry that you would like coaching from, or to hear from, who would that be?

Someone who I think is saying a lot of interesting stuff in the areas that I'm really interested in, which are around microservices and observability, is Cindy Sridharan. She's written a couple of really interesting blog posts about how testing changes in a distributed, microservice-based architecture, and on observability and cloud native. I think that she's...

That's a great suggestion.
I've read some of that stuff, so I'll have to reach out and see if I can coerce her into coming on as a guest to talk about this. I wish you all the best in your new role, and I look forward to hopefully meeting you in person sometime soon as we get over to London for the upcoming OpenShift Commons Gathering on the 31st of January. If you're listening to this in the next few weeks and you'd like to join us at the OpenShift Commons Gathering, drop me a line and I will get you hooked up with an invite. So again, Sarah, thank you very much for taking the time out. I know your day is really busy, and we really appreciate hearing your perspectives on all things cloud native.

Well, thank you very much for inviting me. It's been fun.
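
The health check convention described in the conversation (every application answers on __health with JSON in an agreed format, so that monitoring can plug in without caring how the service is built) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the payload fields (`ok`, `checks`, `name`), the helper names `build_health_payload` and `serve`, and the port are hypothetical, and this is not the FT's actual format or code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# The path is the one mentioned in the conversation; the payload shape below is assumed.
HEALTH_PATH = "/__health"


def check_always_ok():
    """Placeholder check; a real one might ping a database or a downstream API."""
    return True


# Hypothetical registry of named checks for this service.
CHECKS = {"placeholder": check_always_ok}


def build_health_payload(checks):
    """Run each check and summarise the results in one JSON-serialisable dict."""
    results = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A check that raises is reported as failing rather than crashing the endpoint.
            ok = False
        results.append({"name": name, "ok": ok})
    return {"ok": all(r["ok"] for r in results), "checks": results}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != HEALTH_PATH:
            self.send_error(404)
            return
        body = json.dumps(build_health_payload(CHECKS)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the illustrative server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8080/__health` would return the JSON summary. The point of the sketch is the one made in the conversation: the standard fixes what the endpoint returns, not how each team implements it.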