Hello, how's everybody going? Day two, we're almost at the end, but you're stuck with me for 30 minutes. We should probably get started, because I have a lot of slides.

We're talking about a case study on building cloud native platforms. My boss coined it "a love letter to Skpr", so you've been warned. A quick little thing about me: I'm the Operations Lead at PreviousNext. I oversee the hosting platform and infrastructure, everything from the operations angle: how we deploy code, everything on the back end.

For this talk I have three main objectives: to talk about the journey we've been on over pretty much the past decade, the lessons we've learned along the way, and to embed a couple of little cloud native concepts at the same time, so you can take something away. Then hopefully we'll have some Q&A; otherwise we can do it outside afterwards as well.

Right at the beginning: in the beginning, we had virtual machines. Those were just "the box", right? Like in Silicon Valley.
That's your box. We just had this one VM with every layer of your stack in it: Varnish, Apache, MySQL. If you were really sophisticated towards the end of that journey, you were splitting everything out. This is biased towards AWS, but that meant a CDN at the front, a load balancer distributing across a couple of those VMs, maybe some auto scaling groups, and then some services in the back end to let you scale out. This was the state of the art at the time.

We saw this as the way forward and started to build a lot of tooling around that last part. We were using Packer to package our code into AMIs, and then we had a whole bunch of tooling that orchestrated a bit of that: roll out an auto scaling group, turn the knobs so the new version goes up and the old one goes down. A lot of that is baked into auto scaling now.

The biggest issue was that it was so slow. Very, very slow: 30 minutes slow. And if you double pushed to master, oh boy. Development teams were just waiting, waiting, waiting, and then you'd have to do a little bit of manual intervention.

Not only that, with this image-packaged model we still had other things to overcome. We had configuration: decoupling our config from our Drupal code. We had secrets that we didn't want developers to have, API keys, things like that. And we had storage concerns: we were integrating with S3 at the time using the S3 module, but that required us to integrate it with every single Drupal site, bespokely essentially, because of stream wrappers, and it was a nightmare.
It was a nightmare at scale, unless you had only one or two sites. Everything looked really bleak, but that leads us into chapter 2: Docker.

A little question: does anybody know when Docker 1.0 came out? Want to take a guess? June 9, 2014. I feel old. Not only that, we started experimenting just after 0.7, in that early period when it was very experimental and not very robust. The journey to 1.0 was all about trying to be production ready and stable, something administrators could trust, but we started playing around with 0.7 because that was the first sort of open source release.

What that meant was we took that one box and went: sweet, we can carve that one EC2 instance up into multiple containers. So we took our Docker containers and treated them like VMs as well. In this scenario you've got a routing layer with nginx, and each container was just like our own VM, our own big monolithic image. That gave us Apache, MySQL, SSH, and the whole thing was orchestrated with Puppet as well, so the config was, well, it was pretty nuts really. But it did give us a lot of decoupling from the instance, because we could roll out new versions of PHP for different projects and isolate them. It was a really nice first step.

But we learned a lot of lessons along the way about running all these services in one image: you really want to run each one of those services in its own discrete image. So then on came Kubernetes. That last architecture was not really great for production, right?
We did experiment with the idea of having auto-scaled VMs, packaging the container image in there, and rolling that out too, but that was just as janky. Then on came Kubernetes, with this amazing announcement at DockerCon 2014 by Eric Brewer, who dropped a lot of foundational information that just went over my head. What are these things about pods? What's this etcd? What is all this technology? Honestly, I looked at it and went: that's interesting, but I don't know. I don't think the world knew what was actually happening at that stage, when Google was essentially coming into Docker's conference and announcing this massive project for running containers.

Anybody know when Kubernetes 1.0 came out? I'll answer for you: July 21, 2015. So, not very long between the Docker release and Kubernetes coming out, the 1.0 to the 1.0.

We did slowly come around to it. The way we perceived Kubernetes was: OK, we have that HA stack from before; now let's just swap out the middle. It's really just for that middle piece. It's CloudFront, it's a load balancer, and then we can do Kubernetes in the middle, and maintain all our existing tooling for managing the infrastructure around the application. Which then led to Skpr being developed.
We didn't start with YAML, and looking back on that, it seems a bit crazy. We never deployed our first application from YAML files in a repository, using kubectl to roll them out; our first application was deployed with Skpr itself.

The first commit for Skpr was January 5, 2016. As you can see, I've been going back in time figuring out all these dates and getting very nostalgic along the way. The goal was to have this layer on top, because the YAML looked really complex and exposed a lot of deep features. Right from the get-go I thought: developers don't really care what the resources are, they don't care what port is being exposed, but that was what was being presented through those YAML files. So we ended up with this architecture: a command line and an API, with Kubernetes in the back end.

Then there's the CLI. It's a bit bland, you know, but it does the job. And that led us to this guy: Go.

Our journey into Go was based on our requirements. We looked at the tools around; we could have written this thing in PHP. That was our bread and butter.
We work in Drupal; all this stuff could have been done in PHP. But then we looked at our requirements: the CLI needs to run on Linux boxes, macOS, and potentially Windows. That's paid off even more now, because there are way more ARM boxes, like my laptop, an M1: it just works.

Go also has a web server baked in, and a bunch of frameworks around that; it has a really, really nice web server built in. And finally, Docker and Kubernetes were written in Go. That doesn't seem like much, but at the time there weren't any client libraries or anything like that. We were importing the Kubernetes source into the Skpr project, tracking everything from release to release, following how HEAD was going, trying to compile against their API and keep up with releases that way. Now it's much more robust, and there are clients in multiple languages, but at the time that's what we used.

We had some challenges along the way with v1, though. The challenges didn't really go away; in fact, they got a little bigger in the beginning, because we didn't have a managed service for Kubernetes like we do now. In the past we actually took Packer, built our controller (our main node), and then the nodes after that.
So we were packaging everything and self-managing it ourselves, even before Kubernetes the Hard Way came out, that great repository, that great resource for easing the pain of provisioning, managing, maintaining, and running Kubernetes from release to release. We were doing all of this ourselves because we knew the payoff was far greater: we were deploying containers and getting this great developer experience through that CLI and API.

I had to go deep into networking. I don't think many of us think about it now, because it's just there, but in the past, when doing self-managed Kubernetes, you had to pick a networking solution, and there weren't too many around at the time. There were things like Weave, all these different overlay networks with different trade-offs, trying to gain market share. We actually zagged a little when a lot of folks zigged: we were already running Ubuntu for our nodes, and Ubuntu has this Fan networking solution. I won't go deep into it, but it made it very easy to understand the inner and outer networks (the network the VMs were running on, and the network the containers were running on), so we wrote a little bit of tooling around that, which worked really, really well.

Config was another thing we had to solve. Going back to the challenges of building, packaging, and deploying those VMs: the config challenge didn't really go away. We use the Kubernetes ConfigMap API, and we wrapped that with our CLI and API and smoothed it out for developers. I have a big breakdown of all this on the Skpr blog, and the docs site has a step-through of how that flow works. There you go, there's a link.
I'll post it later. Developers could do a `skpr config set`, `get`, or `delete`, and then the application could load the config through a library. This was honestly game-changing, because all our projects had a `settings.local.php`, which was very cumbersome and very hard to debug if somebody had some bespoke configuration locally. What this meant was really just that database line: get the config, or fall back to this default, and that was it. That's your local configuration, dev, staging, production, all set through Skpr. It smoothed out a lot: "oh, this config needs to be set for CI, local, dev, staging, prod" was all solved in one fell swoop.

Storage. Storage was interesting, because the solution at the time was deploying an NFS server onto the cluster: run NFS yourself and have some mounted block storage on there. That's not really HA, and while we were going down the networking rabbit hole, we didn't really want to become storage server experts at the same time; we kept seeing all these extra things we were having to manage. And we'd already seen the pain we had previously with the S3 module in Drupal.
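That "get the config or fall back to a default" pattern is simple enough to sketch. Here's a minimal Go version; the function names, the JSON layout, and the `/etc/skpr/config.json` mount path are assumptions for illustration, not the platform's actual library API:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// loadConfig reads a flat key/value JSON config file written by the
// platform, e.g. {"database.host": "db.internal"}. A missing file
// simply yields an empty map, so callers fall back to defaults.
func loadConfig(path string) map[string]string {
	cfg := map[string]string{}
	data, err := os.ReadFile(path)
	if err != nil {
		return cfg
	}
	_ = json.Unmarshal(data, &cfg)
	return cfg
}

// get returns the configured value, or the fallback when unset.
func get(cfg map[string]string, key, fallback string) string {
	if v, ok := cfg[key]; ok {
		return v
	}
	return fallback
}

func main() {
	cfg := loadConfig("/etc/skpr/config.json") // hypothetical mount path
	host := get(cfg, "database.host", "127.0.0.1")
	fmt.Println(host)
}
```

The same two lines work unchanged in local dev, CI, dev, staging, and production; only the platform-provided file differs.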
So we went down the path of FUSE-mounted file systems backed by S3. These are the three we had to pick from at the time. You had s3fs, the first one, which is POSIX compliant but slow, noticeably slow. You had riofs, which we landed on at the time because it was faster, since it didn't adhere to all those POSIX compatibility rules. Then things were getting dire and we started looking into goofys, which was like a Go take on riofs, because we were running into issues with caching and things like that. It was a very interesting time. I still remember where I was standing when EFS was announced for Australia; we immediately started working on getting onto that.

We also built a bunch of integrations, extending Kubernetes to make all of this mounted-volume stuff automated. We've written a bunch of provisioners, and now we're using the community one.

Database images. So everything's running up on the infrastructure, but then we had this issue: OK, how does local development work now? By moving to our CLI and API integration, we lost Drush integration, so you couldn't just point Drush at dev and sync the database down. We had a constraint, and we had to come up with a really interesting way around it. We went: why don't we package that database into an image, so folks can just `docker pull` the image down, and we can sanitize it? The more we thought about it, the more we realized this is an amazing idea, because not only is it sanitized, you can also have policy around who can access and pull that image down. CI now gets a completely sanitized version of the database. It had this flow-on effect where at first we weren't sure, but after diving in a little: oh, wow.
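The database-as-an-image idea can be sketched with the official MariaDB image, which imports any SQL file placed in `/docker-entrypoint-initdb.d/` the first time a container starts. The file names and credentials here are assumptions for illustration:

```dockerfile
# Build a pullable database image from a sanitized dump.
FROM mariadb:10.6

# Local-only credentials; the image never holds production secrets.
ENV MARIADB_DATABASE=drupal \
    MARIADB_ROOT_PASSWORD=local-only

# Sanitized dump produced in CI (e.g. by a tool like MTK).
# The entrypoint imports it automatically on first boot.
COPY sanitized-dump.sql /docker-entrypoint-initdb.d/
```

Registry access controls then double as data-access policy: whoever can pull the image gets the sanitized data, and nobody gets anything else.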
Yeah, it's really powerful, and we've invested in it ever since. There's a quick example for folks who want to come back to the slides and see a configuration. We wrote this tool called MTK, which stands for MySQL Toolkit: you can replace fields, strip out tables, and skip the data for certain tables. That's the repository there if you want to check it out for yourself.

So we were in a pretty good spot at this stage. We were migrating projects onto the new infrastructure, all running on Skpr v1. But at the same time we were doing all this work, there was this massive cloud native wave coming, because not only were we building tools on top of Kubernetes, everybody else was building tools on top of Kubernetes too. Everybody was building solutions and products on top of it; startups were popping up everywhere, trying to be the next big thing on top as well. A massive, massive influx of projects. This is a screenshot from the Cloud Native Computing Foundation, the governing body that shepherds and assists all these projects, the latest screenshot. It's nuts. It's very confusing, and it's only gotten bigger.

At the same time, like I said, there were all these new patterns and models, and a lot of maturity starting to happen in the Kubernetes project around process. We saw this and went: OK, we want to build a v2 of Skpr. And why would we build a v2?
Well, geez, yep: first commit 2018, so four years ago. We fell in love with reconciliation loops. Back at that diagram with our CLI and our API, there's also this piece in the back end which is looping and enforcing state. Instead of our API just going "deploy the app" and that's it, this is also running in the background going: is the app deployed? Is the app deployed? Is the app deployed? That sounds really silly, but it's actually extremely powerful, because you're keeping the state of the cluster in sync. There's no drift; there's nobody going in, manually changing a deployment, and then it's off.

And this is powerful because it's not just for deployments; it fed into everything. Want to provision a CloudFront distribution? We wrote a reconciliation loop for that. Want to provision some database credentials? We wrote a reconciliation loop for it. We were able to write these small little loops of code that then become this big tree and really embody that Unix philosophy of "do one thing and do it well". That's what each of our loops was doing, an orchestra conducted by our platform.

If you want to learn more about this approach, I did a talk at Container Camp where I compare it to an HTTP request in some ways. Just a self-plug there.

The other thing we learned was around our API, which was an HTTP server, and we wanted to leave our options open, let's put it that way, in terms of what we might build on top of this API. We have a CLI right now.
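The "is the app deployed?" loop can be sketched in a few lines of Go. This is a generic skeleton under my own assumptions, not the platform's actual controller code: a loop that periodically compares desired state with observed state and acts only on the difference.

```go
package main

import (
	"fmt"
	"time"
)

// Reconciler drives observed state toward desired state.
type Reconciler struct {
	Desired int             // e.g. desired replica count
	Observe func() int      // reads current state from the cluster
	Actuate func(delta int) // applies a correction
}

// ReconcileOnce performs a single compare-and-correct pass and
// reports whether anything had drifted.
func (r *Reconciler) ReconcileOnce() bool {
	current := r.Observe()
	if current == r.Desired {
		return false // converged, nothing to do
	}
	r.Actuate(r.Desired - current)
	return true
}

// Run loops forever, enforcing state every interval.
func (r *Reconciler) Run(interval time.Duration) {
	for {
		if r.ReconcileOnce() {
			fmt.Println("drift corrected")
		}
		time.Sleep(interval)
	}
}

func main() {
	replicas := 2 // pretend someone manually scaled us down from 3
	r := &Reconciler{
		Desired: 3,
		Observe: func() int { return replicas },
		Actuate: func(delta int) { replicas += delta },
	}
	r.ReconcileOnce()
	fmt.Println(replicas) // back to 3
}
```

Each loop stays tiny and single-purpose (one for deployments, one for CloudFront, one for credentials), which is exactly the Unix-philosophy composition described above.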
We want to leave the door open moving forward, and that's what led us to gRPC. A lot of developer folks are very familiar now with the idea of having a definition file and then generating or building against that definition; just like Adam Bramley's GraphQL talk yesterday, very similar in some ways. With gRPC you create a definition file for a service, like a hello service, tell it the inputs and the outputs, and then generate server stubs and client stubs in whatever language: Go, PHP, Java, there's a whole bunch of them. We created Go ones.

That left the door open for us to build fun, cool stuff on top if we wanted to, moving forward. It also meant our APIs were very well defined. We have a server for each core concept: environment, that's a server (deploy, delete, create); config, that's a server (set, get, delete). It was this really nice, elegant way to define our API, and if we ever needed to make changes, you could see right there what we were changing, what the input and the output was.

We invested heavily in Terraform. Previously we were using some Puppet and some Terraform, but we wanted to go all in on managing our entire infrastructure with Terraform, especially with Kubernetes in the mix. We wanted to manage everything running on the cluster that wasn't a Skpr application (log aggregation, security tools, things like that) in this one repository: the same thing that brings up the cluster should also provision the stuff on the cluster that's used for operations. Terraform took a really long time to mature towards having a provider for Kubernetes, so we wrote our own, and we still use it to this day.
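The gRPC definition files mentioned a moment ago look something like this minimal `.proto` sketch. The service and message names are hypothetical, not the platform's real API, but the shape (a service per core concept, with explicit inputs and outputs) is the idea:

```protobuf
syntax = "proto3";

package config;

// Config mirrors the CLI verbs for one core concept.
service Config {
  rpc Set (SetRequest) returns (SetResponse);
  rpc Get (GetRequest) returns (GetResponse);
}

message SetRequest {
  string environment = 1;
  string key = 2;
  string value = 3;
}

message SetResponse {}

message GetRequest {
  string environment = 1;
  string key = 2;
}

message GetResponse {
  string value = 1;
}
```

From one file like this, `protoc` generates the Go server stubs plus clients in other languages, and any change to a request or response is visible in the diff.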
It's a pretty simple provider, but it covers a lot of the API surface, because we've deployed a lot of stuff onto Kubernetes that needs specific APIs. We are looking at, and starting to use, the official HashiCorp Terraform provider for Kubernetes more, but we're still using ours a lot.

With v2 we also really focused on monitoring, and we invested a lot of our time there too: aggregating all the CloudFront data, but also picking up all the PHP-FPM and nginx stats, and then the logs. We really hit a point we're happy with, where you can go from the top of the stack all the way down to the logs in one dashboard and get this really nice overview of the health of your application. And we have a great blog post about it. I'm just going through them.

PHP. Or, more specifically, Alpine Linux. We started with Debian-style container images, Ubuntu and Debian, but they were just much too heavy, much too big. When you're packaging and shipping your application, you really want to trim it down to exactly what needs to run. When you're shipping an nginx image, you just want nginx and exactly what it needs, or as close to that as you can get.
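"Just nginx and exactly what it needs" is the whole pitch of Alpine-based images. A sketch, with hypothetical file paths:

```dockerfile
# The Alpine variant of the official nginx image is a fraction of
# the size of the Debian-based default tag.
FROM nginx:1.25-alpine

COPY conf/nginx.conf /etc/nginx/nginx.conf
COPY public/ /usr/share/nginx/html/
```

Less in the image also means less to patch and scan, which ties into the security focus later on.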
As part of that cloud native wave, Alpine Linux rose in popularity and became kind of a de facto standard for images. What that meant for us is we hit a bit of a challenge: we want Alpine Linux, but we also want to support multiple PHP versions, so you have this matrix of Alpine versions and PHP versions you can support. We support 7.4, 8.0, 8.1, 8.2, I think; we run the gamut, and then we bump our Alpine Linux version.

The problem we found was that Alpine Linux only ships one version of PHP per Alpine release. If you wanted, say, 8.1, you had to pick that specific release, and that didn't really work for us. So we've got a repository here: we package our own PHP, and we keep up with the PHP release cycle. A new version comes out, the packaging gets updated, our images (which sit just below it) get rebuilt nightly, the new version gets picked up, and it flows through when people deploy their apps again. Security patches are applied; happy days. Cool.

So that's where we've been, and this is what we're looking to next, where our focus is. A lot of this talk has been about the journey we've been on, and a big focus right now is stability. The ecosystem is stabilizing; Kubernetes as a project has stabilized. The APIs we use are essentially all stable v1 APIs now; we're not chasing API changes in the Kubernetes release cycle as much anymore, and that flows down to the tools built on top as well. That's a big focus for us right now. Security is a really big focus for us as well.
Over the past six months or so, the last half year, we've baked in the ability to use static analysis tools: after the application is packaged in CI, we spit out a manifest, you use a little bit of bash and chuck it through Trivy, which is an open source image scanning tool (or a tool of your choice), and then you can scan those images and pick up any major issues in the pipeline. We're also looking into dynamic application security testing tools, to routinely scan environments for customers as well.

Not only that: this whole time we had a very strong focus on Drupal, and that's not as much the case anymore. I mean, we still focus on Drupal a lot, but there are all these additional services around the outside of Drupal now. Adam Bramley did a talk last year about Drupal and OpenSearch, and that's all powered by the right-hand side of this diagram: we have Drupal, and then you route specific paths through a proxy straight to OpenSearch and query it from your decoupled applications. You might have a very specific path that's a completely separate application, one that's not Drupal. You might have some stuff in the background that you can ping, talk to, and execute operations against. All these things we're slowly building out as new application types as we go. We've focused on Drupal; now we've got PHP, and we've got a basic deployment type for these extra services, to test them out with Node and Go and things like that, to really dial in the best way to do this.
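The package-then-scan step described above might look like this in a pipeline. This is a hypothetical GitHub Actions fragment; the image name is an assumption, and `--exit-code 1` makes Trivy fail the job when findings at the listed severities exist:

```yaml
# Scan the freshly packaged image before it is allowed to deploy.
- name: Scan packaged image
  run: |
    trivy image \
      --exit-code 1 \
      --severity HIGH,CRITICAL \
      registry.example.com/project/app:${{ github.sha }}
```

Because the scan runs against the exact image that was pushed, whatever reaches the cluster is the artifact that passed.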
So that's where we're heading, and on that note, I'm happy to take questions.

[Question about storage.] EFS, AWS's EFS solution, and we've got our EFS provisioner that we wrote, so it automatically provisions EFS instances under the hood. Sorry, I'm standing over the mic. But we are migrating to the official EFS CSI driver. While it looked stable on the outside, all the bugs and memory leaks and things we'd seen in the issue queue have really only just subsided and gone into main. So, yeah.

[Question: so you push a branch, then what?] Yeah, cool. That CLI part runs in your CI/CD of choice. For us that's CircleCI or GitHub Actions, but it could be whatever you want; that's a big reason why we built it as a CLI. It really boils down to three commands, essentially. `skpr package`, which under the hood is running a series of `docker build` commands; it's talking to the Docker API on that pipeline.

[Question: so in CI/CD you need a Docker daemon?] Yep, you need a Docker daemon running in that pipeline, which is pretty common. If you're rolling it yourself it's a bit different, I guess, but on CircleCI it's one line to get a Docker daemon and access to one, and the same with GitHub Actions. After that it's packaged and pushed, talking to the API during that time as well, pushing those images. That's all done in CI, and then you can run `skpr deploy`.
If you push to mainline, a pretty common pattern is to `skpr deploy` to dev with that version, and it rolls out the upgrade, waits, and stops. Then you've got `skpr exec`, with which you can run your Drush commands or whatever steps you would normally run locally. You can run `skpr exec` as many times as you want, or run one script in the remote environment that does the whole lot in a stepped-out way, through a Makefile or a bash script or whatever you want, executed against that environment. And that's it: you've rolled out the environment in three steps.

[Question: is exec just SSH into the container?] No, no, it's definitely not. That was actually one challenge we had: how does somebody connect to a remote environment? The `skpr exec` command is kind of a wrapper around SSH in some ways. We have an SSH server written in Go, and it actually spins up a new, independent container, a pod in Kubernetes, with very similar configuration to your FPM instance, running a CLI image. That's your session for the whole execution, and when you exit your session it gets terminated and cleaned up.

To pull at that thread a bit more, that also means you have dedicated resourcing. If somebody else comes in and runs SSH, they get a new pod with X amount of resources, and depending on how you allocate resources for pods, sessions aren't clashing with each other. You're not competing for memory, like somebody running a massive batch process alongside somebody else just running a deploy command to prod.

[Question: can developers get onto prod?] Yeah, exactly: they can exec or shell and get onto prod, but we do have RBAC controls over prod and non-prod environments and how that works.

Yeah, we did look into that.
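The three-step flow described above, as a sketch of a typical pipeline. The subcommand arguments shown are illustrative, not the CLI's exact flags:

```text
# 1. Build and push images for this commit (runs docker build under
#    the hood, talking to the pipeline's Docker daemon)
skpr package <version>

# 2. Roll out that version to an environment
skpr deploy dev <version>

# 3. Run post-deploy steps in a fresh remote session
#    (e.g. Drupal database updates)
skpr exec dev -- drush updatedb -y
```

Step 3 can be repeated as often as needed; each invocation gets its own short-lived pod with its own resource allocation.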
I'm trying to think what we looked into it for. Oh, sorry, the question was: have we considered Kaniko for our image building? We did look into it, because that was the Docker-daemon-less build process. There was also img, which was another one, and the whole goal these projects were trying to solve was: do you really need a Docker daemon to build images? You're really just packaging tarballs with metadata on top and pushing them. Most of our issues at the time, and why we didn't really pursue one of those tools, were around multi-arch builds and things like that, for the most part. As we're going down that path, it's still a bit tricky. I'd probably invest more time into buildx and those kinds of tools as they mature, because they're pretty sophisticated. Yeah, Kaniko is still great.