Well, I guess it's about that time, so I should get started. Thanks for coming. My name is Richard Marshall. I'm the lead platform architect at IAC Publishing Labs, and today I'm going to talk about our container story. It's a story not of building brand new things in a beautiful green field, but of building things intertwined with the old and adapting existing applications to the new model. In other words, our brownfield story. First I'll talk a bit about what IAC Publishing Labs, or ICPL, is and where we've come from. Then comes the story of the last three years that led up to our first deployment of containers in production, a bit about where things are headed, and finally some of the things we've learned along the way and the challenges that forced us to learn them.

So, some context. ICPL is an operating company for a variety of online brands. As the name suggests, we're part of the greater IAC family, with sister companies like About.com, Vimeo, and HomeAdvisor, among a number of others. We ourselves operate a fairly sizable number of brands, but for the context of this talk the most important one is Ask.com. Ask has a long history, over 20 years at this point, and while it may no longer be a corporate entity of its own, with ICPL taking on that role, its legacy is still alive and strong. We might not even be on the map when it comes to scale compared to the big fish in the industry, but we still have a fairly substantial footprint in data centers around the world: hundreds of physical systems and thousands of VMs, all running a fairly large collection of applications for a variety of properties. And like many other businesses that have been around as long as we have, we went through the process of transitioning from a fully physical environment to one predominantly based on VMs; in our case that happened about six years ago. Now we're starting down the road toward containers and starting that transition process all over again.

About three years ago, in early 2014, the first glimmers of interest in the container concept started to emerge within the Ask development organizations. It mostly came in the form of a desire for an in-house platform like Heroku or Google App Engine, something along those lines. At the time, we decided to look into the pre-Docker OpenShift Origin, and it seemed to fit the bill for most of what our users wanted. We spun up a pilot environment and tested things, and it actually went very well: it was stable, and it did everything it said it was going to do. However, because this was one of those initiatives driven more by interest in the technology than by an actual business driver, we ended up at a bit of an impasse. The dev org wouldn't buy into the work of putting real applications on it until operations gave them a timeline for going to production, and ops reciprocally wouldn't commit to that until dev bought in, because there's a lot of time investment in actually doing it. That catch-22 lingered for a while, and eventually we just let the pilot environment rot in place. It's still there; there are still a few internal services running on it, and amazingly they never need any babysitting. But once you stop touching things, they often stop breaking. So it was close to another year before things really started to get interesting for us when it came to containers.
It was midway through 2015. The Docker buzz was everywhere. You could go to any conference, any meetup, any mailing list, and everyone was talking about Docker. Regardless of your stance on it, you couldn't ignore it. Combined with that, what really triggered the renewed interest from our development teams was the advent of the open-source container orchestration systems: the announcement of Kubernetes, Mesos with Marathon, things like that. It's not as if there's exactly a consensus on that front even now, but at the time it really did feel like a new orchestration system was being announced every week. So we spent maybe a good month evaluating the options that were in play, looking at the maturity of the various projects and the use cases they were targeting. And while it's not strictly speaking a container orchestration system, we decided to do our initial pilot using CoreOS's fleet. That was mostly due to our perception of its simplicity at the time and our existing knowledge base around managing services with systemd. However, as we tried to scale that up beyond the first development team making use of it, the limitations really started to show. The lack of namespacing, user tracking, and various other things made us realize we were forcing a use case on it that just didn't make sense. In our initial evaluation we had seen a lot of promise in Kubernetes, and we had decided to give it some more time to mature before we really dove into it. The problems we ran into with fleet shortened that bake period. So we built out a Kubernetes environment, and fortunately the transition for our early guinea pigs was actually incredibly easy. Adoption of that early POC platform ramped up pretty quickly: we got a bunch of different dev teams on there, all trying out different things, building some internal tooling on top of it, nothing business-critical, the sort of thing where if it went down at four in the morning on a Saturday, who cares.

In parallel to that work, we started trying to use Docker outside the orchestrated container environment. We wanted to get our feet wet operating Docker on its own, using it more as a deployment and packaging tool, and focusing less on how containers could change everything and more on how they could simplify and add some efficiencies to our existing CI and deployment processes. Out of that came our first maybe-not-exactly-production deployment of Docker: we deployed our internal asset management and commissioning system on top of it. And while it's an important system for us and critical to our day-to-day operations, it's not something that has any direct impact on business revenue or user experience if it goes offline. So maybe not production in that sense. Just as that year was wrapping up came the announcement of the restructuring of our business. That was when the formation of ICPL happened. We all got brand new email addresses, and we got a bit of a change in focus as far as our business is concerned. And that led into early 2016, which did bring a lot of change in our business.
But for the most part, it really didn't change the focus of this initiative that much. It was mostly a rethink of what our primary objectives are as a business. Things did stall a little on the project after the announcements in December, which wasn't too surprising, but it didn't take long; a month or two later we were back up and running with our Kubernetes pilot and back on track to grow that environment. By the end of the first half of the year, pretty much every dev team in ICPL was making use of Kubernetes to some degree, the majority of them for spinning up testing environments as part of their CI flow, some building small tools and various other internal things.

But the further we got with that pilot, the more it became apparent that to have any reasonable timeline for getting to production, we would need some sort of on-ramp that didn't include all of the complexities at once. So, building on the work we had done the previous year deploying just Docker with our commissioning system, we piggybacked on another initiative from our production engineering group to improve the automation around our general deployment flow. Essentially, as I said before, we used Docker as a packaging tool: get the artifacts onto the system, take the various processes that were already running on the same sets of VMs, wrap them up in a Docker image, and deploy them. Maybe it's not the full container dream, but it's a step in the right direction, and it gave us a much shorter ramp to actually get there, something we could see being done before the end of the year.

During all of that, one of our front-end teams was working on a system to build rapid prototypes for brand new sites that we needed to launch. Coupled with the refocus of the business, that was the new driver: let's try lots of new things, spin up new sites, see how they work and how well they monetize. This was something that needed to be deployed very quickly with very little interface with operations, so it seemed like a perfect fit for trying this out in Docker. And it would give us in operations a real chance to operate systems with Docker carrying live traffic, without any significant risk to the business. Fortunately, that was very successful. We didn't run into any real problems that blocked the project. We learned a lot along the way, and we ran into some stuff I'll talk about later in the lessons-learned portion of the talk, but it really lit a fire under the entire containerization effort. Because up until that point, everything had really just been an experiment; we hadn't done anything concrete that was impactful to the business rather than just some toy we were trying out.

It was also the time to take what had been a very small, focused team building out this POC with a small number of dev teams and expand that to the entire technology organization. It went from something where a meeting of three or four people could iron out what we wanted to do next, to figuring out how to bring six or seven different teams into that process, how to train them, and how to find the knowledge gaps we had. And beyond the learning curve issues, what were we missing to actually make a production environment a reality? Did we need a production-grade internal registry? Did we need a CI and build system with native Docker support? The list went on and on and on.
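To make that interim "Docker as a packaging tool" step a bit more concrete, here is a minimal sketch of what such a build-and-push flow might look like with the Docker SDK for Python. This is not our actual tooling; the registry address, image name, and tag are hypothetical, and it assumes a Dockerfile that simply wraps the existing application artifacts.

```python
# Hypothetical sketch of "Docker as a packaging tool": build an image that
# wraps the existing application artifacts and push it to an internal
# registry so the deploy step on the VMs can simply pull and run it.
import docker

REGISTRY = "registry.example.internal"   # hypothetical internal registry
IMAGE = f"{REGISTRY}/frontend/app"
TAG = "build-1234"                        # e.g. the CI build number

client = docker.from_env()

# Build from a directory containing a Dockerfile plus the pre-built artifacts.
client.images.build(path="./app-artifacts", tag=f"{IMAGE}:{TAG}")

# Push to the internal registry; the VM-side deploy step pulls this tag.
client.images.push(IMAGE, tag=TAG)
```

The point of the interim step is that the image carries the application and its dependencies, while the host VMs stay exactly as they were.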
And during all of that, the Kubernetes environment build-out continued, and it became more and more important as part of the CI flow for many teams. We also came to the conclusion that to get Kubernetes to the point where we felt we could have it in production by the end of the year, we weren't going to be able to roll the entire thing ourselves, because we needed to divert a lot of the time we were spending on that to the Docker-on-VM ramp that just got us into production last week. That was when the decision was made to shift gears and start using the new Kubernetes-based OpenShift Origin. That decision did upend a lot of what we were doing and required some rethinking of how we were going to make it all happen, but so far it hasn't been a serious blocker for any of our development users who are building out their test systems and early applications on the platform. So far we've only run into a few problems with the differences between the Kubernetes exposed by OpenShift and the bare Kubernetes we were running before.

So, as of right now: last week we launched our first front-end production service on Docker, serving about 10 million requests per day. As of tomorrow, we will finish deploying the rest of that service, and hopefully that will jump the figure to about 40 million requests per day, which is still not a significant portion of our overall traffic, but it's not nothing. We're serving, I can't remember exactly, something like 30 requests per second per node on our Docker-based applications. And the OpenShift environment build-out is ongoing. It's been a very interesting process to bring together all of the teams across operations and really dig into doing that. Before this point, it was a convincing process: hey, we're really trying to make this happen, let's get on board, working with network engineering, with release engineering, with all of the different ops teams and dev teams. With the, not really mandate, but the business initiative behind getting OpenShift going, it really enabled us to just say, all right, we're going to do this. And when we did that, it was quite a change in how we made forward progress. I'll talk about what led to that change in a minute.

So, looking forward, we're going to containerize all the things. Well, not really. The goal is to containerize all of our front-end applications by the end of the year, with some of those maybe running in production on OpenShift also by the end of the year. Perhaps; we'll see. We've come a long way, but we're by no means anywhere near the finish line. Not that there ever really is a finish line in this industry, but even on just this container journey, we're not close to done. We did, however, make it to that big milestone of serving live customer traffic on containers, which is pretty fun.

So, to get where we are now, we ran into a lot of issues. We hit a bunch of speed bumps and various challenges, and a lot of that came from the fact that we just didn't have a business driver behind this. There were a lot of technologists in our company who really wanted to try this stuff out, and there were perceived efficiencies we could gain by making this transition.
But as far as an overall business incentive to deprioritize other things to make the time to do this, it just wasn't there. And with the number of unknowns we had, how could we justify letting the other things slip? Given all of the concerns about the relative security of containers, it might seem funny that our security initiative is what catalyzed actually having a business incentive to invest time in containerizing applications. The thing that really set that fire was an initiative, effectively a mandate, from our security organization to reduce the latency between a security patch being required for our front-end applications and that patch being deployed. Historically, part of the delay in that process within our company has been the time it takes to validate everything from the application all the way down into the OS. There were touch points across numerous operations teams every time a front-end application needed a library change for a security patch, and the time to get that deployed was considered unacceptably long. Given the way containers split up dependency management, we were able to reduce the touch points between teams: the infrastructure team that manages the OS and everything inside it no longer had to be involved in the QA and L&P process for a new deployment of a security fix for a front-end application. That is really what gave us the incentive from upper management to say, yes, you are going to prioritize this work, and that enabled us to make forward progress and get to production where we are now. Would we have gotten there eventually without it? There was enough interest and desire from developers and folks in operations that it would probably have happened organically, but it would have been a significantly longer journey.

And honestly, the learning curve was probably the most challenging thing we had to overcome in the year leading up to our first production deployment. There's a very big learning curve involved in all of this; it's a completely new way of doing things for a lot of different teams. We had some teams that were very self-motivated for this; the front-end application teams were the most excited and most eager to have these sorts of platforms available to them. But not everyone had that same self-starting drive to make it happen on their own, or they just didn't have the time. As things started to move beyond just being a pilot project earlier this year, the importance of training for Docker, Kubernetes, and our expanded usage of CI systems with Docker support like GitLab CI exploded. At the time there still wasn't a company-wide mandate that this needed to happen, but we knew something was coming and we were going to need to get ready for production at some point. So we really needed to figure out how to focus the training for different types of teams; not everyone needs to know everything about all of this stuff. Take the dev teams, the RE groups, and the performance and L&P teams.
They need to be focused more on the build and image management side of things and how it fits into the CI, performance, and testing pipelines. Our production engineering organization needed to be more focused on the actual management of application runtime state and on debugging running containers, while the infrastructure teams needed to be more focused on the Docker engine itself and how it interplays with the host OS, the kernel, and the networking stack. So the training needed to be tailored for each team, and the communication around it also needed to be handled with a lot of care, because fear of the unknown is a very human trait. A lot of companies talk about how they were able to get rid of all of their ops teams because they moved to containers, which is a somewhat silly statement, but it's something people have fear around. So when you're making these pitches to teams that have heard things like, well, in the post-container world there's no need for release engineers anymore, the communication has to be very carefully constructed. Especially for us: we need all of these teams on board, and if any one of them is feeling concerned about their job security, that's a problem. We needed to figure out how to carefully manage expectations about where things are going and how to reassure people that their jobs aren't going away, because that's absolutely not what we're trying to do.

As far as training is concerned, most of it has been ad hoc, with people reading the documentation themselves. A couple of teams did pay for online training courses from sources like Udemy. But what really paid off for us was ensuring there was a point of contact for every single team who could speak for that team on how things were progressing, instead of having a giant meeting with 60 people in it, asking how everyone's training was going and trying to train everyone in one go on stuff they might not actually care that much about.

Another thing that's easy to get caught up in is doing all the things right now, because we really want to do all the cool stuff, and this did bite us a little bit. We got caught up in trying out lots of new things in how we deployed the hosts: maybe let's move away from our VMs and run this on bare metal; let's try out a brand new distribution that none of our operations teams have any experience with; let's give up our configuration management systems because the host OS doesn't allow you to install anything on it. And don't get me wrong, I really like CoreOS, which is what we were building a lot of our early platform on. But at the end of the day, having that extra new layer in play was just one more thing that wasn't really necessary. The farther we got along with this, the more we realized we needed to take every single piece and really question whether or not we needed it right now. That's the point where we made the call to stop the physical migration plan, because all of our physical machines are really geared more toward being hypervisors than toward a configuration that makes sense for containers, and we weren't going to buy a whole bunch of new hardware optimized for that.
And so that's where we ran into some hurdles. We stumbled with that for a while, but eventually we transitioned back to our private cloud environment, built things on top of CentOS, which we have a lot of experience with, and really just tried to pare down the complexity.

Another big thing: this is all very new. Containers as a concept aren't really that new, but a lot of the software we're making use of right now, and how it interacts with the Linux kernel, is fairly new. This isn't a criticism of projects like Docker or Kubernetes; it's just that when you use new things that interact with other things in brand new ways, you're going to run into unexpected scenarios, and we ran into a lot of them and are still running into them. For example, moving from Docker 1.6 up through 1.7 and all the way to 1.9, we had pretty significant blocking bugs in every single release. Some were interactions with the version of the kernel we were running; some were just bugs we ran into with image pulling, network access, and all sorts of other things. We got through it, but we did hit a few of those situations where upgrading from one version to another introduced something new, and you get into the question of which one is worse: do we downgrade, or do we keep rolling forward? A lot of this is stabilizing and becoming less and less of a problem, but you still need people who are able to dig into this stuff and figure it out. It's not something every company is going to be able to just install and hope it works out of the box.

We also ran into a lot of problems with buggy hardware causing problems for etcd, and etcd is incredibly important for things like Kubernetes and fleet, because when it starts to go haywire, everything else falls apart too. We also very recently ran into some device mapper bugs, so no, bugs aren't going away; you're going to run into this stuff. Just last week we hit a device mapper lock-ordering issue causing a deadlock when Docker tried to destroy containers. That was fun, but fortunately CentOS had already fixed it; we just hadn't rebooted our nodes yet.

Another fun thing we ran into: it's pretty important to read the manual. When we expanded our deployment of Docker on VMs, we ran into issues with our physical hypervisor configuration breaking Docker. It might seem a little weird that the underlying configuration of the hypervisor would break Docker running in a VM on top of it, but it turns out that when we deployed our new hypervisors with 10-gig interfaces, we missed the part of the manual saying the new driver enabled large receive offload by default, a TCP optimization that completely breaks forwarding of packets. If we had read the manual before putting those machines into production, we'd have known. Amazingly, we didn't run into any problems with the single layer of bridging with just the VMs, but once we added that second layer of bridging within Docker, TCP traffic stopped working entirely. Initially that was pretty weird, because it only happened on some VMs, and it took a little while to figure out that it was only on the 10-gig hypervisors. Okay, that's fun. It took a lot of debugging to figure out what the heck was going on there, but the moral of that story is: always read the manual.
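As a rough illustration of the kind of check that would have caught this earlier, here's a hedged sketch that uses ethtool to report and disable large receive offload on a NIC. The interface names are examples, not our configuration, and in practice you'd bake this into the hypervisor build rather than run it ad hoc.

```python
# Sketch: detect and disable large receive offload (LRO) on interfaces that
# bridge or forward traffic for VMs and containers. Interface names are examples.
import subprocess

def lro_enabled(iface: str) -> bool:
    """Return True if ethtool reports LRO as enabled on the interface."""
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    return "large-receive-offload: on" in out

def disable_lro(iface: str) -> None:
    """Turn LRO off; it breaks forwarding of packets through bridges."""
    subprocess.run(["ethtool", "-K", iface, "lro", "off"], check=True)

if __name__ == "__main__":
    for iface in ("eth4", "eth5"):   # hypothetical 10-gig uplinks
        if lro_enabled(iface):
            disable_lro(iface)
            print(f"disabled LRO on {iface}")
```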
So, yeah, questions. What was that second part? So, the first question: why did we choose CoreOS, and what else did we look at? When we first went that route, we looked at just running on our existing CentOS infrastructure; we also looked at the Atomic project from Red Hat and at RancherOS. We decided to go with CoreOS mostly because of the release cadence. At that point Docker was a very fast-paced project, and we wanted to be able to keep up with it in that early stage of the process. We also found the fleet system interesting and decided to go with that as our first platform. Any other important parts of that? Yeah, it may not have been easy to do, but it really wasn't that hard at small scale with a couple of people focused on it. Where it became problematic for us with CoreOS wasn't that there were problems with CoreOS itself; it was that it added an additional layer of overhead when it came to training up all of the teams.

As far as the gains versus all of the pain we've gone through to this point: the big thing from the development teams that have been making the most forward progress is that they see the value as having a lot more control over what goes into the environment in which their application runs, because traditionally the application environment was very much gated by operations. We had one or two flavors of the host environment you could run your application in, and if that wasn't ideal for your specific use case, you would work with the release engineering team to build out Chef cookbooks that installed the platform pieces you needed to make your application work. There were inefficiencies in that. So from the developer's perspective, getting that level of control, being able to decide whether Ubuntu makes more sense for installing the dependencies of my application, or CentOS, or Debian, or whatever it is, was seen as very valuable, and that still holds true for most of our dev teams making use of this.

Additionally, on the orchestration side with the Kubernetes environment, the ability to spin up environments, run tests against them, and tear them down very quickly is something we haven't had in our VM infrastructure. We have APIs for all of that, but they're slow; spinning up a new VM in our VM infrastructure takes 10 or 15 minutes. Spinning up a new container, or a collection of containers, in our Kubernetes environment takes however long it takes for your Docker image to download, which is fairly quick unless you're doing something silly. So that flexibility of being able to spin up testing environments based on Git commits, and tie that into a per-commit CI flow, was seen as very valuable to developers, and that's what's really driving a lot of the adoption of Kubernetes within our company. And it might seem kind of silly as wins go, but being able to do something that's on the bleeding edge, something new, is very exciting for a lot of people across our organization, and that brings a level of excitement and energy that is good when you have an environment that has had a status quo for a very long time. Bringing in that sort of big change can revitalize a lot of forward progress on making things better.
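To make the per-commit test environment idea a bit more concrete, here's a minimal sketch using the official Kubernetes Python client. It is not our actual CI tooling; the image name, registry, and namespace prefix are hypothetical, and a real flow would create the full set of pods and services the tests need.

```python
# Sketch: spin up a throwaway namespace and pod keyed to a Git commit, run
# tests against it, then tear the whole thing down. Names are hypothetical.
from kubernetes import client, config

def spin_up(commit_sha: str) -> str:
    config.load_kube_config()             # or load_incluster_config() inside CI
    core = client.CoreV1Api()

    ns = f"ci-{commit_sha[:12]}"
    core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=ns)))

    # One pod running the image built for this commit.
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="app-under-test"),
        spec=client.V1PodSpec(containers=[client.V1Container(
            name="app",
            image=f"registry.example.internal/frontend/app:{commit_sha[:12]}",
        )]),
    )
    core.create_namespaced_pod(namespace=ns, body=pod)
    return ns

def tear_down(ns: str) -> None:
    # Deleting the namespace removes everything created inside it
    # (assumes kube config was already loaded by spin_up).
    client.CoreV1Api().delete_namespace(name=ns)
```

Because the image is already built, the spin-up time is essentially the image pull, which is what makes per-commit environments cheap compared to 10-to-15-minute VM builds.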
Sure, so why did we transition from bare Kubernetes over to OpenShift? Part of that came from challenges we ran into managing the security components of Kubernetes on its own. Integrating user management, access control, and security isolation was something we were finding challenging, and the set of checkboxes OpenShift ticks for our security auditors looked really nice when it came to going to our security folks and saying, hey, this will give us mostly the same thing we're trying to achieve with Kubernetes on its own; there are limitations to that, but for the most part it'll get us what we want, and it also checks off all these boxes you're driving us to achieve, in a timeframe we can actually manage.

Was it all hugs and puppies making that transition? No. There were components of upstream Kubernetes that developers were making use of that had no analog in OpenShift yet. People were using the jobs system and various other newer components in the extensions APIs that just aren't in OpenShift yet. For the most part that hasn't actually been a big deal; for people making use of deployments in Kubernetes, moving over was effectively a search-and-replace operation on some manifest files. But was it easy? Deploying OpenShift is not without its level of effort. Yes, there's an installer, but it's not a no-touch installer. It's another challenge we're working through, and what has enabled us to make forward progress is the security initiative we have. That business driver is what's really enabled a lot of this progress, and the boxes that OpenShift checks off there are what allowed us to make it.

Anything else? Cool, thank you for coming.