So, hello MesosCon, it's great to be back in this town — I was living here for a few years until two years ago — and great to be at MesosCon. I'm Adam Sandor, I work for Container Solutions, an Amsterdam-based consultancy specializing in helping other companies use cloud native technologies to deliver software better, faster, more resiliently, all that stuff. Today's talk was partially inspired by my favorite video game, Doom. Who here has played the old one from '93? Oh wow, awesome crowd, great. The new one? Not that many — okay, do play the new one, it's fucking awesome. I was very surprised by how awesome it is. So how does Doom come into a talk about microservices and cloud native technologies? The story of Doom is about scientists on Mars trying to solve Earth's energy crisis, and they go as far as opening a portal to hell to harness the energy of hell as a clean energy source. Turns out it's not all that clean: demons invade Mars and later Earth, and practically wipe out everybody — but our hero, the Doom guy, just goes and kills them all anyway. So, companies go and start adopting microservices — I'll talk a bit more about why, and whether that's a good idea. What happens many times is that they go for this new architectural style, go for utilizing the cloud, breaking up their applications, delivering faster — and they unleash a host of demons on themselves. If they're not prepared, those demons are going to cause them a lot of trouble, at the very least. A few years ago it was really like this: a lot of companies went down this road and really burned themselves, because they lacked proper tools to manage the complexity introduced by a highly distributed architecture.
But today we have new weapons that have been invented in the past few years, like Docker and container orchestration, that really help us tackle running distributed systems. I would say they take it so far that we should now be architecting applications as microservices. There are very smart people saying monoliths are fine, don't overdo the whole microservice thing — and they're right. But the cost previously associated with running microservices is mitigated a lot by the fancy new tooling we have, like Mesos for example. So, just a bit of definition of what I really mean by microservices. I mean services whose code is independent — so no common library where 90% of the code resides; the code is largely independent — which are packaged as independent artifacts, and which run in production as independent pieces, managed by a container orchestrator of course. So not deployed to just a few VMs with many applications running on the same VM, because in that case the production system doesn't really know what you're running; it just sees there are some VMs with some stuff probably running on them. And also not the old-school monolithic thing. Of course we knew how to modularize applications before microservices, and microservices are not just about splitting code up — but modularized code packaged into a single artifact will not provide the benefits that a microservice architecture does. So this is what I'm talking about. And why would we want to adopt a microservice architecture? The most important thing is to leverage the elastic cloud resources available to us, mostly from public cloud providers or internal clouds. A monolithic application that only supports vertical scaling is not able to leverage this new type of base infrastructure, and it's also not able to leverage the new tooling we have for more resilient deployments. I will talk about that later.
But basically, with microservices we want to increase development velocity, especially down the line when we have lots of services and really want to deploy them independently, knowing we're not breaking the ones we're not deploying right now. With a monolith you just can't do that, because you always have to deploy everything. The other thing is resilience. When we had just a few servers in the basement, resilience was not such a big deal — big companies were doing this before already, but small and medium-sized companies would rather ignore the problem. Now, with cloud providers, everybody can build resilience into their application. Everybody can scale out. Everybody can run several instances over geographical regions, or at least availability zones. And there is the elasticity to actually optimize your cloud bill: only run as many instances as you need, and scale out and in as required. So this talk will take you through the first episode of Doom — I've retitled the levels a bit — and I will show all the demons, all the problems, that companies run into and that cloud native technologies help tackle. There are other challenges with microservices that I won't be talking about; I'm focusing on the things where containers and orchestration actually help. So first, packaging. One of the things a microservice architecture allows, and people love to utilize, is having different kinds of code bases, different programming languages, mixed together. But we needed proper packaging to support all of these, and we didn't have that before. Running a JAR file or a WAR file from Java in production was completely different from running a Ruby application or a PHP application — they required different runtimes, different ways of deployment, and so on. So along came Docker, which in my slightly twisted analogy is the chainsaw with which you can kill all of them.
Of course, you can kill all of them with all the other weapons too, but I really do love the chainsaw. It's also a bit of a metaphor for Docker helping us carve up that monolith — especially that big fat one on the right, the Mancubus. So Docker came along — and of course containers existed before Docker; especially in the Mesos world, containers were around for a while — but Docker is really the one that made containers easy to use, and suddenly it became straightforward to package any kind of application and ship it in that package to production. That's one of the big improvements Docker made to our software development. The other is that Docker actually supports the whole software development lifecycle: it's a build tool, it can run your tests, and it can push to a container registry to host your artifacts. We did have artifact registries before — Java had a whole ecosystem for this, but that only worked for Java. We had build tools before, but you had to make sure to install whatever your build pipeline needed on the Jenkins server or wherever. Now everything can be done with Docker in a nice and easy way, which is a big step forward in managing large application code bases with potentially different programming languages. The next thing is continuous delivery. Basically, the more application components you have, the more of these pipelines you will have to manage: code, tests, build, deploy to some test environment, deploy to production or an acceptance environment or whatever. To achieve good continuous delivery, all of these steps must work flawlessly and fast. Again, now we have Docker to support many of these steps, and also container orchestrators, which I'll get to in a moment.
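To make that lifecycle point concrete, here is a hypothetical Dockerfile for a Node.js front-end component — the base image, file names, and the choice to run tests during the build are all invented for illustration, not taken from the talk:

```dockerfile
# Hypothetical Dockerfile for a front-end service (illustrative only).
FROM node:8-alpine
WORKDIR /app

# Install dependencies inside the image — no tooling needed on the CI server.
COPY package.json .
RUN npm install

# Copy the source and run the tests as part of the build itself.
COPY . .
RUN npm test

EXPOSE 3000
CMD ["node", "server.js"]
```

With something like this, the same `docker build` / `docker push` steps work for every component regardless of language, which is the point being made about Docker supporting the whole pipeline.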
The important thing is that to manage a microservice application, you will have many of these pipelines — one for each of your components — and you have to get each of them right. In this talk I can't go into detail on exactly how to do each of the steps; actually, they're not that different from how we were doing things before, but you have to get them right, because the more components you have, the harder it becomes. Another important thing in this picture is that all the components have different versions at any point in time. So don't tie your components together; have continuous delivery on each component, because that's what will let you keep your velocity down the line when you have lots of components. I see companies doing microservices and then doing big bang releases with all their microservices at the same time. That does help a bit with backwards compatibility and that kind of stuff, but it's way better to invest in backwards compatibility — in making sure services can talk to the older version of another service — than to do big bang releases. Also, when you have lots of components, just knowing that you're only deploying that one thing, and not potentially breaking the whole application, is a big plus. So continuous delivery was a very good idea before, and it's really essential if you're running microservice applications. Now, with that whole build stuff out of the way, we're coming to the really interesting part: scheduling. Of course this is MesosCon, so all of you know what a container scheduler is, but I would still hope you find something interesting in what I'm going to say, because I try to approach this problem not from the point of view of how to run containers on nodes, but of why container scheduling and orchestration are important for our application development. How does this really help?
So, resource allocation in a distributed environment — I put up a really big monster, the Cyberdemon, for that one, because it's a really hard problem. Not just resource allocation, but also other distributed problems like managing configuration. This is the problem that many have set out to tackle, and we're actually really succeeding now. We have our BFG — our big fucking gun — in the form of, for example, Mesos, to kill that son of a bitch, and yeah, he's really dead there; there's not much remaining. It's a problem I think we're now solving very well, but let's get back to the basics. What is container scheduling? Once containers came around and became really easy to use, the funny thing was that you can run them on any Linux machine — but public clouds are not Linux machines. Public clouds are giant hypervisors. So you actually have to spin up machines to be able to run containers on top of them, which in a cloud environment is a bit of a hassle. If you're using physical hardware it's a no-brainer — that's why containers actually came out of companies that were trying to utilize their physical hardware, like Twitter and Google. So you spin up a few nodes. The way a container scheduler works is that there are one or more master nodes — we can logically just call it one master — managing a set of virtual machines. These virtual machines have their IP addresses and so on, and when you want to run containers, the master decides where it's best to run them, and the containers start up as Docker containers on whichever nodes the master picks. First problem: each of these containers needs a port, as you can see there. You cannot trust your front-end containers to always run on port 3000, in this case, because if you run two of them on the same node, they will collide. So you have to start managing ports, which is really not something you want to be doing by hand.
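To make the port collision concrete, here is a toy sketch — invented placement logic, not any real scheduler's code — showing how two replicas that both want port 3000 clash as soon as they land on the same node:

```python
# Toy sketch: naive container placement and why fixed host ports break.

def place(containers, nodes):
    """Naive round-robin placement; returns (node, container) pairs."""
    return [(nodes[i % len(nodes)], c) for i, c in enumerate(containers)]

def port_conflicts(placements):
    """Return (node, port) pairs that are requested more than once."""
    seen, conflicts = set(), []
    for node, container in placements:
        key = (node, container["port"])
        if key in seen:
            conflicts.append(key)
        seen.add(key)
    return conflicts

# Three front-end replicas, all hard-coded to port 3000, on two nodes.
frontends = [{"name": f"frontend-{i}", "port": 3000} for i in range(3)]
placements = place(frontends, ["node-a", "node-b"])
print(port_conflicts(placements))  # → [('node-a', 3000)]
```

With only two nodes, the third replica necessarily shares a node with the first, and the fixed port collides — which is exactly why either developers or the platform have to take over port management.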
So, this is the DevOps track, and I'll try to highlight this from the point of view of collaboration between developers and operations. First thing: this system is not enough, because developers delivering software don't want to care about port mapping. If you only have this — if you only have Mesos — they will have to care about the port mapping, and they will have to build logic into the application to manage the ports and somehow find the services. Here is a short list of the most-used container schedulers: Mesos, Kubernetes, AWS ECS, Nomad, Docker Swarm. They all do a great job of managing your resources in a cluster of computers. But because of these networking and port mapping problems — and also the problem of how the hell to find your instances — let's say I want to talk to a front-end container: then I need to build a service discovery mechanism, like Consul or etcd, where a container comes up and registers itself, and then other containers must know that the service discovery mechanism exists, talk to it, and get the current state of the cluster. So that's still not a proper microservice platform. How do we get from a scheduler to a microservice platform? By solving the networking problem. I come from an application development background into the world of operations, and it's really weird to me that these are hard problems — this whole networking thing with the ports and everything — but it can be solved; it's just tricky. The first thing we do is introduce a virtual network for the containers in the container cluster. With a virtual network, suddenly all our containers can get their own IP addresses, so they can run on whatever port they wish to run on, and the problem has shifted from development into the platform, run by operations. Now, making that virtual network happen is a tricky problem, but it's pretty much solved.
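What you'd have to build on top of a bare scheduler looks roughly like this toy registry — a sketch of the idea behind Consul- or etcd-style discovery, with made-up method names, not their actual APIs:

```python
# Toy sketch of client-side service discovery: every container registers
# itself on startup, and every client must know the registry and query it.

class Registry:
    def __init__(self):
        self.services = {}

    def register(self, name, address):
        """A container announces itself under a service name."""
        self.services.setdefault(name, []).append(address)

    def deregister(self, name, address):
        """A container removes itself, e.g. on shutdown."""
        self.services[name].remove(address)

    def lookup(self, name):
        """Clients ask for the current instances of a service."""
        return self.services.get(name, [])

registry = Registry()
# Two front-end containers come up on scheduler-assigned host:port pairs...
registry.register("front-end", "10.0.1.4:31002")
registry.register("front-end", "10.0.2.7:31119")
# ...and a back-end client has to look them up before every call.
print(registry.lookup("front-end"))  # → ['10.0.1.4:31002', '10.0.2.7:31119']
```

Every application team ends up reimplementing this register/lookup dance (plus health checking and caching) — which is exactly the burden the platform's networking layer removes.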
We have overlay networks, and we have iptables magic happening — at least I know more or less how it works in Kubernetes; it's a lot of magic rewriting packet addresses in iptables. It works, it does the job, and we have IPs per container. I'm still a bit surprised that in DC/OS this is still an extra feature you have to turn on, but it is there. Again, it works, and you can wish away the whole port mapping problem. Of course, ports will still be mapped to certain node ports, but now it's all randomized, and as a developer you basically don't have to care about it. The next step is solving the service discovery problem. Actually, even before we started distributing all our applications, we had service discovery mechanisms inside single processes — Java Enterprise Edition or Spring, for example, basically instantiate components and make sure components can find each other, but it all happens inside one process space. It's logically the same problem, but the solution there is very different, and actually much simpler. So how do container orchestrators do it? And here I distinguish between schedulers — the scheduling part is just placement — and orchestration, which is all these extra features. I'm not sure I'm using that term correctly, but I don't think anybody has ever defined it very well. The container orchestrators — DC/OS does this in the form of VIPs — basically enable you to create something like a service: a virtual IP address inside your virtual network, and if you talk to that virtual IP address, your packets get routed to the right backing containers. So now the back end can talk to the front end — actually it's usually the other way around, but whatever, you get the point.
So the back end can talk to 10.10.8.10, which is not the IP of any container, and the requests get routed, using a load balancing mechanism, to whichever front-end container happens to be running. And however tricky this VIP mechanism is to implement, it was possible to build it because the scheduler already has all the information necessary: it knows where those front-end containers run. Once the virtual network has been added, you can suddenly create virtual IPs in it — a virtual front-end service that happens to be backed by a number of containers running across some servers. Add a bit of DNS to that, call the IP address front-end.internal, and suddenly you can even hard-code the address of the front end into your back-end containers, because you no longer have to configure it; you can just rely on the cluster DNS being there. If it's an internal application and you know you'll always be deploying to that same production environment, you can hard-code front-end.internal as the address of the front end and just let the orchestrator take care of it. So the service abstraction is essential to building a microservice platform. It takes all the pain of discovery, and also of load balancing, away from the developers and moves it into a platform managed by operations. You can see this evolution happening over the past few years. The best example is Spring — does anybody here know the Spring Cloud project? Okay, not many people. Spring Cloud is still a project built around working around these limitations of public clouds, where this functionality doesn't really exist: there is client-side load balancing, client-side service discovery using something like Consul, and so on.
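The VIP idea can be sketched in a few lines — this is just the concept of one stable address round-robining over the backing containers, not how the actual iptables-level implementation works:

```python
# Minimal sketch of a service VIP: clients talk to one stable address,
# and each request is routed to the next backing container in turn.
import itertools

class Service:
    def __init__(self, vip, backends):
        self.vip = vip                       # stable virtual IP, e.g. 10.10.8.10
        self._cycle = itertools.cycle(backends)

    def route(self):
        """Return the real container address the next request goes to."""
        return next(self._cycle)

front_end = Service("10.10.8.10", ["10.4.0.11:3000", "10.4.1.5:3000"])
print(front_end.route())  # → 10.4.0.11:3000
print(front_end.route())  # → 10.4.1.5:3000
print(front_end.route())  # → 10.4.0.11:3000 (round-robin wraps around)
```

The point is that the caller only ever sees `10.10.8.10` (or `front-end.internal` once DNS is layered on top); which containers back it, and how many, is entirely the platform's business.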
This is all how Netflix did their thing, because they were running on Amazon and didn't yet have all this container orchestration goodness available. So which orchestrators can we consider true microservice platforms? I would say Kubernetes and DC/OS, and to a lesser degree Docker Swarm — it has fewer features, but it does have a lot of what's needed. Amazon ECS doesn't do any of the networking stuff, for example, and Mesos in itself doesn't do it. These are the levels of abstraction you're looking for. Unless you're doing some very specialized thing, if you're using Mesos, just go for DC/OS, because you don't want to burden developers with all that stuff. You want to provide a platform for them where they can just deploy — I'll get back to that point; I have a few slides about DevOps at the end. The next thing, of course, is that you need to monitor your distributed applications — to defeat the observability monster, as I call it in this talk. Monitoring is a very interesting case, because it was considered a big problem to monitor containers. Virtual machines were monitored by an agent running on the machine, watching a bunch of processes — the kernel and so on, and then the application processes you were interested in; you configured which ones to monitor, and that data was fed back to the monitoring system. But with containers you don't know where your processes will be running — any process you're interested in can just start up on any one of the nodes — so suddenly this looked like a big problem. It turned out not to be such a hard problem, because there's a very interesting thing about container orchestrators: they all have very nice APIs, and they hold tons of metadata about your application.
So suddenly, one of your monitoring tools is the orchestration tool itself — you can just go and ask it for a lot of data about your application. Here is a picture illustrating that: this is the DC/OS UI, of course, showing some data about one of your services. And here are some examples of cloud native monitoring solutions — they can call themselves cloud native because they really can natively monitor applications running on container orchestration clusters: Prometheus, Stackdriver, Sysdig, CoScale, Datadog. There are probably more that I didn't mention, but there's only so much space on the slide. Monitoring has really caught up with the world of containers. What these tools do — this is an example from CoScale — is talk to the orchestrator and ask about your application in terms of high-level things like services, not just some randomly named container running somewhere. Because the orchestrators now have this service abstraction, which was originally designed to solve networking and service discovery problems, the monitoring tools can just use that data: okay, you have a front-end service with five backing containers, so how much memory is your front-end service using? Well, it's the sum of the memory usage of all the containers that happen to be grouped under the front-end service. They grab this metadata, so you don't even need to configure this stuff — I could create a dashboard like this in about five minutes in CoScale after it read the metadata of my cluster. You can see there's a spike in CPU usage of the front-end service, and here a spike in the number of containers of the front-end service. The labels are a bit messed up, but never mind that.
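The aggregation trick these tools do is simple once the orchestrator hands over the metadata; here is a sketch with invented sample data (these numbers and labels are not from any real API):

```python
# Sketch: group per-container metrics by the orchestrator's service label
# and aggregate them into service-level numbers.

# Invented sample data standing in for what an orchestrator API would return.
containers = [
    {"service": "front-end", "memory_mb": 210, "cpu": 0.4},
    {"service": "front-end", "memory_mb": 195, "cpu": 0.5},
    {"service": "back-end",  "memory_mb": 640, "cpu": 0.2},
]

def service_memory(containers, service):
    """Memory of a service = sum over its backing containers."""
    return sum(c["memory_mb"] for c in containers if c["service"] == service)

def service_container_count(containers, service):
    """How many containers currently back the service."""
    return sum(1 for c in containers if c["service"] == service)

print(service_memory(containers, "front-end"))           # → 405
print(service_container_count(containers, "front-end"))  # → 2
```

No per-host configuration is needed: the grouping key comes from the orchestrator's own service abstraction, which is why such dashboards can be built in minutes.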
So that's an autoscaling event on the front-end service when it's hit by load, and you can see this all just makes sense — not only to an operations person but to a developer of the application, who thinks in terms of the codebase: "I developed this front-end thing, and I don't much care whether five or ten of them are running — how much memory is it using? Okay, it's using a lot of CPU," and the aggregate of that is very useful. I'm not saying that monitoring individual containers, or even VMs, is no longer necessary, but you can see how this is closer to a high-level overview of your system — and also how easy it was to implement. The way the monitoring solutions did this, using the metadata from the orchestrator, is also a great example of how you can build your own tooling much more easily, simply because the tooling on your base infrastructure has a higher level of abstraction and understands what your application is. Just a side note on that: one of the important things about Docker I forgot to mention is that it packages individual application components. You no longer package a bunch of things together like on a VM; it's always one component, so once you run it like this, the orchestrator understands it. Then there's log aggregation — super important, though not that much has changed around it. Again, you might want to aggregate the logs of all the containers of a service. This is from Google Stackdriver. I really like this one because they do statistical analysis of your logs: they find the really interesting parts and show you, say, that this error has been happening a lot of times — maybe you should take a look at that.
That's really cool. And then there is alerting, which is also super important, but I mainly mention it because of this screenshot from the new Doom game, where they actually put an error message into their system saying "demonic invasion in progress". You know you're doing the wrong things once you have to put that in — or you're really well prepared. Getting towards the end of the talk, we're going down to the depths of hell, and that's cooperation between people — because hell is really other people, and that's what DevOps is trying to solve. These two guys end up shooting each other a lot instead of shooting you, which is not a very good idea for them, because then you just shoot both of them. It's a beautiful tale of how people should cooperate. So how do these cloud native technologies help with DevOps? Of course, DevOps is mostly about cultural change: about not building silos in organizations, about the people who know how to run things in production being friends with the people who know how to write things — and maybe being the same people, these full-stack engineers who know everything. That works up to a level, but specialization doesn't happen for no good reason. There are no people who can really keep everything in their head and learn all the technologies on both sides. So development and operations as two separate things, up to a certain degree, is a reality and will probably remain a reality.
But what I've seen as an obstacle to implementing a proper DevOps culture at companies is, a lot of the time, the tooling. There are people screaming on Twitter that it's not the tooling, it's coaching and people and being nice to each other — and they're right, up to a degree. But I've seen initiatives at companies where the new DevOps push is on, all the operations people get in a room with the engineers, the engineers show them their Java code, the operations people show their 5,000-line Puppet scripts, and then they look at each other, shake hands, and decide this is probably not going to go anywhere — but let's have more of these meetings and we'll see. Container orchestration, as I said, brings a higher-level platform to your applications, kind of like a PaaS. PaaSes are good, but they're also super limiting in what you can do with them. Container orchestrators sit in a nice sweet spot between PaaSes and plain Puppet-installed VMs, where you can do a lot of things your way but still get a high-level interface. So here is an example of a manifest file from a Kubernetes application. This describes to Kubernetes how it should run your front end — maybe I should have come up with something less generic than just "front end" and "back end", but anyway. Maybe I can get the pointer over there to show some stuff... okay, no, this is super hard, this is not going to work.
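The slide itself isn't reproduced in this transcript, but a Kubernetes Deployment manifest along these lines matches what's being described — the image name, port, and values here are invented for illustration, not taken from the slide:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: front-end
spec:
  replicas: 3                         # how many instances to run
  selector:
    matchLabels:
      app: front-end
  template:
    metadata:
      labels:
        app: front-end
    spec:
      containers:
      - name: front-end
        image: registry.example.com/front-end:1.4.2   # which Docker image
        ports:
        - containerPort: 3000                         # port to expose
        livenessProbe:
          httpGet:
            path: /health                             # health check endpoint
            port: 3000
        env:
        - name: LOG_LEVEL                             # environment config
          value: "info"
```

Everything in it — replica count, image, ports, health check, environment — is exactly the kind of thing a developer can write down and apply directly.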
Anyway, the important part here is that you describe a service you want to run: you say how many instances it will have (you could define autoscaling for it), you say which Docker image to run — that's the most important part, of course — you describe the ports that need to be exposed, you describe where the health check endpoint of your application is, and you specify some environment variables, like what log level gets configured. What's interesting about this is that there is barely anything in here that an application developer can honestly say he or she doesn't care about. This is all stuff a developer really should be writing down, and it's formal, and it's executable straight against the production or test or whatever environment. It means developers can get their stuff into production without asking operations to write some scripting for it. A lot of the tension that happens in organizations, on the DevOps side, is that developers want to push, and operations push back, because they have to do this and that to get the new release out. But once your application platform supports this nice high level of abstraction — where developers can describe their application in a formal way and just get a new release out without operations getting involved at all — that's great. It really brings a new level of cooperation, because operations can start focusing on providing more and more services to developers, and on coaching developers on how to write a good manifest file — maybe don't run ten instances when you only get two requests per second. That, I think, is a very important shift in how we can do DevOps. There is still all the coaching and the culture and all that: companies do end up with Kubernetes or DC/OS and still in a trench war between operations and development, if they don't take a close look at the culture, don't make people work together, and don't send the developers on the Kubernetes training, in this case — because otherwise the developers won't want to write this. So there are a lot of people things that still need to be done, but they're all doable, and they don't require understanding the Puppet scripting and that kind of stuff. Last slide: if you want to know more about cloud native, go to the website of Container Solutions — I probably should have put a URL in here, but I'm sure you'll be able to find it. And we have a new book out — Anne Currie worked on it with us — called The Cloud Native Attitude. It focuses on exactly the stuff I was talking about here: how to really think about cloud native, not the specifics. There are case studies of companies, and the mindset you should have when you think about implementing these technologies at your organization. And Q&A — thank you! Any questions? It's all very clear, or all very boring... Sorry — lunch. Yeah, thank you very much.