So, I've noticed that most talks you see about microservices tend to focus on why you should switch, how to break up the monolith, or counter-presentations defending monolithic designs (which are wrong). I haven't done any scientific or empirical research on this; it's just an observation. Now, I've learned quite a few lessons over the years of doing this sort of work. I've tried this at various companies, I've been through many things that did not work out, and my thinking around this approach has evolved along the way. I'd like to share some of those observations, specifically what it's like to work in a microservices environment. Now, I assume most people here have heard the term microservices, but I'm curious whether anyone here is actually working in an environment like that. Anyone? Yeah, cool. Anybody's company thinking about moving to one? Yeah, I know. And for those of you who didn't raise their hands, let's go over this super quickly.

So, just quickly, the promise of microservices. I've used this slide a few times, and I'm not the only one, so feel free to borrow it wherever you are. You get to take this very difficult and complicated monolithic application and convert it into much smaller systems, each of which has its own set of complications. In theory, this provides many advantages: I have reduced my coupling, I have forced programmatic contract boundaries that make my teams talk to each other, I can scale individual things very efficiently, and I can move quickly on features once I'm up and running. Now, while this pattern does have some attractive features and makes certain aspects easier, microservices come with their own set of difficulties.

Now, if you're working with microservices, maybe you've just joined a team that's already doing it, and it's going to seem a bit overwhelming at first. You may feel alone; you may wonder what horrible things you've done to end up in this position, with all these complicated little codebases and no way to run the entire application on your laptop at any one point. It can be daunting. Or, if you're in a group that's frustrated with its monolith and is planning on splitting it up like these people over here, be warned that it will not be easy. Unless you are a brand-new startup, you likely didn't start from the beginning with microservices. Now, I like this image, because it looks like the wrecking ball is caught in the building, with the wires pulled taut, which I think is an apt comparison: when starting the monolith breakdown, you're going to get caught up or stuck in the process. And if anyone's wondering, "oh man, I wish we had skipped the monolith and started with microservices," no, you really don't. If I have time, we'll get to this later, but you absolutely do want to start with the monolith, because the monolith allows you to identify where your proper functional boundaries are, which are the ideal places to start cutting away and making microservices. When you're starting out, you don't know where those are.

So, either way, the purpose of this talk is to provide you with a checklist, a survival guide, things to look out for.
And I think these are all useful questions regardless of whether you're a manager or you're just joining the team. Do not be afraid to ask them in your group, even if you're the most junior person on your team. In fact, if you are the most junior person on your team, you're going to look like a genius. For those of you already working within a microservices environment, you may know these things already, but follow along with the list; hopefully you agree with what I'm saying here, and if you think there's anything I've missed, let me know. I would love to add it to the list.

Now, I should point out before I get going that this is mostly from my perspective working at small to medium-sized companies. If you're from a company like Netflix, you're going to have an entirely different set of challenges than I've run into. I should also point out that I only have half an hour. I've done this talk before in an hour, so I've cut out a lot of things, and this talk is purposely "bad": normally when you give talks, you're supposed to be slow and articulate and look at the crowd. I'm going to go as fast as I possibly can, many little points, bam, bam, bam, bam. So get ready. Here we go.

I've broken this talk up into four sections: infrastructure, software architecture, team communication, and some miscellaneous points. Now, this being DevOps, most of the points I'm going to raise (the stuff that I did not cut out) are in the infrastructure section. There's a reason I start with infrastructure. What may not be immediately obvious when you adopt microservices is that, yes, my little independent functional components tend to be very easy to reason about, but I trade that simplicity for a vastly increased amount of complexity in the general infrastructure: application environments, deployments, build pipelines, you name it. Many talks you'll see focus explicitly on different infrastructure subjects for microservices, you know, here's how we monitor our stuff, here's how we do deployment pipelines. In other words, your team will be spending a non-zero amount of time doing DevOps and infrastructure work that you wouldn't necessarily have to worry about in a monolithic environment.

So first up: how do we manage the logs? Centralized logging should be your number one priority. It is absolutely critical. And basically, if you ever have to SSH into a box to look at a log, then you have failed and you're doing things wrong. Now, if you're not sure exactly where to start, here's what we did at my last job; it worked out pretty well for us. One platform that I recommend and have experience with is the ELK stack, which stands for Elasticsearch, Logstash, and Kibana. Logstash is the log aggregator, Elasticsearch is the repository where the logs are stored, and Kibana is the visualization tool. You can set up alerts and triggers for various events, various log levels you might see, et cetera. It's all very nice and very open source. Now, this diagram shows our first attempt at this. We had many more than five services, and many individual instances of them, but basically there was an asynchronous log process on each individual service that read the logs as they were appended and shipped them to a centralized Logstash component.
Logstash would take those logs and push them into Elasticsearch, and (I'm missing the Kibana image here) we used Kibana for visualization. However, we were a small team at the time, and managing an ELK cluster requires focus we were unable to spare; we couldn't dedicate engineering time to it. You have to tune it periodically and set up index rotations or deletions, or it will inevitably fill up the disk and crash. We realized we were spending more of our time focused on that when we wanted to be getting features done, so we started to look for a third-party service to help us out. We went with a tool called Loggly, as they were the most cost-effective for us. The switch was pretty painless: you can set up Logstash to forward to both Elasticsearch and Loggly, so we ran like that for a while, until we realized that Loggly was doing a really good job and we dropped Elasticsearch altogether. Now it's Loggly solely. This was useful for us as a small team. If you're a larger company, you may want to take on that responsibility yourself rather than have a paid third-party service as a dependency of your log visualization.

Next up: what about metrics and telemetry? Do you have any metrics? Do you have any telemetry? It is absolutely essential that you extract as much data out of your system as possible. Many of you are probably saying, "well, yeah, of course." However, it can be an easy thing to ignore for a while. I've seen this plenty of times: you're at a startup, you start building, and some of the secondary stuff falls by the wayside as your CEO keeps pushing you for features and you don't have time to set it up. Now, there are third-party services like AppDynamics, Datadog, and Splunk that can help make sense of these numbers. There are many services and tools out there that can help you with this sort of telemetry; it's kind of staggering how many companies will actually help you with this. Do a quick search when you have a chance, it's pretty funny. Also pretty funny: the AppDynamics and Splunk logos are basically the same. I don't know which one is which. Of these, I'd suggest Datadog; several people I respect swear by it. At my last job, we had set up our own metrics system and were looking at switching to Datadog. We kept planning it, kept pushing it off, and their sales guy kept calling and hounding us to switch. At the new job I just started, I hope there's a Datadog representative in this room right now, because they are literally on the floor above us. That's so fun.

Anyway, if you're like us and you like to host it yourself, there are a few free tools out there. The world of metrics is an interesting domain. There are different approaches to transmission: some metrics utilities push their data out, while others require some sort of aggregator to poll them. And there are different visualization technologies to sit on top. I'm a huge fan of a tool called Grafana, a visualization tool that can sit on different types of aggregation systems; it's designed specifically around collecting, storing, and alerting on metrics from a variety of aggregation sources. When I was interviewing at the job I have now, I walked in and they had a wall of nine huge monitors with all different Grafana boards, and every team had their own TV with a Grafana board set up, and I was like: that's it, I don't need to go anywhere else. They're doing it exactly right.
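To make that concrete, here is a minimal sketch of emitting a counter and a timer from a Java service. The talk doesn't name a client library, so Micrometer and the metric names here are my own illustrative choices; in production you'd back the registry with whatever your aggregator (StatsD, Prometheus, Datadog, and so on) expects, and let Grafana read from that.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class OrderMetrics {
    public static void main(String[] args) {
        // SimpleMeterRegistry keeps the demo self-contained; a real service
        // would use a registry that ships to your metrics backend.
        MeterRegistry registry = new SimpleMeterRegistry();

        Counter ordersProcessed = registry.counter("orders.processed");
        Timer checkoutLatency = registry.timer("checkout.latency");

        // Time the work and bump the counter when it succeeds.
        checkoutLatency.record(() -> {
            // ... handle the request ...
            ordersProcessed.increment();
        });

        System.out.println("orders processed: " + ordersProcessed.count());
    }
}
```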
Now, before I go any further, I should probably stop for a second. I'm only two recommendations into this presentation, and I want to point out something we probably shouldn't gloss over. First off, centralized logs and metrics are important in a monolithic environment as well, but they become even more interesting in a distributed system. I've just talked about adopting two entirely new concepts that aren't simply catchphrases or a library you can drop in. These are entire systems that you have to manage. Obviously you can pay for a SaaS service to do it for you, but even then there are considerations.

Basically, your infrastructure needs to scale along with your application. That's probably obvious to hear, but in other words, you need to ensure that these logging and metrics systems can handle the load that your individual services are going to throw at them. If I suddenly scale up my individual instances for, say, the order processing system, and flood my Elasticsearch cluster with log messages, how quickly can it scale to handle that? Will it handle it at all? Will it tip over? Will it fill up the disk and crash? Can my metrics server handle the flood of new connections coming in from the new boxes? These systems require their own architecture and design. If I'm using a SaaS offering, I should know its limits as well. What happens if I scale up rapidly and need to make a bunch of new connections to a third-party service? What if I hit a rate limit on Loggly as I'm bringing up more services? What do I do if they reject my connection?

Furthermore, your individual services may get a bit more complicated to deploy. You may package everything in a Docker container, but you need to understand whether your logging and metrics utilities make synchronous or asynchronous calls to deliver their information. If metrics or logs are collected asynchronously, is there a secondary process I need to have running on my system to ensure delivery? As an example, I may have standard-out logging that ends up putting all the logs in /var/log, and I may have to start up a separate process that reads /var/log and pushes it off somewhere else. That process has to be packaged as part of the individual deployment, the container, the AMI, whatever you use to package things.

As a concrete example, I have an extensive Java background, and we were big fans of a tool called logback. There's a plug-in called the Logstash TCP socket appender for logback, and it's actually pretty great, because it simplifies a lot of that. With just an XML configuration, I get an asynchronous process that automatically starts whenever my system starts up and appends directly to Logstash. So there's very little to worry about there, but other systems may not have that particular approach.
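For reference, a logback configuration along those lines looks something like this. This is a sketch of the usual logstash-logback-encoder setup, not the exact configuration from the talk; the destination host and port are placeholders.

```xml
<configuration>
  <!-- Ships log events straight to Logstash over TCP; the appender
       buffers and sends asynchronously on its own background thread. -->
  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>logstash.internal.example:5044</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <root level="INFO">
    <appender-ref ref="LOGSTASH"/>
  </root>
</configuration>
```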
Basically, these requirements are going to start costing you money, either in SaaS fees or in self-hosting and the engineering time spent on it, and you'll likely need to devote dedicated engineering time to it or start hiring. These individual efforts, setting up metrics, setting up logging, can act as the initial steps toward founding a dedicated site reliability engineering team, for example.

And I should probably also point out that I'm not what one of you might consider to be an SRE. I don't have the expertise, but I'm totally fascinated with build pipelines, managing Kubernetes, et cetera. And I find that app developers tend to be, let's say, not focused on system resources or constraints. I mean this in the nicest way possible, but one thing I've seen time and time again is that ops engineers tend to become something like an exhausted or flabbergasted parent. "Why are you trying to stick your finger in the electrical socket?" is a lot like "why are you trying to open a database connection in every single iteration of the loop? Why?" So I find that they have to come in and intervene, and they can influence app design.

Okay, next: how and where are builds done? I would recommend Jenkins at a minimum. There's a bunch here, and I'm going to start going even faster because I'm running low on time. Also look into services like Travis CI; I had a great conversation with somebody from CircleCI today. These tools provide excellent automated build and deployment pipelines that will save you tons of time and provide uniformity in the way your code is managed. That uniformity is vital when working with microservices, which generally means you're going to have a bunch of repositories to build.

Now, I really only have much experience with Jenkins, but I must say I think it's an incredibly powerful tool, especially the latest versions and Jenkins Pipeline, which gets at the idea of infrastructure as code. One of their latest features gives you the option to create a file called a Jenkinsfile. You put it in your code repo, and it describes how your particular service should be built every single time it's pushed (there's a sketch of one below). It also integrates with GitHub: if I open up a PR for a branch, Jenkins will see it via a webhook, build it, and then report the success or failure back to GitHub.

Side note: this area is one thing I think is actually much easier with microservices. Monoliths can acquire many long-running tests. I'm sure we've all been in an environment where the app takes 30 minutes, 40 minutes, an hour to run the test suite; developers start to ignore it and push code, and it breaks. Having a much smaller, tightly focused service allows for much faster builds, I think.
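As an illustration, a minimal declarative Jenkinsfile checked into a service's repo might look like the following. The build command and stage layout are placeholders for whatever the service actually needs; this is a sketch, not the pipeline from the talk.

```groovy
// Jenkinsfile, living at the root of the service's repository.
pipeline {
    agent any
    stages {
        stage('Build & Test') {
            steps {
                // Placeholder build step: swap in the service's real build tool.
                sh 'mvn -B clean verify'
            }
        }
    }
    post {
        // With the GitHub integration configured, Jenkins reports
        // this result back on the pull request automatically.
        failure {
            echo 'Build failed; the PR gets marked accordingly.'
        }
    }
}
```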
Next: how do we deploy the code? When deploying code, the goal is to do nothing by hand. A good deployment process is like a modern assembly line: no need for humans, except perhaps for a final examination of the product. You should have push-button, one-click deployments, or some other control plane that lets you deploy in one single way. The systems may be different underneath, but the actual deployment and release process should be uniform across every service. And honestly, I'm kind of embarrassed I took this photo, because I do not recommend using Jenkins for deployments. Just don't do it.

Do you have any coding conventions? That is, do you force your builds through a style checker like Checkstyle? Do you have a minimum level of code coverage you accept? All of your builds should pass through a set of quality gates in order to be ready for release, and you should agree on a minimum level of acceptance and outright fail any build that does not meet it. If you have many services being developed independently, it's possible that styles, code structure, testing quality, et cetera, start diverging as teams run in isolation from each other for years. If you have to move team members around, you want to make the transition as easy as possible for them by ensuring that the code they're going to be looking at is as similar as possible to their last service or project.

And of course, monitor things regularly. One tool I love is called SonarQube. Has anyone here used SonarQube before? Yeah, it's great. It displays a variety of things, one of which basically grades your code as it comes through. Now, of course, I'm only showing stuff that passed with relatively nice ratings, but I've seen it fail a build: it gave me an F rating because I concatenated a string instead of using a StringBuilder to build it up. It is very aggressive, very powerful. One other cool thing it does is estimate how much time it thinks it will take you to fix all of your issues, and it takes into consideration things like code reviews and scrum meetings. So it might not literally take you 20 minutes to type the fix, but it might take you 20 minutes to walk it through the whole process to release.

Next: can I generate a service template? By that I mean, when a developer needs to build a new service, they should not have to reinvent the wheel. They should be able to go to a URL and get something, maybe versioned, that already has the basic structure of a new service constructed: how to set up a database connection, how to deal with general configuration, the project structure, et cetera.

Last in the infrastructure section: how do we share code? This is a bit of a trick question, because I really dislike tight coupling, and microservices lend themselves well to avoiding it if you do them right. With microservices, it is totally okay to re-implement some code in every service. I think it is better to have duplicated code than to have coupling through some sort of shared library. You should set up your own internal artifact repository (an Artifactory, for example), and it is acceptable to share infrastructure libraries, little things like a common communication pattern, but you should never, ever, ever share business logic between services, and never share domain objects or models. Some of you might be thinking: but I have a Person, and this Person object, this Address object, should exist in every single place. No. Every service has a different understanding of what that Person represents. One service may care only about the email address; another may care about the name and the full address. You may call them the same thing, but they're represented differently in every system, as in the sketch below.
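A quick illustration of that last point, with hypothetical services and fields. These two Person records live in two different codebases; they happen to share a name, but each models only what its own service cares about:

```java
// billing-service: a "person" is whoever we send the invoice to.
package com.example.billing;

public record Person(String email) {}
```

```java
// shipping-service: a "person" is whoever receives the package.
package com.example.shipping;

public record Person(String fullName, String street, String city, String postalCode) {}
```

Collapsing these into one shared model would force every billing change onto the shipping team and vice versa, which is exactly the coupling microservices are supposed to avoid.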
I was wrong, there's still more. How do we manage our multiple environments? By that I mean: is your production environment wired up by hand on AWS instances? My God, I hope not. Do we use tools like Ansible or Terraform, et cetera, to manage this for us? Or do we use a third-party service? Because the microservice architecture can lead to a ridiculous amount of infrastructure work, many organizations are working to solve this problem, with everything from tooling to full platforms.

Now, I have had great success for the past three years or so with Kubernetes and Docker. It's much easier to work with now than it used to be. And if you haven't tried it, go download and research a tool called Minikube. It's a great introduction to how Kubernetes works and teaches you all the basic concepts. It's really good stuff, because it allows you to do things like this: we stumbled across Kubernetes because we needed a way to create many different environments for our sales guys, and we needed a way to spin up the entire environment so developers could work on it, because there was no way you could spin up the whole thing on a given laptop. So we started using Kubernetes as a development environment. With the click of a button, each of us got our own independent developer namespace (that's the Kubernetes terminology). I could then disconnect my particular service from the namespace and trick the Kubernetes cluster into thinking my laptop was part of the cluster, which gave me the full environment to test in. It doesn't work so well when you're commuting on the train, but when you're in the office, it's great. That eventually led to the ability to spin up arbitrary staging environments for the sales guys, and from there on to production.

That leads into a point I like to think of as testing and developing in isolation. Or rather: you should test and develop in your own isolated environment. If every team member is contributing to one shared environment, you will step on each other's toes and get in each other's way. Your organization should provide mechanisms that allow your application developers to test their code in isolation before it's released. And it should be scalable.

Going back to what I was talking about earlier, I'll give an example of something I've done in the past that combines these points regarding builds. Handling one or two builds with Jenkins is fine, but can your CI/CD pipeline handle dozens or hundreds of developers constantly pushing fixes at it? We used Jenkins, and by default it gives you two worker processes to handle builds. That's fine for a handful of devs, but it's not fine when you need to scale. Jenkins has an option where it can spawn distinct, isolated worker nodes as AWS instances to do the work. So our basic flow was something like this: you push to GitHub, which sends a webhook to Jenkins. Jenkins says "cool" and starts up a node (I believe it was wrapping a Docker container, I forget at this point). The node goes to GitHub and pulls down your changes, along with your Jenkinsfile, in which you can specify: I also need my MySQL database, I need a RabbitMQ instance, I need a bunch of other things. It pulls those down as individual Docker containers and runs your integration tests in this ephemeral environment. One of the catches is that the whole thing goes away as soon as it's done, so one of the things we had to do on failure was zip up the workspace and ship it to S3; the environment disappears, but I still have my error logs and failed tests, and I can go look at those. On success, though, it packages everything up and pushes it to a Docker registry. So now I have a Docker container for my particular branch, my particular feature, maybe master, sitting ready to go for deployment. It's pretty cool, and then the environment goes away. Totally isolated, takes minutes to run, very useful.
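The talk has Jenkins assembling that throwaway environment, but you can get the same effect locally. As an illustration (Testcontainers is my choice of library here, not the speaker's), this is roughly what an integration test that spins up a real MySQL and a real RabbitMQ as disposable Docker containers looks like:

```java
import org.testcontainers.containers.MySQLContainer;
import org.testcontainers.containers.RabbitMQContainer;

public class OrderServiceIntegrationTest {
    public static void main(String[] args) {
        // Each run gets fresh, throwaway containers; nothing is shared, nothing leaks.
        try (MySQLContainer<?> mysql = new MySQLContainer<>("mysql:8.0");
             RabbitMQContainer rabbit = new RabbitMQContainer("rabbitmq:3-management")) {
            mysql.start();
            rabbit.start();

            // Point the service under test at the ephemeral dependencies.
            String jdbcUrl = mysql.getJdbcUrl();
            String amqpUrl = rabbit.getAmqpUrl();
            System.out.println("db: " + jdbcUrl + ", broker: " + amqpUrl);

            // ... run the real integration tests here; the try-with-resources
            // block tears everything down when it exits.
        }
    }
}
```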
Okay, I don't have much time left, so this is going to go real fast. Do you have an overall design vision when you're building out these systems? Does anyone here know what we're doing? You need a person with the entire vision of the microservices environment ready to answer questions, grow a larger team, act as a counsel, and help people move forward.

Which technologies do we use, and how much freedom do we have in choosing them? I've grouped these two together because this is a really interesting problem. One of the great promises of microservices is that we can choose any language we want, the right tool for the job. In reality, you want to be as boring as possible. Why? Because it will be easier to hire, and because if you hire somebody who knows Go, and they're the only person who knows Go, and they quit, what are you going to do? So you should be very reluctant, very careful, in choosing new tech.

How do we test an individual service? With very thorough integration tests. Integration tests are the best tests. TDD is fine, but if I can spin up an environment with a real database and a real RabbitMQ and flood it with data, that is much more valuable to me, because I end up exercising the entire system. I apologize for the brevity, but here we go.

How do we test the platform as a whole? That is an extremely difficult and slow thing to do. I need to spin up the entire service platform, get all the data into a steady state, pump it with events, shut it down, and reset all of that for every single test. So, and this is going to be very controversial, and every time I say it people tell me I'm crazy: consider not testing the platform. Don't do it. Instead, monitor error rates. If I suddenly see a spike in errors and alarms, then I know something went horribly, horribly wrong right around that time, and I should look at what was deployed, pull it out immediately, and try to fix whatever happened. And that information is easy to get, or should be, if I've set up metrics properly.

So next: how do our services communicate? By that I mean HTTP versus asynchronous events. Everyone I've talked to who does microservices does point-to-point HTTP communication, and that's fine, but it's synchronous. Async event communication is a little more complicated to think about, but it's a lot more reactive. We could talk about this for easily an hour, but I have negative one minute. Both of these approaches require infrastructure of their own. If I'm doing HTTP, I need service discovery mechanisms. If I'm doing async events, I need some sort of message broker like Kafka or RabbitMQ. Either way, whichever one you decide on, you'd better have some sort of circuit breaker to make sure these systems aren't bogging each other down, and a dead-letter mechanism to handle the case where a message fails to be delivered to somebody; there's a toy circuit-breaker sketch below.
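To show the shape of the circuit-breaker idea the talk is gesturing at, here is a deliberately minimal, hand-rolled sketch. A real project would reach for a library like Resilience4j or Hystrix rather than this:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;

/** Toy circuit breaker: trips open after N consecutive failures,
 *  then allows a single probe call once a cooldown has elapsed. */
public class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final Duration cooldown;
    private int consecutiveFailures = 0;
    private Instant openedAt = null;

    public SimpleCircuitBreaker(int failureThreshold, Duration cooldown) {
        this.failureThreshold = failureThreshold;
        this.cooldown = cooldown;
    }

    public synchronized <T> T call(Callable<T> remoteCall) throws Exception {
        if (openedAt != null && Instant.now().isBefore(openedAt.plus(cooldown))) {
            // Fail fast instead of piling more load on a struggling downstream.
            throw new IllegalStateException("circuit open: refusing downstream call");
        }
        try {
            T result = remoteCall.call();
            consecutiveFailures = 0;  // success closes the circuit
            openedAt = null;
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = Instant.now();  // trip open
            }
            throw e;
        }
    }
}
```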
Do we follow an overall architecture style? I could, what's the word, pontificate on this for a while, but basically: if you haven't read this book, go read this book, Domain-Driven Design. It has changed the way I design big, complicated systems more than anything else I've found, and the approach the author recommends scales out to microservices very well. In fact, there's a link at the end of this where the author, Eric Evans, gives a talk from about a year ago where he says, wow, this is a perfect application of it to microservices.

How do we create a new service? How much is each service responsible for? When do we create one? Basically, initially you design your platform around bounded contexts from the DDD world. Don't create services just because you feel like it; it's not going to work. Don't create services that are basically just CRUD wrappers around your database; if all a service is doing is taking requests and going to a database, that's not a good use case. Do make an effort to identify boundaries around communication and functional areas. This is a complicated concept that I wish I had time to cover properly: ensure a service will have proper contract boundaries before creating it. Again, with more time to talk about DDD, this would make a lot more sense. Now, this is an important point, and it's easily actionable: if I have two services that need to communicate synchronously, frequently, they are good candidates for merging. If you're in a situation where two services are highly dependent on each other, they should just be one. Last: the number of services should be less than the number of developers, just as a rule of thumb.

Okay, I've got about three slides left. Team communication: normally I would talk about this for fifteen minutes, about how we do code reviews, how we organize team structure, the value of keeping persistent teams. But for time's sake, I just want to make one point. Who here has heard of Conway's Law? It's real, right? It's basically the observation that the design of a system will reflect the communication patterns of the people involved. So if people aren't talking, your system is going to be a mess; if they are talking, the system comes out with nice, clean APIs. Microservices are a way to force hard, programmatic bounded contexts so that people have to talk to each other. And it's great.

So, last section, two points left: miscellaneous advice. Don't get cute with the naming of services. Any idea what these do? Right. There's one here called Supplication. Does anyone know what the word "supplicate" means? It means to beg, to plead. This was a system that handled people's applications to work with us. You should just call it the application service, the customer application service, right? Don't get cute. I just started at a new company, and somebody named a new service Abacus. I was like, no. What does it do? Does it count things? Call it the counting service. It's boring. It's fine.

If you have a new feature, walk backwards from the user. You'd be surprised: when people start at the database level and work their way to the front end, everything comes out a mess, hopping around between different services. Start at the UI first. Release when the feature is ready, and don't be afraid of bugs; just push it, it's going to be fine. If service A has a dependency on service B, release B first; avoid situations where they both have to go out at the same time, as that generates maintenance windows. That's a hard one to do. How to bootstrap a new service: I don't have time for that one. And API and message versioning is just a thing. If you have services communicating, the messages are going to change over time, and you will eventually have to start addressing API and event versioning. It's very complicated; we could talk about that for an hour. There's a small sketch of the idea below.
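As a tiny, hypothetical illustration of message versioning (the event name and fields are made up), a consumer can key off an explicit schema version so that old and new producers can coexist during a rollout:

```java
import java.util.Map;

/** Hypothetical order event carrying an explicit schema version. */
public record OrderPlacedEvent(int schemaVersion, String orderId, String customerId) {

    /** Parses both the old (v1) and new (v2) payload shapes. */
    public static OrderPlacedEvent parse(Map<String, String> payload) {
        int version = Integer.parseInt(payload.getOrDefault("schemaVersion", "1"));
        return switch (version) {
            // v1 called the field "customer"; v2 renamed it to "customerId".
            case 1 -> new OrderPlacedEvent(1, payload.get("orderId"), payload.get("customer"));
            case 2 -> new OrderPlacedEvent(2, payload.get("orderId"), payload.get("customerId"));
            default -> throw new IllegalArgumentException("unknown event version: " + version);
        };
    }
}
```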
And finally: don't release on a Friday afternoon. And I'm done.