Hello. How are you guys doing today? So my name is John Yee. I'm a Cloud Solutions Architect at Rackspace. And today, I'm going to talk about service orchestration with Docker and CoreOS. So I believe this is actually an important topic, obviously for me, but for the community at large. Because I think what's going to happen is our applications are increasingly composed of web services. And eventually, we're going to have a lot more services to manage, which means we have to figure out a way to automate all this. And while Chef, SaltStack, Ansible, and Puppet can help us manage it, I don't think they alone can handle service orchestration. So I want to go ahead and start with a story. Our story starts with Green Cookies. It's a startup. And as you guys can tell, this is what they make. Not very appetizing. So they only really see sales around St. Patty's Day, maybe Christmas. So things at Green Cookies are kind of ho-hum. Not much activity. Consequently, their IT infrastructure is fairly modest. They don't really need very much. It's just a simple LAMP stack, right? And this is sufficient for what they're doing, because they don't really have any customers, right? Then one day, they started seeing sales increase in these regions in the US, the dark green: Montana, California, Maine. Now, these two states, Washington State and Colorado, started seeing an enormous increase. They weren't sure what was going on, right? They're like, this is kind of weird. So the CEO of Green Cookies met with the sales and marketing folks. They wanted to figure out what's going on. Why are we seeing such a huge increase in sales? And it didn't take them long to figure it out. So what they decided is they're going to focus on Washington and Colorado, right? Because they figured, why not strike while the iron is hot, right? 
So in addition to delivering cookies to those states, they slipped in a little something extra. So this is their strategy. And it was brilliant. Sales went through the roof, right? It was startup time, right? So their startup had hit somewhat the big time, and sales were increasing a lot. However, you remember that trusty LAMP stack they had? That wasn't scaling well at all. It was falling over a lot, and they needed to do something about that. So the first thing they decided to do is they gave the elephants in the room peanuts and told them to walk. And then a new group of developers came in, the Java dudes, right? So the Java dudes decided, we know what we need to do to right the ship here. We know what the infrastructure needs. And so they re-architected it. They basically promised these things: performance, scalability, manageability, reliability. Congratulations. They brought Green Cookies back into the 2000s, right? Because this is pretty traditional. So everything a growing business needed, they were doing pretty well. And the business just started to grow. And what happens when business starts to grow? Well, there are all these different requirements, and all these services being asked for. So they needed to add some additional components to their app, and they started integrating these into their main Java app. And they called these "components." For them, the definition was: an individual unit of software that can be upgraded independently. And in terms of this app, that wasn't exactly true. These components actually lived inside the main process of the Java app. And there were issues, which we'll see in just a second. So over a period of time, the development process continues, and there are interdependencies, all these shared libraries between these different components. 
The search component depended on the mail component, auth depended on data, data depended on all these components, and auth was needed by all the components. So there are all these interdependencies. Some of it was necessary. Some of it was just shared code, libraries they were sharing. Sometimes search would share a library that auth needed. And what you're seeing right now is this creeping-in effect across the different areas of the main Java app as the development process comes along. Let's see. Now, you can't see this as clearly, but they also had other issues. For example, let's say they wanted to upgrade their data component. I tried to make that clear, but it's not showing up as much. They wanted to upgrade just the data component, but they couldn't upgrade that single component alone. They had to upgrade everything else, rebuild, redeploy, do everything that app had, because they were all tied together. There was no easy way to break them out. Scaling became an issue, right? Let's say, for example, they wanted to scale the search component. Well, it's part of this main app. So if they wanted to scale the search component, they scaled everything. That included auth, data, mail, the main Java app. And the issue, of course, is that those components may not need all the resources they're being asked to consume. So this became an issue. And on top of all that, they're a growing company, and with growing companies, that means you need to find talent. And sometimes they're not Java developers. Sometimes there's a Gopher here, a Pythonista there. You've got the Ruby folks. So all of a sudden, you had all these different components, all part of this one app, and they were trying to figure out: how do we make this all work together? 
And so they're kind of looking at this, right? They're saying, hey, how do we separate out all these different components, and how do we make them all interact with each other? Of course, the answer was simple. Make web services out of them, right? So I'm using the term microservices. I know that has a little bit of a storied past, but I think that's the most accurate description of this. And essentially, with web services, you have a decoupling of all the different components. You don't have to worry as much about all the different integration points within the main Java app. It simply calls the web service. And to a certain extent, OpenStack is the same way. It has a bunch of web services, or projects, to make that work. So this looked like a pretty brilliant idea for them. Now, you've got the Pythonista, the Gopher, right? And you've got Gem. Gem's really popular with the ladies, right? They go out to this party, and they're sitting there kicking back and mingling. And they meet this whale, right? This whale is saying crazy stuff. What's the whale saying? He's saying, hey, there's a Unix philosophy that I believe in, and I think I can deliver something for you guys that does the same things. We've got these things called containers. Do one thing well, right? You can take a bunch of these containers and compose an application out of them. And then finally, you can even use this thing called linked containers. It's almost like a Unix pipe. So he pulled out a napkin and he drew this. Basically, we've got an app service with a data store on the back end, on the same Docker host, and they link the containers. In a lot of ways, there are similarities with how you might use a Unix pipe to connect one command to another. And he also talked about the ambassador container: how do you get this to talk to other services across other Docker instances? They call this cross-host linking, right? 
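To make the ambassador idea concrete, here's a minimal sketch of what an ambassador does, not CoreOS's or Docker's actual implementation: a tiny TCP proxy whose only job is to shuttle bytes between the app and wherever the real service lives. The function name, ports, and addresses here are invented for illustration.

```python
import socket
import threading

def run_ambassador(listen_port, backend_host, backend_port):
    """Toy ambassador: accept connections on listen_port and shuttle
    bytes to and from the real backend service. The app container only
    ever talks to the ambassador's address, so the backend can move
    hosts without the app ever knowing."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", listen_port))
    server.listen(5)

    def pump(src, dst):
        # Copy bytes one way until the connection closes, like a Unix pipe.
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
        dst.close()

    def serve():
        while True:
            client, _ = server.accept()
            backend = socket.create_connection((backend_host, backend_port))
            threading.Thread(target=pump, args=(client, backend), daemon=True).start()
            threading.Thread(target=pump, args=(backend, client), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return server  # caller can read the bound port via server.getsockname()
```

In Docker of that era, this role was played by an ambassador container wired up with `--link`, but the forwarding idea is the same.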
So essentially, you have this ambassador container whose sole job is to route a connection to another service. You have an app service that says, hey, I have a dependency, and the ambassador helps along with that process. So the operations guys, they were loving this, right? They started seeing the potential of what can be done here. They liked the idea that there's this nice, neat container, and they didn't have to worry about all the packaging and everything that went inside it. They were super psyched about it. So they looked at it in terms of: on the left-hand side, we've got the build. That's where the developers live. On the right-hand side, that's essentially where we live, right? We make the pipes and everything else fit together to actually run this. And as long as the container holds all of the developers' mess, and we are on our side doing our operations thing, everything is good. So the operations guys were loving this. They liked that there was a clean separation between what the developers do and what they do. And they could also add some additional infrastructure with containers. So they did what any startup would do. They took a non-production-ready technology and implemented it in production, right? I mean, wouldn't you guys do the same? You're a startup, man, come on, right? You make green cookies, for crying out loud. Anyway, so this is their web service, running in a Docker container. They ran with this for a little while. And that's the thing about new technology: when you start fiddling with it and using it, you start to get to know some of its downfalls. And what they discovered is, well, some of their Docker instances looked very much like this air traffic control map, right? There are containers all over the place. Now how do you manage all of this stuff? 
They started really thinking about this, and they're like, well, we have these web services; surely someone else is doing the same thing. So they looked around. Maybe they talked to Rackspace. Maybe they talked to AWS. Maybe they talked to Netflix, Airbnb, some of the other folks out there that are managing services. And they came away with three fundamental things about being able to do service orchestration. First, they discovered that there needs to be some sort of central registry. And generally, the central registry looks like a file system, except its nodes carry configuration information. So the idea is, when there is a service — here we've got our app service and our registry client — and it needs to publish config information, it simply says, hey, I'm right over here. And the registry client says, well, I need to go talk to the central registry to let whoever else might be interested know that. So he does that. And of course, there's an app client at the other end of this. The app client says, hey, where's data1? The client then connects to the data service. Fairly simple, right? Fairly easy concept to grasp. So essentially, in a nutshell, that's service orchestration. So around this time, they're running with Docker. Everything's going good. And they're a startup. So what do they do? They go out — and this doesn't show up very well — but they went to a rave, right? What do you do at raves? I don't know. Maybe they had some green cookies. Maybe they had some other things, right? But the thing is, they got pretty hammered, pretty sloshed. One of them was sitting on the street somewhere, right? And he sees this gopher pass by. And the gopher says, hey. So the gopher begins to tell him: you know what you should do? You should run your Docker instances on CoreOS. So the gopher was basically giving him some tips here. 
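That publish-and-lookup flow can be sketched in a few lines. This is just an in-memory stand-in, and the path and address are invented, but the shape — a file-system-like namespace whose nodes hold config information — is the key idea.

```python
class CentralRegistry:
    """Toy central registry: a path-like namespace mapping service
    names to config (here, just an address). Real systems like etcd or
    ZooKeeper add persistence, clustering, watches, and TTLs."""

    def __init__(self):
        self.nodes = {}

    def publish(self, path, address):
        # The registry client announces a service: "hey, I'm over here."
        self.nodes[path] = address

    def lookup(self, path):
        # The app client asks: "hey, where's data1?"
        return self.nodes.get(path)


registry = CentralRegistry()
registry.publish("/services/data1", "10.0.0.5:5432")
# The app client resolves the name, then connects to the service directly.
print(registry.lookup("/services/data1"))  # → 10.0.0.5:5432
```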
And he said, hey, first of all, you know that central registry thing you're battling back and forth in your mind? Well, we've got that. We've got this thing called etcd. And not only that, each instance of CoreOS can be clustered, so they share the same data store. So let's take a look at that service registration piece again, right? I've got a data service and a service registration container. Now, the service registration container handles that function alone. The only thing it ever does is say, hey, I see that there's a service; I'm going to publish it for him. And what's really neat about this is you're decoupling the service registration piece from the actual app. The data service is unaware of any of this stuff. So he publishes where he's at. The app client has another — let's call it a sidekick — has a service discovery container as a sidekick. He says, hey, where's data1? So the application client can ask that and get that information. Now, the application client obviously didn't ask that itself, because he doesn't know how; the service discovery container handled that piece for it. So the application client can now connect to data1. This is off of CoreOS's website. It actually shows a little bit about how everything is put together, although in this particular case we'll get to the load balancing container later. There's one other thing the gopher mentioned. He's like, hey, on top of all that cool stuff, there's also this thing called fleet. It's a great project, right? He said, well, what can you do with fleet? Well, say for instance you have an instance that dies; fleet will automatically restart the containers on the resources that are available. So for you folks that are familiar with vMotion, this looks kind of vMotion-like. But the idea here is that it automates away having to even worry about failed CoreOS instances. It's like, wow, this is pretty awesome. So business is booming, it's rolling. 
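The sidekick registration pattern usually shows up as a pair of fleet units: the service itself, plus an "announce" unit that runs on the same machine and keeps refreshing a TTL'd key in etcd. The following is only a sketch in the style of the CoreOS docs of that era — the unit names, key path, image port, and intervals are all made up for illustration.

```ini
# data1-announce.service — a sidekick that publishes data1's location.
[Unit]
Description=Announce the data1 service in etcd
# Live and die with the service we announce.
BindsTo=data1.service
After=data1.service

[Service]
EnvironmentFile=/etc/environment
# Refresh a TTL'd key; if this machine dies, the key expires on its own.
ExecStart=/bin/sh -c "while true; do etcdctl set /services/data1 \"${COREOS_PRIVATE_IPV4}:5432\" --ttl 60; sleep 45; done"
ExecStop=/usr/bin/etcdctl rm /services/data1

[X-Fleet]
# Schedule this unit on the same machine as data1 itself.
MachineOf=data1.service
```

If the machine fails, fleet reschedules data1.service on surviving hardware and the sidekick follows it — which is the vMotion-like behavior described above.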
And so far, we've talked about microservices. We've talked a little bit about implementing Docker for these web services. We've talked a little bit about service orchestration. And we've got a hyena here. So the next part of the story is about a cookie monster, right? Because I want to be clear, this is not the Cookie Monster; it's a cookie monster. And apparently this cookie monster was out there one Saturday night, and he's loving this company, Green Cookies. He decides he loves it so much he's going to tweet it to all his followers. And this particular cookie monster had a lot of followers. So they had this single point of failure, essentially: a single load balancer in front of their services. And it's a Saturday night. All his followers go and hit it. And of course, it goes down, goes down dramatically. Now, keep in mind, all the orange components don't have any issues whatsoever. It's just this load balancer that had the issue. So they have this problem, right? Needless to say, the DevOps guys are really unhappy. They try calling the developers; they didn't know what was going on. It was a big mess. And what happens on a Monday morning? The developer rolls in and says, hey, how was your weekend? Well, I was fighting a fire. Our site was down, right? So what do you do? You go through a triage, take a look at what happened, and try to improve your infrastructure. So that's what they did. And they took a look at that single point of failure. Maybe that could be a middle tier, or a group of load balancers. They thought of different things. Maybe we could have dual load balancers. Maybe we can do some scheme where we don't have to depend on one, right? Well, they went back and took a look at the different folks that are actually implementing microservices. And it turns out almost all of them are embedding the load balancing in the app clients. 
So the app client, they figured, would have the smarts to handle this. Now, with this particular piece — and you see the registry client — that could be a container, and it could also be a container that handles the load balancing piece. And so what you're essentially hoping to do is offload some of that from your developers. So here's a client-side load balancing scheme. It's kind of simple. One of the key pieces is that there is a registry client, and the registry client needs to tell the central registry, hey, I'm alive. And it does this every so often to make sure the central registry knows. Now, what happens when a node fails? Well, when the node fails, the central registry says: I haven't received a heartbeat, I haven't received anyone refreshing the time-to-live. I think I'm just going to remove that entry, because it must be dead. And it would be correct in this case. So the app client, whose registry client keeps it updated with the list of services that are available, reroutes. And then we have this great scenario, right? DevOps is a hero, right? They solved it and they put it together. We're now running microservices, Docker, CoreOS. And that's pretty much the end. So I did quite a bit of research, and some of the links I found really helpful, if you're interested in this topic: Netflix's Eureka — those folks wrote up quite a bit. Airbnb has SmartStack. Then you've got the Docker documentation, and CoreOS has a blog post about doing exactly what I described. So that's it. Any questions? A question — can you go back one slide? Sure, there we go. So you were saying the client registry is sending heartbeats for the particular service? Oh, I'm sorry — what that is, and I didn't want to get too detailed, is a watch. Essentially, it's like a long poll. The registry client makes this long poll, and it'll just sit there and wait. 
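The client-side scheme just described — heartbeats refreshing a time-to-live, eviction when the TTL lapses, the client rerouting among survivors — might be sketched like this. This is a hedged, in-memory sketch with invented names and addresses; real implementations (Eureka, SmartStack, etcd-based setups) differ in the details.

```python
import random
import time

class HeartbeatRegistry:
    """Registry entries expire unless refreshed: each heartbeat resets a
    time-to-live, and lapsed entries are evicted when the list is read."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable for deterministic testing
        self.services = {}          # name -> {address: expiry_time}

    def heartbeat(self, name, address):
        # "Hey, I'm alive" — refresh this instance's TTL.
        self.services.setdefault(name, {})[address] = self.clock() + self.ttl

    def live(self, name):
        # Evict anything whose TTL lapsed, then return the survivors.
        now = self.clock()
        entries = self.services.get(name, {})
        for addr in [a for a, expiry in entries.items() if expiry <= now]:
            del entries[addr]       # "it must be dead" — remove the entry
        return sorted(entries)

class AppClient:
    """Client-side load balancing: no middle-tier balancer; the client
    keeps the live list from the registry and picks an instance itself."""

    def __init__(self, registry, name):
        self.registry = registry
        self.name = name

    def pick(self):
        candidates = self.registry.live(self.name)
        if not candidates:
            raise RuntimeError("no live instances of " + self.name)
        return random.choice(candidates)  # reroutes around dead nodes
```

Notice there is no load balancer in the data path at all: when one data node stops heartbeating, its entry ages out and every client quietly stops picking it.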
And if the registry says, hey, something changed — I removed a node, I added a node, right? — that update would hit the registry client. It would go, hey, my list is updated, and then it would go right back out and do the same thing. That's what that's supposed to represent. Yeah, I mean, the central registry is acting like a load balancer, because it's getting the heartbeats from the different registry clients. And when it doesn't receive a heartbeat, it times out and pulls it out of the registry database. But I'm curious to know a little more detail on the heartbeat. Is it just a general heartbeat, or is it looking at some of the key aspects of those particular servers? How deeply is it polling those services to know: is the service truly down, or is it degraded? Oh, so in this case, this particular example uses something called a TTL refresh. When you create a configuration node, you can say, hey, I want this to expire in, I don't know, a minute or something like that, right? And if you don't receive a refresh from me, remove it off the list. That's not necessarily the mode you have to put it in, but you can put it in that mode. And then what you can do is watch the nodes that actually contain that configuration. So if anything were to happen, they would be able to update the list. And then in that model, it appears that the central registry now becomes that single point of failure and scalability concern, right? Do you know how to address that, or how it's... Yeah, so essentially — and this was kind of hard to diagram — because etcd is on every single instance of CoreOS, you always have an etcd service running on a CoreOS instance. So even if you lose that piece, it's not sitting as an isolated network service. It's part of the CoreOS instance that you're running on. It's just not quite shown here. The registries in etcd... 
In each CoreOS instance, there's etcd, and they're clustered together. So basically all your central... Exactly. Any other questions? The links. Yes. Was it the last slide? No? Am I going the right direction? This is next to the last slide. Sorry about that. I thought you meant container links. Sorry. Any other questions? Yeah, what's the scalability like, in terms of horizontal scalability? So they have a recommendation, but they're eventually going to lift this limit. And by the way, all the stuff we're talking about is non-production, as you probably know. But they have this ideal number, as they call it, which is between three and nine members. Eventually, what they're hoping to do is let you expand this to however many you want. That's their goal. So right now it's using the Raft protocol, and only one of them — the elected leader — is actually making the writes, and all the rest of them are replicating from it. Thanks.
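For what it's worth, that three-to-nine guidance follows from Raft's majority rule: a write commits only once a quorum (a strict majority) of members accepts it, so an n-member cluster keeps working as long as a majority survives. A quick sketch of the arithmetic — not etcd code, just the math:

```python
def quorum(n):
    """Smallest strict majority of an n-member cluster."""
    return n // 2 + 1

def tolerated_failures(n):
    """Members that can fail while a majority still remains."""
    return n - quorum(n)

# The 3-to-9 range mentioned above:
for n in range(3, 10):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failures")
```

Note that even sizes add quorum cost without adding fault tolerance — a 4-member cluster tolerates only one failure, same as a 3-member cluster — which is part of why odd sizes in that range were the usual recommendation.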