without touching them. So, here we go. Who has used ECS before? Yeah, a few people. So, let's spend maybe a minute explaining what ECS is. ECS and its companion service ECR are a couple of AWS services launched about a year, a year and a half ago to manage Docker containers on AWS. I expect most of you are at least slightly familiar with Docker. Yeah? All right.

As you know, running Docker on your PC or your laptop is fairly easy, no big deal. But when you start running Docker on five nodes, 10 nodes, 15 nodes, 100 nodes, then problems start to arise. Complexity starts to arise. And so we launched those two services. ECS orchestrates containers, running them on a set of EC2 instances pre-installed with Docker. It's free; we hate to say that, but it doesn't add anything to your AWS bill, just like Elastic Beanstalk or CloudFormation. And ECR is a Docker registry, a private Docker registry for your containers. It's pretty cool because you can store your Docker images close to your AWS infrastructure. It's highly available, it's secure, you can apply IAM policies, et cetera, et cetera, all the good stuff we usually provide with those services.

Is it mandatory to use ECR? Absolutely not. You can use any registry you want. What I like about ECR is that it's private, it's highly available, and I can apply IAM policies to it, which is not exactly what you get with the public Docker Hub. And if you want the private Docker Hub, it's slightly more expensive. But yes, sure, you can use your own registry too. It's a bit like CodeCommit: it doesn't do much more than the public alternative, but it's highly available and very secure.

Is ECS quite similar to Docker Swarm? Oh, let's not get there right now. Come on, slide one. And someone's going to ask about Kubernetes on slide two, right? Let me warm up a little. I'm jet lagged to death; you cannot imagine how jet lagged I am. I will get to that, I promise.

Okay. I double-checked: it's available in 11 regions, including Singapore. We have a lot of partners working with us on Docker and on ECS. I'm not going to list them all, but keep in mind that they're available, and you're probably using one or the other. Do we have any Rancher fans here? No? No one? Okay, no big deal. I love Rancher, it's very cool. If you want to run CoreOS instead of the Amazon AMI, that's fine too. If you want to do Docker continuous integration, Docker security, et cetera: all these companies are working with us to make ECS a better and more integrated product.

We have customers too on ECS, of course. This is just a selection; I'm not going to go through them all, but you probably recognize most of these names. To answer your question, there was a very interesting article by a company called Datadog. I'm sure you know those guys, a monitoring company. I can find it again if you want me to. They published a study on container adoption, and because they have all this monitoring data available, they compared the different technologies that their customers use.
And one interesting data point was this: the bigger the Docker deployment, the more nodes and the more containers people run, the more they use ECS. I won't give you my opinion on what that means... of course I will give you my opinion. It means people playing on a laptop, or running maybe a couple of nodes, might go with an alternative technology like Swarm or Kubernetes. But when it gets really serious, 15 nodes, 100 nodes, as it turns out, according to that Datadog study, ECS is the number one technology. Whatever that means, I think it's an interesting data point.

The topic of tonight is scheduling. It's a very specific thing, so let me explain what we're trying to do here. And this is exactly what we're trying to do. We've all played this game; if you have children, you probably have one of these lying around your living room, and of course you know the game: you have to find the right spot for the right shape. When the grid is three by three and you have a limited number of shapes, this is a really easy game. If you had nine containers to put on your one machine, that's a fairly easy problem to solve.

Now imagine the grid is a thousand by a thousand, and you have 50 grids: 50 nodes, each capable of holding a thousand containers, or maybe only a hundred containers. That's a very complicated game. Of course you will solve it eventually, but can you solve it in linear time? Probably not. The larger the grids, and the more grids you have, the more time it will take you to find the right spot. Scheduling time will not be linear in the size of that grid, and that's a problem, because it means you're not scalable. So the problem that ECS scheduling is trying to solve is this: no matter how large the grid, no matter how many shapes we have, no matter how many grids we have, we can find the right spot in linear time. I have references at the end of the presentation to articles published by Werner Vogels, the CTO of Amazon, where this stuff is explained in more detail. He has a graph showing that when we scale ECS to 2,000 nodes and load the cluster to 90%, scheduling time is still linear.

So this is what it looks like, very quickly. A number of instances, which are EC2 instances running Docker and running the ECS agent, which is open source. You can't read it on the slide, but it's there; it's on GitHub, so you can go and read the source. Connected to that agent, we have the ECS backend, which receives orders from you through the API and makes the scheduling decisions: put the right container in the right spot. And of course you can have load balancers, et cetera, et cetera. This is just an example, but that's the existing architecture.

We have a customer called Coursera. I don't think I have to introduce them; they've been using ECS for a while. What they do with it: when you're on Coursera going through, let's say, a Java course or a Python course or any kind of language course, you have programming assignments, right? You write your code, you upload it to Coursera, they run unit tests on it, and they tell you how good or how bad a job you did. Bad, in my case.
And so of course, this code is highly untrusted, especially mine, and it's a real danger to their platform. So they run it inside Docker containers to make sure that whatever you and I upload is not going to take the platform down, and that you're not going to use their servers for Bitcoin mining or silly stuff like that. They also have lots of other containers running there, reporting and statistics, et cetera, but they want the grading jobs to run faster, because those are more meaningful to their users. So they implemented a custom scheduler to give higher priority and more resources to the grading jobs versus their internal processing jobs. To do that, they used the AWS CLI and implemented custom logic. That's option number two here: you can call the AWS CLI, get the cluster state, get the list of tasks running everywhere, et cetera, and decide that this new task should run here. It's doable. Slightly complicated, but doable.

If you don't want to do that, you can let ECS handle scheduling: you create a service, you define a task definition in there, which is roughly the equivalent of a Compose file for Docker, you tell ECS how many copies you want to run, you give CPU and memory hints to the scheduler saying how heavy the container is going to be, and ECS figures it out. So you have the easy, automated way, and the slightly more custom way.

But that was before. At re:Invent, we announced a slightly more elaborate way of doing things, through the placement engine. The placement engine basically allows developers to specify constraints and strategies, to give you more control over where a container is going to run. Constraints deal with the AMI (you could say, hey, that container needs to run on this specific AMI), the availability zone, the instance type, distinct instances (make sure the containers are spread across as many instances as possible), or custom stuff: tags, whatever you want. That wasn't really easy to do before. Sure, you could iterate through the cluster state and figure it out yourself, but it was slightly awkward. Now you can give constraints directly to ECS and say, this is where I want my container to run.

Here's an example. We're listing all the instances in a cluster, and we can use a fairly simple query language to say, hey, just give me the instances which are t2.small, because that's where I want to run my container. You can match on instance family or instance type. You can match on availability zone: please run this container in this specific availability zone, or run it anywhere except this availability zone. And you can use all those constraints to make scheduling decisions. Of course, you can combine them. You could say, give me the t2.smalls and t2.mediums, or all the g2s that are not running in us-east-1. I don't know why you'd want to do that, but okay, it's possible. So you can filter your instances on all those constraints and use the resulting list to run your containers later. It's very easy to do.

Then the second thing we added is placement strategies. Strategies decide how you're going to spread your containers across the set of instances you've selected.
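To make the constraint side concrete before we move on to strategies, here's roughly what those queries look like from the CLI. This is a sketch: the cluster name `ecs-demo` and the task definition `nginx:1` are invented for the example, but the `--filter` parameter and the cluster query language are the real thing.

```bash
# List only the t2.small instances in the cluster.
aws ecs list-container-instances \
  --cluster ecs-demo \
  --filter "attribute:ecs.instance-type == t2.small"

# Combine expressions: t2.small or t2.medium, but not in ap-southeast-1a.
aws ecs list-container-instances \
  --cluster ecs-demo \
  --filter "attribute:ecs.instance-type in [t2.small, t2.medium] and attribute:ecs.availability-zone != ap-southeast-1a"

# Or skip the manual filtering and let ECS apply the constraint
# at scheduling time instead.
aws ecs run-task \
  --cluster ecs-demo \
  --task-definition nginx:1 \
  --count 1 \
  --placement-constraints 'type=memberOf,expression="attribute:ecs.instance-type == t2.small"'
```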
So you could binpack them: just cram as many containers onto the smallest number of instances, based on memory or on CPU. Obviously this is a cost-saving strategy, to minimize the size of your ECS cluster. From a high-availability point of view, it's not so good, but if you just want the fullest cluster possible on the smallest number of instances, it's a good strategy. Spreading is quite the contrary: you could say, spread across AZs, and that's good for high availability. You could do affinity: I want container A and container B close to one another. It could be a web server and a cache container, or two services that are linked and you want to make sure they run on the same instance. Or you could use distinct instance, where you only want one container of a given type per instance. Again, you could probably do all of that before, but at the expense of way too many CLI calls and headaches and custom code.

You can combine strategies. You could say, please spread across AZs and binpack. Why not? At the end of the day, what you want is to place those containers. So you use constraints, you use strategies, and based on those, you run your task. Here's how it goes. First, ECS takes the built-in constraints into account: how much CPU is left on my cluster? How much memory is left? What about network ports? There's only one port 80 per instance, unfortunately, so if you have five instances and you want to run 60 containers on port 80, that's not going to work very well. Then come the custom constraints I just mentioned, AMI, instance type, et cetera, then the placement strategies. You apply all those filters and you get the list of candidate instances where your container can run.

So let's look at a few examples. I'll just show you a few; there are more in the slides, but I don't want to keep you for hours and hours. Here I'm running a task on the cluster called ecs-demo. I've got a task definition, I want to run nine tasks, my placement strategy is spread, and I'm spreading on availability zone. I've got nine containers, so try to run this in your head and guess what's going to happen. I could never animate that with PowerPoint, which is why I'm reusing some slides; the clever stuff I cannot do, and that's why we have program managers. So: nine tasks, spread across three AZs. That means three tasks per AZ, and in this case they've also been spread across multiple instances.

Slightly different here: I'm spreading across AZs, but packing on memory. Figure it out. First spread them, then pack them. It's a mix of high availability and cost. Sure? So you have 12 instances there. Yep. In this case, shouldn't auto scaling kill the two extra instances, since they're not used? That's a different discussion, but you're right. I'm just surprised to see them unused. So your question is really about how we scale the cluster in and out. Here I could have a fixed-size cluster with 12 nodes and still do this, because maybe I'm keeping the g2s for GPU jobs. But you're right.
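Before we get to the scaling question, here's roughly what that spread-then-binpack combination looks like as a single CLI call. Again a hedged sketch: the cluster and task definition names are invented, but the strategy types and fields are the real ones, and they're applied in the order you list them.

```bash
# Run nine tasks: spread across availability zones first,
# then binpack on memory within each zone.
aws ecs run-task \
  --cluster ecs-demo \
  --task-definition nginx:1 \
  --count 9 \
  --placement-strategy \
      type=spread,field=attribute:ecs.availability-zone \
      type=binpack,field=memory
```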
So, back to the scaling question. The better way would be, in fact, to say: I've got nine instances that are clearly useless here, I'm paying for nothing, so I should be able to detect that my cluster is overprovisioned, scale in, and remove instances. And you can clearly do that with auto scaling, the traditional way: CPU-based scaling, which is outside of ECS, it's EC2 stuff. You use EC2 auto scaling to scale your hosts, and then the tasks get spread, to be more correct. Exactly. So these are really two different discussions. One: how do I allocate enough EC2 instances to my cluster? It could be a fixed size, or it could be auto scaled using the EC2 auto scaling we all know. And two: based on that, where do I put my containers?

You can do this with ECS services as well. Here's another example where the first service, the yellow one, is binpacked on memory, and the purple one is spread on AZs. Binpack means just cram them in; the other one is, let's say, more critical for your platform, so you want to spread it. You have that flexibility, and you can go really crazy with this.

Here's another one: distinct instances. The yellow one needs to run on g2, it's probably a GPU container, but the tasks need to be on separate instances. And the purple one is t2.small or t2.medium, on distinct instances. But hey, we have a problem here: not all of them have been placed, because I only have two g2s and I have a third task to run, and there's a constraint there; same for the purple ones. The only option here is to add instances, and ECS doesn't play this game for you. You have ECS metrics, cluster-wide metrics and service-wide metrics, that go through CloudWatch, alarms, blah, blah, blah, the usual way, and at the end of the day you're adding more nodes to the cluster. But do you have ECS metrics as well? You have visibility? Yeah, ECS is sending metrics to CloudWatch, and you have your auto scaling on EC2, the usual way. And if we want to add some more stuff here, same story, blah, blah, blah. So be careful with distinct instances, because it means exactly what it says, one per node, so it could be costly.

So that's what we just saw: constraints and placement strategies. You could do all that before, but you would end up writing a lot of code, and no one wants to do that, right? We're lazy. So now you have an easier way to do it. Yes? Yeah, the affinity will be based on a tag, like a group name. So you could have two, three, four; more than two.

But let's go one step further. We added another brick, which is called the event stream. And the event stream is exactly what it sounds like: whenever something happens on the cluster, it triggers an event. A container starting, a container stopping, dying, blah, blah, blah, anything happening. In real time, you get those events, they're sent to CloudWatch Events, and obviously you do something about them. If one of your instances dies and takes a whole lot of containers with it, you want to do something about that. Quickly.
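Wiring that event stream up is essentially one CloudWatch Events rule plus a target. A minimal sketch with a made-up queue ARN; keep in mind the queue also needs a policy that allows events.amazonaws.com to send messages to it.

```bash
# Match ECS task and container instance state changes.
aws events put-rule \
  --name ecs-events \
  --event-pattern '{"source":["aws.ecs"],"detail-type":["ECS Task State Change","ECS Container Instance State Change"]}'

# Deliver matched events to an SQS queue (ARN invented for the example).
aws events put-targets \
  --rule ecs-events \
  --targets 'Id=1,Arn=arn:aws:sqs:ap-southeast-1:123456789012:ecs-events'
```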
And so we can send those events to different services, but specifically, what I want to show you tonight is something even weirder. Now we get into weird territory, which is Blox. Who has heard of Blox? All right, no one except you and Kai, I suppose. So we are in weird territory.

What is Blox? Blox is not an AWS service. Blox is an open source project that AWS launched to create a community, a development community, around ECS and ECS projects. I'll come back to that. For now, just keep in mind what we're trying to do: we're going to grab all those events happening on the ECS cluster, send them through CloudWatch Events to a queue, right there, and we'll have Blox here, reading those events and taking action. Specifically, we have two Blox components. We have what we call the cluster state service, which is basically a key-value store that knows exactly what the cluster looks like: which containers are running where, et cetera. And we can write a scheduler based on the events coming through CloudWatch and on the current state. By combining both, we can implement pretty complex scheduling strategies with this architecture.

So this is what it looks like. You could be that guy here, who doesn't want to know anything about this and just does it the previous way: make sure I have three copies of container A and five copies of container B, I don't care where they run, just make it happen, and the load balancer will find them. That's a perfectly sane way of doing things. But some customers, like Coursera and others, need a lot of control over what's happening. They'll be the guys on the right-hand side, writing their own scheduler and going through something like Blox, or their own code, to implement complex scheduling policies. So you have the easy way, where you don't worry about it, and the powerful way, which you need to work on.

While I explain this, let me start my demo, because of course there's a demo. This is the most dangerous kind of demo to do with jet lag, but let's do it. It's probably still too small; can you read okay in the back? All right, just yell if you can't. So, we're going to do this in Singapore, of course. I need to create two clusters; I will explain why in a second. First, I need to create an application cluster, where I'm going to run NGINX. And I'm going to launch, actually I can do it in the console here, the CloudFormation template that creates all the Blox stuff. Don't worry, I will explain this in a minute. What do I call it? Yeah, Blox cluster, blah, blah, blah. Next. Next. Yes. Okay, go. It's all live, no pre-cooked demo. I live dangerously; maybe I'll regret it in five minutes. It's all for you guys.

So how this works is, like I said before: I have a cluster on the right side that I call the application cluster, the one that really runs my containers. It's going to live its life, stuff is going to run, stuff is going to crash, you know, the software life. Whenever something happens, events go through CloudWatch Events to an SQS queue and on to Blox.
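This is not Blox's actual code, just a minimal sketch of what the consuming side of that queue looks like if you were writing your own scheduler. The queue URL is made up; the envelope is the standard CloudWatch Events one, where the SQS message body is the event itself.

```bash
QUEUE_URL=https://sqs.ap-southeast-1.amazonaws.com/123456789012/ecs-events

# Long-poll for one event.
MSG=$(aws sqs receive-message --queue-url "$QUEUE_URL" \
        --max-number-of-messages 1 --wait-time-seconds 20)

# The detail tells you what changed; lastStatus is set on task events.
echo "$MSG" | jq -r '.Messages[0].Body | fromjson | ."detail-type", .detail.lastStatus'

# Acknowledge the message once it's been handled.
RECEIPT=$(echo "$MSG" | jq -r '.Messages[0].ReceiptHandle')
aws sqs delete-message --queue-url "$QUEUE_URL" --receipt-handle "$RECEIPT"
```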
And the scheduler is going to receive those events and do something. And this is running on an ECS cluster as well. So the CloudFormation template I launched creates all of this: an ECS cluster with the scheduler, the SQS queue, the CloudWatch Events rule, et cetera. And the other cluster, the application cluster, is this one. So basically, this is what I just did: run my template to create Blox, which creates an event rule, the SQS queue, the scheduler, the state service (it's also using etcd as a key-value store), and an API which I can invoke to reach the scheduler. And this is the other one. It should be ready in a minute. Once this is all running, I'm going to use the scheduler API, through a tool included in the Blox code, to basically start managing the application cluster.

So let's see if this is done, or should I wait a little bit? Obviously I should wait a little bit. So let's just wait a second. I can see my Blox cluster here, and the application cluster, which I called web cluster, here. That one is ready, and the Blox one is almost ready. Let's wait for a minute. Do you guys have questions while we're waiting? Yes. Is there any plan, in the future, to have Blox as a managed service? That's a good question. I'm sure there was that discussion at some point. I must say I was extremely surprised: I was at re:Invent and I was extremely surprised to see this announcement. I thought we would keep investing in ECS more at the managed service level. And I'm not saying we're not going to do that; I'm quite sure we will keep seeing features in ECS that are completely managed, off the shelf, et cetera. But we also have this open source project, and I think it's nice. It's an interesting way of doing things, because the open source project can move much faster than a managed service, where shipping a change means updating the managed service for everyone. You have to run a cluster to manage the stuff that manages the other cluster? Yeah, the cluster that manages the cluster. But the Blox cluster is super simple; it could run on a single EC2 instance. And something can take away that EC2 instance? Yeah, that happens. Sure.

So my point is, I think it's interesting to have both: the managed service, which is stable, production grade, and keeps evolving nicely, not too fast, because you can't break production for all of our customers; and an open source project where people who really need the bleeding edge stuff can experiment and contribute. And maybe it feeds back into the managed service. We're almost there on the demo. I think it's interesting to have both. And I'm not an expert on Kubernetes, but I have the feeling this is pretty much what's happening there too: you have Kubernetes, the actual open source project, that moves very fast, and the managed version by Google, which is probably more conservative. So I think it's pretty interesting. So, yeah, the EC2 instances... come on, we should be ready now. More? Yeah, yeah, it's getting there. Okay.
Did you try refreshing the page? Yeah, maybe it's just the console playing tricks on me. Okay, well, we'll just pretend it's done and start launching stuff. Oh no, it's the API Gateway that's slowing me down. All right, come on, API Gateway. One more question? We have time for one question, on anything. Come on, ask questions. Yes. Anyone knows why we have Canadian pizza in Singapore? That's a good question. I don't have the answer to that.

All right, the cluster is ready now. So the first thing I want to do here is get rid of the silly branch. What's it called, Dave? All right. Okay. So I've got a few task definitions ready. It doesn't really matter what I'm going to run on that application cluster; it's going to be NGINX. We don't care so much about that, it's just an excuse.

First, let me show you that thing once again, just to make sure it's perfectly clear. This is the Blox cluster; this is the application cluster. Right now, nothing is running in there. If I show you this one, the web cluster: it has three instances, and it's totally empty, no tasks running there. So what I'm going to do now is call this thing here, which is hopefully the right one. What I'm doing here is invoking the scheduler API on the Blox cluster, telling it: please create an environment, which I call the web environment. This environment is going to run on the cluster called web cluster, which is my application cluster, and it includes this task definition, my NGINX task definition. So basically, I'm telling my Blox scheduler: hey, you can manage this environment, based on this task definition, on the cluster called web cluster. This puts web cluster under Blox management.

But this is just declaring it; nothing is running yet. What's going to make it run (and don't worry, you've got all this in the slides, but it's so much more fun to show you demos), what's actually going to do the trick, is this: create a deployment. Which means: okay, now do your thing. You've got this web cluster under management, do your thing. But hey, what's the scheduling policy here? That's the question you should have asked. There's a deployment, there's a scheduler in that Blox cluster; what's the policy? Does it binpack? What does it do? The scheduler included in this demo is called the daemon scheduler, and basically it runs one, and only one, container on each node, like a daemon: it makes sure that container is running everywhere. And now if I look at my cluster once again, I see three tasks, and I see that indeed each task is running on a separate instance.

So what about adding nodes? I'm adding more nodes to that cluster, and if the scheduler works, try to understand what's happening there: more nodes are going to join the cluster, that goes through the event stream into the scheduler, and the scheduler says: hey, more nodes! Good. I'm the daemon scheduler, I need this NGINX guy to run on every single node, so I'm going to start three more containers, each on a separate instance. So you're adding EC2 instances? Sorry? You're adding EC2 instances. Yeah.
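Yes, and while they boot: in the demo I'm adding instances by hand, but in real life you'd automate this the way we discussed earlier, with an alarm on the cluster's CPUReservation metric driving a scaling policy on the Auto Scaling group behind the cluster. A hedged sketch; the group name, cluster name, and threshold are all invented.

```bash
# A simple scale-out policy on the cluster's Auto Scaling group.
POLICY_ARN=$(aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-cluster-asg \
  --policy-name add-capacity \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 2 \
  --query PolicyARN --output text)

# Fire it when reserved CPU on the ECS cluster stays above 75%.
aws cloudwatch put-metric-alarm \
  --alarm-name web-cluster-cpu-reservation-high \
  --namespace AWS/ECS \
  --metric-name CPUReservation \
  --dimensions Name=ClusterName,Value=web-cluster \
  --statistic Average --period 60 --evaluation-periods 2 \
  --threshold 75 --comparison-operator GreaterThanThreshold \
  --alarm-actions "$POLICY_ARN"
```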
So, hopefully, the new instances will start; just give them a minute to join. Yeah, that should work. We can check that out. Meanwhile, maybe I should show you the project for a second. There's a GitHub page for this, which is here. You'll find the demo CLI tool that I used to send scheduling commands, the cluster state service sources here, and the daemon scheduler here. It's all open source. You can check it out, you can send your pull requests, you can create issues, you can yell at the maintainers. It's all there.

No, no, there's nothing like that. I mean, obviously you can take your images and run them on ECS, and you can reuse your Docker Compose files, we support that. But I would say that's it. The common things between the different orchestrators are the Docker images and the Compose files.

So I've got my six instances here, and of course I have my six tasks running. This is a super basic strategy, the daemon scheduler, but again, it's all open source, so you can go and learn from it and build something more complex. And sure, I could have done this using a service and the distinct instance placement strategy. So this is really heavy machinery to get to a fairly simple result, but it's a demo, a simple demo, and again, it's open source and you can start from it. I would say for 90% of the people, you probably won't need this. But keep in mind that if you have custom strategies, and when you get to 70, 80, 100 nodes, things get complicated, and then you have to do this kind of thing.

All right, so we saw all that stuff. All the commands are in the slides; if you want to replay the demo, you can, nothing hidden. These are the Werner Vogels articles I mentioned, a pretty good read. That's the Blox URL. And there's a page with all the ECS videos and demos from re:Invent, so check them out; there's lots of good stuff there. Well, I'm done, so thank you very much. If you have questions, I'm still here. Thank you, and really cool to be in Singapore. Thanks. And of course, I need to put you guys on Twitter.