I'm a systems engineer working at LIFX, basically building the cloud services for our internet-connected bulbs, essentially putting the internet in the internet of things. About this talk: this is meant to be how we do things and our experience. It doesn't mean that you should do this; you should check first and see if it's appropriate for you. Just because it worked for us doesn't mean it works in everybody's situation, but you might be able to take parts of it, particular bits, or modifications, what have you.

So step one of building a cloud infrastructure is writing something to run on the cloud. As a sysadmin you may not get input into this part; then again, you might have a lot of input. At LIFX I work right next to the developers, and there's a constant back and forth, with them asking me what I think of particular things and vice versa. In particular, microservices are a very popular thing right now. When I first heard of microservices in my last job, it almost blew my mind that I had to run 40 different services just for one tiny little website. But it is a design pattern that works very well with continuous delivery, and if your company is practicing continuous delivery, which has many benefits, then chances are that's something you want to try.

So when you're developing microservices, try to keep as much state as possible outside your application; keep it in a database or something like that. Try not to make them too small: if your app only has one endpoint, it's possibly a bit too small. And don't make it too big, otherwise it's not really a microservice. When you're deciding where to put the breaks between your applications, try to think about which bits you would replace. So if you have an authentication part, that might be something you'd migrate to a different way of doing things later, like social authentication. That sort of stuff is the sort of thing you might want to factor out later.
You want each bit to be independently deployable. If you have to deploy four things atomically, that's a much bigger problem than deploying one thing independently. And think very carefully about circular information flow and dependencies: if you take down a service that's part of a circular dependency, and the only way to get that service up again is to get another service that depends on it up first, you're going to have a really bad day.

For all the things I talk about, I've got the Gartner hype cycle and where I think that thing is on it. I think microservices are well and truly just before the plateau of productivity. I think we're starting to use them a lot more now, and I think they're actually showing a lot of benefits.

So the next thing: you've built all your applications, and you have to package them up. Packaging basically means that your dependencies are either included with the application or specified in such a way that they can easily be installed with the application. You want your packaging technique to be small and preferably cacheable, so that you don't have to fetch every single little thing. You might want it to support multiple versions on the same machine, which some packaging mechanisms don't support. And preferably you want to use the same packaging mechanism in your dev environment as you do in your production environment.

The way we do that is we use Docker. Docker is basically a whole bunch of file systems layered on top of each other, and then a bunch of technologies to run that as a container. Because it runs as a container, it isolates your applications from each other, and you can run a local Docker registry to increase the speed of loading those files. The devs can run it in development on their own laptops, and you can run it in your staging environment and all the way to prod. And if you use the same Docker images, then there's less changing between each stage.
Which means you get less of "it worked on my laptop, I don't know why it doesn't work in prod". And the other thing: Docker has a minuscule performance degradation compared to spinning up a new VM for each microservice. That said, Docker is probably just past its peak on the way back down. I don't know about the future of Docker; there are a lot of alternatives spinning up right now. But I think containers are a go. I don't know whether or not Docker will make it. It'll be interesting to see what happens in this space.

Third step: deployment. It needs to be as fast as possible. If you've just deployed some buggy code, you want that code out as fast as possible. Whether that means you deploy a new version without the old code or you roll back, it basically has to be fast. And the faster it is, the more often you can do deployments: once every minute, once every hour, something like that. You want your deployments to involve minimal human interaction. The more time someone spends deploying something, the more they start hating their job, because it's always the same thing over and over again. If there's minimal interaction, then there's less reason to write release notes, less reason to have change advisory boards, and less reason to have all the other process that comes with that. And the other thing is that if it's fast, you recover from failures really fast.

The way we do that at LIFX is we use Mesos and Marathon. Mesos is basically like a game of Tetris: you have a cluster of machines, Mesos gets a whole lot of jobs from other things, and it decides where to put them based on where they fit. Marathon is one of the clients of Mesos. It basically coordinates long-running jobs, telling Mesos, "hey, I have this long-running job, I need you to run it for me somewhere". And occasionally Mesos will say, "hey, this job just died", and then Marathon will go, "well, I need you to spin it up somewhere else".
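To make that concrete: a long-running Marathon job ends up described as a JSON application definition. A minimal sketch might look like this (the id, image, and resource numbers are made-up examples, not LIFX's actual config, and the exact container schema varies between Marathon versions):

```json
{
  "id": "date-printer",
  "cmd": "while true; do date; sleep 10; done",
  "cpus": 0.1,
  "mem": 64,
  "instances": 2,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "busybox" }
  }
}
```

Marathon reads the resource requirements (`cpus`, `mem`) from this and offers the job to Mesos, which finds a slave with room for it.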
Apps are specified to Marathon as JSON application definitions. So basically the deployment procedure for a Marathon Docker app is to push the Docker image to your registry and then submit a JSON file to Marathon. That's all you have to do; it will go and handle the deployment, switching everything over. And it also handles, as I said, task failures: if something dies in Mesos, Marathon will tell Mesos, "hey, go start another one". And Marathon now has the ability to do health checks, so even if the task hasn't died, you can detect that it's not working correctly, kill it, and start a new one.

That said, Mesos and Marathon are fairly new. I believe that Mesos, Marathon, and the other coordination technologies around containers are coming along very fast, and I think you'll be hearing about them a lot very soon. But they might not be for everyone just yet.

For extra credit: everybody has those tasks that have to run once every day, or once every week, or once every night, or once every minute, and generally you want to schedule those sorts of things. Cron works, but it's not really HA, and there are some HA crons, but they're a bit complex to install and set up. If you already have a Mesos cluster, you can use something called Chronos. Chronos is basically another client for your Mesos cluster. When the schedule comes up, it submits the task to Mesos, Mesos picks somewhere to run it, and then your task runs. If the task failed, it can restart it for you and keep trying to run it until it succeeds. It uses ISO 8601 intervals to specify schedules, so you don't have to remember which asterisk is the day and which asterisk is the month and what you have to divide by to get particular things. That way you can also use your spare capacity: your cluster might have a bunch of spare capacity, and you can run extra tasks in that time. It also handles job dependencies.
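As a sketch of both of those features, a scheduled Chronos job and a job that depends on it might be defined like this (the names, commands, and owner address are invented, and the exact fields vary between Chronos versions):

```json
{
  "name": "nightly-report",
  "command": "/opt/jobs/build-report.sh",
  "schedule": "R/2015-01-01T02:00:00Z/P1D",
  "epsilon": "PT30M",
  "owner": "ops@example.com"
}
```

```json
{
  "name": "email-report",
  "command": "/opt/jobs/email-report.sh",
  "parents": ["nightly-report"],
  "owner": "ops@example.com"
}
```

The schedule `R/2015-01-01T02:00:00Z/P1D` reads as "repeat indefinitely, starting at 02:00 UTC, once per day", and `epsilon` is how late the job is allowed to start; the second job has no schedule at all and just runs whenever its parent succeeds.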
So if you have a task that has to run, and then when that succeeds you need another task to run, and then when that succeeds you need these five tasks to run, and if two of those succeed you need another one, you can basically specify all of that. And it does record statistics on the time it took to run particular jobs. That said, Chronos is even newer. It comes from Airbnb. The code is not perfect, but it's getting there. We've had a bunch of problems with it, but we're still using it, and it's still working the majority of the time. It's just something to watch: I wouldn't suggest not using it, but I would suggest being careful and probably having some backup mechanism.

So in summary, this is what it looks like to run a cluster. Those two boxes, the Mesos master and the Mesos slave, you'll run multiples of: you want an odd number of masters (three, five, seven), and you want as many slaves as you need capacity for. Basically you have ZooKeeper up the top, deciding which master gets to play the game of Tetris. You have the Mesos master talking to the Mesos slaves and deciding how to play the game of Tetris and what moves to make. You have Marathon and Chronos telling them what pieces are coming, how big they are, and what shape they are. And then the Mesos slave uses Docker to launch containers, although the Mesos slave can run stuff without Docker instead, running tasks directly and using cgroups to limit their RAM and how much CPU share they have.

So, in theory, I have a demo on my machine; you can find it at this repo. I've got a Vagrant config and some Ansible configs to spin up a tiny little Mesos cluster. We can see here, this is our Mesos UI. If we have a look, we've got two frameworks running on it: Chronos and Marathon. And if we look here, we have two slaves connected. I've just named them by IP address, because I didn't want to spin up a DNS server as well.
We can see they have one CPU each, 254 meg of RAM, and three gigs of disk. This is Marathon, and normally you could click Create App and create one here, but instead I've automated it so that less goes wrong. So I've put the data in the task data for Marathon. And so this is... why did no one tell me I was making a typo? So this is probably the simplest thing I could come up with: basically it's just launching a busybox container, and it's running date over and over again, every 10 seconds. What you'd do now is make a curl request to submit that to the Mesos cluster, and it would start it. But I'm out of time, unfortunately. If you want the demo, come up to me later, or use the GitHub URL I posted earlier. Thank you.

[MC] Can the next speaker come up? We'll take a few questions while he sets up. Any questions?

[Audience] Hi, I was just wondering how you do service discovery. So you've got, you know, 40 containers running. How do they know about each other?

Yep, so the Marathon repository comes with an example script that basically reads the data from Marathon about where everything is running, builds an HAProxy config, and then applies that on all of your slaves. So essentially, you configure all your services to talk to localhost, and then HAProxy sends them to the right host and load balances them between all the other instances that are running.

[Audience] Is that like HTTP, or is that...

So we don't actually run that; we run a custom version that also does HTTP as well, and it's not too hard to extrapolate from the script you get. Cool. By the way, I've got LIFX bulbs to give away, so come down later and grab one.

[MC] One more question? One more question.

I have one more bulb to give away.

[Audience] Got one yesterday. What are the similarities between that architecture and what Google are doing with Kubernetes?

With Kubernetes? It looks very similar, actually. The two are kind of covering the same ground. It'll be interesting to see what happens with both projects. Cool.
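To sketch the idea behind that HAProxy bridge script: read each app's running tasks from Marathon, then emit an HAProxy block that binds the app's well-known service port on localhost and balances across the task instances. The function name and the sample hosts and ports below are invented for illustration; the real script pulls live task data from Marathon's REST API rather than a hard-coded list.

```python
def haproxy_stanza(app_id, service_port, tasks):
    """Render one HAProxy listen block for a Marathon app.

    app_id:       name of the app (used for the block and server names)
    service_port: the well-known port services use to reach this app
    tasks:        list of (host, port) pairs for the running instances
    """
    lines = [
        "listen %s :%d" % (app_id, service_port),
        "  mode tcp",
        "  balance roundrobin",
    ]
    # One server line per running task, each with health checking enabled.
    for i, (host, port) in enumerate(tasks):
        lines.append("  server %s-%d %s:%d check" % (app_id, i, host, port))
    return "\n".join(lines)

# Example: two instances of a hypothetical "auth" service on port 9001,
# running on Mesos-assigned ports on two different slaves.
config = haproxy_stanza("auth", 9001, [("10.0.0.5", 31002), ("10.0.0.6", 31417)])
print(config)
```

Regenerating this config and reloading HAProxy on every slave each time Marathon moves a task is essentially all the service discovery the cluster needs: clients only ever talk to localhost.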
Okay, thank you. Thank you.