So, about me: I'm a systems engineer at LIFX. LIFX is a startup in Melbourne that is building smart light bulbs, basically reinventing the light bulb as a Wi-Fi controlled device rather than something you have to walk to a switch for. My job is to put the internet into the Internet of Things. We decided early on with our developers to make all the code we wrote ourselves completely stateless, which was really handy because it made everything easy for me to dockerize. Then we started using tools like Mesos, which is basically a constant Tetris game, trying to fit tasks onto the machines in your cluster. ZooKeeper helps Mesos decide who's playing Tetris today. Marathon basically throws the pieces at Mesos. And Chronos is a replacement for cron that runs alongside Marathon, talking to Mesos and running things as well. Because all the services we've written ourselves are completely stateless, we run our databases outside the Mesos cluster, basically the same way you would any other cloud service. So that ends up looking like this. You can see ZooKeeper at the top doing the election. We have the Mesos masters, which Chronos and Marathon talk to. That big box you multiply by the number of masters you need, which has to be an odd number so that you don't get a split brain. That then talks to the Mesos slaves, and in our case everything's dockerized, so the Mesos slave launches Docker containers; you can see those two containers down the bottom. And you multiply that bottom box by as many machines as you need to run all your infrastructure. We've had a bunch of problems with this, which is not surprising. Docker's security was one thing that concerned us at the start. I would put Docker's security somewhere between a chroot and KVM, which is a fairly broad sort of area.
That said, with recent kernels and recent versions of Docker, root inside the container doesn't equal root outside, unlike in a standard chroot. And just one thing: running untrusted Docker images on your machine is a bad idea, and building them is even worse. You should review what you're doing whenever you're working with Docker, just because we don't know what the problems will be yet. So, the ways we worked around this sort of problem. Most of our apps don't run as root inside the container, which means that even if you did break out, you'd end up with a lot fewer privileges. We use a recent kernel, because there are a lot of improvements to the way Docker runs things in recent kernels. In each container, have as little as possible, and make sure it talks to a minimal number of other services, which means that if one gets compromised because of a security flaw in your own application, you can limit the damage it can do. If possible, use one statically compiled executable, as we saw in the Docker talk before. And run SELinux on the host. SELinux can restrict what Docker containers can do. SELinux isn't namespaced yet, so it can't run inside the Docker container, but it can stop people from breaking out and going anywhere. The other problem: finding things. Because Mesos plays Tetris with our cluster and puts things wherever they fit, when you want to talk to a particular web service you have to be able to find it first, and because everything keeps moving around, we have to keep track of it all. Now, Marathon asks Mesos to place your services, so it knows where they all end up. It's just a simple REST call to a JSON service, and it just spits back JSON. But the problem is you don't want to alter every single one of your applications, and if you're running third-party things you can't alter them, just to be able to find all your services.
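To give a feel for that lookup, here's a sketch. The response body below is a trimmed, made-up example of the shape Marathon's /v2/tasks endpoint returns (the app IDs, host names, and ports are invented); in real life you'd fetch it with something like curl against your Marathon host rather than faking it with a heredoc.

```shell
# Fake a trimmed /v2/tasks response; a real one would come from e.g.
#   curl http://marathon.example:8080/v2/tasks
# (app IDs, host names, and ports below are invented).
cat > /tmp/marathon_tasks.json <<'EOF'
{"tasks": [
  {"appId": "/web", "host": "slave-03.example", "ports": [31002]},
  {"appId": "/api", "host": "slave-07.example", "ports": [31417]}
]}
EOF

# Flatten the task list into "app -> host:port" lines.
python3 - <<'EOF'
import json

tasks = json.load(open("/tmp/marathon_tasks.json"))["tasks"]
for t in tasks:
    # Marathon reports the host each task landed on and its allocated ports.
    print(t["appId"], "->", "%s:%d" % (t["host"], t["ports"][0]))
EOF
```

That app-to-host:port mapping is exactly the sort of thing a bridge script can turn into proxy configuration, which is where the next part comes in.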
So, what we did. In the Marathon repository there's a suggested script that basically takes your Marathon state and pushes it into HAProxy. We wrote our own custom version of that because we wanted a few extra fancy things, like HTTP routing, and essentially we run HAProxy on every slave. This means that when you want to talk to something, you just talk to localhost; HAProxy picks that connection up and forwards it wherever it needs to go. This works for us because we run Docker with host-based networking. You can also run Docker in bridge mode, in which case you talk to the local gateway, which ends up going to HAProxy and routing from there. And yeah, we use a custom script that lets us put host names in environment variables, and then HAProxy uses HTTP routing and access lists to send those requests to the right machine. Another problem, if you've got machines all over the place, is collecting logs. Currently, Docker doesn't really have a logging solution. There's docker logs, which you can follow and pipe to something, which is OK. You can mount /dev/log into a container, but if you restart syslog to change where the logs go, the first thing rsyslog does is delete /dev/log and recreate it, so the Docker container still has the old inode mounted and your logs go nowhere. Mesos also does a good job of collecting standard out and standard error for you and storing them, but there's not really an easy way to access them, and it doesn't store timestamps, so you don't know when a particular entry came in. And to be able to correlate logs between lots of different systems, you really need timestamps on them. So, the solution: we basically centralized all our logs by making rsyslog log to localhost, which goes to HAProxy, which then forwards it into our cluster.
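Putting the two HAProxy jobs together, the generated config on each slave might contain fragments like the ones below. This is only a sketch: the service names, hosts, and ports are invented, and in our setup the backend server lines are regenerated by the custom script from Marathon state as tasks move around.

```
# Hypothetical haproxy.cfg fragments (names, hosts, ports invented).

# HTTP routing: apps talk to localhost, the Host header picks the backend.
frontend local_http
    bind 127.0.0.1:9000
    mode http
    acl is_api hdr(host) -i api.internal
    use_backend api if is_api
    default_backend web

backend api
    mode http
    # One server line per Mesos task, rewritten as tasks are rescheduled.
    server api_0 slave-07.example:31417 check

backend web
    mode http
    server web_0 slave-03.example:31002 check

# Plain TCP forwarding for the syslog traffic into the Logstash tasks.
listen syslog_forward
    bind 127.0.0.1:5140
    mode tcp
    server logstash_0 slave-02.example:31900 check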
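On the rsyslog side, the forwarding action might look something like this. The port and queue sizes are illustrative, not our real values; the important parts are forwarding to localhost over TCP, a disk-assisted queue for when the cluster isn't up yet, and dropping rather than blocking when the queue fills.

```
# Hypothetical rsyslog fragment: forward everything to HAProxy on
# localhost, queueing to disk while the cluster comes up, and dropping
# messages instead of blocking once the queue is full.
*.* action(type="omfwd" target="127.0.0.1" port="5140" protocol="tcp"
           queue.type="LinkedList"
           queue.filename="fwd_queue"     # enables disk assistance
           queue.maxDiskSpace="1g"
           queue.timeoutEnqueue="0")      # discard immediately when full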
If you're going to do this, then because the cluster won't be fully up as each machine starts, you want to configure rsyslog to queue messages, but you don't want it to lock up if the queue is full, so you need it to drop messages. We found that most of the time the cluster starts up in time for no messages to be dropped, depending on how much disk space you've configured the queue to have. You can mount /dev/log into the container; the trick is that you have to use systemd. systemd creates the socket for rsyslog, which means that when you restart rsyslog, systemd leaves the same socket there with the same inode, and the containers keep forwarding their logs straight into rsyslog. Then we run several Logstash tasks in Marathon, which receive the events from rsyslog, categorize them, perform any processing that we want, and push them into Elasticsearch. There's a project called Elasticsearch Mesos, which will run Elasticsearch in your Mesos cluster, manage the number of nodes running, and move them around for you. You can also set up a few small nginx tasks running Kibana, and then, ta-da, you have fault-tolerant, decentralized logs. So this is what it looks like eventually: all your containers mount /dev/log, which forwards to rsyslog, which forwards to a random Logstash node, which forwards to Elasticsearch, and then you can read the logs using Kibana. Another problem you'll have if you do this sort of thing is troubleshooting. It's handy to be able to jump on a box and strace a process, but not if you don't know where the process is or can't get inside that container. That said, breaking into a container is a lot easier than breaking out, because you have root access outside. So the way we do a lot of our debugging is to load up the Marathon UI, or use the REST API, to find out where the Docker instance is running, and then use docker exec to launch a shell inside the container.
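That debugging loop looks roughly like the sketch below. The app ID and container ID are hypothetical, and the docker commands obviously need to run on the slave that is actually hosting the task; the runnable bit at the end just demonstrates the underlying point that a container's processes are visible as ordinary processes from the host.

```shell
# 1. Ask Marathon where the app is running (app ID invented):
#      curl http://marathon.example:8080/v2/apps/web-service/tasks
# 2. On that slave, find the container and open a shell inside it:
#      docker ps
#      docker exec -it <container-id> /bin/sh
# 3. Or debug from outside: get the host-side PID and attach to it:
#      pid=$(docker inspect --format '{{.State.Pid}}' <container-id>)
#      strace -p "$pid"
#
# The outside view works because container processes are just ordinary
# host processes; any process can be probed the same way:
sleep 30 &
pid=$!
kill -0 "$pid" && echo "process $pid is visible from the host"
kill "$pid"
```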
docker exec is in newer versions of Docker; on older versions you can use a tool called nsenter, which is a Go executable. Unfortunately, if you've made really small containers, as we have with a few completely statically compiled Go binaries where the binary is the only thing inside the container, you won't be able to launch a shell in there, because there's nothing but your app. You also won't have other tools in there, like gdb. On the plus side, though, most debugging tools will work from outside. If you have pprof for Go, you can expose a port on your application and then just connect to that using pprof and analyze your app as it's running. Java has a similar concept with jconsole, which uses JMX. And the old favourite sysadmin tools, gdb and strace, can always be used from outside the container once you know the process ID of the job running inside. And that's about it. Any questions? Sweet, well timed. Yep, there's one up here. Just interested in the cost of one of these party lights you make. Not quite sure. There's another number, but it's not coming to my head right now. I'll give you a free one for asking a question. Now we should get a second question really quickly, right? What sort of number of devices are we talking about here that the infrastructure's managing? So, the number of machines in our clusters? No, the number of light bulbs. Oh, okay. The cloud architecture hasn't gone live yet, but when I was asked to build this, the number was 100,000, scaling up to a million eventually. We don't run a Docker container for every bulb; we have a bunch of connections that just go to a couple of Docker containers. And what sort of logging volumes? What sort of logging data volumes are you expecting this architecture to handle? I don't have the numbers on me right now; I can't remember them, sorry.
You can have a light bulb as well. And I only have two, so... You mentioned that you didn't really have a good way to get the logs visible after Mesos took them out of the containers, if it was just capturing standard out and standard error. Yes. Have you looked at something like Heka from Mozilla, and then perhaps putting it into something that's actually designed for that kind of log viewing, like InfluxDB? Yeah. So if we do want to get the logs from standard out, our general approach is to run the logger command, which can basically just push lines to syslog, and then we digest everything the same way. I have a question: is there a more elegant way to do the log munging than sending each container to syslog on a base level and then running Logstash on that? It seems a little... Convoluted? Yeah. There are probably other ways. We've found this way to be really stable and really reliable, once we started using systemd for creating the socket. Okay. Yeah. I just wanted to add to that: Heka actually has a few options for just grabbing every container on a host. So if you wanted a more elegant option, you might want to have a look at Heka, because it does have a few options to deal with that. Yeah. And you can also run one of the Elasticsearch log collectors inside the containers if you want, but we want to try and keep our containers as small as possible. Any more questions? Going once, going twice. All right. Thank you, Daniel.