So welcome to the Configuration Management BoF, Faidon Liambotis speaking. If you want to discuss, please come to the front, because as you can see the hall is full, and it would be nice if I didn't need to run around. So if you want to discuss, please come here. Thank you very much. Okay, welcome. We're not very few people, we're many people, so let's just talk about it for a few minutes. So how many of you use some kind of configuration management system: Puppet, Chef, something like that? Okay, and what do you use? Let's see: Puppet? Okay, everyone uses Puppet? Anyone else using Chef or something else? Okay. So I've opened a Gobby document if you want to take notes there: cfgmgmt.

Okay, so how big is your Puppet setup? In my case it's about 10 to 12 machines, I'm not entirely sure, something like that. Okay. From my side, I use Puppet both at work at the Wikimedia Foundation and in DSA, so I manage a lot of machines. At Wikimedia we have around 800 servers, all running Puppet, with Puppet managing exclusively the configuration of the servers. And there's a distinction here: many people use Puppet to manage all aspects of the system, so a machine can be reinstalled and a Puppet run restores everything, while others just do the basic configuration and then do manual things. Both of those approaches have pros and cons.

You were saying, because I was opening the document here? What we use is mainly: we have several servers that are used for doing computations, and they are mostly, well, almost completely configured with Puppet. There are some others that are file servers where just a few minor things are managed; they're not entirely done there. Okay, anyone else? Does that mean that at the Wikimedia Foundation it's completely stateless, what's on the machine itself? When we want to do an upgrade we just PXE-boot the machine and then provision it.
We're doing everything from scratch. A challenge has been what to do with private data. So, for example, we roll out certificates using Puppet as well. Our repository is public, so we also have a private repo that's being mixed into the same Puppet manifests, to be able to include private stuff. How do you actually limit it so that each client can only retrieve what it's allowed to? It's a typical server setup, so we don't do local Puppet. Many people do local Puppet: they do a git checkout and run puppet apply locally instead of running a puppet master, because the puppet master doesn't really scale.

And what's your solution to this, that it doesn't scale? Yeah, many people do git checkouts on all of the servers and then run Puppet locally. The other thing that doesn't scale is exported resources, so basically everything that touches the database. If you don't use exported resources much, then it kind of scales. So at Wikimedia, for example, we have a single powerful puppet master that handles all servers.

Okay, what are the challenges that you're facing? Well, it's a mix of several things. We have VMs and Red Hat machines and CentOS machines, in several versions, and getting them into sync is the most difficult issue that I have right now. But all of that seems to just work for me.

What about other tools? I mean, Puppet was there pretty early. Do you know if there's an advantage to using Chef? I didn't get that. Okay, sorry: there's not only Puppet, there's also Chef. Puppet was there pretty early, so most people, I guess, picked Puppet on the grounds of it being early. Do you know if there are advantages to using Chef? I know that Chef has its syntax in Ruby, which is a good thing for shops running Ruby on Rails and so on. Nowadays Puppet supports that too: you can write manifests in Ruby. You just create a manifest that ends in .rb instead of .pp, and you can use a special syntax and write it in Ruby.
I think Chef scales better, but I'm not really sure; I've never used it, this is just what I've heard. Puppet has a very good collection of ready modules that you can reuse. I'm not doing that a lot. Many people do. They even have apt-get-like tools where you can do puppet module install something, and it fetches it from the internet and executes it, which is a bit scary. I'm not sure if you're reusing existing modules or writing your own? Yeah, I mostly do that. I have written a few modules myself, mainly modules that detect which local hardware we're running on, which particular server, and then, depending on that, pull in the right extra things so we can install the local hardware support packages. But other than that, I don't really use modules. We just use plain... Plain manifests. Sorry? Plain manifests, same for me. Personally, I copy liberally from DSA's Puppet repository.

The only thing that's a bit annoying with modules, I think, is that if you declare a package, then in the whole catalog for one machine it may only appear once. So that requires touching the modules, introducing another layer of abstraction so that you don't double-include packages, which is a bit silly if you're setting it to the same value anyway. There's a solution to that, but it's not very clean. You can do "if not defined, package, then..." define the package, basically. Thank you. But that's not very clean. It could work in scenarios where you have multiple things trying to do the same thing, but it gets repetitive after a while: you'd have to do "if not defined" for everything, basically.

What was your expectation from the audience? Talking about problems and learning about solutions, basically. There's not a very large audience. Do you use anything else than Puppet? MCollective, some remote execution stuff, like Fabric, Salt, all that. Maybe you want to sit here; I'm going to sit next to you. So, we do use exported resources with the database setup.
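The "if not defined" workaround mentioned above looks roughly like this in Puppet DSL (the package name is just for illustration):

```puppet
# Only declare the package if nothing else in the catalog already has;
# this avoids the "duplicate declaration" error when two modules
# both want the same package.
if ! defined(Package['rsync']) {
  package { 'rsync':
    ensure => installed,
  }
}
```

As said, this isn't clean: the declaration that happens to be evaluated first wins, so if another module declares Package['rsync'] with different parameters, the result depends on evaluation order.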
It's a bit of work, but it's fairly useful once it works. You use what? The exported resources thing, where resources are stored in the database. It is useful in that it allows me to generate, what's it called, Nagios config files. So I can just say: if this server has that, then define that exported Nagios thing, and automatically the Nagios config file is modified and Nagios is restarted. So it's already there; I don't have to do much.

So the Nagios stuff in Puppet, the way it works, is really, really stupidly implemented. Yeah, but it works. Up to like 50 machines, maybe, something like that. The problem is that instead of instantiating them in a file, they have a single file, and every time you define one such resource, it scans the whole file to find if it's there and, if it isn't, to add it. When you have a large file of services, for example, that takes a... It does have the ability to specify which file to do that in. Yes, but then some things don't work. In my case it's not an issue because, like I said, I only have 12 machines or something.

So there's something in the Puppet tree, not in Debian, which is called naggen. It's a gross hack, and it doesn't work as it is; I've modified it to make it work, so it's in Wikimedia's repo. It's a thing that connects to the database, dumps all the resources, and then creates them as a file. It's a gross hack. Yeah, that's what I was about to say. But when you have, like, thousands of service resources, Puppet would take around an hour to run the complete cycle, an hour of, like, 100% CPU too. That's pretty bad. Yeah.

And of course, running it once wouldn't even be sufficient, right? Or do you have it so that everything works on the first try? So, yeah. Why wouldn't it work on the first try? I think it stores some information on the puppet master and only retrieves it when it comes back.
I mean, that's obviously the case for SSH keys, for which I also use it. So in the first run it stores it in the database, and only in the second run does it really fetch it again. Yeah, things like that happen, yeah. And the Nagios thing, I had it at my previous job as well, which was much smaller than Wikimedia, and it still couldn't scale. It had, like, 4,000 services, and it was taking, like, half an hour or something like that.

Okay. Anyone want to put a subject on the table? Do you know if the scaling issues are worked on upstream? I think upstream is working on it. I think they're selling a product called Puppet Enterprise, which is supposed to scale. Well, if you use the standard puppet master, you're running the default built-in web server thing (WEBrick), but you can also go with other solutions there which supposedly scale better. I don't know how much of this is true; I've never tried it. But one of the options is that you run Ruby in an Apache module. Oh, yeah. Yeah. I don't know if that helps. It helps a lot. Or mongrels. But the default WEBrick installation really doesn't scale to more than, like, two machines at most. Passenger works better. Yeah, it's much more efficient. And multiple mongrels: I used to run, like, eight mongrels at one point, with nginx load-balancing across them, and this also kind of scales.

What they're working on upstream is a replacement for the database stuff. So up to now they had this thing that connects with Rails to a database, and it didn't scale, so they added a queue layer, puppetqd, to queue updates, and this didn't scale either. So now they've written an alternative, PuppetDB, which is a Java thing that uses HSQLDB or something. No, I think it's in Clojure, actually. Written in Clojure, I think. So it's pretty bad, with the JVM. I don't know. So this is the biggest bottleneck, in my opinion: the database.
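A rough sketch of the exported-resources pattern being discussed, assuming storeconfigs is enabled on the master (the hostnames, file path, and service names here are made up):

```puppet
# On each monitored node: export a Nagios service check.
# The @@ prefix means the resource is not applied on this node, but
# stored in the master's database for another node to collect.
@@nagios_service { "check_ssh_${::fqdn}":
  host_name           => $::fqdn,
  check_command       => 'check_ssh',
  service_description => 'SSH',
  use                 => 'generic-service',
  target              => '/etc/nagios3/conf.d/puppet_services.cfg',
}

# On the monitoring server: collect every exported nagios_service.
# On each run Puppet re-parses the target file per resource, which
# is what makes this so slow with thousands of services.
Nagios_service <<| |>> {
  notify => Service['nagios3'],
}
```

This is the built-in mechanism; tools like naggen exist precisely to bypass the per-resource file scanning by dumping everything from the database in one pass.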
That's why many people don't use it. So there's a guy on IRC saying, and also in Gobby, that there's something called Ansible, which... I've never heard of it. I've never heard of it myself, but it would be faster to get started with, although it's not as mature as Puppet, as he wrote in Gobby. And that's what he said on IRC. I don't know, maybe we should ask him. This is going to be weird. I don't know about it, but it might be useful to mention if it's indeed useful. Apparently it's based on lots of small YAML files. Isn't that what Puppet does as well? Puppet uses YAML and PSON and whatever.

I've heard of another alternative; we're evaluating whether we're going to use it. It's called Salt: S-A-L-T, saltstack.org. It's a mix of configuration management and remote execution. Remote execution is the concept of being able to do dsh, basically, or something like that: push something, but it's not state that you keep in configuration management. And we're planning to use it for that, for the remote execution part, because the alternative is MCollective, for example; Puppet has MCollective, and there are other similar things. SaltStack is written in Python, and it looks quite good, but I haven't tried it yet. And the problem is that if you try to use its configuration management capabilities, then you lose everything you have in Puppet; you have to rewrite everything, basically.

Apparently there's somebody who wants to ask questions. And asking them to ask questions is not the most efficient way. Yeah. There's also a URL for this Ansible thing: ansible.github.com. So... okay. We're talking about our setups, problems, solutions, and so on; feel free to join if you want anything else. Should we repeat the question? For the benefit of the people who watch this later: are there any project-wide plans to integrate configuration management into packaging?
For example, I think it would be very useful if certain packages provided Augeas lenses to make it possible to automate their configuration. That could be useful, I guess. That could be useful, yes. I'm not aware of any project-wide plans. I don't think there are any, but... I don't think so either. Well, apart from what's in Augeas itself, there is this config-model thing that this guy has been working on, what's his name again? Maybe it would be useful to see what we can... I'm not sure if it would be better to provide the lenses in the packages themselves or to put them in the core Augeas package. That would be an interesting divide, maybe. Because Augeas currently works like having a single tree of multiple lenses and tries to do everything. But yeah, that would be very, very useful. I think there's no project-wide plan as far as I know, and the only way for it to happen is if someone starts working on it. That's what I was about to say.

I think it could be useful to have some organized work in that area. But on the other hand, with the config-model stuff there already is organized work on structured configuration file modification slash parsing. So what could be useful is some way of using the config-model data, and then focusing more on the config-model thing, because that's already going and seems to have quite some... I haven't looked much at config-model other than some blog posts on Planet Debian. If it's mature enough and it supports all the things, then maybe we should focus on integrating that into Puppet instead of Augeas. Augeas is not very pleasant to work with anyway, so maybe the solution is some kind of interface between the two.

So, I don't know if it's in the Gobby document, but you mentioned some kind of Puppet module repository. Is there also some kind of package manager? So, there are multiple repositories of Puppet modules. There's Puppet Forge.
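For reference, Puppet already ships an augeas resource type that drives Augeas lenses directly, which is one existing point of integration between the two. A small sketch (the sshd setting is just an illustrative example):

```puppet
# Use the Augeas Sshd lens to change a single sshd_config setting,
# without templating or owning the whole file in Puppet.
augeas { 'sshd-disable-password-auth':
  context => '/files/etc/ssh/sshd_config',
  changes => 'set PasswordAuthentication no',
  notify  => Service['ssh'],
}
```

If packages shipped their own lenses, this kind of targeted edit would become possible for any config file, not just the formats Augeas ships lenses for.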
There's another repository that the Riseup folks have made; Micah is involved in that as well. But I think upstream is trying to consolidate on one. I think they're deprecating one of them because they have their own Puppet module tool, which uses a canonical location to pull stuff from. Let me find it. I think forge.puppetlabs.com is the one. I think.

So there's another question on IRC: whether Debian itself actually uses configuration management. The answer to that is yes, we have a puppet server. Is there something at puppet.debian.org? I'm not sure. The puppet tree is mirrored on git.debian.org; anyone can clone it, see what we're doing, and provide patches as well. That's what I thought.

So there was a puppet-module tool, which was merged as a Puppet face in Puppet 2.7.12. So with newer Puppet you can do puppet module search, install, and so on, apparently. Very nice: puppet, space, module. And I'm not sure where it finds its modules, but it does. It searches puppetlabs.com. I've never used it myself; I've just seen some modules. Possibly the question is also whether it supports Debian. If it what? Whether it supports Debian, and its file layout. I would expect it to. Anything else? We should close here, then. Thanks, everyone, and thanks to the people joining us remotely.

Question? Mostly scalability issues, I'd say. The question was: what are the problems faced by Debian system administrators using Puppet? I'd say mostly scalability issues, and the other problem is Puppet's weird configuration syntax thing: it takes a while to get into the Puppet mindset and write statements that describe what needs to be on a machine rather than what to do.

One more thing. There's some, well, recommendation that you use multiple branches, one for testing and one for production and such. Is this something people actually do?
Or is it... I mean, I do have a branch and a virtual machine that is used for testing, but I don't usually use that; usually it goes into live production immediately, unless it's something that's really changing a lot of my infrastructure. But do you actually use it? So, there's a thing called environments in Puppet. Yeah, that's what I mean. Which has some problems in its implementation. One of them is that it doesn't work if you don't use modules. It doesn't work properly: it kind of works, but it has multiple bugs that are known by upstream, and upstream just says "switch to modules already". If you use modules, then it works. I've tried it. I know that some other people are using it too, like Roth Solberg.

And I know that people do something else, which is really cool: they have post-receive hooks that map each git branch to an environment. So you can basically create a new branch, do some work, push it, and then have it as a different environment, so you can test on one or two or however many clients you want, and if it works, merge into master. This is an interesting way to work, I think, because it's basically feature branches for your infrastructure. We don't use it at Wikimedia because Wikimedia doesn't have modules yet, unfortunately. In DSA, we have a staging branch. Does it work well? Does it work well for DSA? Okay.

So you can actually select it on the client on each run? You can run puppetd, or agent, whatever you want to call it, with --environment whatever. There is some weird interaction with exported resources: they're tagged differently in the database. There's another field that indicates the environment, and if you are realizing exported resources from somewhere else and it's not in the proper environment, you won't get anything. So it might destroy your Nagios configuration, for example. Okay. Thank you all.
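For reference, the environments setup discussed above looked roughly like this in the Puppet of that era, where each environment was a section in the master's puppet.conf (the paths here are illustrative):

```
# /etc/puppet/puppet.conf on the master: one section per environment,
# each pointing at its own checkout of manifests and modules.
[production]
manifest   = /etc/puppet/environments/production/manifests/site.pp
modulepath = /etc/puppet/environments/production/modules

[staging]
manifest   = /etc/puppet/environments/staging/manifests/site.pp
modulepath = /etc/puppet/environments/staging/modules
```

A client then opts in per run with puppet agent --test --environment staging; the branch-per-environment trick mentioned above just has a git hook check each pushed branch out into its own directory under /etc/puppet/environments.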