Now we have the mass deployment BoF with Thomas Lange of the University of Cologne.

Yeah, hello everybody, hello to the viewers of the video stream and on IRC. What I'd like to do is talk about, or rather discuss, mass deployment. There was a talk before this one which covered a special case, the deployment of embedded devices.

Some things about me: I've been a developer for over 10 years, and I've been working on FAI, the fully automatic installation tool, for more than 11 years. But that's not the main topic today. The topic is: what are the general problems during mass installation, this time not especially for embedded devices, but for deploying servers, laptops, and mainly desktops? What are the problems there? Which tools are available? Which tools are you using, and what are your experiences with them? Do we have Debian packages for those tools? And which problems arise when you scale to a very large number of machines? That's what I'd like to discuss with you.

There are some keywords that are often used for mass installation. It's sometimes called provisioning, Linux rollout, or software deployment, so look for those if you're searching for tools that do installation or deployment. There's also the software deployment topic itself, and hardware inventory, which is often part of the installation. What we will not cover here is monitoring. There are products which can install your machine, update it and also monitor it, and which may also keep the user database in some database. The same goes for the life cycle management of your hardware, starting from buying a machine, collecting all the serial numbers and so on, things like asset management. I will not cover these topics. I'm mainly interested in the initial installation and the update of the systems.

What about different boot media? With mass installation you normally do a network installation, but is it possible, and how easy is it, to turn this network installation into bootable media, so you can install sites that are offline at a certain moment from CD, DVD or USB stick? Can you use the same installation method for creating chroots, for setting up your virtual machines, or for building a live image or a bootable ISO image?

Then configuration management. This is a special part of the installation, and sometimes people mix the topics of installation and configuration management together. We can also have a little discussion about whether they are the same or what the differences are. Disaster recovery: something crashed, just reinstall the machine. What I'm also very interested in, and what I have in FAI, is a sort of automatic documentation, so after a rollout I also have documentation of which machines were installed in which way. What about supporting different Linux distributions, or even different OSes? Is this important for you, or can you say, I only want to deploy Debian machines? And sometimes people say, yeah, I have a tool which I can use for deploying desktops, but I do not deploy servers with it, or I cannot deploy notebooks with it.

So these are some topics that I'd like to discuss. Any questions or any comments? Maybe the first question: who has ever done a sort of automatic installation of one, or 10, or 100 machines? I would say more than half of the room. Did you have any problems, and what tools were you using? Maybe someone can just tell us what he's using now and what he's missing. So, two people are using FAI.
For our 300-server setup we had a customized PXE network boot, did a debootstrap from there, and then used Puppet.

Could everyone hear it? Network installation with debootstrap, and then, I guess, some customized wrapper scripts around it. That's what a lot of people are doing: just starting from scratch with PXE network boot, doing the debootstrap, and that's it. A question from me: how do you do the partitioning of the hard disk?

Yeah, sfdisk. We actually deployed just a single partition of standard size on all machines and installed the root fs there. Everything beyond that was managed from Puppet with various commands.

Yeah, Puppet is a good keyword. Puppet is the configuration management part, and for me that's a different part; that's the customization of the machines. So who else did some installations and wants to tell us which tools or which kind of scripts you wrote?

We used to have Red Hat; right now it's CentOS. It is a kind of two-step installation: there is an installation CD-ROM, and the first install does the partitioning and installs Windows for dual boot. In the second step you choose the Linux installation setup, and it basically gets an IP address via DHCP and then does an rsync. That's basically it.

So you are not using the kickstart method from Red Hat and CentOS?

No, no. But this setup is going to die anyway, and that's why I'm here.

Okay. I work for a university, and at the start of every semester we roll out a new image to every desktop machine in the labs; it's about 150 machines. We set up network booting via PXE, which boots a very basic debootstrap system for the installation. We do the tweaking on one machine, because these are uniform machines, more or less. In Linux it's a bit easier because X can auto-configure itself for the hardware, and we do some tweaking on top of that. But mostly it's just netcat and tar for making the partitions and populating the files, and at the end setting up the bootloader. For Windows we use the ntfsclone package in a similar way: we prepare things on one machine and propagate them to the others, and then we join the domain, and that's all.

So for Linux you're also using a kind of image that you put on the machines?

Yeah. The init script is the one which does the installation; it's a special script which fetches these images, formats the hard disk and partitions it with sfdisk.

One question from me: do you create this image manually, or do you have some scripts that automatically create the image that you then deploy to your desktops?

I boot a Grml CD on one of these machines, or this same environment with the network boot; there is a special mode which interrupts the automatic installation, and this way I can upload the image. I just open a netcat on one of the ports and do it manually, with one command.

Okay. But is preparing the images mostly automatic, or is there much manual work you have to do before you can create the image?

Mostly I just do an update and a dist-upgrade on the machine. Okay.
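To make the init-script approach just described concrete, here is a hedged sketch in that netcat-and-tar style: partition with sfdisk, stream a tar image over the network, unpack it, and set up the bootloader. The server address, port, device, filesystem and bootloader choice are assumptions for illustration, not the speaker's actual setup.

```sh
#!/bin/sh
# Hedged sketch of a netcat+tar image install, as described above.
# Server address, port, device and filesystem are invented examples.
set -e

DISK=/dev/sda
SERVER=192.0.2.1   # hypothetical machine streaming the tar image
PORT=9000          # hypothetical port

# One bootable Linux partition spanning the whole disk (sfdisk input:
# start,size,type,bootable; empty fields mean "use the default").
echo ',,L,*' | sfdisk "$DISK"

mkfs.ext4 "${DISK}1"
mount "${DISK}1" /mnt

# Receive the image as a tar stream over the network and unpack it,
# preserving permissions.
nc "$SERVER" "$PORT" </dev/null | tar -C /mnt -xpf -

# Install the bootloader into the new system's MBR.
for d in dev proc sys; do mount --bind "/$d" "/mnt/$d"; done
chroot /mnt grub-install "$DISK"
chroot /mnt update-grub
for d in dev proc sys; do umount "/mnt/$d"; done
umount /mnt
```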
So, one important point for me: I think that doing deployment with these images, sometimes it's called having a master or a golden image, does not scale, because often, in fact almost every time, people create these images manually. Even if you think, oh, all my 100 computers are the same and need the same software, you do a manual installation of one machine, do some customization on it, then create a tarball or some other image. This does not scale, because there will come a time when you say, oh, we have different hardware, or another department needs some other software, and then you create a second image, a third, and so on. You end up with a lot of images that you always create manually. So if you really need images to deploy, which I think is a very bad thing, then at least try to optimize that part so that your images are created automatically. That's very important.

There are three or four flavors of these things. But this mostly means that if one lab has a special NVIDIA card in the machines, the installation process knows this is the special lab, installs one extra Debian package and copies over one or two configuration files. But yeah, it's not that easy...

Philipp Hahn from Univention. We have a Debian-based distribution, our own distribution, and basically we are doing profiles. We have our own install program, which gets a profile with information about partitioning and then installs a base system. Later on we have an LDAP-based management system where you can configure which software you want installed on which computer, and the machine regularly pulls from LDAP which software to install, which software to purge, and what to update.

So you have your configuration management database in LDAP, get all the information from LDAP, and then have some script that customizes your machines?

Yes, and you can also hook in custom shell scripts if you need really special things. And you can also prepare a CD or DVD with a profile included, so you can do installations via DVD or also via PXE.

So for the customization you only need shell scripts, or do you use something like CFEngine or Puppet?

Mostly shell scripts, and we have an extension which is called something like a registry system, where you can set a kind of variables which trigger an update process which modifies the configuration files on the system.

Yeah. Here's a list of some tools, and in the third line, CFEngine, Puppet, Chef, Bcfg2: these are tools that are only for configuration management, and you really have to distinguish between deployment or installation on one side and configuration management on the other. You cannot use Puppet or CFEngine for installing a machine; you can only run CFEngine, Puppet and the other tools if you already have a running Linux system on it. What FAI and some other tools can do is take a computer with only zeros on its hard disk, then network boot it, partition the disk and install the system. Often people then use something like CFEngine or Puppet for the customization part, but CFEngine, Puppet and the other tools are not installation tools; they are only for configuration management.
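To illustrate the LDAP-driven software management just described, here is a hedged sketch of what such a regular pull could look like. The server, search base and attribute names are invented for this example; Univention's real schema and tooling differ.

```sh
#!/bin/sh
# Hedged sketch: fetch this host's package lists from LDAP and apply them.
# ldap.example.org, the base DN and the attributes are hypothetical.

HOST=$(hostname)
BASE="cn=computers,dc=example,dc=org"

INSTALL=$(ldapsearch -x -LLL -H ldap://ldap.example.org -b "$BASE" \
    "(cn=$HOST)" installPackage | awk '$1 == "installPackage:" {print $2}')
PURGE=$(ldapsearch -x -LLL -H ldap://ldap.example.org -b "$BASE" \
    "(cn=$HOST)" purgePackage | awk '$1 == "purgePackage:" {print $2}')

apt-get update
[ -n "$PURGE" ]   && apt-get -y purge   $PURGE
[ -n "$INSTALL" ] && apt-get -y install $INSTALL
```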
Can you get a bit closer to the mic? Where I work we have two kinds of things. We have desktops for the university, about 100 desktops, and they are installed with an image because they dual-boot Windows and Linux. The images are made manually; in the end an image is created on a reference machine and propagated to all the labs using a proprietary software that can replicate Windows and Linux. In the case of Linux, there are configurations. I do my best to put all the configuration into separate packages, so in the end I just need to edit and configure a few files, and my package tries to read the personality of the computer and configure everything. The hardware is similar but not equal, so even if X can detect all the hardware, sometimes there are bugs. So when a machine boots, I try to detect the model of the computer by the graphics card, and a program, switchconf, changes the X configuration files to whatever is needed for that hardware. This is done at boot, so the image can be the same for all the computers; what is different is handled at boot time.

I took this idea to the servers. For servers I have a manual script to follow, which says how many partitions the machine has and their sizes, and I use a meta-package for the role: this is a server, so I have a group of software to install; this is a firewall, so I install more software. In the end I just edit a few files, configure something manually, and I have a machine installed and pre-configured, which is the base for all the servers I have to manage.

So you have a lot of configuration information inside Debian packages, and if you make changes to your configuration you need to create a new Debian package, deploy it and update it. I know a lot of people doing this kind of thing. I would not like to do it, because there is an additional step: you cannot just change the configuration files and deploy them, you have to create the Debian package which contains the new versions of the configuration files. I think it's one step less if you do not put them all in a package but distribute them in another way. That's also possible, and that's what we are doing in FAI.

The problem I see is that if I distribute them in another way, I will have to have a server for distributing those files, while in my setup the only central thing is the local repository, so everyone who wants to manage a server just has to fetch the software it needs.

But if you add some packages, you also need a server to fetch them from, so I think that's no difference.

Yeah, but I think once you have those things figured out, once you have centralized your infrastructure to actually provide central updates and installations, then putting configuration into a package becomes natural, and it's the easiest thing you can do in your infrastructure: you just roll out a package everywhere.

So, some more opinions about putting configuration data into packages? Okay. I think, yeah, I would not do it.
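To make the configuration-in-a-package approach debated above concrete, here is a minimal hedged sketch of building such a package with dpkg-deb; the package name and the shipped file are hypothetical.

```sh
#!/bin/sh
# Hedged sketch: a trivial Debian package that only carries configuration.
# After building, the .deb goes into the local repository and is rolled
# out like any other package.
set -e

PKG=site-config; VER=1.0
mkdir -p "$PKG/DEBIAN" "$PKG/etc/profile.d"

cat > "$PKG/DEBIAN/control" <<EOF
Package: $PKG
Version: $VER
Architecture: all
Maintainer: Site Admin <admin@example.org>
Description: site-wide configuration files
EOF

# The configuration to roll out, placed at its final path.
echo 'export http_proxy=http://proxy.example.org:3128' \
    > "$PKG/etc/profile.d/site-proxy.sh"

dpkg-deb --build "$PKG" "${PKG}_${VER}_all.deb"
```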
Here's a short list of image-based tools; I already said I'm really not a friend of those things. One other problem I see: if you just write a simple shell script and do some sfdisk things, you will have problems in the future, because sfdisk only supports MS-DOS disk labels, no GPT, so with really huge disks this will not work. The other part: if you'd like to have a software RAID or LVM configuration, does anyone do this with some sort of automation or scripts? Or is the normal procedure that you boot an installation system, get a shell where you run cfdisk and your usual commands, and then maybe some automatic part runs afterwards?

Yeah, so at work they use it to manage the desktops for all the developers. Essentially they use, thanks to some work that Phil Hands did, the auto-installer magic combined with some preseeding and the automatic mode, and that really works. The IT department, when I first got there, were entirely Windows users, so we basically gave them a Debian CD, told them to choose automatic mode and hit enter on a networked machine, and it just installs a system for them: it puts in all the developer tools and the toolchains they need, puts everything under LVM, and the rest of it. So that does work quite well for them, but we do still put all the configuration in one package as well.

Okay. So, does anybody think that the network may be a bottleneck during deployment of desktops? Do we need multicast deployments?

I thought Clonezilla uses multicast to speed up installation.

Yeah, maybe it does, but image-based distribution, I think, is not good even if you use multicast. And people have to pay attention: multicast is not as easy as using the normal network. For example, I heard of people using a multicast protocol who had really dumb network switches that could not really do multicast, but sent the packets one after another to every port. So I always say: keep it stupid and simple. I know a lot of really big, huge installations which have a good network, and they never encountered network problems. For example, in FAI we're using NFS during the installation, and people always say, oh, you will have network problems because NFS does not scale, but we never have problems. We only use it read-only, without file locking and so on, and it keeps showing us that it works. So don't be afraid that you will have network problems.

If you're going to redeploy an entire classroom in a school environment, you usually do not have expensive network hardware or expensive servers, and there you really do need to use multicast or broadcast (some legacy software supports that), basically to not saturate the server uplink.

Yeah, if you saturate the server uplink, that is in my opinion maybe perfect, because then the server is using all its bandwidth to give the packages to the clients, so in my opinion this is no problem. And if you have computer classes and need to reinstall 20 or 50 machines after each class, I would say just buy a 10-euro network card that is only used for this, and during the installation this network card of the server can be saturated. Why not?

Yeah, but it's still a time problem. When I was doing this it was common to have 100-megabit switches, and then you only have so much bandwidth and so much time, so it was a requirement basically. This probably has changed, but still.

So, other questions? What could be a problem, or what do you wish for from a perfect installation and deployment tool?
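Returning to the preseeding mentioned above: the "auto-installer magic" boils down to a d-i preseed file. A hedged excerpt for fully automatic LVM partitioning, using keys from the stock partman preseed examples (the atomic recipe puts everything into one volume group; exact keys can vary between Debian releases):

```
# Wipe existing LVM, partition /dev/sda automatically, everything in LVM.
d-i partman-auto/disk string /dev/sda
d-i partman-auto/method string lvm
d-i partman-lvm/device_remove_lvm boolean true
d-i partman-lvm/confirm boolean true
d-i partman-lvm/confirm_nooverwrite boolean true
d-i partman-auto/choose_recipe select atomic
# Answer the remaining partman prompts without asking.
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
```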
I suppose, just going back to the multicast question: what we actually produce is IPTV set-top boxes, and when you're deploying, say, a million of these Linux-based boxes, then you need to use multicast. But one of the things I would say is that unless you're dealing with something on that scale, and in fact even when you are, unicast scales a lot better than multicast. Unless you really, really know what you're doing, you end up with all sorts of interesting bugs, and if you don't know what an IGMP query or election is, then you really don't want to go anywhere near multicast, because it will create all sorts of problems. Even in some of our other products, the larger, more traditional distribution-based ones, we're still just using HTTP, and that works really quite well. Multicast will cause a lot of pain unless it's a very specific use case that you're after.

And I also think, if you're doing a deployment, the huge amount of data that is transferred via the network is the packages; everything else is just peanuts. So it's very easy to set up a second, or maybe ten, HTTP daemons on different machines which just provide the packages, and that scales very well. So if you're doing a larger installation, just try it: if one server is not enough, maybe I just use three or four servers for the packages.

A comment on this: caching proxies?

I would just go with maybe another mirror of the packages; that's maybe easier. apt-cacher and approx, for example: I know people using caching setups like this sometimes have problems. But if you take a simple HTTP proxy and you know how to configure it, and it works, fine, just do this before starting with unicast, anycast, or some very complicated but cool network protocols. Torrent, or maybe if we had apt over torrent working just out of the box, yeah, I would also give it a try.

We use multicast for distributing the image for the desktops, and the bottleneck is the quality of the switch. Where there are new switches and the servers are dedicated, with fast gigabit for the VLAN where the clients are, you can distribute an image of 80 gigabytes in 3 hours. If you go to a part of the building with bad switches, the time goes to 10 hours or worse, or instead of using multicast you may even need to switch to unicast. So our problem is not software; the software is good enough, the multicast daemon in Debian for the routers is good enough. The problem is the switch: the switch needs to understand the multicast protocols.

Yeah, okay. I think for the network topic we know how to solve things. So, are there other things? I'd like to switch back to my slides. What about creating virtual machines or chroot environments: are there people that also have some sort of scripts or automatic building for those? Do you use the same tool for installing your desktops and your virtual machines and creating your chroots, or do you say, oh, building a chroot is just calling debootstrap, then jumping into the chroot and calling my favorite editor? What about one tool for all of them, chroots and virtual machines?

For development we have created virtual machines using our regular installer, but we save these regular installations and clone images from them, because it's just fire and forget: clone a new image, start it, it's running in a couple of seconds, you do your work, and when you're done you just delete it.

So for software development, using disk-based images is really nice, but only if you create your disk-based image with an automatic installation tool, not if you do it manually.
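For the "building a chroot is just calling debootstrap" case from the discussion above, a minimal hedged sketch; the target directory, suite and mirror are placeholders. The last line covers the "fire and forget" cloning of a prepared VM disk image.

```sh
#!/bin/sh
# Hedged sketch: create a pure Debian base system in a directory, then
# jump in and customize by hand. Paths, suite and mirror are placeholders.

TARGET=/srv/chroots/dev
debootstrap stable "$TARGET" http://deb.debian.org/debian
chroot "$TARGET" /bin/bash

# The "fire and forget" VM case: clone a prepared golden disk image,
# boot the copy, and delete it when done.
cp /srv/images/golden.img /srv/images/clone01.img
```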
Yeah. So everyone else is just doing manual installations of their virtual machines and then copying the disk images? Does anybody have some sort of partial automation?

It's a slightly different emphasis, because we're actually doing embedded devices, but we're still doing a mass deployment with an automated installation mechanism. It's a bit lower-level: we're doing it as a binary blob onto the NAND, so we prepare the image and write it directly onto the flash. It's a different kind of mechanism, but it's still a mass deployment. On the server side we generally do it by hand for the actual chroots and things that we've just set up.

Well, one question I have: when doing an installation on embedded devices, I think it would take a long time if you did the normal installation and not an image-based thing.

Yeah. So we do the majority of the actual preparation on an AMD64 box or something with a big, fast hard drive, and then there is a slow stage where you've actually got to configure on the device. But you do that once, you put the image on the server, and then as you go through production it's just a lump of binary and you just flash it straight onto the device.

So the bandwidth of the local drive is not that bad, but if you did the normal installation, the slow CPU would be a problem, if you called dpkg or apt-get for every package.

It's not just the CPU; the CPU isn't that bad. It's writing to the NAND storage that takes the time: reading from the storage is fast, writing to it takes a long time. If you've got several hundred megabytes to actually write individually in the way dpkg does it, which is to unpack it, check it, move it around, delete it, then it's just going to take forever.

What we are doing in FAI, when we call the debootstrap part, and also before installing additional packages, is that we put a RAM disk onto /var/lib/dpkg, and this speeds up the installation a lot, because during the installation a lot of temporary files are created there. For a desktop machine we need about 90 megabytes of RAM disk. If you put everything into a RAM disk, then install the additional packages, and at the end put the content of the RAM disk back to disk, this speeds things up a lot.

We haven't got that much RAM; it's an embedded device.

But maybe you could swap.

There is no swap. The other thing is that you don't necessarily need debootstrap, which is focused on preparing a chroot on a system other than Debian and making sure it's really only the pure base system. If you use other tools to create your rootfs, then you can put all the packages you need for your base platform into the first build of the rootfs on the fast hard drive, and then you've only got a handful of packages to install on top. We use that with Emdebian and with multistrap, and we do the whole thing in one operation; it creates the device nodes as well, if you need them.

I think embedded devices are really a special case where you cannot use the normal deployment tools, except in the sense of: I create an image and I put the image onto the local disk. So, other questions?
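A hedged sketch of the RAM disk trick described above. FAI implements something like this internally; this standalone version is a simplification, and the target path, the tmpfs size and taking the package list from the script arguments are assumptions.

```sh
#!/bin/sh
# Park dpkg's bookkeeping in RAM while many packages are installed,
# then write it back to the real disk at the end.
set -e

db=/target/var/lib/dpkg
saved=$(mktemp -d)

cp -a "$db/." "$saved/"                   # save the dpkg database
mount -t tmpfs -o size=200m tmpfs "$db"   # RAM disk over /var/lib/dpkg
cp -a "$saved/." "$db/"                   # restore the database into RAM

chroot /target apt-get update             # install many packages; dpkg's
chroot /target apt-get -y install "$@"    # temporary files now live in RAM

cp -a "$db/." "$saved/"                   # capture the updated database
umount "$db"                              # drop the RAM disk
cp -a "$saved/." "$db/"                   # write it back to the real disk
rm -rf "$saved"
```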
What I'd like to show you is a list of users that have been using FAI. As I already said, we have been doing this for more than 10 years, and what's very nice is that we have a lot of experience with very different hardware architectures; there were even some installations on Itanium and PowerPC in the past. Maybe it's a little bit too small to read. We also have people that are using FAI for installing Debian machines on IBM mainframes. So I'm a little bit proud that during this time we have proved that FAI is very flexible, even on very different or complicated hardware architectures, and that it can be used in very different environments. It could be just a lab that I want to install, or, for example, in the second line, the LVM insurance: it's a German insurance company, and they deploy, I think it's not 10, maybe 12,000 machines, spread all over Germany. They have very low bandwidth, so they would like to have something like an apt delta or dpkg delta, so that they only have to transfer a very limited amount of package data to their machines, mostly notebooks, over some time; and once they have all the packages on the local machine, they do the reboot.

A question from IRC: is it possible to automatically handle LVM partitioning when doing deployments like this? Can it be preseeded, or handled through another mechanism, for example that a given logical volume takes 50% of the disk space?

Yeah, I'm not sure if d-i can do this. What I can say is that doing the partitioning with d-i preseeding is in my opinion horrible, because it's not very readable; it's very hard to preseed the partitioning, and with very complicated partitioning setups I think it will also not work. What I can show you is what we are using in FAI. It's like an extended fstab, so I'm pretty sure everyone that takes a look at it can just write down such a configuration. We've written a tool in Perl which parses this input and then creates the parted commands which create the partitions, the file systems, LVM, software RAID and so on. This tool is also available as a separate Debian package, and you do not need to use the whole FAI environment, so if you'd like to use just this tool for partitioning, it's available and very nice to use.

There's a question from Peter: if it is available and very nice, why don't you make a udeb of it, so it can be used from the Debian installer? So the question was: if this tool is available, why don't we have a udeb? Right. I'm not sure, is Perl available during d-i installations? Okay, we would need Perl and parted. And there was the question whether it's also possible to say, try to get 50% of the whole disk for a partition. This is what we can do with our tool: you can have a minimum and a maximum size, and sure, it's also possible to define a percentage of the whole disk.

Five minutes left. So, other comments? What do we need, or do we have all the perfect installation tools that solve all our problems? Was everyone happy with d-i, with your installations? So if there are no more questions, then thank you very much for the discussion, and if anybody would like more information about FAI, I have some flyers here, or you can just ask me later. Thank you.
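For reference, the "extended fstab" disk configuration described above (FAI's setup-storage format) looks roughly like this; treat it as a hedged sketch, since the field details and device naming differ between FAI versions:

```
# Hedged sketch of an FAI setup-storage disk_config with LVM.
disk_config disk1 disklabel:msdos bootable:1
primary  /boot  200     ext3  rw
primary  -      4G-     -     -

disk_config lvm
vg vg1        disk1.2
vg1-root  /      2G-10G  ext3  rw
vg1-home  /home  1G-50%  ext3  rw   # grow to at most 50% of the disk
vg1-swap  swap   1G-2G   swap  sw
```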