So good afternoon, everybody. Thanks for being here. This talk is titled "Running a Small to Mid-sized Enterprise on Debian". But because I'm notoriously bad at naming things, I would add a subtitle like "Managing Debian Across the Whole Fleet". So just a couple of words about the presentation. It's the success-story kind of thing; it's mostly what we do at my day job. So I'll make a short introduction, and then I will try to cover aspects of installing, managing configuration, managing packages, and getting your people involved. A quick introduction: I'm Apollon, also known as apoikos in Debian. My day job is Head of Infrastructure at Skroutz.gr, which is what we'll be seeing in a minute. I've been a Linux user for a relatively long time and a Debian admin for more than 10 years. I started contributing to Debian a while back, and I finally became a DD two and a half years ago. Nowadays I do mostly packaging work, and most of my packages are server-oriented, but I also act as a local DSA contact for Debian machines hosted at GRNET. So, Debian across the fleet: a success story. The company is called Skroutz. That's a funny spelling, but that's how most Greeks would actually spell "Scrooge" using the Latin alphabet, so it's pretty intuitive when you're Greek. It's basically a product search and comparison engine, a search engine for products available online, and by many rankings it's the most visited Greek website right now. We average about 600,000 visitors daily and 5.5 million unique visitors per month; that's almost half the country's population. And the whole company has about 150 employees, all in Greece, which actually makes it a mid-sized enterprise, let's say. Our infrastructure currently consists of about 85 physical servers, and some of them run around 280 to 300 KVM virtual machines, all managed by Ganeti. They're all dispersed in three physical locations, colocated in different data centers.
We strive to have a redundant infrastructure with many high-availability features. And what's actually the thing with small and mid-sized enterprises is that you can't have huge teams of people doing things. So currently there is a team of four system administrators doing operations and infrastructure work, plus one more doing office IT support. So what do we use Debian for? The answer is almost everything that can run it. We run our production servers on it. We run our routers using Debian. Many developers run Debian on their workstations or laptops. And we use Debian for the majority of the non-technical staff workstations as well. This also includes some Raspberry Pis that we have connected to televisions, for dashboard displays. And I'm probably forgetting a couple more things, but I mean, it's basically everything. As far as I know, we just don't run Debian on our switches yet. So when it comes to servers, we're running a pretty modern full HTTP stack, including HAProxy, Varnish, nginx, and so on. Our main application is a Rails application running on top of Unicorn. As I said before, we use Ganeti for virtualization cluster management for our KVM virtual machines. And apart from the website, which is our main service, we also run full-scale supporting core infrastructure, including things like DNS, email, LDAP, RADIUS and all kinds of monitoring, which quickly adds up to a big number of machines. Everything is managed using Puppet, and we use Debian packages for practically everything, apart from the web application itself, which is deployed using Capistrano. Now, when it comes to routers, this is something I personally don't see very often, but in our case our core routers are really pairs of redundant 1U servers with a bunch of gigabit interfaces. We use the BIRD routing daemon for BGP and OSPF: BGP with our upstream peers and iBGP internally, plus OSPF for our own routing domain.
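A minimal BIRD sketch of such a setup might look like the following; all AS numbers, addresses, interface names and filters here are made up for illustration, not taken from the talk:

```
# /etc/bird/bird.conf (illustrative only)
router id 192.0.2.10;

protocol bgp upstream1 {
    local as 64496;                    # our hypothetical AS
    neighbor 192.0.2.1 as 64511;       # one upstream provider
    import all;                        # in reality: filter bogons, defaults etc.
    export where source = RTS_STATIC;  # announce only our own prefixes
}

protocol ospf internal {
    area 0.0.0.0 {
        interface "eth1" { cost 10; };
    };
}
```

The point of the talk stands regardless of the exact filters: this is a plain text file on a plain Debian server, so it can be templated from Puppet and Hiera like any other configuration.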
On the client side, we ensure things are running smoothly using keepalived, which does VRRP, so the gateway IP addresses in our network are all floating between two machines. These machines also do stateful dual-stack firewalling on the border of the infrastructure, both IPv4 and IPv6, using ferm, which readily offers this capability of writing dual-stack rules. We also use conntrackd to replicate the firewall state between the machines, so if the active router actually goes down, the backup replica can continue serving already-established connections without an issue. The busiest of those machines routes about 1 gigabit per second of traffic. The network's not very big, but we still have three distinct locations, three data centers, five different uplinks from two different upstream providers, plus one internet exchange that we're connected to. So it's really nice to have your router behave like the rest of your infrastructure, be able to manage it using Puppet, and even have things like your BGP peer data in Hiera. Plus, when it's exactly the same as the machines you're working on, you can get rid of SNMP and do monitoring like a human should: using actual scripts to pick whatever data you want out of the system. Now, as I said before, there's also a number of workstations in use in our company. We have different uses, and we have both technical and non-technical users. Technical users, that is engineers and developers, usually get a laptop with full disk encryption pre-installed with Debian, and that's it; they can do their own support from then on. For non-technical users, we mostly use desktop computers, and we manage them using Puppet as well. They all run GNOME as the desktop environment, but we have gone as far as adding GConf and dconf settings to Puppet, so that we can ensure uniformity across this fleet of desktops. Now, let's go on to bootstrapping.
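Before moving on, a rough illustration of the dual-stack rules mentioned above. ferm's `domain (ip ip6)` block expands one rule set into both iptables and ip6tables; the ports and policy below are illustrative, not the actual ruleset from the talk:

```
# ferm: one rule set, rendered for both IPv4 and IPv6
domain (ip ip6) {
    table filter {
        chain INPUT {
            policy DROP;
            mod state state (ESTABLISHED RELATED) ACCEPT;
            interface lo ACCEPT;
            proto tcp dport (ssh http https) ACCEPT;
        }
    }
}
```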
How do we actually start a machine's lifecycle in the company? We use Debian installer preseeding across the fleet. There are a number of ways we get the Debian installer running: for servers and workstations that's mostly PXE booting off the network; laptops nowadays don't have Ethernet ports, so we boot them over USB; and we also use the Debian installer to install virtual machines, which I'm going to say a bit more about later. The aim here is to have a completely unattended installation for most classes of systems, and just bring the systems to a point where they can run Puppet. We don't want to answer the Debian installer's prompts. And we have succeeded so far; I mean, we wrote the first preseed file back in the squeeze release cycle, if I recall correctly, and it has been running smoothly ever since. The only bad part, let's say, is that partman recipes could be a lot better. I don't know how many of you have ever tried writing a full preseed configuration; probably the hardest part is to get partitioning right. So one might ask, why do we use the Debian installer for virtual machines? There are tons of images, and yeah, actually we were using full images ourselves before. It's just that full images need to be kept up to date, and that's additional work: you have point releases, you have security updates, and somebody has to actually go and update that image so that it's the latest and greatest. It's not fun to get your virtual machine up and running only to discover that it needs another 500 megabytes of updates and another reboot. Another factor is that care must be taken to actually strip sensitive data out of those images: SSH host keys, random seeds, unique identifiers and so on. I heard a story from an ex-colleague about their own infrastructure: at some point they found out that their whole fleet of virtual machines simply had the same ECDSA host key everywhere.
Because they had a script in place to just strip the host keys out of the master image, but it was only RSA- and DSA-aware. So with jessie, when we started having ECDSA keys as well, the script did what it should, but it didn't strip the ECDSA key. For us, using the Debian installer just solves all of the above. So we threw together a Ganeti OS provider, which is actually a set of scripts that Ganeti calls to provision a newly created virtual machine with an operating system. All it does is spawn an ephemeral KVM instance running the Debian installer with the preseed configuration, actually with the URL to the preseed configuration. It then captures and logs all the output, and it will abort if a prompt appears, because the whole thing is supposed to be non-interactive; there is no way in Ganeti to interact with the installer at operating system installation time. Using a local apt cache and tricks such as write-back caching to speed things up a bit, the installation time has come down to approximately two minutes per instance, which is something we are willing to pay. We're not a public cloud; we're just creating a couple of virtual machines a day. And after finishing, everything is fresh, everything is the latest and greatest, and there's no need for additional reboots. This is something that I also intend to package for Debian. Obviously it is a Debian package already, but I need to strip out all the site-specific stuff and make sure that it can work without expecting the preseed file to be at a globally reachable URL of some sort; you should be able to feed it a configuration file and that's it. Once this is done, I intend to upload it to Debian as well. Now, after everything is installed, we obviously need to continue managing it. Like many others, we are using Puppet, which is one of the most popular solutions these days, but it could have been Chef or Ansible or whatever else you'd name.
So a configuration management system nowadays is essential for maintaining anything more than a bunch of machines, but people tend to abuse it. For me, configuration management must augment the package manager, not override or replace it: the fact that you can ship arbitrary files to the systems doesn't mean that you actually should. So what can you do to write Puppet manifests that play nice with Debian? In our case, we follow a simple set of rules that seems to make sense with Debian, and our modules play well enough. The first one is: drop configuration files in configuration directories, if possible. There is no reason to go and manage /etc/apt/sources.list; just drop a file with your own repositories in sources.list.d, since the distribution actually provides a way to do this. The second is that, in order to allow for local administrators on each machine, who may not always be the ones that control Puppet, we create some exclusively-managed snippet directories wherever supported. For instance, our rsyslog setup is like this: we have /etc/rsyslog.d, which is managed by Debian and the users, and then we have /etc/rsyslog-puppet.d, which is exclusively managed by Puppet. Anything that ends up there which Puppet knows nothing about will be removed. The same goes for firewall rules using ferm and a couple of other places. The third guideline is: just don't ship whole configuration files. If the changes needed from the Debian defaults are relatively few, you can use things like Augeas to modify the defaults. This will make dist-upgrades actually very easy; plus, you will be able to have your module functioning even during times of transition from one stable release to the next, even if the configuration file has changed. You don't want to go through a dist-upgrade and a three-way merge, only to have Puppet replace the file with the previous version again on the next run. So change only the minimum things that you need to change.
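In Puppet terms, the guidelines above might be sketched like this; the repository URL and file names are illustrative, not the actual ones from the talk:

```puppet
# Guideline 1: drop a snippet instead of managing /etc/apt/sources.list
file { '/etc/apt/sources.list.d/local.list':
  ensure  => file,
  content => "deb http://apt.example.org/debian jessie-local main\n",
}

# Guideline 2: an exclusively-managed snippet directory; anything in
# here that Puppet doesn't know about is removed on the next run
file { '/etc/rsyslog-puppet.d':
  ensure  => directory,
  recurse => true,
  purge   => true,
  force   => true,
}

# Guideline 3: tweak a single default via Augeas instead of shipping
# the whole packaged file
augeas { 'sshd-no-root-login':
  context => '/files/etc/ssh/sshd_config',
  changes => 'set PermitRootLogin no',
}
```

The `purge => true` pattern is what makes the split between the Debian-and-admin directory and the Puppet-only directory enforceable: Puppet owns one directory completely and leaves the other alone.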
And the fourth is: use the dpkg-provided facilities, like dpkg-divert or dpkg-statoverride, to play nice with the package manager. Just don't enforce permissions and content on operating-system-managed files; divert them, or use dpkg-statoverride to set the permissions. On the Debian side, what I call Puppet-friendly packaging is packages which provide configuration in a way that is easy to manage using Puppet. What it all comes down to is including configuration from directories by default and, if possible, splitting out the sane defaults from sample values. You want Debian-specific defaults to be left untouched, which leads to easier and safer upgrades, while giving the admin or user the ability to override only sample values in a different file. So, after having managed, I'd say, a lot of systems using Puppet and Debian, I've been thinking about whether a Debian Puppet module would actually be of some use. The standard Puppet types just manage users and files and execute commands; yeah, they also manage services and a couple of other things, but that's pretty much it. It's enough to do almost anything, but you still need to write boilerplate code in some cases. For example, when you ship or modify a systemd unit these days, you must trigger a systemctl daemon-reload. This is something that almost everybody wishing to ship systemd units on a Debian system, or any other system for that matter, should do. And we also don't make much use of Debian's tools like dpkg-divert or dpkg-statoverride. So, at least in the scope of the Puppet team, we could provide a batteries-included Debian Puppet module that would make the life of Debian admins easier and expose things like APT sources management, multiarch, alternatives or dpkg-divert, and this list can go on for a long way. This is open to further discussion, and I intend to at least start a discussion and propose creating such a module at some point.
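The boilerplate in question, shipping a unit plus the daemon-reload, and using dpkg-statoverride instead of stamping permissions onto a packaged file, typically looks something like the following sketch (unit and file names are hypothetical):

```puppet
# Ship a systemd unit and make sure systemd picks it up
file { '/etc/systemd/system/myapp.service':
  ensure => file,
  source => 'puppet:///modules/myapp/myapp.service',
  notify => Exec['systemd-daemon-reload'],
}

exec { 'systemd-daemon-reload':
  command     => '/bin/systemctl daemon-reload',
  refreshonly => true,
}

# Register a permission override with dpkg instead of fighting the
# package over a file it owns
exec { 'statoverride-foo-conf':
  command => '/usr/sbin/dpkg-statoverride --update --add root adm 0640 /etc/foo.conf',
  unless  => '/usr/sbin/dpkg-statoverride --list /etc/foo.conf',
}
```

A batteries-included Debian module would wrap exactly this kind of thing behind proper types, so every site doesn't have to carry its own copy of the `exec` resources.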
The other thing with configuration management is the question of whether we have two or three roles in the end. Both the Filesystem Hierarchy Standard and conffile handling right now basically assume two roles: one is the role of the distribution, and the other one is the role of the local system administrator. The question is whether we should assume that there is also a third one, which is the configuration management system, or, let's say, site-wide defaults; in the sense that the configuration management system should be able to override the distribution, but then you still have the local admin of the machine, who should be able to override the CMS. Currently we have /usr/local, but where should we drop files using Puppet: under /usr or /usr/local? Where should we place systemd units using Puppet: under /etc, under /lib, or in a third location? That's also open to debate, I think. Now, moving on from managing configuration to managing packages. As I said, we're using mostly, that's probably more than 99%, Debian packages from Debian stable as they are, and a few from backports. The remaining 1% is either not in Debian, too old in Debian, or site-specific and not worth including in Debian. For the 99% we use squid-deb-proxy as a cache, so that we don't hammer the local mirrors, and for the 1% we have a local repository using reprepro. Of course we try to minimize the delta by contributing wherever possible, but still there's always a set of packages that we have to maintain outside Debian. Unlike the Debian archive, we need multiple versions of the same package in each distribution. Some examples include mostly clustered services or databases, where you want to run one version in one cluster and another version in another cluster for some reason. So there you go: MongoDB, Elasticsearch; I'm sure there are a couple more, but I'm forgetting them right now. Another difference is that we also need some thin partial distributions for certain needs.
For example, we are rebuilding Ruby and libcurl against OpenSSL 1.0.2, because 1.0.1 has broken alternative chain verification, and it turns out there are some cross-signed root CAs out there that break the chains when you try to verify them with 1.0.1. The other case is nginx and HAProxy running on our front-end servers, which have to be rebuilt against 1.0.2 to get ALPN support for HTTP/2. We don't want to rebuild everything against OpenSSL 1.0.2, and we don't want to ship OpenSSL 1.0.2 to every system, so we have to create small partial distributions only for the nodes affected. We do this by making heavy use of components. So we don't actually create extra distributions; we have only two distributions, and we mostly create profile components, which are then tied to specific Puppet classes. We also use some apt preferences magic to boost the priority of the profile components relative to the rest. Now, when deploying a new package to production, it's not like the usual Debian upload-to-unstable thing; it's more like stable release management. You don't want to deploy a new package or an updated package to the whole fleet; you want to do that gradually and in a controlled way. So we use two main distributions: jessie-skroutz and jessie-skroutz-proposed-updates, inspired by stable's proposed-updates, of course. Both distributions are configured on all machines, with different apt priorities: 940 for jessie-skroutz, which is always preferred, versus -1 for the proposed updates, so packages from proposed-updates must be installed manually and explicitly. We also boost the profile packages by another 10 points over main. So all our packages enter proposed-updates, then we test them, deploy them to certain systems by hand, and after the quarantine period is over we just copy them to the stable distribution using reprepro copy.
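An apt_preferences(5) fragment implementing the priorities described above could look roughly like this; the codenames follow the talk, while the component name is made up:

```
# /etc/apt/preferences.d/local (illustrative)

# The stable in-house distribution: preferred over everything
Package: *
Pin: release n=jessie-skroutz
Pin-Priority: 940

# Proposed updates: never installed automatically, only on request
Package: *
Pin: release n=jessie-skroutz-proposed-updates
Pin-Priority: -1

# A profile component, boosted by 10 over main, so affected nodes
# pick the rebuilt packages
Package: *
Pin: release n=jessie-skroutz, c=profile-ssl102
Pin-Priority: 950
```

With priority -1, apt will not even consider proposed-updates packages for upgrade; they can only be pulled in explicitly, which is exactly the quarantine behaviour described.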
Now, when it comes to building these packages, the thing is that they're too few packages, for only one architecture, so it doesn't really warrant setting up buildd infrastructure. What we do instead is run sbuild on our workstations, and in order to get things right and make builds as consistent as possible, we have created our own wrappers around sbuild. We actually use a couple of scripts to manage the chroots, to create them and keep them up to date, and also to ship some custom configuration and hooks that ensure that things built for a profile component will actually use the correct build dependencies. When you build Ruby against libssl 1.0.2, you want it to actually pick up that dependency. Then we have a wrapper around sbuild which builds packages, enforces the correct distribution, which is always proposed-updates, and also captures the component for which the package was built in an additional field in the changes file. We can then pick that up with a wrapper around reprepro's incoming processing and place the package in the correct component of the repository. Now, deploying security updates. Security updates are actually hard; keeping more than 300 machines up to date is difficult. For workstations there is the lovely unattended-upgrades. It solves every problem there, because workstations are rebooted once a day and we don't really care; they don't run any services, so it's perfectly fine. But servers are a different story. First of all, we want gradual rollout: we don't want a regression to just kill all our machines instantly, and we also don't want any unwanted service restarts, so we can't rely on automatic installations. Currently we have a custom solution based on Puppet, Servermon and Redis. Servermon is actually a piece of free software that was written at my previous workplace; it's a dashboard that displays the information Puppet knows about the hosts it manages.
Part of that is that after every Puppet run, all available updates, that is, all the package updates that are available on a given machine, are posted back to Servermon. So there's a central database that knows which packages need an upgrade on which machine, and vice versa. This is all displayed on a central dashboard, with a handy little padlock next to security updates. And then what we actually do is that we have a system to manually approve these updates; we call this staging. We have a CLI tool which can filter packages by name, with globbing support, filter by host, and filter by whether something is a security update or not. What it actually does is just place a key in Redis that says: this host should get this and that package update. On the next Puppet run, every staged update turns into an apt-get install --no-remove, with dpkg options to keep the old config files and so on, for every package. And so this happens; I mean, Puppet runs every 20 minutes in our infrastructure, so if you globally whitelist an update for the whole infrastructure, it will be gradually rolled out during the next 20 minutes. Once a package has been installed, Puppet reports back to the Puppet master, and we have a report processor there that deletes successfully installed updates from Redis, so they won't be retried, or notifies us if apt-get install exited with a non-zero value for any reason. This system has worked well enough. We still don't handle replaced libraries; we do this by hand. Although, now with systemd, it's really easy to find out whether a given process ID belongs to a service; you don't need any heuristics to find out if it's controlled by an init script anymore.
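The core of the staging flow described above is just glob matching between staged approvals and pending updates. A minimal sketch in Python follows; the real tool talks to Redis and Servermon, and the exact apt-get options differ, so function names, the data layout, and the command line here are assumptions for illustration:

```python
import fnmatch

def staged_for(host, pending, staged):
    """Return the pending package updates on `host` that were approved.

    `pending` is the list of upgradable packages Puppet reported for
    this host; `staged` is a list of (host_glob, package_glob) pairs,
    standing in for the keys the CLI tool would place in Redis.
    """
    return [
        pkg for pkg in pending
        if any(fnmatch.fnmatch(host, hg) and fnmatch.fnmatch(pkg, pg)
               for hg, pg in staged)
    ]

def apt_command(packages):
    """Build a non-interactive apt-get invocation for approved updates."""
    if not packages:
        return None
    return (["apt-get", "-y", "install", "--no-remove",
             "-o", "DPkg::Options::=--force-confold"] + sorted(packages))
```

For example, staging `("web*", "openssl")` approves the openssl update on every host whose name starts with `web`, and the next Puppet run on those hosts turns it into a single apt-get call.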
So this, the handling of replaced libraries, is something we will be working on in the future, and I will also try to see if it makes sense, and if I can strip all the site-specific things out of this and make some kind of standalone cluster security update manager, let's say, for Debian machines. But it's still in its early stages; I mean, it works for us at 300 machines, most of the time. I would like to have the ability to automatically whitelist certain updates, but it's still a work in progress. Now for the last part: people. You want, essentially, to get your sysadmins involved. Why? Because it benefits everyone: them, the company, and Debian as well. The truth is, there's still a relatively high barrier when it comes to contributing, even for experienced sysadmins. Most people are reluctant to report bugs. Another thing is that build environments are still not trivial to set up, and most people will just use debuild the first time they want to rebuild a package, with not the best results. And of course, you can't rely on every sysadmin reading the Debian Policy or the New Maintainers' Guide, especially when they are under work pressure. So the question is, what can we do to lower that barrier? As a lead or as a senior sysadmin, you should just lead by example. You file bug reports; maybe you file them yourself, but at least you keep your sysadmins in the loop, so that they see what's going on, and you explain why you opted for that severity, or why you used a specific tag, or what policy issue this was about; and you get them to install things like how-can-i-help. These are all trivial steps, but they tend to help get people on the right track. Now, things that we could do in Debian. I think most of the complaints I actually hear are BTS-related. The usual complaint is that the interface is ugly, it's inconvenient, and so on. Personally, I find it pretty convenient, and I like the email interface, but I can understand that many people are put off by it.
So I think we should put some effort into getting things improved on that front, at least in the search and interface departments. And then there's reportbug, which by default relies on having a working MTA on the system; and when you're a sysadmin, you will run reportbug on the affected server, which might be behind three layers of firewalls, or even on, I don't know, an air-gapped network, so this really doesn't work. We have to find a way to make things easier on that front: adding an MTA-less mode, be that, I don't know, SMTP directly to debian.org servers, might work, or creating an alternate transport for bug reports might also work. These are just a couple of ideas, and it's just something that we should discuss at some point. I'm sure there's a lot we can do here; these are just a couple of suggestions, so I'd be glad if anybody has more, or would like to discuss them after this presentation. So, a couple of links: the first is the Servermon I was talking about, and the other one is a link to Vincent Bernat's post about local corporate APT repositories, which was the basis for the design of our own repositories. So, I guess we're a bit early. Yes, there's time for a couple of questions. Thank you very much. Any questions? What's your solution to avoid service restarts on package upgrades? To avoid...? Service restarts on package upgrades. Okay, so we don't have a single solution for that. What we basically do is that we do the updates to affected packages manually, completely manually, as in SSH to the machine and install the package at a convenient time. This gets a lot more difficult when you're dealing with clustered services. For example, with Elasticsearch, even if it were acceptable to restart Elasticsearch on a single node, there are constraints between different nodes.
So you can't go and restart two nodes in the cluster if you have only two replicas of each shard, because at some point you will have one shard that is completely lost. Our solution for the time being is to do this manually, and we do have a policy-rc.d harness in place, but not for upgrades, for different kinds of uses. Clustered services are really a problem in that respect; I mean, even if you solve it at the machine level, you then have to solve it at the cluster level. So something like a hook in the package transaction, at the point where it would run the preinst or postinst script, would make things a lot easier: if you could actually have a policy layer that would say whether a restart is acceptable or not at this point, that would make things a lot easier. Other questions? It's not really a question, but I wanted to point out that reportbug already supports sending mail directly to a debian.org server on the submission port, and it offers this in the configuration in the end. But it's not used by default, right? It's not; well, reportbug has no sane default configuration. When you start it the first time, it asks you a lot of silly questions, and maybe this should be the default there. Yeah, I mean, I first ran reportbug a long time ago, and I've had the same configuration ever since. Okay, thank you. Other questions? No questions? Thank you very much. Oh, one more question; that was the last-minute question. In your all-Debian workstation environment, what do you use for internal communication, just IRC or things like that? Do you have something entirely free software that works across your organization? Well, we used to have IRC, and then we had a Jabber server as well. Currently we're using Slack, because that's more convenient for non-technical users. There are also some people with Macs and a very few people with Windows workstations, so it was becoming too diverse, and people were not accustomed to IRC. But we were using IRC for a long time.
So you just get the IRC, but with GIFs now. Are you paying any attention to unfixed security issues and how they might affect you? To, I'm sorry? Security issues that haven't been fixed yet. That haven't been fixed yet, so... There's a JSON feed from the Debian security tracker. Yeah, I know. At some point I started writing an integration, a bridge between the security tracker and Servermon, so that you could have a list of which CVEs affect which machine. It didn't get far; I mean, it got to a point where I had everything in a database and I could use it on my own, and then I had to do some real work, so, like every weekend project, it just fell behind. But it's something that I would really like to do. I mean, honestly, we've solved things at the machine level pretty well with Debian. Now everything is expanding so fast that I think we have to provide tools that give at least some deep insight into big clusters running Debian. And getting things integrated with the security tracker or UDD is really, for me, the way forward, but this also needs work on the Debian side. Right now the security tracker just exports one huge JSON file, which is generated every time; there's no easy way, as far as I know, to query things incrementally. But I think it's something worth investigating. So, yeah, we do check the security tracker when we know that something is actually important, but it's not part of the day-to-day workflow right now; the day-to-day workflow covers just the updates that are already in the security archive. Okay, thank you very much. Thank you.
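As a postscript on the security-tracker bridge discussed in this last answer: the core of such a bridge is joining the tracker's JSON export against the packages installed on each machine. The sketch below assumes a simplified view of the export's layout (source package, then CVE, then per-release status); function and field names are illustrative, not from the talk or the tracker's documentation:

```python
def open_cves(tracker, installed, release="jessie"):
    """Map installed source packages to CVEs still open in `release`.

    `tracker` mirrors a simplified view of the security tracker's JSON
    export: {src_pkg: {cve_id: {"releases": {release: {"status": ...}}}}}.
    `installed` is the set of source packages present on a machine.
    """
    result = {}
    for pkg in sorted(installed):
        cves = [
            cve for cve, info in tracker.get(pkg, {}).items()
            if info.get("releases", {}).get(release, {}).get("status") == "open"
        ]
        if cves:
            result[pkg] = sorted(cves)
    return result
```

Fed with the per-host package lists that Servermon already collects, this yields exactly the "which CVEs affect which machine" view mentioned above; the hard part, as noted, is avoiding re-downloading and re-parsing the full export every time.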