Okay, not for me there. Okay, that was really soft. Yeah, someone asked me earlier if this is based on my experience with the school in Iberidiva. It's not really; it's mostly based on my experience from working for Norwegian Telecom and the University of Oslo, where I'm from. The main topics of this talk will be a few stories about how things work when you have a lot of machines and a lot of users and have to take special considerations when you do things, then a few tips for those who write software, and then I go into more detail on multi-level configuration.
Some people do not realize the difference between administrating one computer or a few computers and administrating a lot of them. There are lots of things you can do with one or two computers which are impossible to do when you have thousands. Once we screwed up at the university: we managed to distribute a broken sshd_config file to all the Mac OS X boxes, approximately 700 of them, which broke SSH. So the only way to get in and fix it was itself broken. And how do you fix 700 machines? You do not walk around to each and every one of them and fix the configuration file like you would if you only had one computer or ten computers. It's not possible, because 10% of them are in Kuala Lumpur on a research trip, and some of them are in a locked room where the owner of the key has left on a two-year research trip. It's simply not possible to walk around to all the machines and fix them, so we had to come up with a better way. In this particular case, we thought about it for a while, a few weeks, tested it on a few users, and came up with a script that we made available. We made a one-liner and a web page that told the clueless users how to start the Terminal, cut and paste, and run the command to recover SSH, so we could get in and fix the rest.
It's not an obvious solution for standard old-school system administrators, but this way we moved the workload from the few staff sitting centrally (we had two people administrating the Mac OS X boxes) to each and every local system administrator and to the users themselves, which is kind of important when you're running large installations.
Another common problem we have is crappy software, and this one isn't our own screw-up. It's a common problem with a lot of user applications, in the free software world and in the commercial world as well. If you have a lot of users and a lot of home directory servers, a user will stay for probably six to ten years and will move from one home directory to another during this period. When you do this, you tend to break GIMP, Mozilla, OpenOffice, Opera, and I think KDE gets confused at some point too. A lot of applications believe that the home directory of a user will never move, so they record an absolute path from the root instead of a path relative to the user's home directory. Yeah, this is a really annoying problem, and it's so easy to fix. Almost no free software has actually done this, because the authors never realized that home directories are going to move.
Then there is the other problem we have with lots of machines. We have approximately 900 Unix machines; some of them run KDE, some of them run GNOME, and they don't all run the same version of GNOME or KDE. A student goes to a student lab and logs into one version of KDE, then goes to another student lab with another version of KDE, and then goes back to the first one. This breaks, because KDE, as one prime example, does not handle downgrades of its configuration files. It automatically rewrites the configuration on upgrades and then goes crazy on downgrades. And I suspect that a lot of people, when I try to tell them this, will say: well, don't downgrade.
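Going back to the moving home directories: a minimal sketch of the fix, in Python, assuming an application that saves file paths in its own settings. Anything under the home directory is stored as "~/..." and expanded against whatever the home directory is at load time; paths elsewhere are kept absolute. The function names and the /home1 and /home7 paths are hypothetical, for illustration only.

```python
import os
from typing import Optional


def to_home_relative(path: str, home: Optional[str] = None) -> str:
    """Rewrite a path under the home directory as '~/...' before saving it.

    Paths outside the home directory are returned normalized but absolute.
    """
    home = os.path.normpath(home if home is not None else os.path.expanduser("~"))
    path = os.path.normpath(path)
    if path == home:
        return "~"
    # Compare against "home/" so that /home1/alicex is not treated as
    # being inside /home1/alice.
    if path.startswith(home + os.sep):
        return "~" + os.sep + os.path.relpath(path, home)
    return path


def from_home_relative(stored: str, home: Optional[str] = None) -> str:
    """Expand a stored '~/...' path against the user's *current* home directory."""
    home = home if home is not None else os.path.expanduser("~")
    if stored == "~":
        return home
    prefix = "~" + os.sep
    if stored.startswith(prefix):
        return os.path.join(home, stored[len(prefix):])
    return stored


if __name__ == "__main__":
    # Saved while the account lived on one home directory server...
    saved = to_home_relative("/home1/alice/.gimp/brushes", home="/home1/alice")
    print(saved)  # ~/.gimp/brushes
    # ...and loaded again after the account was moved to another one.
    print(from_home_relative(saved, home="/home7/alice"))  # /home7/alice/.gimp/brushes
```

The rewrite happens on save, even when the user typed an absolute path, so the stored settings never mention the old server.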
"Don't downgrade" is like saying: stick with one computer and stay there. That's not an option for us. We have too many pupils, or students, too many teachers and too many machines.
Then we have the old binary problem. I like Solaris in this regard: they really take backward compatibility seriously. I can take a 15-year-old Solaris binary, run it on the latest Solaris machines, and it works. This is actually important, because professors tell their students to write some piece of software, the research group takes it into production and starts using it, and then the student finishes. The master's degree is over, his home directory is deleted and the source is lost forever. There is no way to recompile it for the newer version of the operating system. The only option is a rewrite, and the professors don't tend to prefer that option. So binary backward compatibility is actually important: not all software can be rebuilt.
And then there is the main reason why we don't use Linux as a file server at our university. When you need to resize the data partition, you don't want to take down the application or throw out the users. Please, I don't want to send an email to 30,000 users saying "I'm sorry, during the next hour you can't do anything because we are resizing the file system." That doesn't work well in an environment where it's impossible to reach 30,000 people and make sure they get the message. So we prefer Solaris; HP-UX and Tru64, I think, also have the feature to resize, or actually extend, file systems online. That's the most important thing. Recently Red Hat got it for ext3, and I hope Debian will get it for ext3 as well. I know Red Hat has had it, but we don't trust it. I know XFS has had it, but we don't trust that either. So we stick with the well-proven solutions that actually work. And this is important for getting Linux into the enterprise.
It's not production ready until you can do file system resizes or extensions without taking down the applications.
Another problem: we have approximately 500 servers in our machine room, and most of them have RAID systems. RAID is splendid if you have enough spare disks and get a notification when one of the disks fails. If you are not told, one disk fails, then the next disk fails, and then the next. That's what happened when I was working at Norwegian Telecom Media, the part of the Norwegian Telecom enterprise that was making phone directories. We had a RAID system there, an old DEC system, which didn't have any way to notify us. It was standing in the machine room, we were very rarely in the machine room, and it had been standing there, I think, for six months flashing a red light before the last disk failed and the whole system went down in flames. So RAID systems are not good at notifying you when disks fail, and this is a really big problem with most RAID systems on Linux; the only reason I like software RAID is that you can get information about when a disk fails. As I said, we have 500 machines in the machine room, and approximately 300 of them use RAID systems where we cannot get automatic information about the status of the RAID.
In practice, we pay spare-time students as operators to look after the machine room: they walk around every morning looking for red lights, then they send an email and we track the problem. We do have some tools to get information out of these arrays. One of them is a particularly good example of a crappy tool: you can get status out of it, but it goes into interactive mode and wants you to say yes and no and stuff, so you can't script it. We want something we can put in a script and send off to our monitoring system. And the RAID solutions in Linux tend to present the information in different forms: some of them put it in /proc, some of them don't, some of them need a binary tool which is rarely available in the version for the distribution you need, and if you are lucky enough to have a few different systems with the status in /proc, the file format is completely different on all of them. I would really like all the developers writing RAID drivers to agree on one common format and make sure the status information is available in /proc. I don't need much, just "OK" or "something is wrong". As long as I get a message saying that something is wrong, so someone can go look at the machine and check whatever is wrong there, that would be really great for us. I think we have MegaRAID 2; the previous versions had that support, but it was taken out in the latest Red Hat kernels, and I think it's not in the MegaRAID 2 drivers in Debian either. So it's really painful with RAIDs.
And then one of our favorites: installing software where you need to answer questions, click through a GUI or switch CDs. That becomes very painful. With the last application where we had to do this, you had to run the GUI application on the machine, press yes or no and fill in the path correctly, time after time, on different architectures. It takes several days just to install an application. That's so amazingly painful, stupid and prone to errors. It's impossible to make sure
we get it right, and when we upgrade we want to do the same as last time, so we get the same configuration. The only way to do that at the moment is to document the process and try to make sure we get the steps right, but putting a person between the documentation and the installation application tends to introduce errors.
And then you have the port problem from Norwegian Telecom. It was a large enterprise with a lot of different companies organized together with a common network, and the administrators of the complete network did not necessarily agree with the administrators of the local networks and systems. We needed to get some application working and needed a few ports open, and they were already closed, because some other application was using these ports for other things and they didn't want to expose them. So if you have an application that can't be modified to use different ports depending on your needs, you are going to be screwed in these kinds of situations.
And then one of my favorites: versions of a scripting language or programming language. Given three PHP applications, which version of PHP should you use to get them all running with the same PHP implementation? If you are lucky, it's possible; if you're not, it's impossible. In the same way, a lot of Tcl applications are impossible to get running with the same Tcl implementation, and this makes it a lot harder to administrate them, because now we have different versions of Tcl: we have to secure, install and upgrade three different versions of Tcl. It's very painful when you have to do this for a lot of applications. Each problem like this accumulates until it's impossible to administrate a large network.
And then one small story from a program I tried to use a few years ago. I clicked on the menu, and nothing happened. I clicked again, and nothing
happened. It was printing an error message saying that it couldn't read the printcap file, and it printed it to standard error. This was a GUI application, so I was waiting for the window, and I didn't look in .xsession-errors until a few hours later, when I started to suspect that something was seriously wrong. So if you have a GUI application, provide the error messages in a dialog box or a pop-up or something, because the user is not going to check standard error after selecting the application from the menu.
And then on to some clues for the clueless. I will go through these in some detail, but I have only 15 minutes left, so I'll do it pretty quickly. First: three levels, at least three levels, of configuration files. I'll cover that under multi-level configuration later on, but it would help us a lot, because we configure applications for the whole university and then provide some hooks for the local administrators at the sites, like the department administrator at the mathematical sciences department, so they can do some overrides, and then the local professor can do some other overrides, and we don't want to step on each other's toes. So it would be easier if we could have different files.
When compiling software: never, ever ask questions at compile time. We need to script it, so it's the same every time we do security patches and updates. There are some authors who believe there is an administrator who likes to sit down and answer 10,000 questions: is this file going there, do you actually have this library, is it working? This must be scriptable. Perl handles it fairly well: it does go into interactive mode, but you can override that and go for non-interactive mode. If you really want to have these questions asked, make sure there are overrides so we can get rid of them.
And when you make the install target and need some special privileges on the files to install, it would be nice if you just split the installation task into two
steps: one to actually copy the files in there, and the last one to fix the permissions. When we install at the university, we compile everything as a non-privileged user, so when the installation tries to set permissions, it doesn't work, and normally these stupid applications fail to install because they couldn't set root ownership on a directory or stuff like that. Please make it two steps, so you can do one step as the normal compilation user and one step with sudo access.
Another thing: make sure the software can be installed anywhere. It's very painful to try to get something installed in our own local bin directory when the binary package only accepts being installed in the default local bin. This is mostly a problem for commercial software, but it's sometimes also a problem for free software, where you have to reprogram or change the source to be able to install it in a non-default location.
I guess it's pretty obvious to you, but please make the source available, so we can actually fix things when they break. We tend to keep software for 10 to 20 years, and the architecture we have now might not be the one we need to run it on in the future.
And as I said earlier, make paths into the home directory relative to the user's home directory, always. Even if the user specifies an absolute path, rewrite it to be relative to the home directory, because if not, the application will break when the user moves.
And a few conflicting clues. Depend on as few libraries as possible, because when you try to get a package installed and it depends on something like 100 libraries, sometimes we just give up and go with one that's easier to install. Request Tracker is a good example: it depends on heaps and heaps of Perl modules, and trying to get them working in non-standard paths on a Solaris machine is not really fun. And then, of course, use as many libraries as possible, because it's a lot easier to upgrade or fix a security bug in one library than to fix the bug in all the different implementations of the same
feature. So think about the dependencies, think about the pain you add when you depend on a library, and consider whether it's worth it when you weigh it against the convenience of security updates.
Make sure libraries and programming languages are backwards compatible, because it's very painful to have to maintain different versions of libraries to get different applications working.
And this one is not obvious to a lot of people. They tend to think licenses are fun: let's make our own. But when you have to call in the legal department and spend a few months talking to the administration to get them to accept a new kind of license, it's not so fun anymore. So stick with a well-known license, because big enterprises do need to evaluate licenses, and if it's a homemade license, it's maybe not worth the pain, and we'll just not use the software because we don't have the time to evaluate the license.
When you write source code, stick with the standard and don't use vendor-specific extensions to the language. This is a big problem with free software, and you probably don't realize it, because GCC is full of extensions that you don't notice you are using, and up until now it's been impossible, or at least hard, to get rid of the extensions and still have a working compiler and a working system. We try to compile on, I think, seven different compilers: the Sun compiler, which is a very good ANSI C compiler; the AIX compiler, which is also a very good ANSI C compiler; the HP compiler, which is getting close to an ANSI C compiler; the Tru64 one, which is quite good as well; and then GCC. Normally GCC accepts everything, while the others try to limit you to ANSI C, so it's a pain to get things compiled with the other compiler versions or implementations. So if you can, please use -ansi and please use -pedantic. It might only be a warning in GCC, but it might be a fatal error in other compilers.
-pedantic does not do anything useful; read the manual. It
used to. Now, read the manual. You mean the ANSI C section of the GCC manual? The GCC -pedantic option is not doing what you want. It doesn't do anything convenient; it doesn't do anything useful. Please read the manual. I guess that means yes. Oh well.
And then we have: write portable code. Please avoid #ifdefs per architecture, because it's very confusing when you have a bug that shows up on one architecture and not on another, or some feature that works one way on one architecture and another way on the other, and you try to debug it and don't really understand what's going on. Let the program do the same on all architectures: let it fail the same way, let it read the same files, let it use the same applications. Try to be consistent across architectures.
It's also very important that software works out of the box, because when you have two thousand machines to administrate, you don't have a lot of extra time to play around with software. If it works out of the box, the chance that we actually start using it is a lot better than if we have to spend a week or two reading the manual and tweaking the configuration before it does anything useful. Working out of the box is also important for seeing that the software is going to be useful; you can configure it to do exactly what you want once you understand that this software is something you need. If you don't get to the point where you can see any results at all, we will at least drop it, because we don't have more than five minutes to evaluate a piece of software.
And resource leaks: quite a few pieces of software leak resources like mad, so it's impossible to keep them running for a long period of time. Free software has gotten a bit better, and some pieces of software have improved remarkably, but still there are
file descriptor leaks, X resource leaks, memory leaks, shared memory segment leaks; there are a lot of leaks of all these kinds of resources. When you need to run the software for years and years at a time, you can't just restart a long-running server to fix the problem, especially if the server is important for other services. And when you need to interrupt a few thousand users just because some programmer fucked up his source, you don't really want to use this software anymore.
One thing that most free software developers seem to get right, at least when they're making daemons, is the use of syslog. But we do have a few programs that believe it's better to have their own log file, stored away somewhere where no one actually looks at it, instead of sending it to our log server, where we collect and process all the messages.
And we have quite a few pieces of software, common in free software as well, that will crash and die with no trace whatsoever of what happened. When you want to debug a system that just vanished, you really want a clue as to why. Take a long-running BIND server, or let's just say a DNS server: if you have a long-running DNS server that sometimes, once a month, just disappears, and you have no way to attach anything to it to find out why, because it's so busy, then it should write to a log file or tell us why it crashed. Have some segfault handlers or anything to give a clue as to why it crashed, because otherwise it's so painful to debug. Normally we don't have the time; we just give up and add a restart-every-week kind of feature to the system, but that's not the best way to keep the system running.
And a few final clues. Reuse configuration when possible: if the information is already present on the system, do not make your own copy of it. One nice user application is KTouch, which is a touch-typing training system.
It needs to be told which keyboard layout to use when it displays the keyboard, and this is not really new information to the system. X knows the layout; xkeycaps can obviously extract it from X, so KTouch could do the same. It's like that for all kinds of system information, like the mail domain. It's not like every program needs to have its own mail domain; they can just fetch it from the system's mail domain. Or the IP address: you don't actually need to be told the IP address, you can just look at the interface. For all these things, please reuse the information if possible, and provide overrides, because sometimes the default is not good. But most often the default is great.
And this is my second-to-last point in the clues for the clueless: provide local admin hooks. Local administrators want to configure a few things. They don't want to configure everything; they mostly want the defaults, because they want the software to be upgradable, and they know that every time they configure something, they will have to test it during upgrades. So if you provide just a few hooks for the local admin to configure exactly what he needs, in one file that is kept and reused after upgrades, then you will make life a lot easier for quite a few system administrators.
And reduce flexibility. Have you seen skinnable media players, or any other applications with skins? Would you like to tell your grandmother how to recover from a messed-up skin configuration over the phone?
I would not. Every time something is complex to explain over the phone, every time there is some flexibility in the system, the support costs for a large enterprise increase. You get more people calling in, and you spend more time explaining: first you try to find out what the problem is, then you try to find out what it looks like on their screen, and then you try to help them through the steps required to get back to the configuration they expected when they started the machine. So flexibility is nice in some ways and really painful in other ways, and for a bigger organization flexibility is really bad. Any questions?
Then we go over to multi-level configuration. This is nothing new; a lot of applications have this already, but it's a very nice way to provide local hooks, upgradability, and different levels of configuration for the site, the local administrator or the host administrator. The point is that we want local changes, local configuration, to be kept during upgrades; we want to keep the defaults when possible; and we want to make sure that the different actors who want a say in the configuration can do so with their own file, and hopefully also with fixed overrides. Take a mail client: you want the site's configuration to say "this is the mail server everyone should use", and then you may want one of the local departments to be able to say "okay, our users should use the local mail server instead". This is not very easy with a few mail clients today. Then you have web browsers: maybe the site, the enterprise, is providing a proxy, and you want to make sure all users use this proxy. Maybe it should be optional, maybe it should be enforced; there are different ways of doing this, and providing multi-level configuration to make that possible is quite convenient. And, well, the example here is: read the configuration from several files.
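A minimal sketch of reading configuration from several files, in Python with the standard configparser module, assuming INI-style files. Ordinary files are read from the most general to the most specific, so later files override earlier ones, and the fixed files are read last, after the user's own file, so whatever they set is enforced. The program name foo, the file paths and the order of the fixed files are hypothetical.

```python
import configparser
import os
from typing import Dict, List


def read_layered_config(normal_files: List[str],
                        fixed_files: List[str]) -> Dict[str, Dict[str, str]]:
    """Merge configuration files in layers.

    The normal files are read in order, so a later file overrides an earlier
    one (vendor defaults, then site, then host/department, then user). The
    fixed files are read after all of them, including the user's own file,
    so a value set there cannot be overridden. ConfigParser.read() silently
    skips files that do not exist, so every layer is optional.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(list(normal_files) + list(fixed_files))
    return {section: dict(parser[section]) for section in parser.sections()}


# Hypothetical layout for a program called "foo". The paths, and reading the
# fixed files back through the same levels in reverse, are assumptions made
# for this sketch, not an established standard.
NORMAL_FILES = [
    "/usr/share/foo.conf",               # vendor defaults
    "/site/share/foo.conf",              # site-wide (enterprise) defaults
    "/etc/foo.conf",                     # host or department settings
    os.path.expanduser("~/.foo.conf"),   # the user's own settings
]
FIXED_FILES = [
    "/etc/foo.conf.fixed",
    "/site/share/foo.conf.fixed",
    "/usr/share/foo.conf.fixed",
]
```

With read_layered_config(NORMAL_FILES, FIXED_FILES), a department that wants its own mail server only edits its own file, while a proxy placed in a fixed file stays in effect even if a user sets proxy = none in ~/.foo.conf.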
For the files, I suggest /usr/share/foo.conf, /site/share/foo.conf, /etc/foo.conf and the user's own file in the home directory, and then the fixed files, going back through the same directories. The point is that when you need to enforce something, you put it in the fixed file; when you only want to give the user a sensible default, you put it in the non-fixed files. This is the common approach in KDE, and the .d directories give similar features. This makes it a lot easier for the rest of us to use this software, keep it upgradeable and make sure it's configured out of the box. We distribute the configuration files to all machines, and for software that uses multi-level configuration, it just works.
Some software has almost got it right. You can provide a default email configuration for KMail, I think, but it's not that easy to put the username into the default configuration, and you want to specify the username in, for example, the connection string to the IMAP server. So make use of some variables for common user properties: the user's home directory, the username, maybe a password, stuff like that.
When you're writing some kind of configuration file parser, is there some kind of standard way to write it, or does everybody have to write their own parser? Preferably with variable substitution, that kind of thing.
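One way to cover the variable-substitution part of that question with existing parts, sketched in Python with the standard library's string.Template: substitute per-user values such as the user name and home directory into configuration values after reading them. The variable names USER and HOME and the IMAP URL below are illustrative assumptions.

```python
import getpass
import os
from string import Template
from typing import Mapping, Optional


def expand_user_variables(value: str,
                          variables: Optional[Mapping[str, str]] = None) -> str:
    """Substitute $USER and $HOME (or ${USER}, ${HOME}) in a configuration value.

    safe_substitute() leaves unknown placeholders such as $PASSWORD untouched
    instead of raising an error, so a typo does not break the whole file.
    """
    if variables is None:
        variables = {"USER": getpass.getuser(), "HOME": os.path.expanduser("~")}
    return Template(value).safe_substitute(variables)


if __name__ == "__main__":
    # A site-wide default that is identical on every machine...
    default = "imap://$USER@imap.example.org/INBOX"
    # ...becomes a per-user connection string when it is read.
    print(expand_user_variables(default, {"USER": "alice"}))  # imap://alice@imap.example.org/INBOX
```

A single shared default such as the one above can be distributed unchanged to every machine, and each user gets a connection string with his own user name.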
There are several standard parsers. Do you have any suggestions? No, I think that's fighting windmills.
Just to make sure that I understand: your last slide was about the fixed configuration. I mean, is this configuration used even if the user has his own configuration in his home directory? What is the sense of the last entry, /usr/share/foo.conf.fixed? Every configuration option in the program will be read first from the first file and then from the next file, and if it is present in the second file, it will override the setting from the first one. Then it reads the next file and overrides the setting if it's present there, and it continues like that. If the option is present in the fixed file, which is read after the user's configuration, it will override the user's configuration. Thank you.
To make sure the user has a web proxy set, you put it in the fixed file, because if the user tries to disable it in his own configuration file, it doesn't take effect, even if the user intended to have no proxy or whatever. If you want to enforce something, you put it in a fixed file; that's the point. If you only want to make it an option, you put it in the normal file. Quite a few programs get it right up until the fixed file: you can provide system defaults at several levels, but you cannot enforce any of the settings. I guess that's based on a political decision: does the software author want the system administrator to be able to override his users or not? I think it's a useful feature, at least for our enterprise.
You discussed basically technical issues; the whole presentation was about technical issues. Do you feel that there's a need, especially in free software, for more support in some way? I don't know, maybe regression test reports or anything like this? Yes, there is. Free software is getting better.
It's already quite good; we already get quite good support for some pieces of software. But we could definitely improve, and we need to take into account the needs of the enterprise, because they are quite different from those of desktop users. I decided to focus on the software and on writing software for the enterprise, so that's why I stayed away from the problems you mentioned there.
Do you think that Debian is actually enterprise ready? No, not even close. The Debian system is lacking quite a few of the features I need to be able to run Debian on all the Linux machines at the university. But the biggest problem, which I didn't mention, is a financial, business problem. We had to choose a Linux version that was supported by Oracle, and we only had two options at the time: Red Hat and SUSE. We were already using Red Hat, so we decided not to switch to SUSE, because we didn't gain anything. And of course Red Hat has since then gotten online resizing of ext3 and a few other features, making it more feasible as a file server, but as far as I know Debian doesn't have any of these features.
And would you say that the other Linux distributions are actually enterprise ready? For some things yes, for other things no. Red Hat is getting close to being useful as a file server, and it's already used quite a lot as a web server and database server. It depends on your use, but at the moment you can't use Linux all over the university to replace all the Unix machines. That's not possible.
There was a question over there. Is someone intending to do something for Debian to go into this market for enterprises?
Because I think it's less a technical question and more a financial question. If people are buying the hardware from Dell or IBM or wherever, they want to have the support, and Debian is far away from giving any support to those kinds of people or enterprises. So Debian will never go into this market for enterprises.
I think that's quite wrong. Typically a third-party company is providing the support, and I don't think Debian needs to provide enterprise support as such, because there are companies already doing it. We have quite a lot of businesses using it in production systems already; the Norwegian university network organization is using it in almost all of its network equipment. It's already in use, and they get support if they want it; they can buy support from a lot of companies. So I don't think that's a really big issue.
You say Dell doesn't need to provide enterprise support and Debian doesn't need to provide it either, but there is a gap between the hardware vendors and the enterprises who are using the hardware. If I buy hardware from Dell and there is support from Red Hat or SUSE, I don't intend to buy another product from a third company to get what I usually get almost for free from the hardware vendor. So basically, if Dell tells me it only supports SUSE or Red Hat, what are the chances that I'm going to install Debian on the hardware?
Well, I think they're quite good if you want it there, and there are a few...
No. I can get support from Dell, I can get it from Samsung or IBM, but if I want to install Oracle, for example, I don't have any option except Red Hat or SUSE.
Oracle is a very good example where you can't get support, but for other things you can get support. And we are...
When we buy 200 machines from Dell, we can decide what they should install on the machines. So when you buy a lot of machines, you can pick your own operating system and then get support from somewhere else. As for software support from Dell, I wouldn't pay much for it.
Just a comment on software support: one of the potential solutions for getting software support for more than just Red Hat and SUSE, or some of the other high-profile distributions, is the LSB. Assuming we can get that to work, it may end up being a solution for getting enterprise support on other distributions as well.
Any other questions or comments? I guess we are running out of time. Thank you for listening.
Thank you for your interesting talk. I'm sure we can discuss a bit more later on.
Yeah, I'm still here. So thanks for listening. Just a comment: this was my first talk on this topic, so I'm trying to get my slides a lot better. Please provide comments and feedback; I would really appreciate it.
Okay, now we have a ten-minute break. We will be