Hi, I want to introduce you to our next speaker, Jesus Clement; his co-worker Guido Tadar is also here. They're going to talk about Debian and Google. So, welcome.

Can you all hear me? Okay. If you need to interrupt me, please do, but I would like you to keep the questions towards the end. We're going to leave 10 or 15 minutes, and I can answer all the questions as best as I can.

My name is Jesus Clement. I'm here with Guido Tadar. We're both system engineers at Google. He's based in Dublin, I'm based in New York, and I'm here to talk a bit about how we use Debian at Google.

First of all, a bit of background. Why do we use free software? First, because we can modify it, we can change things, we can contribute back, and we can fix bugs any time we find one. For example, the other day we had a DHCP problem: something was trying to figure out an IP address for the first Ethernet interface, but that interface was not initialized until well after the initialization of the kernel. We just went to the code, fixed the bug, and continued working.

Why Linux? Sorry, GNU/Linux? It's fast, it's reliable, it's under heavy development, a lot of people are using it, a lot of companies are contributing to it, and it's just the best option you can have when you have a few thousand machines out there running all the time.

And why Debian? This is the question that I'm going to try to answer through the talk. We used to use something else, an ancient version of Debian called Red Hat. No more; we've moved along. Basically, most of what we need to answer is why, in terms of development and stability,
the availability of the resources that we have, management of the whole fleet of computers that we have, and then maintenance and repairs, and how you do testing. Most of what I'm going to talk about is based on one product that we ended up building. Actually, Guido is one of the main developers, and most of the questions about that product will be directed to him.

But I'll talk first about development in terms of Debian. Debian has a huge community of developers that are continuously contributing back. It's an open project, meaning that anybody with the will and the technical knowledge can contribute back to the project by becoming a Debian developer or by sending patches. It might not be easy to join as a Debian developer, but you can also maintain packages without being a Debian developer. So for a company like Google it's very easy to engage in that development, and it's very easy to hire new developers. This is in contrast to other companies, where sending patches becomes a difficulty: you need to belong to a certain company or a certain community. Debian is not that type of enterprise. As you probably know, it's a project based on volunteer contributions, and it has a lot of companies engaged with it, so it's fairly easy for Google to engage people in contributing, take contributions from the project, and integrate them into our products.

Another thing is that it's a truly free environment. We can get packages knowing that we can modify them and contribute them back. And then there is the eternal topic of price: it's free as in beer.
We don't have to pay money for it, even though you can buy services from other companies. And when you run your services on several thousand machines, paying licenses becomes a big burden on top of your services.

There's nothing to say about the capabilities of the people who work on Debian; some of the best programmers in the world are working there. The Debian Policy is one of the best descriptions of how a package has to be maintained and set up, so that a package that conforms to it is one you can install easily and quickly, without human intervention in most cases.

On top of that, it has very well-defined communication channels: we have the BTS, we have mailing lists. It's very easy to contact the maintainer by just opening the description of the package and reading who made the package. If you have a security bug, you can send it to the security team without immediately exposing a security problem that the package might have. All those things make Debian a very well-suited product to use internally, because of the flexibility that it gives to all the things that we do in the company.

In the second part of the talk I'm going to talk mostly about the use that we make of Ganeti. Ganeti is a product we introduced a few years ago, when we realized that maintaining clusters of virtualized machines was becoming more and more difficult due to the amount of computing power that we were using. Ganeti is a product that lifts the burden of maintaining a cluster to a much higher level, so that you can tell Ganeti to switch off all your instances, bring them back, or move them to another machine. The way the system works is that, instead of having clusters of physical machines working in parallel, what we use is Xen or KVM to virtualize all our
systems, in a way that switching a machine on and off becomes an easy task: you just tell the cluster master to switch off a certain instance of one of the services that you're running. It's an open source product that anybody can collaborate on. In fact, he can probably tell you more about how you can collaborate, and you can contact him if you want to be part of the project. It's hosted on code.google.com at the moment, so it's very easy to download the source code and find bugs. Of course, patches are very welcome. It was conceived, deployed, and developed on Debian, so it's very easy for us to just create those packages and start using them immediately after we get a new version of the software.

So this brings me to the availability of the resources that we use at Google. In most cases we have a huge amount of computing power, but it's basically physical machines, so the use of Ganeti allows us to quickly bring up instances. Whenever you want to have a new DHCP server, you just go to an interface that we have created internally. There is another group working on an external version of the system that we use. Do you remember the name of the external version of Virgil?

There is Ganeti Web Interface, which is basically an interface to manage Ganeti clusters. It's developed by the Oregon State Open Source Lab, and basically it substitutes for some of our internal tools based on Ganeti.

Regarding Ganeti, a point that I wanted to make is that we strive to be one of the best-behaved open source projects at Google. So we have both the corporate policies, for example, we do code reviews for all the code.
We have an extensive test suite and QA for the whole product, but at the same time, what we do, we do all in the open. Any patch or design contribution is discussed on an open mailing list that anybody can join. So it's very easy for new people to contribute to our community, like in Debian for example, which is not always true for open source products in businesses in general.

Thank you very much. Can you hear me now? So he's asking... Sorry, repeat the question again.

The project exists and has been called Ganeti since... why would we change the name?

One thing that maybe got lost is that we don't sell or plan to give out virtual machines on this. This is a product that we use to run our corporate infrastructure, and you can download it and use it to run your own infrastructure, but it's not something we sell you or offer you services on. That's a big distinction. So in that sense, no, it won't change name. But it's not a mainstream Google product that you buy from Google or use in the sense of Google Apps or Google Docs, right?

Yeah, so the question is whether the Ganeti web interface will become part of the Ganeti package. So far we don't have any plans for doing that with the Ganeti web interface.
It's a different project that is developed in sync with Ganeti, but we haven't yet had a discussion on whether we are going to integrate it or not.

So basically, for the web interface, we have an internal version that ties into our infrastructure, because we have different systems that have to be tied together. We have created an interface where you can request a new instance of a service at any time, and a few minutes later (it takes a bit longer because we usually wipe the disks before we activate an instance) you have a working environment that you can start using immediately. We combine several open source projects: we combine Puppet, we combine Ganeti, and we combine the capabilities that the Debian installer has with preseeding, to make everything very easy and very quick. And it's totally unattended. If there is a problem during the installation, you will get a notification in a ticket that says that your system couldn't be installed correctly, but mostly it's completely unattended.

So requesting services is very easy and very quick once you have a need for new resources, which is really important when you have, again, several thousand machines running in the background and you need a thousand more to install the new, fastest caching system that you want to develop and deploy in all the offices that you have around. So having such an interface, having such flexibility to request those services or to destroy them once you have finished with your work,
it's a very important thing to have. Again, there is a Greek team that is also developing a different interface to interact with Ganeti and to request services. That's another open source project, but actually, he can talk more about it. Raise your hand. There you go.

Another thing that is very important, and that Debian offers us, is the capability of managing the fleet. It's an inherent feature of Debian, again brought by the policy, that it's very easy and very convenient to have packages installed. Internally we actually use the same unstable, testing, and stable tracks for all the packages that we manage internally. Once a team has finished a release of a new package, they have a build system that automatically turns the source code into a package, and then it gets distributed to all our repositories in unstable across the company. From there, upgrades are automatic on all the systems that track unstable.

One of the good things about using Ganeti is that we can give administrator access to the users. You bring an instance up, say a new team is creating a new product, and then they have full access to the whole machine. They can change whatever they want, and if they break it in non-repairable ways, they can just bring the instance down, bring it back, and start from scratch.
We use this even for remote desktops. Your home directory is basically on an NFS server that is very close to the cluster where you're running your instance. So if you break your own machine, you don't have to call Techstop or anyone else to come and reinstall everything for you. You just go to the web interface, destroy, recreate, and immediately after you get a fresh installation that you can start breaking again.

Another of the very good things that we have is, as I mentioned before, Puppet. To create a common infrastructure across the whole fleet, we use Puppet for distributing configuration files that set up the machine. Once preseeding has finished and you have the operating system, there's a cron job that runs Puppet immediately on the first boot, preparing the machine for proper use: installing tools that are used internally for collaborative development, installing source control clients like Git and Perforce, which we use internally, and all the basic Kerberos configuration, hooking up your home directory. We use a modified version of LDAP to distribute groups, so that you get full control of who accesses your machine. You can create groups of users that can help you work on your pet project.

Maintenance and repairs are also based on Ganeti. Live migration is a very well tested feature right now.
The machines replicate the hard drive using DRBD, so when you have a system running, you're actually saving your data on two different nodes of the same cluster. If you want to repair one node, you can do a live migration of its instances to a different node, and they continue working, reading the data from the new node. So you bring down the one node that you want to repair, say to change a memory module, and your service keeps running while that machine is being repaired. When that machine comes back, you can resync the hard drive to the old machine, re-initialize the connection from your instance to the old machine, and then, if you want, bring the instance back to the old node where it was working. So it makes repairs, maintenance, and upgrades a very easy operation. Again, this is one thing that you could probably do with many other systems and many other distributions, but Debian provides the flexibility of the whole package being just fit to do whatever we want to do in the right way. So there's a question there.

Yeah, so you said you are using DRBD. What happens if a machine fails because of a hardware issue, for example, while it's running? And would you consider switching to a project like Remus, which wouldn't mind such issues?

That question is something that we can probably answer later at the end of the... or you can answer now.

So mostly, switching to another project wouldn't solve the fact that the virtual machine, as far as virtual CPU and virtual memory go, runs on one physical node anyway.
So if that node goes down, we can reboot the virtual machine on the second node at any time, but it's basically a virtual machine crash. What we do to alleviate that is monitor the physical machines very closely, and when we see that a physical machine has, for example, memory errors, or disks that are going to go bad because they have some bad blocks, we evacuate it preventively to avoid the downtime. But if the machine actually crashes, any kind of distributed block device will not help us recover the state that was in memory. There are projects like Kemari and others that allow you to run the virtual machine on top of two physical nodes, but we think that they are too expensive to be useful in production, and we'd rather have machines that are expendable and have a further layer of availability in the services, so any service, for example, is load balanced and multiply available. Actually, we've been requested to remove DRBD for even more performance, because some of the services don't care at all that the machines go down and can easily fall back to other machines or other data centers, and prefer the extra performance they can have.

Yeah, that's one of the things that we try to do in our internal services: all the services that can be expended, we expend them. For example, most of our DNS data is stored in a database, so most of our DNS instances are not unique on the hard disk drive.
So whenever you have to remove one because of a hardware failure, you can immediately bring up another DNS instance, dump all the data, and start it again. That's a problem that we try to solve in the software layer, by making our tools more resilient instead of trying to make the hardware better. In fact, with the huge number of machines that we have, we have repairs happening every single minute. In a group of 10,000 machines, six percent of those machines usually have some kind of issue, and we have a lot of people working on repairing those. But having the possibility of bringing up instances on machines that are healthy, it becomes a really quick task that doesn't require people running around on Segways, as we used to have before.

Another thing that it allows us to do is easily bring up, say, a thousand machines that you need for some kind of product testing. Test everything you want there: stress tests, capacity tests, OS constraints, because at the moment you build your instance you can say how much memory and how much hard disk you're going to have. For example, LDAP used to be an ever-growing tree on our systems, and we wanted to know exactly what the memory limit was at which an LDAP system could work freely without constraints. It's really easy to bring up a machine with, say, two gigabytes of RAM, put the whole LDAP tree there, and stress test it; the moment it crashes, try with four, then with eight, and you have a really easy way to select the memory. You can also go back to older operating systems to see if those operating systems still work on a new platform. So the moment you get a new set of brand new computers, say you
have a service that only runs on a specific OS; then you can just bring up a couple of instances on that OS, run your project on that hardware, and see if it behaves correctly or has bugs introduced by new chipsets or new processor features. And again, removing all that is an easy task: you just go to a web panel, select all your machines, and remove them, and they disappear, returning that capacity to the pool for everybody else to use.

And last but not least, most of our systems are Debian based. Unfortunately, we have some products running on Windows and other things that I won't mention, but most of our engineering workstations and laptops for internal use run Debian on Intel. We also have Mac OS, but that's a little shame that I have. Most of our corporate services, LDAP, DNS, caching, web servers, working internally, all run on Debian, and most of them run on Ganeti. We also have dedicated machines without any instances and without any clustering layer, due to requirements: we need really fast caching systems in our engineering offices, so we put them on bare metal, but it's still running Debian. And again, for corporate infrastructure, we have a pile of data centers running Ganeti on Debian, so that we can provide services for new projects, services for engineering, remote desktops, and testing of new products.

And I believe this is question time.

Hi. Could you clarify what "Debian based" means, and what percentage you have of Debian-based systems and of Debian systems? I've read in the press about Ubuntu and Goobuntu, but I'm not sure how much Debian you use. Thanks.

I wanted to ask the same question, basically: what version of Debian do you use?
So, okay, we have various systems using different things. Most of engineering's laptops and workstations actually use an internal flavor of Ubuntu, which is, as you know, a Debian fork, but it's not exactly Debian. For development of Ganeti, for example, developers mostly prefer Debian itself. I personally use wheezy. We are in the process of migrating the physical fleet to Debian, and I think for now we're staying on lenny, but we want to move to squeeze later on, for various internal reasons related to the hardware we have to run it on. So it's really different ones. We have an OS team that manages the OS images internally. Most of what they do is based on Ubuntu, and they have a close relationship with the Ubuntu people. But what we do in the Ganeti virtualization platform layer, and on the nodes themselves, is based on Debian itself, at least on the development side and in the future probably also on the production side. The version, of course, is in flux. What I can tell you is that whether it is going to be lenny or squeeze, it's going to be heavily modified, with at least backports and some other internal things. The version of Ganeti is not going to be the one that is included in lenny, the same goes for DRBD, and the packages we care about we'll tailor to the versions we have tested and think work better together.

Any more questions? Yep. That microphone is for ambient noise. You were just on the internet.
Hello Marga, by the way.

Hi. I think it would be useful to... Can you stand up so I can... Oh, yeah. I think it would be useful to elaborate a bit on what made Debian more attractive than Red Hat Enterprise, so the factors that made you choose to migrate.

As I said, one of the most important and most interesting things was the fact that the community was open to collaboration from us, and the pool of Debian developers is much bigger than the pool of Red Hat developers. The people working on Debian were eager to join Google, get hired by us, and continue working on their projects. One thing that we all do in our team is contribute back to Ganeti in one way or another. So every time you see a release produced by Guido or by another of our colleagues, called Iustin, it's mostly the work of all the people who work internally, one way or another, reporting bugs and contributing code.

Another thing is the fact that people who were already working for Google could contribute back in several ways, like going through the new maintainer process, being Debian maintainers, or becoming Debian developers. We actually have a list of, I think, five or six people who started working for Google before they became Debian developers, and we have a list of 25-plus Debian developers working with us. So if you want to collaborate with the project that you're actively using, Debian was providing something that Red Hat, when I joined, wasn't.

Another thing is the Debian packaging: dpkg, the Debian package system.
It's much more reliable than anything that we had before, so it allowed us to do, maintain, and supervise upgrades. Most Mondays when I come to work, my Debian workstation says, oh, by the way, we have installed this and this new package, and you just have to reboot. I get a little sign in my message of the day saying that I have to reboot the machine because new kernel packages have been uploaded. On other systems, including Solaris and Red Hat and whatnot, that was a bigger problem. It was a bigger burden on the maintenance of the fleet, when you want to maintain all the computers more or less the same. Does that answer the question?

Okay. Short... he doesn't have a mic. Is it okay? Yeah.

Has Google ever used CFEngine? Because I heard that you also used CFEngine.

Yes, we used to use CFEngine before, but we have moved to Puppet.

Why?

CFEngine wasn't providing the capability of centralizing the configuration across the whole fleet. It was much more difficult to run on Windows, Solaris, and Mac OS. With Puppet we can, for example, change a DNS server for a whole region, on all the different flavors of operating systems, just by updating a Puppet manifest.

Okay, and second question: is it right that Google developers normally develop on a virtual machine instead of their laptop?

Sorry, come again?

Google developers, do they
develop on a virtual machine hosted on the cluster, or do they work on their laptop regularly?

The situation right now is very varied. We have a lot of people working on their own workstation, but that usually happens when you are in an engineering office, where it is much easier for you to develop locally than to develop remotely. That is changing all the time. Some people move to a remote instance so that they can develop over there, and whatever they use is either a laptop or a dumb terminal. They connect to the instance, and then they have a whole big bunch of clustered machines that they can send their jobs to, to be compiled in a distributed fashion, and then they collect all that in their instance. In places like Finland, where we have one or two engineers, whenever they compile locally they have to either compile everything locally or send everything over the wire to an engineering hub, where it gets compiled and sent back. All that transaction becomes much slower, so usually those people are using instances in a data center.

Okay.

Sorry... oh well. So, recently Google has gone public with the fact that you have been attacked, apparently by hackers from China, and understandably you have probably done some things to react to that. So my question is: what are you using for best practices around security, and specifically, are you using SELinux?

If we're using what?
Security-Enhanced Linux.

Well, one thing that we've definitely been using for security, which is public, is OTP authentication. This we've also been trying to push to our Gmail users and to corporate users that have Google Apps, and it basically consists of having an OTP generator always with you to authenticate to your Gmail. We have that. We have some corporate policies regarding the systems you can use to access Google content, or what kind of content you can access from where, and things like that. I'm of course not going to discuss this in detail, and I don't know if I can answer any...

One answer that I can give you is that, as I was saying before, we have an OS development team, and working very closely with that OS development team we have a group of security operators, and they introduce policies for our laptops. So when you use a laptop and you want to connect to the Google infrastructure, you have to use an approved laptop. I don't believe it uses SELinux, but I cannot answer that question further because I'm not in a close relationship with those people.
I'd like to repeat the first question, actually, about the extent to which you use Debian within Google, because I feel it wasn't fully answered: a rough estimate of the percentage of use of Debian in the infrastructure. It was partly answered about personal machines.

Again, I cannot give you detailed numbers, because first, I don't have them, and if I had them I would probably be breaking so many policies that they would fire me. But what I can tell you is that it's in continuous flux. Three, three and a half years ago, when I started, we were actively using Red Hat, and it went through a flux of changes. On the desktops we added Goobuntu, which is a Google-specific version of Ubuntu. Right now a big bunch of our fleet is based on modified versions of that Goobuntu for servers, but there are services coming up that require a specific version of Debian, because they want a specific set of tools that is only provided in certain versions. We have development work, for example Ganeti, that is done purely on Debian, and it's more and more moving to work on Debian. In fact, there is one project that I finished a month ago and, shamefully, haven't published yet, which is an image for a USB stick that you can install on a computer, and every other computer that is connected to the internal network you can immediately make a Ganeti node, in like five minutes. It's all based on Debian, and the Debian USB stick is Debian installer, modified to provide all the services that you need. So it depends on the project mostly, and we basically don't have control. We just provide, through the Ganeti interface, all the operating systems that people can use.

Any more questions? Okay. One, two, and gone. Thank you very much.
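
The node repair procedure described in the talk (live-migrate each instance to the node holding its DRBD replica, repair the drained node, resync, then move the instances back) can be illustrated with a toy model. This is a minimal sketch in plain Python, not Ganeti's API: the `Instance` class, the `drain_node` and `fail_back` functions, and the node and instance names are all hypothetical, and swapping primary and secondary only stands in for what a real live migration does.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A virtual machine whose disk is mirrored on two nodes (as with DRBD)."""
    name: str
    primary: str    # node currently running the VM
    secondary: str  # node holding the disk replica


def drain_node(instances, node):
    """Move every instance running on `node` to its secondary node.

    Returns the names of the instances that were moved, so they can be
    moved back once the node has been repaired.
    """
    moved = []
    for inst in instances:
        if inst.primary == node:
            # Stand-in for a live migration: the replica becomes the
            # running copy and the drained node becomes the replica.
            inst.primary, inst.secondary = inst.secondary, inst.primary
            moved.append(inst.name)
    return moved


def fail_back(instances, moved):
    """After repair and resync, return the moved instances to their old node."""
    for inst in instances:
        if inst.name in moved:
            inst.primary, inst.secondary = inst.secondary, inst.primary


if __name__ == "__main__":
    # Hypothetical two-node cluster with one instance on each node.
    fleet = [
        Instance("dns1", primary="node-a", secondary="node-b"),
        Instance("dhcp1", primary="node-b", secondary="node-a"),
    ]
    moved = drain_node(fleet, "node-a")   # node-a can now be powered off
    print(moved, [i.primary for i in fleet])
    fail_back(fleet, moved)               # node-a is repaired and resynced
    print([i.primary for i in fleet])
```

The model also reflects the limit raised in the questions: the replica only covers the disk, so a node that crashes before it has been drained loses whatever its instances held in memory.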