All right, thank you for coming. I'm going to talk about making Debian excellent on Google's cloud platform. First I'll go over the goals of the session and tell you what we're doing with Debian, because not everybody in Debian knows what Google has been doing with Google Compute Engine and Debian. I'll give you the backstory of where things have been over the last year or so — a year and a half, actually — go through some of the open issues we've been working on, and share some of the things that have been going well and some that have been going a little less well. Then I'd like to discuss ideas that you or others have proposed for solving the issues we're currently struggling with. Finally, we'd love to get more of you involved in packaging the Google software for Debian properly, and also in building our images.

Our marketing people would like me to use a lot more of their slide deck than I'm going to, but this is Debian and not a marketing-focused audience, so I'm not going to do a lot of marketing slides. I took one slide from the marketing deck and one nice picture at the end. So this is the Google Cloud Platform: a whole bunch of services you can use on demand. I'm not going to go through all of these; I'll just point out briefly that they run the gamut from infrastructure as a service — running a VM in the cloud — all the way to App Engine, which is a very managed platform as a service where you just give us your code and we deploy it for you, with a lot of other services in between, from storing files to database queries of different types.

This talk is about Compute Engine, which is renting a virtual machine in the cloud on demand. We very prominently feature Debian as one of our major operating systems on Compute Engine. I have a couple of slides here with screenshots from our website. This one is our operating systems page — yes, it's cropped, but Debian is still at the top of the operating systems list; there are others below it. Many of our documentation examples use Debian too. I did not click on a tab on this document: this is how to use one of our features, the startup-script feature, and if you go to the page, the first thing it tells you is how to do it on Debian. So it's definitely featured in a major way. And I heard a stat from one of my colleagues earlier this conference that maybe about 85% of our users, possibly more, are using Debian on Compute Engine.

We build virtual machine images for Debian similarly to how you would do it for KVM, or Amazon EC2, or Windows Azure, et cetera. We use a tool that is on GitHub. We are not upstream; some folks in the Debian community are upstream of this tool, which is called bootstrap-vz. A bit of history here: originally, Debian built Amazon EC2 images with a tool called ec2debian-build-ami, which was adapted from something called ec2ubuntu. This was written in shell, by the way. At some point we contributed support for building Google Compute Engine images, and it was renamed build-debian-cloud. Then other people wanted to add Vagrant and — I forget, maybe OpenStack, certainly OpenNebula — Azure, KVM, VirtualBox. It became more multipurpose, and the maintainers realized it doesn't scale, even as the most modular shell code possible: they used bash arrays, they interleaved things in a smart way, they did as good a job as they could in bash, but it was reaching the limits of the language. So they rewrote it in Python.
And now it is actually a pretty nice directed-acyclic-graph build tool. It has a bunch of tasks — you list a set of tasks, you can have plugins — and it's good to work with. A lot of code is shared between providers, so stuff that we do can benefit other providers' images, and stuff that they do can benefit us. It's wonderful. And a package of this is now in NEW, as of within the last few weeks. It doesn't quite have all of our changes in it, because some of the recent changes are on the development branch and they packaged the master version, but it's going to be in Debian soon enough. Shout-outs are very much deserved for Anders Ingemann, who started both build-debian-cloud and bootstrap-vz, and Tiago Ilieve — I hope I said his name close to right — those are the two maintainers of bootstrap-vz. Tomasz Rybak — these are all non-Googlers — ported our code for Compute Engine from build-debian-cloud to the current code base, and Marcin is working on the Debian packaging of bootstrap-vz. They all deserve a lot of credit.

So what sort of image does it build? We have a KVM-based hypervisor, so it's a similar setup to what you would have for KVM. The core element is a raw disk image, called disk.raw, and it's put inside a gzip-compressed tarball for transmission and passing it along. We also use sparseness wherever possible, so that all the processing is faster when the file sizes are smaller: we create a sparse disk image, fill in only the blocks that are needed, and put it in a sparse tarball. I'll sketch what that looks like at the end of this section.

The hardware that's involved, as I said, is KVM-based. We use virtio-scsi as our current primary disk device and virtio-net for the network, and it supports a serial console. Notice I didn't mention any Google-specific drivers — we studiously avoided that, so we didn't have to add any code to Linux. There were some patches from newer Linux that we wanted to get into wheezy-backports for good support of performance or specific features, but we didn't actually have to write a Google module, which is great; that saved a lot of effort and convincing. As I said, there are some features that didn't make it into wheezy that are useful for virtio-scsi and virtio-net, because these are reasonably new technologies — virtio-scsi hasn't been in the kernel for more than a few years, which is new by non-Google standards. For example, multiqueue networking was not supported by wheezy's copy of virtio-net, but it is in the wheezy-backports kernel, and there's a major performance boost from using that. There have also been lots of improvements to the SCSI driver. So overall we ship two versions of Debian, as I showed a few slides earlier. One version just uses stuff from stable, because it's Debian and we should offer Debian. The other version includes the kernel and, for performance reasons, SSH from backports, but it's still all from Debian as well.

We ship images, as best we can, at least once per stable release. This has not always happened, but usually it has. Sometimes we ship images more often, which is something I was discussing earlier this conference with Michael Prokop, among other people. The pattern of shipping one image per stable release is fine for the debian-installer case, because debian-installer applies updates when you do your install — so if there have been security updates since the CD was cut, you still get the fixes.
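To make the disk.raw-in-a-sparse-tarball format concrete, here is roughly what producing one by hand looks like. The size and output name are illustrative, and bootstrap-vz handles the partitioning, filesystem, and debootstrap steps I'm eliding here:

```
# Create a sparse raw disk image (blocks are only allocated as data is written).
truncate -s 10G disk.raw

# ... partition it, make a filesystem, debootstrap Debian into it,
#     install a bootloader, apply the GCE-specific tweaks ...

# Pack it as a sparse, gzip-compressed tarball; the file inside must be
# named disk.raw for Compute Engine to accept it as an image.
tar -Sczf debian-image.tar.gz disk.raw
```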
Coming back to release cadence: in the case of a cloud image, I don't know that you really want a long update cycle, and maybe a reboot cycle, before your image is ready to use, so it's nice to ship updated images. For example, we pushed a new image after Heartbleed — that's a very high-prominence example. We also have some code that we add to the images, very much free software, that we update periodically for feature reasons. So, what goes into the images as generated by these tools? First of all, any questions so far before I keep going? I don't want to keep talking at you and boring you.

The question was which image we would suggest using. We generally suggest the backports image, because it performs better: the wheezy kernel doesn't support every nuance of the hardware in the way that gives the best performance. The only caveat is simply that, of course, the backports kernel is not supported by the security team in the same way, nor is the backported SSH package. So if security is paramount, then wheezy stable; but for most customers it's the right trade-off. And keep in mind that if there's a security vulnerability, the fix will get into testing and then into backports, so we will release new images periodically, and if you apply updates anyway you'll get the fixes even before we push out a new image. So depending on your trade-offs, yes, usually we recommend backports — that's what most of our customers are actually running, the backports image. Other questions before I keep going? Okay.

So we do a pretty minimal debootstrap install. less is not installed, much to the annoyance of many of us, nor is the full version of the vim package. We do add some things — for example, the dependencies that our integration code requires, including Python. And we change certain things. In every case that I'm aware of, the changes we make have a specific justification. For example, our DHCP servers serve a host name, and that gets set on the image. It's a long domain name based on your Google Developers Console project name, with a .google.internal suffix, so it gets a bit long. And if your host name is also long, then some common software used in the cloud case — namely Java 7 and lower — has bugs in that situation: if the total host name exceeds a certain length, it segfaults. This was fixed in Java 8 but not in Java 7 and earlier, and it's just one example; other programs have misbehaved in this type of situation too. So we have a dhclient exit hook that shortens it. We also tweak a couple of specific settings in the SSH config. We try not to change much, because we don't want to stomp all over Debian's defaults, but for example we disable password authentication, because we have a key-based management system in place, and we set an SSH keepalive, because right now if there's no traffic for over 10 minutes the network drops the connection — a few things like that — and we do an NTP sync from the host. I'll show a rough sketch of a couple of these deltas in a moment.

There are a couple of open issues as far as what we want to do for configuration. One question: bootstrap-vz supports installing the standard set of packages that is often installed in a d-i install. It's less minimal, but it would include things like less — not necessarily vim, but less and some other commonly used packages. We could add that, but it is not the debootstrap default; it is a d-i default. We could also consider automatic updates.
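Before I get to automatic updates, here is the rough sketch I promised of the two most visible deltas. The paths and exact values here are from memory rather than verbatim from our images, so treat them as illustrative:

```
# /etc/dhcp/dhclient-exit-hooks.d/set-hostname  (illustrative path and name)
# dhclient sources these hooks, so $new_host_name holds the FQDN from DHCP;
# keep only the first label so overlong hostnames don't trip up old Java, etc.
if [ -n "$new_host_name" ]; then
    hostname "${new_host_name%%.*}"
fi

# /etc/ssh/sshd_config -- the handful of deltas from Debian's defaults
PasswordAuthentication no    # keys are pushed by our account-management daemon instead
ClientAliveInterval 420      # keep idle sessions alive under the ~10-minute network idle timeout
```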
Now, automatic updates are a more controversial topic, and I know that, because Debian does not default to automatic updates — which is in fact why we have not enabled them. On the one hand, it simplifies life a lot for users keeping their fleet up to date; in many cases they won't even have to take any action, because on Debian we even restart a lot of daemons when we upgrade. That's also a downside, though, because customer software may or may not be written robustly enough to handle restarts — though customers should be doing that on their own anyway if they're following best practices. So we could consider whether this should be added to the wheezy images using unattended-upgrades — this is actually supported by bootstrap-vz via a plugin — or we could consider making it the default for jessie cloud images. There are a lot of possible options there, but that's an open issue.

So we do add some software. There is no licensing obstacle: this is all Apache-licensed, and I think some components may have dependencies under compatible licenses like the BSD license. First, there's the Google Cloud SDK. If you're running an image inside Google's cloud, it's nice to be able to work with the environment that you're in — it's analogous to euca2ools in the Eucalyptus and Amazon world. You can access the API to list or delete instances, or to grab objects from our storage, et cetera, and when you're running it from within Compute Engine you get integrated authentication, so you don't have to deal with the usual OAuth flow that you'd need from your workstation. This is not currently in a Debian package, so we put it in /usr/local to respect the Filesystem Hierarchy Standard: it goes in a directory under /usr/local/share/google, and we symlink into /usr/local/bin. We would like to get this packaged, and I have more to say about that on a future slide.

We also have a GitHub repository of what is specifically integration glue for the Compute Engine environment. One bit of this runs user-supplied startup scripts. If it notices that you're running on a multiqueue-networking kernel, it sets IRQ affinity to give you good network performance. This last part is not currently working, but it tries on first boot to run apt-get update, so that when you try to install a package it doesn't say "package not found". It would be nice at some point to use cloud-init; there have been a variety of obstacles in the way, but that's a work-in-progress sort of discussion. We have an account management daemon, which is actually a very convenient feature: if somebody has editor or owner rights in your Google Developers Console project, they can easily have an SSH key added to the virtual machine semi-automatically, and then they can SSH in. It also enables a feature in our web interface to SSH from the browser without needing any client-side plugin — it uses JavaScript and sends only a public key over the wire, never the private key, and that key actually gets removed after a short amount of time, because it's a specific short-lived kind of credential. We have a daemon that manages the routing table a little bit, to enable our load-balancing and routing features. And for those who want to export their image, currently the best option is an image bundling tool that copies the partitioning information and the file system into a new disk image and gives you a tarball that can be imported directly as a new image.
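By the way, to make the unattended-upgrades option from a moment ago concrete: on Debian it amounts to installing the unattended-upgrades package and dropping in a couple of apt settings, which is roughly what the bootstrap-vz plugin automates for you. Something like this:

```
# apt-get install unattended-upgrades

# /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```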
On the export side, we also have a new feature that launched pretty recently at the API level, to let you convert one of our disks into an image directly; however, there's currently no way to export an image other than the bundling tool I just mentioned.

We've been making some progress on packaging, including at DebConf, especially for the Google Cloud SDK. I described how we're currently installing it into /usr/local. Well, Thomas Goirand (zigo) has been very helpful in getting an initial package of the Cloud SDK into NEW. He's working on getting it to use setuptools instead of a more custom method for entry points, and it's really great to have that in progress. We're going to work both with him and the other Debian folks who want to be involved, and with the Cloud SDK team. For the other integration stuff I mentioned, we currently build debs using an, unfortunately, internal method, but it's basically just putting files into place — there's really no black magic happening. We put them on an apt repository that's used in the build, and they're also released on GitHub. We would love to switch to proper source packaging. I am swamped; my teammates are new to this specific type of task and also swamped. So getting more people involved, and getting these packages into the archive proper so they're something you can just apt-get install or upgrade, is a reasonably easy way to help, because there's nothing complicated about them — mostly putting files in place; there's an init script for one of them and an update-rc.d call, nothing hard.

I should summarize what's been going very well in the relationship with Debian, and there's a lot. With the Debian Cloud team there's been a lot of helping each other: we're sharing a lot of code, sharing a lot of tips, discussing what should be done in various ways on the Debian Cloud mailing list. I'll come back to DebConf13 in a moment — that was basically, I like to say, a cross-cloud collaboration love-fest or something; we were all attending each other's talks and being very positive, and a lot of progress was made. Shout-out to the kernel team and Ben Hutchings for being very responsive to our patch requests. We've also tried to keep those reasonable, to be fair to everyone — for example, things like memory-leak fixes and small isolated performance tweaks, and trying to get prerequisites and drivers into jessie before the freeze instead of five months after release. So we're trying to be reasonable on that end. We asked about using the Debian trademark, because we are calling these images Debian, and the trademark folks were responsive while also respecting what makes Debian Debian; I think we struck the right balance there. Zack and Lucas have both been encouraging of our efforts, and they both had feedback about how to respect Debian's needs, which is great, because it's good to have people paying attention and helping. The public clouds and official Debian images status BoF last year in Switzerland had a lot of great round-table discussion; we introduced the session, but it wasn't really focused on us in any specific way, except for mentioning some examples. A very useful transcript of it was written up by somebody earlier this year, based on watching the video, and sent out to the Debian Cloud mailing list. And certainly the bootstrap-vz and packaging assistance has been great as well.
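Since I mentioned the setuptools entry-points work on the Cloud SDK package: for anyone unfamiliar, the mechanism looks roughly like this. The names and module paths below are hypothetical, just to show the shape of it, not the actual Cloud SDK layout:

```python
# setup.py -- minimal sketch of console-script entry points via setuptools
from setuptools import setup, find_packages

setup(
    name="example-cloud-cli",        # hypothetical package name
    version="0.1",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # each entry generates a wrapper script on the PATH at install time
            "example-cli = example_cloud_cli.main:main",   # hypothetical module:function
        ],
    },
)
```

Anyway, that's a packaging detail; the bigger picture is how well the collaboration has gone.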
So thank you very much to all of my Debian colleagues for being wonderful to work with. The main challenge has been an impedance mismatch, to use a bit of engineering jargon, between the normal way Googlers think and operate and the normal way Debian typically thinks and operates. For example, the Cloud SDK team has a rapid release cycle, often twice a month; Debian has an enterprise release cycle, on the order of one release every two years. Backports has been very useful as a middle ground, a way of getting some newer stuff in there, and I imagine the Cloud SDK will receive some updates through backports once it is properly in Debian. Maybe we'll provide an alternative repository for those who need every single update. But Debian in general likes to stay close to upstream defaults for all the software it ships and doesn't like to diverge too much from the out-of-the-box configuration for different use cases. And I understand that, right? Because Debian needs to support the product, and we're all volunteers here generally — or at least not paid to work on Debian specifically — and we need to know what we're debugging and helping with. Similarly, Google is used to building things in a context where it can control a surprisingly large portion of the stack — sometimes down to the software, sometimes down to the hardware. It's used to building things in an integrated fashion, and it's used to having uniformity within the Google context more than across different distributions. Everyone is actually acting in good faith and people are trying to bend in useful ways; it's an ongoing effort to make each group see how the other works, but it's actually going pretty well. So I say challenges — these are not obstacles, right? These are ongoing things to work on, and people are learning, and it's great. Debian has deep packaging experience and deep community experience; it knows the distro world really well. Googlers often come from knowing their specialty extremely well — whether it's kernel minutiae or security or network performance considerations — and they certainly know how the commercial world works, but not all of them have been involved in open-source and free-software communities. Still, everyone's collaborating surprisingly well, and that's because everyone on both sides is trying.

So how do we move forward from where we are to where we want to be? We definitely want to get our stuff packaged, and we'd love help getting it into jessie and wheezy-backports. We're going to try to build on the momentum from this conference for the Cloud SDK. We would also like to get compute-image-packages — the other integration stuff — packaged, so that Debian can feel more comfortable with these images and we can get closer to something that Debian would call official. We also want Debian to own these images, in the sense that right now I've been building a lot of them and my teammates at Google have been building a lot of them, but they're Debian images, and it would be great if Debian-affiliated people, not just me, were building them. Once this has worked successfully for an image build or two, three, four, we'd even say: great, you can upload directly what our customers will see promoted by Google as our Debian images. We definitely don't need to be an intermediary long term, and there was an idea mentioned this conference by Steve McIntyre of possibly having the CD-building infrastructure do this.
They already do the live images as well as the CD images, and they have a way of doing contained builds. We currently do our builds within GCE VMs, but that's simply for isolation and not a prerequisite — it prevents our local environments from being contaminated by a bug or from interfering with the build in some way — so it could just as easily run on the CD infrastructure, under DSA control, whatever.

We should also figure out what defaults make sense for Debian's cloud images. Sometimes those would match the standard defaults; sometimes they might be different. I mentioned the security question and the possibility of wanting to do unattended upgrades. I mentioned that we disable password authentication. At one point I was going to mention the PermitRootLogin setting, but Colin's decision to switch to "without-password" as the default in the SSH config is actually a pretty good balance of security and convenience — it still allows, for example, a forced-command SSH key for root. The wording "without-password" is a little bit scary to people who don't know what it means, because it looks like it might permit root login without a password; what it really means is that password authentication is the one method by which root login is never allowed. So it's actually good, just confusingly worded.

Okay, an issue was also raised at this conference about the initramfs size. You may think, why is that an issue? Well, it's a 13 MB initramfs with the default MODULES=most setting. And again, why is 13 megs an issue? Well, we did some measurements — or my colleagues did, not me; thank you, she's right here — and it takes about 16 seconds of our boot time to load the initrd. We're trying to do whatever we can to make boot time fast so that everyone has the best experience possible, and it turns out that going down to MODULES=dep in the initramfs config brings that down from 13 MB to 2.4 MB — a one-line change, which I'll show in a minute. So we'll probably make that change, for a pretty good reason. We also noticed that most of the remaining 2.4 MB is taken up by libc, and the only thing in the initrd that needs libc is busybox, I think — maybe bash, I'm not sure. We could try to find static builds or relink those, but that may be a longer process, and we don't want to be making disruptive changes shortly before jessie; 2.4 is a lot better than 13 anyway.

These are examples, right? These are changes where we should work out what the defaults should be for the cloud environment, and also who should decide them. The release team could be involved in deciding this, but I don't want to force work onto them if they have their hands full with the normal releases. It could be a cloud team effort to say, across the different Debian cloud images, these are reasonable ways of deciding the defaults. Making these decisions as Googlers, acting under Google priorities and pressures, in isolation, is not something Debian is likely to be happy with, and I know that. But it could be a way to prompt a discussion, and maybe have Debian say: yes, the cloud team can use these criteria for deciding which deviations are reasonable — something of that nature, because the answer is not always one-size-fits-all.

So, places you can discuss this stuff: there's a mailing list, debian-cloud, on Debian's normal list server, and an IRC channel, #debian-cloud, on Debian's normal IRC server. You can also email me — my Debian address is jimmy@debian.org, or there's my work address, jkaplowitz@google.com.
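One more concrete detail before we get to questions: the initramfs tweak I just mentioned is only a one-line change in the initramfs-tools configuration, followed by regenerating the initramfs. This is the generic Debian mechanism, nothing GCE-specific:

```
# /etc/initramfs-tools/initramfs.conf
MODULES=dep        # only include modules needed by this system (the default is MODULES=most)

# then rebuild the initramfs for the installed kernel(s):
# update-initramfs -u
```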
We have a Stack Overflow tag for user support — feel free to ask or answer questions there — and we have a Google group, which is of course also usable via email, where users ask questions. There are also paid options, but these are the free options, and they are heavily used. You can also discuss right here, right now — any questions or comments? Eric, is there a microphone to pass around? If there's not, I can repeat things. Okay, go ahead.

"Why has Debian decided not to do it?" The question is why Debian has decided not to enable automatic updates by default. I wish I knew. I'm going to guess that it's because nobody has decided to do it. Maybe there are other use cases, outside the cloud case, where it's a bad idea — does anybody know? I would love to get you a mic; if you can't get one, I'll rephrase or paraphrase, and I'll summarize into the mic later. Okay.

The last thing I want is automatic upgrades of my machines. It is always unfortunate when all of your 100 or 200 boxes fall over because of one bug. You should stage the security upgrades: test them on one or two boxes, and if they work, use Puppet or whatever to get them onto all the boxes. If you do it by default on, let's say, all 2,000 of your boxes, they could all fall off the net at once.

Sure. Okay — since there's no mic, I'm just going to summarize for the video recording. I think what you're saying is that doing it as a default is risky, because one bug could cause many of your machines to fail, and that it's good to stage and test updates on a fraction of your fleet and then roll them out via whatever mechanism you want, but doing it by default on all of them is risky. I also agree, and certainly I don't think anybody who wants unattended upgrades on by default would want it to be mandatory — including Google's security folks, who are among the people who want this. Nobody wants it to be mandatory, and certainly anyone who's using Puppet or Chef — or Ansible, Salt, whatever equivalent you wish — is sophisticated enough to run that; they should of course be allowed to do it their normal way. But you were saying, Eric? No, I was just agreeing with you. Yeah, the question is simply: what is a good default there?

Tony — if we get a mic around it can start floating, but until then I'll paraphrase. So you could support a pure wheezy image, and folks could enable backports if they wanted, and then also support a "plus" image. I'm thinking there are a lot of users who would benefit from a slightly more configured image, possibly with updates, right? They need to know the risks and be able to turn them off. But yeah, it's annoying not to have less the first time you shell in.

Right — so Tony's comment was that we could consider offering additional flavors of images, at least one or two, with additional levels of configuration and more stuff available by default. And we agree, and we have some thoughts along those lines. My comment was just: still, maybe just two. Oh, he's saying just two — wheezy pure, and if someone wants wheezy plus just backports, that's easy, trivial for them to get. Okay, so you're suggesting that we keep the wheezy image as it is and change the backports image to have more stuff. Yeah — make that wheezy-plus. That's a possibility. Whatever we do, as long as we're calling it Debian, we of course want both the Debian community and the trademark people to be comfortable with it.
And if we do end up wanting to do something especially customized for wheezy, we would probably discuss with Debian how to balance the two brands in a way that Debian is comfortable with. Yeah — and not pretend that it's regular Debian. Any other questions or comments? We have a mic. No? Yay, mic. So ask more questions now that we have a mic; we have a bit over 10 minutes left.

I was wondering, is there much need for more customized images? It seems that most people are using some ready-to-go base images and then maybe calling their configuration management. But have you heard from users that they'd like a tool that does a little bit more than creating the base image, without booting the base system up in a virtual instance and then calling the configuration management?

So, not every user is skilled enough at systems administration to have a complex configuration management system ready to go, or they don't know how to manage it, et cetera. So we already offer some additional customizations. There's something called Google Cloud Deployment Manager, and we have some click-to-deploy images that effectively automate a startup script for certain common cases, like certain development stacks. And certainly there are tweaks that might improve performance or security, and that may be more invasive compared to Debian defaults, that could be desirable in the cloud context — or maybe shipping more Google tools than it's reasonable to ask Debian to ship by default. There's a variety of use cases, but certainly anyone who's going to be doing their own config management will want something reasonably like what we have now, yes. Other questions or comments over there?

Could you just repeat the number you used at the beginning, about the number of Debian images or the percentage of users running the image? I think the number was — Venkatesh, you can correct me — that, for technical reasons, we were able to get aggregated data on roughly 96% of our instances, and roughly 85% of those were using the backports Debian image, with additional ones probably using the regular Debian image. We also offer non-Debian operating systems, but I didn't put them on the slide. Okay, other things? All right, I'm happy to talk about these topics or, of course, other Debian topics for the rest of the conference — I'm here till Monday. Thank you.