Hello. This is Birking. Hi, I'm David. I'm Jimmy. David and Jimmy. And we're here to enlighten you on things that Google is doing with the Debian OS on Google Compute Engine. Basically, why we're here: oh, I should not stand in front of the slides. We're here because we started using Debian in our Google Compute Engine product, which is a virtual machine product in the cloud. We're here mostly because we want to build a good, solid relationship with the Debian community, so that we can make that product smoother and more reliable and even more awesome than it is now. And in general, we want to share with you three things: what we've been doing before, what we're trying to do now, and what we'd like to be doing sometime in the future. OK, so just in case none of you have paid much attention to what Google's been doing in this space, I figured I'd let you know where we are. We have a fairly large cloud platform by now. We have Google Compute Engine, which is infrastructure as a service: we give you virtual machines in the cloud. We also have Google App Engine, which is a platform as a service: you give us some Java code written in a certain way, and we'll run it in the cloud. We also expose various storage subsystems at various levels: SQL, persistent disks, NoSQL, et cetera. And we also have various app-level services such as BigQuery and various queue services and things like that. All these services are built on a common REST interface, so that you can programmatically interface with the cloud in a fairly consistent way. So what Compute Engine exactly is: it's infrastructure as a service. You sign up for the service and you tell us how many virtual machines you want, what they look like, how they're networked together, et cetera, and we give them to you and you give us some money. We've been around for about two years now. We launched at Google I/O 2012. And this past Google I/O,
in 2013, back in May, we made the product available to anybody who has a credit card. The key idea here is that part of what makes Google what it is, what lets Google make the products it makes, the ones that kind of kick butt, is the fact that our data centers are so impressively built and so well networked and so forth. And the idea is that if other people are able to use that infrastructure, then they could also build some kick-butt services and make the world a better place. So Google Compute Engine, like I said, is virtual machines in the cloud. You have various ways of accessing the service. You can come in with a command line tool that we give you, or build your own. There are programmatic APIs that you can use that all follow the REST protocol. And we also have a little UI that you can use to launch VMs and manage them and so forth. Now, inside, the data model basically looks like this: you have some virtual machines, and you can construct your own private networks and connect them to the external internet. Each virtual machine has a disk of some sort. It can be a persistent disk, which exists even when your VMs aren't running, or it can be a scratch disk. Or, if you want cheaper data storage, you can stick your data in Cloud Storage. These disks are built from what we call image templates, and these image templates are the key mechanism by which operating system vendors are intended to upload their operating systems to our cloud. I'll tell you more about that later. So I wanted to give you a little demo of what's going on. There we go. I don't know how to use Jimmy's newfangled Linux computer. Let's see. So this is our cloud console for Google Compute. Is there? There we go. So when you sign in, you get this little dashboard that tells you what's going on. You can see some utilization, network traffic, disk traffic, for your entire fleet of instances.
You can launch a new instance by coming over here, clicking on new instance, and entering some data. Hello, Debian. Let's see. Let's scroll down, which I don't know how to do. And we'll choose a zone. We'll put this one in Europe. We'll give this one a Debian 7 image, which is, I guess, the default. Let's see. And, well, let's make a big one. Let's make a big 8-CPU one with 52 gigs of RAM, and create. And it chugs and chugs and chugs and eventually gives you a virtual machine. I'll wait a moment for this to complete. How do I switch out? There, that's how I switch. So, like I said, you can also come over to a command line tool called gcutil. And you can list the instances that are running inside of your project. And if we run this late enough, we will see the output for the instance that we've just created. There we go. Hello, Debian. Sitting there in the Europe zone. We've got zones scattered across the globe; you can choose the one you want. And how do I get out? It doesn't actually matter. Oh, tab. Tab? There we go. OK. Should we SSH in? Yeah, let's SSH in. So we can SSH in. I should have picked a smaller name. Ah, what about the password? OK, so we use SSH keys to let you talk to your VMs, to SSH into your VMs. We have a small management infrastructure that injects some user accounts into your virtual machines and that provisions the SSH keys that you need to connect. You have to control-C and start that again. And we should be in. Maybe. Oh, no, wrong key. I don't know, if anyone has any idea. So you only have to generate a key once per machine you're connecting from. It's a private key: it generates a local key pair on your workstation, and it uses some cloud machinery and a cron job in the instance to install it. We need music for this. OK, so again, you see that you've got a nice little Debian operating system here. You can run things. You can sudo apt-get. I don't know. less. Let's apt-get install some less.
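The command-line half of that demo can be sketched as a shell session. This is an illustrative sketch only: the project and instance names are made up, and the flags follow the 2013-era gcutil CLI as best as can be reconstructed, not an authoritative reference.

```shell
# Create an 8-CPU instance in a European zone with a Debian 7 image
# (project, zone, machine type, and image names are illustrative)
gcutil --project=my-project addinstance hello-debian \
    --zone=europe-west1-a \
    --machine_type=n1-highmem-8 \
    --image=debian-7-wheezy

# List the instances running inside the project
gcutil --project=my-project listinstances

# SSH in; on first use this generates a local key pair, and the in-guest
# management infrastructure installs the public key via a cron job
gcutil --project=my-project ssh hello-debian
```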
No, I want to do apt-get moo, right? Aptitude moo. Aptitude, that's totally proof that it's running Debian. Is that right? There we go. And then if you're really questioning: there you go. Nobody in their right mind would ever fake this interface. OK, good. And I switch back. Alt-tab. There we go. OK, so again, why are we here? We basically want to work with you guys to make Debian on the Google Compute Engine product really sing and really integrate nicely for customers. And we also want to make sure that all of the software that we produce for our cloud, for Google Compute specifically but throughout our whole cloud, is nicely available as Debian packages, et cetera, so that it's easy to use for anybody running Debian inside our cloud or outside our cloud. Plus, I want some swag. OK, so the outline of this talk... we're not presenting. So the outline of this talk: we have just a few things to talk about. First, we want to talk about how we're actually going about building our Debian operating system images and pushing them into Google Compute Engine. Then we're going to move on and talk about the package repository mirroring system that we have in place, which mirrors Debian packages into our cloud for various reasons. And we're going to give you a preview of the tutorial that's coming up, that Mandy over here, Mandy, is going to be giving later today, just a short while from now, about an hour, an hour and a half. And we're going to give you previews of the BoFs that Jimmy and I are going to be giving today and tomorrow, too. OK, so with that, I'm going to hand it over to Jimmy. Hi. So we are using a tool to build the Debian images which actually came from the Debian community. We did not create this tool. The tool, at one point, supported only Amazon EC2, and we contributed support, under the same license, sent back to the community, to add support for Google Compute Engine.
And accordingly, the name was broadened to build-debian-cloud. We worked with Anders Ingemann; he was very helpful. And we tested with both the debian-cloud mailing list and one-on-one interactions. And images based on this tool shipped in May, very soon after, a bit before Google I/O. There was a blog post, and I mentioned it in the Google Compute Engine sessions at I/O. We have images for both Squeeze and Wheezy, and intend to proceed to Jessie when the time comes. So, as you can see, the tool has subcommands for both clouds that it supports. And Anders is good about merging our patches; we are submitting them pretty often as well. He is working on a Python rewrite; support for our cloud still needs to be added there, but that will happen. build-debian-cloud is Apache-licensed, and our contributions preserve that. It's all free software. So how does it work? The parts in green are specific to Google Compute Engine within the context of this tool; EC2 has different ways of doing them. The black part, about the bootstrap, is shared. So it's like a lot of other disk image builds, for OpenStack or VMware or what have you, or KVM. We have a local disk file. We put in a partition table. We put in a file system. We loopback-mount it, debootstrap, install stuff. It's a very simple tarball in the end, with a disk.raw file in it that represents the image. We make it space-efficient using sparse file techniques and compression. And we upload it to Google Cloud Storage, which David didn't say what that is: it's an object store, like a lot of the other ones, like S3 and so forth. Once the image is uploaded there, we can just use gcutil to add it. And a command similar to that is used. We would probably add it to a testing project first, and then when we're ready to publish it, it would basically be this. We can give a description as well.
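The packaging half of that pipeline can be sketched in a few commands. This is a minimal sketch under stated assumptions: the partitioning, filesystem, and debootstrap steps are elided, and the bucket, project, and image names are invented for illustration.

```shell
# Create a sparse local disk file (occupies almost no real space yet)
truncate -s 1G disk.raw

# ... partition table, mkfs, loopback mount, debootstrap happen here ...

# Tar it up; -S preserves sparseness, -z compresses
tar -Szcf debian-wheezy.tar.gz disk.raw

# Then, with cloud credentials (illustrative names, not real projects):
#   gsutil cp debian-wheezy.tar.gz gs://my-bucket/images/
#   gcutil --project=my-testing-project addimage debian-7-wheezy \
#       gs://my-bucket/images/debian-wheezy.tar.gz
```

The sparse handling matters because the raw disk file is mostly empty; `-S` keeps the tarball small and the local file cheap.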
So, what we've been doing, and what we want to keep doing with Debian's involvement, ideally: we want to periodically, every time there's a new Debian stable minor or major release, or whenever there are enough new Google updates to provide, or whenever it makes sense for bug fixes and stability or whatever, produce new images. And test them somewhere. I'm going to talk more about this in one of our BoFs, but there's no great way right now for Debian to QA or test a Debian cloud image. I do a quick manual test, and Google has an internal test that we run, not comprehensive, but useful. And then we publish it in the debian-cloud project, which is set up so that every customer of Google Compute Engine can see those images. And our tools and web interfaces have some support for the debian-cloud project, and another one for a different OS. And that's the entire process for us for publishing. So the releases right now are built by me, wearing both my Googler and Debian developer hats. And it would be great if non-Google Debian developers were to get involved. We designed the process so that any Debian developer, or really anybody whom Debian trusts, can do this. They just have to have a Google account of some sort, and they can be granted permission to publish to this project. So we want to coordinate with Debian on this and validate things to make sure that it meets everyone's quality bar, certainly Debian's and Google's. But it should be a collaboration, and that would be great. So the name on the slide is slightly wrong: it's debian-cloud-experiments, plural, which is a free project with a small shared quota. We're covering the cost of both of these. That project is for, let's say, you want to validate an image, or try out building the image, or some Debian-related short-term or small-scale test. That would be the one for development. Yeah, we actually gave some access to this to NeuroDebian, just to give a broader-scope Debian-related example.
And the official image releases go in the debian-cloud project. You can email us at our work addresses at google.com for access. OK, so, for package repository mirrors: it's always possible to send our customers to the global Debian mirror network. At the same time, if we keep their traffic within our network, that saves bandwidth, both for them and for Google, and money as well for them. So we do have a local mirror that we're running inside the cloud, and our images default to that, plus the global mirror redirector. Sure, yeah. So we want the images to be fast and to not overload the public servers, and, as I said, to save money. So our mirror is synced using ftpsync, like good practice suggests. We actually serve it via Google Cloud Storage, which can be accessed directly over HTTP. This has some geo-balancing properties. It has some reliability and scalability; it's replicated, et cetera. It's good infrastructure. And since we're sort of delivering this system that combines ftpsync and Google Cloud Storage, we have http.debian.net in there too, so that apt has very good built-in redundancy. We tested this; it actually does fail over nicely. So people will be sure to get current packages. So, some things we would like to evolve this toward. It would be great to talk to the FTP team and the mirror admin team about being a tier-one push mirror, simply because there are a lot of users in the Google cloud, and Debian is a very common choice now for their operating system, and it would be great to get updates promptly. We may also want to see about getting our mirror added to Raphaël's great redirector service, http.debian.net, for customers visiting from Compute Engine. And I know Amazon is also using CloudFront to serve a mirror in their cloud. We're trying Google Cloud Storage; it's a similar concept. It would be great to have an interoperable way, not specific to one cloud, to do a direct push to CDNs.
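In sources.list terms, the defaults described could look something like this. This is a sketch, not the actual file the images ship: the Google mirror hostname is a placeholder, and only the fallback redirector line reflects what was said directly.

```
# /etc/apt/sources.list (illustrative; the first hostname is invented)
deb http://gce-debian-mirror.example.com/debian wheezy main
deb http://http.debian.net/debian wheezy main
deb http://security.debian.org/ wheezy/updates main
```

With both the local mirror and http.debian.net listed, apt can fall through to the redirector if the mirror is unavailable, which is the built-in redundancy described above.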
This would be a more speculative development thing. But there are several different clouds where pushing to an object store in some way might skip an intermediate step and streamline that process. So at this point, we have a brief preview of the two BoFs. So, I should mention again, it's not a BoF per se, but at 11:30, with one talk in between this one and hers, there's Mandy's talk. Mandy's talk is a tutorial, an interactive tutorial. You can participate with or without a Google account. You can just use your laptop, or watch her as she does it. At 11:30, right here, she'll give you a quick whirlwind spin through it all. And at 15:30, 3:30 PM, in this room today, I'll be doing a more Debian-focused BoF, but with examples from Google, about the question of what Debian should consider an official Debian image in the cloud. It's a bit different from the CD context we have historically done, but with some attention to the needs of public clouds, and figuring out how to satisfy the needs of cloud vendors and of Debian for its users in the cloud, and how to move that forward. Tomorrow at 9:30 in the morning, in the second talk room, not here, David will be leading a BoF which is also general to Debian, but he's again bringing examples from our experience, about packaging cloud-specific software. It often has recent dependencies or fast-changing environments and features. There are a bunch of unique quirks to the cloud context and the vendor relationship context that are certainly not specific to Google Compute Engine, and we hope that other cloud providers will participate in both of these BoFs; they should be good ways to move the issues forward. So, a quick preview of my BoF this afternoon: it's basically what I said. There's a question of what official even means in the context of a cloud VM. Sometimes you need to tweak the configuration to work smoothly. Sometimes you have a bug or vulnerability specific to the environment.
We've seen performance issues and other weird situations. Debian, of course, has its values about licensing, and it needs control over security and packaging for something to reasonably be called Debian. It needs to be supportable. But integration is a tough question as well. cloud-init is one thing that might streamline the process, but there are a lot of other questions that are outside the scope of cloud-init. So we can discuss this more this afternoon, right here. So here's David, to finish the slides a bit and talk about his thoughts. So I want to go back in history a little bit. I want to go back in history a little bit. Like I said earlier, Google Compute has been around a couple of years now. When we first got into the business of building images for the cloud, we didn't know what to do. And so, in general, what we did was: we created a local disk file, we mounted it locally, and just started debootstrapping and changing crazy files inside. Whenever we needed a package, we copied it in. Whenever we needed some sort of tool, we mounted it or chrooted into it and installed some random piece of software outside of the Debian packaging system, or the RPM packaging system, whichever context we were installing software in. And in general, we found that that was a bad way of doing things, because there are several downsides, which you probably all know: you can't upgrade and remove packages very easily; you can't fix bugs with an update or an upgrade. So in general, we've been finding that out. And I think, in general, the Debian operating system community would rather not have us just putting random files inside the operating system. So, in general, what we're trying to do now is take all the software that we're trying to stick into the images, convert it to Debian packages, and make sure it's all just a simple apt-get install of a Google package.
And one of the things that we'd like in particular is to figure out a way of getting these packages, wherever appropriate, into the official Debian release stream, and into backports whenever that's appropriate; or, if there's some other mechanism we need, we should figure out what that looks like. I want to give you a general overview of the kind of software that we stick inside the images. In general, we only put in completely free software, Apache-licensed stuff. In general, that's Google's way. The key components here are some startup scripts to get the system bootstrapped in the virtualization environment: they figure out what the instance name is, the host name, things like this. We've got a management daemon that manages accounts and networking, so whenever the network changes on the VM, the operating system can be notified and told: oh, by the way, you've got load balancing features now, or you've got this or that, or your IP addresses changed, et cetera. We install some simple tools to talk to the Google cloud systems: we've got a compute tool and a storage tool, and there will be more coming down the line as we finish the integration with the rest of the Google cloud properties. We've got some image snapshotting tools, basically to let you take the current contents of your disk and create an image template from it, so that you can create multiple VMs from it. So basically you can take the current VM and clone it, more or less. We also sometimes try to install some security lockdowns. So, for instance, sometimes we want to go in and modify SSH configs; our security teams are very eager to have us, say, turn off root login and things like this. And as far as I know, there's no great way of just installing a Google security lockdown package and having it go in and modify all the files of SSH and this and that and the other packages that are installed on the system.
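As one concrete example of the kind of thing those startup scripts do, a guest can ask the platform's metadata server for its own identity. The endpoint shown here is the modern v1 path, given as an assumption about the interface rather than the exact 2013-era one:

```shell
# From inside a GCE VM (metadata.google.internal does not resolve elsewhere);
# the v1 path and Metadata-Flavor header are the current interface, assumed
# here for illustration
curl -s -H "Metadata-Flavor: Google" \
    http://metadata.google.internal/computeMetadata/v1/instance/hostname
```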
So, some of these things are easier to do than others. It's easy to say: just make them all Debian packages and move on with life. I don't know. How many of you have had trouble building a Debian package before? Oh, ah, sweet. Thank god, I'm in like company. OK, we're still in the minority. We should get together, the three of us, later and commiserate. I'll buy you all a beer. Anyway, moving on. So I want to give you a couple of examples of the things that we've been doing, some of the packages that we've built, and some of the things that made them hard. gcutil is what we call our command line tool for the Google Compute product. It's a pure Python program. It's got several dependencies. And again, it's 100% open source, Apache license. You would think it would be simple to just install as a Debian package. Part of the problem here is that it's a Python package. We used to be using setuptools and PyPI to do all of our dependency management. But we found that, in general, the Python module system was sometimes picking up system modules, and sometimes our modules. And it was just causing us trouble validating and maintaining quality control over the product. People would come in with bug reports like, this isn't working, and it turns out it's because they've installed some crazy version of some crazy dependency we had. And it's like, oh, anyway. So eventually we decided to ditch the PyPI system. We basically picked a particular version of each of our dependencies and copied them into our package statically. And we forced the Python interpreter to load all the modules from that directory rather than from the system. And that makes the quality of the product a lot easier to validate, because we test exactly what the customer is running.
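The vendoring approach just described, pin one version of each dependency, ship it inside the package, and force the interpreter to load it first, can be sketched like this. The file names and the stub module are invented for illustration; this is not the real gcutil layout.

```shell
# Bundle a pinned copy of a dependency next to the tool (names are made up)
mkdir -p vendor
cat > vendor/mydep.py <<'EOF'
VERSION = "1.0-pinned"  # the exact version the tool was tested against
EOF

cat > tool.py <<'EOF'
import os
import sys
# Put the bundled copies ahead of system site-packages, so the tool always
# runs the dependency versions we validated, never whatever the user has.
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "vendor"))
import mydep
print(mydep.VERSION)
EOF

python3 tool.py   # prints 1.0-pinned
```

Because the bundled directory sits at the front of `sys.path`, a conflicting system-wide copy of the same module is never imported, which is exactly the quality-control property described above.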
And we just inserted a makefile into our source code, our tar file, and we run dh_make and dpkg-buildpackage and all that stuff to make our Debian package. Good. gsutil: so, gcutil was somewhat easy, because we only had to do the version management, the static linking of the Python code, if you want to call it that. gsutil is our storage tool. It's a little bit harder. We haven't figured out exactly what we want to do with that yet, and hopefully one of you will tell us. Two of you; maybe many of you might tell us exactly what we need to do. Again, it seems like it should be simple. It's 100% open source software. The tool itself is all written in Python. It takes many Python dependencies, and it takes a few binary Python dependencies. The problem is that, for the dependencies that we take, the versions that we want to be using are not available in either squeeze or wheezy. And so getting a Debian package for gsutil that installs on squeeze or wheezy is challenging. Either we have to go and build new versions of, say, crcmod and things like this and publish those somewhere, I guess; I don't know. We don't know the process that needs to happen here. And again, because this tool uses PyPI and setuptools, sometimes we run into these version conflicts, where Python loads a module from one place instead of another and gets a different version. And we sometimes have to just plaster over those errors, right, whenever the Python interpreter says, oops, there's a version mismatch. And we'd like to have a better solution here too. And if you have any recommendations, I really encourage you to come tomorrow at 9:30 to the packaging BoF and tell us what to do. Because we need you guys to help us. That's in the second talk room, not here. Very good points, very good points. That way, I think, right? Below the bar. Below the bar. Afterwards, we can go for drinks and commiserate.
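The Debian build step mentioned above is roughly the standard flow shown below. The tarball name is invented, and the exact dh_make flags are an assumption; the tools come from Debian's dh-make and dpkg-dev packages.

```shell
# Illustrative packaging flow for a Python tool shipped as a tarball
# (package name and version are placeholders)
tar xzf gcutil-1.0.tar.gz && cd gcutil-1.0
dh_make --single --yes -f ../gcutil-1.0.tar.gz   # scaffold the debian/ directory
dpkg-buildpackage -us -uc                        # build an unsigned .deb
```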
Okay, man, all I want to do is buy people beer. Okay, if you help me fix my problems, I will buy you beer. That's my promise to you. Good. Was there anything else I wanted to say? Questions. Questions, are there questions? No questions. Oh, there's somebody, there's somebody moving around. What's that? This one works too now, so we have two microphones. So, in terms of API, what you provide is completely different from what AWS and Eucalyptus provide. In terms of the REST API and stuff like that, what you use is completely different from what Amazon Web Services and Eucalyptus provide. It's different from other services; is that weird? Yes. Yes, we invented a completely new API. We thought that the existing examples of APIs weren't to our liking. They didn't provide the same sort of structure that we would like to see. So ours is a fairly hierarchical structure: the object model is basically mirrored in the API, if that makes sense. I think most other providers have a flat, sort of key-value kind of configuration process for VMs and other things. We wanted to break it out a little bit, and we think that in the future that's going to make it easier to integrate with other kinds of products. So, for instance, this is sort of happening now, if you look at the Google Compute product as it stands: we started with virtual machines and basic networking, and now we're inventing all sorts of new subtrees in the API for load balancing and advanced networking and routing and different kinds of disks and so forth. And so instead of just pushing them all into one namespace, we have a little bit more structure. A couple of related comments. The authentication is handled differently in the normal case than with, say, S3 or Eucalyptus, because it doesn't actually rely on a shared secret. It uses OAuth 2, and so there are scoped and time-limited tokens with refreshability, and in various ways that's a good security advantage. They can also be revoked.
And for the cloud storage product, the object store, we do have an interoperable mode, similar to S3 and Eucalyptus, that you can use; it's not as full-featured as the native API. And one more point: I think just last week we pushed Google Compute compatibility into libcloud, and so I think in the future you'll find that most of the middleware is going to understand pretty easily how to adapt between Google Compute and other products. But if you want to take full advantage of all the Google Compute features and that nested hierarchy thing, then you come directly to us, or if you're building other things for other Google properties, right? They all have a fairly similar structure, if that makes sense. And so if I'm programming for Google Compute and Google Storage and Google Prediction and Google Docs, the APIs are all sort of similar. And there's other middleware that support for our stuff has been added to: Ruby's fog, I believe, now supports Compute Engine, and I believe boto has support for Google Cloud Storage. Other questions? One simple suggestion. You mentioned needing different versions of libraries for the packaging. So why don't you just use a Debian mechanism that already exists, called apt pinning? Why don't you use that to specify which versions of the packages you want to use? I don't know. Maybe you should talk to me more about that. I barely know anything about apt pinning. I mean, I know something about it, but, like, my understanding, I don't know. Maybe Jimmy can correct me. It is a bit error-prone, especially as you mix different versions of things. We could consider it if the packages are intended, for example, to be used with a stable release only, and then we would have a single target to make a repository for. But it would be better to do things in a way where we didn't have to worry about what the pins are.
Right, but you don't have to pin the whole repository. Maybe you can specify just that particular version, or name, of the package you want to use. So the pinning forces the version of the package on the operating system entirely? No: basically, you could run a stable system, but if you need a newer version of a package, you could pull that particular version from unstable, for example, without breaking your stable system. If you pin the whole repository, then you could maybe break it, but if you specify that particular version, then it's completely safe. I do that all the time. But does each user have to do the pinning operation themselves, or does the package itself do the pinning? No, no, no, whoever sets up the repository does that. It's a little bit more error-prone than you're making it out to be, but come to the BoF tomorrow morning. Yeah, all right. Come talk to us then. How should I see Google Compute Engine? Can I compare it to a virtual private server, a VPS? How does Google Compute Engine compare to a virtual private server? It's fairly similar. You have various options: when you get into the Google Compute world, you can connect the virtual machine to Google infrastructure in a somewhat simpler way and with more proximity, right? So you can be close to all the Google services that you might want to use. Other things that might be different: you can configure the networking so that you have internet egress or ingress, or you can have a private network, things like this. And it's all API-driven, so you can configure the infrastructure as you wish. And I think VPS systems don't always give you that functionality. Certainly not the proximity, but otherwise fairly similar. Also, the persistent storage, for example, is a lot more reliable than a typical hard disk. It's the whole Google compute stack below the virtual machine level.
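For reference, the audience's suggestion corresponds to an apt preferences file like the one below. The package name is an example chosen because crcmod came up earlier in the talk; the exact priorities are illustrative.

```
# /etc/apt/preferences.d/pin-example (illustrative)
# Keep everything from unstable at low priority by default...
Package: *
Pin: release a=unstable
Pin-Priority: 100

# ...but allow this one package to be pulled from unstable.
Package: python-crcmod
Pin: release a=unstable
Pin-Priority: 990
```

With unstable added to sources.list alongside stable, only the pinned package is upgraded from unstable; everything else stays on the stable release, which is the "without breaking your stable system" behavior the questioner describes.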
The underlying stack is built on top of stuff that Google depends on for its business. And so there's a lot of focus on low-level and systems infrastructure performance and durability and similar things. For example, there's a lot of great stuff happening to improve network performance, et cetera. So there are a lot of advantages to leveraging what Google is trying to do in those regards. There's one other difference: in general, the market that Google is currently looking at and focusing on is the people who care about having a dedicated server, dedicated hardware. So if you really need a single CPU to be there all the time, or eight CPUs to be there all the time, you get the full eight CPUs and full RAM and full networking; this is the market that we're targeting at the moment. We've started doing fractional machines, so that you can get time shares and a cheaper version of the hardware if you don't need it on all the time. But in general, we've been focusing on people who need a lot of power and a lot of reliability. So it would be a good option for me to use it as a compile engine? Yes, that's a really awesome use case right now. We're building out facilities to give you more features and functionality if you need web services or work queues, things like that. So I'm looking for: okay, now I have time for development, now I'm going to start a compile cycle, so I need compute power; and when I'm done, I'm going to sleep, I'm going to switch it off, and at that moment the cost counter should stop. Yes. Yeah, if the machine is not running, you would only be paying for any storage that you're continuing to use; you wouldn't be paying for the compute resources or any storage that is not persistent. Right. Thank you. One more question, anybody else? Well, thank you to Google. Thank you guys. And I appreciate the information.