So my name is Michele Tartara. I'm a software engineer at Google, and I'm working in the Ganeti core team, which is the team developing the virtualization management software that we call Ganeti. I will start with a short introduction to Ganeti itself, then I will present some of the newest features, which came out in the latest release published last week. And finally, I will give a brief update on our participation in the Google Summer of Code program: for the first time this year, Ganeti participated in the Google Summer of Code as a mentoring organization.

So, the traditional approach to installing a virtualization cluster. Well, of course, you take a bunch of physical machines, then you set up a hypervisor, the one of your choice: Xen, KVM, or LXC if you are fine with just using Linux solutions. Then you set up some storage replication mechanism, because you would like to keep your VMs running even in case of hardware failures in the hard drives, and so on. Then you try to get everything working nicely together, and then, well, stuff: basically, all the usual configuration steps that you need, until, finally, everything works, hopefully. This traditional approach works perfectly fine, but it's also nice to simplify your own life. So instead you take a bunch of physical machines, install Ganeti, and basically all the other steps come for free, because it is Ganeti doing the configuration of DRBD and all the other components for you.

So what does Ganeti actually do? What is Ganeti? Ganeti manages clusters of physical machines and deploys Xen, KVM, or LXC virtual machines on them. Basically, you just have to start from a basic Linux system with the packages of your distro installed, and Ganeti will configure them for you. It allows you, for example, to do live migrations of virtual machines between the nodes of your cluster. It gives you resiliency to failures: if you are using, for example, DRBD, it can give you data redundancy; or, if not DRBD, you can have some external storage that you tell Ganeti to assume will be available on all your nodes, and everything will work perfectly fine. Ganeti can also automatically restart instances after power outages: there is a watcher that checks whether a machine that was configured as up by the sysadmin is detected to be down, and in that case it will be automatically restarted. And then there is also a bunch of tools for doing cluster balancing and for making repairs and hardware swaps easier. So if something does not work, Ganeti will help you fix it.

It is open and is built on top of open source technologies: Linux and all the standard utilities, then, as I said, KVM, Xen, or LXC as the hypervisors, and DRBD, LVM, or in general any kind of storage area network is perfectly fine. And it's implemented in Python and Haskell. Traditionally it was mostly Python, and Python is still the majority of the code base, but especially the internal daemons, the ones that even in the worst case the sysadmins will never really want to touch, are slowly migrating towards Haskell, mainly for reasons of code cleanliness. With Python it's too easy to modify something, see that it seems to work, put your software in production, and then get a random runtime crash, which is terrible. And when you realize that most of those crashes could be detected at compile time, why not? So, a bit of core concepts that Ganeti is based upon.
Ganeti interacts with the cluster as a single entity, instead of interacting with the single physical machines or, even worse, with the single virtual machines. Basically, all the commands that you give to Ganeti, you give them on the master node, which is one node of your cluster that is selected as the manager, and then it is up to the software to execute the command where it's actually needed. It makes entry-level cluster virtualization as easy as possible: it is easy to install and manage, as I said before, and based on common off-the-shelf hardware. You don't need specialized hardware. Usually it would be nice if all your nodes were equal, but it's not really required; you can have different nodes, it's perfectly fine. And Ganeti is built to scale to enterprise ecosystems. So you can start with one node; okay, it's not a great virtualization cluster, it's just one single machine, but you can scale up to hundreds of nodes, which is the configuration used inside Google, and you can have more than one such cluster.

Another core concept for Ganeti is to be a good open-source citizen. All the design and code discussions are done on the public mailing list, so you can completely follow the development from there. External contributions are more than welcome, and actually we have a bunch of them, especially from GRNET, which is our biggest external contributor. We have some people from GRNET here, and they gave a talk yesterday about their Synnefo system. As I said, GRNET is one of the big Ganeti users. But in general, we try to have an active mailing list where the core team replies to the external users, and it's also nice to see that it's an active community, because the users sometimes reply to each other without even our intervention.

Just to give you an idea, Ganeti is maybe not well known, but it's quite used, and on quite big installations. Of course Google is using Ganeti internally for all of its virtualization requirements. And when I say this, I mean the internal ones: there is no user-facing service based on Ganeti; for example, Google Compute Engine is built on something else. Ganeti is used for internal infrastructure, so, for example, if somebody needs a virtual server for some development or a virtual workstation, this is all managed with Ganeti. Then we have GRNET, which, as I said, is the Greek Research and Technology Network, and which is running really big Ganeti clusters, multiple ones actually. Then there is the Oregon State University Open Source Lab, which, if you look at the name, might seem like some obscure laboratory of a university until you find out they are hosting a lot of machines of the Apache Software Foundation, the Python Software Foundation, a bunch of Debian machines, and so on. I didn't know this myself the first time I read the name, so it's impressive. And then there is, for example, Skroutz, which is a company making a price comparison engine in Greece, and the Free Software Foundation France, which states on its website that it is using Ganeti.

So, a bit of terminology that I will be using in this presentation. An instance: when I say instance, I mean a virtualization guest, a VM, no more than that. A node is a virtualization host, so the physical machine where you install something. We have the concept of node group, which is a homogeneous set of nodes: usually, when you have an instance installed, you have a primary node and a secondary node, which is ready to take over.
These are inside the same node group, but the cluster can be composed of multiple node groups, and it's all managed as a collective by Ganeti itself.

Every node, when it's used as part of a Ganeti cluster, can have different administrative roles. There is the master, which is controlling the entire cluster. We can have master candidates, which are a small set of nodes that are ready to become the master if needed: for example, if the master crashes or has some problem, the administrator can go on one of the master candidates and say, fail over the master here, and that node will become the new master with full power. This is possible because the master always sends a full copy of its configuration to the master candidates every time it is updated. Of course, not every node will be a master candidate, otherwise copying the configuration continuously everywhere would be terrible for scalability; but many nodes might be master capable, meaning they could become master, but currently they are not configured for that possibility. Then there are the regular nodes, which are all the other ones, and you can also have offline nodes, because sometimes you have configured a node but it went offline for some reason, and you know this happened; you don't want the cluster to go and ping that node every time to gather information, so you just say, okay, this is offline, don't even consider it for now, but don't remove it from the configuration. Then, regarding the nodes that are actually able to host some VMs, as opposed to, say, just the master node, they can be VM capable or non-VM capable, the latter being, for example, nodes containing only storage but no actual hypervisor.

So how can the user, the sysadmin, control Ganeti? Basically, in a bunch of ways. The public interfaces are the command line interface, which is a bunch of commands named gnt-something (we have gnt-cluster, gnt-node, gnt-network and so on), which is the usual way for the sysadmin to interact with Ganeti. These commands of course require root permission and are made to be scripting friendly: they try to produce really regular output and take a really regular set of inputs, and we try never to change this between one release and the next unless it's really, really required, so they can be used for scripting. But of course, if you need to do proper scripting, there can be a better way, and this better way is our remote API, which is a REST-like API over HTTP/HTTPS, with authentication and the possibility of supporting multiple users, which can have read access or write access to the cluster. This one is explicitly guaranteed to be stable because it's meant for automation; I'll show a small sketch of what using it looks like in a moment.

On top of that, not directly provided by Ganeti but by external contributors, there are also graphical interfaces. Among these, it's nice to talk about the Ganeti Web Manager, which is a really complete interface that includes permission and quota management systems. It's the one developed by the Open Source Lab at Oregon State University. With this one, users can be granted access to a cluster, to multiple clusters, to single virtual machines and so on. It's really complete and allows you to manage really everything; on the other hand, it can be a little bit complicated. It's meant for sysadmins, mainly.
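Coming back to the remote API mentioned above for a moment, here is a minimal sketch of querying it from Python. It is only an illustration: the default port (5080), the resource paths, the field names, and the credentials shown here are assumptions based on a typical Ganeti RAPI setup, so check the RAPI documentation of your own version before relying on them.

```python
# Minimal sketch of talking to the Ganeti remote API (RAPI) from a script.
# Assumptions: RAPI listens on port 5080 with a self-signed certificate,
# and a read-only user exists in the RAPI users file. Adjust for your cluster.
import requests  # third-party HTTP library

MASTER = "cluster-master.example.com"        # hypothetical master node name
BASE = "https://%s:5080" % MASTER

session = requests.Session()
session.auth = ("rapi-reader", "secret")     # hypothetical read-only user
session.verify = False                       # self-signed cert; use a CA bundle in production

# General cluster information.
info = session.get(BASE + "/2/info").json()
print("Cluster name:", info.get("name"))

# List instances; bulk=1 asks for full details instead of just names.
instances = session.get(BASE + "/2/instances", params={"bulk": 1}).json()
for inst in instances:
    print(inst.get("name"), inst.get("status"))
```

The same endpoints can of course be driven by any HTTP client; the point is simply that automation goes through this stable interface rather than by scraping the gnt-* command output.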
So, as you can see, this is just one of the screens; it's the dashboard, the overview, where you get a list of all the clusters you can manage, with a summary of their version, allocated memory, allocated disks, how many nodes there are in the cluster, how many virtual machines, and so on.

Another interface is the Synnefo one, developed as part of Synnefo, which is the software developed by GRNET: an entire cloud layer built on top of Ganeti. Ganeti by itself is only virtualization cluster management; there are cloud services, like the one made with Synnefo, built on top of Ganeti. And as part of Synnefo there is also a user interface, which is mainly meant for end users, because they are running a huge cluster providing computation services to Greek universities, so they want to give these universities and their users the possibility of starting their own virtual machines. As you can see in this image, for example, the focus is on the single machine, with few data, only the important ones.

So, let's go a little bit more into the details of Ganeti. This is an overview of the Ganeti architecture. As I said, you control Ganeti either through the command line interface tools or through some client which communicates with the remote API. The remote API is a daemon which is listening for the commands. Then we have the master daemon, which runs only on the master node, and is responsible for managing the configuration of the cluster, for managing the queue of all the jobs submitted by the system administrators, and for managing the locks which control the proper execution of all these jobs. Then there is also ConfD, the configuration daemon, which basically is another daemon that gives access to the configuration, but it's meant to be read-only and to be accessible even by other nodes of the cluster. ConfD is also meant to be robust, because there is one ConfD running not only on the master node but also on all the master candidates. It has its own protocol, which is based on UDP: basically, when you want to know something about the configuration of the cluster, you can ask all the master candidates, or a subset of them, for that information, and as long as at least one of the master candidates is still alive, you will get it back. So it's something meant to be robust and always available, even in the worst case of your cluster being seriously damaged. And then, of course, you have the node daemon, which is running on the VM-capable nodes, and which is the one receiving commands from the master daemon and actually executing them; basically, it's the one calling KVM or Xen to do the actual work.

How can you install virtual machines in Ganeti? Basically, for every virtual machine, for every instance as we call them, you need to have an operating system definition associated with it, because, for example, at some point you might want to reinstall the instance, and you want to reinstall it in exactly the same situation. So we have what we call the operating system definition, which is actually no more than a bunch of scripts: we have a create script, import, export, rename. With these scripts you can create your operating system. These scripts can just unpack some tar-compressed image, or they might run an entire external tool like debootstrap for downloading and creating your operating system, whatever. They are scripts executed with root powers on the target node, and that's it.
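As an illustration of how small such an OS definition can be, here is a sketch of a create script written in Python. The environment variable names used below (INSTANCE_NAME, DISK_COUNT, DISK_0_PATH, NIC_COUNT) follow the general shape of the Ganeti OS interface, but treat them as assumptions and check the ganeti-os-interface documentation for the exact set your version provides; the image path is hypothetical.

```python
#!/usr/bin/env python3
# Sketch of a "create" script for a Ganeti OS definition.
# Ganeti runs it as root on the target node and passes the instance
# description through environment variables (names assumed here).
import os
import subprocess
import sys

def main():
    instance = os.environ.get("INSTANCE_NAME", "unknown")
    disk_count = int(os.environ.get("DISK_COUNT", "0"))
    if disk_count < 1:
        sys.exit("at least one disk is required to install %s" % instance)

    # Path of the block device (or file) backing the first disk.
    disk0 = os.environ["DISK_0_PATH"]

    # Hypothetical approach: dump a prebuilt, trusted image onto the disk.
    image = "/srv/ganeti/os-images/debian-base.img"
    subprocess.check_call(
        ["dd", "if=" + image, "of=" + disk0, "bs=4M", "conv=fsync"])

    # A real script would also grow the filesystem, set the hostname to
    # the instance name, configure NIC_COUNT network interfaces, etc.
    return 0

if __name__ == "__main__":
    sys.exit(main())
```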
For this reason, the images that you are installing need to be trusted. Ganeti, as I said, is meant for virtualization, not for cloud services, so the assumption is that when you are installing an operating system, it is a system administrator who put the image of that operating system there. That's why right now we are using this approach. For example, the people from GRNET, who are running a cloud infrastructure, do something more secure through these scripts: basically, their scripts just run a virtual machine, and inside that virtual machine the actual process of creating, updating and personalizing the operating system takes place, so it happens in a safe environment, separated from the actual node of the cluster.

This is just to give you a bunch of examples. We have defined an API for these scripts: the scripts receive a bunch of information from Ganeti, and therefore many people have implemented different installation scripts. ganeti-instance-debootstrap is the reference one; it can install every operating system that debootstrap provides, and it's the one we develop. Then there is ganeti-instance-image, which is based on creating operating systems from already existing images. snf-image, the Synnefo one, provides an encapsulated environment for the installation. Then there is one of the oldest providers (actually, I'm not even really sure it's still compatible with today's Ganeti), which was basically used for installing a VM starting from the ISO image of some CD-ROM, which you would then install manually.

So this was a bit of an overview of what Ganeti is and what Ganeti can do. Ganeti is quite an old project. Sometimes we get asked why Google decided to develop Ganeti instead of using OpenStack. Well, because when Google started developing Ganeti, OpenStack was still really, really far away in the future: it simply did not exist. But of course, the fact that it has existed for a long time also means that we keep implementing new features, and now I'm going to present some of the new features released in Ganeti 2.8, which came out last week.

One of these is the monitoring daemon. The idea of the monitoring daemon is to provide information about the cluster state and the cluster health. This information can be computed automatically by the daemon itself, so it will be able, for example, to tell you whether some instance is running correctly, and it will be exported in an easily parseable format. The idea is to have this information available live, but we don't want to write a general-purpose monitoring system: we don't want to compete with Nagios or Pacemaker or whatever else, we just want to integrate with them. We want our daemon to provide these systems with easily parseable information, but information about the internals of Ganeti itself: something that Nagios or Pacemaker wouldn't really be able to find out, because they would need to gather a lot of information about the cluster and figure out what this information actually means. We provide this already.

How is it implemented? Basically, it's an HTTP daemon with a REST-like API. Actually, it's less than REST-like: it's meant just for reading information, not for modifying the state of the cluster, so it's actually just GET requests. It provides replies in JSON format. Why JSON? Because it's easy: you can parse it with every language.
And because it's already used inside the rest of Ganeti basically everywhere, we didn't really want to add more libraries for parsing some other serialization language for no reason. Another potentially interesting thing is that it's optional: if you want it, you have it; otherwise, it poses no additional security risk on the network, just don't configure it and it will not run. It's also implemented in Haskell, based on the Snap library, if you are interested. And it depends on ConfD, the configuration daemon which I described before in the overview of the system; ConfD, too, is not required, so if you want you can compile everything without ConfD and without the monitoring daemon, and the core of Ganeti will keep working.

So where does the monitoring daemon sit inside Ganeti itself? It runs on every node. Before, when I described the architecture, the master daemon was running only on the master and the node daemon only on the VM-capable nodes. Why does the monitoring daemon run everywhere? Because we wanted to provide information about the whole cluster. Clearly, it will not provide the same information on all nodes: it will provide information about the daemons that are running there; it will, for example, not provide information about the hypervisor if it's a non-VM-capable node, but only about the storage in that case, and so on. But we want it to be everywhere, and we want it to be as stable and as always-running as possible, so that you can actually use it to find out what is going wrong when your cluster is not behaving correctly. Oh, by the way, if you have questions at any time, just ask; feel free to stop me.

So what information does the monitoring daemon provide? There is a bunch of it. Currently it provides information about the status of the instances, for now only for Xen instances, just because we didn't yet have time to implement the KVM version and the LXC one. It provides information about the status of the disks, specifically taken from /proc/diskstats. It provides information about the LVM logical volumes used in the system, about the status of DRBD, and about the average CPU load on the various nodes. Soon enough, we are planning to implement the instance status also for KVM. We want to provide information about all the daemons that are part of Ganeti itself and that are running on the nodes, and about the resources that are available inside the hypervisor and on the physical node.

So, if you noticed, in the previous slide I sometimes said that a data collector was providing status information, and sometimes I didn't say this. This is actually a really specific distinction inside the monitoring daemon, because we have two possible kinds of data collectors. One is the performance reporting data collectors, which only provide data as is: they take the information somewhere and provide it to you, nicely formatted, and that's it, there is no interpretation. But then we also have the status reporting data collectors, which are the ones where Ganeti is actually able to infer something from the data it has, and it will give you that specific information: whether Ganeti considers that node, or that resource in general, to be healthy; or not really healthy, but Ganeti knows how it can fix it, and so it's working on a fix.
So, in that case: I'm fixing it, please don't touch it; I know it's wrong, don't do anything, I'm taking care of it. Then there is unknown, because of course it can happen that Ganeti is not able to say anything; it's worse than being degraded but under automatic repair, and not as bad as being really broken, which is the other possible state.

What is the format of the replies provided by the monitoring agent? As I said, it's JSON: basically a list of all the collectors that you asked the monitoring agent to run, and every collector then provides its own report. But we didn't want to just provide a bunch of JSON data; we also wanted this JSON data to make sense and to be a bit more structured. So each of these reports has a common, well-defined format, so that you know what to expect and what data you are receiving.

Specifically, this is the format of the report. It contains, of course, the name of the collector itself. It contains the version number of the collector and the version of the format. We decided to have these two separate fields because with the format version you know whether you need a new parser for the same data; maybe the format was maintained but the collector was updated and now provides better, more accurate information. Then we have a timestamp, which, as you can see, is in nanoseconds, just because for some things you might really want that level of detail; of course, many collectors are not that precise, so they will just pad the number with zeros. Then you have a category. The category is really important, and I will explain more about that later, because it tells you what the actual data of the collector is about. Kind is just status reporting or performance reporting, as I described before. Data is theoretically free format: every data collector is able to provide its own information in there. But even here we didn't really want to allow complete randomness, so we introduced the concept of category: some data collectors can belong to a category, and in that case the category says, okay, this collector will provide at the very least this set of information, so you know what fields you will find in there. The specific collector may provide more information, but a basic set is guaranteed to be there.

Regarding the kind of data collector, performance or status: if it's performance, it's just data, so it can be whatever. If it's a status data collector, we want a specific way for you to go and find out whether the system is healthy or not. This means that inside the data section you will always find the status field, which is made of two components. One is the code, which immediately tells you if the system is working as intended, if it's being auto-repaired, if it's in an unknown and therefore potentially dangerous state (get your pagers ready), or if there are real problems and the sysadmin has to do something manually. And then there is also a message, where the data collector can provide a more specific description. If everything is working as intended, this message is optional, but if there is a problem, the data collector is supposed to write what the problem was. So you can see this as a summary of every other piece of information in the data field: if you have a status data collector, just go and look at the status field, and if everything is okay, you can ignore all the rest; you probably won't care.
If there is a problem, well, the rest of the data will help you understand even more what the problem is, but you just need to read this one field to know that, okay, something wrong is going on.

A bit more goodies: there is actually not only the monitoring daemon itself. There is also a tool which we call mon-collector, a really simple command line tool. It contains the same collectors and provides the output in the same format, but instead of listening on a network connection over HTTP, it runs locally and prints its output on standard output. Why? Basically, this is either for quick checks done by the system administrators, or for some local scripting, if you really want to script against this information without going through an HTTP interface. And there is another reason: it shares exactly the same code, so it provides the same data in the same format, but it's a physically different executable. So if something is really wrong with your cluster, we want to give you another way to find out what's happening. It is a self-contained executable with everything that is needed to run the collectors immediately, so even if the daemon is not running because it failed, or it is not reachable over the network, or whatever, you always have another way to try and find out something about your system.

By the way, how many of you didn't know anything about Ganeti before? Okay. How many have actually tried it? Okay, at least somebody, which is not bad. Yes, there was a question. The question is: is this only about stuff that's in the context of the node that you run it on, or does it see the whole cluster? No; so, something that I did not specify before: the monitoring daemon only provides information related to the node it's running on. If you want information about the whole cluster, currently you have to query all the nodes, and it's meant to be this way. Theoretically, we might at some point want to implement some master monitoring daemon, but it's not there: right now it's one node, one daemon, and the same goes for the collectors, they run only locally. It might happen, though, that one collector, to provide more complete information, has to call ConfD to ask for some information about the cluster configuration, and of course, if the cluster is badly damaged, that won't work. But in general, the implementation is made in such a way that if some information that should come from the network, or from running some local command, is not available, it will leave that field empty but still try to output as much as possible. Let's say it tries actively hard not to crash: it will provide some information, maybe not as complete as you would like, but whatever it has; it probably means that your cluster needs some love at that time.

So, what is where, exactly? Ganeti development, especially since October-November last year, has switched to a quarterly release schedule, so of course the implementation of all of this is happening over a number of releases. Specifically, mon-collector and the DRBD data collector, which was the first one to be written, are in Ganeti 2.7. The monitoring daemon itself is in Ganeti 2.8, which came out last week. 2.9, which is currently in RC, if I recall correctly, is going to provide more data collectors: specifically the one for logical volumes, the instance status collector for Xen, and the one for information from /proc/diskstats. And the CPU load collector will be in Ganeti 2.10.
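To give an idea of how an external tool might consume the monitoring daemon, here is a small Python sketch that fetches the reports from one node and prints the verdict of the status-reporting collectors. The port (1815) and the report path used here are assumptions taken from the general shape of the monitoring agent design, so verify them against the documentation of your Ganeti version.

```python
# Sketch: query the Ganeti monitoring daemon on one node and summarize
# the status-reporting collectors. Port and URL path are assumptions.
import json
import urllib.request

NODE = "node1.example.com"                   # hypothetical node name
URL = "http://%s:1815/1/report/all" % NODE   # assumed default port and path

with urllib.request.urlopen(URL, timeout=5) as resp:
    reports = json.load(resp)                # a list of per-collector reports

for report in reports:
    status = report.get("data", {}).get("status")
    if status is None:
        # Performance-reporting collector: raw data only, nothing to interpret.
        continue
    # 'code' says whether the resource is healthy, being auto-repaired,
    # unknown, or broken; 'message' is a human-readable explanation.
    print("%s: code=%s %s" % (report.get("name"),
                              status.get("code"),
                              status.get("message", "")))
```

When the daemon itself is down or unreachable, the same collectors can be run locally through the mon-collector tool mentioned above, which emits the same JSON on standard output.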
As I promised, more information. Yes? The question is whether this information is about the load or about the instances. It depends; actually, it's about all the instances on one node, for example: if you run the instance status collector, it will provide you information about all the Xen instances that are running on that node. So it's about information that is available on a node, or related to a node, but it can be at different levels. Also, if you look at the data collectors regarding hard drives, or storage in general: we have something quite low level, the disk stats, which is how the hardware is behaving; we have logical volumes, a bit higher; and we have DRBD, even a bit higher. And if you take these three together, they are made in such a way that each one of them provides some information that you can use as a foreign key with respect to the data provided by the others. So you can, for example, say: okay, I have a problem with the DRBD volume of some instance; or, looking at it from the other direction, I have a problem in one instance and I see that it's a problem related to the disks. The DRBD collector will tell you this instance is running over this DRBD minor; from that DRBD minor you know what logical volume it is on; from that logical volume you know what physical drive it is on, and you can go from the instance down to the lowest possible level following this chain. So it's trying to provide as much information as possible, information that can be used by an external tool to take decisions and to find out what's wrong in the cluster.

So, categories. As I said, there is a bunch of categories, and they usually define a minimal set of information that every collector in them has to provide. We have the storage category, which is for the collectors I just listed; it indicates that those collectors will gather data about the storage subsystem, that they will provide information at different granularity levels (physical disks, partitions, logical volumes, and so on), and that it is always possible to trace that information back to the instance. The storage collectors are built for this reason, but they don't actually provide any common field: it's just, okay, this will be about storage, and you will always be able to walk the entire chain from the instance down to the level you are interested in.

Then we have the hypervisor category, which is actually the least defined category right now, because there is no data collector yet that is about hypervisors. As I said, we mean to implement data collectors that will provide information about the free and used memory on a node, about the number of CPUs available, about the average CPU load. For sure they will be performance reporting collectors, but everything else about the hypervisor category is still a bit undefined.

Then there is the daemon category, which tells you that the collector will provide information about the memory used by a specific daemon; the uptime of the daemon, because we want to know if maybe the daemon is running now but actually just crashed yesterday and you didn't even realize; and the amount of CPU that that specific daemon is using, because maybe it's still running but it's using 100% of the CPU because something is wrong somewhere, and we want to know. And with this information, plus more daemon-specific data, we can actually say whether it's normal behavior or not.
And therefore it's a status reporting kind of collector: it will say, okay, this one is healthy, this one is broken, let's try to find out what's going on. And there will be one collector for every daemon: we will have a collector for the master daemon, one for the configuration daemon, one for the node daemon, and so on.

Regarding instances, right now we have the collector for the status of Xen instances. As the name says, it is a status reporting collector, and both this one and the one for KVM will provide, at the very least: the name and the unique identifier of the instance, which is the one used internally by Ganeti for tracking instances; the admin state, which is what the admin wants the instance to be, up or down; the actual state, what the instance is actually doing; the uptime; the timestamp of the latest modification to the configuration related to that instance; and also the state reason, which is something quite interesting. Basically, this is a chain containing information about all the parts of the system that were crossed by the last command that changed the state of the instance. For example, it will tell you that at a given timestamp a user used this command line tool to perform a change on the instance; that command line tool created this job, which activated this opcode, which made your modification. So it's trying to track what's going on inside Ganeti, and potentially also outside, because you can specify the first level of this chain yourself and it will be recorded. Through this, you will always be able to find out what changed the instance last.

Something more, something that will actually be introduced only in Ganeti 2.10 (it's already implemented, but it's not out there yet, it's only in our Git repositories), is the difference between stateful data collectors and stateless data collectors. Most of what I described up to now were stateless data collectors: you run the collector, it goes and fetches the data, prepares the output, and gives you the output. But some collectors, like the one for the average load of the CPUs, don't really want to work that way. As the name says, it's an average: it wants to know what's happening over time. So this kind of collector is run not at the request of the user; it is run automatically, in a timed way, by the daemon itself, and collects data, and when the user calls the collector, that just activates a reporting function which assembles the data in some way and prints a nicely done report. For now, the one about the average load is the only one existing, but of course in the future we will have more of them. The current implementation supports a single timer: at every, let's say, clock tick of this timer, all the collection functions will be run; but of course we want to be able to specify a different collection interval for every specific data collector in the future.

So this was the description of the monitoring agent. Any questions about that? Okay, none. Another interesting tool, which was introduced in Ganeti 2.8 as well, is harep, which is our auto-repair tool. Basically, before Ganeti 2.8 there was no self-repair: the only thing that could happen was automatically restarting a crashed instance through the watcher, but real self-repair of the instances wasn't possible. For example, if the DRBD of an instance is broken, you would have to manually fail it over and then trigger a disk replacement, and then everything would be fine again.
Or if a plain instance is broken, okay, the disk is gone, the only thing you can do is recreate it and reinstall it, but it would still be completely manual. Sometimes you want this to be automated; especially when you have many, many instances it can be useful. Therefore we introduced harep, which allows you to auto-repair your cluster. It's meant to be run by cron automatically every now and then, and the admin, of course, can allow or disallow specific repairs, because, as you can imagine, some of the repair operations can also be a bit dangerous, so you don't always want them to happen.

So, first of all, if you have no idea what harep is and you just try to execute it just in case: well, it's not a good idea for a sysadmin to do something like that, but harep is going to prevent you from doing anything bad. It will just start, have a look at the system, note what is wrong, and that's it. It will print an output saying, okay, this instance has a problem, but it will do absolutely nothing, because harep will only act on instances where the sysadmin has, manually or through some tool, added a tag authorizing the auto-repair operation; the tag has this shape, and a type, where the type says what kind of auto-repair operation is allowed. These auto-repair operations come in different, increasing levels of importance and of danger. You can just say fix-storage, which means replace disks or fix the storage backend without affecting the instance itself; basically this is possible only for DRBD instances, or instances that have some kind of replication mechanism: if the secondary is broken, okay, whatever, shut down the secondary, resync the data, recreate the secondary, everything is fine, the instance is still running, nobody notices anything. Then migration: sometimes you know that the node is not in a good state, but the instance is still running, and you decide to move it away to its secondary; but of course a migration can always go wrong for some bad reason, so you want to be able to decide whether you want it. Failover is even worse: live migration is not possible, so should I try shutting the instance down and starting it again somewhere else? If the tag allows me, why not. Worst case, reinstall: everything is destroyed, the only thing we can do is recreate the instance from scratch. The service will be back, but all the state of the instance will be completely lost, both the running state and the disk contents. It's something you might want to have automated, but of course you need to approve it.

These tags can be added to a single instance, they can be added to a node group, and they can be added to the entire cluster, because sometimes you don't want to add them manually to every single instance. But what happens if there are authorization conflicts between all these tags? Basically, the conflicts are resolved through two really simple rules: if there are two tags on the same object, the least destructive one takes precedence, and if there are two tags across different objects, the nearest one wins. A really simple example: if you have a cluster with two instances, I1 and I2, and I1 has failover, whereas the cluster has fix-storage and reinstall, what happens is that I1 gets failover, because it's the closest tag, whereas I2 gets fix-storage, coming from the cluster and taking the least destructive between fix-storage and reinstall.
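Just to make those two precedence rules concrete, here is a small Python sketch that resolves which repair level applies to an instance. The repair levels mirror what was just described, but the actual tag syntax harep expects (something along the lines of a ganeti:watcher:autorepair: prefix) should be taken from the Ganeti documentation; this only illustrates the resolution logic, it is not harep's real implementation.

```python
# Illustration of the harep tag precedence rules described above:
#  - on the same object, the least destructive repair level wins;
#  - across objects, the nearest one (instance > node group > cluster) wins.
# This is not harep's code, just a sketch of the decision logic.

# Repair levels ordered from least to most destructive.
LEVELS = ["fix-storage", "migrate", "failover", "reinstall"]

def least_destructive(tags):
    """Pick the least destructive level among the tags set on one object."""
    present = [lvl for lvl in LEVELS if lvl in tags]
    return present[0] if present else None

def allowed_repair(instance_tags, group_tags, cluster_tags):
    """The nearest object carrying any autorepair tag decides; within it,
    the least destructive level applies."""
    for tags in (instance_tags, group_tags, cluster_tags):
        level = least_destructive(tags)
        if level is not None:
            return level
    return None  # no tag anywhere: harep will not touch the instance

# The example from the talk: I1 carries "failover", the cluster carries
# "fix-storage" and "reinstall", the node group carries nothing.
cluster = {"fix-storage", "reinstall"}
group = set()
print(allowed_repair({"failover"}, group, cluster))  # -> failover (nearest object)
print(allowed_repair(set(), group, cluster))         # -> fix-storage (least destructive)
```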
You can also prevent auto-repair: sometimes you have configured all your wonderful auto-repairs on the whole cluster, and then there is something you really want to do manually on one instance, and you know that auto-repair will interfere with what you are doing. You don't want to remove all the configuration, so you just say: suspend the repairs for this instance, this node group, or the entire cluster for a while. You can also specify an automatic expiration timestamp, so it's really convenient from that point of view. The repair operation will run, and at the end it will leave you a tag describing what was done, when, in what way, and whether it was successful, a failure, or whether it could theoretically have been successful but the policies actually blocked harep itself.

To conclude my presentation, I will also give you a small outlook on what happened during the Google Summer of Code, with Ganeti as a mentoring organization for the first time. I guess most of you already know, but really quickly: the Google Summer of Code is a Google-funded program for post-secondary students who want to work on some open-source software over the summer; they get paid fairly well for doing this, and students are paired with a mentor from the participating organizations. For Ganeti it was the first time as a mentoring organization. It might seem funny, but even though we are Google, we were not automatically part of the Google Summer of Code program, and in order to enter it we had to apply like every other open-source project in the world. So we had to go through the selection process, which makes sense, in my opinion: no special treatment for anybody, I really loved it. Twelve students applied, we had five slots, so we accepted five students, and, well, your mileage may vary; in our case it really did.

So we had a project to implement GlusterFS support inside Ganeti: first of all, to handle automatically the userspace access support that is present in KVM, and then to also introduce support for GlusterFS as a possible shared file system to be used, or even the automatic configuration of a Gluster cluster, which would be even better. The student was failed at midterm; he didn't even manage to get to the end of the project, because in the specific case of this student (I'm not going to say any name, but everybody on the Ganeti mailing list will remember him) he was not really independent. Let's say that at the hottest part of his participation in the project, he managed to send me, and I was his mentor, 26 emails in 24 hours, plus public IRC, plus private IRC. Basically his approach was: I have a problem; do I go and look at the code? No, I send an email. So, not so nice; unfortunately it didn't work out. But now we have an intern who is working internally on this project.

Then we have exactly the opposite. Another project was to modify our tools for allocating instances and rebalancing the cluster, which currently are based only on configuration information, so the number of CPUs that your nodes have; we wanted to have the possibility to use the CPU load average for deciding where to move instances. This project was successfully completed, it will appear in Ganeti 2.10, and actually the author is Pierce, who's here in the room. So, an applause for Pierce, if you want.
Then we had another problem with another project, which was implementing Open vSwitch support in Ganeti, for having virtual networks. It was not bad: it was supposed to implement the management of the switches on the cluster and the creation, setting up and connection of instances to VLANs. There is basic support which is working now: you can set up the switches, you can connect instances to the switch. So, basic support working, a successful project overall. Then we had support for huge pages as implemented by KVM; the student just disappeared. Who knows; has anybody seen our student? We didn't. And finally, there was another project for improving the already existing RADOS/Ceph support in Ganeti: basically, we wanted to have the KVM userspace support also in this case, so it was essentially a twin project of the GlusterFS one. And also this student was failed at midterm, because he wasn't really able to contribute any code by that time, and the rules of the Google Summer of Code say that by midterm there should be some actual, visible contribution.

So what are our lessons learned from the Google Summer of Code? It is a great opportunity, but it can also be a great resource hog. Basically, what we think we will do, if we go on with the program next year, is to be stricter in the selection process. The students in the Google Summer of Code are usually selected on the basis of a project proposal they present: they say, I want to do this, this is the timescale I am envisioning, and you have to decide. Unfortunately, this didn't really work. All the students presented projects, and they were pretty good ones, but then, given how we work, we always ask for a design document before implementing the actual feature, and just writing the design document took most of the students, the failing ones, basically one or two months of the three and something that were part of the Google Summer of Code. So it didn't really work, and there was no real interaction with the community about this. So, if we decide to participate again, we will probably ask for the design document as the project presentation itself: if not the final, approved design document, which of course can take some time, at least a first version of the design document, already of such a quality that it could be sent as a proposed patch to the set of design documents. So this is what we'll probably do.

This was our experience, and this is my talk. Some content is taken from slides made by my colleagues and presented at various other venues. But feel free, if you have any questions, I'm here. Anybody? Can you give a comparison with the available commercial solutions? Sorry? A couple of words about what? A comparison with commercial solutions? Oh, personally, I don't have much experience with commercial solutions, so I'm not really able to give a general overview. If you have some specific question in mind, maybe I can tell you how Ganeti does things. But first of all, the main benefit, as said in many other talks these days, is that being an open source solution, you know that it's going to be around, or, worst case, you will still be able to modify the source code. And you don't really have issues with licensing, which can happen with commercial solutions, where, if you want to scale your infrastructure, well, now I need a bigger license; and what's the cost of the bigger license? Well, a lot. With Ganeti being completely open source, you don't have this problem.
So, from what I know about OpenStack, for example, my impression is that OpenStack can be more flexible. It has a lot of components, and this can be really good, but it can also be bad from the point of view of the initial setup, probably, because when you have so many components, yes, you can substitute any of them, but you still need to set up all of them. Ganeti has various daemons, but it's a single package, a single piece of software, so it's easier to set up, at least initially, in my opinion, and it's already able to scale a lot. So if you want to start from a brand new system, implementing something from scratch, I think Ganeti might be an easier solution; whereas if you want to integrate with something that already exists, or you have some specific needs for some component which you cannot really change, maybe OpenStack can be better, because it's a stack, whereas Ganeti is a single product. This is my impression, my opinion at least.

What's the state of LXC support? So, that's the weak point. We have the support, but it's not actively being worked on, and actually I don't even know of anybody actively running LXC. So what is there is probably still working, but it's not actively supported. Google internally runs Xen, so of course we have complete support for Xen, and basically everybody that I know of in the list of other projects using Ganeti is running KVM. So both Xen and KVM are perfectly tested and known to work really well.

There was another question: are you running any additional layer to do tasks like live migration, or do you use the mechanisms that KVM and Xen already have? No, actually, what Ganeti does is use their mechanisms; the difficult part is to ensure that you also have the data on the node you are migrating to. So Ganeti helps you by automatically managing all the data replication, either over DRBD or over some other shared storage, depending on what you want; we have support for many of them. And that's the thing Ganeti is actually doing: making your life easier in general. It's not really adding new features; you could do the same thing manually, but it takes a lot of time to set everything up, to maintain it and so on. Whereas with Ganeti it's just: okay, I want to do a migration; gnt-instance migrate, and the name of the instance. You don't even need to say where, because Ganeti already knows what the secondary is, so it will just migrate to the other node. Then, if you want, you can change the new secondary to be something else, and you go on with your life, and you can shut down the old node because it was broken or whatever. I was just asking if you use, for example, the Xen API for that. Yes, we are based on the existing solutions; we just take away the difficult part of them. And there was another question back there, I think. Okay, perfect.