options that people wanted to care about. So yeah, we concentrated on that and concentrated on usage data collection, which is the only thing that is really common to all clouds. So the possible uses for metering actually go beyond just billing. It also goes into the ability to do some auditing and the ability to do capacity planning. We also had discussions earlier today about triggering alarms based on the metrics. Well, that's for the future. Yes. Right now, with what we have. And as a reminder, in the technical session we look forward into the future, at what we're going to be delivering. In this session, we look at what we've done so far. So we had a few problems to solve. The first one was that we needed to collect information on a per-tenant basis for every resource. And so we tried. We also needed to make sure that everything was going into a single place. We needed to make sure that you were able to use this project, contribute to this project, and add to this project as much as you could. So we went ahead and tried to solve these issues. And that led us to building the following team, which actually built up over time. When we started talking about billing six months ago, there were a lot of people in the room, lots of interest. But when we started coding, suddenly the group shrunk. No wonder why. And as we progressed, well, this group started growing again. And we released on Friday last week version 0.1 of Ceilometer. And we've had the great pleasure to work with people from the six companies whose logos are here: Doug from DreamHost, people from Red Hat, people from eNovance. Anyone in the room? People from Dell and people from AT&T have contributed to this first release. And based on the discussion we had this morning — the whole morning was talking about the future of Ceilometer — it looks like we're going to have a lot more contributors for the next release, right? I'm looking forward to getting some more input from additional companies, especially users. Like the shrinking we saw last time, it goes up and down. But at least there are lots of good intentions. So what did we do? Clearly, we started a project using exactly the same resources that OpenStack uses to do its deployment. We want to be an OpenStack project. We are not yet an official OpenStack project, but that's the goal of Ceilometer: provide an official way in OpenStack to do metering. We therefore used StackForge. We used all the facilities we could. I mean, Jenkins is already integrated. Yeah, the infrastructure team has been really, really helpful at getting us set up in exactly the same way as most of the other projects, so that we can transition to incubation and to official project status when that time comes. So, the list of meters I presented six months ago, the ones we wanted to gather — today we have completed about 90% of them, and we'll get into a little bit more detail. But it's what I would call a minimal set of meters. We've got almost everything covered. We are hoping that the technical committee of OpenStack will make us an incubated project for the Grizzly period. Being an incubated project will allow us to say, yeah, we are official, but not yet core. We're hoping to be core for the H cycle. So if you are part of the technical committee, please say yes, just because my mom will be proud of me. Wouldn't yours? I'm sure she would be. OK, let's go to the next slide. So we've defined a set of requirements in our design. And we, of course, decided that we wanted to have a scalable project.
And well, the limit to that is your database. But let's assume that the database back end we have is extremely scalable. We want Ceilometer not to be the thing blocking that, right? Then we wanted to have security built in. As we said earlier, there are three possible targets for usage monitoring: billing, auditing, and capacity planning. Billing and auditing both require the ability to ensure that you're not being spoofed. I gave an example earlier today: it would be so easy to create very bad press for a given cloud if I could find a way to inject lots of false usage information into a public cloud, so that lots of their customers would start complaining about being billed for lots of stuff they've never used. So we need to have a fairly secure way to transmit messages, one that doesn't make it easy to inject false information. We also need to be able to do auditing: if somebody complains, or if there is some regulation we need to conform to, we need to be able to verify that there is no information missing and that the measurements have not been tampered with. So we built non-repudiability into it. It's a bit of a mouthful of a word, but there's a great Wikipedia article on the subject. We also wanted to have a single point to fetch data from. So we worked on an API that, while it's v1 and can be made better, is already really nice. We're not done yet — alpha release. And we wanted to have an extensible project. So everything in the design has been built to be extensible in a plug-in fashion. Yeah, like I said this morning, it's plug-ins all the way down, everything from the meters that you collect to the database layer to the API layer. All of that is based on a plug-in architecture. And that's great, because that means that if we forgot about a meter you care about, it's fairly easy for you to add it. We actually also use a plug-in mechanism for the storage engine. And that's fairly important, because we didn't simply say, let's use SQLAlchemy and you can choose between MySQL and Postgres. We also went a layer above that, because we wanted to be able to talk to any kind of database, whether it's a NoSQL or SQL database. The first database that we have implemented right now is MongoDB, but we are progressing quite well towards an SQLAlchemy back end as well. That means that we've actually got a proven model that can support any kind of database you may want to use, or actually no database at all. Some people have told us they just want to push the data to some RSS feed. Well, instead of writing to a database, you could be writing an RSS feed, or whatever you can think of. And we called it the storage API, but that's really just the final place that the messages go, so you can do whatever you need to there. I think there's a final point on this slide. And the one thing that we are is lazy, right? Absolutely. So being lazy means that we wanted to reuse as much existing code as possible. So we used a lot of OpenStack Common and a little bit of Nova. That's right, yeah. So we're building on the Nova services libraries. The versions of those are in Common right now; they were not when we started. That's why we started with Nova. But we're using the OpenStack Common RPC libraries and some of the others; when in doubt, we're using the common libraries. So actually, the code that Doug wrote — well, not only Doug, but everyone; Julien, yeah — is amazingly concise. We had a lot of good libraries to work with. So it's been a long time since I had released something that weighed 206K.
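As an illustration of the pluggable storage idea described here, a rough sketch of what a storage back end could look like is shown below. The class and method names are hypothetical and do not match the actual Ceilometer interfaces; the point is only that the driver behind the "storage API" can be a MongoDB collection, a SQL database, an RSS feed writer, or nothing at all.

```python
# Hypothetical sketch of a pluggable storage driver; names are illustrative
# and are NOT the real Ceilometer storage interface.
import abc


class StorageDriver(abc.ABC):
    """Anything that can receive metering messages from the collector."""

    @abc.abstractmethod
    def record_metering_data(self, data):
        """Persist (or forward) one metering message."""

    @abc.abstractmethod
    def get_meters(self, project_id=None, resource_id=None):
        """Return the meters known for a project or resource."""


class MongoDBDriver(StorageDriver):
    """Example back end: store each metering message as one document."""

    def __init__(self, url):
        # Assumes the connection URL names a default database.
        import pymongo
        self._db = pymongo.MongoClient(url).get_default_database()

    def record_metering_data(self, data):
        self._db.meter.insert_one(data)

    def get_meters(self, project_id=None, resource_id=None):
        query = {}
        if project_id:
            query['project_id'] = project_id
        if resource_id:
            query['resource_id'] = resource_id
        return self._db.meter.find(query)
```

A driver that writes an RSS feed, or that drops the data entirely, would implement the same two methods, which is what makes the database optional.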
That's the current weight in kilobytes of Ceilometer. Really? OK, I hadn't even looked at that. That's impressive. Yeah, I knew there was not much there. It's actually useful — that's what is funny about it. And also, we wanted to be able to accept data from many sources. That means that we've got a very flexible way for data to be collected. There are actually three ways, and we'll get into the details when we show the architecture diagram. So there are three ways that interactions are dealt with. The first one is, well, we want to be able to collect information when the user does something, like launching a new instance. We also want to be able to make sure that what has been initiated is still happening. Some OpenStack projects send data on a regular or irregular basis. Some others don't. So sometimes we need to be able to poll for the information; sometimes this information is being sent to us. And the third way is — well, actually I already mentioned the third way: polling, plus auditing, listening to the audit messages, and listening to the creation and deletion events. So wherever possible, we're consuming the notification events that are coming from the other services, so we have not had to modify the other services very much at all. Well, actually, we did influence Cinder so that they started sending notifications. That's true. But only in the sense that they're sending notifications; it's not custom for us. Exactly. We also define three types of meters. Based on the analysis we made, we needed to be able to have cumulative data. An example of that is an increase in the number of instance hours. We wanted to be able to have gauges to say, hey, this discrete event has happened — for example, when you assign a floating IP to an instance. And finally, some information is a matter of deltas: there has been an increase or a decrease of this much over the past period. So these are the three types of meters that we cover. And here we go. We designed something based on these requirements, and that's the architecture that we started with. We knew we wanted a collector that writes to a database. But maybe we needed to fetch information from something. So the first way to collect information was: hey, all these OpenStack components do send notification information to the common RPC bus, which is now in OpenStack Common. Maybe we should be listening to them and transforming them into events that we'll store in the data store. Second, there were a few things that we couldn't get out of the events and that we couldn't get through normal polling, and that is information coming out of libvirt. So we created an agent that runs on every single Nova compute host, grabs this information from libvirt locally, and pushes it to our own bus. And third, we had a few services from which we needed to pull information on a regular basis. So there is a central agent that can initiate those API requests to Cinder, Glance, and soon Swift. Yes, in terms of timing, it's customizable. By default, we have a global flag to set the frequency to 10 minutes. In a future version, we'll be able to modify that on a per-meter basis, but at the moment, it's one setting for all meters. We're using the standard RPC mechanism in OpenStack, so it's RabbitMQ and Qpid that have been tested so far. They do guarantee delivery. So the next step was to be able to extract the data from the data store.
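To make the three meter types concrete, here is a minimal sketch of a counter record carrying a cumulative, gauge, or delta measurement. The field names are assumptions for illustration, not the exact Ceilometer message schema.

```python
# Illustrative only: a minimal counter record showing the three meter
# types discussed above (cumulative, gauge, delta).
from dataclasses import dataclass
from datetime import datetime, timezone

METER_TYPES = ('cumulative', 'gauge', 'delta')


@dataclass
class Counter:
    name: str          # e.g. 'instance', 'ip.floating', 'network.incoming.bytes'
    type: str          # one of METER_TYPES
    volume: float      # the measured value
    unit: str
    user_id: str
    project_id: str    # the tenant the usage is attributed to
    resource_id: str
    timestamp: str

    def __post_init__(self):
        if self.type not in METER_TYPES:
            raise ValueError('unknown meter type: %s' % self.type)


# A gauge: a floating IP is currently associated with an instance.
floating_ip = Counter(
    name='ip.floating', type='gauge', volume=1, unit='ip',
    user_id='u1', project_id='tenant-1', resource_id='ip-42',
    timestamp=datetime.now(timezone.utc).isoformat())
```

A cumulative meter (instance hours) only ever grows, while a delta meter records how much something increased or decreased over the last period.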
And to do this extraction, having an API was a lot better than telling people, hey, here is our database schema — because we didn't know what the database schema would be. As I told you, it's a pluggable mechanism; the database is actually something that may or may not exist. If it exists, it had better be able to talk with our API, and that's part of the plug-in for the database. That's right. For the API, we have a few examples in a slide or two; I can't remember the order exactly. Yeah, it's coming up soon. But it's a fairly simple REST API. In terms of providing a way for people to see what Ceilometer is collecting, we are targeting the next release. Actually, there is a first implementation of this in the review queue: a plug-in inside of Horizon that would show usage data. This is not there yet, but it will be available pretty soon. In terms of scalability, well, the event bus is something that is fairly scalable, as you can have as many servers as you want. The collector is also fairly scalable, as you can have as many collector instances as you want. For the central agent, as long as two central agents don't collect information from the same host, you can have as many as you want. The event listener is part of the collector, so it doesn't count separately. And again, the compute agent is on every Nova compute host. Therefore, you just need to make sure that it can connect to the message queue, and we are happy with it. So this has to be validated in real life, and Doug is very well placed to talk about real life. Yes, we'll get into that a little bit later in the presentation. But on paper, we don't think we have a bottleneck there. No, it's additional data; it's not the same data that we poll. Actually, for example, we will receive an event from Nova that says I've created instance number one. And what's important is that we want to make sure, over time, that instance number one is still there. So the compute agent, which acts a little bit in a polling fashion in this case, will check on a regular basis that instance number one still exists, and will send us an event saying it still exists. So this way, if Nova has forgotten to send the message that the instance has been destroyed, or if the instance is destroyed without a user request — which can happen with a bug — we stop billing the customer. So every piece of metering data that we collect is recorded in the database as a separate object or row. And you can get the events back out of the database as well. So even though we might collect for the same instance several times, all of that data is available; it never gets overwritten. It isn't important. We are going to get to that in a second. No, no, that was good. So actually, why don't you explain the API, since you designed it? Yeah. So we had several different kinds of questions that we wanted to be able to answer with the API. Primarily, at DreamHost, we were billing for things like instance hours and bandwidth and CPU utilization and things like that. So we needed to be able to find out how long something had been running — how long an instance had been running. And so we have an API for querying the duration. We also needed a min and a max and a total calculator, so that we could ask the API what's the maximum bandwidth that was used by a tenant in a certain period of time so that we could adjust the billing for that, and what's the total amount of storage space that their volumes are holding, for example, so that we could charge them for storage.
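The "instance number one still exists" behaviour described above can be pictured as a pollster run by the compute agent. The sketch below is hypothetical — it is not the real Ceilometer plugin API — but it shows the idea: one gauge sample per known instance per polling interval, so billing stops as soon as the instance stops showing up.

```python
# Hypothetical compute-agent pollster sketch (not the actual Ceilometer
# plugin interface). It re-asserts, every polling interval, that each
# locally known instance still exists.
from datetime import datetime, timezone


class InstancePollster:
    """Invoked by the compute agent for every instance known on the host."""

    def get_samples(self, instances):
        for instance in instances:
            yield {
                'counter_name': 'instance',
                'counter_type': 'gauge',      # "it still exists right now"
                'counter_volume': 1,
                'counter_unit': 'instance',
                'user_id': instance['user_id'],
                'project_id': instance['tenant_id'],
                'resource_id': instance['uuid'],
                'timestamp': datetime.now(timezone.utc).isoformat(),
            }
```

Each sample is stored as its own row, so the repeated measurements never overwrite one another.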
So the API reflects those kinds of questions right now. There are some pieces that we put in the API design that we haven't finished yet, but the basics are there for initial integration with our billing system, and we've actually completed that integration work. We'll talk about that in a little bit. And then, as I mentioned before, there's also an API to extract the raw data. So if you have some arcane way that you want to calculate billing that we can't accommodate in the other APIs, you can just pull all of the metering data right out of the database and process it yourself. For example, if you wanted to look at the rate of change of something, where we're not calculating that, then you could do it by looking at the events themselves. And what you see on the screen is just a sample of the APIs. That's right. So there are several different endpoints based on how you want to query. If you want to get the total for a meter on a resource, there's an API for that, but there's also an API that lets you get the total for a meter across an entire project or across an entire tenant. And so it's flexible enough to bill at whatever granularity you actually care about. So at DreamHost, we're pulling data based on individual instances, so that we can do the totaling ourselves and offer discounts at different rates and things like that. But we could also just say, how many instance hours did the user use this month? And something that we don't show here are the parameters we pass to it. That's right. So in addition to the parameters that are part of the URL, you can also pass start and stop times for the query so that you don't get all of the data. So you can run a job every day, or every few hours, or something like that, and do incremental updates of your billing system based on that. So this is basically where we're at. We delivered last week the first Folsom version, a little delayed compared to the rest of the project — well, we weren't part of it. We're only a week late. We're just a week late, but that's pretty good. It's pretty good for a project which is not yet officially part of OpenStack. Once we are, we will have to be on time. We covered Nova, Glance, Cinder, and Quantum. Swift is the obvious element missing here, but we heard eNovance saying they will do that next week, so it's coming soon. And for Grizzly, we already have an interesting roadmap. Do not sell that to your customer today; it's a forward-looking statement. Unless you want to sign up to help. Yeah, exactly. But we hope next week the TC will decide that we are incubated. We want to integrate into the API the notion of a user-facing API. Currently, the API is only addressed to the admin of the overall infrastructure. It might be useful in some cases for users to be able to fetch information about their own usage. We want to be able to integrate with Horizon; I already talked a little bit about that. We want to have agents for other projects — the Swift one I mentioned, but also, for example, the Heat project, which is seeking incubation in parallel to Ceilometer, might be another good target. And if you have your own project, feel free to come and contribute your metering information there. There could also be some new uses for the collector; we'll talk about that in a little bit — we have a slide on possible expansion of the role of Ceilometer. And we are going to be completing the SQLAlchemy driver. And in H, we'll be in core, and we'll have a lot of new features, but that will be defined at the next summit.
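As a hedged example of the kind of incremental billing query described above, the snippet below asks for the total of one meter for one tenant over a single day. The host, port, endpoint paths, and parameter names are illustrative assumptions; check the Ceilometer v1 API documentation for the exact routes.

```python
# Sketch of a daily billing job querying the metering API.
# URLs and parameter names are assumed, not guaranteed to match the real v1 API.
import requests

BASE = 'http://ceilometer.example.com:8777/v1'   # hypothetical endpoint
HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN'}        # Keystone admin token

params = {
    'start_timestamp': '2012-10-15T00:00:00',
    'end_timestamp': '2012-10-16T00:00:00',
}

# Total of a meter for one tenant over the previous day, e.g. outgoing bytes.
resp = requests.get(
    BASE + '/projects/tenant-123/meters/network.outgoing.bytes/volume/sum',
    headers=HEADERS, params=params)
print(resp.json())
```

Running this once a day with a rolling start/stop window is exactly the incremental-update pattern mentioned in the talk.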
So now we're on to the DreamHost part — that's what we were doing. So I mentioned that at DreamHost, we've launched our DreamCompute public cloud. And one of the primary features that we knew we would need in order to work with OpenStack was a way to get usage data into our billing system. So that's why I'm so heavily involved in the Ceilometer project. We have a large database of existing users, and they have services, and we're billing them for those things using our existing billing system. And so we didn't want to build all new tools for that, but we did need to get the data out of the OpenStack metering system into our system. And so the tool that we built goes between the two. We wanted to make sure that we were measuring only the things that we actually cared to bill for. So right now, Ceilometer actually measures more than we care about, but it does include all of the features that we need to measure. Instance hours — basically the runtime at a certain flavor size; block storage capacity — how much disk space they're using on our Ceph cluster; the number of image uploads — and we haven't decided what we're doing as far as charging for some of these things, but we wanted to think about charging for some of them, so we made sure that these were all included. And then bandwidth, which turns out to be a little bit of an interesting case. The meter that's built into Ceilometer measures bandwidth usage at the virtual interface. But that doesn't help us, because what we want to do is charge differently based on public internet bandwidth versus internal-to-DreamHost bandwidth, and we can't tell the difference between those two things just by looking at the virtual interface. There are computing resources at DreamHost that are not inside the cloud cluster, and so we have to be able to differentiate the two. We're collecting data on the router and measuring it in bytes and packets at this point. Does that answer your question? Sure. So the API lets us do the aggregation like that, and we can manipulate that afterward. So I believe we could answer that question, yes. You've got the raw data; you can do the extrapolation you want based on this raw data. OK, so I mentioned that we're charging differently depending on where the traffic goes. And the meter that's built into Ceilometer doesn't do that for us, so we actually built a custom meter that runs outside of Ceilometer but sends data to Ceilometer to be collected. So that's another area in which Ceilometer is extensible: you can send us metering data from whatever source you happen to have, and we'll just collect it in the database and let you query it using the API. So the tool that we built to get the data out of the Ceilometer database and into our database for billing is the DreamHost usage data extractor, or as we call it, the dude. And it uses the Ceilometer API exactly as any other client might use the API, and then it uses an API that we built for our billing system to write data in. So we run the dude every day, and he asks about all of the resources that each user has used during the previous day, does a little bit of aggregation on it, and writes that usage data back into our database, into the billing system. So before we go to the questions slide, I thought we had a slide about future uses of Ceilometer — maybe we moved it, or maybe we forgot to do it. I think we moved it up earlier, and we didn't add it back. Yeah.
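The "custom meter that runs outside of Ceilometer but sends data to Ceilometer" idea could look roughly like the sketch below: an external process measuring per-router traffic, split into public versus internal, and publishing samples onto the metering bus for the collector to store. The exchange name, routing key, and field names here are assumptions for illustration, not the actual wire format Ceilometer uses.

```python
# Rough sketch of an external meter publishing samples to the metering bus.
# Broker URL, exchange name, and message layout are assumed, not authoritative.
from datetime import datetime, timezone

import kombu

EXCHANGE = kombu.Exchange('metering', type='topic')   # hypothetical name


def publish_bandwidth_sample(tenant_id, router_id, direction, byte_count):
    """direction is 'public' or 'internal', measured at the router."""
    sample = {
        'counter_name': 'bandwidth.%s' % direction,
        'counter_type': 'delta',
        'counter_volume': byte_count,
        'counter_unit': 'B',
        'project_id': tenant_id,
        'resource_id': router_id,
        'timestamp': datetime.now(timezone.utc).isoformat(),
    }
    with kombu.Connection('amqp://guest:guest@rabbit-host//') as conn:
        producer = conn.Producer(serializer='json')
        producer.publish(sample, exchange=EXCHANGE, routing_key='metering')
```

Once collected, these samples can be queried through the same REST API as the built-in meters, which is what the DUDE relies on.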
So, future uses of Ceilometer: based on the discussion we had a couple of hours ago, it seems that Ceilometer is going to be evolving more into a framework to do measurements — not specialized only in metering, but allowing metering and also allowing for other uses, such as monitoring, for example. This doesn't mean that we are going to change everything in Ceilometer. It means that we are going to extend the extensibility of Ceilometer. We have realized that it would be stupid to have three agents on the same machine collecting the same meters for three different purposes. So the first thing that we are quite sure we're going to be doing is share with monitoring tools the ability to use a single set of agents. The same agent should be able to send to the monitoring interface sometimes and send to the metering interface at some other time. But there are also a lot of other possibilities that we are currently discussing. The discussion is going to be continued, actually, on Thursday morning at 9 AM, if you want to join us in the unconference room. In Maggie. Maggie. So the user-facing API is in my head, but we haven't written it, so... And I haven't learned to read your mind yet. You haven't? No, it's only been six months. But he reads a lot of meters quite well. So the API should be the same, except that when you do a query, it should be restricted to what you own. You shouldn't be able to see other people's. So in Horizon, we want to show an example — basically, maybe a graph of your usage over the past few days. It could be used and extended for actual real use cases, but we don't know that there is a single use case that will satisfy everybody. We don't know whether anybody would be interested in the example itself, or in displaying usage data without showing the billing on the side. Exactly. And we're looking for input on that, too, if there are people who have ideas about what that should show. So do you mean the web API or the internal APIs? Right now, you can control how frequently the agent polls all of its meters. And the goal is to have it during Grizzly — I think we've said we're going to make it so that you can control how frequently it polls each meter separately. So you could set different rates for CPU utilization versus disk. How we will do that remains to be seen. We were thinking of a config file, but the conversation we had two hours ago made us think that may not be the right way to do it. Right. It depends on how tightly we tie in with the alarm system and Heat — well, CloudWatch. So, at the moment, since the RPC mechanism in Nova doesn't provide any kind of security, we added two things. The first thing is an HMAC-based signature of each message that we send. So the payload is being signed, and this signature is being saved in the database. And it includes a counter for each event that makes sure that you cannot lose an event or add events afterward without breaking the sequence. So that's the actual definition, in a few words, of what non-repudiation is. For a future version, we hope — there is a meeting about this, I believe it's tomorrow, about increasing the security of Nova RPC, that we are going to be joining — hopefully, we'll base the same mechanism on a PKI system that would be generalizable to all of OpenStack. That remains to be seen. But right now, we focus just on signing the messages that we send. Right. So all of the metering messages are published on the message bus right now.
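A minimal sketch of the signing idea just described — an HMAC over the payload plus a monotonically increasing counter, so a lost or injected message breaks the sequence — could look like the following. Key handling, field names, and the exact set of fields that Ceilometer signs are simplified assumptions here.

```python
# Illustrative HMAC signing of a metering message with a sequence counter.
# Not the exact Ceilometer implementation; the secret and field names are assumed.
import hashlib
import hmac

METERING_SECRET = b'shared-secret-from-config'


def _canonical(payload, counter):
    # Serialize fields in a deterministic order before signing.
    parts = ['%s=%s' % (k, payload[k]) for k in sorted(payload)]
    parts.append('message_counter=%d' % counter)
    return '&'.join(parts).encode('utf-8')


def sign_metering_message(payload, counter):
    digest = hmac.new(METERING_SECRET, _canonical(payload, counter),
                      hashlib.sha256).hexdigest()
    return dict(payload, message_counter=counter, message_signature=digest)


def verify_metering_message(signed):
    payload = {k: v for k, v in signed.items()
               if k not in ('message_counter', 'message_signature')}
    expected = hmac.new(METERING_SECRET,
                        _canonical(payload, signed['message_counter']),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(signed['message_signature'], expected)
```

An auditor replaying the stored messages can both recompute the signatures and check that the counters form an unbroken sequence, which is the non-repudiation property described above.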
So if you did not want to use our collector and you did not want to use our database or our API server — if you had some other system that you wanted to receive those messages — you could plug into the message bus and get them. And then the REST API lets you query the database using the tools that we've built. Basically, we tried to build everything so that all of the pieces were optional, because a lot of people already have some level of this system implemented, and we wanted to get as much benefit as we could from reusing the different pieces. You've had your hand up for a while, I'm sorry. Yeah, so we definitely want to explore that. I'm not sure that putting it in Common is the right place. We want Ceilometer to be the place that you go to ask questions about how much or how big. So we want to do the measuring, and there are lots of ways you could consume that. But we're trying to actually get it out of all of the other projects so that everybody isn't implementing the same thing over and over again in different ways. But yeah, a different approach to the same goal, I think. Yeah, the unconference is something that is not scheduled in advance, so it's not in the planning. But if you go in front of the Maggie room, you can add additional sessions yourself. We did that, and it's Thursday at 9 AM. When we say we will extend Ceilometer in the next release to cover not only metering but anything about measurement, we want to be as open as possible about what can consume these measurements. Now, that means that there could be some apples consuming Ceilometer in one way or some oranges consuming Ceilometer in another way. I'm not sure I understood and answered your question correctly. We talked to the Heat guys, and they were going to implement a lot of the same agents and polling and talking to libvirt and all of the things that we've built over the past six months. So we're trying to find a way to reuse all of that. And will the data formats and a lot of the APIs be similar or the same? We aren't sure yet exactly how much code we'll be able to share, but the goal is to share as much as we can. And basically, we're trying to reduce the duplication of effort between the two projects. Not only the duplication of effort, but again, it would be stupid to have multiple agents polling for the same information just because it's for different destinations, increasing the load on your servers for no real benefit. You had your hand up. What's the size of your deployment? So actually, if you go into the documentation, there is a link to a Google spreadsheet where you enter your variables, and it gives you a sum. As an estimate. As an estimate, of course.
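Going back to the point made at the start of this exchange — plugging into the message bus directly instead of using the collector, database, and API — a rough sketch of such a consumer is shown below. The exchange, queue, and routing key names are assumptions, as is the broker URL; they would have to match whatever the metering publisher is configured to use.

```python
# Sketch of a standalone consumer feeding metering messages into your own
# system, bypassing the Ceilometer collector. Names are illustrative only.
import kombu


def on_metering_message(body, message):
    # Hand the sample off to whatever billing/monitoring system you already run.
    print('got sample:', body.get('counter_name'), body.get('counter_volume'))
    message.ack()


exchange = kombu.Exchange('metering', type='topic')                 # assumed
queue = kombu.Queue('my-billing-feed', exchange, routing_key='metering')

with kombu.Connection('amqp://guest:guest@rabbit-host//') as conn:
    with conn.Consumer(queue, callbacks=[on_metering_message]):
        while True:
            conn.drain_events()
```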