If you're late, too bad. Good afternoon, everyone. I'm Nick Barcet. And I'm Julien Danjou. So the former PTL and the new PTL for Ceilometer. Today we are going to do a little introduction to what Ceilometer is about. We decided to title this presentation "from metering to metrics," as we decided during the last summit to make a little switch. So when we started about a year ago, the question that led to the creation of the Ceilometer project was: hey, what about billing in OpenStack? Of course, it's not the purpose of OpenStack to actually provide a billing function. But there was a real need in OpenStack for what was needed to do billing. And actually, "What about billing?" was the name of the session we presented six months ago together with Doug Hellmann. When we decided to start Ceilometer to solve the first step of the billing issue, metering was our focus. We really wanted to provide all the information in one single place, collecting all the metrics that you would need in order to produce a bill in the end. But we don't want to do billing, so we would stop at metering. Right after metering, when you do billing, there is a phase called rating, which is transforming the metric into a dollar amount or a currency amount. This is something that is not part of Ceilometer. Right after that, there is a phase which is generating a bill, sending it, and collecting the funds. We didn't want to do that in Ceilometer, and we still don't. We started the project a year ago. This is the list of companies that have contributed to Ceilometer since. Did we forget anyone? I don't think so. It's a long list. We have DreamHost, which has been there since day one, with eNovance, which is a big contributor to Ceilometer. We both work for eNovance. Yeah. HNT worked with us, too. And we're, obviously, a lot in Canonical and Ubuntu. So I see a couple of contributors. Can you stand up? Yeah, there's Eoghan from Red Hat.
Who else is there? No one else? Anybody? Yeah. OK. Nice to meet you. Anybody using Ceilometer in this room? Nice. Wow. Nice. So we'll talk about it, but there is a session where we want to see everybody that uses Ceilometer. Yeah. That would be on Wednesday. Wednesday, yeah. That's our second one, the last of the day. We have a list of the sessions a little bit later in the presentation. So for Folsom, this is what we had as a goal. I'm not going to read this lengthy sentence, but basically it's saying that we were focusing on metering, which I already said. For Grizzly, we decided to extend. From metering, we went into metrics in general. We decided to extend the scope of Ceilometer to allow anyone that needs to collect information from OpenStack to have a single place to do it from. So actually, we would do the collecting, and we would allow the sending of this collected information to multiple destinations. That was the result of the Grizzly Summit six months ago. And I think we are quite happy with the result we got. Julien is going to do a quick overview of the architecture in a few slides, so I'll leave that to him. So the great thing is that, since the summit has not happened yet, the objectives for Havana have not changed. But please check again at the end of the week; they may have changed by then. So this is globally the workflow Ceilometer uses: collect, transform, publish, store, and read. Julien, do you want to give a little bit more detail? The great new stuff in Grizzly is actually transform and publish, which we did not have before. We did only collect, then store and read. So we're like this in Grizzly. I'm going to introduce a bit of how this works and what you can do with it. We're going to have a deeper session on Wednesday too about the architecture of Ceilometer. So thanks. So this is mainly how we do the collection of metrics. This is accurate to 99%; I have simplified some bits. But there are two ways to collect data.
The green one, and the best one, is via the notification bus, which is part of Oslo. You may have heard of it; it's used by a lot of projects. We also have the APIs, which are used by the agents in Ceilometer to collect data we don't have on the notification bus. For example, if you want to meter Glance, you can't know how many images a tenant has without asking the API. So you have to poll regularly and ask for it. Other metrics are sent, when you're lucky, on the notification bus. So everything is collected by two components in Ceilometer, the collector and the agents. We send all of these metrics to other systems via a pipeline, which I'm going to describe right now. So just before switching slides, of course, the goal of Ceilometer is to do as little as possible, because we are lazy. So if every project could send information on the bus, this is what we would want. Unfortunately, this is not the case. This is why we have to multiply the ways we go and fetch data. But if you lead a project in OpenStack or elsewhere, please send your information on the bus. Yeah, I would like to challenge this for every project, but it's a long way. And I just noticed I wrote "new in Grizzly" on the compute storage stuff. We now have pollsters and meters for Swift in Grizzly that we didn't have yet in Folsom, so that's new stuff too. The pipeline system is something we designed in Grizzly. The basics is you have a meter, which you get from a notification or from the API. And you're going to be able to transform this meter into another one, or into more meters if you want to. Then you're going to publish them, or the meter, to one or multiple receivers. Each step is going to mutate your meters if needed; it's an optional step. And then publish them directly to a receiver, which can be the Ceilometer collector. So the transformer is a handy way to change your meters into something different.
For example, when you have a system like CloudWatch, or a CloudWatch-like system as we imagine it, you're going to have a meter from Nova, which is the CPU time from an instance. So we get this from Nova, and from libvirt actually, as seconds. So it's the cpu.time counter. But the thing is, your system might not be able to treat this data as it is. Maybe it will want a percentage of CPU or core usage. So with a transformer, you're going to be able to transform this data, these meters here, into a new one, which is a percentage and not seconds. So this is the goal of a transformer. They are designed in a very generic way. They have parameters, so you can take one transformer, give it a set of parameters, and make it act like you want for your final publisher. So about publishing: the final step in the pipeline is to publish the meter, obviously, because you want to use it then. And the default publisher is still the RPC one we designed in Folsom; it didn't change a lot in Grizzly. It publishes messages via AMQP, using the Oslo RPC standard system, to the Ceilometer collector. But you can now hook in any publisher of your own. We don't have any new publisher in Grizzly; we just have the interface to write new ones. But we know people absolutely want a UDP publisher. So there it is, at least in the diagram: publishing to an external system, which can be anything, I mean billing, monitoring, capacity planning, whatever you want. So really, the idea here is we have written all this logic to go fetch all kinds of metrics everywhere in OpenStack. It's not complete, I mean, there can be others. But what we want to do is make sure that we do this work only once. And every time somebody else wants to consume it, rather than rewriting all the logic to go fetch each metric, we offer them a way to plug into the general mechanism, whether they want to get this data from the Ceilometer API, or get it somewhere else using their own publisher.
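To make the transformer idea concrete, here is a minimal standalone sketch in the spirit of the cpu.time example above. This is not Ceilometer's actual transformer interface; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    name: str
    value: float       # for cpu.time: cumulative CPU seconds consumed
    timestamp: float   # wall-clock time of the sample, seconds since epoch

class CPUPercentTransformer:
    """Turn cumulative cpu.time samples into a CPU-usage percentage."""

    def __init__(self, cores=1):
        self.cores = cores
        self.prev = None  # previous sample, needed to compute a delta

    def handle_sample(self, sample):
        prev, self.prev = self.prev, sample
        if prev is None:
            return None  # first sample ever: nothing to compare against
        elapsed = sample.timestamp - prev.timestamp
        if elapsed <= 0:
            return None  # out-of-order or duplicate sample
        used = sample.value - prev.value
        pct = 100.0 * used / (elapsed * self.cores)
        return Sample(name="cpu_percent", value=pct, timestamp=sample.timestamp)
```

Feeding it 5 CPU-seconds consumed over a 10-second wall-clock interval on one core would yield a 50% sample; a real transformer would also carry the resource and project metadata through unchanged.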
So for example, if you wanted to collect performance metrics, you would need a sampling frequency of about every second. When we do metering for billing, we don't need data collected every second; it would be a waste of time and space to store all that. However, the mechanism of multiple frequency settings, the transformation capability, and multiple publishers gives us the ability to have one collection mechanism with multiple destinations. Yeah. And you can configure how often you want to fetch every meter. So that's quite powerful, actually. So actually, you said that there is no other publisher. We actually started one, as you did. I did, yeah. Oh, yeah, but it's not a different publisher. We actually hooked on the Ceilometer RPC publisher, because you can do this. Yeah, but you duplicated it so that it would go into a different database. So it turns out to be another publisher. So we know it works. That's what I wanted to get at. Yeah, we know. We don't talk about things we don't know. OK, so not the last part, but an important one, is the collector. This part is optional if you are just publishing to another system. But we still have our collector, which is very handy to do things like billing, et cetera. So the collector didn't change a lot. We still have our architecture with a storage abstraction layer, which I think we are the only ones in OpenStack to use, because everyone else is tied to SQLAlchemy. So we have this abstraction layer, and we are able to use a lot of different backends. We still have the default, which is MongoDB, which is well-tested and works. We have SQL, which is less tested, but should work. Not every function is implemented; it misses the metadata queries in the API, so you will miss some features. And the same is true for HBase, which misses the same features as SQL, as HBase is new in Grizzly; we didn't have it before.
So the base work is there. If you want to use something else than MongoDB, you're welcome to do so, but be ready to contribute code to complete some functions. Some analytical functions in the API are not implemented in the SQL one, and may not be implemented in the HBase one, I don't know. No, no, it's similar for the HBase one. We found, at the end of the cycle, a humongous bug in the SQL database that showed that it seemed nobody else had used it. Not really tested, that one. Somebody had defined the counter size as being an int instead of a bigint in MySQL. You can imagine what happened with a network counter. That was ugly. Anyway: very well tested, you don't want to think about it, use Mongo. You want to work with us on it, feel free to use any of the other two, or write another one. Yeah. So the last part is the API. Once you store a lot of meters, you want to read them back. And we use the same storage abstraction layer, so you will be able to use obviously the same database backends. We did a lot of changes in Grizzly. We built a new API version, so we now have version two of the API. When we built version one of the API, we had only one view, which was billing, and we didn't really know where we were going. Now we have a better view. So we simplified the API a lot. We got rid of a lot of useless paths and things like that in the requests, so it's very simple. We also merged statistics retrieval into only one call, because it's less costly to do it this way. We now have advanced filtering mechanisms, which are the kind of mechanisms you won't have in the SQL and HBase backends, but which work fine in MongoDB, for metadata fields. And you can also use something you wrote a few weeks back: you can do statistics by period. So you are able to select a range and divide statistics by hour, day, or whatever you want. So this is going to evolve a lot in Havana, because we still have a lot of ideas. We didn't have a lot of time in Grizzly to write everything.
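As an illustration of the statistics-by-period idea, here is a small standalone sketch (not Ceilometer's implementation) of what the v2 statistics call computes server-side when you ask it to divide a time range into periods:

```python
from collections import defaultdict

def statistics_by_period(samples, period):
    """Group (timestamp, value) samples into fixed-size periods and
    compute per-period statistics, in the spirit of the v2 API's
    statistics call with a period parameter."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Bucket key is the start of the period this sample falls into.
        buckets[int(ts // period) * period].append(value)
    return {
        start: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
        }
        for start, vals in sorted(buckets.items())
    }

# Hourly statistics over samples collected every 20 minutes:
samples = [(0, 1.0), (1200, 3.0), (2400, 5.0), (3600, 7.0)]
stats = statistics_by_period(samples, period=3600)
```

Doing this aggregation server-side, next to the storage backend, is the "less costly" point made above: the client receives one small summary per period instead of every raw sample.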
So if you already use Ceilometer and built an application that uses the V1 API, don't be worried. We are going to continue maintaining the V1 API for at least three more releases. We're not going to ditch an API version on every release. No, and I think we missed some things in version two. The point is, most people use version one and a bit of version two so far. We'd like to be able to deprecate version one, but there's a lot of code. We have things in version one that we don't have in version two, because we are not sure they are needed. So if you ever try to port something from version one to version two, don't hesitate to raise your hand and say, yeah, I miss something, because we don't know everything that people use in version one. Maybe it's still, I don't know, useful. Actually, François Charlier — I don't know if François is still here; where are you? — and I are leading a session on Thursday (we have the schedule at the end) talking about API improvements that we would like to see. So this is the global architecture. Everything I just said is here. You've got the agent and the collector fetching the metrics from the APIs or the notification bus, publishing through their pipeline to an external system or to the collector, and storing, then reading back with the API from any of your systems. Any questions before we go away from the nice diagram that Julien did? It's AMQP, so you get the usual message delivery guarantees. Yeah. We use Oslo RPC, which is based on AMQP. It's the same system that Nova, or Quantum, or every other project uses. So it's safe. By default, it's RabbitMQ. But since it's using the Oslo RPC mechanism, I think three backends are supported right now: Qpid, Rabbit, and ZeroMQ. ZeroMQ should be supported, but I don't think anyone tested it. Eoghan tested Qpid. Is that correct? OK, so that's working. And it works. Rabbit is what a lot of people use, so that's working as well. ZeroMQ, if you are using it, please let us know.
We'll mark it as being tested. That answers your question. So you could write a publisher using rsyslog, but we are not using rsyslog at all. Yeah, if you prefer. Yeah. So there are a couple of mechanisms that we have already in there. First, when you use an AMQP queue, the message delivery is pretty much guaranteed. Second, we are signing. It's a simple message signing mechanism that we are using currently, but it already guarantees in some way that messages cannot be faked too easily. And third, we are planning on numbering messages, so that you cannot have insertion or deletion or loss without knowing it. There is a session led by Sandy Walsh on having a double-entry mechanism, so maybe using two publishers to verify in two ways that you're not missing information. And that would provide an extra guarantee of not losing information. Who was first? I don't know. You were. Yeah, yeah. Yeah. You want to discuss this with Doug. This is something we are going to discuss on Thursday too, because we have different ideas on how to solve this, and I honestly don't have the answer. I think the point is not to. You're talking about UDP? Because when you're doing some kinds of things, like metering for billing, you absolutely don't want to lose data. So you will use a message-sending mechanism that is not lossy. However, when you're doing performance statistics, you don't care about losing one or two items in the stream, and you want to transmit as much as you can, using as much bandwidth as you have. So that's why. We're not dealing with these things. Yeah, exactly. You can configure two things in parallel: you can fetch the same measures, and send some of them more regularly to a performance statistics system, et cetera. No. Have you done this during this period of time? No. For now, what we do is that we monitor things via the audit system built into Nova, Cinder, et cetera. So we have a kind of heartbeat system.
We don't want to trust Nova saying, I've started an instance and I've stopped it, because we might lose some notifications or things. So we rely on heartbeats for now. We plan to add more, I think, in Havana, to have more information, like when it really started, when it really stopped. But we use heartbeats; it's safer for us. And no, we don't have that. We added it, but it was always — we don't have this in the notifications. There's no duration in what we get. When we do an API poll, for example, we don't have anything. So we added this in the first version, and it was always set to none. So we dropped it. However, there are three types of meters — cumulative, gauge, or delta — so you know what kind of value you're going to get. But it's not tied, and it won't solve your interval question. In fact, we couldn't find any use for it. Sorry. There is a session right after this one, at 5:20. It's in the unconference. I don't know in what room. A104, just there. So please join us for the discussion if you want to know more about that, because we don't know yet. Oh, OK. Sorry about that. The last question was, what about Ceilometer and Healthnmon? And the answer is: come join us. You had a question. Was it answered in the middle? I guess so. You're looking at your phone. Somebody in the back. We would love for that to happen. Yes, they can. There is quite good documentation on how to add additional meters. At the moment, we've extended Ceilometer for a specific customer to grab additional information. We know it's easy to do; Julien has done that many times. But well, if you are working for Nicira, Midokura, Big Switch, or anyone else, and want to add meters, come see us. We are very open. Oh, yes. We don't have a slide detailing the metadata, right? I know. OK, so the question is, when we receive these events saying, this has been consumed, how do we tie it back to a tenant or a project ID and a user ID?
And the answer is, in each one of the messages, we have a payload which contains lots of information. Part of this information is the tenant ID, the user ID, and everything else that we can grab from the metadata. For example, in Nova, it's all of the instance metadata that is carried. So you know what type of instance, how much — well, all the data you've got. Flavor and things like that. So the question is, do we have a mechanism to handle transactions for the billing system? And the answer is, we've got an API, which is a REST API, which people can use to build transactions on top of, but the REST API, by definition, is non-transactional. We don't handle anything which would be billing specifics or business logic, or things like that. So we're going to move on a few more slides before we accept additional questions, because it would be stupid if we could not finish the presentation. Yeah. Well, it's quick anyway. OK, roadmap. So Grizzly: this is all we wanted to do. Everything with a check mark is stuff that has been done. As you can see, we did less than what we wanted, but we did quite a bit. I mean, we became incubated, and we even graduated from incubation during that cycle. We implemented Swift. We implemented the SQLAlchemy storage driver, even HBase, which was not planned when we started the cycle. This is what we presented in Grizzly six months ago. This is the exact list of what we worked on, so we are good. The API, the user-accessible API — that's quite nice if you want to have a plugin in Horizon displaying user information. Multi-dimension, that's actually fairly nice; that's the filter notion you mentioned. And that's another great contribution from Eoghan, correct? That was you, or? I guess. I guess, sorry. The other Red Hat guy. They wear their hat all the time, so we can't distinguish them. One is based in the UK, and the other one in Australia, but that's a very small difference. Havana. This is currently what we are going to be discussing this week, right?
Yeah, mainly, yeah. And I'm sure this list will be modified as we discuss. That's the point: if we all meet together to do design, it's to sort out our ideas, and we generally come up with new ones. These are the hot topics of this week, especially alarming and working on it. A third of our sessions are about alarming, I guess, and we have a few about the API. And integrating with Nova, which would be a way to drop the agents and the API polling, et cetera. So if you followed the Heat project, which incubated and got integrated at about exactly the same time as Ceilometer: currently, inside of Heat, there is a function to do alarming or alerting. When certain thresholds are reached, then you need to take actions. This is something that, in due time, we want to be able to do in a more generic fashion, externalize out of Heat, and integrate into Ceilometer. But to do it well means to do it slowly, with the proper design. This is going to be, I think, the most interesting session of this summit, from my point of view at least. Should we move on? Oh, by the way, for the I release, we don't know yet what we're going to be doing. We're going to do something. Bring your ideas. So that's what I just mentioned: we are going to try to provide a very generic way of doing alarming inside of OpenStack, which should allow Heat to drop their more specific implementation and use what will be in Ceilometer, OK? So the idea is to have something much better than CloudWatch. A little ambitious here, but. That would be auto-scaling, but it's tied. But we won't do auto-scaling; that's Heat's job. We'll be in between. But the thing is, how do you know the thing is busy? Being busy is a very complex definition. Generally, you need to aggregate multiple meters to decide whether you're busy or you're just having a high load. If you were basing everything on your load metric, you would do auto-scaling for no reason.
Maybe you need to combine that with CPU usage, and maybe some network metric, and the number of processes. I don't know; whatever you think is valid for your application. Then, when this set of thresholds is reached, you need to action something. Something can be Heat's auto-scaling mechanism. Another thing could be: let's cut the connection so that the users don't bother my application anymore. That's for you to decide what's going to happen there. And Ceilometer is going to try to meter information, detect thresholds being reached or getting back to normal, and send events to processing engines, which could be Heat, but which could be whatever you want. Basically, we are in the very early stage of defining an API where you're going to be calling back — well, Ceilometer is going to be calling back a specific URL with a set of parameters telling it, this is what is happening. So when you set an alarm, you set where it does the callback. Yeah, for Heat, for example. Heat will be our first consumer, but you could have all kinds of consumers for alarming. Yeah? So Chmouel, who works for eNovance as well, implemented quotas in Swift. So you won't need alarming in order to do that; just use the middleware that he wrote. Yeah, but you could. So this is a list of use cases that we think could be valid for using Ceilometer as it is today. Of course, rating and billing engines would be a proper consumer of Ceilometer. All kinds of analytics, whether it's capacity planning or adaptive scheduling algorithms — we've got a session with Nova evaluating whether it would be a good idea or not (we don't know yet) to replace the scheduler information gathering with Ceilometer. That could be a cool thing to do. There are all kinds of simulation, pre-prod, visualization, and monitoring use cases that could be put in place with Ceilometer. I think you've worked on a few cool use cases, haven't you, about usage of Ceilometer for something other than billing?
No, I just did some interface back then to show some graphing of what you get into Ceilometer. We have this in Ceilometer, actually: in the version one API, we have some sort of a debug interface in HTML where you can see the data you get, with a simple graph. So you can do anything you like with this data. OK, this is the section where we advertise the future sessions. If you think this session was cool, these sessions are going to be even cooler, but they are going to be part of the design summit. So they are going to be discussions, not a presentation with a couple of guys on the stage. They are going to be discussions between people that want to contribute. So if you want to contribute to these discussions and to Ceilometer, here are the times and the place. It's always the room B16; that's in the design summit section, in the middle over there. A few links if you want to know more. And how much time do we have left? Do you know when we're supposed to end? At 3:20? So we've got four minutes for additional questions. Not yet. We have a blueprint about this, but nobody volunteered to do it. But again, the idea — I didn't talk about this, because there are a lot of things in Ceilometer — is that we have a source field in our meters, which indicates from which system the meters come. So by default, it's OpenStack. But you can hook in and send meters from any system, be it a physical load balancer or a physical host or anything you'd like. So you can extend it. Yeah. Yeah. You mean scalability of the collector? OK, so all the meters are sent on the wire for now. You can have any number of collectors running and storing meters into the database. You get round-robin load balancing on the collector side. You can't do this with the agents. For example, the example I gave about images: you have to poll the API, and this is not scalable at all. We've got one per Nova compute node. For the agents of Nova, yes.
This is scalable, because you run one agent on each Nova compute host. But not for the central agent, which is doing, yeah, Glance polling, for example. This is not scalable. This is why we'd like to drop it. Because if you run two, three, or four agents polling the API, you're going to have four times more meters. So you don't want to do that. So basically, you need one per API endpoint, and no more, because if you have two, you'll collect the data twice. That can be a problem if you don't want to store a lot of meters. So in terms of high availability, it's easy to make it highly available. You just need to have a simple watchdog and restart it somewhere else. But just make sure you don't collect things twice. It won't do any harm, but it's just a lot of data. Can you collect availability? So, it's something that you can derive from the data we collect already; you have to do it yourself. But it will depend on what you mean by availability. If availability means availability of the infrastructure, or of the instance, or of a service on top of the instance, the notion is a little different, and we may or may not have the correct metrics. Any other questions? Yes? So assuming we're using the Bradley line kit. Did you get that? No. I'm sorry, I didn't get what you meant with that. Yeah. Actually, the multiple publishers will be for multiple uses. We don't plan to have multiple publishers for the same use, unless you want to do double-entry validation. Yeah. Yeah. But I guess right now it's an AMQP problem, not really a Ceilometer one. It depends on how you configure your queue, or how you configure your bus; it's not really tied to Ceilometer. We just use Oslo RPC to avoid a lot of problems and a lot of thinking about this. So we know it's safe, and it works in a way with different backends. So you just have to configure your backends as you want for metering and things like that. Yeah. Yeah, we could. I don't think we have that for now.
So the question I'm going to repeat is: are we going to publish a recommendation on how to configure RabbitMQ to guarantee the delivery of messages? Yes, we would. We don't have this written down on our end, but if we get this information, we could add it to the documentation; that's no problem. Also, in Grizzly, you should notice that there was some improvement on the high availability of RabbitMQ, where you can now use what is available natively in Rabbit 3.0 and higher. It's the exact same place where this session was posted. You go under the yellow tab, Design Summit, and there is a Ceilometer section. There is a section for each component of OpenStack. Yeah. Yeah, it is. Yeah. We store metadata as we get it. If it's blank, it means that it's blank where we got it from. I have no idea; I didn't check. As much as our RPC mechanism in Oslo does. Yeah. We use Oslo just to avoid these kinds of questions. Probably. Yeah. That's it. I think that's it, and we are done on time. Time's up. Yeah. Thank you very much.