Hello, good evening everybody. I'm Fabio Giannetti from Cisco, and I'm going to be talking about Monasca and Ceilometer. When we combine Monasca and Ceilometer, we come up with something called Ceilosca. I want to acknowledge my co-authors, who are in the audience: Srinivas Sakhamuri from Cisco, and Roland Hochmuth and Dan Dyer from HP.

In this talk we're going to look at why we built Ceilosca, how Ceilometer works, how Ceilosca works, and then we're going to go over some of the performance testing we have done comparing the two solutions. We'll give you an update on where we are with the code and the functionality we have been implementing, and then what's next for the project. Before I carry on, a quick show of hands: who here knows about Ceilometer? Okay, good. And who knows about Monasca? Okay, that's a good mix.

So, the motivations. We are really looking at operator needs here. Operators handling modern clouds are under a lot of stress from the amount of data they need to handle. It's not just metrics and events; they also have to consider logs, checks, and all the other kinds of data coming in for them to evaluate the status and health of the cloud they are managing. They also need to be able to do end-to-end debugging: it's not enough to see an alarm, something red popping up in your console; you need to dig deeper to understand what really caused the problem and how to fix it. They need the visibility to see everything that is happening in the cloud at a given time. And last, but not least important, is resource utilization: they need to understand current capacity, whether they have spare capacity, how they can address requirements for more capacity at any given time, where they can scale back, where they can add, and what they can do.

All of this ended up in a fairly complex set of solution requirements. We need to collect a lot of data from different sources, so metrics, logs, and so on, in one single place. We need to be very performant and highly scalable, because if we don't scale, we won't be able to provide the visibility and debugging that large deployments require. We need to be flexible: we need a processing pipeline that allows different types of evaluation, whether that's alarming, consolidating the data, or evaluating logs together with metrics. And lastly, extensibility: we can't understand or forecast every need and every tool that will be integrated, so we need to support some way of extending the platform, so that other sources can be added later on.

So what is Ceilosca, what does it do, and why did we build it? Ceilosca, in a nutshell, is Ceilometer built on top of Monasca. To do that, we built two components. First, we extended the publishing agent, which is the component that collects the OpenStack-related resource data, events, and metrics, the things that are happening in OpenStack, and we push all of that into the Monasca API.
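To make that concrete, here is a minimal sketch of the idea, not the actual Ceilosca publisher: take a Ceilometer sample and POST it to the Monasca metrics endpoint. The endpoint URL and the token handling are assumptions for illustration; in the real publisher they come from configuration and Keystone.

    import time
    import requests

    # Assumed Monasca API endpoint; the real value comes from the
    # publisher's configuration.
    MONASCA_METRICS_URL = "http://127.0.0.1:8070/v2.0/metrics"

    def publish_sample(sample, auth_token):
        # Map the Ceilometer sample onto the Monasca metric format.
        metric = {
            "name": sample["counter_name"],        # e.g. "cpu_util"
            "dimensions": {                        # queryable key/value pairs
                "resource_id": sample["resource_id"],
                "project_id": sample["project_id"],
            },
            "timestamp": int(time.time() * 1000),  # Monasca expects milliseconds
            "value": sample["counter_volume"],
        }
        resp = requests.post(MONASCA_METRICS_URL, json=[metric],
                             headers={"X-Auth-Token": auth_token})
        resp.raise_for_status()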
The second thing we did, to allow Ceilometer to keep operating through the Ceilometer API for backward compatibility, was to implement a storage driver in the Ceilometer code, which talks to the Monasca API, retrieves the data, and does the conversion between the Monasca format and the Ceilometer format. So from a client's point of view, Ceilometer doesn't change at all; it's still the same as it was before.

And why have we done this? Why did we take the trouble to go and work on this? Because we really wanted a highly scalable, high-performance system that supports both telemetry and monitoring. And we believe there is a big advantage in having all this data consolidated in a single place, because we can fire alarms, correlate data, understand the status of the cloud better, and do things that couldn't be done before. Moreover, it's easier to maintain a single system at scale than three, four, or five. So by putting everything in a single place, we believe we give operators insights and the ability to operate the cloud better. We see this as a win-win; we don't see any downside to a single unified solution.

Looking at the big picture, the green part is what is already available in Monasca, and Monasca has now been moved to the OpenStack GitHub, so it lives in OpenStack together with the Ceilosca code. Monasca is excellent at scaling and is based on a microservices architecture, which I'll talk about a little later. What we've done is attach on top of it the ability to deal with OpenStack metrics and OpenStack notifications, which is the data Ceilometer is excellent at collecting; there has been a lot of work in integrating and making sure all the services are able to send their data through the notification bus. And then we layered the Ceilometer API on top of the Monasca API. So in the end you have a single system that behaves as both.

Now, the current Ceilometer architecture. Take this with a pinch of salt, because it's a very high-level view. Fundamentally, the OpenStack services have the ability to send data to a notification bus, which is usually implemented with RabbitMQ. Then there are different types of Ceilometer agents. There is the agent that listens to the notification bus, which is called the notification agent. Then there are agents that do polling, because not all the services are actually subscribed to the notification bus, and moreover, some of the information that is vital for telemetry is not available on the bus. So there are two other agents, one called the central agent and one called the compute agent, and those agents poll data through the APIs and the Python clients. Once this is done, all this data, now called samples, is republished on a Ceilometer-specific topic, which could be on the same RabbitMQ queue or a different one, depending on how you deploy it; the bottom line is that everything is republished and sent to that queue. After this happens, the data is consumed by a set of agents called collectors, which push the data into a database. In our examples we use MongoDB, because it's the most feature-complete and the best-performing of the supported databases.
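For reference, a "sample" here is just a structured record. Roughly what one republished sample carries, as a sketch; the field names follow Ceilometer's sample format, the values are invented:

    sample = {
        "counter_name": "cpu_util",        # the meter
        "counter_type": "gauge",           # gauge, cumulative, or delta
        "counter_unit": "%",
        "counter_volume": 12.5,            # the measured value
        "user_id": "user-0001",
        "project_id": "tenant-42",
        "resource_id": "instance-0007",
        "timestamp": "2015-10-27T18:00:00",
        "resource_metadata": {"flavor": "m1.small", "image_ref": "cirros"},
    }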
So Ceilometer currently supports MongoDB, HBase, and MySQL, which is used just for Tempest. And the Ceilometer API, the v2 API, queries directly through the storage layer, MongoDB, to retrieve the data.

So how does Ceilosca work? Ceilosca is not that different, in the sense that it still leverages the same agents: it still gets the notifications and it still uses the agents to do the polling, but then it sends this data directly to the Monasca API. The way Monasca works internally is that it publishes this data into a message queue, the Monasca message queue, and differently from what Ceilometer does, this one is based on Kafka. And Kafka is way more performant, and scales way better, than RabbitMQ will. Once the data is there, another component, the persister, reads the data from the Kafka queue and stores it into the database. The open source databases available are InfluxDB, with Cassandra under development; there is also an HP-specific option, which is Vertica. So Monasca supports a variety of databases. For our tests, we used InfluxDB.

The other thing I want to focus on is the fact that our Ceilometer agent in Ceilosca has been extended to do batching. We batch at the agent level, and then the persister batches on its own when it consumes the data from the queue. So we have the ability to batch in two places, and we can decide the size and the time window of each batch independently. So we can easily configure the system to favor either speed or freshness of the data, based on the two batching settings.

And then what the Ceilometer API does — go ahead, you can ask a question. [Question about whether we used a single InfluxDB instance.] Yes, in our tests we used one instance because, as I'll show you, we ran this on DevStack. But InfluxDB can run in a cluster; we used 0.9.4.2, which allows you to cluster, and usually a cluster of three nodes is what you would set up.

So the Ceilometer API talks directly to the Monasca API to retrieve the data. Looking a little more in detail: the Ceilometer agent has an interface called the publisher interface. If you look at the way Ceilometer works, Ceilometer defines a pipeline. In this pipeline there are sources, which are the meters you want to consume or work with, and then there is a set of transformers, which can take the data and massage it into a slightly different format. Once this is done, the last step is publishing: it takes the transformed data and pushes it somewhere. The default publisher, as I said before, is called the notifier, and it sends the data to another RabbitMQ topic; there is also one that sends data directly to Kafka. But we implemented one that sends the data to Monasca, and that is called the Monasca publisher. This is the one that also has the batching ability. On top of batching, the incoming messages carry a lot of metadata, so we created a configuration file that lets you select, on a per-meter-type basis, which metadata you want to carry over. So you decide: this is important to me, this is the stuff I don't care about. And that actually makes a huge difference; I will show you later the amount of storage we save by not throwing everything into the database.
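The shape of that selection is roughly the following. A sketch of the per-meter filtering idea; the meter names and fields are invented, and the real Ceilosca code drives this from a definitions file rather than an inline dictionary:

    # For each meter, declare which metadata fields to keep; anything
    # unlisted never reaches the database.
    FIELD_DEFINITIONS = {
        "cpu_util":   ["flavor", "image_ref"],
        "image.size": ["disk_format"],
    }

    def filter_metadata(meter_name, resource_metadata):
        wanted = FIELD_DEFINITIONS.get(meter_name, [])
        return {k: v for k, v in resource_metadata.items() if k in wanted}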
On the other side, in the API, there is a base interface that defines the storage layer, and that base interface has been implemented by a Monasca driver. The API is a model-view-controller setup; the driver takes a request from the API, implements it by talking to the Monasca API, and then converts the data it retrieves back. What we also did in the Monasca driver is leverage the scalability of Monasca: we have the ability to send multiple queries at once. So if you are querying over a time interval, let's say five days, there is a configuration option for how many threads you want to use, and the driver will split the query and send, say, 20 concurrent queries to retrieve the five days' worth of data. What we found out is that we need to make the Ceilometer side of this better, because currently it waits for all of this data to be compiled before sending it back; with something like pagination, you could build the pages in advance, so that when a page is served, it is already there.
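The fan-out itself is simple. A minimal sketch of the idea, assuming a query_fn that wraps one Monasca measurement query over a (start, end) window; the thread count mirrors the configuration option I mentioned:

    from concurrent.futures import ThreadPoolExecutor

    def fanout_query(query_fn, start, end, threads=20):
        # Split one long interval into equal sub-intervals...
        step = (end - start) / threads
        windows = [(start + i * step, start + (i + 1) * step)
                   for i in range(threads)]
        # ...and issue the sub-queries concurrently.
        with ThreadPoolExecutor(max_workers=threads) as pool:
            chunks = list(pool.map(lambda w: query_fn(*w), windows))
        # Today the driver waits for every chunk before answering;
        # pagination would let it stream pages as they complete.
        return [sample for chunk in chunks for sample in chunk]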
Then we decided to do some tests. We wanted to simulate the concept of a private cloud versus a public cloud, to get a sense of how differently the systems perform with different types of clouds. So we said: okay, a public cloud will have a lot more tenants than a private cloud will. So we have 5 tenants in the private cloud and 500 tenants in the public cloud. In the private cloud we have 10 resources per tenant, divided into four compute, four volume, and two image, just to give a sense of a standard load. For the 500 tenants, we decided to have just one resource each. The reason is that in the end we wanted both scenarios to produce the same total number of measurements, which was 7.5 million. At the beginning we were very optimistic that even our VMs running DevStack would be able to support that load; I'll show you later what actually happened.

We had two environments for testing. One was a virtual machine, and it was a pretty meaty virtual machine: 16 CPUs, 32 GB of RAM. But the bottleneck ended up being the 50 GB of root disk. The other was bare metal: a very big server with 96 GB of RAM in total. That is where we could really run the 7.5 million load we had planned.

How did we test it? We took the oslo.messaging load simulator, a tool that sends messages to the bus, and we extended it; our extensions are also available as part of the Ceilosca repository, so you can get this tool in the same place. We extended it to send realistic messages with mock data inside, and we sent batches of these messages over the bus and timed how long that took. In the case of Ceilometer, we also timed how long it took to consume the messages on the Ceilometer bus; once all the messages were consumed there, we knew they were all stored in the database. Then we repeated the same thing for Ceilosca: we had the same producer, which we timed, but because Ceilosca doesn't republish into a queue, we instead measured the time it took to load all the data into InfluxDB. This is how we performed the load tests.

And these were the results. With Ceilometer, we basically couldn't complete the run past 4.8 million, because we ran out of space on the 50 GB disk. And time-wise, you can see that Ceilometer is way higher than Ceilosca. Ceilosca eventually failed too, at around 9 million; I think it was around 8 million, 8.5 million, something like that. So the takeaway is that, on average, Ceilosca is three and a half times faster at consuming data in the public cloud simulation, and Ceilosca consumes between two and three times less space. Why do I say two to three times? Because it depends on how much metadata you decide to keep. We kept enough metadata to support a traditional billing setup, and at that level we are around two times more conservative in space; if you are very lean and don't need much metadata, you can stretch it to three times.

Then we did the same thing on the virtual machine for the private cloud, and as you can see, the point at which loading failed was the same, but overall both systems are speedier. So there is a correlation: the more tenants you have, the slower both systems are, and this affects both of them. I think the reality is that there is a resource-creation step that Ceilometer has to go through which is time-consuming, so the more tenants you have, the more of that needs to be done.

The second thing we did was a query test. So once we loaded the system and did all the performance runs, we ran the queries. The way we did the queries, we used curl; we didn't use the Python client, just curl. For each individual tenant, we specified an interval of time and queried samples for a particular meter. In the case of Ceilosca, of course, the query that Ceilometer performs goes to the Monasca API; when you query Ceilometer itself, it goes to Mongo. We simply timed how long it took to run those queries. We disabled Keystone, because we were using curl, so Keystone is out of the picture. We did at least one query per tenant over a 24-hour time span, with at least five tenants and 10 repetitions per tenant; we did all of this to avoid false positives from individual timings that were too fast. And then we took the 90th percentile of those timings.
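The harness is nothing fancy. A minimal sketch of the approach, with a placeholder URL; the query syntax is Ceilometer's v2 API, and for Ceilosca the same API is backed by Monasca:

    import subprocess
    import time

    def time_query(url, repetitions=10):
        # Run the same curl query several times and keep the wall-clock times.
        timings = []
        for _ in range(repetitions):
            t0 = time.time()
            subprocess.run(["curl", "-s", url], capture_output=True, check=True)
            timings.append(time.time() - t0)
        timings.sort()
        return timings[int(0.9 * len(timings)) - 1]   # 90th percentile

    # e.g. one 24-hour window of samples for one meter of one tenant
    # (placeholder host and date):
    # time_query("http://localhost:8777/v2/meters/cpu_util"
    #            "?q.field=timestamp&q.op=gt&q.value=2015-10-26T00:00:00")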
When we tested this on the VMs — because we couldn't really go above 2.4 million samples on the VMs — we ran the tests starting from 300K, then 600K, 1.2 million, and 2.4 million. As you can see, both curves have a kind of exponential tendency, but Ceilosca's is much more subtle than Ceilometer's. The result is that Ceilosca is, on average, 2.5 times faster than standard Ceilometer at querying those samples. Then we ran the same queries against the public cloud setup, and just as we saw with the load times, we see a significant difference in the query times. The more tenants you have, the worse Ceilometer's performance gets; Ceilosca, on the other hand, doesn't degrade that badly and maintains a more linear behavior. In fact, the performance was astonishing: 11 times faster than Ceilometer. So we definitely found a correlation between Ceilometer's performance and the number of tenants in your cloud.

Then we ran on bare metal, and there we could load the full 7.5 million samples we had set out to; on the VMs we couldn't, because we ran out of disk, but on bare metal we could do it easily. And the behavior you saw on the public cloud, where Ceilosca's times increase more slowly, is really exacerbated on bare metal: there, the difference was 18 times faster than Ceilometer. But nevertheless, we also found that the difference between the Ceilosca API and the Monasca API is significant: not only does Ceilosca perform better than Ceilometer, but it is still 1.8 times slower than Monasca. So we think there is a lot of improvement we can make to bring Ceilosca on par with Monasca. I also want to stress that there is another option: if you don't need backward compatibility with Ceilometer, you can query the Monasca API directly and get the performance benefit straight away. The reason we built Ceilosca is to have the benefits of both worlds: the performance that Monasca brings, while keeping backward compatibility with Ceilometer, because there are users, companies, and customers currently using Ceilometer.

So what does Ceilosca currently support? On the publisher side, we can connect to the Monasca API, and we use Keystone to authenticate, so the publisher acts like a Monasca agent publishing data to the Monasca API; you can't get swamped with unrecognized data, because the Monasca publisher is a recognized agent for Monasca. We support the pipeline as-is, because we only changed the publisher part; everything else in the pipeline, any transformation you currently do on the samples, will keep working, because we haven't touched any of that. We also added the configuration ability to say which parts of the metadata are converted into dimensions and which remain metadata. This is important because of the way Monasca, and time series databases in general, work: dimensions are queryable data, while the metadata travels along with the measurement. So you can run any type of query on the dimensions straight away, but if you want to dig something out of the metadata, you have to fetch the data first and then pick out the parts you need. And on top of that, we implemented the batching ability: the publisher will batch based on a time interval or on the number of messages you want to send over.
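A minimal sketch of that two-knob batching, with invented names and defaults; the real publisher reads these from its configuration:

    import time

    class BatchingPublisher(object):
        def __init__(self, send_fn, max_batch=1000, max_wait=15.0):
            self.send_fn = send_fn      # e.g. one POST to the Monasca API
            self.max_batch = max_batch  # flush after this many samples...
            self.max_wait = max_wait    # ...or after this many seconds
            self.batch = []
            self.last_flush = time.time()

        def publish(self, sample):
            self.batch.append(sample)
            if (len(self.batch) >= self.max_batch or
                    time.time() - self.last_flush >= self.max_wait):
                self.flush()

        def flush(self):
            if self.batch:
                self.send_fn(self.batch)  # one API call for the whole batch
            self.batch = []
            self.last_flush = time.time()

The persister applies the same idea on the consumer side of the Kafka queue, which is why the two stages can be tuned independently.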
On the Monasca driver, the one that implements the API, we have implemented simple queries for pretty much all the major elements: meters, resources, samples, and statistics. We support metadata for everything but meters. On the statistics side, we only support the most common functions, like average, minimum, and maximum; we haven't done standard deviation or cardinality. We don't support pagination, group-by, or complex queries. Actually, we did all of this against stable/kilo, and now in Liberty there is more functionality we could leverage that wasn't available in the Kilo version.

So how can you get Ceilosca? You can go to GitHub; it's openstack/monasca-ceilometer, and it's all in Python. There you can get the Monasca publisher code and the implementation of the Monasca storage driver. You will see all the unit tests we implemented, and there is an automated Ansible deployment that brings up DevStack and Monasca and adds the Ceilosca pieces. We also published the code changes we made to work with the load simulator. So you can get the entire thing set up on your boxes, run the load simulator, and have fun with it. The only thing you need to do is clone the repo and run the Ceilosca shell script in the deployer directory, and it will basically bring up DevStack, install Monasca, install the Ceilosca parts, and then you will have Ceilosca running at the end.

So what's next? Well, next we want to talk with the Ceilometer folks, because we think the parts we developed to integrate Ceilometer with Monasca are really a glorified driver, if you will, that allows Ceilometer to use Monasca as a storage backend. It's a little bit overkill as it stands, but the idea is that if you already deploy Monasca, why not take Ceilometer and send the data to the same place. The other thing we want to do is extend Ceilosca to publish events. Monasca is working on supporting events natively with an events API, so the same thing we have done with the metrics API, we now want to do for events, so we can also store events in Monasca. Another item: as I showed before, the performance difference between the Monasca and Ceilosca APIs is mainly due to the fact that the JSON parsing is not efficient, and that the multiple queries we run at the same time are not fully exploited, because of the lack of pagination and the lack of the ability to send data while it is still coming in; basically, Ceilometer waits for all the queries to resolve so it can pack all the data into a single response. And then we want to explore how we can collaborate with Ceilometer to integrate the alarming capability that Monasca has. Monasca has inline alarming, which is extremely efficient, because once a metric measurement gets into the queue, it is automatically evaluated; it doesn't need to reach storage before it is available to the evaluation engine. So we think this inline evaluation of alarms is a valuable thing that could also be used for Ceilometer.
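To illustrate why that is cheap, a toy sketch of the inline idea: consume straight from the Kafka topic and check a threshold before anything is persisted. The topic name, envelope format, and threshold are assumptions, and the real Monasca threshold engine is far richer than this:

    import json
    from kafka import KafkaConsumer  # assumes the kafka-python package

    consumer = KafkaConsumer("metrics", bootstrap_servers="localhost:9092")
    for message in consumer:
        # Evaluate each measurement as it arrives on the queue, without
        # waiting for the persister to write it to the database.
        envelope = json.loads(message.value)   # envelope format assumed
        metric = envelope["metric"]
        if metric["name"] == "cpu_util" and metric["value"] > 90.0:
            print("ALARM: cpu_util=%s on %s" % (
                metric["value"], metric["dimensions"].get("resource_id")))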
And this wasn't just the work of the four of us; a lot of people were involved. I really want to thank the two teams, at Cisco and at HP, that worked together to crunch the numbers, run the tests, and develop the code, and did all of this in a fairly short amount of time: this work was done in roughly three months. Okay, thank you. Any questions? Yes?

[Question about comparing with Gnocchi.] We haven't compared Ceilosca with Gnocchi, for two reasons. One is that Gnocchi provides a different API, and the idea of Ceilosca was to maintain backward compatibility. So if you want to do such a comparison, you should really compare Monasca with Gnocchi, because you have two completely different APIs. If you are departing from the Ceilometer API, if you don't care about the Ceilometer API, then I wouldn't use Ceilosca to read the data back; I would use the Ceilometer collection to feed the data into Monasca, but I would use the Monasca API directly to read the data, because it scales better. So the idea is not to compete with Gnocchi. I think Gnocchi could be an alternative solution for storing the data, and I know there are some similarities; for instance, Gnocchi uses a time series database too. But architecturally, Monasca has this microservices basis, with a bus in the middle, and all these components can go and fetch data, like the inline alarming and all of that; Gnocchi, instead, is pretty much just the storage part: there is an API and a database, and you call the API to store the data. So the functionality is rather different. Yes?

[Question about alarms and Heat integration.] So, there is an effort in Monasca, and Roland can talk about it, but there is an integration going on with Heat, and it is already working; there was actually a talk earlier on about this, so potentially, yes, it will work with Heat too. There is a slight difference, because the way Ceilometer works is that it repeats the alarms, whereas Monasca is more like: I send the alarm, and then I send the status change, rather than keep bombarding you with the alarm. So to keep the same functionality, some changes would need to be made in Monasca, but they are not that significant; it's just a matter of alarming continuously rather than alarming once and stopping. But you should talk to Roland, because he is the one who knows all the details about it. Any other questions? Sure.

[Question about sending custom notifications.] The way Ceilometer works is that you would have to create a message and then write a plugin that interprets the message, and then it goes and sends the notification in Ceilometer. We are not supporting that, because it's kind of a legacy thing, but you can post your data to the Monasca API directly; that is exactly what the publisher does, in the same way. Any other questions, anybody? Oh, you have a question related to Monasca? Sure. Oh yes, there is a Monasca team meeting at 4:30; the room is S3. Where is that? I don't know exactly; it's called S3, that's all I know. Sakura Tower. Okay, if there aren't any other questions, I want to thank you again for your time.