How's everybody doing? Thanks for showing up this morning. I hope you've had a good three days of summit so far. Some of you may have joined a little late, but I hope you're having a good time. I'm Sriram Natarajan, and this is David Paraza. We're here from Persistent Systems to talk to you about Ceilometer: where it is today, what you can do with it today and tomorrow, and our thoughts on how the community can adopt it. Persistent is a software powerhouse. We've been around for 24 years building software for some of the biggest names in the software world. With OpenStack, we've been contributing code since the Cactus days. We've had a couple of signature wins with a few large vendors, and we've started looking at Ceilometer as the way to get to monetization, to the return on the investment that people have made in OpenStack. The community has matured. The offering has matured. A lot of the workloads moving onto OpenStack now have to be justified: okay, this is cool, now what? How do I prove that this is giving me the return I was expecting? We think Ceilometer is the answer to that question. Just a quick show of hands. How many of you are running Ceilometer today? Are you using it for metering alone, or for anything else? Just for metering? Okay. Are you archiving the Ceilometer data you're collecting? Okay. And those of you who are using it, do you want to continue using it? Anyone having problems with it? Wow, that's great. All right, cool. David will walk you through a little of the evolution of Ceilometer, bring you to the point where we are today, and then what the future holds for us. David. Thank you. First of all, this is not a deep-dive talk on Ceilometer. If you want details, there's great documentation out there, and you can contact me if you want.
If I don't know the answer, I can point you to one of the core members, who will very likely be able to answer whatever question you have. This is more about showing you the evolution of Ceilometer and the opportunity it brings to the business, and when I say the business, I mean the information you can get out of your Ceilometer data about your cloud. That information might be relevant today, or it might be relevant in five or ten years, because patterns develop over time, not just instantaneously. So this is the opportunity we see, and we want to discuss it here with this audience and make it a little interactive, if you will. With that interaction, hopefully we get at the things happening with Ceilometer today that we can improve, so you can turn on Ceilometer today. Turning on Ceilometer today means you can do analytics in the future. So even if you don't care about monitoring or billing, it's still beneficial. So what was the initial idea? The initial idea was that there was a gap. You have compute services, network services, and storage services, and different departments coming in. The development department will come in and make changes to orchestration, to the placement algorithm, for example. The IT guys will come in and deploy OpenStack. But the billing department is scratching their heads: how do I bill? I have to bill for these services. So each vendor had their own one-off solution for measuring. Nova had some metering it collected just for placement reasons, so you could use that, but it was not enough. It was hard to port and hard to interoperate. There was no standard; that's the bottom line. So there was a clear need, right?
And Ceilometer was already incubating out on StackForge, but it hadn't yet been added to OpenStack. This is when Ceilometer as an OpenStack component was born. Now you have a metering component called Ceilometer in the OpenStack set of components. You have storage, image management, compute, and now you have Ceilometer. The initial idea was just billing; that was the need, that was the gap. But from the beginning, the design has always been that we can do more than just meters for billing. We're going to create this infrastructure, this pipeline, if you will, where you can send messages at scale and gather all the information: regular events, alarms, and so on. If you have an issue with your systems and you're reaching a threshold, you send an alarm and Ceilometer can capture that. And you can customize it so you can publish that alarm somehow: if you hook up to the database, you can certainly get the alarm, and there are other ways after that. Ceilometer is just in charge of collecting it and storing it, okay? All right, so these are the initial features. Remember, back then there was only Nova, and Nova had compute, network, and volume, so compute, storage, and network lived inside the same Nova component. There was a big need for performance there, so there was a specific compute agent dedicated to it, and what that compute agent does is send notifications on the notification channel. Nova was not instrumented to send messages like it is today; today is a little different, and I'll show you that. Then you had a central agent, which was used pretty much for Glance, Swift, and Nova network, and that was more of a polling mechanism: you go and ask for the data through the APIs. So the central agent was sitting in the Ceilometer block, right?
And then you have the data. It was NoSQL; they started with MongoDB, as you know. And there was a single actor and a single use case: I want to gather meters so I can apply my rates on top of them and get billing, okay? Anything else? Well, I think we can move to the next chart. All right, so there was an evolution through Grizzly, Havana, and Icehouse. The main thing that happened is that the design started to evolve. The design was there to accept more, and different companies started adding their piece of it. For example, there was a need for alarms, and the Alarms API was added on May 27, 2013. You also have standard auditing, CADF, which came in as a new type of event that lets you store data for auditing purposes later on. You have hardware monitoring. You have network monitoring. So this is realizing the initial design, but going beyond the initial gap, okay? You see here how multiple kinds of messages can now be sent through that bus. Let's go to the next chart. So what do you have on that bus now? Not only meters. You have meters, alarms, and events. And now there are more components, so you have to instrument more than just Nova. You have to take care of Neutron separately; you have to take care of Cinder separately. The good news is that most of these components have already been instrumented to send notifications through common code in the Oslo project, if you're familiar with that. These notifications are converted into events, meters, or alarms. There's an engine, a collector in Ceilometer, that processes them and stores them, or does some aggregation at that point as well.
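To make that pipeline idea concrete, here is a minimal sketch of the kind of `pipeline.yaml` Ceilometer reads to wire meter sources to publishers. The source and sink names, the interval, and the publisher shown here are illustrative placeholders; check your release's sample configuration for the exact schema:

```yaml
---
sources:
    - name: meter_source      # illustrative name
      interval: 600           # poll every 10 minutes
      meters:
          - "*"               # collect every meter
      sinks:
          - meter_sink
sinks:
    - name: meter_sink
      transformers:           # none: pass samples through untransformed
      publishers:
          - rpc://            # hand samples to the collector over the bus
```

Narrowing the `meters` list and lengthening the `interval` is also the first lever for dialing down the volume of data Ceilometer generates.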
So for billing, we still do aggregation, right? But it saves the sample as well. It saves the raw event, and it saves the raw alarm, which is pretty important for what's coming next, okay? One thing I want to mention here too is that there's been a great feature added for auto-scaling that I know Rackspace is using. It's pretty much alarms being sent through Ceilometer, and then Heat responding to those alarms and auto-scaling based on them, okay? So there's a template to scale up and scale down, and Heat acts upon it. It's getting to the point where, if you're in a private cloud running OpenStack, you can do similar things to what Amazon is doing, right? So, yeah, next chart. All right, and here's where it could become a little more interactive if you want. What if you turn on Ceilometer today? Let's say you haven't thought about it: you're an enterprise, you don't need billing, you run a private cloud. Yes, you're going to get some features, like auditing. If you have to be in compliance with some regulation and an auditor is going to come in, you get that feature for free, right? You can measure utilization. There's alarm-driven auto-scaling, so you can take advantage of elasticity even in your private setting. You can do hardware monitoring: is it getting too hot, do you need to go fix a fan in some of your hardware? And you have multiple data store options to play with; there's NoSQL and SQL support now, so you have a whole set of data stores to play around with. But even if you don't take advantage of these features, I argue that you should turn it on just because of the analytics opportunity, okay?
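As a hedged sketch of that Heat-plus-Ceilometer loop, a Havana-era HOT fragment might pair a Ceilometer alarm with a scaling policy roughly like this. All resource names here are made up, and the fragment assumes an `OS::Heat::AutoScalingGroup` named `asg` is defined elsewhere in the template:

```yaml
resources:
  scale_up_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}    # assumed to exist
      scaling_adjustment: 1                         # add one instance
  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 80                                 # fire above 80% CPU
      comparison_operator: gt
      alarm_actions:
        - {get_attr: [scale_up_policy, alarm_url]}  # alarm triggers the policy
```

The key point is the loop: Ceilometer evaluates the meter, raises the alarm, and Heat acts on it; neither side needs custom glue code.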
At the end of the day, when we finish this presentation, that's what I'd like you to take home: even if I don't need these features, let me explore what analytics opportunities I'll have in two years, in five years, and what intel my management team could take out of this to make smart decisions about the cloud. Do we buy new storage or not? Do we scale out to a public cloud or not? Those types of questions. Those are key decisions, and the more intel you have, the better, right? I argue that the analytics opportunity alone is enough, but I understand why some people might not be turning on Ceilometer right now. They don't have an immediate business need today, and it adds extra cycles to their processing: you have to generate, transmit, and process new messages, and a lot of messages. Now you're sending messages every time something happens, okay? So you have to think about additional storage. What do I do with all this data? I'll pile up terabytes in no time. How do I archive it? There was a question at the beginning about archiving; how do I archive this data? That's a potential issue, and if you're not ready for it, that's probably one of the reasons you're not flipping the switch, okay? And then you have additional configuration and complexity. Again, if you don't have an immediate business need, and maybe you don't want to take advantage of analytics, you have to weigh that. Is it disruptive enough to my business that I'm not going to turn it on, or can I tolerate that disruption to the point where I gain the benefit of the analytics it brings, okay? And hopefully, if you have any questions, we can help you with that decision as well.
So I told you about the evolution, right? We got from metering to metering, alarms, events, auditing, et cetera. Here's what I see coming. It started with billing. It does event-driven policies now. You have hardware monitoring. Maybe you can add app-level monitoring. If you're here in the middle of the stack and you went down to the hardware to monitor that, why not go up to the application level and send notifications that way? For example: the throughput of my web app's transactions is not keeping up. That's an alarm, right? And hopefully someone hears that alarm and acts upon it, like Heat, and auto-scales my environment just because the app level is not getting the throughput it needs. That's a good example of app-level monitoring. But the big thing coming is that you have a lot of data about your cloud on all fronts, so big data analytics becomes something you can do in the future. Ceilometer becomes, yes, a tool for today, but the data of Ceilometer also becomes the data of the future. Okay? So I think I'm going to let Sriram go through the actual analytics opportunity here. Sure. So we have the capability to collect events, alarms, and notifications through Ceilometer. David was talking about applications that are onboarded onto OpenStack. We recommend that when you're integrating such applications, you don't try to carry the old notification mechanisms into the new world of OpenStack. Switch over to Ceilometer so everything flows through one place and you can capture it. This comes from experience, because we run a large-ish OpenStack instance ourselves, and we're somewhat calcified in the old ways, from the pre-Ceilometer days, and we found it difficult to move over.
One thing you can do with the data immediately is see inefficiency: things like unused VMs, or auto-scaling to match incoming demand; that's something you can do automatically. If it's an in-house tenant, you can actually understand what they're up to and give them suggestions on how they can run their workload better, from efficiency and effectiveness up to more innovative solutions. If you find some complementary workloads, some things that run in the morning and some that run in the afternoon that never seem to overlap and that use the same database, you might as well tell them: here's a suggestion, you can probably merge them if the rules allow. You can also do things like targeted offerings. If everybody seems to wake up Monday morning and start demanding a lot, you can charge for it differently if you're a public provider; if you're in-house, you can provide disincentives for people not to use it Monday morning when everybody seems to want it, so you can stagger the load. That leads into how you can differentiate your service. You can look at providing such intelligence to your consumers, internal or external: this is what you're doing, here's how I can help you do things better, and here's the data to prove it. I'm not trying to sell you new services; here's the data that goes with it, and that becomes a very easy argument. It becomes a very sticky relationship after that, and people really love you when you save them money. When you're running large workloads, you have mature systems, and you have compliance and regulatory rigor to stand up to. There are systems you probably have in place for existing workloads that monitor and provide your GRC mechanisms.
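The complementary-workload idea is easy to sketch. Assuming you have already reduced two tenants' meter samples to hourly busy fractions (the data below is hypothetical, not a Ceilometer API), finding the hours where both are busy tells you whether they could share infrastructure:

```python
def shared_busy_hours(a, b, busy=0.5):
    """Hours in which both workloads exceed the 'busy' utilization threshold."""
    return [h for h, (x, y) in enumerate(zip(a, b)) if x >= busy and y >= busy]

# Hypothetical hourly CPU fractions: one morning-heavy, one afternoon-heavy tenant.
morning = [0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
afternoon = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]

# An empty list means no shared busy hours: candidates for merging.
print(shared_busy_hours(morning, afternoon))  # -> []
```

In practice you would derive the hourly fractions from archived `cpu_util` samples, but the overlap test itself stays this simple.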
What Ceilometer can do is be the collection point for your entire OpenStack system. Don't bring over things that talk directly and bypass the Ceilometer path; if everything goes through it, there's one single point where you can capture the data, where you can analyze it, and where you can be sure that anything running in your OpenStack system will pass through. With that, you can make it transparent enough to stand up to any audit scrutiny you may face. Then we have the usual capacity predictions. We can tell you where you're going to run out, at the tenant level, the zone level, or the entire org level: here's when you're going to run out of capacity, here's what you'll need, here are special services you may want to bring in, here are your bottlenecks, you may want to put a bigger pipe between these places if need be. Those are the sorts of things you could do. So this is kind of what we talked about. There are other things we can look at, too. You can see if you're ready for disaster recovery; that's something gaining interest in the OpenStack space. Yesterday our colleague talked about one possible scenario. There is a working group David is working with to see how disaster recovery plays in, and how you can use multiple zones in an OpenStack environment to keep backup images and start them up in a separate place. Those sorts of things can happen if you have the alarms and monitoring in place that let you know you actually have a problem. There are also security problems you can look at. Once you have standardized data, you know what your applications are going to do (start up a few VMs, say), and it's easier to draw a baseline once you have the data.
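A crude version of that capacity prediction can be done with a least-squares trend over usage samples. This sketch is my own illustration, not part of Ceilometer: it takes `(time, usage)` pairs and returns when the fitted trend line would hit a capacity limit:

```python
def predicted_exhaustion(samples, capacity):
    """Fit usage = slope*t + intercept by least squares and return the time t
    at which the trend crosses `capacity`; None if usage is flat or shrinking."""
    n = len(samples)
    ts = [t for t, _ in samples]
    us = [u for _, u in samples]
    mean_t, mean_u = sum(ts) / n, sum(us) / n
    slope = (sum((t - mean_t) * (u - mean_u) for t, u in samples)
             / sum((t - mean_t) ** 2 for t in ts))
    if slope <= 0:
        return None  # no growth trend, so capacity is never exhausted
    intercept = mean_u - slope * mean_t
    return (capacity - intercept) / slope

# Hypothetical weekly storage usage in TB, growing 10 TB/week from 10 TB:
# the 100 TB pool runs out at week 9.
print(predicted_exhaustion([(0, 10), (1, 20), (2, 30)], capacity=100))  # -> 9.0
```

Real tooling would fit per tenant or per zone and use something smarter than a straight line, but the idea of turning archived samples into a run-out date is exactly this.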
And once there's anomalous behavior, you know something wrong is going on; that can be flagged, and you can use it as an early warning system, in addition to your other IDS and IPS solutions. Go on, David. All right. This is a chart I think we borrowed from some financial planning site. What we want to impress upon you is that even if you don't have the need today, you're going to have the need tomorrow for Ceilometer data. So we would like you to take a really good look at what Ceilometer offers you today; it's pretty valuable as it stands. And as for the data it collects, the tools to analyze it are only going to get better. David said two years; I would argue even much less than that. In a matter of months, you'll have better integration between Ceilometer and the rest of the best-of-breed solutions out there; in fact, we are working on a couple of those. We'll also have the ability to throw more analytics at it, to identify the patterns that exist today, so you can ask whether a given pattern exists. And in the future, we see machine learning and analytics systems automatically learning these patterns and generating information for you, saying: this is what I see, and you'd better be prepared for it. That can happen, and it will only happen if you turn on Ceilometer today. David. So what are the challenges that come with this? I touched on this already, but let me make it really simple so you can memorize it and take it home. A Venn diagram: volume, scalability, performance, and business impact today. You have to worry about the volume of data you're getting now. We have a reference architecture I want to show you that will alleviate that. So you have to be able to scale.
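The baseline-then-flag idea fits in a few lines. This is my illustrative example, not a Ceilometer feature: a simple k-sigma rule over a history of meter readings.

```python
import statistics

def is_anomalous(history, value, k=3.0):
    """Flag a reading more than k standard deviations from the baseline mean."""
    mean = statistics.mean(history)
    sd = statistics.pstdev(history) or 1e-9  # guard against a perfectly flat history
    return abs(value - mean) > k * sd

# Hypothetical per-minute "instances started" counts for one application.
baseline = [10, 11, 9, 10, 10, 11, 9]
print(is_anomalous(baseline, 10))   # -> False: normal behavior
print(is_anomalous(baseline, 100))  # -> True: unusual burst, worth an alarm
```

The baseline comes for free once you are archiving standardized event data; the flagging rule can be as simple or as sophisticated as you like.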
If there are scalability issues, I suggest you bring them up with the community; we have email addresses you can contact for that. And what's your business impact today? If turning it on today impacts your revenue, don't turn it on. You have to make a smart business decision: does it impact your business today, and how much risk are you willing to take today so you can gain in the future? Be smart about that. So what are the solutions we propose to address these challenges? Design for minimum impact on daily operations. You have control: even if you don't write Ceilometer code, you control how you deploy it. Where do you deploy it? What network infrastructure do you put around it? What storage do you attach to it? For example, don't run Ceilometer where your Nova controllers are running; that doesn't make any sense. Run Ceilometer off to the side. Ceilometer, after all, is a Linux service; you can change priorities on that service and have it run at lower priority than other, more important processes. Take advantage of your knowledge of operating systems; this is still running on an operating system in the end. Then, what's your archiving strategy? If you let that database grow, obviously you have more impact on your business. If your data is too big when you're querying for this month's billing, and you don't archive, you're going to have a performance problem. So measure what data you need today for your business operations, for billing and monitoring, and archive the rest; you don't need it in the same database. This becomes an input to a data warehouse, which is a different world. In OpenStack we talk about infrastructure as a service, but there's no reason this can't be connected to the other big topic in computing today, which is data warehousing and analytics.
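A minimal sketch of that archiving split, assuming you have pulled samples out of the Ceilometer store as dicts with a `timestamp` field (the field name and the retention window are my assumptions for illustration):

```python
from datetime import datetime, timedelta

def split_for_archive(samples, keep_days=90, now=None):
    """Partition samples into (hot, cold): recent ones stay in the fast billing
    database, older ones go off to cheap archive or warehouse storage."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=keep_days)
    hot = [s for s in samples if s["timestamp"] >= cutoff]
    cold = [s for s in samples if s["timestamp"] < cutoff]
    return hot, cold

now = datetime(2014, 5, 1)
samples = [{"timestamp": datetime(2014, 4, 20)},  # recent: keep it hot
           {"timestamp": datetime(2013, 6, 1)}]   # old: archive it
hot, cold = split_for_archive(samples, keep_days=90, now=now)
print(len(hot), len(cold))  # -> 1 1
```

The choice of `keep_days` is exactly the "measure what you need today" decision: keep just enough for billing and monitoring queries to stay fast.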
Allow for portability. Allow for Ceilometer to be moved to more expensive hardware: if Ceilometer is working for you and you want to take advantage of it and spend a little more money, you take it to better, more scalable hardware. Always allow for that portability. And plan for long-term storage. You might be using more expensive storage as a backup today, but in five years you might have terabytes and terabytes of data. Think about cheaper storage where you can keep it and bring it in only when the analytics tools require it. With that, I want to take you through the reference architecture. Think of Ceilometer the same way: you have notifications and API polling coming in, and you have SQL or NoSQL storage; that's your database here. Then, through archiving, the archiving architecture is what brings all the data into, we suggest, for example, a Hadoop cluster, where the HDFS file system lets you store massive amounts of data in parallel. You can then use analytics tools like Mahout, GATE, or Revolution Analytics, to mention some of the popular open source ones out there that Persistent is actually using and suggesting to our clients. Or you can go to the big guys, like Teradata or Netezza, which is actually an IBM product now. Then think about those analytics tools feeding into a more structured SQL database; it could be any relational database you can think of. And think of the other end: now you have a director in your company, or a VP, or any other executive, getting notifications and getting insight about the cloud.
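As one hedged example of the hand-off into that Hadoop layer, you could flatten archived samples into CSV files and bulk-load those into HDFS. The field names below are my assumptions for illustration, not the exact Ceilometer schema:

```python
import csv
import io

FIELDS = ["resource_id", "meter", "timestamp", "volume"]  # assumed flat schema

def samples_to_csv(samples):
    """Serialize sample dicts to CSV text, ready to copy into HDFS in bulk."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(samples)
    return buf.getvalue()

csv_text = samples_to_csv([
    {"resource_id": "vm-1", "meter": "cpu_util",
     "timestamp": "2014-05-01T09:00:00Z", "volume": 42.0},
])
print(csv_text.splitlines()[0])  # -> resource_id,meter,timestamp,volume
```

Flat files like this are deliberately dumb: the analytics tools downstream (Mahout jobs, R scripts) can each reread them without touching the live Ceilometer database.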
How is it working, right? And you have the director making what-if queries: what if I had more storage, how would this behave? What if my network cards were 10 gigabit instead of one gigabit? Ask that question and see what you get, right? So a couple of things happen here. You're offloading, like I said, your data to a more suitable environment built for big data. And you're adding intelligence to your organization daily. Now, again, don't expect the intelligence to be of the highest value on day two, because some of these analytics tools need patterns over time to recognize that something is going on and let you know. It only gets better after that; the intelligence you get will be better and better. And like I said, if you keep the same architecture but keep switching the brain, the analytics tool here, to more advanced, let's say cognitive, systems (you've heard about Watson and what it's doing at IBM; similar things will come from other companies), then what happens is that you're not only using the data in your data center, you're using the data of the world together with the data in your data center. This is how humans really think, right? We don't just make logic out of the information coming in; we use our knowledge plus the new information to make decisions and take steps. So with that, hopefully we've convinced you to turn on the button. I would like to hear from you why, or why not, the button gets pushed.
And you can go to the regular OpenStack mailing list or the development mailing list; make sure you put [ceilometer] in the subject so we can catch that traffic as Ceilometer questions, and then either we or the community will address them. So thank you, and we'll let you go to questions now. Here's our information if you want to contact us. Any questions? Go to the microphone, please. Yes. Real quick: you mentioned one of the analytics tools that you recommended; could you repeat that one? The open source analytics tool? Absolutely. You have Mahout. Mahout? M-A-H-O-U-T. Mahout, GATE, and Revolution Analytics; Revolution is an offering built around R. Yes, R. And you also mentioned auditing tools; do you have a recommendation for auditing tools that integrate well with Ceilometer? Well, no. What I mentioned for Ceilometer is the standard that's being used; I can't recommend auditing tools, that's not what I've been looking at. But there's a standard called CADF. IBM actually came up with that too, and it's now supported through the standards body. It's a way to annotate your events so you can know the who, how, and when of something that happened. That data alone could then be fed into an auditing tool, and you can easily make determinations: this person did this at that time, right? So that's what we have so far in Ceilometer; I haven't seen any specific auditing tool. Thanks for your questions. So how much bandwidth does Ceilometer use, and how much storage do you need? Yeah, that's a good question. I've heard about terabytes in a week, so that's quite a bit. And yes, it's going to put a load on your bandwidth. I don't have an exact number; we haven't done that performance analysis.
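To give a feel for what CADF-style annotation buys you, here is a hypothetical audit record in that spirit. The point is simply that every event carries the who, what, and when in one uniform shape; the field values here are invented, so consult the CADF specification for the real schema:

```python
# Hypothetical CADF-style audit record; real events follow the DMTF schema.
event = {
    "typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
    "action": "create",                   # what happened
    "initiator": {"id": "user-42"},       # who did it
    "target": {"id": "server-7"},         # what it was done to
    "eventTime": "2013-11-07T09:00:00Z",  # when, in UTC
    "outcome": "success",
}

# An auditor's "who did what, and when" question becomes a plain lookup:
print(event["initiator"]["id"], event["action"], event["eventTime"])
```

Because every service annotates events the same way, an external auditing tool only has to understand one record shape.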
Again, we're here to promote using Ceilometer so that you can use analytics tools; that's one of the things our company does, right? We work with customers to get analytics going for their enterprise. So what I'd like to do is invite you, the audience, to actually help us with that performance work. The more data we get on performance and bandwidth, the better we can feed back as users to the Ceilometer team, get that conversation going with new blueprints, and even try to contribute, right? Is there configurability in Ceilometer; can you turn it down? Yeah, you mean dial it up or down? That's probably what you're asking, right? Well, you can, by disabling functionality, right? Oh, absolutely. Yes, absolutely, you can tweak that. The sampling interval is whatever you want, right? That's right. Yep. Thank you. I have a question. Yes. My question is: how do you correlate the metrics or metering data from the various services? If you have multiple Nova compute instances, everything could be running with different time zones or different clocks. How do you correlate all this data? So, go ahead. I think it's all UTC. It's all UTC now, so everything is on the same time. So you sync it to a clock? Yeah. Okay. Right. So you can correlate based on time. But Ceilometer also does a pretty good job of tagging every event; if you look at an event's description, you'll notice it has IDs, it has a timestamp, it has what it comes from and where it's going. It's pretty standard. I like what they came up with on the event format to let you correlate things. Does that help you? Okay.
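Since everything is UTC-stamped, correlating across services reduces to grouping and sorting. This is an illustrative sketch over event dicts; the `resource_id` and `timestamp` field names match the spirit of Ceilometer's event format, but treat the exact shape as an assumption:

```python
from collections import defaultdict

def correlate(events):
    """Group events from any service by resource and order each group by its
    UTC timestamp, producing one timeline per resource."""
    timeline = defaultdict(list)
    for e in events:
        timeline[e["resource_id"]].append(e)
    for evs in timeline.values():
        evs.sort(key=lambda e: e["timestamp"])  # ISO-8601 UTC sorts lexically
    return dict(timeline)

events = [
    {"resource_id": "vm-1", "timestamp": "2014-05-01T10:05:00Z", "source": "cinder"},
    {"resource_id": "vm-1", "timestamp": "2014-05-01T10:00:00Z", "source": "nova"},
]
print([e["source"] for e in correlate(events)["vm-1"]])  # -> ['nova', 'cinder']
```

The single shared clock is what makes this trivial; with per-service local times, you would first have to normalize every timestamp to UTC.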
So I invite you to go to the Ceilometer page and look at the actual format of an event. Okay. Hi. I'm looking at the slide, and I'm wondering: is this to exemplify the extension of capabilities, or do you really think the analytics capabilities will live outside the APIs? Because that's how I'm used to consuming OpenStack services. Yeah, I think it's completely separate. We don't have to have the analytics component in OpenStack. Just the fact that there's a database where Ceilometer stores everything means you can grab from it and use it as an input to data warehouse environments. There's already plenty of technology out there for data warehousing. What I'm showing you here is an open source, somewhat more open architecture, but there are plenty of solutions where you take an input from a database, feed it into your data warehouse, and then run analytics on the warehouse. And things that cannot be done through the API. You mean the OpenStack API? Yeah. No, I wouldn't recommend doing analytics through the OpenStack API; it would load OpenStack far too much. You need to offload that data and do the analysis completely offline. No, what I mean is getting the data into the analytics tools you show through the Ceilometer API. Yeah, so that goes to my point about your data store getting too big. Yes, you could do that if you let all your data sit in the store; for example, the support for HBase in Ceilometer gives you the opportunity to spread your data even in the Ceilometer context, so you're not in a data archiving scenario at that point. If you want to go that way, and you're willing to pay the penalty of keeping all that data without the indexes you need, then sure, you can use the APIs.
But our recommendation, based on our experience, is: use the super-fast database you need for your business today, so you can query billing for the month, and at the same time offload the data to a warehouse where tools can run without affecting OpenStack operations at all, right? Thank you. Good question, sir. Two questions. Absolutely. Was this a greenfield implementation, or did you replace another tool with this? Yeah, this is a forward-looking reference architecture; it's not something we've implemented as such. We have analytics tooling that follows this reference architecture, okay? Okay, so in the rest of your enterprise, is there another tool they're using, and if so, do you integrate with it? No. We recommend this reference architecture without the top, without the Ceilometer part, to our clients, and we have implemented solutions like that with them. Now, the argument I'm making is that there's no reason this implementation has to be used just for your CRM data or your enterprise data. If you're using the cloud as a way to automate your infrastructure, you can also feed the cloud infrastructure data into this analytics machine, if you will, right? Okay. Thank you. You're welcome. All right. If anything comes up, you can ask any other questions through email. Thank you for coming. Thank you so much.