So my job mostly entails building monitoring systems: metering, keeping track of what's used, doing things with that data, and making sure it gets stored and doesn't go away. My major project for the last six months has been applying the same to Ceilometer, which is OpenStack's metering subsystem. This talk is a brief overview in three parts: Ceilometer itself, how it's built, what the different bits do, and how it's traditionally deployed, with an overview of the different storage backends available and their advantages and disadvantages; then what I was dealing with at Anchor in terms of requirements and idiosyncrasies, and how we worked around those within the Ceilometer model; and finally a very brief look at future steps in Ceilometer's development and the introduction of Gnocchi, often referred to as a "time series database as a service", although that's not quite accurate, a project by Julien Danjou, the former Ceilometer PTL.

So, metering: why? Ultimately we're optimizing for one thing, which is pretty obvious, but that thing has several independent sub-goals. We need to keep track of resource usage in order to bill for it, and we also need to track resource usage for capacity planning and for identifying congestion, hotspots, and that kind of thing.

Continuing on with the Ceilometer architecture: Ceilometer is made up of a wide variety of components, and in typical OpenStack fashion everything is designed to be pluggable. It's very Java-ish Python; I'm sure you know the coding style. On one end we have the Ceilometer agents, which are the bits that query libvirt or the Cinder API or that kind of thing and report the results upstream. These results are passed to, or invoke, a variety of publishers, which are pluggable message transports; they take the messages, run them through potentially a variety of transforms and some aggregation, and deliver them through a queueing service to the collectors, which aggregate the results into a form that can be passed to the storage backend. In the majority of OpenStack deployments, as has been mentioned, the queueing system in question is RabbitMQ. I haven't come across anyone using Qpid or ZeroMQ in production, but most of this should be applicable if you are doing such a thing; let me know how it turns out.

In terms of what Ceilometer actually passes around: substantially large JSON blobs, as you can see. The thing you'll note about the format is that it's basically schema-less, although it is largely documented. There are bits of repeated data and a lot of redundancy, and one of these messages, this one is 1.3 kilobytes, is sent with every measurement that Ceilometer collects. So every sample of CPU seconds or disk space comes with something this size, which has caused some problems, as you'll see. There's an abridged example below.

On the storage side, the backend envisaged at Ceilometer's conception is MongoDB, and it's still the officially recommended Ceilometer backend. It's easy to see why: Mongo as a backend and this message format were developed in unison, and Mongo is uniquely well suited to that kind of schema-less JSON data. That choice has, again, caused some problems. There is also a quite stable and reasonably well-supported SQLAlchemy backend, as, again, has previously been mentioned.
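To make that concrete, here is an abridged, representative sample message, shown as a Python dict for illustration. The field names follow Ceilometer's documented sample format, but the IDs and metadata values here are invented, and a real message carries considerably more resource_metadata, which is how it gets to 1.3 kilobytes:

```python
# An abridged Ceilometer-style sample; IDs and metadata are hypothetical.
sample = {
    "source": "openstack",
    "counter_name": "cpu",               # what I'd call the metric name
    "counter_type": "cumulative",
    "counter_unit": "ns",
    "counter_volume": 49321610000000,
    "user_id": "7e3f9f4a2c6b4d1e8a5b0c9d3e2f1a0b",
    "project_id": "2c1d0b9a8e7f6d5c4b3a291817161514",
    "resource_id": "instance-000003e8",
    "timestamp": "2014-11-05T10:00:00Z",
    "resource_metadata": {
        # Repeated in full with every single measurement: display name,
        # flavor details, image references, host, and so on.
        "display_name": "web01",
        "instance_type": "m1.small",
    },
    "message_id": "5ec6f8b2-64b5-11e4-9803-0800200c9a66",
    "message_signature": "d41d8cd98f00b204e9800998ecf8427e",
}
```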
MySQL is substantially better tested than Postgres there. I've tried the MySQL backend: it's pretty good, it's stable. It's not great for our purposes, but still a better choice than Postgres at this point, purely due to community support; that's the conclusion we reached. Then there are a couple of less-used backends. One is HBase, and that's not OpenTSDB on HBase, that's HBase itself from the Apache project, so a general Hadoop-infrastructure pipeline. The other one, which I was unable to locate a logo for, is IBM DB2. I'm sure IBM uses it effectively, but we like solutions that we can fix if they break, and given that it's proprietary, it wasn't really an option for us.

So MongoDB seemed like a reasonable choice when we first glanced at it. Then we started thinking about consistency and the Ceilometer event model. Ceilometer messages come in two basic types: events and samples. As the name suggests, events are generally emitted once, one event per type of event per resource, while samples are polled periodically. And Mongo has consistency issues, to varying degrees. The CAP theorem states that consistency, availability, and partition tolerance cannot all be simultaneously exhibited by a single networked data store. MongoDB isn't really either a CP or an AP data store; split brains and data inconsistency have been problems whenever we've tried to use it at large scale. The thought of potentially losing, say, a volume.delete event for a one-terabyte SSD volume that a customer deleted eight months ago, and therefore billing them up to the present, was not one that filled us with joy.

In terms of what we're actually doing, and why none of the default solutions really worked for us: we put the Ceilometer data to use for monitoring, for obvious reasons; for capacity planning, where we'd like to take a time series of resource usage and project where it's going to be in the future, because that's how we buy kit; for metering in the billing sense, using the data to generate invoices for customers; and, a requirement that perhaps few besides the larger OpenStack installations share, for exploratory analytics, playing around with different techniques of anomaly detection and predictive modelling to see what works. This last one adds some fun new requirements with respect to historical data. The common solution to storing a large time series of metrics over many years, RRDtool probably wasn't the first but is probably the best known, is to have configurable resolutions per time period: you might store your data for the last month sampled every minute, and for the last year sampled every day, or something like that. The point is that you accept decreased resolution over time, while also being unable to recover the original data; there's a toy sketch of that trade-off below.

So, the general principles we tried to follow when coming up with our metering solution for OpenStack. First, disproportionate resource usage should lead to disproportionate cost. What I mean by that is that we would like to correctly align incentives between us, the infrastructure-as-a-service provider, and our customers, who write the applications that run on these services. For instance, we would like customers to be paying us more if they're using more of our hypervisors' available CPU seconds. So we'd actually like to be billing for CPU seconds and disk IOPS, rather than just for what's been provisioned.
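Going back to the fixed-resolution storage model for a moment, here's a toy sketch of the idea, in the spirit of RRDtool rather than its actual implementation. The archive sizes are arbitrary; the point is that once the minute-level samples are rolled up, the original detail is gone for good:

```python
from collections import deque

# Each archive holds a fixed number of buckets at a fixed resolution.
minute_archive = deque(maxlen=60 * 24 * 30)  # ~1 month at 1-minute steps
daily_archive = deque(maxlen=365)            # ~1 year at 1-day steps

def add_minute_sample(value):
    minute_archive.append(value)

def roll_up_day(minute_samples):
    # 1440 minute samples are replaced by a single daily mean; this is
    # exactly the premature aggregation discussed below.
    daily_archive.append(sum(minute_samples) / len(minute_samples))
```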
That's also so we can give customers more flexibility in allocating large amounts of resources for potentially little use, if they need burstable capacity or that kind of thing. For similar reasons, if it can be contended, then it needs to be tracked: disks, network, CPU, anything. In terms of the planning horizon for metric storage, this is probably one of the more controversial items on the list, but given how cheap disks are these days and the requirements we have for high-detail historical data, we figure we can probably get away with just planning not to throw data away until we know we don't need it; so a planning horizon of at least a few years, that kind of thing. And avoid premature aggregation: again, don't throw away your data until you know that you don't need it. That is the general principle here, particularly as we are working with a statistician to come up with models for this kind of thing. It's very hard to develop a quantitative model from aggregated data, particularly if you don't know how it's been aggregated and can't control how it's been aggregated. So until we come up with a firm list of requirements from exploratory analysis, one we can actually build a model with and know what we can safely aggregate down to, there's not much point in applying premature aggregation, RRD-style.

You'll probably notice a recurring trend in our project naming: there are a lot of dead French intellectuals. I'm not really sure how that started. Anyway, Vaultaire is a time series database that we started writing last year to address some of these requirements. It's written in Haskell and backed by Ceph. It's a CP data store, so it will reject writes if there aren't enough available nodes, rather than trying to accept the write anyway and being eventually consistent, as opposed to, say, Cassandra or Dynamo. I won't say too much about Vaultaire because there is a talk on it later in the week, but it should be enough to say that it's a time series data store optimized for write performance, backed by Ceph.

With respect to how you actually get data out of Ceilometer, if you're not intending to fit within the standard Ceilometer data model, there are quite a few things you can do here; some are more stupid than others. We've got the publishers: as mentioned, they accept data from the Ceilometer agents and move it along down the pipeline. We've got the collectors, which read from said pipeline, aggregate, and send to the storage abstraction layer. Those two are quite well documented and designed to be derived from, so you can pretty easily write new publishers and new collectors. There are also the storage backends, which are the kind of thing Ceilometer supports with SQLAlchemy and Mongo. The reason we didn't pick that option is that a storage backend needs to support both Ceilometer's write operations and Ceilometer's query operations, and we don't have a huge amount of use for Ceilometer's query API; rather than writing a half-baked backend, we decided to get the data out of Ceilometer as early as possible instead. The other little arrow there is the dispatchers. I'm not actually sure of the status of that one: the dispatcher is the bit which accepts data from the collector and passes it along to the storage backend, but it seems to be undocumented, and I'm not sure if it's going to go away in the future or not, so I thought it would be a bad idea to try to plug anything in there. So we ultimately decided to write a publisher, which accepts data from the Ceilometer agents and sends it back into RabbitMQ; there's a minimal sketch of the idea below.
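For the curious, a minimal sketch of what such a publisher looks like. The PublisherBase interface and publish_samples signature shown here are roughly the shape of the plugin API in Ceilometer releases of that era, but the exact signatures varied between releases, and the broker URL and queue name are invented, so treat this as illustrative rather than our production code:

```python
import json

import kombu  # a generic AMQP client, standing in for whatever you prefer

from ceilometer import publisher


class RabbitRepublisher(publisher.PublisherBase):
    """Forward every sample, untouched, to a dedicated queue."""

    def __init__(self, parsed_url):
        self.conn = kombu.Connection("amqp://telemetry.example.com//")
        self.queue = self.conn.SimpleQueue("vaultaire-ingest")

    def publish_samples(self, context, samples):
        for s in samples:
            # A separate daemon drains this queue, converts each sample
            # to Vaultaire's form, and spools it for asynchronous writing.
            self.queue.put(json.dumps(s.as_dict()))

    def publish_events(self, context, events):
        # Events would go the same way; omitted here for brevity.
        pass
```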
From there, we have a little daemon running on our telemetry nodes which accepts the Ceilometer messages, parses them into a Vaultaire-compatible form, and spools them for asynchronous writing.

As for the Vaultaire data model and how you get data back out again: if, on the off chance, you happen to be a Haskell programmer, this may make sense to you. If not, it's just a list comprehension; that's all it is. It's just building a list of metrics from a list of criteria. In this case, it's taking an instance ID and pulling back everything that Vaultaire is storing which has that instance ID, along with the display name and the counter name, which is what Ceilometer calls what I would call a metric name, so say disk.write.bytes or something like that. If you are not a Haskell programmer, you could also use the JSON APIs, of which there are a couple. We have a generic JSON API, which just lets you query for arbitrary metrics and get your data back in a standard form, timestamp and point. And we also have a metering API, which is designed specifically for generating invoices for customers, and also for building customer dashboards and the like, with which customers can keep tabs on their resource usage as it grows. Output from that looks like this, Borel being another of those French intellectuals, the one who invented measure theory. Here we've got the associated customer IDs, quantities, units of measurement, the resource name, and the resource ID it's attached to. What we can do with this is generate basically arbitrarily detailed pretty graphs of how customers are using their resources, complete with host names and IP addresses and that kind of thing. That addresses a lot of the things I personally have had a decent amount of pain with in the past, back when I was doing sysadmin work, with AWS's billing system in particular. Not to single them out; that's just the one I'm most familiar with. If you've seen an AWS invoice, and I'm sure many of you have, it looks like a series of lines which consist of an instance type, say m1.xlarge, then a number of hours, and then a single number, and that's all the detail you can really get out of Amazon without jumping through a frankly vast number of hoops. So yeah, we want to do something a little differently here.

In terms of how this is progressing in the future, I should first go through a few of the problems the existing Ceilometer data model has had. With the message size as you saw, storing these is quite expensive in MongoDB, in that indexing is very, very hard on the CPUs, and query performance scales approximately linearly with the amount of data you have in the system. Likewise with the MySQL backend: at the scale we're intending to store this kind of data, we would be ending up with many, many terabytes in an SQL server very quickly, which is generally a situation you want to avoid unless you have a good reason, and for this kind of data we really didn't. At the core of this is the integral link within Ceilometer between message values and message metadata. At a conceptual level you have a resource, like an instance, which has a set of metrics: disk writes, memory used, whatever. You really don't need to store all of that with every message. You can store the metadata once with its own unique identifier, and then just store your points against that identifier, which ends up being vastly less complicated; there's a minimal sketch of that split below.
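To make that split concrete, here's a minimal sketch, not Gnocchi's or Vaultaire's actual code: metadata is stored once under a stable identifier, and each measurement then becomes a small (identifier, timestamp, value) triple instead of a 1.3-kilobyte blob:

```python
import hashlib
import json

metadata_index = {}  # series_id -> metadata; written rarely
points = []          # (series_id, timestamp, value); written constantly

def record(metadata, timestamp, value):
    # Derive a stable series identifier from the metadata itself, so
    # identical metadata always maps to the same series.
    series_id = hashlib.sha256(
        json.dumps(metadata, sort_keys=True).encode()).hexdigest()
    metadata_index.setdefault(series_id, metadata)
    points.append((series_id, timestamp, value))

record({"resource_id": "instance-000003e8", "counter_name": "cpu"},
       "2014-11-05T10:00:00Z", 49321610000000)
```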
So this is one of the things that Gnocchi implements, as you can see. It passes its data along two different paths: the values themselves, along with their timestamps, go to the storage driver, of which there are a small number at the moment, and the metadata is passed to an indexer, which I believe currently defaults to an SQLite database, though I'm sure there will be many changes there in the future. This is also similar to Vaultaire's current data model.

As for the backends Gnocchi currently supports, I should clarify that Gnocchi is quite far from production at the moment. The intention was to get it into Juno in a kind of preview form; that didn't happen. I'm very much hoping the same can be done for Kilo, but I haven't been as involved as I would like to be in Gnocchi's development, so I couldn't give you an answer there. Of the currently supported backends, the only mature one is called Carbonara. It was written by Julien Danjou in pure Python, based on the Pandas time series library, with a custom serialization format, and it's about 500 lines. Its performance at the moment isn't hugely impressive, but its scaling is at least much better than the default Ceilometer storage model: it does appear to be approaching constant-time write and query complexity, so O(1) on write and O(1) on read, once you have the address of a given time series. There has also been interest in implementing backends for a couple of the more popular time series data stores: InfluxDB, though I'm not sure if that's just expressed interest or whether they actually have a code base in progress, and also OpenTSDB. And we're planning to develop a Vaultaire backend, probably at some point this year, depending on how Gnocchi development itself progresses.

There have been a few preliminary benchmarks, from Mirantis I believe. I wasn't able to determine the licensing of the graphs they generated, so I wasn't able to include them, but you can look them up yourself, or ask me to point you towards them later. They do demonstrate a pretty marked performance improvement, and the query model at least will support much more sane scaling properties in the future.

So, yeah, that's pretty much it. All of the code that we've written for this, being the publisher which gets the stuff from Ceilometer into Rabbit and the daemon which writes it to Vaultaire, is available on GitHub, in addition to the query and metering APIs; and Gnocchi itself is under the Stackforge GitHub account. Yeah, any questions?