Okay, Stephen Simpson, who will talk to us about infrastructure monitoring. Excellent. Thank you very much. So a lot of you probably don't know me, as has been pointed out in the last couple of days. I haven't been particularly active in the Postgres community, but I'm looking to change that. This talk is going to be a bit of a mix of everything. It's not really going to delve into the details of Postgres very much, or how we build something on top of Postgres, so I think it should appeal to people at most levels with Postgres. So we're going to talk about monitoring. I'll give you a quick bit of background on specifically what I'm trying to do, then we'll go a bit in depth on how to use Postgres for metrics, and there's another section on using Postgres for a few other things that you might not ordinarily think to use it for. So I'm actually a software developer, not a DBA, so I apologise if my SQL and my procedures are a bit shaky. I'll do my best. I primarily do C and C++ and Python, that kind of thing. I've also dabbled with Perl in the past, but kind of learnt better after a while. Usually systems-level kind of things. I'm based in Bristol, in the UK, and I don't think I've met anyone else from Bristol here. It's primarily famous for this bridge, which is quite pretty, though it's probably more famous for its cider. If you ever go to Bristol, you'll see in pretty much every pub more varieties of cider than you can ever imagine. I don't think there are many other places in the world like that. So I've done a few bizarre things. I've worked for a couple of startups, one building 10 gigabit Ethernet switches, and then thought maybe I'd do something completely different. I worked for a database vendor for quite a long time, on a product based on PostgreSQL. So I spent about five years working with PostgreSQL, but not on PostgreSQL specifically. And that database was geared towards big data analytics, that sort of thing.
All the buzzwords you can think of, we had them. I now work for a company called StackHPC, and we're a consultancy for HPC on OpenStack. We primarily deal with complex infrastructure, and that's where this talk came from, because you need to monitor it. I'll give you a quick background on OpenStack, because there is a booth over in Building K with some guys you can go and talk to about it if you like. The buzzword title for it is Cloud Orchestration Platform, but what it really is, it's a bit like AWS, but you can do it yourself. That's probably the shortest way to put it. I can talk to you more about that later if you want to come and find me. The thing with OpenStack is it's quite complicated. You might think of it as a complex distributed application to run your complex distributed applications. Opinions on it are quite subjective, I think; you hear differing ones. It's a very useful tool and it's gained quite a lot of traction in the past few years. So operational visibility of this is critical, because there's a lot of things to go wrong and you need to make sure that you notice when it does go wrong. So, monitoring. In the context we're thinking of, monitoring is this kind of thing. You've got a load of servers, you've got a website, you've got a database like PostgreSQL. By the way, anything in this talk you could apply to using PostgreSQL to monitor PostgreSQL. That's fine; in fact, I think it should be encouraged. It could be a cluster of systems, could be a disk drive, anything. And we draw these graphs and we make alerts, and it emails us when it all goes horribly wrong. Generally this tends to be quite different for everyone. Everyone has their own idea of what they want from a system. A few things that we think are quite important: fault finding and alerting, so you're told when something breaks, when something goes down, when you need to replace a disk. Fault post-mortem and pre-emption.
So once something's gone wrong, you want to prevent it from happening again, and ideally you want as much information as you possibly can about everything that was going on around the fault, so you can work out what caused it and make sure it doesn't happen next time. You can also use monitoring to measure utilisation and efficiency of your infrastructure. It's amazing how many people with a huge cluster of servers, having spent tens of millions on them, quite often don't really know how much it's being used. It might actually only be used a few days a week, or a few hours a day, and it's just sat there wasting electricity the rest of the time. As soon as you graph these things it becomes immediately clear. You can even take it a step further and use some of these techniques for performance monitoring and profiling. How fast are my database requests? How fast are my web requests? A good example of this in Postgres would be statement logging, so you could actually monitor the latency of all your database requests. You can take this a little further into the realms of application profiling as well and start tracking things like HTTP requests. In OpenStack specifically, every OpenStack API request is tagged with a unique request identifier, so you can actually trace all of the HTTP requests around the system. In a different direction, you can also use it for auditing and security, especially things like tracking log files, network requests, SSH logins, things like that. Those are all things that we want to do. And at an even higher level than that, decision making: planning what your next system is going to look like based on how much your old system was used, how well it worked or how well it didn't. It's the manager's dashboard, dollars per hour, that kind of thing.
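As an aside, the statement logging mentioned here is a standard Postgres facility. A minimal way to get per-statement latency into your logs (a threshold of 0 logs every statement; a value like 250 would log only slow ones):

```sql
-- Log the duration of every completed statement, in milliseconds.
ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();
```

You'd normally set a non-zero threshold in production to keep the log volume sane.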
So let me do a little bit of background on existing tools you might use for this sort of thing. The first thing might be worth pointing out: I mostly work with open source software. I don't tend to buy a lot of proprietary software, so I'm sure there are plenty of proprietary monitoring tools which I don't know about and won't mention in this talk. Sorry if one of them happens to be your favourite. But in the open source world, one category of tools you tend to find are ones that do checking for you. They'll ping machines, make sure they're up, make sure that your web server is servicing requests. These systems usually give you a dashboard, and they'll quite often store a history of the events as well. A classic example of this is Nagios, or Icinga, or someone told me the other day there's another fork of it as well. They're a whole category of things. Quite a lot of people tend to roll their own as well; you see plenty of Bash scripts like this. And sometimes that's all you need; at least you know it works. The issue with these kinds of systems is that they'll tell you when something goes wrong, but not necessarily why it's gone wrong. What you really want to know is all the information about what happened around that event. So post-mortem analysis is usually the only option: digging around. The system was in some state at the time it went wrong, and it died. Unfortunately it's now restarted, it's in a fresh new state, and you've got to trace back to try and work out what happened. So, something else that's gaining popularity at the moment, you're probably aware of this: Kibana, Elasticsearch and Logstash. Elastic have done a great job of advertising this as the ELK stack. It's a way of centralising all your log files together, which is hugely useful.
This seems to be what people like using at the moment, but for a long time rsyslog, or the systems before it, were perfectly capable of shipping all of your logs to a central server for you to grep however you like. This one's quite useful because it lets you do full-text searching on your log files and see where your connections happen and how often your log messages are firing. So if you see a big peak in errors at some point in the middle of the night, then maybe your batch job's gone crazy, or something like that. The Logstash component of it is useful for transforming your data. The classic example is web requests: you've got the IP address, you use a Logstash plugin to determine the geolocation of that request based on the IP, and then people draw pretty graphs of the world and find most of their users are from North America, because most people are in North America. So the next set of systems are these metrics-based systems. We collect metrics like CPU percentage or disk space over time, and what these let us do is get some insight into what happened before your system fell over. The system falls over, and just before it fell over, disk space was rising: that's a good sign of what might have gone wrong there. Or memory rising slowly, things like that. These are becoming very popular. Very, very, very popular, as you'll see in a minute. The tool for this, in at least the HPC world and a lot of other worlds, has been Ganglia, and it gives you these nice little graphs. It usually comprises a collector component that runs on all your servers, with a database and an aggregator which gives you a front-end and pulls all the data together, so you get graphs like CPU usage over your entire cluster, things like that.
This one's actually Wikipedia's, so you can go there and look at all of Wikimedia's server stats, which I had great fun with the other day, just digging around, seeing what's going on there. Although, I wrote this slide quite a few weeks ago, and when I checked it yesterday a big banner comes across the top saying Ganglia is deprecated, so make of that what you will. And the thing they link you to is actually the next thing on my slide, and that's Grafana. This is, I guess you could say, the hipster's tool of choice. It's actually a really good tool. I like it a lot. It's very Web 2.0, or whatever we're up to now. Very clicky and draggy, and it's quite nice to use. I don't need to rag on it too much, but it does look very hipstery. We won't let that put us off. It draws gorgeous graphs for you: CPU usage, queue depth, whatever you like. But this is only a front end. This doesn't actually store any data for you, and it doesn't collect any data for you. It only draws graphs. It connects to a service which gives the numbers back to it. And this is the interesting bit, isn't it? There's a lot of choice of what you can use for that. I've listed, I think, about 20 time series metrics databases there, and there's more I haven't listed. They're not insignificant projects either. Notice at the bottom there's one that actually uses Postgres to store its data, which I thought was worth mentioning. That kind of makes the rest of the talk slightly invalid, but we'll carry on anyway, seeing as I've already written it. So the interesting thing about all of these databases is they're not insignificant projects. A lot of them have backing from quite well-known or quite significant companies. Some of them are built by people who work for companies.
Some of them are built by companies with investment funding, or they're trying to make startup companies. They've got money and funding behind them. So I thought, well, when did all this start happening? I'm sure some of you might see where this is going. So I drew this chart of when all these databases started appearing in the world. And mind you, these are only the open source ones, because that's primarily the field I'm in. So, 2000: Ganglia's been around a long time. It's been able to draw your graphs for a long time, originally started in 2000 at the University of California, I think. Then Graphite arrived in 2010, which is a bit of a replacement for the internals of Ganglia, because some people thought RRDtool was getting a bit long in the tooth, or maybe they just wanted something fun to do. And then in the 2013 to 2015 timeframe, just an explosion of projects doing this time series monitoring kind of stuff. It's interesting: when we think about NoSQL, we think about the document stores and things like that, but it seems like time series has also been quite a big component of this. It became trendy again. There's even, I think, a new one that was announced just last year, in 2016, the one at the bottom there. So now I'm going to talk about the system we're primarily working on. This system is geared around OpenStack, but it encompasses what a lot of people are building for their monitoring infrastructure now. They want the alerting, they want metrics, they want logs. This is all information we want. We want it central, we want it pretty. So we start off with our servers, our software, storage, network, all of the bits that we want to monitor. And we've got all these metrics and all these logs coming out of all of this. We need to store it and present it in some way. The project we're working on in the OpenStack world is called Monasca.
It's a set of APIs for letting you submit metrics and access metrics. It integrates with OpenStack and all the multi-tenancy that OpenStack gives you, so each tenant has its own set of metrics, its own set of logs, and all this kind of stuff. It's actually quite a nice system, although, as you'll find out, there's a lot to it. So the first bit of it: it has a MySQL database to store configuration and alerts and things like that. Like most of OpenStack, there's a database behind a lot of the components. It then has a time series database of some sort for the metrics. The common one, the popular one at the moment, is InfluxDB. It also supports a few other ones: they tried supporting Cassandra, and there's a Vertica backend, things like that. Then that feeds Grafana, which has its own internal SQLite database that stores dashboards and things like that. The logs are all stored in Elasticsearch via Logstash, and Kibana sits at the front as the UI for that. Then someone put Kafka in the middle, because we need to handle peak loads, so we need a queue for that. So everything goes to Kafka, goes through Logstash, back into Kafka, into Elasticsearch, into Grafana, into InfluxDB. And of course, because you've got Kafka and Elasticsearch in there, you need ZooKeeper as well, and you have three nodes for that. So you can see where I'm going with this. That's six persistent data stores to monitor your infrastructure. This is quite a common system. A lot of people are doing this outside of OpenStack; the Monasca project really encompasses what people want to use at the moment. So, having this many databases, each with their own HA protocols, each with their own quirks and persistence layers, each of which you have to back up if you're doing it properly: it becomes a lot of overhead. It's a commendable right-tool-for-the-job attitude, but why not Postgres? Postgres can store data.
If we just used Postgres, we'd have fewer points of failure, fewer places to back up, fewer redundancy protocols, a more consistent set of data semantics, and you could reuse your existing knowledge of Postgres as well. So this seems like a good idea. If we look at these components again: well, as it turns out, the Monasca team have already ported this to Postgres, so you can already use Postgres as a backend for it. As with a lot of OpenStack, you can use Postgres with it. Grafana actually also supports Postgres, which is nice, and MySQL, but we don't care about that. Why not replace the time series database with Postgres? Postgres can do time series. It's a relational database. It's not a new concept. Why not put the logs in there as well? We've got full-text search. We lose Kibana, because Kibana and Elasticsearch are quite tightly coupled, but Grafana is able to display logs quite nicely. If we don't have a huge system, there's a possibility we could just shove the data into Postgres as fast as we can, without doing any of the processing, and use that as our buffering mechanism. Then we don't need that: no Elasticsearch, no Kafka. And while we're there, why don't we use Postgres as the queuing mechanism and get rid of that as well? It's satisfying, isn't it? Some of that, you're probably thinking, is a bit of a push, and you're right. So I'm going to go a bit deeper now into the metrics part, because that seems to be what people are most interested in at the moment. How would we store our metrics in Postgres? The system we're building is quite modest. It's an 18-node cluster. We take 200-ish metrics every 30 seconds, so quite limited. We want a six-month history. The server we've currently been given to do this on has a terabyte of disk space, so that's all we've got. We want queries to be fast, because this is all user-facing, so you want to click and have the graph come up instantly. So we want queries to take less than 100 milliseconds.
So with the time series data, we have two categories of query. One is: get me all the measurements for this series. So, get me all the measurements for a particular host, something like that. The other category is: I want the average CPU load across all my hosts. So you end up averaging them all together, and then you get one line instead of twenty or eighty. I'll go into a bit more depth on that in a minute, so I'll move on. The other sort of category of queries you end up having to do is finding out what metrics you've got in your system, because these are all changing dynamically. Hosts are coming and going, new VMs are coming and going, networks are coming and going. It's all changing all the time. So you want to work out what you're actually monitoring, what's available. We want to be able to list metric names. Monasca has this concept of dimensions: things like host name, mount point, process name. And we want the values of those too, so we want to be able to say, list me all my host names. So the data and the queries for this are the next thing we're looking at. The data comes into this system in JSON format, because everything's JSON now, not XML. The structure is fairly simple: you get a timestamp, you get a name, you get the value, and you get a set of dimensions, a list of tags essentially, key-value pairs. So for a CPU percentage, you'd get a host name. And there's a value_meta field, which is an extra bit of data you can store but which never really gets processed in any way. So, fairly simple. You can stick this in Postgres. You're probably all thinking this is a terrible idea, but there's more to the talk. So we can just shove all the data in. We'll store the dimensions as JSONB, because that fits nicely and we want to access them quite fast. The value_meta we don't really care about; we just want to pipe it back out again, so JSON is a good fit for that.
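Putting those choices together, a naive single-table schema along the lines the talk describes might look like this (the table and column names are my own guesses, not the speaker's slides; the value and timestamp types are the ones recommended next):

```sql
-- One denormalised row per measurement, straight from the JSON.
CREATE TABLE measurements (
    time_stamp  timestamptz NOT NULL,
    name        text        NOT NULL,
    value       float8      NOT NULL,
    dimensions  jsonb       NOT NULL,  -- e.g. {"hostname": "dev01"}
    value_meta  json                   -- stored but never processed
);
```

JSONB for the dimensions (queried often), plain JSON for value_meta (piped straight back out).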
All of these systems, if you dig into the details, use double precision; none of them do anything other than double precision. So float8 is fine for all the values. And then the name and the timestamp. Please always use timestamptz, nothing else, ever. You'll thank me one day. So this is a query we might run to get a single series out of the system. I've got a function there that rounds a timestamp to the nearest number of seconds you pass it. It's not a Postgres built-in, but it's very easy to write; you can find examples of it all over the place. We often want to say: give me all the values between this particular time range for this series. We might also want to say: make sure it matches these dimensions. So we get that single series of CPU time for host dev01. Then we might want to get all the series: instead of that individual series, we want it for all the hosts. So again, we can dig into our dimensions field, grab the host name, and then group by the time window and the host name as well. And finally, we might want to do this combo query where we roll everything up into one big metric: CPU percentage across all my hosts, and I don't care about dimensions. And the metric name list: we could do that. It would work. A lot of you might be thinking that's going to take a while. Yeah, it's going to take a while. But you could. Dimension names are a bit more interesting. Postgres has a nice function to get all of the keys out of a blob of JSON for you. And likewise, if you want all the values, you can just dig into the JSON and pick out the dimensions. Fairly straightforward so far; not very complicated SQL. But it's not going to be very fast if we store it like that, so we do need to optimise it a little. If we stick with our denormalised schema, we could just put some indexes on it.
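The queries just described might look roughly like this against that naive table (a sketch; `time_round` is the hand-rolled rounding helper mentioned above, not a built-in, and the metric and host names are made up):

```sql
-- Round a timestamp down to the nearest N seconds.
CREATE FUNCTION time_round(ts timestamptz, seconds int) RETURNS timestamptz AS $$
    SELECT to_timestamp(floor(extract(epoch FROM ts) / seconds) * seconds);
$$ LANGUAGE sql IMMUTABLE;

-- Single series: CPU for one host, in 30-second buckets, over a time range.
SELECT time_round(time_stamp, 30) AS t, avg(value)
FROM measurements
WHERE name = 'cpu.percent'
  AND dimensions @> '{"hostname": "dev01"}'
  AND time_stamp BETWEEN now() - interval '1 hour' AND now()
GROUP BY t ORDER BY t;

-- One line per host: pull the host name out of the dimensions and group by it.
SELECT time_round(time_stamp, 30) AS t,
       dimensions->>'hostname' AS host, avg(value)
FROM measurements
WHERE name = 'cpu.percent'
GROUP BY t, host ORDER BY t, host;

-- Listing what's in the system:
SELECT DISTINCT name FROM measurements;                          -- metric names
SELECT DISTINCT jsonb_object_keys(dimensions) FROM measurements; -- dimension names
SELECT DISTINCT dimensions->>'hostname' FROM measurements;       -- dimension values
```

Dropping the dimensions filter from the first query gives the "combo" roll-up across all hosts.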
We can use a GIN index for the JSONB so we can pick out the host names really fast, and we can put a multi-column index on name and timestamp. So when we're looking for a particular time range on a particular name, Postgres does that very efficiently: it walks through the B-tree to find the right name, walks further through the B-tree to find the timestamp, and then just iterates all the way down. Really good structure. So, a lot of information here. I won't dwell on it too much, but the queries I decided to look at for the performance were some of the series queries of varying time window, over a small period of time, over an hour, over six hours and over 24 hours, and then the listing queries as well. The interesting thing is that this rather naive schema we've put together doesn't actually do too badly. We can pick out individual series really fast, but doing these queries over the big time windows, the six-hour and 24-hour ones, it really starts to grind a bit. I should mention that all of these tests are over one day of data, which is about 45 million rows in the table. I don't think I mentioned that earlier. So I've rescaled that chart so we can look at it a bit better. Some of the queries are on the edge of our 100 milliseconds, but a lot of them are actually within it. It's quite interesting. And unfortunately, those other queries, where we're picking out metric names and dimension names, as you can imagine, take a long time. SELECT DISTINCT over 45 million rows: yeah, it's not going to be fast, I'm afraid. If we zoom in on that a little, you can see we're well over our requirement of 100 milliseconds for all of those queries. And of course, you're all shouting at me that this is stupid, you should have two tables, and you're probably right. So the first thing we can do to improve this is normalise out the two tables.
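The normalised layout might be sketched like this (again my own DDL, matching the description rather than the speaker's slides): a metrics table holding each distinct name-plus-dimensions combination once, a measurements table referencing it by ID, and a view to hide the join.

```sql
-- One row per distinct metric (name + dimensions).
CREATE TABLE metrics (
    id          serial PRIMARY KEY,
    name        text  NOT NULL,
    dimensions  jsonb NOT NULL,
    UNIQUE (name, dimensions)
);

-- Measurements now just carry the metric's ID.
CREATE TABLE measurements (
    metric_id   int         NOT NULL REFERENCES metrics (id),
    time_stamp  timestamptz NOT NULL,
    value       float8      NOT NULL,
    value_meta  json
);

CREATE INDEX ON measurements (metric_id, time_stamp);
CREATE INDEX ON metrics USING gin (dimensions);

-- A view so the queries read the same as before.
CREATE VIEW measurements_v AS
SELECT m.time_stamp, mt.name, m.value, mt.dimensions, m.value_meta
FROM measurements m JOIN metrics mt ON mt.id = m.metric_id;
```

The listing queries now hit the small metrics table instead of doing a distinct over 45 million rows.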
So we've got a separate metrics table and a separate measurement values table, and they're joined together with an ID. This has some other advantages as well, which I'll talk about in a minute, but it does make the queries quite a bit faster. It does mean we need to do a little bit of fiddling around when we insert the values: we find the ID of the metric and then we tag the measurements with that ID. I don't expect you to read that now, but that's the rough idea. It's not too tricky. And the queries we can do on this are exactly the same as before; we can just use a view to do the join for us. That's a technicality more than anything else: you could write the queries with the join if you wanted to. The indexes are mostly the same as well. We have a multi-column index on the metric ID and timestamp, and an index on the name and dimensions, so when we're doing that normalisation, we can find the name and dimensions really fast. And that helps a bit. Our queries are down to 500 milliseconds for some of them; for the 24-hour query we're still way over, one and a half, two seconds. We need to do a bit more work. The listing queries, however, because we've already normalised all of those lists out, and so effectively already done that distinct operation, are now well within our requirement. So that's good. The problem with this, and this is the problem you eventually get with all time series data, is that the detail you need for a particular graph becomes less interesting the bigger the time window you're looking at. If you're looking at a 24-hour time window, or a six-month time window, you don't need to plot every point from every 30 seconds on that graph, so you really don't have to fetch all of that detail. And this is what a lot of these time series databases seem to be doing: they're doing these roll-ups as you push the data in.
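Such roll-ups can be maintained in Postgres itself. Here's a sketch of the approach the talk goes on to describe, assuming the normalised measurements table with a metric_id column; the names, the five-minute bucket, and the particular aggregates are illustrative:

```sql
-- Five-minute roll-up table; the unique constraint lets us upsert into it.
CREATE TABLE summaries_5m (
    metric_id   int         NOT NULL,
    time_stamp  timestamptz NOT NULL,
    sum_value   float8      NOT NULL,
    count_value bigint      NOT NULL,
    min_value   float8      NOT NULL,
    max_value   float8      NOT NULL,
    UNIQUE (metric_id, time_stamp)
);

-- Fold each raw measurement into its bucket as it arrives.
CREATE FUNCTION summarise_5m() RETURNS trigger AS $$
BEGIN
    INSERT INTO summaries_5m
    VALUES (NEW.metric_id,
            to_timestamp(floor(extract(epoch FROM NEW.time_stamp) / 300) * 300),
            NEW.value, 1, NEW.value, NEW.value)
    ON CONFLICT (metric_id, time_stamp) DO UPDATE
    SET sum_value   = summaries_5m.sum_value + EXCLUDED.sum_value,
        count_value = summaries_5m.count_value + 1,
        min_value   = least(summaries_5m.min_value, EXCLUDED.min_value),
        max_value   = greatest(summaries_5m.max_value, EXCLUDED.max_value);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_summarise
AFTER INSERT ON measurements
FOR EACH ROW EXECUTE PROCEDURE summarise_5m();

-- A 24-hour average then reads a few hundred roll-up rows, not millions:
SELECT time_stamp, sum_value / count_value AS avg_value
FROM summaries_5m
WHERE metric_id = 42
  AND time_stamp > now() - interval '24 hours'
ORDER BY time_stamp;
```

Sum and count are kept rather than avg because, as noted, they can be aggregated again into coarser levels; avg cannot.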
So instead of having a point every 30 seconds, you have a data point for each of your metrics every two minutes, or every five minutes. And this really shrinks down the amount of data you need to query for a 24-hour time window. I call this a summarised schema. It's conceptually similar to RRD and tools like that, and again, the internals of a lot of these other databases are doing the same thing: they're building up coarser roll-ups to query instead. So we can do that in Postgres. We can build these summary tables. Depending on the functions you want, you can compute different aggregates; I've done sum, count, min and max here, because you can then aggregate those together again higher up. And we've got a unique constraint there, just to make the maintenance of these summaries a bit easier. It's done with a trigger. We still insert data into our main measurements table, because we want to keep the detail in case we want to go really fine-grained into it, and we just create a trigger which adds, or eventually updates, the row for that particular time point for that metric ID. If there's a summary row already there, then we combine them: when summing, we add them together; for min, we take the min of the one that's already there and the new one we're adding. And that's it. That's about all you've got to do to build these summaries and turn Postgres into a time series database. What I've discussed here isn't the most efficient way of doing it, but it does the job, as we'll see in a second. A few technicalities just to join the tables together, and then the queries: for the small intervals we still go to the raw data; for the larger intervals we now go to our summaries instead. And these queries, of course, are a lot faster. So that's good. Those 24-hour queries that were taking two seconds, where we were just ploughing through the data as fast as we could: now we're doing something a bit smarter.
We're only querying the rolled-up data, the data we pre-rolled-up. Even our 24-hour query comes back in less than 100 milliseconds. Pretty good. And of course, the dimension listing queries are the same as before. The other thing we want to consider is: can we actually get this amount of data in? We need to be able to ingest a day's worth of data in less than a day, otherwise we're going to start lagging behind quite significantly. And for this system, we've got quite a lot of headroom. I think the summarised schema equates to about 15,000 measurements a second. That's not a lot, but bear in mind this is a very naive schema and a very naive set of triggers we've thrown together to make this work. So, a pretty good result. The other thing we notice is that the normalisation, as you might expect, really does reduce the amount of disk space you need. In fact, this was necessary for us, because in order to fit six months into that one terabyte, we had to stay under about six gigabytes a day of data, and we were well over that with the denormalised data. So this isn't the whole story. This is an example of how you could do it. To really make this work in production, there are a few things we're going to do and a few things you'd have to do as well. You're probably going to need coarser summaries: if you want to do a six-month average, you're probably going to need a summary with a point every hour, or maybe every two hours. You're probably going to have to partition the data, and that's the assumption I've made: all of these tests have been on a day of data, so you'd probably partition it by day. That also makes dropping old data very fast. And there are a few tricky ways you can optimise the production of these summaries, because the way we're doing them, we're doing an update for every measurement. So that's a lookup and an update of a row for every single measurement.
So, not the most efficient way of doing it. What we really want to do is take a whole batch of these values, say five minutes' worth, and fold them into the summary format in one go. There are a few ways to do that, but I'm not going to go into that detail; this is meant to be an overview. So, we can do metrics. It wasn't too hard. This next section is perhaps a bit wider-stretching. What else could we put into Postgres? Well, we said earlier we could put logs in there, so let's do that. We want some centralised log storage, and we want it to be searchable: we want to be able to search for things like 'connect' in haproxy and get results back. We want it to be time-bounded. And again, it's got to be fast, because we're going to produce interactive graphs and interactive lists of these things; we don't want to keep people waiting around. This is the kind of data we get. It typically comes from rsyslog: you get things like the severity, the program name, the host name, and you get the message. So again, a basic schema: timestamp, message, dimensions. And the sort of queries we want to do, if you're used to using Elasticsearch and Kibana, are things like: find me logs with the word 'connection' where the program name is haproxy. We can do those with Postgres, because Postgres has got full-text search. It's admittedly a bit longer to write: we do a text-search query on the message, and then a contains on the dimensions. As long as we've got that indexed with some GIN indexes, it'll be nice and fast. I haven't dug in depth into this section of stuff, so I've got no performance numbers for it, I'm afraid, but a bit of anecdotal fiddling around shows that you can store quite a lot of logs with some basic GIN indexes on them and get some really fast queries. That's good.
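A minimal sketch of that log schema and query, under the same naming assumptions as before (the column and dimension names are mine):

```sql
-- Centralised log storage with full-text search.
CREATE TABLE logs (
    time_stamp  timestamptz NOT NULL,
    message     text        NOT NULL,
    dimensions  jsonb       NOT NULL  -- severity, programname, hostname, ...
);

CREATE INDEX ON logs USING gin (to_tsvector('english', message));
CREATE INDEX ON logs USING gin (dimensions);

-- "Find me logs with the word 'connection' where the program is haproxy":
SELECT time_stamp, message
FROM logs
WHERE to_tsvector('english', message) @@ to_tsquery('connection')
  AND dimensions @> '{"programname": "haproxy"}'
  AND time_stamp > now() - interval '1 hour'
ORDER BY time_stamp DESC;
```

The expression index on `to_tsvector('english', message)` matches the query's text-search predicate, so the `@@` lookup can use the GIN index.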
If any Postgres people want to tell me whether it's a better idea to combine those into one multi-column index or not, that would be really interesting. I'm not sure; I haven't tried it yet. So, log parsing. We're stretching now, but I think it's interesting to try. This is our log message again, our log structure, a bit of JSON that we've got out of rsyslog or wherever. What we want to do is notice that it contains 'connect from', and that it's from haproxy, and tag it. This is the sort of thing that Logstash will do for you. We want to tag it with 'connect', because that makes it really easy to search for later on and draw graphs with. We then want to get all of this data out and store it in a structured way, so we can search for a service name or a protocol. And it's nice and reliable and robust, and we're not doing horrible regexes all over the place. Well, I shouldn't have said that, because we are going to use horrible regexes to do this. But Postgres can do regexes. And if you've used Logstash for a long time, you'll realise that the core thing is the grok; that's the really good bit of Logstash. It's a regex: you match your log messages against it, and it pulls out all the field data for you. So with a bit of extra garnishing, we can turn that into a nice little JSON blob as well. So we've got our schema for our logs, which we looked at a minute ago. What we really want to do with this data is parse the message against the patterns, and then have all those dimensions extracted and added on as extra fields. So here's an idea. This might not be the nicest way to do it, but it's certainly a way you can do it. We can have a patterns table, where we store our patterns in the table and the field names alongside them, so that a capture group in the regex ends up being, say, the source IP.
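One way to sketch that patterns table, together with a trigger that applies the patterns on insert, is below. Everything here is illustrative rather than the talk's actual code, and it assumes the logs table from earlier with a jsonb dimensions column:

```sql
-- A grok-style pattern: a regex plus a name for each of its capture groups.
CREATE TABLE patterns (
    tag    text   NOT NULL,              -- e.g. 'connect'
    regex  text   NOT NULL,
    fields text[] NOT NULL               -- one field name per capture group
);

INSERT INTO patterns VALUES
    ('connect', 'connect from ([0-9.]+):([0-9]+)',
     ARRAY['source_ip', 'source_port']);

-- regexp_matches returns the capture groups as a text array:
SELECT regexp_matches('connect from 10.0.0.5:43210',
                      'connect from ([0-9.]+):([0-9]+)');
-- {10.0.0.5,43210}

-- Pre-process each log row before it's stored.
CREATE FUNCTION parse_log() RETURNS trigger AS $$
DECLARE
    p      patterns%ROWTYPE;
    groups text[];
    i      int;
BEGIN
    FOR p IN SELECT * FROM patterns LOOP
        groups := (SELECT regexp_matches(NEW.message, p.regex));
        IF groups IS NOT NULL THEN
            NEW.dimensions := NEW.dimensions || jsonb_build_object('tag', p.tag);
            FOR i IN 1 .. array_length(p.fields, 1) LOOP
                NEW.dimensions :=
                    NEW.dimensions || jsonb_build_object(p.fields[i], groups[i]);
            END LOOP;
        END IF;
    END LOOP;
    RETURN NEW;  -- BEFORE INSERT: the modified row is what gets stored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logs_parse BEFORE INSERT ON logs
FOR EACH ROW EXECUTE PROCEDURE parse_log();
```

Adding a row to the patterns table immediately changes how new logs are parsed, which is the dynamic-patterns advantage discussed next.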
You match them one-to-one onto each other. And then we use our little trick from earlier — and again, just keeping this simple, let's use a trigger. As we insert logs into our logs table, we pre-process them. For those people not too familiar with Postgres or SQL: a trigger lets you pre-process the row before it's inserted into the table. So we're using that. We iterate over all of our patterns, try to match each one, and if one matches, we add the dimensions on. Then we attach the trigger to the table. Interesting. So when we insert our log into the table, it actually grows all of this extra stuff because of the trigger, which is really nice. And this gives us a way to dynamically add patterns to our log parsing, which is really cool. If you've ever used Logstash, you know you have to edit config files and restart it before you can add new patterns to it — so that's an interesting advantage. Again, probably not the most efficient way to do it, but good enough. So, stretching even further now: why do these systems have message queues in them? The point of the queue is really to handle really bursty traffic. You might have a system that's spewing logs out at quite a rate, and you really want to persist them as soon as possible so you don't lose anything, but you don't really have to query them straight away. It's okay if they sit there for a few minutes — as long as the data is there and you can look at it eventually, that's good enough. And it's really the only option you have: if you're getting gigabytes of logs, there's only a certain amount you can process, unless you have an entire cluster just for processing logs, which I presume some people do, judging by the way they talk about it.
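A sketch of that patterns table and trigger might look like the following — all names are illustrative, `regexp_match` needs Postgres 10 or later, and `EXECUTE FUNCTION` needs Postgres 11 (older versions spell it `EXECUTE PROCEDURE`):

```sql
-- Patterns live in a table, so new ones can be added with a plain INSERT.
CREATE TABLE patterns (
    tag     text   NOT NULL,   -- e.g. 'connect'
    pattern text   NOT NULL,   -- POSIX regex with capture groups
    fields  text[] NOT NULL    -- one field name per capture group, in order
);

CREATE OR REPLACE FUNCTION parse_log() RETURNS trigger AS $$
DECLARE
    p    patterns%ROWTYPE;
    caps text[];
    i    int;
BEGIN
    FOR p IN SELECT * FROM patterns LOOP
        caps := regexp_match(NEW.message, p.pattern);
        IF caps IS NOT NULL THEN
            -- Tag the row and attach each captured field as a dimension.
            NEW.dimensions := NEW.dimensions || jsonb_build_object('tag', p.tag);
            FOR i IN 1 .. array_length(p.fields, 1) LOOP
                NEW.dimensions := NEW.dimensions
                    || jsonb_build_object(p.fields[i], caps[i]);
            END LOOP;
            EXIT;  -- first matching pattern wins
        END IF;
    END LOOP;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER parse_log_trigger
    BEFORE INSERT ON logs
    FOR EACH ROW EXECUTE FUNCTION parse_log();
```

Because the patterns are just rows, adding a new one takes effect on the very next insert — no config files, no restart.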
So really all we want to do is write the data to some sort of persistent storage as fast as possible — we don't have to query it yet — and we'll worry about structuring and processing it later. So if we go back to those ingest rates... I've turned the numbers I had from earlier on their head, from ingest time to ingest rate. Our summarized time series data comes in at about 15,000 measurements a second, because we're doing a lot of processing on it and we're not doing it very efficiently. But the less processing we do, the more we can shove in. If we go back to the denormalized version, we can put in quite a lot more per second — we can take in 140,000 metrics a second. That's quite good. But we can simplify this. If we just store the raw JSONB that we got from the software, doing even less processing, now we're up to a much higher number. I think I got my scales wrong there, sorry — so that one's about 140. Well, let's do even less processing and store it as JSON. The JSON type, for those of you who don't know, is a text version of the JSON stored in the column; JSONB does some extra fiddling around for you to make it more efficient to get at. The thing about the JSON type is that it still validates your JSON for you, which is great most of the time — unless you really just want to shove data into a table really fast. So if you store it as varchar, you can squeeze out even more, and you can get to 250,000 measurements a second. Does anyone have any idea what I'm going to put there next? Copy? Anything to do with copy? If you go to the darkest depths of Postgres and use binary copy, then you can squeeze even more performance out — nearly up to 350,000 of these a second. So that's 350,000 rows of JSON, and these bits of JSON are about 300 bytes long, so the metrics are similar-sized to the logs. That's quite a big system.
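A minimal sketch of the fast path: batch raw JSON rows into one COPY-format text stream and load them in a single round trip. The table name `raw_metrics`, the single `payload` column, and the sample events are assumptions for illustration; the commented-out call is psycopg2's COPY interface.

```python
import io
import json

def copy_buffer(events):
    """Render a batch of events as one COPY text stream: one row per line,
    with backslashes, tabs and newlines escaped so the JSON text survives
    COPY's text format intact."""
    buf = io.StringIO()
    for event in events:
        line = json.dumps(event, separators=(",", ":"))
        line = (line.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n"))
        buf.write(line + "\n")
    buf.seek(0)
    return buf

events = [
    {"name": "cpu.idle", "value": 98.2},
    {"name": "cpu.user", "value": 1.1},
]
buf = copy_buffer(events)
# With a live connection, the whole batch goes in as one COPY:
#   cur.copy_from(buf, "raw_metrics", columns=("payload",))
print(buf.getvalue())
```

The point is that the database sees one bulk load instead of thousands of single-row inserts; a background job can parse and summarize the payloads later.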
You can handle a pretty impressive ingest rate just by stripping all the processing away, shoving the data in there, and having a background process do the processing later on, letting it catch up over time. So it's possible we could replace the queue with Postgres as well, which would be nice — that's a lot of persistent data in one place. So what's the conclusion of this? I view Postgres as — as the hipsters might say — a data persistence toolbox that just happens to use SQL. I think a lot of people are scared off by SQL, and if I hadn't worked for a database vendor I might be equally scared of it, but I think it's useful. You can do a lot of the things with it that the NoSQL crowd like to do, and I don't think you should be scared of it. The batteries aren't always included, though: you have to think about your problems and do some work. Postgres isn't a bespoke solution for your time series, and it's not a bespoke solution for your log searching. That doesn't mean it's hard, though, and reducing the number of systems can be a huge operational advantage if you've got quite a small team — one bit of software that's easy to understand, instead of six. Use and deploy what you know and what you trust: if you trust Postgres to store your data for you, then why not use it? That sounds like a good idea to me. So that's it. Thanks for listening. Thank you very much.

Okay, do we have any questions? I'll gather them next.

Hello. Can you tell us what kind of hardware and what main settings you were running for the 45 million logs that were entering Postgres?

Yes, it was my laptop.

Oh, okay. Any particular tuning settings?

No, I had Postgres running in an LXD container.

Okay. With the default config?

Yes, just installed it.

So we can make it better?

You probably could make it a lot better. Definitely.
You said you could replace InfluxDB with Postgres and said that Grafana could read from that, but I couldn't find anything about Grafana reading from SQL.

Ah, yes, you're right. There is a Grafana pull request open, I believe, where they're building in support for SQL databases, which is quite interesting. But what we're actually doing here — that Monasca component I talked about briefly has an HTTP API for storing metrics, for obtaining metrics, and for running queries. It's part of an OpenStack project, and it's all written in Python. What they've done is build a Grafana plugin that talks to that API, and that's how you're running the queries. So we're not actually querying Postgres directly; we're going through this API layer. But if you were outside an OpenStack context, it might be worth keeping an eye on those SQL pull requests in Grafana, because they look quite promising.

One question: is this code available on GitHub or somewhere?

Not currently, though my employer is more than happy to open-source everything we do, so if you drop me an email, I'll let you know when we've made it available. I'll certainly make it available in the coming weeks. Definitely.

How did you get all the data into the database in the first place?

The performance tests were run using copy, through a little Python shim. If you do a lot of Python with Postgres, inserting individual rows — or even using executemany — is very inefficient. You're much better off using copy, for example via psycopg2, and then you get really, really good insert rates into Postgres tables.

And can you tell us about the scale of the infrastructure you monitor in production? Because people who are listening will wonder: is this applicable to my infrastructure, or is it just a proof of concept?

That's an interesting point to make, really. This works because our infrastructure is quite small.
So if you haven't got extreme requirements, you can make your life a lot simpler by doing something like this. The scale I've been testing is 80 nodes, with about 200 metrics coming from each node every 30 seconds. And the system we're actually going to deploy this on is ludicrously overpowered, because all these tests were run on my laptop. So I think you should be able to stretch it quite a way — I can't prove that, though.

May I recommend that when you run an ingest system for the high-speed data you're collecting, you run it on a different Postgres than the one you actually collect the long-term data on?

That's a very good idea, yeah. Definitely. I was having a conversation with someone on Friday about potentially using logical replication to feed a lot of this. We really want to optimize that summarizing process, and feeding the data asynchronously through logical replication to a different Postgres instance means the summarizing and all that extra work we're doing isn't holding up loading the data. So you'd actually be able to load data a lot faster in that case. So yeah, that's a really good point.

Any more questions? Thank you very much.