I'll try to speak loud enough. If you really can't hear me, please shout. So I'm Lukas. I'm going to talk mostly about pg_stat_statements and my experience with it, some open source stuff I've published that might be useful to you, that kind of stuff. I have to admit I'm a bit jet lagged, so if I talk too fast or unclearly, please let me know, but I'm sure we'll make it work. So basically, oh, I hope the slides work. So I know a lot of Ruby. I'm not as good at Postgres hacking, so I'm not the best C developer, but I really like Ruby, and so my tooling tends to focus on making the Postgres data usable with Ruby. The product that I'm working on is called pganalyze. It's a hosted monitoring dashboard. I'm not going to talk too much about that. I am, however, going to describe to you how we work with pg_stat_statements in that tool, because we have a little bit of scale there, and we have seen various issues, like bad data coming in, cleaning it up, all these kinds of things. So who of you knows pg_stat_statements and knows what it does? OK, so about half. So the name of the talk is Advanced pg_stat_statements, but I'm still going to recap what pg_stat_statements gives you, especially because there are a lot of fields in there which might be useful. So basically, pg_stat_statements is a module, an extension. You need to enable that extension in the config. It's a shared library that you need to include, which means you need to restart. This is something that we've actually seen a lot of customers having an issue with: you have a production database, you don't have pg_stat_statements enabled, you want to start monitoring it, you need to restart your server, and unfortunately there is no way around that. And then once you've done that and you create the extension, you basically get a view of all the queries that are running. It's basically a set of counters, so every time you look at it, the values will have increased if more queries ran.
So you need to have some tooling around working with that view. The good thing about pg_stat_statements nowadays, especially, is that cloud DBaaS platforms actually support it. So if you use Heroku Postgres, you can use pg_stat_statements by default, no additional work to be done. If you use Amazon RDS, since November, they also support pg_stat_statements. You need to enable it, but it's possible, which is really useful because usually you don't get as much introspection there. A couple of tools work with pg_stat_statements data. PgHero is one, by Andrew Kane from Instacart. It's a nice little tool, open source. Ours is closed source, but it's the same mechanism, and both are written in Ruby, actually. Then there's PoWA, by Dalibo. They do a lot of things with pg_stat_statements, and they also released a really interesting extension lately, which I'm going to come back to, also open source. So if we look at pg_stat_statements, you basically query the view, and then there's the data you get back. The most important pieces are: which query was it, the number of calls, and the average time. And note that this is the average time, not the median, which means if that data accumulates over a long time, it's difficult to know what the highest and the lowest runtimes were. So that's just something to keep in mind. pg_stat_statements also normalizes the queries, since 9.2. So this little question mark you see at the end means there was actually a value there, like five, or eight, or something, or the password of the user. And so the good thing about pg_stat_statements is that it groups these queries together, mostly based on constant values, which is really useful if you have a lot of data. And another nice thing about pg_stat_statements is that it gives you the hit rate for the shared buffers. Actually, not many people know that, but it's really nice. So basically, you'd use this little formula.
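As a quick sketch, that cache hit rate formula in Ruby. The arguments are the actual shared_blks_hit and shared_blks_read columns from pg_stat_statements; the zero-division guard is my addition:

```ruby
# Per-query shared buffer cache hit rate from pg_stat_statements counters.
# 1.0 means every block was found in shared_buffers; lower values mean
# blocks had to come from disk or the OS page cache.
def cache_hit_rate(shared_blks_hit, shared_blks_read)
  total = shared_blks_hit + shared_blks_read
  return nil if total.zero? # query touched no shared blocks at all
  shared_blks_hit.to_f / total
end

cache_hit_rate(990, 10) # => 0.99
```

In practice you would run this over every row of the view and sort by the worst hit rates.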
It's pretty simple: shared_blks_hit divided by the sum of shared_blks_hit and shared_blks_read. If everything were cached in the shared buffers, it would be 100%. If data has to be fetched from the disk or operating system cache, it would be lower. And you can do this on a per-query basis, which is good. And then there's one thing for which we actually need to enable another config option: blk_read_time and blk_write_time. These are actually milliseconds. So basically this is the accumulated time that was actually spent on I/O fetching the information, which is useful to know if you're hitting the OS cache or if you're actually doing really slow disk reads. But a problem is that you need to enable track_io_timing, which has some overhead. The Postgres docs are good on that, but just check whether the overhead is an issue for you. Some platforms like Heroku actually enable track_io_timing by default, so if you use Heroku Postgres, you actually get this for every query. Any basic questions on this information? It was just kind of recapping what we have here. Okay, cool. So there are three things I want to talk about. First, improving data quality: when you work with pg_stat_statements data, how can you improve the quality of that data? Second, a tool that I developed called pg_query, which parses SQL queries and gives you the AST of the query, which is really useful in working with pg_stat_statements data. And then I'm going to talk a little bit about filtering pg_stat_statements and regression testing with pg_stat_statements data. So, improving data quality. One of the issues, especially pre-Postgres 9.4, was that the query string would be truncated. So if you have a query, for example, some frameworks like Rails produce really long `id NOT IN (...)` lists where they pass in all the IDs, which is a horrible pattern, but they do that.
And the problem is that the query string basically gets truncated, so you actually end up with a query that's not normalized and that's also not parsable. Before 9.4, I think the default was a thousand characters for the query text. You can increase that, but it had a little bit of cost because it was kept in memory. With 9.4, this is now on disk, stored in a file, which obviously enables you to use longer query texts. So just keep that in mind if you use an older Postgres version. Then, because these are counter values, one thing that we also did initially was using pg_stat_statements_reset(), which basically resets the statistics. It removes all the queries from the internal cache and counters and restarts tracking, which is really nice because if you do this every 10 minutes, then you have a clean slate, you know exactly what has run. The big downside of that is, A, it produces a little bit of overhead, and B, there's actually a race condition, a bug, kind of, in pg_stat_statements_reset(). There's a reason why it's not fixed, but the problem is that sometimes, if you run reset and there's a query that runs at the same time, that query doesn't get normalized. It still gets grouped together correctly, but the query text you see in the statistics is not normalized. This actually shows up a lot if you have bigger systems which do resets, so you kind of need to clean the data. But in general, lesson learned: don't use pg_stat_statements_reset(). Starting with 9.4, there's the query ID, which allows you to identify unique queries and see which queries changed. Before that, you had to do some magic around tracking calls; there's some stuff I can talk about later if you want. So, talking about normalization also, right?
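Going back to the counter tracking for a moment: instead of resetting, the usual approach is to take periodic snapshots and diff the counters. A minimal Ruby sketch, with each snapshot simplified to a hash of queryid to calls (the real rows have total_time and friends too, and a counter going backwards usually means the server or the stats were reset):

```ruby
# Given two snapshots of pg_stat_statements counters (queryid => calls),
# compute how many calls happened in between, without ever calling
# pg_stat_statements_reset().
def diff_snapshots(previous, current)
  current.each_with_object({}) do |(queryid, calls), delta|
    before = previous.fetch(queryid, 0)
    # If the counter went backwards, the stats were reset (or the entry
    # was evicted and re-added); treat the current value as the delta.
    delta[queryid] = calls >= before ? calls - before : calls
  end
end

old_snap = { 101 => 10, 102 => 5 }
new_snap = { 101 => 30, 102 => 5, 103 => 7 }
diff_snapshots(old_snap, new_snap) # => {101=>20, 102=>0, 103=>7}
```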
So one of the issues, and I kind of call this fingerprinting, is: how can we generate a query fingerprint that sums up the logical idea behind the query? So one thing pg_stat_statements does, I hope you can read this: `SELECT a AS b` is actually treated the same as `SELECT a AS c`. So aliases are ignored, which is good. Some ORMs like Hibernate do this a lot, so it's really good that that's taken care of. Other cases are not taken care of. For example, `SELECT something FROM x WHERE y IN (...)` with a list of IDs, this kind of anti-pattern, where you have different lengths of these lists: those are actually separate queries in pg_stat_statements. There's an idea behind that, because obviously if you pass a long list it might be a lot slower than if you pass a short list, but it makes the data really messy. So that's something to know. Another thing is if the target list is ordered differently. This actually could be fixed in the normalization code: if you do `SELECT a, b FROM x` versus `SELECT b, a FROM x`, it's not the same query. Again, an issue with some ORMs, which just randomize this stuff. And then, oh yeah, DEALLOCATE. If you use an ORM that likes to do prepared statements a lot, basically allocating them and throwing them away again, you get a lot of DEALLOCATE calls. I think nowadays, in 9.4, DEALLOCATE is ignored. But what we've seen is that you would see DEALLOCATE and then the prepared statement name, and these would just bloat your statistics. So again, something you want to ignore. All right, so as mentioned, one really good thing is that Postgres 9.4 introduced the query ID.
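Sticking with normalization for a second: pg_query does this kind of cleanup on the parse tree, but purely to illustrate the idea, here is a toy string-level version that collapses normalized IN lists of different lengths. A real implementation has to work on the AST, since a regex like this will misfire on string literals and comments:

```ruby
# Toy normalization: collapse IN (...) lists of normalized parameters so
# that "IN (?)" and "IN (?, ?, ?)" fingerprint to the same query.
def collapse_in_lists(normalized_query)
  normalized_query.gsub(/\bIN\s*\(\s*\?(\s*,\s*\?)*\s*\)/i, "IN (...)")
end

collapse_in_lists("SELECT * FROM posts WHERE id IN (?, ?, ?)")
# => "SELECT * FROM posts WHERE id IN (...)"
```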
So what pg_stat_statements does internally is that it doesn't work with the query string. It does what's called jumbling: it takes the parse tree of the query and tries to compute a unique hash for it. And previously that wasn't exposed. It was used internally, but you couldn't actually work with that information. Now you can. And one interesting thing in 9.4 is that you can also skip fetching the query text, because that goes through disk reads and you might not want that. So if there's, for example, one query where you really want to monitor it every, I don't know, 10 seconds, you could do that now without actually getting the query text all the time. But important to know: the query ID is not necessarily consistent across systems. Also, if the OID changes, so let's say you have a staging database and you drop and reload the database, it would actually be a different query ID. So it might not be useful for aggregating data. And then something which, I think I checked yesterday and it's not merged yet, but hopefully it will be merged, is a patch adding minimum and maximum time, mean time, and standard deviation to pg_stat_statements. Hopefully that gets added. The big benefit of that would be that you could do something more interesting than an average. It would still have the same issue if you don't reset: you might want to reset every 24 hours, because otherwise the max would just be the max from two weeks ago, which might not be as useful. But anyhow, this would be really useful, if it gets merged. Cool. Going a bit fast. I like to go fast, but we can talk about all the little details later. So, one interesting extension, and this actually builds on the query ID that's exposed in 9.4, is called pg_qualstats. The folks behind PoWA released this.
So the issue that you often have with pg_stat_statements is that you don't know what the query was actually looking for, right? Sometimes something is in RAM, sometimes it's not, and so the same query can behave wildly differently. And pg_stat_statements doesn't really tell you what kind of constant values the query was looking for. What pg_qualstats does is, for a certain query ID, it can tell you which predicates, which WHERE clauses and FROM clauses, and which values were in there. So you could say, okay, a lot of times we query this one user, but then we query this other user, and that other user is the query which is slow. So that's really interesting. I haven't played with it myself, to be honest, unfortunately, but I'm super excited that this exists. And then there's a thing called pg_stat_plans, which has been unmaintained for, I think, even two years now, which is really sad. It's interesting though, because it explored the idea of tracking query plans versus queries: giving you a query plan with each query, so you can say, we have this query, and we have these different plans, and this is how often they were used. There's actually a really bad bug, so you shouldn't use it right now: a bug which reruns the statement that you were monitoring. It kind of auto-explains, and it has some issues with escaping, so basically, if it explains a DELETE, it would actually run the DELETE while analyzing. So don't use it right now. But if you're interested in what could be done, there are definitely some interesting ideas in there. It also has a high overhead, so you don't want to use it on a production system anyway. Okay, so that's kind of some basic stuff on improving data quality.
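One more small data-quality sketch before moving on: the truncated-query problem from earlier boils down to a heuristic check, for example unbalanced parentheses. A parenthesis inside a string literal will fool it, so treat it only as a hint:

```ruby
# Heuristic check for a truncated query text: a query that was cut off
# mid-way often has more opening than closing parentheses.
def probably_truncated?(query_text)
  query_text.count("(") != query_text.count(")")
end

probably_truncated?("SELECT * FROM posts WHERE id IN (1, 2,") # => true
probably_truncated?("SELECT * FROM posts WHERE id IN (1, 2)") # => false
```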
So one thing we've done, or I've done, to improve the data quality, and to have a building block to work with, is something called pg_query. pg_query is a Ruby extension, a Ruby library, which turns a SQL query into an abstract syntax tree. So it's doing what Postgres does internally to work with queries, but in a way you can access it in a tool. And so maybe taking a step back: how would you actually work with pg_stat_statements data? This is where pg_query fits in. Our setup with pganalyze, and I think most products or tools would be similar, is: you have your production database, and there's some agent or collector which gets the information from the database. In our case it's called pganalyze-collector; it's open source, you can use it. This collector creates a snapshot, in our case every 10 minutes, of all the data. This includes the pg_stat_statements data, but also table information, the backends that are running, all these other pieces of information which might be useful to work with the data. And with this snapshot, we then go through four different steps to work with that data, and then it gets put into a monitoring database. So we have a separate Postgres monitoring database running which includes the statistics information, which we can then query, or render as a graph, these kinds of things. And this monitoring database kind of looks like this. Let's see what's interesting here. So we normalize the query ourselves: we store the original query and the new normalized query. We store the parse tree, which I'm going to come to in a second. We also store the parse error and parse warnings. One of the issues is that Postgres has some problems with parsing and outputting utility statements, for example, and so what we really needed was a way to say, oh, these errors are happening, or these warnings are being generated.
And then really useful are also statement types. So basically, I only want to see the SELECT queries, right? I only want to see INSERTs on this table. And then a truncated true/false flag, based on some heuristic like: if the number of parentheses is unbalanced, then it might be truncated, these kinds of things. And the fingerprint, which is kind of our unique identifier. And then for each of these queries we'd have a snapshot. The snapshot is basically a representation of pg_stat_statements, so it's essentially identical to the pg_stat_statements data. And then a snapshot for the basic database information. So let's go through this. Let's actually start with parsing a SQL query. I don't know who of you has tried this before, but it's actually surprisingly difficult to parse SQL. You don't want to use regexes. There are some existing tools. There are projects that have the same problem, like pgpool, for example: basically anything that's not Postgres but needs to be Postgres-compatible. So I looked at pgpool; pgpool's parser was difficult to extract, and they also have the issue that with every major release they actually have to re-integrate the parser, which is not ideal, right? You always have to lag behind each new release. So it would be really nice if there was, okay, you can't see this, but if there was an `EXPLAIN (PARSE_TREE true)`, right? If I could tell EXPLAIN: please explain to me how the query looks to Postgres internally. It doesn't exist, but it might be nice. So what Postgres does internally, and I'll just walk through this because you can't see it: there's a parse step, which is called raw_parser internally. raw_parser actually runs the whole parser and generates the initial abstract syntax tree, which then goes on to the query being rewritten and optimized, the query being planned, and then the query being executed. What's interesting is that this first step doesn't use the catalog, but these three later steps do.
And so the good thing is you can actually parse a query without having an actual Postgres database. You don't actually need disk access, because the parser is independent from the catalog. And so if you go into C, you can actually do this: you basically call the raw parser and then nodeToString, which is a debug function, and then you can print the string, and you actually get the internal representation, and there is a format for it. It's a really weird format, but this works, right? So with a normal Postgres, if you went in with GDB, you could actually do this. And so what we did with pg_query was basically use that, and cut off the whole catalog access part, really just saying we want to run the parser independently. And so if you want to use this, take a look. If you know Ruby, it's `gem install pg_query`, and then you can parse queries. The way it works internally is that it compiles parts of Postgres. So it doesn't actually duplicate the parser; it actually uses the existing code, which means it might be a bit slow, but it's always up to date, right? Anything that's in Postgres can be parsed. And so again, you can't see this, but basically this works: you can just call the raw parse method on PgQuery, and it works. One problem here is that this nodeToString function is actually used for debugging, and so anything that didn't need to be debugged wasn't added there. So for example, if you were to parse CREATE SCHEMA, it would actually not give you a result; it would say it can't dump this kind of node. And this happens because the output code, outfuncs.c, is actually handwritten, right? So every time a new statement gets added, somebody maybe adds it there, although it could be automatically generated. And so I actually did this. It might be difficult to merge because it's Ruby, right?
So Postgres doesn't use Ruby internally, but I basically built a little helper tool that takes the header files of Postgres and generates the outfuncs functions from them. And the other thing I added was I changed it from the internal format to a parsable JSON format, which makes much more sense. Again, you can't read this, but it's basically the same, just in JSON; I'm happy to show you afterwards. So that also works: we can now parse a query, which is good. The problem is parsing a normalized query. And this is an issue because normalized queries have question marks in them, and question marks are operators, not parameter references. And so a plain query works, right? But if you have a question mark, then it gives you an error, it says syntax error at or near semicolon, da-da-da. And so this means the parse doesn't work, the analyzer doesn't work, the planner doesn't work; the whole thing blows up. If you replace the question mark with $1, then the parsing works. It can't analyze the query or plan the query, but it can at least parse it, which is good enough for us. And so what I did initially was doing this string replacement: going through all the pg_stat_statements data and replacing question marks with $1, which is a problem if you use operators which contain that character. So what I actually changed was patching the parser to support parsing a question mark as a parameter reference, except for some special cases which might be operators. It works surprisingly well. It's also really hacky, right? This is not something that would ever be merged into Postgres. The real fix is replacing that replacement character. Yeah, so, all right. So, fingerprinting pg_stat_statements, right? So this issue we had before with the target lists being reordered and those being different queries for pg_stat_statements.
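A quick aside before the fingerprinting part: the naive string replacement described above looked roughly like this (a hypothetical toy version, with the placeholders numbered here for readability; as noted, it will also rewrite a question mark that is really an operator, such as JSONB's `?` operator, which is why patching the parser was the better route):

```ruby
# Replace pg_stat_statements' "?" placeholders with numbered parameter
# references so the normalized query becomes parsable again. Naive: it
# also rewrites "?" characters that are part of operators, which is
# exactly the problem described above.
def replace_placeholders(normalized_query)
  n = 0
  normalized_query.gsub("?") { "$#{n += 1}" }
end

replace_placeholders("SELECT * FROM posts WHERE id = ? AND state = ?")
# => "SELECT * FROM posts WHERE id = $1 AND state = $2"
```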
So as part of pg_query, I also added a fingerprinting method which fingerprints based on the AST, but does some more normalization on top, which in practical usage has helped. And the nice thing is, it's 40 lines of Ruby. I'll share the slides later, by the way; it might be easier to click through. But the really nice thing for me was to see how easy this is compared to writing C code inside Postgres. It can be easily unit-tested, since it's Ruby. I mean, Ruby is really bad in terms of memory and CPU and everything, right? But you don't care, because you're only analyzing stuff. It's not your production system. And there are lots more details to fingerprinting; we can talk about that later. And the last interesting thing here is extracting table references. Oftentimes I have a certain table which is having issues, or I just want to know which queries touch that table. And you could of course just grep for it, and grepping will probably get you quite some way, but it's not the right way to do it. Because you have the abstract syntax tree, you can actually parse a query and then, in the pg_query case, just call `.tables`, and it gives you the tables the query references. This also works with common table expressions, all kinds of weird cases, again because it traverses the AST. It's a few more lines of Ruby. There are also unit tests for this, because as we started doing this, we noticed some edge cases, and obviously there might still be some edge cases. Anyhow, if you want to use this, please use it. There are people using this who, for example, have a schema dump and want to analyze it; that also works. So any place where you have SQL queries, this might be helpful for you. Just let me know if there's some issue you encounter.
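To give a feel for what `.tables` does, here is a toy Ruby traversal over a nested, parse-tree-like structure. The real pg_query walks the actual Postgres AST and handles CTEs, subselects and so on; the tree shape below is made up for illustration:

```ruby
# Toy recursive walk: collect every "relname" value from a nested
# hash/array structure resembling a parse tree.
def extract_tables(node, acc = [])
  case node
  when Hash
    acc << node["relname"] if node["relname"]
    node.each_value { |v| extract_tables(v, acc) }
  when Array
    node.each { |v| extract_tables(v, acc) }
  end
  acc.uniq
end

tree = {
  "SelectStmt" => {
    "fromClause"  => [{ "RangeVar" => { "relname" => "posts" } }],
    "whereClause" => { "SubLink" => { "RangeVar" => { "relname" => "users" } } }
  }
}
extract_tables(tree) # => ["posts", "users"]
```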
There's, for example, somebody who would like to turn this parse tree representation back into a SQL string, which Postgres can do for some parse trees. But again, it would be nice if there was a standardized way of doing this. All right, so, last part: filtering and regression testing. So this is actually making use of all these tools. Let's start with filtering, meaning working with the pg_stat_statements data. So previously I had made a gist; I made this nicer for you yesterday, actually. It's now called pg_qtop, and pg_qtop is a top-like tool which uses pg_stat_statements data. It's also Ruby, so you need Ruby installed to run it. It's really simple. So if people find it interesting, I'm happy to expand on what this tool does. For me, it was more about showing how much power you have as soon as you can parse queries. The simplest version: you run pg_qtop on a database. For simplicity's sake, it does a reset at the beginning, and then it just keeps showing you the queries, listed by the highest average time. Again, if you really wanted to extend it into a proper top, it could sort by 20 different things, and you could reset, and so on. The nice thing here is you can say `pg_qtop -t post`, for example, if you have a post table, and it would only give you the queries on the post table. You could also do `-s` for statement type, and you could do SELECT, or you could do DELETE. You kind of see what's happening with your application. So this is here, feel free to use it. It's really fresh. Don't use it on production necessarily. The only reason you shouldn't use it in production is that it queries pg_stat_statements every second, which might be an issue on production systems. Not a problem on development, obviously. Also, the reset might be an issue. Right, so, last but not least: this is an idea I'm kind of exploring right now. It's experimental.
So it might seem weird, but there are some use cases for it. Basically, let's say we want to know which queries, or query plans, would be affected if we remove an index. We could of course copy the database and run it there. But could we help the developer make this decision quicker? And ideally: how would execution plans be affected if we upgrade the Postgres version? This is a bit further in the future to actually make work, but these would be really interesting problems to solve, because a lot of people have regressions when they upgrade Postgres. For some weird reason, 9.3 broke some people's plans, and then maybe it's fixed, maybe it's not fixed. What do you want to know before you upgrade? And so the basic idea here was, instead of actually doing all of this, to fake the data: using pg_stat_statements data to know what's actually running on your system, and using the table statistics to know what the tables look like, but with no actual data, right? So you don't actually need access to the database contents. The setup for this would be: you have the production database, you dump the schema and the table-level statistics, push this into a local test database which looks like production but has no data, and then you want to run EXPLAIN on it. And so we again have the issue of using pg_stat_statements data and wanting it to be parsable, but we fixed that before. The problem now is that if you use a $1, it's not explainable. Technically, if you had an empty database with correct statistics, and Postgres didn't actually check the data, then you could actually do an EXPLAIN here. But without the actual value, you can't actually explain a query. So one way of doing this would be to replace the $1 with NULL, but the problem with that is that it generates a really different plan.
The data type is different, and so on. There are some hacks around this, basically wrapping it. What the foreign data wrapper actually does internally, too, is you use NULL, but you cast the NULL to the data type that you need. And then you put it in a SELECT statement, because that tells the planner to assume an unknown value, basically not to optimize for it. And there's some stuff around finding out the type. The good thing is, if it's a parameter reference, you can actually use the pg_prepared_statements view to find out what Postgres thinks the data type should be, which you can then use to insert it. And so in this case we're able to explain a query which was originally a pg_stat_statements query, assuming you have the structure copied. And then you can actually try to replicate the production plans, right? So again, there are some issues here, or let me go here. There's one big issue with this right now, which is that the planner actually reads the physical relation size while planning. Ideally it would rely only on the catalog tables, which you can insert into. But the problem is it actually goes to the physical files. Ideally we would patch that to not have this issue. What happens right now is that the costs are always too low, or it might choose the wrong plan; it might actually choose a sequential scan because it thinks the table is tiny. But again, the idea here is: you can explain pg_stat_statements data, right? All right, so this is something I'm working on. It's all up there. You can actually run it, it has some unit tests, and it kind of works. But what I'd be interested in is, if some of you have tried this, what you currently do if you want to upgrade Postgres, how you approach that, whether this would help you, or whether this is totally useless; basically, feedback on whether it's worth investing more time into this. All right, so these were the three big topics.
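To make the null-cast trick concrete: a hypothetical Ruby helper that rewrites $n parameters into casted nulls wrapped in scalar subselects, assuming the type lookup (e.g. via pg_prepared_statements) has already happened:

```ruby
# Rewrite "$1", "$2", ... into "(SELECT null::type)" so EXPLAIN can run
# against a schema-only database. param_types maps parameter number to
# the type name you determined (e.g. from pg_prepared_statements).
def make_explainable(query, param_types)
  query.gsub(/\$(\d+)/) do
    type = param_types.fetch(Regexp.last_match(1).to_i, "unknown")
    "(SELECT null::#{type})"
  end
end

make_explainable("SELECT * FROM posts WHERE id = $1", { 1 => "integer" })
# => "SELECT * FROM posts WHERE id = (SELECT null::integer)"
```

The scalar subselect wrapping is what keeps the planner from treating the value as a known constant, so the plan is closer to what the real parameterized query would get.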
So we still have 20 minutes, right? Yeah. So are there any big questions about what came up so far? Yeah? No, the fingerprint is just an MD5 hash, basically. So even if there's a really subtle difference in the query, it would produce a totally different fingerprint. But potentially that could be an addition, right? So I think the nice thing is, again, this is something we could easily build, yeah. Sure. So basically, it depends on whether you're on 9.4 or earlier. If you are on 9.4, it's a little bit easier: you would use the query ID to know which query is which. So you know: 10 minutes ago I saw the data, and now I queried the database again, and the same query ID is still there, but the calls are now 20 instead of 10, so I know that 10 calls happened. That kind of works. There is an issue in 9.4 if you want to track utility statements, because utility statements don't have a query ID. They actually have a query ID of zero, so you still kind of need a workaround. The workaround we do for earlier PostgreSQL versions is to use the normalized query and basically assume that the query string is unique. There's an issue with that: if you have a staging database where you drop the data and then reinsert the data and you look at it again, and you don't reset pg_stat_statements, you actually get two rows with the same query string, right? And this is something to watch out for if you monitor staging databases, because in that case you kind of need to be smart about it: last time I saw five of those queries with these call counts, and this time I saw five of them, but one of them changed. And the one that changed was actually the one where the query was running.
If you need that, actually, I haven't added this yet to pg_qtop, but I might as well rework that. So I'll probably open source this in the next couple of days if you want to use it. Any other questions? The query ID, I mean, yeah. The query ID, as far as I know, stays stable unless you change the OID. It takes the OID of the tables; it doesn't take the name of the table. So again, like the staging issue, in that case it would also change the query ID. As far as I know, and I've only read the docs on this, so I'm not sure. Again, we personally don't rely on this because it's a little bit unstable, but if you do WAL-based replication, it's actually the same on the standby, assuming the same major version of everything. If you do logical replication, it doesn't necessarily have to be, right? But again, you might not want to rely on that data. If you really analyze the data, you'd probably want to look at the query string and not the query ID. But for understanding which queries were running in the last 10 minutes, that's where the query ID is really useful. That's a good question. So, what I've been playing around with: last time I gave this talk, some people had the feedback that they rely heavily on functions, and the issue that they experienced was that queries were running inside the functions, and they wanted to understand which functions were running those queries. Your issue is a bit different, but I feel it's kind of related. I'm not actually sure how the jumbling works for the function parameters. I would have to read the code. But right now I'm actually experimenting with how we can make this better for functions, also including parsing PL/pgSQL.
So, parsing the function definition, being able to say these are the queries that happen in there, and then comparing that with the pg_stat_statements data. But let me get to a couple more slides with ideas, useful things you might use. If you want to use pg_stat_statements with a monitoring user, say you don't want to use your production, delete-everything kind of user (which is something we see a lot as a hosted solution), you can use a SECURITY DEFINER function. The problem with pg_stat_statements is that if you query the view as a non-superuser, you can only see your own queries, which is an issue if you have a system with multiple users. So what you might want to do is have a superuser create a SECURITY DEFINER function that returns the pg_stat_statements data, and then work with that. And then, stuff that might be useful if anybody here hacks on Postgres: I feel we should automatically generate outfuncs.c instead of writing it by hand; I kind of have this working. I'm not sure if there's any way to get that merged at some point, in, I don't know, Python or Perl, but I feel it would be really useful to not have it handwritten, and also to add JSON output, because we have JSON as an essential feature of Postgres now. (From the audience: there's a project which does exactly this; it's called deparse. It doesn't work with DML as far as I know at this point, only DDL, but it will, and it actually outputs JSON.) Interesting. Okay. Awesome. (It's a work in progress, but it's related to this. The bulk of it is being committed for 9.5, and it'll be expanded as time goes on. You'll definitely want to look at that.)
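Going back to the monitoring-user point for a moment: the wrapper the speaker describes might look like the sketch below. The function and role names (`get_stat_statements`, `monitoring`) are made up for illustration; a superuser would run the `CREATE FUNCTION` once, and the restricted monitoring role would then select from the function instead of the view.

```ruby
# Sketch of a SECURITY DEFINER wrapper so a non-superuser monitoring role
# can read all rows of pg_stat_statements. Names are illustrative.
WRAPPER_SQL = <<~SQL
  CREATE OR REPLACE FUNCTION get_stat_statements()
  RETURNS SETOF pg_stat_statements
  LANGUAGE sql
  SECURITY DEFINER
  AS $$
    SELECT * FROM pg_stat_statements;
  $$;

  REVOKE ALL ON FUNCTION get_stat_statements() FROM PUBLIC;
  GRANT EXECUTE ON FUNCTION get_stat_statements() TO monitoring;
SQL

# The monitoring role would then run this instead of querying the view:
MONITOR_QUERY = "SELECT * FROM get_stat_statements();"
```

Because the function runs with its owner's privileges, the superuser-only restriction on seeing other users' query text no longer applies; the REVOKE/GRANT pair keeps the wrapper from being open to everyone.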
No, I've actually seen the thread, but deparsing, correct me if I'm wrong, is turning a parse tree back into a query, right? (Yeah, but first there's an intermediate step where it turns it into JSON.) Ah, okay, excellent, so there's a parse-tree-to-JSON step. Perfect. I'll definitely take a look at that. Some more little things. Consider a way to get a parse tree more easily. From talking to people about this last time, I don't think we'll ever get this into Postgres core, to be honest. You can actually do it as an extension, though; this is what we did before pg_query: we used an extension to call raw_parser, although I'm not sure you're actually supposed to do that. I don't think so. Anyhow, that's something. And one thing we should really consider doing is replacing the question mark with $0 or something; basically something that's parsable, because the question mark is an operator, and it's really bad that it's used as a replacement character. Again, this is fixing things in Postgres itself. And then something I've encountered, and again, I'd welcome ideas on this: the problem I've seen is that you have a lot of pg_stat_statements data and you want to know where the queries were coming from. The PL/pgSQL case is one of those; you want to know which functions called those queries. Another case would be, say, I use Rails as a web framework, and Rails knows which queries it's issuing. So is there a way to blame or annotate, to see that this query has been called by these parts of the application that many times? Something I haven't explored at all, but I'd be really curious if anybody has seen anything useful there. Cool, so tools and libraries are available here. We still have a bit of time; if you have more questions, we can take them.
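One approach to the "where did this query come from" question is to append the application context as a SQL comment (this is the technique behind the Rails marginalia gem, sketched here from scratch rather than using the gem itself; the context keys are made up). Comments are preserved in the query text that pg_stat_statements stores, so the origin shows up alongside the normalized query.

```ruby
# Sketch of annotating a query with its application origin so the
# comment survives into the query text recorded by pg_stat_statements.
def annotate(sql, context)
  comment = context.map { |k, v| "#{k}:#{v}" }.join(",")
  "#{sql} /*#{comment}*/"
end

annotate("SELECT * FROM users WHERE id = ?",
         controller: "users", action: "show")
# => "SELECT * FROM users WHERE id = ? /*controller:users,action:show*/"
```

One caveat: comments don't affect the query jumble, so queries that differ only in their comment share a single pg_stat_statements entry, and only the first-seen comment is recorded for it.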
But I think that's the gist of the talk, yeah. Sure. (Question: It seems like there's some overhead in collecting all this information through pg_stat_statements. I was wondering if there was any talk about extending it to include things like what sessions are actually waiting for inside the database, not just statistics like how much I/O happened. I'm coming from an Oracle background, and I'm thinking of things like Oracle's wait interface, where they classify the different types of waits. So it's more about what's going on inside the internals of the database engine, what's being waited on, and how long it's been waited for. I don't know if this is the right place to do it. I know EnterpriseDB has developed something, I think an add-on called DRITA; does anybody know anything about that?) I personally don't know, but maybe somebody else does. (You can check that a query is waiting on locks, but not which lock specifically it's waiting on.) Yeah, I think overhead is always a concern. Even around pg_stat_statements there was talk about adding histograms to it, and no consensus was reached on the mailing list because of the overhead, because of questions like which bucket sizes to choose. So I think it's just really tricky to get anything like that production-capable. (Question: And if it was reset five seconds before something happened, how do I figure out what happened?) You mean it locked up because of locking issues? (I'm saying something took a long time and I don't know why. I have pg_stat_statements running, but the last time it was reset was a while ago, so this specific thing may not be enough of an outlier to show up at the top of the list.) Basically, you would still have the data, right?
So assuming you're consistent and you look at it afterwards, the data would still be correct. But you're right that with an outlier, especially with the way the statistics currently work, it's really difficult to know that there was one query which took, I don't know, 60 seconds. To my understanding, the best way of approaching that would probably be logging: use the other facilities, outside of pg_stat_statements, that log slow queries. (From the audience: there's an extension now that lets you actually see that.) What's the name of that? (That's what I was trying to think of; it's something I think would be worth adding to your slides.) Okay, if you can send it to me, I'll add it to the slides or something, then people can have it. Any other questions, thoughts? All right, cool. Thank you.