Hi, I'm going to talk about a practical Postgres protocol primer. I had planned to pack as many pieces as possible into this presentation, but that turned out to require way too much practice and preparation, so I pared it down. My name is Will Leinweber. If you have any questions about this, or you miss any of the links in the talk, my Twitter handle is leinweber. I work at a company called Citus Data, and a very little bit about that, since they sent me over here: Citus is an open-source extension to Postgres that takes care of horizontal sharding, so you can use many, many machines to run queries in parallel and go beyond the means of a single Postgres server. This talk doesn't have much to do with Citus, except that since Citus is just Postgres, you could use this protocol with it, and the Citus nodes talk to each other over the Postgres protocol.

Also, this being a Postgres conference: here's something I only learned after I ran a five- to six-hour data migration in a transaction, typoed one of the table names, and blew away the entire transaction. Had I known about `\set ON_ERROR_ROLLBACK interactive`, it would have saved me, because if you put that in your psqlrc, psql makes a savepoint before every statement, and a typo doesn't cost you the whole transaction. Set to interactive, rather than just on or off, it only kicks in when you're using psql manually; if you're piping in a script, it doesn't change anything. I've been putting this at the front of my Postgres talks because more people should know about it and it's hard to find. This is straight from my psqlrc file, except I use a little more colorful language in the comment than I've included here.

Okay, so the motivation for this talk. I got to know the Postgres protocol through writing a driver. Originally I linked against libpq, but then for a couple of reasons I needed to implement the protocol myself, mainly to get fully async I/O. By doing that I got a much better understanding of Postgres, and I want to share some of that with you. You get insight into why things are the way they are, why some things are possible, and why other things are a little harder to do. Also, the protocol can do a lot, but you might not need all of it; the documentation will tell you everything that's possible, but you may not need all of that. So in this talk I'm going to show some of the actual code I used, and that way it hopefully serves as a supplemental guide to the documentation and gives a different perspective. What we're going to do is connect to Postgres and authenticate, send it a query, read the response, and eventually disconnect. By doing it in the order you would actually use it, it hopefully tells a little bit of a story, as much as a talk about the Postgres protocol can be a story. All of the code here, if you want to look at it, is from crystal-pg; my GitHub name is will. The protocol part of it is only about 770 lines of code, so if any of this interests you, that's hopefully an approachable amount of code, and with the touch points I show, you can find them and start piecing together the missing parts.
And it's hopefully not too intimidating; at less than 1,000 lines I think it's pretty comprehensible. Everything here is about version 3 of the protocol, which was introduced, what is it, 2003? Yeah, fourteen years ago, in Postgres 7.4, which is also the release that added autovacuum; I was looking through the release notes for other notable things. So the protocol has stayed the same across Postgres versions for quite a long time. You can see here some of the new things protocol 3 added, and since then it's been protocol 3 that entire time.

All right, let's get to it. The first phase is the startup phase. A little bit of terminology: the client, your program, is called the frontend. When you connect, the postmaster receives that connection, forks off a new backend process at the operating system level, and passes the connection to that backend; only then do authentication and authorization happen, and so on. When you disconnect, Postgres drops that process. This is also why, as I'm sure many of you know, you're not supposed to kill -9 a Postgres backend: the backends use shared memory to communicate with each other and with the postmaster when they need to, and on any unclean shutdown Postgres has to assume that shared memory is corrupted, kill all the backends, and start from scratch. It's also why, if you look at the process list, you'll see all these separate backends.

So how do we actually put the startup packet together? We have a number of key/value pairs: the user or role you're trying to connect with, the database name, and then as many other settings as you want, for example application_name or client_encoding, any of the so-called GUCs, which is the internal name for everything you can SET; you can list as many of these as you want and they all come along with the startup packet. The first thing you need to calculate is the total length, because the server wants to know the size of the message it's about to receive. So we send the length, which includes its own int32, plus one for the null at the end. Then we send the version number, which is 3, but you'll see the number on the wire is much larger than three: the high-order 16 bits are the 3 and the low-order 16 bits are the .0, so if there ever were a 3.1, which there hasn't been, it would go in those low bits. Then we send all the key/value pairs, each terminated with a null byte because they're C strings, one last null, and we send it. And then we're off to the races: we've told Postgres we want to connect.
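To make that concrete, here's a minimal Crystal sketch of building and sending a startup packet. This isn't the actual crystal-pg code; the helper name and the socket handling are just for illustration.

```crystal
require "socket"

# A sketch of the startup packet layout: an Int32 total length (which
# counts itself), an Int32 protocol version, then null-terminated
# key/value pairs, then one final null byte.
def send_startup(socket : IO, params : Hash(String, String))
  body = IO::Memory.new
  body.write_bytes(3 << 16, IO::ByteFormat::NetworkEndian) # version 3.0: major in the high 16 bits
  params.each do |key, value|
    body << key
    body.write_byte 0_u8
    body << value
    body.write_byte 0_u8
  end
  body.write_byte 0_u8 # one last null ends the list
  socket.write_bytes(body.size + 4, IO::ByteFormat::NetworkEndian) # the length includes itself
  socket.write body.to_slice
end

send_startup(TCPSocket.new("localhost", 5432), {"user" => "will", "database" => "postgres"})
```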
But what we haven't done yet is any sort of security. Servers may or may not actually support SSL; they might not be compiled with OpenSSL. So before you send that startup packet, you have to somehow ask: do you support SSL? The way you do that is by pretending to ask for protocol version 80,877,103. That number looks a little funny, but if you split it into its two 16-bit halves, it reads 1234 and 5679, with the 8 gone. When I built this I didn't know why the 8 was missing, but in putting this talk together I figured it out, and you'll find out later. So you send the startup packet like normal, except it carries just this magic version number and nothing else. The server responds with either the character S, saying it supports SSL, or N, saying the answer is no. It can also send an error response; I haven't actually seen that, so I don't have examples of what those errors could be, and for my purposes, if anything goes wrong I just bail out. Now it's your responsibility as a client to decide what to do with that information. If the server said yes and you want SSL, you go ahead and do the SSL handshake. If it supports it and for some reason you don't want to use it, you just continue on as normal. If it doesn't support it and you require SSL, you abort the connection. All those sslmode settings you've seen in Postgres, prefer, require, verify-ca and so on, are instructions to the client; the server never gets any of that information. You decide on your end how to proceed once you know whether the server has SSL support.

And that's it for the startup packet. Everything else from here on out uses a format that's a little different from the startup message: first a character that says what type of message this is, then the length, which again includes itself, then the body, whatever it is. The same format goes both ways: the messages you send to the server as a client and the messages the server sends back to you all follow it. The other point here is that because the length is only an int32, a single query is limited to roughly a gigabyte; you can't have multi-gigabyte queries. Although I must say, if you're sending individual queries that big, maybe something's wrong.
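Since every later message uses that type-byte-plus-length framing, here's a sketch of a pair of helpers for it. The names write_message and read_message are mine, not from any library, and the later examples reuse them.

```crystal
# Every post-startup message, in both directions: a type byte, an Int32
# length that counts itself but not the type byte, then the body.
def write_message(socket : IO, type : Char, body : Bytes)
  socket.write_byte type.ord.to_u8
  socket.write_bytes(body.size + 4, IO::ByteFormat::NetworkEndian)
  socket.write body
end

def read_message(socket : IO) : {Char, Bytes}
  type = socket.read_byte.not_nil!.chr
  length = socket.read_bytes(Int32, IO::ByteFormat::NetworkEndian)
  body = Bytes.new(length - 4) # subtract the length field's own four bytes
  socket.read_fully(body)
  {type, body}
end
```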
Okay, so the next step is authentication and authorization. Just as an example of what one of these packets looks like, I'll show this one in full one more time; the rest of the time I'll just show code. The server sends us a message saying we need you to authenticate: the character R, then the length, then one more int32 saying which authentication scheme we're going to use. There are on the order of ten different schemes Postgres allows. If the code is zero, it just means everything is okay: either you didn't have to authenticate because the user was in trust mode in pg_hba.conf, or you get this okay once you've gone through authentication by one of the other means. Code 3 is cleartext password, which I specifically don't support, and I'll explain why in a moment. The most common one is the MD5 password, code 5, where the server sends a salt, you do an MD5-based challenge, and send the answer back. There are several other schemes that are complicated to support, so I haven't implemented them myself, and I'm going to pretend they don't exist.

The reason I don't support cleartext is that it makes a downgrade attack possible. Real quickly, the way it works: say you have a man-in-the-middle attacker who's somehow on the line, maybe no SSL, who knows. The client sends a startup packet, and the attacker just forwards it to the server. The server says, hey, I want you to authenticate with MD5. The attacker rewrites that to say, I want you to authenticate with cleartext. The client sends its secret password, hunter2, and the attacker then completes the MD5 challenge itself, since it has the salt, and authenticates. Now the attacker has the cleartext password and can authenticate as many times as it wants. If your client doesn't support cleartext at all, then when it's told to use cleartext it says, I can't do that, and never divulges the password. Is this really a likely scenario? Probably not; if you're not using SSL, you're probably inside your own network, and so on and so forth. But I chose not to support it because it's a little more security, and I thought that was a good trade-off. Also, I don't think there are many servers these days that support only cleartext and not MD5, so dropping it makes a lot of sense.

The MD5 scheme: I could have made this prettier and expanded it out, but it was actually a bit of a struggle to implement. The docs note that the MD5 function here returns its result as a hex string, and that's true, but in some places the algorithm wants that hex string as characters and in some places it wants raw bytes, and figuring out which case was which was, I'd say, challenging. The way it works: you concatenate the password and the user name, take the MD5 of that, and render it as a hex string. Then you make a new MD5 and feed it that hex string as characters, followed by the four raw salt bytes the server gave you, not their hex representation. You take the hex string of that second digest, prepend it with md5, and that's what Postgres is looking for to authenticate. I really wanted to get this across: when I was thinking about how to put this talk together, I thought code would be a better resource, because there are a couple of talks out there from years past that go into the protocol, and there are the docs, but there's nowhere that really shows you how to do the MD5 part, and the rest kind of followed from there.
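Here's roughly what that computation looks like in Crystal, assuming the standard algorithm libpq uses; this is a sketch, not the exact driver code.

```crystal
require "digest/md5"

# Sketch of the MD5 response: md5hex(md5hex(password + user) + raw_salt),
# prefixed with "md5". The inner digest goes in as its hex *string*;
# the four salt bytes go in raw.
def md5_password(user : String, password : String, salt : Bytes) : String
  inner = Digest::MD5.hexdigest(password + user)
  outer = Digest::MD5.new
  outer.update inner # hashed as ASCII hex characters
  outer.update salt  # hashed as raw bytes
  "md5" + outer.final.hexstring
end
```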
One of the new, exciting features in the next version of Postgres is SCRAM. It's similar to MD5 in that you do some calculations on the client and send the result back, and it's part of a larger family of authentication mechanisms called SASL. It's exciting that this finally landed, because it's been in the works for years, by Heikki Linnakangas with, I think, a lot of help from Michael Paquier. I'm very excited about it because it's a little more secure than MD5, and on top of that it paves the way for something that isn't in there yet but will hopefully, maybe, be in Postgres 11: channel binding. That will let you connect over SSL and prove that you're talking to who you think you're talking to, without going through the trouble of setting up a certificate authority or getting your Postgres SSL cert signed by a real authority, which is very nice. So with SCRAM in there now, I'm very excited for the future. It works the same way as the other schemes, but its code is 10 instead of 3 or 5, and I assume that's because it's in Postgres 10 and not just because that was the next available number; at least that's what I choose to believe.

All right, so now that we've authenticated, we get AuthenticationOk, and then Postgres, unsolicited, sends a ton of these ParameterStatus messages, which tell the client everything about how the server is configured. integer_datetimes can be on or off; I'd be surprised to see many Postgres installations with it off these days. Off means timestamps are stored as a float instead of a 64-bit integer, which gives you very high precision around Postgres's epoch and lower precision as you move away from it. It's been integer by default for a long time now, so the precision is uniform. My driver doesn't support the float format, so if I see it say off, I just bail out and say, hey, use a different language. There are a bunch of other parameters here you can check to make sure your client is prepared to accept what Postgres is going to tell it later. Then you get one more frame, BackendKeyData, which includes a secret that we have to hold on to and memorize if we ever want to do query cancellation; I'll show how that happens later in the talk. Finally, after all of this, we get ReadyForQuery, which includes one byte saying what state the transaction is in: idle, in a transaction, or in an aborted transaction. That's very useful if you want to write a program like PgBouncer and need to know what state all your connections are in; you pay attention to this message. It comes when you first connect and then every single time you're allowed to send another query.

Okay, so that was a lot. To recap: the frontend can ask for TLS, gets a response, and then establishes SSL/TLS if it wants to. Then the frontend sends a startup packet, which includes the database name and the user or role we want to connect as. The server asks for authentication, we respond if necessary, and then we get all the server parameters, we get the secret, and finally we're ready to query. Yes, that's the secret there, which we'll use later on if we want to cancel a query.

Oh, right, that's a very good point, thank you. When I show code reading from the server, I have it broken out such that I've already read the first byte, so I know which type of message it is, and I've already looked at the length and read that many bytes off the socket into a slice, and that data is what's available inside my initialize method. So the length did come over the wire, and in this case it would be eight: four for the int32 length itself and four for the auth code. I'm just omitting that step because it's the same every time.
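Putting those last few messages together, here's a sketch of draining the startup stream after AuthenticationOk, using the hypothetical read_message helper from earlier.

```crystal
parameters = {} of String => String
backend_pid = backend_secret = 0

loop do
  type, body = read_message(socket)
  io = IO::Memory.new(body)
  case type
  when 'S' # ParameterStatus: two null-terminated strings, key then value
    parameters[io.gets('\0', chomp: true).not_nil!] = io.gets('\0', chomp: true).not_nil!
  when 'K' # BackendKeyData: our backend's PID and the cancellation secret
    backend_pid    = io.read_bytes(Int32, IO::ByteFormat::NetworkEndian)
    backend_secret = io.read_bytes(Int32, IO::ByteFormat::NetworkEndian)
  when 'Z' # ReadyForQuery: one status byte, 'I' idle / 'T' in txn / 'E' failed txn
    break
  end
end
```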
Generating the length is the harder part, so I'm going to show that every time we send a message; when we receive a message, I'll assume that step has already been taken care of and the data is already there. Thanks. Yeah, what language is this? This is a language called Crystal, which looks a lot like Ruby but is completely statically typed and statically dispatched, and it's something like a hundred times faster. It's really good, you should use it, and if you need to use Postgres with it, I've got a driver for you. But that's not what the talk is about.

All right, so there are two different ways you can send queries: simple and extended. The simple one is very simple. You send the capital letter Q, then the size of the query plus, again, the size field itself plus one for an extra null byte at the end, then the query text, and you get data back. The pro is that you can have multiple statements in there: semicolons, as many statements as you want. The downside is that you can have multiple statements. That's bad for two reasons. One, you only get the data back from the last query. Two, it makes SQL injection attacks worse, because the attacker can issue as many statements as they want. Only allowing one statement doesn't prevent injection, of course; you can still get some bad things, but it makes the problem just a little less bad. Because of this, my driver discourages its use. It's still there if people want to do migrations and such, but I specifically don't read any of the data: when the query happens, I just consume all the data frames and return nothing, to make sure people aren't using this one where the other one works. So that's all I'm going to say about the simple query.
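For completeness, the simple query message is small enough to show in full; again, a sketch rather than the driver code.

```crystal
# 'Q', an Int32 length (itself + the query + a trailing null), the SQL, a null byte.
def send_simple_query(socket : IO, sql : String)
  socket.write_byte 'Q'.ord.to_u8
  socket.write_bytes(sql.bytesize + 5, IO::ByteFormat::NetworkEndian)
  socket << sql
  socket.write_byte 0_u8
end
```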
The extended query protocol is where Postgres gets most of its power. It breaks everything down into multiple steps that can be reused, and it's where query parameterization, the $1, $2 placeholders, comes from. There are two concepts here: prepared statements, which you're probably familiar with because they get exposed a lot, and portals, which you may not be familiar with because they're a lower-level thing. A portal represents a query that's ready to execute, or one that has partially executed and can be resumed. The steps are: we send a Parse message with the query, and Postgres parses it, makes a plan, and so on. Then we send a Bind message with all the parameters we want to slot in. After that we Describe the portal, which is how we learn what data types are coming back, their formats, and their OIDs, object IDs. Then we say Execute, and it starts executing, and then Sync to make sure everything is on the same page and we can start receiving the data. The other interesting thing is that even though we do the Parse step first and the Bind step second, we don't have to wait for Postgres to send the ParseComplete and BindComplete messages back in between. You can pipeline all of these steps, send them as fast as you can generate them, and start receiving the data later, which is nice.

So the Parse step: we send the character P and do some calculations for the length. In the next slot I'm just sending a null byte, but if you wanted to save this prepared statement and reuse it later, you could give it a name there. If you don't give it a name, you get the unnamed prepared statement; every connection has that one default unnamed slot. Then we pipe in the query, and then you can say how many parameters you want to specify types for. So if you're going to send $1 and $2 and you know what types you want Postgres to parse those as, you can go ahead and tell it the object IDs; or you can just say zero, send nothing in the array later on, and Postgres will do its normal type inference. That's a little easier to deal with, so that's what I'm showing here.

Bind is pretty straightforward; I think you can read and understand all of it, so I won't go through every part. There are a lot of steps to send, but it's not actually that complicated. We do some wild calculations for the size, because there are so many things to send. The character for Bind is B. Next come two nulls: the first is the destination portal, null because we're going to use the unnamed portal, which is similar to the unnamed prepared statement, and the second is the source prepared statement, null because we're using the unnamed one. If you were rebinding an existing prepared statement to give it new arguments, that second slot is where you'd put the name you gave it before. Then we say how many parameter format codes we're sending: zero is text and one is binary. I'm mostly using text for everything, because that's what people are typing in, except binary for bytea; if you wanted to be a little more efficient, you could send the binary versions of these things. Then, almost done, we send the number of parameters again, and then for each one its size and then the actual data. So for sending five parameters, we send a five, then parameter zero's size and its data, then the next size and its data, and so on; that way Postgres knows how much to expect for each one. Finally, we can say what result formats we want.
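Here's a sketch of the whole pipelined sequence for one query, using the unnamed statement and portal, text format everywhere, and the hypothetical write_message helper from earlier. It ignores NULL parameters (which would be sent as a size of -1) and binary formats to stay short.

```crystal
# Parse, Bind, Describe, Execute, Sync, pipelined with no waiting in between.
def send_extended_query(socket : IO, sql : String, params : Array(String))
  # Parse: empty statement name (the unnamed slot), the query,
  # and zero declared parameter types so Postgres infers them.
  body = IO::Memory.new
  body.write_byte 0_u8
  body << sql
  body.write_byte 0_u8
  body.write_bytes(0_i16, IO::ByteFormat::NetworkEndian)
  write_message(socket, 'P', body.to_slice)

  # Bind: unnamed portal, unnamed statement, zero format codes (all text),
  # then each parameter as an Int32 size followed by its bytes.
  body = IO::Memory.new
  body.write_byte 0_u8
  body.write_byte 0_u8
  body.write_bytes(0_i16, IO::ByteFormat::NetworkEndian)
  body.write_bytes(params.size.to_i16, IO::ByteFormat::NetworkEndian)
  params.each do |param|
    body.write_bytes(param.bytesize, IO::ByteFormat::NetworkEndian)
    body << param
  end
  body.write_bytes(0_i16, IO::ByteFormat::NetworkEndian) # zero result formats: all text
  write_message(socket, 'B', body.to_slice)

  # Describe the unnamed portal ('S' here instead would describe a statement)
  body = IO::Memory.new
  body.write_byte 'P'.ord.to_u8
  body.write_byte 0_u8
  write_message(socket, 'D', body.to_slice)

  # Execute the unnamed portal with no row limit, then Sync
  body = IO::Memory.new
  body.write_byte 0_u8
  body.write_bytes(0, IO::ByteFormat::NetworkEndian)
  write_message(socket, 'E', body.to_slice)
  write_message(socket, 'S', Bytes.empty)
end
```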
Those per-column result format codes are where I have an issue with the protocol. Right now you can specify that you want the first result binary, the second text, the next binary. When I put this together, I assumed all the proper drivers were using binary, because it's so much more efficient. Take a timestamp as an example: if the server sends the text representation, then every single time your application has to parse that text back into its internal representation. And I see a lot of tables with two or three timestamps, updated_at, created_at. Postgres has to expand the value out into a text string, and then I have to read that string and put it back into an internal representation, over and over and over again. I thought obviously all the drivers out there use the binary result format because it's so efficient. They're not. There's one Java driver, one other driver I saw, and this one. The reason why, as I found out, is that it's a lot of work: the binary formats for these data types aren't documented. You have to read the C code. I'm glad I did it, because it is pretty fast and that's nice, but it's a lot of work. What would be nice is if I could tell Postgres when I connect: hey, I support binary for these types, everything else text; you keep track of it and send accordingly. But it doesn't.

The next part could help with that a little. I Describe the portal, and this tells me what data types I'm going to get. Instead of saying P there for the portal, I could describe the statement as an earlier step, right after Parse, and that would tell me what types to expect before binding. The problem is that introduces a synchronization point: instead of sending everything for the query and then getting the results, I'd have to send part of it, wait, get a result, and then proceed, and that round trip, it's not the end of the world, but it adds a synchronization point to every single query, which is hard to accept. When the describe response comes back, we know how many columns we're going to get for this query and what data types the results will be, and finally we can say Execute, again naming the portal we want.

The maximum-rows field in Execute is a little interesting. It's separate from a LIMIT you would say in SQL. It tells Postgres: only send me, say, 10 rows, and if there are more rows, it suspends the portal, and you can resume it later to get the rest of the data. There aren't many people who use this. I imagine the main use case is if you're on limited memory and for some reason can't stream the results, so you only want a couple of rows at a time. I think JDBC does expose this, but I haven't used the feature myself; an interesting, little-used, but possible feature. And then finally we Sync, and once that happens we can start receiving data; also, if there's an implicit transaction, Sync closes it at that point.

All right, reading data. When I send that Describe for the portal, what comes back is a RowDescription, which contains a lot of information. For every single field in the result you get: the name of the column; the OIDs of the table and the column it came from, if the result comes from a table and a column, and if it's a computed one, like 1 + 1, those won't be there; the type OID, which is the most important one for knowing how to deal with the data; the size; the modifier, which is where the difference between a varchar(255) and a varchar(20) comes in; and the format, which like before is 0 or 1 for text or binary. I wrap all of that into a class so we can use it. Yeah, question: where did the computed column go? The computed columns will be here too, but that second thing, the table OID and column number, will be zero, because the value doesn't come from a table. If it is a column that came from a specific table, it tells you that. That's something I don't really see exposed much; I have it here because it comes in on the wire, but for a client application I don't know many reasons why you would necessarily need it. Perhaps some of you are more creative than me and can figure out something to do with it.
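Decoding that RowDescription looks roughly like this; the Field record is my stand-in for the class I mentioned, not the actual crystal-pg type.

```crystal
record Field,
  name : String, table_oid : Int32, column : Int16,
  type_oid : Int32, size : Int16, modifier : Int32, format : Int16

def parse_row_description(body : Bytes) : Array(Field)
  io = IO::Memory.new(body)
  count = io.read_bytes(Int16, IO::ByteFormat::NetworkEndian)
  Array(Field).new(count) do
    Field.new(
      name:      io.gets('\0', chomp: true).not_nil!,
      table_oid: io.read_bytes(Int32, IO::ByteFormat::NetworkEndian), # zero for computed columns
      column:    io.read_bytes(Int16, IO::ByteFormat::NetworkEndian),
      type_oid:  io.read_bytes(Int32, IO::ByteFormat::NetworkEndian),
      size:      io.read_bytes(Int16, IO::ByteFormat::NetworkEndian),
      modifier:  io.read_bytes(Int32, IO::ByteFormat::NetworkEndian), # varchar(255) vs varchar(20)
      format:    io.read_bytes(Int16, IO::ByteFormat::NetworkEndian)  # 0 text, 1 binary
    )
  end
end
```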
Okay, so now that we've got that, we've sent the query and we know what data to start expecting, so we can put some buckets in place to start collecting rows. Now we read all the rows. What happens here is we look at the type character of each message to make sure it's a DataRow, because the server isn't going to tell us up front how many rows to expect; we just keep reading until we don't get a capital D anymore. We read each row, pulling it off the wire, and keep looping; when the D's stop, the next message should be CommandComplete, and after that we get ReadyForQuery again.

To read a DataRow: it's the same format as always, and we get the length, but that's not really necessary here, because once the DataRow comes in, we know how many columns there will be, and each column carries its own size. The overall length is redundant; you have to read it off the wire, but I just drop it on the floor at that point. The Slice(UInt8)? thing is really the most Crystal-specific piece here, and all it says is I'm making an array of byte slices that may or may not be nil, so each column is either a slice of bytes or just nil. Since I give it the number of columns up front, it premakes the array at that size, so I don't have to do any array-growing; it's allocated in one chunk and populated in place. Then, for each column, it's the column size and then the data, the next size and the data, and so on, and if a size happens to be negative one, that means the result was NULL in SQL, so we return nil; otherwise we read that many bytes and move forward. And that really is all there is to reading data from Postgres. This part is a little tangled up with Crystal-specific stuff, so if there are any questions here, I'm happy to go into more detail. Does that make sense? Okay, good.
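Sketched out, that DataRow parsing is just this:

```crystal
# Each column: an Int32 size, then that many bytes; -1 means SQL NULL.
def parse_data_row(body : Bytes) : Array(Bytes?)
  io = IO::Memory.new(body)
  count = io.read_bytes(Int16, IO::ByteFormat::NetworkEndian)
  Array(Bytes?).new(count) do
    size = io.read_bytes(Int32, IO::ByteFormat::NetworkEndian)
    if size == -1
      nil
    else
      column = Bytes.new(size)
      io.read_fully(column)
      column
    end
  end
end
```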
All right, so that really is all there is to connecting, issuing a query, and reading the data. There are a couple of things that will pop up and surprise you, though, such as the asynchronous frames, and they can come at any point. We've already seen one of them, ParameterStatus, from when we first connected; those can also happen later on. If something changes on the server and one of those parameters changes state, say someone changes the date format to something else, the server sends it unsolicited. It could arrive in the middle of result data, it could arrive while you're trying to send a query; at any time, you can get another one of those key/value pairs.

That doesn't happen too frequently, but what does happen is errors. When an error comes in, again we read off the length, and the ErrorResponse is a series of fields: each has a one-byte code saying what kind of information it carries, followed by the text, the description and so on and so forth. There can be a lot of these fields. You've probably seen some of them in your application: severity, message, detail, hint; this is where those come from when you have a Postgres error. But there's a lot more, like the position and the internal position; the position is how psql can place that little caret underneath the right spot in your query. There's a whole bunch of information here that lets you make errors really robust, and especially if you're writing an application and want to distinguish, say, a failed parse from something else, all that information can be found in here. NoticeResponses follow the same exact format; they're more for warnings than full-blown errors, but they're formatted the same way, have all the same fields, and again can come at any time. And then there's the asynchronous message that makes LISTEN/NOTIFY work: it comes in with the process ID of the sending backend, then the channel name you're listening on, in case your frontend is listening on more than one channel, and then the actual payload data. So unlike the other ones, which have a whole bunch of fields, for LISTEN/NOTIFY all you get is the channel name and whatever the payload happens to be.

All right, let's put it all together and see how this comes down the wire. We connect, and get an authentication request saying the server wants MD5. We send the answer, get the okay, and then start getting a ton of ParameterStatus messages, which you've seen before, and so on. We get our BackendKeyData with the secret, and that's an example of what one of the secrets might look like, and then we're ready for query. Let's send it SELECT 1+1 AS two, 'hi' AS hello. We get our ParseComplete and BindComplete messages, then the RowDescription for the two columns, with their type OIDs and so on, and you can see neither of them has the table OID set; it's zero, because neither comes from a table, like the example from before. Then we get CommandComplete, and that's where the SELECT 1 tag comes in, with the number of rows; this is what psql prints when you're done, and if you had updated 20 rows, the CommandComplete in there would say UPDATE 20. Then we're ready for query again, we can send the next one, and you can print out a nice little table. If you select something that doesn't exist, you immediately get an ErrorResponse, and here are all the different error fields. The error is at position 8 inside the query, and you also get which file raised the error on the server end, in this case parse_relation.c, if you were curious exactly where in Postgres it decided you have an error, and what line that was on, which is pretty cool. And then we're ready for query again.
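Cracking those error and notice fields apart is the same pattern as everything else; a sketch:

```crystal
# ErrorResponse and NoticeResponse: (field-code byte, C string) pairs
# until a terminating zero byte.
def parse_error_fields(body : Bytes) : Hash(Char, String)
  io = IO::Memory.new(body)
  fields = {} of Char => String
  while (code = io.read_byte.not_nil!) != 0
    fields[code.chr] = io.gets('\0', chomp: true).not_nil!
  end
  fields # 'S' severity, 'M' message, 'D' detail, 'H' hint, 'P' position, 'F' file, 'L' line...
end
```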
There's just a handful of other small things left. I talked a little about query canceling; this is where that secret comes in. What happens is you actually make a brand-new connection, because the backend you want to cancel is busy running the query and you can't talk to it anymore. On the new connection you send a startup packet with protocol version 80,877,102, which is very close to the SSL magic number: split in half, this one reads 1234 and 5678. So this one must have come first, and then when they needed an SSL request they just incremented it by one; that's where the missing 8 went. You send that packet with the process ID you want to cancel and then the secret. The secret is your proof that you're allowed to cancel that backend, because again, this all happens before any authentication; if it didn't require the secret, you could just walk up to anyone's Postgres and start killing queries. You probably want to negotiate SSL first and then send this packet, because otherwise someone could grab it off the wire and replay it over and over, killing queries on that backend until the backend is recycled and gets a new PID and secret. But you don't have to: you can send this packet over a plain connection, because again, SSL is a separate step.
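The cancel packet itself is tiny; a sketch, with the PID and secret saved from BackendKeyData earlier:

```crystal
# CancelRequest goes over a fresh connection: a fixed length of 16, the
# magic "version" 80877102, then the PID and secret from BackendKeyData.
def send_cancel(host : String, port : Int32, pid : Int32, secret : Int32)
  socket = TCPSocket.new(host, port)
  socket.write_bytes(16, IO::ByteFormat::NetworkEndian)
  socket.write_bytes(80877102, IO::ByteFormat::NetworkEndian)
  socket.write_bytes(pid, IO::ByteFormat::NetworkEndian)
  socket.write_bytes(secret, IO::ByteFormat::NetworkEndian)
  socket.close # no reply comes back; the server just acts on it
end
```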
There's COPY mode. This is what pg_dump and other tools use. If you issue a SQL command that enters copy mode, you get a message back: G if it's copy-in, H if it's copy-out. Then you get the length, the overall format, binary or text, the number of columns, and then the per-column formats, and then you just start sending, or receiving, data packets. Unlike the earlier messages, where you had the overall size, then the size of each column, then its data, this is just all the bytes smashed up next to each other, and Postgres, or you as the client, has to take care of parsing that out. That's part of why COPY can be so much more efficient: there isn't all that framing data saying expect this, expect this. It's a lot smaller on the wire, and you don't have to do as much processing. The downside, of course, is that you have to know exactly how it's going to line up. Another interesting little side note: the server always sends one row at a time, in separate data packets, but as a client that's not required. You can split these data messages anywhere, even in the middle of a value, and send as many or as few of them as you want, and Postgres will piece it back together; but when it's sending data to you as the client, it's a little nicer about it and does it in row chunks.

I mentioned my little rant on the binary formats. Strings and numbers and the like are very easy and straightforward, as you'd expect. Numbers are always sent in network byte order, so you'll have to byte-swap them on most architectures, but that's fine. Dates and times are microseconds since midnight, January 1st, 2000. It's not 1970 as you might expect, and I believe the reason is that timestamps were originally stored as floats, so putting the epoch a little into the future meant the precision from roughly the 1990s through the 2010s was higher, getting fuzzier further out. I forget exactly which version switched the default to integer timestamps, but that's not really a problem anymore, so I think it was a good trade-off; it's just interesting that it's 2000 and not 1970.

Numeric was a very difficult one to figure out. I encourage you, if you're interested at all, to go to the project and look at this. You have ndigits, which is not the number of decimal digits but the number of int16s that follow; a thing called a weight; the sign, and not that kind of sine, I typo'd the slide, it's positive, negative, or a special value for not-a-number; dscale, which is just where the decimal point ends up; and then the array of int16s. But it isn't packed into the full int16 range: each int16 only holds 0 through 9999, a base-10,000 digit, and then you have to piece those digits together and make sense of the result on your own.
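As an example of one of those binary decoders, here's a sketch of the timestamp case described above, assuming integer_datetimes is on:

```crystal
# A binary timestamp is an Int64 of microseconds since 2000-01-01,
# in network byte order.
PG_EPOCH = Time.utc(2000, 1, 1)

def decode_timestamp(bytes : Bytes) : Time
  micros = IO::Memory.new(bytes).read_bytes(Int64, IO::ByteFormat::NetworkEndian)
  PG_EPOCH + micros.microseconds
end
```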
A couple more resources: the docs are pretty good, and there's a presentation from 2014 that I relied on heavily when I was writing the driver; I tried to make this talk complementary to the information there, and the PDF is very useful. And finally, one of my former colleagues wrote a project called femebe, in Go. We've seen frontend and backend, so the "me" is middle end: you can plop it in the middle of a connection and start inspecting and manipulating what goes by. He used it to make a project that automatically graphed data about queries just by sitting in the middle and looking at them. So that's an interesting thing. And yeah, thank you very much.

Yeah, the question is what kind of performance difference the binary formats make. For some types it's a wash; strings are represented the same either way. But especially for date/times, because of all the parsing saved on both ends, I don't have actual numbers I can quote you, but it's really big. And that's combined with the fact that all I/O in Crystal is evented and async. And, a little bit more about Crystal, just because I like it so much: it compiles through LLVM, so it's leaning on ten or fifteen years of LLVM optimizations to pull everything together, and it does some really crazy optimizations. So, yeah, it's just really, really fast. I can't give you benchmark numbers, but the combination of the binary formats, the async I/O, and LLVM being an amazing project makes it pretty cool.

Yeah. So the first part of the driver was not too bad, because I just linked against libpq. The data types were the next big chunk of work, and then doing the protocol part was maybe only two months of nights-and-weekends kind of work. Again, because it's only about 770 lines of code, there's not that much to it. There are still a couple of missing parts I don't support, but you can get pretty far pretty quickly, because as you've seen, there are a lot of details, maybe, but the overall protocol is not that complicated: it's mostly a character for the message type, a length, and then some data. And there are a couple of gotchas in there that I hope this talk may help you avoid if you're going to do this, but it's a pretty straightforward thing. No, sorry, I know I've seen someone mention that, but I haven't used it myself, so I can't say; I don't know too much about JDBC, sorry. Thank you. And yeah, no, I've only used Postgres for a long time, so I don't know. All right, yeah, thank you very much.