OK, all right, database networking. So quickly before we jump into this stuff, administrative things. On Monday I sent everyone that has an approved database system for Project 2 a sign-up link. And I realize that looks kind of sketchy, here's this URL I told you to click on, but all it is, basically, is a cryptic key with your user name and your email address. You sign up for an account, and then I should post something past that too. Once you're registered, if you go to your system, there'll be a link at the top that says Edit, and then you'll see the form you can fill out, right? It's not Wikipedia where it's free-form text. There is actually, again, a taxonomy from which you're allowed to select for certain things. And again, if there's an option you want that doesn't exist, please let me know and we can fix it. And for Project 3, again, the status update will be in two weeks on April 3, and the final presentation is at this time here; I don't know where the room is yet. So are there any questions about Project 2 or Project 3 at this point? If you want to discuss Project 3 or even Project 2 with me, please let me know. I know somebody already reached out to Google; they have a call with them this Friday. Impressive. But if other people want introductions to companies, let me know. He already works at Howie V, so you already know everything. Yeah. OK.

All right. So here's where we're at in the semester. This is a high-level overview of the system we've been conceptually building, but we've been starting at the bottom and going to the top. The topics we have covered: the planner stuff we'll talk about next week, the execution engine stuff we've discussed, and the storage layer we've discussed as well. Check boxes. There we go. So today, we're going back to the top of the system, and we're going to talk about the networking layer. We're focusing on the communication between the client and the server. We'll talk a little bit about the distributed system case, where one node talks to another node, or even the storage manager talks to a remote storage service, like an object store. We'll talk a little bit about that. But we're really focused on the client sending queries to the database server. And then the next step, starting next week, we'll get into how we take that SQL query and convert it into a logical plan, a physical plan, and execute it.

OK. All right. So as I said, today's discussion is focused on the networking layer, and we're going to start off talking about the way the client is going to interact with the database system. In a distributed system where the nodes need to communicate with each other, there isn't actually that much literature about what goes on there. Often Google Protocol Buffers are used, or gRPC; we'll talk a little bit about that. But, how to say this, in that world, because the database system itself controls what it's talking to, it's not client to server, it's server to server, so they can be very optimized and rewrite things and swap things out as needed.
Whereas with the client, we're constrained by how legacy applications or legacy libraries are going to interact with the database server. So we're mostly focused on that. Then we'll talk about what this protocol looks like, and we'll finish up with some optimization methods that the paper you guys read talks a little bit about, like the RDMA stuff, but we'll go into more detail about what this actually looks like. This is sort of the cutting-edge stuff, the stuff that's happening now in Linux, like in the last two or three years. RDMA has been around for a while, but other things are newer.

So I don't think I've actually given any demos this semester, because it's hard to show demos of things like compilation. But in the intro class, when we give demos, we've always shown the black screen with the terminal client, psql or the MySQL terminal, where I type SQL queries by hand, hit Enter, and get back some result printed in the terminal. In all of those examples, the results came back as plain text. It's literally like invoking toString on the packets that are coming back and displaying it in the terminal. But in a real application, you couldn't write programs like that, because it'd be horribly inefficient to have to parse the text and figure out what type it actually should be. So today, we're really talking about these database system protocols, or wire protocols, where you're actually transmitting data in binary form for an application to programmatically operate on, not for a human looking at it through a terminal interface.

So there are basically three approaches to do this: three categories of libraries, of APIs, that people are going to use. The first is direct access that is database-specific. This will be a proprietary API that the vendor provides, typically as a C library. You could, but you shouldn't, or wouldn't want to, build your application using one of these direct access APIs, because they're really meant for people building drivers for different languages, and the API is very specific to that database system. So here's the documentation for the Postgres libpq library, the C API, and here's the one for MySQL. You can kind of see it a little bit here: everything's very Postgres-specific. You couldn't write this for Postgres and immediately port it to work with MySQL if you're using this API. If you want to know what these actually look like, you can ask ChatGPT to write you a C program, and it actually works. I didn't try to compile it, but it looked correct. And again, you can see it's invoking the libpq library for all these things. So these exist; every vendor is going to have their own thing, and again, it's typically for C.

What I want to talk about instead, and spend more time on, is ODBC and JDBC, because this is basically how most people implement applications today. Now, if you're writing in Python or other programming languages, oftentimes they'll have native implementations that speak the wire protocol of the database system in that specific language, or they invoke down into the direct access API that's written in C.
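To make the direct-access idea concrete, here's a minimal sketch of what a libpq client looks like. This is not the example from the slides; the connection string, table, and columns are placeholders, and a real program would check more error cases.

```c
#include <stdio.h>
#include <libpq-fe.h>   /* Postgres' direct-access C API */

int main(void) {
    /* Placeholder connection string; with PQexec, results come back as text. */
    PGconn *conn = PQconnectdb("host=localhost dbname=test user=app");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    PGresult *res = PQexec(conn, "SELECT id, name FROM users");
    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        for (int row = 0; row < PQntuples(res); row++) {
            /* PQgetvalue returns a C string: the text encoding discussed later. */
            printf("%s %s\n", PQgetvalue(res, row, 0), PQgetvalue(res, row, 1));
        }
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}
```

Everything here (PQconnectdb, PQexec, PQgetvalue) is Postgres-only; the MySQL C API is a completely different set of calls, which is exactly the portability problem the standard APIs below were created to solve.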
So again, the idea of these libraries is that it's a standard programming interface, so that if I write my program using JDBC for MySQL, I can easily switch it to Postgres without having to change any of the API calls, in a way that I couldn't with the direct access APIs. Right, and as he points out, you might still have to rewrite the SQL because, as we've said many times, the SQL dialect often is not going to be the same. And there's an even higher abstraction layer you could have above this: if you use an ORM, an Object-Relational Mapper, like Django, Ruby on Rails, or Sequelize for Node.js, these things abstract away the SQL and the database entirely, because you write to it through an object interface. But underneath the covers, they're eventually going to come down and call one of these, because something has to talk to the database server.

All right, so the first standard API for talking to databases was this thing called ODBC. Prior to this, everyone had their own proprietary programming libraries. There was an attempt in the 1980s to build a standard API, because people recognized the problem that database applications were not portable, since everyone was writing to the direct APIs. There was something Sybase was pushing called DB-Library, something real generic; that didn't go anywhere. Microsoft realized this was a problem, formed a consortium with another company called Simba Technologies, and started the ODBC standard. And pretty much today, every single major relational database system, even the NoSQL guys, is going to have some version of ODBC that they support. Even NoSQL systems like Mongo that don't support SQL will support the ODBC API, because instead of making SQL calls, you make their JavaScript query calls.

So again, it basically looks like this. ODBC uses a programming model called the device driver model, where the idea is that the vendor provides the driver that has all the logic to convert whatever commands you make in the application and then speak to the database system. Like if you buy a piece of hardware, you get a device driver for your kernel that knows how to let the kernel talk to the hardware; it's sort of the same thing. So the application writes code against the standard ODBC API, and then the ODBC driver converts those calls into requests that go over the network to the database system. The database system runs the query, runs whatever you wanted to do, you get back a result, and then this thing marshals it back into the correct format. So if the ODBC API specifies things like 32-bit integers, but your database system stores everything as 64-bit integers, the ODBC driver is responsible for converting that back to conform to the API. The standard ODBC formats are all C/C++ types, which is not that far-fetched, because most systems are implemented in C/C++. Less so now, but back in the 1990s that was certainly the case, and most applications in the Windows world in the early 1990s were written in C or C++. Things are much different now, but back then that was the hot language; that was the language everyone used.
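Here's a rough sketch of what that device-driver model looks like from the application side, using the standard ODBC C calls. The DSN, credentials, table, and column types are placeholders; the point is that the application only sees ODBC's C types like SQL_C_SLONG and SQL_C_CHAR, and the driver does the marshaling from whatever the server actually sends.

```c
#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

int main(void) {
    SQLHENV env;
    SQLHDBC dbc;
    SQLHSTMT stmt;

    /* Standard ODBC setup: environment, connection, statement handles. */
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    /* The DSN points at a vendor-provided driver; swapping DBMSs means swapping the DSN. */
    SQLDriverConnect(dbc, NULL, (SQLCHAR *)"DSN=mydb;UID=app;PWD=secret;", SQL_NTS,
                     NULL, 0, NULL, SQL_DRIVER_NOPROMPT);

    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    SQLExecDirect(stmt, (SQLCHAR *)"SELECT id, name FROM users", SQL_NTS);

    /* Bind output buffers in ODBC's C types; the driver converts the wire format into these. */
    SQLINTEGER id;
    SQLCHAR name[64];
    SQLLEN id_len, name_len;
    SQLBindCol(stmt, 1, SQL_C_SLONG, &id, 0, &id_len);
    SQLBindCol(stmt, 2, SQL_C_CHAR, name, sizeof(name), &name_len);

    while (SQL_SUCCEEDED(SQLFetch(stmt))) {
        printf("%d %s\n", (int)id, name);
    }

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}
```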
So what's interesting about this approach is that the driver can emulate some features that the API specifies on the driver side, the client side, that the server may not actually implement itself, like cursors. The idea of a cursor is that you send a request to open a cursor on the database system, it spools up the results, and you call get-next and fetch things in batches. If your system doesn't support that, the driver can just pull back the entire query result, hold all of it, and emulate the cursor API. So it's kind of interesting what you can do with these things. The part we care about in this class is this piece right here going back and forth, and I'm going to refer to this as the wire protocol or the network protocol of the database system.

So again, ODBC was early 1990s from Microsoft. Then Sun Microsystems, in the 1990s when they came out with Java, recognized that they needed a way for all these Java applications to interact with database systems. So they proposed this thing called JDBC, the Java Database Connectivity libraries. This is basically like ODBC, but instead of being for C or C++, it's for Java applications. Back in the day when Java first came out in the mid-90s, this was the Rust of the 90s; this was the hot thing. I'm serious, right? Every five or ten years there's another hot programming language. Go was the hot thing maybe 10 years ago because Google invented it. Rust is the hot thing now. In the 1990s it was Java; late 80s, early 90s, object-oriented programming languages were the hot thing. And then there were object-oriented databases. That was a big thing where people said, oh, you don't want to write SQL; you want to write object-oriented programs that talk directly to the database system. They're basically the same thing as the document JSON/XML databases now. Yeah. Have you ever heard of Versant? O2? No, right? These were the hot database systems that didn't go anywhere, because the relational model crushed them, rightfully so.

All right. So again, back then, Java came out and everyone said this is groundbreaking, everyone wants to use this. The idea that you write code once and it runs on the JVM anywhere, that was a big deal. But in the very beginning, there weren't going to be native Java implementations for all these different database systems you'd connect to using JDBC. So they had to have a way to bridge into ODBC as a stop-gap solution to get started, and then eventually people would implement the native Java versions. So there are actually four ways to do this in JDBC. The first one is, as I was saying, where you don't have a native Java implementation, but you do have an ODBC driver: a little piece of code converts the JDBC calls into ODBC calls, using JNI to talk to ODBC, and lets ODBC talk to the database system. The second one uses JNI to call the database system's own C library directly.
The third one is where you have a whole separate piece of middleware, a separate process running, that knows how to take in JDBC requests and then uses ODBC or whatever the native API is to talk to the database system; there's an extra hop to do that conversion. And the last one is the best-case scenario, where the driver speaks the wire protocol itself and is written entirely in Java. For most of the major systems, when you download the jar file for the driver, it's going to be a native Java implementation of the wire protocol. This is the fastest because there's less copying between the different layers. That top one, the JDBC-ODBC bridge, was deprecated and removed in 2014. The pure native driver is the best approach, and not just for Java: if you're doing this in Python or other languages, a native implementation of the wire protocol is the best approach as well.

So from the database system's perspective, it doesn't know whether you're talking to it through ODBC or JDBC or directly with the vendor API. It has a standard wire protocol, and it operates on that. All the major database systems have their own proprietary client wire protocol that goes over TCP/IP. If the database system is running on the same machine as the application, some systems support Unix domain sockets; I don't know what the equivalent is in Windows, I'm sure they have something. Basically, instead of going through the loopback device, down through TCP/IP in the OS and back up, you can do direct IPC through a socket on the box and get faster communication; see the sketch after this paragraph. Postgres will figure this out for you and run more quickly this way. I don't know of any database system that uses UDP for client communication. There's one system I know of, Yellowbrick, that uses UDP for the nodes to talk to each other, but for the client talking to the server, everything is done over TCP/IP. And in the paper you guys read, I think it was DB2 and Oracle that send confirmation messages when you get results back, like, I got this, give me more. That's basically what TCP is already doing under the covers, because you're getting ACKs that you got stuff. So some of the databases have redundant acknowledgment messages on top of TCP/IP. I'm sure there's some reason why that's the case, maybe again from the 80s or 90s; I don't know.

So the typical way the client and the database server interact is that the client wants to talk to the database system, so it connects to it and begins the authentication process; sometimes that runs over SSL or TLS. I think I should have posted it: someone did a survey of Postgres databases in the wild where only 20% of them are using SSL or TLS, which is surprising; I would have thought it'd be higher. So we get through the authentication process, then we send a query. The database executes the query, and then it needs to serialize the results into the return set and send that back to the client over the wire protocol. The part we care about in this lecture is this piece here. This is the part that's going to eat up most of the time, and there are different design decisions that are going to affect the performance of the system when we do this.
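Going back to the local-connection point above, here's a small sketch of the difference at the socket level. The socket path and port are placeholders (Postgres, for example, creates a domain socket like /tmp/.s.PGSQL.5432); the point is that AF_UNIX stays inside the kernel as IPC, while the loopback connection still runs through the whole TCP/IP stack.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Local IPC: connect to the server's Unix domain socket. */
int connect_unix(const char *path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) { close(fd); return -1; }
    return fd;
}

/* Same connection over TCP loopback: traffic goes down and back up the kernel's TCP/IP stack. */
int connect_tcp_loopback(int port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(port) };
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) { close(fd); return -1; }
    return fd;
}
```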
And all the major database systems are going to have their own wire protocols that are not compatible with each other. Again, this is what ODBC and JDBC are trying to solve: the API is the same even though the wire protocol for each system is different, and the driver takes care of all the marshaling as needed. I'll also say that sending the SQL request usually is not the most expensive part. The most expensive part, especially in the OLAP world we're talking about today, is obviously executing the query, but we've covered how to speed that up, and that's why we're focusing on the result transfer today. Most queries are a couple of kilobytes. The largest SQL queries I've ever heard of were actually at Google and one other place; they said they had 10 megabyte SQL queries, where the string of the SQL statement itself was 10 megabytes. It's obviously not written by a human; it's someone clicking options on a bunch of dashboards, and then the program generates the query and runs it. It's usually these massive IN clauses. But there's not much you can do to optimize sending that over, other than compressing it, because it's just a string.

So as I said, all major vendors have their own proprietary protocol that does this step here. The one thing that is new in the last 10 years or so is that a lot of new systems are choosing not to implement their own wire protocol; they just use an existing one. The MySQL protocol is probably the most common, then Postgres, and I'll see if you want to guess what the last one is. The idea here is that, rather than spending your engineering effort (again, if you're an early-stage startup or a research project like we were) implementing your own wire protocol and then implementing your own client drivers for whatever programming languages are out there, if you just implement the Postgres protocol, you get the Postgres ecosystem for free, at least for client libraries. Now all the drivers for any programming language are free, and a bunch of the tools out there that connect to Postgres and do stuff, you get all that for free if you implement the wire protocol. You have to implement some other stuff too, like the catalogs, obviously the SQL dialect, and any other additional functionality, but if you want to make your system smell like Postgres, the first step is usually speaking the wire protocol.

All right, so I said MySQL is probably the most common wire protocol that others implement, Postgres is probably the second most common, and I want you to guess what the third one is; it's not in the paper you guys read. No guesses? SQLite? SQLite doesn't have a wire protocol; it's in-process. No, it's not in the paper. Yeah, it's Redis. There are more logos I forgot to put here. So this is from dbdb.io; there's an option where you click on compatibility, and I keep track of what systems are compatible with Postgres, MySQL, and other things. I probably should split it up by whether it's SQL compatible, like the dialect, versus the wire protocol; we can do that later. There's a bunch for MySQL, but a lot of these are forks of these systems: Greenplum is a fork of Postgres, Vertica was a fork of Postgres, Redshift was a fork of Postgres, so those systems get the wire protocol for free because they're forks of it.
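If you do re-implement the Postgres wire protocol yourself rather than forking, the framing itself is simple; the hard part is all the message types and the handshake. Here's a hedged sketch of just the simple-query ('Q') message from the v3 protocol, assuming you already have a connected, authenticated socket.

```c
#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Frame and send a Postgres v3 simple-query message:
 *   1 byte   message type 'Q'
 *   int32    length (big-endian), counting itself and the payload but not the type byte
 *   bytes    the SQL string, null-terminated
 */
static int send_simple_query(int fd, const char *sql) {
    uint32_t payload_len = (uint32_t)strlen(sql) + 1;   /* SQL plus null terminator */
    uint32_t len = htonl(payload_len + 4);               /* plus the length field itself */
    char type = 'Q';
    if (write(fd, &type, 1) != 1) return -1;
    if (write(fd, &len, 4) != 4) return -1;
    if (write(fd, sql, payload_len) != (ssize_t)payload_len) return -1;
    return 0;
}
```

A real implementation also has to do the startup and authentication exchange before this, and then parse the RowDescription and DataRow messages coming back, which is where most of the work is.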
Systems like Materialize, Umbra, CockroachDB, HyPer, and our own systems NoisePage and Peloton really looked at the spec and re-implemented the Postgres wire protocol from scratch. The Redis one I'm curious about; I think it's a pretty simple, plain-text API with simple commands like GET and SET and things like that. Except for Apache Geode, which came out of, I think, VMware, and KeyDB, which is in Canada, these are all Chinese Redis clones, which I find interesting; I don't know why. Anyway, you know what I'm saying: if you're going to build a new system today, this is the right approach. And the Postgres wire protocol kind of sucks; it's very chatty, you have to do two modes, the text mode for the terminal and the binary mode for JDBC and ODBC, and it's pretty inefficient, as you saw in the paper. But because you don't have to implement the drivers, I think starting with this is the right approach. And again, you see some systems will lop off the top half of Postgres: YugabyteDB took the top half of Postgres, ripped it out, and threw away the bottom half, and that way they get the wire protocol for free.

All right, so today we're going to focus on this paper. What I love about this paper is that it's basically the only one on this problem. I thought about this problem earlier when we were building our system, but they're the only ones who wrote it, and there haven't really been any deep follow-ups to this survey: here are all these wire protocols, these different systems doing things differently, let's understand what's going on. That's actually the kind of paper I like to write, where you see some functionality, some piece of a database system, where everybody's doing something completely different, there's not really a good justification for why they're making certain design decisions, and someone comes and does a survey: here are all the different approaches, here are the pros and cons of each. To me, that's a good contribution to the scientific community, and that's why I really like this paper. We have a similar one that we wrote, which actually came out of 721, where we looked at how different systems were implementing MVCC. Everybody was doing something different, so we measured all of them in a single system to understand what was going on, and we published a paper on it.

So this paper came out of the CWI guys in the Netherlands. This was before DuckDB; at the time they wrote this paper, they were working on a project called MonetDBLite. Then they threw that code away, and it became DuckDB. So the guys who wrote this paper are the DuckDB authors, Hannes Mühleisen and Mark Raasveldt. The paper is focused on doing large data exports, which is not quite typical of the applications we usually think about with database systems. We think about running a query: even if it's OLTP, you get a small number of results; if it's OLAP, maybe you scan a lot of data, but it gets aggregated and the result is small. In their world, they were thinking about how people actually use database systems for data science applications like pandas, where you basically want to do a bulk export of an entire table, get it into your Python application, and then let pandas or whatever do the computation on it.
So that's the problem they were focused on: figuring out what the overhead of doing a bulk export over the wire protocol is for these different systems. Some of the cloud vendors will let you export data into something like a CSV that gets written into object storage, and if you already have Parquet files, you just download those. This is really about data residing in a database system: how do you get it out quickly? I would also say that for these optimizations, anything they come up with, they're going to list a bunch of things you could do to make things better, but anything we say is the right way or the better way to do it, we also have to implement in the client driver, because the client driver basically has to undo whatever the server sends it. So even if we come up with a great compression scheme, if it's very expensive to decompress, the driver's got to do that. And we're also coexisting with existing drivers. We can't just say, this is a magic new feature we have in our database system and everyone gets it, because if the driver doesn't support it, then no one can use it. Typically, when you connect to the database server, there's a handshake where you say, okay, I'm on this version, I have these features, I can handle these things; sort of like when you SSH into something, there's a handshake to negotiate which encryption schemes you support. So the database server can say, here are the features I support, and then the driver can decide what it actually wants.

All right, so we'll go through these one by one, but the design decisions are the row versus column layout, compression, data serialization, and string handling. What does this sound like? We're talking about networking, but this sounds like what? Storage on disk. It's basically the same problem. The network, like the disk, is just another storage device, a hardware device that's slow to get data on and off, and we have the same trade-off of deciding how much CPU overhead we want to spend on some kind of optimization versus the time it takes to actually do the data transfer. It's the same thing we saw when we talked about reading data from disk.

All right, so the first question is how the client is going to retrieve and process the data it receives from the database server. ODBC and JDBC are inherently row-oriented APIs, because they were developed in the 1990s when column stores weren't really a thing; I'm sorry, they existed conceptually, but there weren't actual systems other than Sybase IQ, which came out later in the 90s. There weren't column-store database systems; everybody was writing applications that processed things row by row, just like the database system did. So your application, of course, would process things row by row. Most of these applications at the time were business-oriented or OLTP applications, where, again, they were not doing the hardcore analytics and machine learning stuff we do today. And even if it was an analytical query, you're doing aggregation and getting back a smaller result. So it made sense to have an interface like this, and this example is JDBC,
where I run a query, get a result set, iterate it row by row, and can access all the columns within each row and do something with them; this is how everyone writes this kind of code. But in modern analytics software, where we want to get the data out and put it into pandas or data frames or whatever, going row by row is going to be inefficient. So they talk about how you could switch to a pure columnar API. Again, this is not real JDBC code, this is pseudocode, where you iterate over every single column, and within each column do something for each row, or grab the column in bulk and put it somewhere. But this would be inefficient as well, because with this API you can't do anything with any of the rows until you've processed each column one by one. And so the solution is the same thing we saw when we talked about storage models: it's basically PAX. Instead of a pure columnar or pure row format, you do a hybrid approach, what they call a vector-oriented approach, where you get batches of tuples that are laid out in columnar format within the batch. So you can export that data and put it into a matrix as a single column, as a stride or whatever, but you don't get the whole column at once, you get some portion of it. And, as I'll talk about in a second, this is what Apache Arrow does with ADBC, the interface that's meant for transferring Arrow data, which is a PAX layout; this is what they do now. JDBC doesn't do this, but you can use Arrow's API to do network transfer like this. Of course, when the paper was written in 2017, the Arrow stuff wasn't out yet.

The next design decision is whether to do compression. As the paper lays out, there are basically two approaches, just like with storage on disk. The first one is what I'll call naive compression, where you use some general-purpose compression algorithm or library (gzip, Snappy, Zstandard, whatever), and before we package up the message with the results to put on the wire, we just compress the payload and then wrap a header around it that's specific to the database system. There are very, very few systems that actually support this. I forget whether the paper calls out Oracle; I know they measure MySQL. Oracle has had this since 2013; it's called Advanced Network Compression. MySQL does this as an option; you have to turn it on if you want it, but you can get gzip compression for results. There's a patch for Postgres that somebody in Russia submitted in 2018 for libpq, but as far as I can tell, that patch is dead; it didn't go anywhere. You can hack around this: you can tunnel your database traffic through an SSH tunnel and compress that traffic if you want, but that seems kind of flaky. I don't think anybody does that, and I don't see why you'd want to, because now it's an extra hop. The alternative to naive compression is, just like before, the more specialized, column-specific encodings: dictionary encoding, RLE, bit packing, delta encoding, all those techniques we talked about before, where the database system looks at the result set and figures out the best compression scheme for each individual column.
And of course nobody does this, because, again, you'd have to not only implement it on the server side to compress the data as it goes out, you'd also have to implement it on the driver side to be able to decompress it. And you're assuming the client is going to have enough horsepower for this. I don't really talk about it in this lecture, but a very common setup now is these serverless functions, these Lambda functions, for interacting with database servers: spinning up a little Python function that immediately connects to the database server, gets some results, does some processing, and then goes away. In that environment the client side is not very beefy, so you don't want to be spending a lot of cycles doing heavyweight decompression. So the paper says that the naive approach is going to be better than the column-specific one: from an engineering perspective, and also considering the cost of figuring out how to decompress things on the client side, the naive approach is pretty straightforward to implement. And it's more useful as the network gets slower. There's a trade-off between the computational cost of doing the compression and decompression and the network transfer time: as the network gets slower, you're willing to pay more CPU overhead to reduce the transfer time. But as I said, only two systems that I know of do this. SQL Server doesn't do this, Teradata doesn't do this, and I couldn't find anything for DB2. And both of the systems that do support it have it turned off by default.

All right, so now the question is, how do we actually encode our data? What I liked about the paper, and I'm not going to go through it all, is that they went through every single database system and said, here's what a packet looks like; they talked about how to handle strings, how to encode the length of strings, how to encode null values and integers and so forth. I just want to go over these design decisions at a high level, but again, I liked that the paper went into that much detail about the different systems.

The first approach is binary encoding. This is where the database system encodes the data it's sending back for the result tuples into a binary format, think IEEE 754 floats or native C/C++ integer types, and sends it over. The client has to handle endianness: if the server is little-endian and packs the bytes of an integer in little-endian order, but the client is big-endian, the client is responsible for doing that conversion. Most of you are running on x86 or ARM, and those are little-endian, so this is not really an issue, but it is something you'd have to check in the client driver. I would also say that in this approach, the closer the wire protocol's binary format is to the native storage format of the database system, the less work it takes to put data into packets. We'll see in a second with strings: depending on how you encode your strings, are they null-terminated, or do you have the length followed by the bytes?
If the network protocol expects the data to be laid out a certain way and that's different from how the server natively stores it, then you have to do a copy and convert it, and there's additional overhead. We had this problem when we were building Peloton: we were storing strings, I think, as a length followed by the bytes, but the wire protocol, I think, wanted null-terminated strings, so you have to do a copy and convert it, which is not cheap.

The other thing you have to keep track of is the metadata specifying the format, the schema, of the data you're sending back. If you roll your own protocol, you, the database server, are responsible for encoding, say, I have this many columns and they're these types and these sizes; all of that has to go in the header, and you have to make sure everything is correct. An alternative approach is to use one of the existing serialization libraries, like Google Protocol Buffers or Facebook Thrift, and then there's FlatBuffers, which I think is the newer, zero-copy take on Protocol Buffers. In these APIs there's a DSL: you define what the schema of the message you're sending looks like, and then they transpile that schema into actual code that handles the data. Now, you wouldn't do this for every single message you're sending back with different types, but it's one way to specify what the API, or your messages, actually look like. So again, we can use ChatGPT to make these things. You can ask it to make a Protocol Buffer specification for a table that has a name and a timestamp field, and then here's one for Thrift. What's interesting is I didn't specify what language I wanted for either of these, and one gave me Java and the other was in Swift. But again, this is just the spec; you then transpile it using their compiler and it spits out Java code, Python code, whatever you want, it spits out the API. But this is just the wrapper around the message; you still need the internal encoding of the data you're sending, which can be different from one query to the next. The other thing is that this can help keep track of protocol versions: if you release new versions of the protocol, these tools keep track of that for you. Thrift has a bunch of other stuff too, like thread pools and RPC handlers. We looked at it for a while when building our own system and decided not to use it, because it brought too much machinery we didn't want. And I'm also asked sometimes, okay, with protocol buffers and these things, why would I even want to define my own data format for storage plus a separate one for the network protocol; can't it just be the same thing? As far as I knew, nobody did that, until I found this year that there's a database called profaneDB that natively stores protocol buffers and uses gRPC to send those protocol buffers directly out. Somebody emailed me about it, so there it is. So the other alternative is to do text encoding.
The idea here is that the database system can store all the data natively in a binary format, but when it goes over the wire protocol, we always convert it into strings, and the client driver is responsible for doing the reverse, using something like atoi, the C function, to convert the text back into whatever binary representation ODBC or JDBC needs. This seems crazy, right? It seems like it would be super inefficient, and it is, from a storage and computational standpoint, but it does make a bunch of stuff a lot easier: you don't have to worry about any of the layout or endianness issues. Just to give you an idea of the storage overhead: say I want to send the 32-bit integer 123456. In its native binary form, like a C/C++ int, that's four bytes. But if I convert it to a string and send that over the wire, I need six bytes for the individual characters, assuming ASCII encoding, plus additional metadata to say what the size of the string is, or a null terminator. Postgres will do this: with the text version of the protocol, they do this text conversion, and the paper also talks about MonetDB; MonetDB does this by default. I forget in the MonetDB case why; they said it was for some historical reasons. But I think if you're building a new system today, you'd want to do binary encoding, not text, because it's more efficient and you're going to get better performance; there's just less copying.

For representing strings, this is the same thing we talked about when we covered storing varchars on disk. The first approach is null termination, like in C, in libc, where the last character is the null character, and when the client is scanning the packet trying to pull out the full value of a string attribute, once it sees the null terminator, it knows it has the whole value and it's done. Again, assuming ASCII encoding, we need an extra byte to store this at the end. The way most systems do it is length prefixes, where we just have something at the front of the varchar saying, here's the length of the string, followed by the bytes. And the last approach is fixed-width encoding, which I think one of the systems, either MonetDB or MySQL, did, where no matter what the actual value of the attribute is, you always make the payload be whatever the max size of the attribute is. Think of it like the difference between varchar and char: with a varchar(32), if I store one character, the system stores the length followed by that one character; with a char(32), no matter what string I put in there, it always allocates 32 characters. Same thing for this approach here. The paper talks about how this is obviously going to be the best and the fastest if your strings are small, because you're not wasting space storing the size for every single value in every single row, but if it's a char(1024) and I'm only storing one-character strings, then I'm allocating a lot of space that's wasted.
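To make the size arithmetic and the three string layouts concrete, here's a small illustrative C sketch. The value and the varchar(8) field are made up; this is just a picture of the byte layouts being discussed.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Binary encoding: the 32-bit integer 123456 is always 4 bytes on the wire. */
    int32_t v = 123456;
    printf("binary int: %zu bytes\n", sizeof(v));                     /* 4 */

    /* Text encoding: the same value as ASCII digits needs 6 bytes plus framing. */
    char text[16];
    int n = snprintf(text, sizeof(text), "%d", v);
    printf("text int: %d bytes + length or terminator\n", n);          /* 6 + metadata */

    /* Three ways to put the string "abc" into a varchar(8) field of a result packet: */
    char null_term[]  = { 'a', 'b', 'c', '\0' };                        /* null-terminated: scan for \0  */
    char len_prefix[] = { 3, 'a', 'b', 'c' };                           /* length prefix: jump by length */
    char fixed[8]     = { 'a', 'b', 'c', ' ', ' ', ' ', ' ', ' ' };     /* fixed-width: always max size  */
    (void)null_term; (void)len_prefix; (void)fixed;
    return 0;
}
```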
A naive compression scheme like gzip or Snappy will go to town on all that padding and compress it down, because it sees long runs of repeated values, but again, most databases aren't using one of those compression schemes. So fixed-width will be the fastest if your strings are small, but in general the length-prefix approach is going to be faster, because there's less scanning to find the end of each value. And again, if your database system stores strings one way and the wire protocol wants the other, you have to copy and convert.

I would also say that the performance of these different schemes is not independent of the others. If we're doing padding, we'll have a lot of spaces, and then if we turn on compression, that'll work out great. If we're not doing compression, then one of the other encodings is actually going to be better. And it depends on what the data is as well, and what the queries are returning.

So I'm going to show two graphs. They talk about a vectorized approach they could build, and they have some results on that; I'm going to ignore it for now, because I want to talk about what actually exists in the real world today. And I'll talk a little bit about the ADBC stuff from Apache Arrow; that's still very, very new, I think it came out last year, so there aren't many systems that actually support it just yet. There hasn't been a scientific study comparing the vectorized approach with Apache Arrow against the existing protocols, at least not that I'm aware of.

All right, so in the first experiment, they transfer one tuple from a TPC-H table, and they just want to measure how long that takes. They compare MySQL with and without compression and then a bunch of these other database systems. I think the paper calls one of them DBMS X; that's usually Oracle, right? We can take that offline. All right, so here's the chart. The first thing to point out is that all of these protocols except MonetDB are doing binary encoding; only MonetDB is doing text encoding, and yet it's still outperforming all these other protocols. The paper talks about how MongoDB, because it's returning JSON, or rather returning documents, has to embed all of the schema information in every single record it sends. For getting back one row, that's not that big a deal, because the other systems have to send their metadata anyway, but it will be a problem when you start getting a lot of data. For Hive, I think the reason they're the slowest is that because it goes over Thrift, the amount of additional metadata they send back for that one tuple is super high. And for DB2, the reason they think it's slower is pure conjecture, they don't know, but they suspect it's because it's doing acknowledgements at the application level in the network protocol: TCP is doing acknowledgements, and the database system is also doing acknowledgements with the client.

So in the next graph, they look at the cost of transferring one million tuples out of a table, which I think is roughly one gigabyte. Along the x-axis, they increase the latency of the network, the cost of sending one packet, from 0.1 milliseconds up to 100 milliseconds. 100 milliseconds is a long time, but that's actually not that bad.
If you're going across data centers in different geographic regions, you can easily get above this. Within a single data center it's a bit high, but it could certainly be the cost of going to something like EBS or S3; EBS is less than that, S3 can be longer. She made a face. All right, so the first comparison is MySQL with and without compression. What you see is that the regular MySQL protocol does much better, almost an order of magnitude better, than the compressed version, up until the network gets slower. Then there's a crossover point, because now that the network is slower, there's a trade-off where I'm willing to pay the CPU cycles to do compression, and that offsets the slower network. It's not a huge difference here, but you can imagine that if this extended out to a thousand milliseconds, the lines would split further apart. So this is basically saying that compression overhead is bad when the network is fast. For most applications, again, if you're running in the same data center as the database system, the network will be fast; if you're going across data centers, then you may want to consider using compression. For all the other systems, they basically end up converging and looking the same. The one to point out, though, is how much worse the Oracle one gets. Oracle does quite well when the network is fast, but as the network gets slower, it ends up up here with Hive and DB2. And again, the paper conjectures that it's because of the acknowledgements they're doing, these confirmation messages between the client and the server; that extra confirmation step makes things much slower on a slow network. The confirmation is not a big deal when you're over here, but when the network is slower, it becomes a bottleneck. Any questions about this? Okay. So again, in my opinion, if you're going to build a new system today, I would start with the Postgres wire protocol, because you get all the libraries for free, and then if your system gets traction, you can rewrite it and switch over. Yeah, sorry. That's because the cost is not the network transfer; the main bottleneck is the compression itself. Gzip is a slow compression algorithm. I mean, it gets slightly slower here as the latency ticks up, but the amount of data, I forget the exact number, it's compressing down like two or three x versus the binary data of everyone else. It does improve a little bit, but it's also a log scale, so you don't see it.

All right, so I want to show one slide about Apache Arrow. I'm not going to go into the details of what it actually is, but in the same way we talked before about Parquet and ORC being these open-source file formats that a bunch of different database systems now support, letting you easily reuse data from one system to another without converting it to a proprietary format, Arrow is basically the same thing, but for in-memory data and network transfers. There's now a standard columnar format for in-memory data, and if your system supports it, you can ingest data from or share data with other systems. So again, going back to the motivation of the paper we read, they were talking about taking data out of a database system and importing it into something like pandas.
And so the wire protocol for each database system would convert the data to whatever format it wanted, and then the Python code had to convert it to the format that pandas wanted. With Apache Arrow, the idea is that your database system can send memory buffers of Arrow-formatted data down to the client. The client doesn't have to do any unmarshaling, any deserialization, or any transformation; it just takes those blocks of data and gives them to pandas or whatever you want, and pandas knows how to operate directly on that data. It's doing zero copy from the server to whatever the application wants. This project was started by a bunch of data companies and people building open-source projects; the original Java implementation came from the Apache Drill people. Apache Drill is the open-source re-implementation of Google Dremel, which we'll cover in a few weeks. But it was also co-founded by Wes McKinney, the guy who invented Python pandas and a friend of the CMU database group. So again, the idea is that if you put your data in this Arrow format, you can hand it off to something else without having to do any transformation.

There are a bunch of other sub-projects within Apache Arrow built around this format. On the wire protocol side, the new one that came out last year is ADBC, the Arrow Database Connectivity library. If your database system supports this API and there's a client driver for it, you can talk to it as if it were ODBC and get back PAX-oriented, columnar data in the Arrow format and then do whatever you want with it. There was a precursor to this called Arrow Flight, or Arrow Flight SQL. My understanding is that Flight SQL was a veneer in front of something like ADBC, where you send it SQL commands and get back Arrow-formatted data, whereas Flight itself is specifically doing gets and puts of Arrow-formatted data. There's a whole other project called DataFusion where they're building an entire execution engine around the Arrow format. The NoisePage system we were building at CMU natively stored things as Apache Arrow. So there are a bunch of systems using this now on the inside. I know Snowflake can export data in the Arrow format for query results; I don't know about import. Some systems use Arrow for communicating internally.

So I can wrap up with a bunch of buzzwords about what Arrow is, and you guys should know what they all are, because we've covered them this semester. It's a PAX-oriented format; you know what that is. It's doing dictionary compression; we know what that is. There's not much of anything else to it. And the way they do dictionary encoding is through pointers, or offsets, into an array of values. So I think this is a good example of how the things we talk about in this course are the bedrock for reasoning about new technologies that come along. Now, when you see something like ADBC, you know what it's doing, you know what Arrow is doing, because you understand the trade-offs, and you can fit it into whatever model or system architecture you're thinking about building or working on. Okay.
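As a rough illustration of what those buzzwords translate to in memory, here's a simplified, hypothetical sketch of an Arrow-style string column and a dictionary-encoded column. This is not the exact Arrow spec, just the general shape: a validity bitmap, an offsets array, one contiguous data buffer, and dictionary indices that point into a separate array of distinct values.

```c
#include <stdint.h>

/* Simplified sketch of an Arrow-style variable-length string column. */
typedef struct {
    int64_t        length;     /* number of rows in this batch             */
    const uint8_t *validity;   /* 1 bit per row: is the value non-null?    */
    const int32_t *offsets;    /* length+1 offsets into the data buffer    */
    const char    *data;       /* all string bytes, stored back to back    */
} StringColumn;

/* Dictionary encoding: per-row indices into a separate array of distinct values. */
typedef struct {
    int64_t        length;
    const int32_t *indices;    /* row i's value is dictionary entry indices[i] */
    StringColumn   dictionary; /* the distinct values, stored once             */
} DictEncodedColumn;

/* Row i's string starts at data + offsets[i] and is offsets[i+1] - offsets[i] bytes long,
 * so a consumer (pandas, another DBMS) can use these buffers in place: zero copy. */
```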
So the paper, again, focused on the idea that these wire protocols are inefficient for this particular application domain we care about. And the paper also talks about what happens when the database server is running on the same box: you still have to go through the wire protocol, even if it's going over domain sockets. So the protocol itself is inefficient, but it's not the only thing that's going to cause problems for us. One of the big challenges we face is the OS. You've heard me say it many times: the OS is our frenemy. The database system needs it to survive, but it's always going to get in the way and screw us up, so we want to avoid it as much as possible. We saw this with storage, where we want to use direct I/O and we don't want to use mmap. It's the same thing for networking: we need something to talk to the hardware to get data in and out for our network protocol, but the TCP/IP stack in Linux is going to be slow. There are going to be context switches, switching to kernel threads to do stuff; it's going to store stuff in its own buffers, which means extra copying; and the kernel takes its own latches on the inside to protect its data structures. So if we're trying to push a lot of data through that hardware, it's going to be slow.

So I want to talk about two categories of approaches for working around or avoiding the operating system to do fast network transfers. The first is kernel bypass methods. Do they teach this in other classes? I don't know. No? Here? Okay, all right. So, sure, yeah. With kernel bypass, the idea is that the database system wants to avoid the OS as much as possible and do as much of the work needed to communicate with the hardware, in particular the networking hardware, directly itself. We want to avoid having the OS maintain state, maintain its own data structures, maintain its own buffers, and instead just do everything ourselves. The reason we have this problem is that Linux is a time-sharing OS: it relies on slow, expensive interrupts to notify the kernel when stuff happens, to wake things up when new packets show up and someone needs to handle them; the threading model in Linux is expensive; there are all these data structures it maintains, with latches on the inside, for scheduling and context switches and so forth; and things just get worse with larger core counts. This has gotten better in recent years, Linux has gotten a lot better, but even then the database system is in a better position to figure out how to use the resources. So the idea of kernel bypass is that there are specialized APIs that let us avoid the OS as much as possible. Now, io_uring still goes through the OS, and we'll see why we want that in a second, but these other ones are more aggressive about bypassing it.

So the first one is Intel's DPDK, the Data Plane Development Kit. This is for networking devices; there's another version called SPDK, the Storage Performance Development Kit, which is the same idea of kernel bypass for storage hardware. Think of this almost as an API that Intel exposes to you to communicate directly with the NIC instead of having to use syscalls.
So they give you libraries that go directly to the NIC, and then you treat it as a raw device that you talk to yourself, which goes against the entire model of Unix. Unix's whole idea was that everything's a file: whether it's a device or an actual file, you interact with it the same way. So this breaks that idea. To make this work, the database system now has to do a bunch of stuff itself. In particular, because we're going directly at the hardware device, we have to implement layers three and four, TCP/IP, in our database system. We have to manage the buffers. When the NIC gets something, we have to get notified of that and put that data somewhere; that's all now our responsibility. The benefit you get, potentially, in theory, is close to zero-copy access to network messages, and the rest of the database system doesn't know that the data came through DPDK; you process it like normal, but you've cut down the OS involvement in getting data in and out. To implement TCP/IP yourself, you could write it from scratch, but there are libraries like F-Stack that basically took the TCP/IP code out of FreeBSD and ported it to userland. We tried using this for one of our projects: Matt was doing research on PgBouncer, the Postgres proxies, and we had a student last semester try to integrate F-Stack into it, and it didn't work. We're not the only ones who have had these kinds of struggles; this is not easy to do.

There are two systems I know of that are actually doing this. There's ScyllaDB; they have this framework called Seastar. They're doing coroutines and a bunch of other stuff, but they rely on DPDK to do kernel bypass for network access. They gave a talk with us, I think a year or two ago during the pandemic, and they told us that DPDK is a nightmare and they want to get rid of it. The other one I'm familiar with is Yellowbrick's YBRPC. I think they're only using this for back-end communication, not between the client and the server; I think they use it to communicate with S3, their object store. So, you read about this and it sounds amazing, like you'd want to use it; but don't just take my word for it that it's a bad idea. This is probably one of my favorite tweets of all time. This guy is referring to SPDK, but the same metaphor applies here. So I don't recommend it. We spent an entire semester with one of my best master's students, and we couldn't get it to work.

All right, so this is not good; we can't use this. The next approach is RDMA, Remote Direct Memory Access. The idea here is that there's an API that allows what I'll call the client, though it could be another server, to interact with the memory of another machine directly, without going through the remote CPU, without going through the OS. You can do reads and writes into the memory of another machine, and the server itself is unaware that you're doing these things. Obviously there's a bunch of bookkeeping needed to set things up: here's the memory region where the data you want to read and write is located. And then something needs to know when things get written; if the other side has to do something, you need a way to notify it and say, go ahead and do something. This might be out of date: in the old versions there weren't callbacks; I don't know if the new versions have them.
I'd have to double-check this. But the two systems I know that support RDMA are Oracle RAC, most famously, which uses it for the buffer pool. It's not a shared-memory system, but they use RDMA to make things look sort of like shared memory across the nodes. And then Microsoft FaRM was an OLTP system out of Microsoft Research where they were using RDMA to do fast transactions. But because it's RDMA, because again you don't get notifications when things get changed, they don't do two-phase commit, they don't do three-phase commit, they do four-phase commit to get everybody synchronized so that transactions can commit. This also requires specialized hardware, like Mellanox InfiniBand gear. I think you can do RDMA over Ethernet now, but again, I don't know of many database systems actually using it.

What I'm more bullish about, and it's very, very new, is io_uring. Who here has heard of io_uring? (Well, he doesn't count. You don't count either.) All right. So io_uring is this new set of system calls that came out in 2019, so about four years ago. It lets you do zero-copy asynchronous I/O in Linux. The way to think about it is that it's like DPDK in that you want to avoid extra copies of data coming off the NIC, but you don't want to give up having the OS do the stuff you don't want to do yourself. When I talked about DPDK, I said you have to write the TCP/IP handling code yourself. Well, the OS already has that; we don't want to re-implement it ourselves. So io_uring is a way to get kind of zero copy, but let the OS still do all that work we don't want to do, right? It first came out in 2019 for accessing storage devices, and then there was a patch added to Linux, I think merged in 2022, that lets it handle network devices too. It's called io_uring in the Linux world; Windows has its own thing called I/O completion ports, which is basically the same idea, but they don't do batching. The way it works is that the OS exposes two ring buffers to you that are basically queues: one where you say, here's work I want done, and another where it tells you, here's work that got done. The difference is that now, when you say, go read from this network device, you pass along the buffer of memory where you want the OS to put your data. You malloc that space yourself, put it in the submission entry, and the OS writes into it. Without that, the OS copies into its own buffers, and then when you call read() or whatever as a regular syscall, it copies into your buffer. So this is removing that extra copy by having the OS write directly into yours. And you can batch things, and there are completion events you can wait on. Anyway, this is the way I think people are going to build database systems going forward. There are some early implementations of this. There are some blog articles; this one is from QuestDB, which is a Java-based time-series system out of the UK. (Sorry, what was that? A joke about it being Java-based. Sure.) They gave a talk with us, I think last year. A lot of the high-frequency trading guys use Java.
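Before we get to those systems, here is a minimal sketch of the submission/completion queue mechanics just described, using the liburing wrapper library. This is my own illustration, not code from QuestDB or the paper; it assumes you already have a connected socket descriptor, and the names and buffer size are made up.

```c
/* Minimal io_uring receive sketch using liburing (illustration only). */
#include <liburing.h>
#include <stdlib.h>

int recv_once(int client_fd) {
    struct io_uring ring;
    char *buf = malloc(4096);                 /* our buffer, not the kernel's */
    if (!buf || io_uring_queue_init(64, &ring, 0) < 0) { free(buf); return -1; }

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);    /* submission queue entry */
    io_uring_prep_recv(sqe, client_fd, buf, 4096, 0);       /* "read from this socket into buf" */
    io_uring_submit(&ring);                                  /* many SQEs can be batched per syscall */

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);           /* completion queue entry shows up when it's done */
    int n = cqe->res;                         /* bytes received, or -errno on failure */
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    free(buf);
    return n;
}
```

The key detail is that the buffer handed to io_uring_prep_recv() is the application's own allocation, so the kernel fills it directly instead of the data taking a detour through a kernel buffer that a later recv() call would copy out of.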
There was a system called LMAX out of the UK that was getting insane throughput numbers even though it ran on the JVM, because they did a bunch of things to avoid the JVM's garbage collector, like keeping memory off-heap. The idea is that this guy in the UK basically took all the techniques used to build high-performance trading applications in fintech and applied them to a database system, even though it's Java. I forget the details, but I think at some point they also cut out parts of the standard Java stack. Anyway, they have a blog article about using io_uring. There's another one from TigerBeetle, which is an OLTP system out of South Africa that is written in Zig, not Rust, sorry. They gave a talk with us last semester. And then there's a blog article from ClickHouse that says, here's what io_uring can do in ClickHouse. But if you go look at the pull request for the io_uring enhancement to ClickHouse, somebody comes back and says they didn't get that big an improvement and it became very, very hard to deal with, and the guy who wrote the code couldn't figure out why there were bugs. So the patch never got merged. (He says they should just use Rust.) The comment was that there's a Rust crate that wraps io_uring and makes it look like a regular call, even though it's doing io_uring stuff with callbacks underneath, I'm assuming. I think there are C++ libraries that do the same thing. I don't know how they perform, but there are abstraction libraries wrapped around this kind of stuff. Anyway, for networking, other than TigerBeetle I don't think anybody else is doing this yet, but going forward I think this is going to be one of the big enhancements people add to their servers. And certainly if you're building a system today from scratch, you should think about it. It's not true kernel bypass, because you're still using the kernel, but you're getting zero-copy asynchronous I/O, which is essentially what you really want.

Okay, so this chart doesn't include io_uring, because it's from a paper we wrote in 2020, before io_uring had network support. It's a paper we wrote with Wes McKinney, the Arrow guy, where in our system we were trying to measure how quickly you can get data out with these different approaches. Higher is better: how many megabytes of data can you move when you're just doing a bulk export. The slowest one here is Postgres, no surprise. Then there's a vectorized version of the Postgres protocol, the kind of thing the DuckDB guys talk about. Here's Arrow Flight, doing a direct export of Arrow data; our system stored Arrow data in memory natively, so it's basically zero copy onto the wire to get it out. And then here's the RDMA version of it. This is why I'm bullish on Arrow Flight, or think of it like an early version of ADBC: you get this without specialized hardware, and also without io_uring or DPDK. That's still pretty good.

All right, so the last thing I want to mention is an alternative to kernel bypass called user bypass. This is what one of my PhD students is working on.
The idea here is similar to io_uring, where we let the OS do some work for us because that work is what the OS was meant to do anyway; it already has the implementation. With user bypass, we take database logic that would normally live up in the user-space code we write, and we embed it inside the kernel, like a kernel module, and let the kernel run whatever database code we need. The way we can do this now is through this thing called eBPF. Show of hands: who here has ever heard of eBPF? Okay, some of you. (These two don't count.) Before eBPF, the way you had to do this was with kernel modules. You write it in C, compile your module and link it in, and then hope you don't hit a kernel panic because you segfaulted or something. Or you modify the kernel itself, add the functionality you need, and recompile your kernel, but nobody is going to do that either. With eBPF, the idea is that, like the JVM was the hot thing in the '90s, eBPF has its own VM that runs inside the kernel. You write your eBPF program in their restricted language, it gets compiled, and then there's a verification process to make sure you don't do stupid things or things you shouldn't be doing. You can't call malloc in your program; you store things in these kernel maps that they provide. They also bound the number of instructions, so you can't run forever and lock the kernel up. Once it gets verified, it gets loaded into the kernel, and at runtime it gets JIT-compiled. So you can take some logic that would normally be up in user code, write it in their language, and have it sit on the path of the data as it goes from the hardware through the OS to the database system. (Yes? "So eBPF is like UDFs for the Linux kernel." Yes, that's a very apt metaphor. Put that one in the cookie jar.) But again, we don't want to do this for everything, because the programming model is quite limited; we can't put a SQL parser in there. And you don't want to do this for any data that actually needs to stick around, because otherwise you're not getting any benefit. What you want to push down is logic that looks at data as it comes off the hardware and then makes a decision on what to do next, where that next step doesn't rely on keeping the data around for a longer period of time. Where we first implemented this was actually for Postgres proxies. A packet shows up at the proxy, you look at the header and say, okay, where is this thing going, and then you immediately shove it back over the network to the database server. For that, you don't want to spend the time to copy it from the kernel up to user space, look at it really quickly, and then send it back down somewhere else. You can just look at the header and shove it back out without ever going into user space. That's much faster. Or think of traversing a B+tree: you bring a page into memory, look at it really quickly, find your divider key, decide which page you want to look at next, and then immediately throw that page away and go get the next one.
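Here is a hypothetical XDP-style sketch in the spirit of that proxy example: peek at the packet header inside the kernel and decide what to do before the data is ever copied up to user space. This is my own illustration, not the code from our proxy work; a real Postgres-protocol proxy would do far more than check for TCP traffic on port 5432, and the program and section names are made up.

```c
/* Hypothetical eBPF/XDP sketch: inspect headers in the kernel (illustration only). */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int peek_pg_traffic(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *end  = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;                               /* verifier demands bounds checks */
    if ((void *)(eth + 1) > end) return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > end || ip->protocol != IPPROTO_TCP) return XDP_PASS;

    struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
    if ((void *)(tcp + 1) > end) return XDP_PASS;

    if (tcp->dest == bpf_htons(5432)) {
        /* A real proxy could rewrite the headers here and return XDP_TX to
         * bounce the packet back out without it ever entering user space. */
    }
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

The restrictions mentioned above show up directly here: no dynamic allocation, explicit bounds checks to satisfy the verifier, and any state you want to keep would have to live in BPF maps.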
Now, this is ignoring the buffer pool and other things like that, but you can imagine the same with the zone map stuff we've talked about for Parquet files. I read the zone map in, and it tells me whether the thing I'm looking for could be in some sequence of bytes after it. If not, I immediately throw that zone map away. If I can do that in eBPF, I don't have to pay for the extra copy up into user space and into the database system. I don't know how this is going to play out against io_uring. We haven't done that experiment yet, and it may be that io_uring is so fast that the overhead of doing this extra stuff doesn't make sense, but we'll see.

All right, so I liked the paper I assigned for the reading; if I didn't like it, I wouldn't have assigned it. In my opinion, this is a good example of a part of the database system that most people overlook, especially in academia. The Postgres wire protocol certainly can be improved in the ways we saw in this paper. The kernel bypass stuff is fantastic, but it requires a lot more bookkeeping and it's a huge pain to actually get working. And user bypass is a new direction, I think, for some parts of the database system, but not all of it. The long-term vision of where I think we're going is this: think of all the things we're doing to avoid the OS. What if we got rid of the OS entirely, or built a unikernel that only runs our database system? I think that's where we're headed. The Germans are working on this, but we'll see whether we can do something here as well. Okay, any questions about this?

All right, so next class we're going back down the stack. We've got our packets, we've got our messages, we've got our SQL queries; now we need to convert them into a physical plan. So for the next week we're going to focus on query optimization, and I'd say this is the hardest part of database systems. The paper I've assigned you to read is a survey paper from the late '90s by one of the top researchers at Microsoft, and it's a rough overview of the challenges in query optimization. Even though it's from the '90s, we still have the same problems today. And as I said last time, if you get lost in this, don't worry: the joke is that if you can't hack it in query optimization, your backup plan can be rocket science, because this stuff is harder than rocket science. People have been trying to do this for 50, 60 years, and it's still terrible. So, again, good luck. And then after that, the following week, we'll discuss cost models, okay? Bye, guys. Have a good weekend. See ya.