Carnegie Mellon University's Advanced Database Systems course is filmed in front of a live studio audience. We're all going to die. Okay. In the meantime, let's do databases.

All right, so today's class we're going to talk about networking protocols, and with this we're finishing up the second third of the semester's material. This week and for the next two weeks we'll talk about query optimization. After that, we'll start reading the papers for the major systems, understanding how they work, and seeing how the things we've talked about this semester get applied by the companies and the people building these various systems.

Okay. So last class was all about how to take user-defined functions that the application developer has written — because they want to embed logic that would normally live in the application directly inside the database system — and invoke them through a query. The idea was that through inlining techniques we can convert the UDF constructs into SQL / relational algebra and then expose that to the query optimizer, so it can figure out what the user-defined function actually intended to do. So that was the example of pushing application logic into the database system.

As I said at the end of last class, today's lecture is about doing the opposite: getting data out of the database system and bringing it over to the application so the application can process it on its own. We'll start off talking about what these different database access APIs look like. Then we'll go into more detail on the network protocols — that's the paper you were assigned to read: what do the bits on the wire actually look like, and why are they inefficient in modern scenarios where a data scientist working in pandas or some Python notebook just wants to do a SELECT *, pull a bunch of data out, and do all the processing on the client side. We'll see how the existing protocols in the major database systems are insufficient — or really just not designed — for that kind of workload. The answer, in the end, is going to be Apache Arrow. The paper you read came out before the Arrow Database Connectivity (ADBC) stuff was defined, but it basically reinvents the same thing, and ADBC and Arrow do the same thing; we'll build up to that.

Then we'll talk about additional optimizations we can do on the server side to make the networking stack (or potentially other parts of the system) run faster, by doing kernel bypass or user-space bypass. And we'll finish up quickly with some additional optimizations we can do on the client side, if we know our Python program, or whatever it is, is talking to a database system and is going to put the data into a data frame.

I'd also say some of the things we'll talk about today are applicable to back-end communication inside your own system — if it's a parallel system and one worker needs to communicate with another worker, or with the optimizer service or the scheduler service.
A lot of what we'll talk about still matters in that environment — certainly the kernel bypass or user-space bypass stuff can help — but we're mostly focused on how we expose data to the client and how to make that as efficient as possible. We'll see when we go through the real-world systems that there are some of these optimizations you can apply in the back end.

All right, so last class I showed a really quick demo of opening up the Postgres terminal, writing a SQL query, hitting Enter, and getting back some results. That's the most basic access method for a database system: you send a SQL query and you get back results that are meant to be printed on the screen, because they're meant to be interpretable by humans. But most queries aren't going to run like that. Most applications want the data in binary form, because it's going to be fed into application code that does additional processing on it. In my terminal example, that's just plain text — and in the case of Postgres, it really is sending plain-text data over the wire back to the client. We'll see one system in particular that does that whether it's talking to an application or a terminal. But most systems do binary data serialization.

So you wouldn't actually write your application by piping out to psql or whatever command-line terminal you use. Instead, you'd write your application using one of these different methods. And these aren't mutually exclusive: depending on your application — maybe it's written in C# or C++ — you'd use one; if it's Python, you'd use another, and so forth. Various systems will support some subset of these. The thing we're really going to care about is the low-level network API — how we're going to put bits on the wire — and all of these methods, except maybe the last one, can hide all of that.

So the first one is a proprietary API that the system exposes to you, typically through a C library. You generally wouldn't write your application against this; it's what you'd use if you were writing a driver for one of the other methods. You can look at the documentation for MySQL and Postgres — they all have information about their low-level C library API: how do you open a connection, how do you send a query, how do you do authentication, and so forth. And you can use ChatGPT to write this kind of stuff — you can say, write me a C program that uses libpq, which is the low-level C API interface you'd use to talk to Postgres (see the sketch below). But again, you typically don't write programs like this; you use some higher-level abstraction, like an ORM — Django in Python, ActiveRecord in Ruby on Rails, Sequelize in Node.js. Underneath the covers those may be calling the C API, but the application programmer isn't writing code against it directly.

So I want to focus now on ODBC and JDBC. JDBC came later, in the mid-90s, but you'll see how things got built up over time. And a lot of what we'll say about JDBC is applicable to whatever your favorite library is.
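To make the libpq point concrete, here's a minimal sketch of what programming against that proprietary C API looks like — hedged: the connection string and table are made up, and error handling is abbreviated:

```c
#include <stdio.h>
#include <libpq-fe.h>   // Postgres's low-level client library

int main(void) {
    // Open a connection (connection string details are illustrative).
    PGconn *conn = PQconnectdb("host=localhost dbname=test user=app");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    // Send a query over the wire and block for the full result set.
    PGresult *res = PQexec(conn, "SELECT id, name FROM foo");
    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        for (int i = 0; i < PQntuples(res); i++) {
            // By default libpq hands every value back as a C string,
            // regardless of its SQL type.
            printf("%s | %s\n", PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
        }
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}
```

Note that by default libpq hands values back as text; whether the wire itself carries text or binary is exactly the protocol question we'll get to.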
Pick your favorite programming language that has a database connectivity specification, and it basically follows the same ideas. The big idea of what these APIs do for us is that, in theory, instead of programming against a low-level vendor C API, we program against a technically database-system-agnostic API. Then, if we decide to change which database server or database system we're using, we don't have to change any of our application code. Of course, that's not entirely true if you're writing raw SQL, because as we've said many times, the SQL dialect can differ from one system to the next — but we can ignore that.

The history here goes back to the late 80s and early 90s. Prior to something like JDBC, there were just the C libraries that the various database vendors provided. Things weren't portable: you were writing to a low-level API that was specific to the one database system you were using. People identified early on that, since everyone was writing lots of applications, it would be nice to have a standard way to do database connectivity — send queries, get back results. I think the first attempt was in the late 80s, from Sybase: they had something generically called DB-Library that was meant to be an open standard everyone could implement. I don't know if it was actually open source, but it was meant to be the standard. That didn't go anywhere. Then Microsoft teamed up with a company called Simba Technologies in the early 1990s and put forth ODBC. Now pretty much every database system you can think of has an ODBC implementation — even systems that aren't relational and don't support SQL. MongoDB has an ODBC implementation, because at some point you just have to put in the query command you want to send over, and ODBC doesn't know or care whether it's SQL; it's just: here's the thing I need to send to the server. The API has calls to iterate over result sets, bind values to parameters, and so forth.

At a high level it looks like this. ODBC is based on the device-driver model, similar to how hardware works in PCs: if you buy a graphics card, the vendor also provides a driver you install in your OS to communicate with the hardware. Same idea here — the database system vendor is responsible for providing a driver, written against the ODBC spec, that you use to communicate with the database server. If the application wants to run queries, it goes through the ODBC driver; the driver sends the request to the database server, gets back the result, and marshals it into the form required by the ODBC spec before exposing it to your application. That can mean things like: if my client expects everything to be 32-bit integers but the database server sends back 64-bit integers, the driver is responsible for converting that and cleaning things up.
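To preview what that standardized call pattern looks like in code, here's a hedged ODBC sketch — the DSN, credentials, and query are made up, and return codes are mostly unchecked:

```c
#include <sql.h>
#include <sqlext.h>
#include <stdio.h>

int main(void) {
    SQLHENV env;  SQLHDBC dbc;  SQLHSTMT stmt;

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    // The DSN names which vendor driver to load; the application code
    // below stays the same no matter which DBMS is on the other end.
    SQLConnect(dbc, (SQLCHAR *)"MyDSN", SQL_NTS,
               (SQLCHAR *)"user", SQL_NTS, (SQLCHAR *)"pass", SQL_NTS);

    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    // The SQL string itself is opaque to ODBC -- it just ships it over.
    SQLExecDirect(stmt, (SQLCHAR *)"SELECT name, qty FROM orders", SQL_NTS);

    while (SQLFetch(stmt) == SQL_SUCCESS) {          // iterate row by row
        SQLINTEGER qty;  SQLLEN ind;
        // "Give me the second attribute, and I want it as an integer":
        // the driver converts from whatever the server sent on the wire.
        SQLGetData(stmt, 2, SQL_C_SLONG, &qty, 0, &ind);
        printf("qty = %d\n", (int)qty);
    }

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}
```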
The driver can also emulate features that are in the ODBC spec but that the database system itself doesn't support. Cursors, for example: if the system doesn't support true scrollable cursors, the driver can emulate them — send the query over, hand you back a cursor, and cache the results on the client side. So you can do a bunch of stuff in the driver.

The thing we care about today is this piece here: the request going out and the response coming back. We're going to call this the wire protocol — the network protocol of the database system. That's what we'll focus on. Yes? So the question is: if I have a SQL query, where does it get converted to a plan? When you're using ODBC, you're using standard calls — there will be a prepare-statement command where you pass in a string, and that string is in whatever SQL flavor the database system supports. There's no way that part can be universal across database systems. But the API calls themselves — here's the query I want to run, now execute it, get back the result, now iterate over the result set and, for each row, give me the second attribute as an integer — all of that is standardized, as in the sketch above. The SQL itself just goes over the wire, and the parsing, the planning, the optimizing, all of that happens on the server side. And underneath, the driver is typically calling the C API I mentioned before.

All right. So ODBC was the first one that really took off; by the early-to-mid 90s everybody was supporting it. Then Java comes along in the mid-90s, and Sun recognized that if you want Java applications in the enterprise, they need to be able to talk to database systems — so they needed something like ODBC, but for Java. At the time, ODBC was very much Windows-specific and aimed at C/C++ applications (it's since become generic and expanded), so it wouldn't work in the Java world. And the same way Rust is the hot thing now, Java was the hot thing in the mid-90s — write your program once and the JVM runs it anywhere, which was mind-blowing for people back then. Go was the hot thing ten years ago; there's always some kind of fad.

So JDBC comes along, and you can think of it as basically the same thing as ODBC, just for Java instead of C. But because they were trying to bootstrap a new connectivity API onto an existing ecosystem of database systems that already supported ODBC — and they wanted people up and running against any possible system as soon as possible — there are different variations of how a JDBC driver can be built, with various methods to bridge the gap between what was available at the time and what came later.

There are four approaches. The first is that there's no native Java implementation for communicating with the database system at all: instead, you provide a bridge — a wrapper in Java — that invokes ODBC, the actual shared objects, the C code, which then communicates with the database system. This was meant for the case where a database system didn't support JDBC yet: you could just wrap something around ODBC and use that.
The next approach is to have the JDBC calls make JNI invocations down into the C code of the native C API, and have that go over the wire to the database system. The thinking was that taking the bytes and putting them into buffers was all done in C, and the wrapper just copied the data into Java. Another approach is to have a separate middleware — a separate server — that the JDBC driver talks to, and that middleware then uses ODBC to talk to your database system; an extra hop to make the call you need. And the last one, obviously the most ideal, is a pure Java implementation that translates the JDBC calls coming from the application directly into the vendor-specific wire protocol commands. Every major database system at this point has its own native pure-Java JDBC driver. But think of how many times you come across something in Rust — some crate you want to use — where there isn't a native implementation and it's just calling into C; that's essentially the first approach up there. The JDBC-ODBC bridge has since been removed, and the pure Java driver is the best one and the most common one, at least for the major database systems today.

All right, so as I was saying, the thing we care about is what's being sent over the wire between the client — whether it's ODBC, JDBC, or whatever — and the database server. Every database system, for the most part, implements its own proprietary wire protocol, typically over TCP/IP, and uses that to send bytes back and forth, acknowledge them, take queries in, and send responses back. If you're running on the same box and it's Linux, you can use Unix domain sockets for better performance, because you're not going through the full TCP/IP stack in the OS on both the client side and the server side. You can do this with Postgres. But if you're running in the cloud, the DB server is in some far-away location, so you can't.

I'm not aware of any system that uses UDP to communicate between the client and the server. TCP has its overhead because you have to send acknowledgments back and forth, whereas with UDP you sort of throw the packet over and hope it makes it. So no system I'm aware of does this between client and server. One system we'll cover later, Yellowbrick, does use UDP between their back-end servers, because it's so much faster — they basically have to do their own retries and acknowledgments, but since they're trying to get the best performance possible, it was worth it for them to implement that. Postgres uses UDP to communicate between the stats collector and the different workers, but again, that's all on the back end on the same box, not between the client and the server.

So typically the way you communicate with the database server is: the client connects to the database system, and there's always some kind of authentication process — you're given a token because you authenticated with something else, or you do username/password, or whatever the mechanism is. Ideally you would be using SSL or TLS, because you don't want anyone sniffing your packets. Then you send over the query, and the database system blocks on that connection.
Well, that's not strictly true, because you can do async requests — but it runs the query, and as soon as it starts getting results, it serializes them and sends them back over the wire. Some systems can do cursors and start spooling you results while the query is still running, but as far as I know, most of the cloud systems wait until they have all the results before they start sending things back. It obviously depends on the query, too: if the root node of the query plan is an ORDER BY with a LIMIT on it, you need to see all the data before you can start sending anything.

Again, the thing we care about today is this step here, and we'll talk a bit at the end about what we can do to speed that piece up. The paper you read points out that we've spent the whole semester so far talking about how to build a fast database system and run queries really fast — and obviously if you're reading petabytes of data that will take a long time — but in the paper, and in later work on a thing called ConnectorX, this last step, just sending the results over the network back to the client, is actually the expensive part. The queries themselves aren't big: the biggest SQL query you'll see out there tops out around 10 megabytes, and that's not expensive to send. Sending the results back is expensive, because there are many steps along the way — you may have to copy the data into the form the client or the network protocol wants, which may not be the same form you're natively storing in the database.

Yes — the question is, how would a SQL query reach 10 megabytes? This example actually comes from Google; they told me about it, and it's not hard to imagine. They had some dashboard where you can click a bunch of checkboxes for what you want to visualize, and all that's doing is concatenating the search options into a giant IN clause — and before you know it, you've got a 10-megabyte SQL string. It's rare, I'm not saying it's common, but you can imagine something like that. We didn't really talk about tricks to make INs go faster: if your IN clause is huge, you basically build a hash table on the IN-list expression itself and probe it when you do lookups — it's like a join. Think of a big IN clause as materializing another temp table. Other questions?

So if you're going to build a new database system today, you have two choices. You can implement your own wire protocol from scratch, in which case you also have to write the JDBC/ODBC client libraries for talking to your database system. The more common thing to do now, though, is to just use an existing wire protocol from an existing database system, because then you inherit their driver ecosystem for free. Now, it's not enough to speak the wire protocol to say you're compatible with another database system — that's the bare minimum. If you speak the wire protocol, the client drivers typically don't know and don't care what the SQL query looks like — related to the earlier question, they're not parsing it on the client side to check whether you're really sending a Postgres-compatible query. They just send the text over.
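To make that concrete: the Postgres wire protocol is publicly documented, and a simple query is just a tagged, length-prefixed message wrapping the raw SQL text. A hedged sketch of building one (buffer management simplified):

```c
#include <arpa/inet.h>  // htonl
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Build a Postgres simple-query ('Q') message: a 1-byte tag, a 4-byte
// big-endian length (self-inclusive, tag-exclusive), then the
// NUL-terminated SQL text.
static uint8_t *build_simple_query(const char *sql, size_t *out_len) {
    uint32_t body_len = 4 + (uint32_t)strlen(sql) + 1;  // length field + SQL + NUL
    uint8_t *msg = malloc(1 + body_len);
    msg[0] = 'Q';                                  // message type tag
    uint32_t be_len = htonl(body_len);             // wire format is big-endian
    memcpy(msg + 1, &be_len, 4);
    memcpy(msg + 5, sql, strlen(sql) + 1);         // include trailing NUL
    *out_len = 1 + body_len;
    return msg;   // caller write()s this to the socket
}
```

That's the whole trick of protocol compatibility at this level: frame the text correctly, and the existing drivers will talk to you.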
So if you want to support more of the ecosystem, you also have to support the catalogs and other functionality — but the bare minimum you need is the wire protocol. It's about 50-50 now; it didn't used to be this way, but the two most commonly reused wire protocols are MySQL's and Postgres's. MySQL used to be number one; Postgres is actually becoming more popular, partly because there are a lot of databases that are forks of Postgres, where they keep the top half — including the network layer, so they speak the wire protocol — and rewrite the bottom layer. That's sort of what Neon does, and what Redshift and others do. The third most common wire protocol is actually Redis's, because it's so simple — text-based GET and SET and simple things like that. Again, if you support these existing protocols, someone can run against your new database system without rewriting their application or changing which driver they use, because you piggyback off the existing driver implementations. Snowflake, interestingly, did not do this: they decided to write their own wire protocol from scratch, including their own SQL dialect from scratch. But they started in 2011-2012 — a different time. If you were building a new system today, I think it would be a hard decision to justify, because there's just so much you can reuse if you speak the Postgres wire protocol.

So the paper I had you read is about how to improve the wire protocols across these different database systems, and it focuses on four key design decisions. Some background: this paper comes out of the MonetDB Lite project, which was a precursor to DuckDB. Hannes and Mark, the authors, were trying to make MonetDB embeddable, and as part of that work they ran into all these problems getting data in and out of things like pandas and R programs — even when running in the same process. That's part of what led them to throw away that code and start building DuckDB. So it's the same team, but this is before DuckDB became a thing.

This paper is focused on large data exports. It's not complex queries with lots of joins and sophisticated aggregations; it's more or less SELECT * queries — or even just a subset of the columns projected out — to feed into pandas for additional computation, or to train machine learning models, and so forth. It's really about how to get data out of the server and into the client.

Now, whatever optimizations we talk about today, you're also going to have to implement them in the client driver. If you start compressing things on the server side, send that over the wire, and the client doesn't know how to decompress it, the data is useless. If you transpose from a row-oriented format to a columnar format and the client doesn't know you did that, it's all useless. And typically client drivers are very conservative: they don't want a lot of extended capabilities, because now you have to support them for every single language you ever want to support. If you take the C API and just wrap it in the various programming languages, that's fine — you sort of implement it once.
But as I was saying before, ideally you want a native implementation of your client driver in whatever language you're running in, so you don't have the copying back and forth with C. And if you have all these additional features in your client driver, every programming language that implements your driver has to implement the same things. That can become problematic — it fractures, and people don't implement all the same capabilities. So there's a trade-off between how sophisticated we can be and what the client drivers will actually support.

Furthermore, consider a modern scenario — we haven't really talked about Lambda functions or serverless applications, but a very common pattern now is: I spin up a Lambda function, say some Python thing; it connects to the database server, does authentication, sends some queries, gets back results, does some minor processing, and goes away. You're paying for the compute time on that serverless function, so you don't want a bunch of expensive deserialization because you have a very sophisticated client protocol. And again, the answer is going to be that Apache Arrow is the right solution here — I'll spoil it now.

All right, so we'll go through the four major design decisions one by one. These matter not just for performance but also for the engineering burden on the client side. The first one is kind of obvious, because we started the semester with row stores versus column stores: ODBC and JDBC are, by their nature, row-oriented APIs, because they were developed in the early 1990s, before columnar databases were a thing. The paper on columnar databases — column stores — is from around '82, '83, but that's a theory paper; there was a Swedish system in the 70s that was technically a column store, but no one had ever heard of it. Sybase IQ is probably the first true column store implementation, but that's around '97, '98. So ODBC comes along in the early 90s, column stores aren't a thing, and most applications people are writing are business applications — go fetch one order record, single entities, single pieces of information — so it was inherently row-oriented.

In this world, the server takes the tuples it's producing as output and — even though on the server side it may be storing them as a column store — stitches them back together, materializes them as rows, because the client wire protocol wants them in a row-oriented form. Then you write application code like this pseudo-JDBC snippet: iterate over the result set and pull out the data you want, row by row. But if we just switch to a columnar format, that technique could be bad too: if I ever need multiple attributes for a single tuple across multiple columns, I have to write some weird code that iterates over the columns and over the rows and tries to stitch things back together.
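Here's a hedged sketch — not any real driver API — contrasting the two access patterns, plus the chunked layout we're about to land on (process is a hypothetical stand-in for application logic):

```c
#include <stdint.h>
#include <stddef.h>

extern void process(int32_t id, double price, const char *name);  // hypothetical

// Row-oriented: all attributes of a tuple are adjacent, so per-tuple
// access is trivial -- this is the model ODBC/JDBC bake in.
typedef struct { int32_t id; double price; const char *name; } Row;

void scan_rows(const Row *rows, size_t n) {
    for (size_t i = 0; i < n; i++)
        process(rows[i].id, rows[i].price, rows[i].name);
}

// Pure columnar: reconstructing one tuple means touching k separate
// arrays -- the "weird stitching" code described above.
typedef struct { int32_t *ids; double *prices; const char **names; size_t n; } Columns;

void scan_columns(const Columns *c) {
    for (size_t i = 0; i < c->n; i++)
        process(c->ids[i], c->prices[i], c->names[i]);
}

// The PAX-style compromise: columnar layout, but chunked into small
// row groups so all the data for one tuple stays close together.
typedef struct { int32_t ids[1024]; double prices[1024]; size_t n; } Chunk;
```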
Again, the code on the slide isn't real code. The solution is basically the same thing we talked about at the very beginning: we want a PAX-style model, because then we can operate over batches of tuples — we send the data out in a columnar fashion, but grouped into row groups, chunks small enough that all the data we need for a single tuple stays close together. This is what Arrow does, as we've discussed. And Arrow has this thing called Arrow Database Connectivity (ADBC): basically, like JDBC or ODBC, it's a specification — a programming API — for how to interact with a database system, but operating over vectors. If your database system supports ADBC — and some do, like Snowflake — I can send a SQL query over and get the result back in native Arrow form, and then integrate it into my application however I want, without any copying or deserialization, because it's already in a vectorized format. We're not going to go through all of ADBC, and not everyone supports it, but this is essentially what Hannes and Mark were proposing — a vector-based API — and ADBC is what came along later; the paper you read predates it.

Okay. So if we're sending things back as vectors, how do we want to support compression? This is going to smell like all the stuff we covered in storage: the trade-off between general-purpose, naive compression — take blocks of data and throw gzip at them — versus lightweight encoding schemes that are specific to the actual data you're storing.

The easiest approach is just gzip, or snappy, or zstandard: you do all the same wire-protocol construction of packets and messages that you'd normally do, but right before sending them over the wire you compress them, and the client does the reverse. This is not that common, and it's not on by default for most systems — although for Oracle it might actually be on by default with some really lightweight compression level. Oracle, MySQL, and BigQuery let you add this after the fact; BigQuery works over HTTP, so I think the gzip comes as part of the HTTP protocol. Oracle added this around 2013, I think; MySQL has had it for a while. There was a patch to add this to Postgres around 2018, but it didn't go anywhere, so Postgres doesn't support it.

The other approach is to use all the stuff we talked about before — dictionary encoding, RLE, delta encoding, frame-of-reference encoding. The idea is that you recognize the data type of what you're sending back in the response and run whatever lightweight scheme you want on it. Nobody does this — well, except partly Arrow: Arrow does dictionary encoding, which I think is the only encoding scheme it supports out of the box, so if you get data back as Arrow it may already be dictionary-encoded. But they're not doing the delta encoding and RLE stuff. And all your client drivers would have to support this as well. Typically, when your client connects to the server, it works like an SSH handshake: the client says, here are the features I support, and the client and the server pick the bare minimum they both have.
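A hedged sketch of the heavyweight approach: construct the wire message exactly as you normally would, then deflate the whole buffer right before it hits the socket (zlib here; snappy or zstd would slot in the same way):

```c
#include <zlib.h>
#include <stdint.h>
#include <stdlib.h>

// Compress a fully serialized wire message just before sending it.
// The client must know (from the connection handshake) to inflate it.
uint8_t *compress_message(const uint8_t *msg, size_t len, size_t *out_len) {
    uLongf cap = compressBound(len);          // worst-case deflate size
    uint8_t *out = malloc(cap);
    if (out == NULL) return NULL;
    if (compress((Bytef *)out, &cap, (const Bytef *)msg, (uLong)len) != Z_OK) {
        free(out);                            // default level: CPU/size trade-off
        return NULL;
    }
    *out_len = cap;
    return out;   // caller frames it with its compressed length and writes it out
}
```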
So you could have a bunch of clients showing up with old driver implementations that don't support any of these things — I think that's part of the reason nobody does it; you'd have to support it across all the different driver implementations. Yes — is it really either/or? Can't you do a data-specific encoding and then a general-purpose one on top? Yes, exactly — they're not mutually exclusive; you could do both. And furthermore, depending on how you serialize the data: if you're doing text encoding and padding things out, heavyweight compression is going to make a much bigger difference than on a compact binary format. But as far as I know, other than Arrow/ADBC, nobody does the lightweight-encoding approach, because, as I said, the drivers would all have to support it.

Basically, everything here echoes what we said earlier about reading from the object store or from disk: when the communication channel between the client and the server is slow, heavyweight compression is much better — we're willing to pay more CPU cycles to compress the data down to a smaller size, because that speeds things up as we send it over. And obviously, the larger the chunks of data we send, the better the compression ratio.

Next: how do we want to serialize and encode the data we're sending over? The first approach is the most common one: binary encoding, where you send the data in — ideally, though not always — the same binary form it's represented in inside your database. In this case the client is responsible for dealing with any endianness issues: if the data is stored little-endian and your client for some reason runs on a big-endian machine, the client does the conversion. The idea is that the database server just tries to get you data as fast as possible, and since there are typically many more clients than servers, you spread the computational cost of the conversion across all the clients.

If we use binary encoding, the next question is what serialization scheme to use. The paper argues that rolling your own serialization format is better than using existing libraries, because those libraries bring in a bunch of other infrastructure you may not actually care about, which adds computational overhead and space overhead to the packets you send back. What do I mean by this? You can write your own serialization format — how to take a result set of three attributes, an integer, a float, whatever, and pack them down into the byte representation you then send over the wire. Or you can use one of these libraries — Protocol Buffers, Thrift, or FlatBuffers (the newer, better one); there's also Cap'n Proto and a bunch of others — that give you the ability to define the schema of the messages you're sending and serialize them out.

One year somebody asked me: if we're going to send data back as protobufs, why not just store protobufs natively? And I said, nobody does that, that sounds like a bad idea. Turns out somebody does do it, because they emailed me later.
There's a system — I think it's a toy project — where the wire protocol sends out protocol buffers, and internally the storage layer stores everything as protocol buffers as well, because it's all just bytes. In that case you don't do any serialization or deserialization: when someone requests something, you just send over the stuff you've already stored as protobufs. I'm not saying it's a good idea, but it does exist.

The other challenge is the extra infrastructure. Protobufs are at least separated enough from gRPC that you don't have to bring in all the gRPC machinery. Thrift, as far as I remember, brings in threading models, thread pools, and I think buffer pools as well — way more infrastructure if you choose to use it. FlatBuffers is like protobufs: pretty simple, just the serialization format. There are things these libraries do provide that can be useful — they can keep track of the versioning of your messages, so if you expand the fields of the messages you send back for queries, protobuf tracks the different versions and you know which version of the API you're interacting with.

The other approach is text encoding, and this is the simplest thing to do: no matter what the data is, you run the equivalent of toString or str() on it, converting it from binary form to a string, and send it over as variable-length strings to the client. This is nice because you don't have to worry about endianness — it's just ASCII or UTF-8. The client takes your text and converts it back to the binary format it wants. For missing values, you could have a separate bitmap to keep track of which values are null; in MonetDB, they just store the string value "null" to represent a null. Yes — right: you convert the number to ASCII on the server, and the client reverses it with atoi. So is this a good idea or a bad idea? As he points out, what do you do if the actual string value is the word "null"? I don't know — ask MonetDB.

Setting aside the null question: good idea or bad idea? What do the hand gestures mean? It explodes the size of the data. Right — it explodes the size of the data. To be clear about what the encoding looks like: if I have a 4-byte, 32-bit integer, say 123456, then when I send it over the wire to the client, I literally convert it into the ASCII string — character '1', character '2', character '3', '4', '5', '6' — and I either put a length in front of it or null-terminate it. Every piece of data I send over in a record is in string form. What's the difference? In binary, this number is 32 bits, total. In text, each digit costs one byte, plus you have to store the size or the null terminator — or keep it fixed-length, which is the next topic. So, good idea or bad idea? Bad idea: it requires more data, and as he points out, the transformation goes in the wrong direction — instead of compressing the data, you're expanding it.
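Concretely, a hedged sketch of the same value in both encodings (the framing is illustrative, not any specific system's wire format):

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int32_t v = 123456;

    // Binary encoding: always 4 bytes, fixed width, directly usable
    // by the client after a byte-swap if endianness differs.
    uint32_t wire = htonl((uint32_t)v);
    printf("binary: %zu bytes\n", sizeof(wire));       // 4 bytes

    // Text encoding: one byte per ASCII digit, plus a terminator
    // (or a length prefix), and the client must re-parse with atoi.
    char buf[16];
    int n = snprintf(buf, sizeof(buf), "%d", v);
    printf("text:   %d bytes + terminator\n", n);      // 6 bytes + 1
    printf("round-trip: %d\n", atoi(buf));
    return 0;
}
```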
And if you put gzip on top of this text encoding, gzip is going to compress the hell out of it — fantastic, potentially, yes. Why would gzip work better on approach two than on approach one? Because there's more redundancy — more bytes to squeeze. We'll look at the results in a second. But think about the end result: which is smaller — compressing the 32 bits of the binary form, or compressing the six bytes plus the null terminator or length of the text form? Is the compressed text ever going to be smaller than the binary? No. The broader point stands, though: the database knows its own data, so in theory it can serialize it better than a general-purpose scheme that always does the same thing — the question is whether you want to spend that time on the server side.

So if you roll your own — and most systems do binary encoding but roll their own rather than use one of these existing libraries — then it's all the stuff we talked about with data file formats: the null masks, keeping track of data types and the sizes of the data in the messages. That's fine; we know how to write that stuff, because we had to do it for storage anyway. It's just more work — protobuf gives you a bunch of stuff for free, but as with all things in CS, you pay a cost for it.

One issue we've already talked about: how are you going to represent the length of strings? You could do the C style and have a null-terminator byte at the end, so the client just scans along until it finds the terminator and knows it has all the data it needs. But that makes it harder to jump to fixed offsets, as we discussed before, if you're sending things over in vector batches. The most common approach is length prefixes, which we also covered. And then some systems will just pad the string out with additional characters to whatever the max size of the attribute is — if it's a varchar(16) and I have a bunch of four-character strings, pad out the rest with spaces.

Yes? Say again — padding is going to be the best? Why? Because if it's fixed-length, the padding is all the same byte, so gzip can take care of it, and with fixed-length values you can jump around in the buffer without decoding lengths first. Correct — though it depends on what the query wants to do with the data. And also, if the column is a varchar(1024) and I have a bunch of one-character strings in it, that's wasting a ton of space. Why would people declare columns like that? People are stupid; you see all sorts of crazy things in real databases.

Yes — with null terminators, on the server side you can reuse libc's string functions. Yeah — when we built our first system, my second or third year at CMU, we did exactly that, and then of course when we got to the wire protocol, the protocol didn't want a null terminator, so you had to copy the string and add the length in front of it anyway.

And on the earlier point — even if you have the varchar(1024) padded out around small strings, won't gzip handle that for you? If you use gzip, yes, though it takes time; even snappy or zstandard would be fast. But not all databases support compression — I just said Postgres doesn't.
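Backing up to the three length-encoding options above, here's a hedged sketch of them side by side (illustrative, not any particular system's wire format):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Option 1: C-style null terminator. Compact for short strings, but
// the client must scan byte by byte to find the end -- no jumping to
// the i-th value in a batch.
size_t emit_cstr(uint8_t *out, const char *s) {
    size_t n = strlen(s) + 1;
    memcpy(out, s, n);
    return n;
}

// Option 2: length prefix. One extra word up front, but values in a
// vector batch can be framed and skipped without scanning.
size_t emit_prefixed(uint8_t *out, const char *s) {
    uint32_t len = (uint32_t)strlen(s);
    memcpy(out, &len, 4);
    memcpy(out + 4, s, len);
    return 4 + len;
}

// Option 3: pad to the column's max width, e.g. varchar(16). Fixed
// slots allow random access and compress well (runs of padding), but
// a varchar(1024) full of one-character strings wastes enormous space.
size_t emit_padded(uint8_t *out, const char *s, size_t width) {
    size_t len = strlen(s);
    if (len > width) len = width;           // truncate to the slot
    memcpy(out, s, len);
    memset(out + len, ' ', width - len);    // space padding
    return width;
}

int main(void) {
    uint8_t buf[64];
    printf("cstr=%zu prefixed=%zu padded=%zu\n",
           emit_cstr(buf, "abcd"), emit_prefixed(buf, "abcd"),
           emit_padded(buf, "abcd", 16));   // 5, 8, 16 bytes
    return 0;
}
```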
The Postgres wire protocol itself has no notion of compression. You can hack around it by tunneling all your traffic over SSH and compressing that, but that's an extra hop. It sounds crazy, but the Postgres wire protocol, at least as of 2024, does not have a flag that says this stream is compressed. MySQL has it, Oracle has it; other systems do not. So again, sometimes option one is faster, sometimes option two is faster, but no system does both — no system tries to figure out, based on what your data and your query look like, whether to give you one versus the other — because that's more engineering overhead to support on both the server side and the client side, and it's just not worth it. Fixed-length will be the fastest if your data is small — if it's all char(1)s, it's the fastest because you don't store a length at all.

Okay — and as with everything we've talked about before, these choices aren't independent: if I choose one of these serialization options, that affects what kind of compression scheme I want to use. It's very similar to the trade-offs we discussed for data on disk.

So I'm going to show two graphs. The first is what happens when we send just one tuple from the database system to the client; the idea is to look at the overhead of all the infrastructure around sending the query and getting back the result. For all these systems except Hive, they're using ODBC; Hive uses JDBC — I forget why they did that. Here are the numbers, listed in order of performance. The first thing to point out: here's MonetDB, which uses the text encoding we just talked about — converting all the binary data into string form and sending that over — while all the others use binary encoding. And yet MonetDB is the second or third fastest. Why? And no, gzip isn't what's helping it here.

So let's talk about why the others are slow. The slowest one is Hive. According to the paper, that's because they use Thrift: Thrift copies things in and out of Thrift buffers, so there are additional memcpys to get data into Thrift on the server side, and on the client side you're copying out of its buffers as well. Thrift also sends over a bunch of metadata describing the structure of the message, so the size of the packet for sending the same tuple is just much, much higher than in the other systems. DB2 is the second slowest because — Oracle does this as well, but for some reason it's more pernicious here — they're basically re-implementing acknowledgments on top of TCP/IP. TCP is already sending ACKs back, and they're doing it again above that, to make sure: I got your message, I got this packet from the DB server, I'm ready, give me the next one. The protocol is just way more chatty, because for some reason they re-implemented this idea of acknowledgments. Yes — was it based on UDP earlier? Is that why?
I have no idea — since it's a proprietary protocol, we can't see the implementation on the server side; this is what they speculate in the paper. How big is one tuple? For lineitem from TPC-H, it's less than a kilobyte. Also, I think this is end-to-end time, not just sending the message — so for Hive, it would include converting the query into a MapReduce job, dispatching it, and getting back the result; I'd have to double-check. The client is on the same machine as the server, and they minimized query execution time: they ran each query multiple times so the system would cache the query plan and the result. So, coming back to what I said, it isn't really running a MapReduce job for the measurement — this is literally just how fast you can get data in and out. And it's around a second, which is long. Hive is not a great system — there's a reason Facebook ditched it and wrote Presto.

Hive was the stopgap solution in the late 2000s, and I was sort of part of this era: the MapReduce paper came out from Google, Yahoo took it and re-implemented the ideas as Hadoop, and Hadoop was the hot thing — everyone said this is amazing, this is how you should be doing analytics and big data. The relational database people, which I was a part of, said: you're all doing it wrong, you're reinventing stuff that was invented in the 90s for parallel databases and distributed databases. Declarative languages like SQL are a good idea; processing data on partitioned tables is a good idea. And then people realized, oh yeah, writing these MapReduce jobs in Java sucks; it'd be nice to have SQL. So they built Hive, which is basically a translator that takes SQL and code-gens MapReduce jobs. You're making a face — I'm not saying it's a good idea. Again, they were surprised at how slow DB2 was — it's such a small amount of data — but the protocol is just that chatty.

All right, now let's send more data. For this one, they send a million tuples from TPC-H, and along the x-axis they artificially increase the network latency between the client and the server. The first lines I want to show are MySQL with gzip and MySQL without gzip. This basically corroborates what we talked about before with storage — getting things from S3 or the object store or whatever: when the network is really fast, you don't want to compress the data, because the CPU cost of the additional compression isn't worth it. You see in this gap here — with a fast network, no compression is the better way to go, even though you're sending more bytes (and note we're on a log scale). But as we get to slower speeds — 100 milliseconds of latency, again on a log scale — the compressed one is actually slightly better, because at that point the CPU overhead is no longer the dominating factor in getting the data out; the network is. So now we bring back all the other systems, and they all basically move along in the same way, as expected.
The time it takes to get the data out of the database server goes up as the network gets slower. What's surprising here is Oracle: it's one of the faster ones when the network is fast, but as the network gets slower, it becomes the second slowest. DB2 is always the slowest — Hive actually beats DB2 on the slower network. Oracle's is a proprietary protocol, so we can't see the implementation, but the speculation is that, just like DB2, Oracle is sending its own acknowledgments back and forth, and that hurts more as the network gets slower. Again, all of these except MonetDB are binary protocols, yet MonetDB is the third best, after MySQL and MySQL with gzip — because it's simple. Yes — the question is whether the other systems would get the same benefit from compression as MySQL. I would assume yes — you could test it with Oracle. The actual bits of Oracle's wire protocol may differ from MySQL's, but it's a binary protocol like MySQL's, so it would probably behave the same. Why did they only turn on gzip for MySQL? I don't know.

Now I'll show another result, from a different paper — one we wrote with one of my former master's students, now a PhD student at MIT — on Apache Arrow. This was on our older system, Peloton — I mean NoisePage — and the question was: how fast can we get the order-line table from TPC-C, 7 GB of data, out to the client? The client does no computation on it; it's purely how fast can you get it. Our system speaks the PostgreSQL wire protocol, so the first bar is the default: the Postgres protocol, no compression, row-based — that's how fast you can get the data out that way. Natively, our system stored everything as Apache Arrow tables: you could run transactions on the data, and over time, as it got cold and you weren't modifying it anymore, it would flip some bits around and become natively stored Apache Arrow.

The next bar is what the paper you read was proposing: a vectorized version of the PostgreSQL wire protocol, where you send things in a PAX format rather than row-oriented. The bar after that uses an early precursor to ADBC, the Arrow connectivity stuff: natively sending out the Apache Arrow data in its own form, without any translation — just shoving it at the Python application. It's faster because there's no conversion into a different form: exactly what we store natively in memory is what we send; we shove those bytes right out. The last one is RDMA — I'll cover what that is in a second; basically a network accelerator that does kernel bypass, so you can take the data from memory, put it on the NIC, and send it out without copying it through the CPU first. I forget — I think we used InfiniBand for that one. And again, that one is also just sending out native Arrow blocks rather than doing a conversion. So even though the paper you read didn't have Arrow to send out at the time, I think the performance difference would look like this. What I'm saying is: ADBC, just shoving data out as Arrow, is the right way to go if you're building a modern system today.
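Mechanically, "shoving the bytes right out" can be as simple as handing the in-memory column buffers straight to the socket with scatter-gather I/O — a hedged sketch, with Arrow-style buffer names that are illustrative only:

```c
#include <sys/types.h>
#include <sys/uio.h>   // writev: scatter-gather I/O
#include <stdint.h>
#include <stddef.h>

// Send an in-memory columnar batch verbatim: no row-by-row stitching,
// no per-value serialization -- the wire format IS the storage format.
ssize_t send_column_batch(int sockfd,
                          const uint8_t *validity, size_t validity_len,
                          const int32_t *offsets,  size_t offsets_len,
                          const uint8_t *data,     size_t data_len) {
    struct iovec iov[3] = {
        { (void *)validity, validity_len },  // null bitmap
        { (void *)offsets,  offsets_len  },  // varlen value offsets
        { (void *)data,     data_len     },  // raw value bytes
    };
    return writev(sockfd, iov, 3);   // kernel gathers all three buffers
}
```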
Yes — is there a cost to convert whatever Postgres's native format is into Arrow? Certainly, yes. That middle bar is the cost of converting Arrow into a Postgres-compatible protocol that sends things in a vectorized format; the Arrow bar is "I don't do any copying, I literally shove the bytes out." The paper talks about this: if you're going to rewrite your wire protocol to do something like this, it's unlikely you're storing data in that wire format natively anyway — so if you convert things to Arrow, or just keep things in Arrow internally, that's the better way to do it. That's why you see some systems use Arrow for the intermediate results going from one operator to the next in the query plan, or for exchanging data between workers: if everything is in Arrow, you have the infrastructure to shove the data out like this.

Okay. So in the experiments shown here, we talked about the network protocol itself: whether you compress things, how you encode and serialize, how much metadata you send around. But that isn't always the major slowdown in sending things over the network. As I've said many times, the OS is going to be a problem for us — it's always going to ruin our lives, make things harder, break up our marriages. In particular, the TCP/IP stack is just going to be super slow, and ideally we want to avoid it.

Why is it slow? The networking implementation is based on a model of interrupts: an interrupt comes along to signal, hey, bytes are ready to go in or out, and you do a context switch — all of that becomes super expensive. Then, when data arrives on the NIC, the OS copies it into its own internal kernel buffers, and then, before it hands you the memory, it copies it again into your user-space buffers. What's that face? Yeah — this sucks, this is terrible. Furthermore, the kernel has all these threads handling the interrupts and handling data coming off the NIC and other hardware; those threads have to be scheduled, and they maintain latches for their own internal data structures — all of that is problematic. So we want to figure out a way to avoid the OS as much as possible. We need the OS to survive — we need it to give us some memory and, obviously, schedule us — but after that, we want to stay away from it as much as we can, and that will let us run faster. What I'll talk about next focuses primarily on networking, but it also applies to disk: you want to avoid the OS for disk as much as possible too.

All right, the first part is what I'll call kernel bypass, and the idea is that we want to get data directly from the hardware — in this case the NIC, the network interface — into our database system running in user space, into our memory up there, without going through the OS, without doing any copying, and ideally without touching the OS TCP/IP stack. There are three ways to do this: DPDK, RDMA, and io_uring, which is the newest one. The way to think about this: Linux is a time-sharing system, which means it relies on these slow, expensive interrupts to tell it when something new has shown up, taking execution away from a running thread so a kernel thread can deal with the interrupt handler — and all those additional kernel-side threads maintain their own latches, which is all problematic for us.
Now, Linux has gotten a lot better in the last ten years at handling large core counts — it's way more scalable than it used to be — but wherever there's contention, no matter how great your code is, everything eventually falls over. We want to avoid it as much as possible.

So, DPDK — the Data Plane Development Kit. This is from Intel: a set of libraries that lets your user-space program interact with the NIC directly. There's an equivalent in the storage world called SPDK, the Storage Performance Development Kit, also from Intel. The idea is that you treat the hardware device you're interacting with as a raw device, meaning you're responsible for managing the memory space of that device and interacting with it yourself. This goes against the Unix philosophy, where everything's a file — whether it's a file on disk or a hardware device, you interact with it as a file, with your freads and so forth. DPDK breaks that model entirely. Because we're removing the OS from the low-level layers — layers three and four of the networking stack — our database system is now responsible for a bunch of things the OS would do for us, and ideally we can do them better. But not always.

The most obvious consequence: if we're using DPDK for networking, the OS isn't running the TCP/IP stack on the device for us anymore, so we have to do it in our database system. You either write it by hand, or you use an open-source library like F-Stack that basically re-implements TCP/IP in user space — the sequence numbers, sending back ACKs, all of that we have to do ourselves, because the OS isn't going to do it and the hardware doesn't. The advantage is that there's no data copying: we get literally the raw buffers of packets — which we have to manage — straight off the device. We're not calling read(); there are no syscalls; everything is read directly into memory.

So this sounds amazing, right? As far as we know, only two systems actually use DPDK. The first is ScyllaDB, built on a framework they have called Seastar — ScyllaDB is a re-implementation of Apache Cassandra in C++ with coroutines, DPDK, and some other optimizations, whereas Cassandra is entirely in Java. And then Yellowbrick, which we'll cover later, uses it as well. But the ScyllaDB guys gave a talk with us a few years ago, during the pandemic, and they mentioned that while Seastar uses coroutines and DPDK, DPDK has been a total nightmare for them to deal with — I think it's turned off by default at this point. I saw the Yellowbrick CTO a few weeks ago at CIDR, and as far as I know they're still using DPDK — though they're doing it in the back end, not between the client and the server.

Why is it so hard? Again, because all the stuff the OS does for you, you have to implement yourself. We tried this in our own system: I had one of my best master's students try to use F-Stack to speed up another project we were doing, a Postgres proxy, and we just couldn't make it work. The engineering cost is way too high.
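For flavor, a hedged sketch of the DPDK receive path (all the port and queue setup is omitted, and handle_raw_packet is a hypothetical stand-in for your user-space TCP/IP plus database protocol logic):

```c
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <stdio.h>

#define BURST 32

extern void handle_raw_packet(struct rte_mbuf *pkt);  // hypothetical

// The DPDK model: no interrupts, no syscalls -- a dedicated core spins,
// pulling raw packet buffers straight off the NIC queue. Parsing
// Ethernet/IP/TCP out of each mbuf (or linking in F-Stack) is on you.
// NOTE: real code must first configure and start the port/queues.
int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0) {
        fprintf(stderr, "EAL init failed\n");
        return 1;
    }

    struct rte_mbuf *pkts[BURST];
    for (;;) {
        // Poll port 0, queue 0 for up to BURST packets (zero-copy).
        uint16_t n = rte_eth_rx_burst(0 /*port*/, 0 /*queue*/, pkts, BURST);
        for (uint16_t i = 0; i < n; i++) {
            handle_raw_packet(pkts[i]);
            rte_pktmbuf_free(pkts[i]);
        }
    }
    return 0;
}
```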
It's a bit crude, but this is one of my favorite tweets of all time. This guy's talking about SPDK, which again is the storage equivalent, but it certainly applies to DPDK too: all this kernel bypass stuff is fantastic, but it's like peeing your pants because you're cold, and then you regret it pretty quickly. Thank you.

Okay, so the next approach is RDMA. This is where, like NVMe, there's an API that the hardware device exposes that allows you to write directly into the device and access things on a remote machine as if they were local. This one is a bit more tricky, because now, if you're reading and writing memory addresses on a remote machine, you've got to be sure that what you're actually reading is what you're expecting to read, so there is a bit more handshaking you have to do to set this up. This is, again, typically something you maybe don't want to use between the client and the server; you want to do this in the back end. But if you can pull it off, you get a huge win. It used to be you could only do this on InfiniBand, which was sold by Mellanox; I think NVIDIA bought Mellanox at some point, and NVIDIA uses it, and they have NVLink as well. But RoCE, which is basically RDMA over Converged Ethernet, is more common now. Still, RDMA is not used that often. The only system I know of that will sell it to you is Oracle with Exadata, but that's where you buy the whole rack: you buy the rack of compute and the rack of storage, and they're using RDMA to get to the storage. You can get RDMA on Amazon, but you're only able to communicate between your own machines that have it, and it's a lot more work to get set up.

Yes? So the question is: does the client know exactly what address the data is stored at on the server? Yes, that's correct. The way this works is that the client, the application, whatever thing is going to talk to some other machine, has to know what memory address it wants to read, assuming it has permissions, and then the request is: give me the contents of that memory. The hardware knows how to go up to memory, get whatever you want, and pull it back down, and it doesn't notify the CPU that it's done that. Yes? Is there a security problem with this? Sure, but like DPDK, you're not exposing this over the public internet. Again, if you're buying Exadata, these things are like millions of dollars; you're running this on-prem, in a locked cage, and the traffic is just between these two machines.

So the last one is io_uring, which I think some of you are familiar with. This was an extension in Linux to clean up its asynchronous I/O API, and it allows you to make asynchronous requests to a hardware device, either storage or networking; it was originally for storage, and then they added networking two years ago. The basic idea is that you have these circular buffers where you submit a request, saying "I want this data from this storage device or this hardware device," and you get a callback you provided: okay, when it's available in my buffer, let me know. So you can make a bunch of these requests. It's not entirely bypassing the kernel; you're just not paying the overhead of making a syscall and blocking while waiting for the data. You make the request to read or write on the memory that you provide to the OS, the OS does it for you in a kernel thread, and once it completes the task it puts the result in a completion queue and gives you your callback. So again, this is a low-latency way to avoid the overhead of a full syscall to talk to a hardware device, but you're still relying on the OS to do the low-level marshaling of data on and off the device.
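Here is a minimal sketch of that submission/completion pattern using liburing, the standard C wrapper around io_uring. The file name is hypothetical, and a real system would keep many requests in flight rather than one.

```c
#include <fcntl.h>
#include <stdio.h>
#include <liburing.h>

int main(void) {
    struct io_uring ring;
    /* Two ring buffers shared with the kernel: a submission queue
       and a completion queue. */
    if (io_uring_queue_init(8, &ring, 0) < 0) {
        perror("io_uring_queue_init");
        return 1;
    }

    int fd = open("data.bin", O_RDONLY);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];

    /* Describe the read in a submission-queue entry instead of calling read(). */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
    io_uring_submit(&ring);     /* one syscall can push many queued requests */

    /* ...do other useful work while a kernel thread services the request... */

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);   /* reap the completion when we need it */
    printf("read %d bytes\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}
```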
Yes? There are a bunch of libraries for this, and you can look them up: there's one for Rust, and for C/C++ on Linux there's the one they provide; I don't know which one is the most common. So, as far as I know, very few systems do this. Well, there are two more. The first one is QuestDB: they talked in 2022 about how they added io_uring, and QuestDB is Java, or at least the top part of it is Java. TigerBeetle is another one, and they're using io_uring for transactional stuff; it's actually written in Zig, not Rust, and I think there's support in Zig's standard library that made it easy for them. We talked to somebody recently who built their fast paths in Zig because the SIMD stuff was way better than in Rust. The interesting one, though, is ClickHouse. They came out with a blog article in 2021 about adding io_uring and asynchronous I/O to ClickHouse. We also had a guy from the Postgres team give a talk about adding io_uring to Postgres, but that's going to be years away, because they're rewriting the whole storage layer in Postgres, and I think they're finally going to get rid of the OS page cache, which is nice. Anyway, there's this blog article that says: here's what io_uring can do for us, it's going to be a big win. But if you go look at the pull request, lo and behold, you come down here and there's one of the original developers of ClickHouse, the current CTO. He basically says, yeah, he tried adding it, but it was a marginal improvement, and it became an engineering nightmare. He says it became so complicated that even an experienced C++ engineer who knew the code could not figure out why there were rare hangs of queries that they found through their testing. So the blog article is from 2021. Then, in February 2023, here's the same dude giving a livestream talking about how they've now added io_uring; they did end up merging the code, and in his webinar he's touting it as the magic pill to make I/O less slow. But then you go look at the pull request again, and just a few weeks or months ago he's posting: "I didn't observe io_uring to be much slower, but also I have no big expectations, because I wasn't able to find cases when it's faster." He's responding to somebody above him who says that enabling io_uring makes his queries run slower. So I think, huh?
Yes, go ahead. The question is whether the query execution code itself is blocking. Yes, it's about batching: I need to read these ten blocks, so go batch a bunch of requests, and then in the background, when the data is available, I can process it. I think that's the only place they could benefit. I don't know about QuestDB. QuestDB is written by HFT guys out of London, and those dudes know how to make Java run really fast, so I don't know how they implemented it there. Oh, they didn't? Yes, so they had a crappy mmap implementation, and then they're like, okay: it's basically like if I chop my leg off I can barely walk, but if I put the leg back on, now I can walk. Okay.

All right, so I don't want to over-comment; I think the jury is still out, and this is still pretty bleeding edge, but it's interesting to see it come up. All right, so I want to quickly talk about two last things. These were all kernel bypass methods, but there's another alternative: instead of trying to avoid talking to the kernel, what if we put the things we want into the kernel, to avoid copying up into user space? Let me skip this. This is a technique called user bypass. It's not a new idea; people have done kernel modules and extensible OS kernels for decades. What's different now, we'll see on the next slide. The idea here is that, instead of bypassing this part here and pulling a bunch of logic up into the database system, what if we push database system logic down into the kernel, so that when data comes in, we can process it, or do whatever we want with it, as quickly as possible, without having to copy it to user space, and then, if necessary, go back down to the hardware to send things back immediately. This makes sense when the data coming in over the network doesn't need to be retained for a long time. If it's, say, an acknowledgement message, and I need to keep track that I got it but don't need to retain it, then this technique potentially works, because you avoid all the overhead of copying buffers, scheduling additional threads, and making system calls: everything is just running inside the kernel, which is always going to be faster. As I said, kernel modules are one way to do this, but if you've ever written a kernel module before, or asked ChatGPT to, it's a pain in the ass; it's super cumbersome. And if you crash, what do you get?
Kernel panic: you take everything down. And in some scenarios you can't even load kernel modules for security reasons; the hardware won't let you load an unsigned kernel module. So the thing that has changed, that makes this actually viable now, is something called eBPF. Out of curiosity, who has heard of eBPF before, other than people that hang out with my student Matt? Right. So, eBPF stands for the extended Berkeley Packet Filter. BPF was made for BSD in the early 90s and eventually made it into Linux, and it was a way to specify packet forwarding and filtering rules through a DSL that you then load into the kernel. eBPF is not really about packet filtering anymore; it's basically a way to write safe code that gets verified, and then load it dynamically, as if it were a kernel module, on the fly. The reason I say it's sort of safe is that they give you a limited API to use in these programs: you can't call malloc, you can't sit in an infinite loop forever, because ideally they're trying to prevent you from taking down the kernel and breaking everything. So you write your BPF program in C, you run it through their compiler, which generates bytecode, and that then runs through a verifier. The verifier literally does branch expansion: it figures out all the different possible paths you could go down in your code, counts the number of instructions you would execute, and throws an error and rejects the program if you have too many instructions. This is a wild thing, because it basically allows you to extend Linux without having to recompile Linux. It's heavily used at places like Netflix for observability, to get metrics about what processes are running and pull that data out. And as Matt's been working on it here, the API has expanded, so there are a lot more things you can start doing now; you could basically run the entire database system down in your kernel. Whether that's a good idea or not is what his research is going to figure out. But the idea is: can we start thinking about which parts of the database system, where we're spending a lot of time moving data back and forth between the OS or the hardware and the database system, we can start pushing down?
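To show the shape of these programs, here is a toy eBPF/XDP packet counter, not Matt's proxy, just the canonical hello-world of the technique: restricted C, compiled to BPF bytecode (an assumed build line is in the comment), verified, and then run in the kernel at the NIC driver without copying packets to user space.

```c
/* Assumed build: clang -O2 -target bpf -c count.c -o count.o */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* BPF maps are the only state the verifier lets you keep; a user-space
   process reads this map to pull the counter out. */
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx) {
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&pkt_count, &key);
    if (val)                       /* the verifier rejects us without this NULL check */
        __sync_fetch_and_add(val, 1);
    return XDP_PASS;               /* hand the packet up the normal stack */
}

char LICENSE[] SEC("license") = "GPL";
```

A proxy does the same kind of thing, except instead of counting it rewrites the packet and bounces it back toward the server before user space ever sees it.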
So I'm going to show one graph from his paper, where he re-implemented a Postgres wire-protocol proxy. A proxy sits in front of Postgres: the client connects to it, and the proxy maintains available connections to the database system and just forwards your packets along. In this scenario, a packet shows up with a query request, and the proxy looks at it, says, oh, this needs to go to the server, and sends it along; that's all it's really doing, it's not doing any computation on it. We're comparing against PgBouncer, which is the most common proxy implementation used for Postgres. Odyssey is from Yandex; it runs in user space, but they're using hand-written coroutines written in assembly, where the assembly overwrites the stacks of other threads to inject the next thread to run. It's very impressive, and it's very complicated. And then ours is a fork of PgBouncer where all of the authentication stuff happens up in user space, like SSL setup and all the user passwords, but when packets show up that just need to be forwarded, all of that is done down in eBPF. The main takeaway here is that if you run on a really small machine, you get a pretty significant performance improvement, because you're not paying the penalty of copying things back and forth with the kernel. I'm not saying eBPF is the solution for all the things we talked about today, but I think it's going to be a better solution than something like DPDK, and potentially io_uring, for some things, though not everything.

All right, we've got one minute left; let me just bang through this real quick. So suppose we do all these optimizations to get things out of the server and back to the client. The client's got to do something with the data and put it into the form that the application needs. As I said, if it's JDBC or ODBC copying things out of a row-oriented format, the overhead is not going to be that significant. But in the scenario where a data scientist is trying to get things out of the database system and put them into pandas, that's going to be slow. This here is an experiment where they took pandas, ran a SQL query through pandas' SQL API that went to Postgres or MySQL, got the data back, and converted it into a data frame; a data frame is the table abstraction in pandas and a bunch of other Python systems. In this case, the chart shows that the query itself doesn't take a long time relative to the cost of actually copying the data out of the bits we got from the server and converting it into the data frame. Again, ADBC and Arrow solve this problem, because if your Python code can natively operate on Arrow data, you don't have to do this conversion; but if your system doesn't support ADBC, like MySQL and Postgres, then you have to pay this penalty. So the gist of what they're doing is this thing called ConnectorX. It's used by Polars and, I think, a couple of other systems as well, like Modin. Basically, the SQL query that you write in Python shows up, and you also provide some information about how to split that query up into sub-queries, or partitioned queries, like range partitioning. Then you send out multiple queries at the same time from different threads, each of which gets a portion of the data you want in your Python program, and each thread populates a different chunk of the data frame. So instead of taking one SQL query, getting back a giant result, and having one thread populate the table, they take one SQL query, rewrite it by adding additional predicates in the WHERE clause, send those out in parallel, get back multiple results, and then the threads put them together. I just wanted to bring this up because it's an alternative: if you don't have ADBC, this is another way to do it.
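ConnectorX itself is written in Rust with Python bindings; to stay with the C used in the other sketches, here is the same partitioning idea expressed with libpq and pthreads. The table, partition key, range bounds, and connection string are all hypothetical.

```c
#include <libpq-fe.h>
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define ROWS_PER_PART 1000000L

static void *fetch_partition(void *arg) {
    long part = (long)arg;
    PGconn *conn = PQconnectdb("dbname=demo");   /* one connection per thread */
    if (PQstatus(conn) != CONNECTION_OK) { PQfinish(conn); return NULL; }

    /* Rewrite the user's query by appending a range predicate on the
       partition key, so each thread fetches a disjoint slice. */
    char sql[256];
    snprintf(sql, sizeof(sql),
             "SELECT * FROM items WHERE id >= %ld AND id < %ld",
             part * ROWS_PER_PART, (part + 1) * ROWS_PER_PART);

    PGresult *res = PQexec(conn, sql);
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("partition %ld: %d rows\n", part, PQntuples(res));
    /* A real client would now copy these rows into its chunk of the data
       frame; the chunks never overlap, so no locking is needed. */
    PQclear(res);
    PQfinish(conn);
    return NULL;
}

int main(void) {
    pthread_t tids[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, fetch_partition, (void *)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```

The server does a bit more work evaluating the extra predicates, but the client-side deserialization, which the experiment showed was the bottleneck, now runs in parallel.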
All right, we're well over time, so I apologize. So: the networking protocol matters a lot. Kernel bypass can make a big difference, but it's a pain to use. I think eBPF is going to be the thing that gets a lot of uptake in the next ten years or so, as eBPF gets more expressive. Okay, so the next class will be on query optimization, and we'll have three lectures on that; that'll again be core material we need to understand before we start looking at the real-world implementations. And I know I haven't posted the updated reading list, because I don't know what paper to assign for the first class, since there really isn't a good one, but we'll figure something out, and I'll update the reading list tonight. Okay, any questions? Thank you.