Ippokratis will give the Redshift lecture on Wednesday. So first of all, Wan's in the back. Yeah, there you go. Thanks for being here. All right, so before we get into today's topic, let's go through the final administrative things. The final submission for Project 2 will be next week, on Monday, May 1. I still owe everyone feedback; that's on my to-do list for this week. The final presentations will also be next week, on Friday, May 5, at 5:30 PM. We'll do it in this room. I'll send an email or post on Piazza asking if anybody has any dietary restrictions, and we'll order pizza for everyone, OK? Again, it's meant to be a lighthearted and fun thing: hey, here's the cool stuff we've done, and we celebrate it. The final exam, which Wan has just posted on Piazza and Gradescope, was originally due on May 5, the same day as the final presentations, but I'd rather have you guys do an awesome job on the final presentations, so I bumped it to Sunday, May 7, at midnight. You're going to submit it on Gradescope as a PDF. Now, you're allowed to use ChatGPT to write your exam. But obviously, if you ask ChatGPT to write it and you don't read it and you submit it, you're not going to do well, because I put the final exam question into ChatGPT yesterday. I don't have a screenshot, sorry, I'll post it on Piazza, but it gets it wrong. In particular, when it starts talking about OLAP systems in the 2000s, it says that H-Store, which is the system I worked on when I was in grad school, took advantage of multiple cores and made analytical workloads run faster, which is the exact opposite of what H-Store was: H-Store was about making transactions run faster, not analytical queries. So if you blindly submit whatever ChatGPT gives you, you're going to be wrong. But you can use it as a starting point to help you flesh things out. Now, we're going to do a little experiment. I don't care whether you use ChatGPT, but Wan has set up an anonymous form that I do not have access to; I want you to go fill it out and tell us whether you used ChatGPT. The idea here is that as I grade your exams, I'm going to mark whether I think each one is ChatGPT or not, and then we'll do a match-up to see whether I'm right or not. OK? Again, you do not lose points if you use ChatGPT. Well, hold up: whether or not you use ChatGPT, if you say wrong things, like that H-Store makes analytical workloads run faster, you will lose points. OK? So again, it's 2023. ChatGPT is not going away. This is the future. Start learning how to use it correctly. OK? You're not OK with this? Again, this is the future. What did I tell you? You disagree? You disagree? This is the future. It's not going to get everything correct, right? Like I said, it's going to mess up. It's your job to go through and figure it out. The alternative was that I was going to have ChatGPT either write the exam or write the response, and then give that to you guys to mark up and see where it's wrong, right? But I figured, let's start with this one first, OK? And like I said, the form is anonymous. I don't see it, I won't look at it, and then I'll hand my grading results over to Wan and he'll cross-check. OK?
All right, and then today's lecture is about Velox. Then on Wednesday, we'll have another guest lecture in class; we'll do the same setup we had before with DuckDB to talk about Redshift, and then I'll finish things off for the semester, OK? Any questions about any of these things? Cool. All right, so today's lecture is about a framework called Velox. Or, if you watch the videos, the European guys pronounce it one way and I pronounce it another; they might be correct, I don't know. So the first thing I'm going to point out, though, is that Facebook, or Meta, and I'm probably going to say Facebook a lot because they changed their name and it's hard to break a habit, even though Facebook is not a cloud services company the way that Google, Amazon, and Microsoft certainly morphed into, they have made significant contributions to database systems since their beginning in the late 2000s. But one of the key differences between Facebook and Google is that, where Google had all those systems on the list I talked about, Facebook has done a great job of actually open-sourcing a lot of its work in a way that Google hasn't. The one thing maybe missing from that list of Google systems was LevelDB, which Google open-sourced and Facebook then forked; we'll cover that in a second. The nice thing about Facebook is, again, they're not in the business of trying to sell you cloud services, although potentially they could have become another Google; they didn't do that, they went off to the metaverse instead. But they open-sourced a lot of their stuff. So this listing here, these are the ones I could think of that were major database system projects at Meta, and almost all of these are open source. Like I was saying, they don't sell these as cloud services, but they have had significant influence in the database industry. In particular, with Cassandra: DataStax is sort of a Cassandra company, bits and pieces of it have shown up in other systems, and the Cassandra API has been reimplemented by ScyllaDB, which is like a C++ version of it. Actually, who here didn't know Cassandra came from Facebook? Everyone knows this, or you didn't know, right? So the guy that wrote Cassandra built it for the mail service in the late 2000s, around 2008, and it was a combination of ideas from Google's Bigtable and Amazon's Dynamo. Then they realized they didn't actually want to use it internally, so they just open-sourced it and said, here it is. They never actually used it internally, right? But they wrote Cassandra. Hive we've covered. RocksDB is a fork of Google's LevelDB, and what's the first thing they did when they took LevelDB? They got rid of mmap, right? And that's RocksDB. And here's a list of companies that are either derived from it, embedding it, or forks of it; a lot of systems are using RocksDB. In particular, CockroachDB started off using RocksDB, and again, it's an embedded key-value store, but you can build a larger system on top of it. They eventually wrote their own key-value store called Pebble to get off of RocksDB, but it's a good starting point for many people. Presto we're gonna focus on today. WebScaleSQL was a fork of MySQL to clean some things up, a collaboration between Facebook and, I think, Tencent or one of the other Chinese tech companies; that sort of fizzled out and died. Gorilla, open-sourced as Beringei, was a time-series database.
LogDevice was a distributed log store, and then Velox, obviously, we care about. Of these, the only one that's not open source is Scuba. It's an internal in-memory analytical system for doing analytics on the internal telemetry they collect from their own services. For whatever reason, that one is not open source. Now, a bunch of these other ones are also dead; they're technically open source, but the open-source project is basically dead. But RocksDB, Cassandra, Presto, and Hive are major systems. So again, you may not think of Facebook as a database company, but they've done a lot of work in the space. And even today, they still run the largest MySQL deployment in the world. It's massive. The core Facebook thing that old people use, underneath the covers, that's all MySQL. All right, so today we're talking about Velox. This is actually a super hot topic. You might have seen this in the news: just a week ago, IBM acquired Ahana, which was a startup built around PrestoDB that was using Velox. So this space is very active now, and IBM just bought their way into the game too. So Facebook, Meta, obviously they collect a lot of data. There's the data they collect from old people using Facebook, and then there's the internal telemetry from all their services. Plus they own Instagram and a bunch of other services. So they have a lot of data. And as they discuss in the Velox paper, over the years they found that a bunch of their internal teams were building essentially redundant tools to analyze this data. There was the Presto team building one thing, and there were other teams building other stuff. And this fracturing of their tool set across these internal database systems caused a lot of wasted time and effort re-implementing the same thing over and over again. They talked about how they did a survey of a bunch of the tools and found 12 different implementations of substring that all did things slightly differently because of how they handled nulls and exceptions, or whether they used zero- or one-based offsets, right? So Meta basically said, okay, instead of having everyone optimize their own individual systems, let's go build a single framework that all these systems can use, so the engineering effort can be spent making that one framework work really, really well, instead of dividing the effort across multiple teams, right? I would say this is not unique to Meta. This happens at a bunch of different companies, and it's partly because the way you get promotions at tech companies is based on your contribution to the company, but if you're doing backend infrastructure engineering, it's not like you put out a new product and then show revenue going up, right? Instead you show, okay, I led a team that built a system that ended up being used, right? That's how you get promoted. And then you move on to build the next thing and let whoever comes behind you maintain it. That's why a lot of these tech companies have built so many of their own internal systems. So it seems crazy: why do you do this when there's existing tooling?
Obviously because of the way promotions are set up: you get advancement by building systems. But as we discussed with Databricks and how they wanted to handle Spark SQL: instead of trying to build an über-system, a complete database system that replaces everything, which would just be infeasible, you're better off building a smaller component, the execution engine itself, and having it seamlessly integrate with the infrastructure that already exists. By doing that, you have a better chance of delivering the improvements you want through this optimized framework, versus just saying, hey, here's my new database, go use it, because people don't want to give up what they're already using. So this is the backstory, this is the motivation for Velox. It's a library of the components you would need to build a high-performance single-node query execution engine. There's no notion of workers on different nodes. There's no real notion of shared-disk storage, right? It's really a low-level API for how you take vectors in from some input source, do some kind of operations on them, and then produce output. So it doesn't have a SQL parser. You have to give it already-optimized physical plans; we'll talk a little bit about what those look like. There's no metadata catalog, right? You're not gonna be able to ask, what tables do I need to go look at? Like in Snowflake: I want to access this table, so figure out what micro-partition files need to be read. None of that exists. You have to provide that to Velox if you want to use it. Another important thing is that there's no cost-based query optimizer. Again, something else has to provide the optimized physical plan. So you give it the physical plan as a DAG, you tell it what to execute, it crunches on it, and then it spits the output out wherever you want, or passes it to the next stage in the code. Right? So again, this is similar to what we saw in Photon. Photon itself has no notion of the cluster of machines, no notion of how to send data from one node to the next. The outer shell, the thing that was using Photon, was Spark itself. Same thing with Velox, and we'll see this in Presto: Presto handles the SQL parsing and query optimization, the coordination of different nodes and sending data around, and then when it comes time to actually do a join or scan a table, you use the Velox runtime engine to do that efficiently, more efficiently than what exists today. So again, this is a toolkit that lets you build a data processing platform, like your own Spark, your own Presto or ClickHouse, BigQuery, Snowflake, whatever you want. So, going through the overview of what Velox provides, in the context of the things we've talked about this semester: it's push-based, vectorized query processing. They don't call that out explicitly, but when you read the paper and they discuss this notion of a driver and task and so forth, it's described in terms of being push-based rather than a pull-based model calling get-next in the Volcano style. They use precompiled primitives to do operations on columns, both in expressions and within the operators themselves.
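To make that concrete, here's a minimal sketch of what one of these precompiled primitives could look like; the names and signature are made up for illustration, not Velox's actual API:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical precompiled primitive (not actual Velox code): a
// type-specialized "col == constant" kernel over one batch of values.
// It writes the positions of matching rows into a selection vector that
// the next primitive in the operator can consume, push-style.
size_t EqualsInt32(const int32_t* col, size_t num_rows, int32_t constant,
                   uint32_t* out_selection) {
  size_t num_matches = 0;
  for (size_t i = 0; i < num_rows; i++) {
    // Branch-free pattern: always write the index, but only advance the
    // output cursor when the comparison holds.
    out_selection[num_matches] = static_cast<uint32_t>(i);
    num_matches += (col[i] == constant);
  }
  return num_matches;
}
```

An engine ships a whole family of kernels like this ahead of time, and the physical plan just wires them together over batches, which is the push-based, vectorized style described above.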
They can do some codegen of expressions, the predicates, but that's in an experimental branch or experimental directory; you don't get it by default. For that one, which we'll discuss in a second, they translate the expression program into C++ and then fork a compiler. They're Arrow-compatible. We haven't talked about the exact details of Arrow, but think Parquet, except for in-memory data. The idea is that you can send and retrieve data between Velox and some outer system using the Arrow format without having to spend time on expensive encoding or decoding. Because there's no query optimizer, no catalog, and no statistics, you don't want to assume the query plan you're given is optimal, so they support some runtime adaptivity once the operators start running. As far as I can tell, they don't do major reordering of the query plan, like they can't switch around join orders, but they'll do some of the batch-level optimizations that we saw in Photon. And for joins, as far as I can tell, they support both sort-merge join and hash join, okay? So again, Velox is not a standalone system; it's a library that you use to build a new system or integrate into an existing one. These are the major components they provide. First off, it has its own notion of a type system. There are obviously the basic scalar things: ints, floats, dates, and so forth. Then you can have complex types, like timestamps with time zones; you can represent those using their API. Think of this like a user-defined type in Postgres. You have nested types like JSON. I don't know whether they're doing shredding or splitting things up the way we saw in Dremel or in Snowflake, but they have a way to represent these things. Then, as you're processing data, each operator passes vectors to the next one. We saw this in the DuckDB talk last week, where Mark was talking about the different types of vectors you could pass along: flat ones, dictionary-encoded ones, constant ones. Now, since they're trying to be compatible with Arrow, and Arrow only has dictionary encoding, in pure Arrow you would only have dictionary-encoded vectors or flat vectors. So they extend Arrow, because with real data you often want these other encodings; for example, they support the delta encoding we talked about. And if you need Velox to talk to an external system or framework that expects Arrow, they have ways to convert those vectors into the default dictionary encoding that Arrow provides. The expression engine we'll talk about in a second, but it's a way to do vectorized expression evaluation using what more or less look like compiler optimizations. They flatten the expression tree down into a program using their IR, and then they do passes over it, as a compiler would, for things like constant folding and common subexpression elimination. They have a function API that you can implement to add the built-in functions you would want in your database system. Think of it like Postgres: there are all these built-in functions that come with the system when it ships, like substring, for example.
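As a rough illustration, here's a hedged sketch of the shape such a function-registration API could take; this is not the real Velox API, just the mechanism:

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical function registry (not the real Velox API): the host system
// registers its own implementations of built-in functions under a name,
// and the engine looks them up by name when building expressions.
using ScalarFn = std::function<std::string(const std::string&, int, int)>;

std::map<std::string, ScalarFn>& FunctionRegistry() {
  static std::map<std::string, ScalarFn> registry;
  return registry;
}

void RegisterFunction(const std::string& name, ScalarFn fn) {
  FunctionRegistry()[name] = std::move(fn);
}

int main() {
  // Presto-style semantics sketch: 1-based start offset. This is exactly
  // the kind of detail the 12 internal substring implementations
  // disagreed on (zero- vs one-based offsets, null/error handling).
  RegisterFunction("substr", [](const std::string& s, int start, int len) {
    if (start < 1 || len < 0) return std::string();
    size_t pos = static_cast<size_t>(start - 1);
    if (pos >= s.size()) return std::string();
    return s.substr(pos, static_cast<size_t>(len));
  });

  // The engine would resolve the function by name at plan-build time.
  std::string out = FunctionRegistry().at("substr")("velox", 2, 3);  // "elo"
  return out.empty();
}
```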
So the idea here is that there's an API you can hook into, if you're using Velox, to add the functionality you want. When Facebook added Velox to Presto, Presto already had a bunch of functions that didn't ship with default Velox, so they re-implemented those built-in functions using the API. They have ways to support either vector-at-a-time or row-by-row processing, based on how the database system expects things to happen. And again, this is for the database system developers, not for the application developer. For operators, they have all the standard stuff we talked about this semester: scans, projections, filters, joins, right? Nothing really fancy there. They're again using the push-based model for vectorized query processing. For storage, they have connectors that allow you to talk to different storage systems: HDFS, S3, Google Cloud Storage, whatever you want. And the adapters handle encoding and decoding, serialization and deserialization, for the different formats like Parquet, Arrow, and the others we talked about this semester. Then lastly, there's the resource manager that does the buffer pool management and thread pools. Right, so again, this is everything below the query optimizer down to the storage layer; that's where Velox fits in. So I want to go through some of these in a bit more detail. Again, Velox is just a library. It doesn't own data, the same way that Photon doesn't own data. They don't have their own proprietary format the way Snowflake does. Instead, they expose an API that you, as a system developer, can implement against to extend the functionality of the database system. So you can talk to different storage systems, S3, HDFS, and handle different formats, Parquet, ORC. They have this thing called DWRF, which the paper mentions; it's their sort of extended version of ORC. And then there's Alpha; I don't remember whether the paper mentions it, I don't think it does, but Facebook has talked about Alpha publicly. Alpha is going to be the new Parquet, the new ORC, and the goal is for it to become an open-source format like Parquet and ORC as well. One of the things Alpha does that ORC and Parquet don't is the ability to extract metadata about individual columns without having to deserialize the file's entire metadata. In ORC, if I want to know the type of one particular column, I can't just go get the metadata for that one column: I have to deserialize the entire metadata in the footer and then find the thing I'm looking for. So if I have a thousand columns but only need information about one, in ORC you have to deserialize the whole thing. And actually, for both of these formats, they don't expose the dictionary to the upper parts of the system, like the engine itself. So if you want to do things like, just tell me whether this key even exists for some column, you can't just look in the dictionary; you've got to deserialize it and then do the examination. With Alpha, the idea is that you can do direct lookups in the dictionary. Again, it's not open source yet; we did talk to them last week about this. And I don't know whether it's always gonna be called Alpha, but this is gonna be the new version of Parquet that Facebook is hoping everyone's gonna use, and Velox supports it already.
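To picture the difference, here's a toy sketch of the two metadata layouts; the field names are invented for illustration and are not the actual Alpha format:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy sketch (invented field names, not the actual Alpha format). In an
// ORC/Parquet-style footer, a reader must deserialize every column's
// metadata to answer "what is the type of column X?". With a per-column
// offset index, it can seek to and decode exactly one entry.
struct ColumnMeta {
  std::string type;      // e.g. "int64", "varchar"
  uint64_t dict_offset;  // where this column's dictionary lives in the file
  uint64_t dict_size;    // lets the engine probe keys without full decode
};

struct FileFooter {
  // Maps column name -> (offset, length) of that column's serialized
  // ColumnMeta, so metadata access is O(one column), not O(all columns).
  std::map<std::string, std::pair<uint64_t, uint64_t>> column_index;
};
```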
A bunch of this we've already said before: the internal representation between operators is always vectors. At a high level it's based on Apache Arrow, but it's an extension of it to support additional encoding schemes that Arrow doesn't support. The paper mentions they're talking to the Arrow people, hoping to get this merged upstream; as far as I know, that's still up in the air. So, the optimizations they do. There's lazy vector materialization: the idea is that I don't have to actually populate a vector until I actually need it as it passes between operators. There's the Hyper-style, German-style string storage that we saw from DuckDB; I'll show you what that is in a second. And then there's out-of-order population of result vectors. The idea there is that instead of having to write data into a vector in exactly the order it comes in, I can write ahead of time into different locations, and obviously I have to make sure the offsets match up. This is sort of bending the rules of what you'd expect, but it's okay, because in a relational database system the ordering doesn't actually matter. So this string storage is actually pretty clever. As my PhD student Matt pointed out on Slack, we used to do the same thing in NoisePage, and DuckDB mentioned they do this as well. The basic idea is that instead of just storing the size of the string and then the actual string itself, or a pointer to the string, you store the size of the string, then a prefix of the string, and then, if the string is small enough, the string itself inline. So here I have the size, that's four bytes, 32 bits. Then I have a four-byte prefix, so four characters, since it's ASCII. And then in this case, because the string is small enough, I store the full string inline. So now when I'm scanning along the column, I obviously know the size, but I also have the prefix, and in some cases the full string right there, and I don't have to dereference a pointer to go get it. So instead of storing the string as four bytes for the size and then eight bytes for a pointer, I spend a little more space. (And on the slide here, this payload should be eight bytes, not 16, and the size should be five, not four; let me fix that now, sorry, and make sure the recording still works. I think we're good, okay.) All right, so the whole thing is 16 bytes: four bytes for the size, and the size here is five, thank you; then you have the prefix, four bytes, four characters, since again it's ASCII; then you have the remaining string. If you have something longer, like this one with size 17, you still have the prefix, but now inside the eight-byte payload you actually have a pointer to the full string.
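Here's a minimal sketch of that 16-byte layout, simplified from what Umbra, DuckDB, and Velox do:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>

// Sketch of the 16-byte "German-style" string layout described above.
// Up to 12 characters fit fully inline (4 in the prefix plus 8 in the
// payload); longer strings keep the 4-character prefix and store a
// pointer in the payload instead.
struct StringView {
  uint32_t size;        // 4 bytes: length of the string
  char prefix[4];       // 4 bytes: first 4 characters, always present
  union {
    char inlined[8];    // size <= 12: rest of the string, stored inline
    const char* data;   // size >  12: pointer to the full string
  };
};
static_assert(sizeof(StringView) == 16, "fixed 16-byte layout");

StringView MakeStringView(const std::string& s) {
  StringView v{};
  v.size = static_cast<uint32_t>(s.size());
  std::memcpy(v.prefix, s.data(), std::min<size_t>(4, s.size()));
  if (s.size() <= 12) {
    if (s.size() > 4) {
      std::memcpy(v.inlined, s.data() + 4, s.size() - 4);
    }
  } else {
    v.data = s.data();  // a real engine copies into an owned buffer
  }
  return v;
}
```

With this layout, an equality or prefix check can often accept or reject a string using just these 16 bytes, which is exactly the trick described next.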
And what's cool about this is that for some functions, like if I'm doing a scan trying to find where a string equals some value, or starts with some value, if the prefix alone doesn't match, I know I don't have a match, and I don't have to dereference the pointer or check anything else. So that's beautiful: it avoids having to follow the pointer. And in some cases, for some string functions, I may not even have to go check the rest of the string at all, right? Like if I need a substring and I only need the first four characters, even though every string might be super long, I can produce the result just by looking at the prefix. Right? Again, Arrow doesn't do this. This is something that would not be compatible with Arrow, but for their environment, in most cases, this is a huge win. Again, DuckDB does this as well; we used to do it in our system. The Hyper guys do a really good job of using extra bits or extra bytes to pack in additional information to avoid having to dereference or follow a pointer. We saw a similar trick in their chained hash table, where they would have 64-bit pointers but use the top 16 bits to store a Bloom filter, since Intel x86 only wants 48 bits for the address. They always do nice little tricks like this, storing a little extra information up front to avoid having to chase pointers. All right, for the expression engine: again, my understanding of what they're doing is that they take the expression tree and flatten it down into an intermediate representation of objects in C++, essentially an array of pointers. So to execute the expression, you just follow along the function pointers to evaluate whatever was in the expression, right? Now, from what I can tell by looking at the code, these function pointers point to primitives, but they're not templated. So, again from my understanding of the source code, they don't have, here's the int32 version of equality, here's the int64 version; they have generic primitives, and at runtime they figure out which one you actually need. I might be wrong about this. Then they have another version of the expression engine in a separate branch, where they'll convert the flattened IR they generated into actual C++ code, do a fork-exec to compile that C++ code into machine code, link it in, and invoke it to evaluate the expression. We saw this technique with HIQUE when we talked about query compilation. Redshift does this as well, and the first version of MemSQL did something very similar, though of course they were doing full query compilation; these guys are only doing it for expressions. The paper talks about how you only want to do this for queries that you know are going to run all the time or take a long time, because in this case the fork-exec might take something like 10 seconds for the compiler to finish. Contrast this with the Hyper approach: they used LLVM, and there were a bunch of different tricks, but they would do that compilation in memory, without the fork-exec, and produce the shared object that then gets linked into the program itself.
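Going back to the default, non-codegen path, here's a hedged sketch of the flattened-IR idea as I understand it, not Velox's actual code: the tree (c0 + c1) * c2 is lowered into a linear array of steps, and evaluating a batch is just a walk over that array, with no tree traversal per batch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// All vector "registers" are assumed to be pre-sized to the batch length.
using Vec = std::vector<int64_t>;
using StepFn = void (*)(const Vec&, const Vec&, Vec&);

struct Step {
  StepFn fn;          // precompiled primitive to invoke
  int in1, in2, out;  // slots in the register file of vectors
};

void AddVec(const Vec& a, const Vec& b, Vec& out) {
  for (size_t i = 0; i < a.size(); i++) out[i] = a[i] + b[i];
}
void MulVec(const Vec& a, const Vec& b, Vec& out) {
  for (size_t i = 0; i < a.size(); i++) out[i] = a[i] * b[i];
}

// Evaluation walks the flattened program, one function pointer per step.
void Evaluate(std::vector<Vec>& regs, const std::vector<Step>& program) {
  for (const Step& s : program) s.fn(regs[s.in1], regs[s.in2], regs[s.out]);
}

// Usage: registers {c0, c1, c2, tmp, result}; the program is two steps:
//   {AddVec, 0, 1, 3}   // tmp    = c0 + c1
//   {MulVec, 3, 2, 4}   // result = tmp * c2
```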
All right, so for runtime adaptivity, the three major techniques they have are predicate reordering, column prefetching, and eliding encoding checks. For predicate reordering, I'm missing a citation here, but it's a common technique that's often used to balance the trade-off between how long a predicate takes to evaluate versus how selective it is. Something could be super selective, meaning it filters out most of the tuples, but if it takes a long time to compute, maybe you don't want it to be the first thing you check. There was work back in the 1990s where you could actually move predicates up and down the entire query plan; here they're just reordering within the filter operator itself. So they have a simple formula that weighs the selectivity against the time cost, and they evaluate it at runtime. They do column prefetching: if you're processing one column and you know you're gonna need to read another one, they can do asynchronous I/O and start fetching the next column while you're still processing the first. That's a standard technique, and I wouldn't necessarily call it adaptivity, except insofar as it's tied to the predicate reordering: if the first predicate evaluates one column and the next predicate evaluates a different column, you go ahead and prefetch that second column while you're processing the first one. And the last one is eliding the UTF-8 handling, because the ASCII versions of string operations are cheaper than assuming everything is UTF-8. This is the same batch-level adaptivity we saw in Photon, where as you're reading the data you recognize, oh, this is actually all ASCII data, so let me not run the more expensive UTF-8 version, let me use the ASCII version that assumes every character is always one byte. Another trick they do: based on what the operator or expression is doing, if you're allowed to overwrite the data in the vector as you're processing it, then you don't have to allocate new memory. They call this buffer reuse. For example, if I know I'm running a string function that takes a four-character string down to a three-character substring, then instead of allocating a whole other buffer for the output, I store the result back into the buffer it came from and pass that along up the query plan. So I avoid the additional memory pressure of the malloc calls to allocate that memory. So here are two graphs from the paper. This one shows the performance of the ASCII versions of the standard string functions versus the UTF-8 versions, right? And the difference is quite significant. This is why you want this sort of adaptivity. And most data, whether people even think about it or not, is going to be ASCII. Think of all the log files these systems generate: as long as people aren't putting weird emojis in there, it's almost always going to be ASCII.
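To make that batch-level trick concrete, here's a minimal sketch of the ASCII fast path with in-place buffer reuse; this is illustrative C++, not Velox's actual code, and the function names are made up:

```cpp
#include <string>
#include <vector>

// Check once per batch whether every byte is plain ASCII.
bool AllAscii(const std::vector<std::string>& batch) {
  for (const std::string& s : batch) {
    for (unsigned char c : s) {
      if (c > 0x7F) return false;  // any high bit set => not ASCII
    }
  }
  return true;
}

// Fast path: one byte per character, mutating the input in place
// (buffer reuse: no new allocation, no extra malloc pressure).
void UpperAscii(std::vector<std::string>& batch) {
  for (std::string& s : batch) {
    for (char& c : s) {
      if (c >= 'a' && c <= 'z') c -= 32;
    }
  }
}

// Placeholder for the general path: real UTF-8 case mapping needs
// multi-byte decoding and Unicode tables, which is the whole point of
// avoiding it when possible.
void UpperUtf8(std::vector<std::string>& batch) { (void)batch; }

// Dispatch per batch: pay the AllAscii scan once, then run the cheap
// kernel for the (very common) all-ASCII case.
void Upper(std::vector<std::string>& batch) {
  if (AllAscii(batch)) {
    UpperAscii(batch);
  } else {
    UpperUtf8(batch);
  }
}
```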
I have one story about this. There was a database company, I can't say who. They had their whole system crash because someone stored the poop emoji, and it percolated far enough through the system that it finally hit one component of the storage layer that didn't know how to handle emojis, and it took the whole thing down. They had to go in and manually patch the poop emoji with something else to make it all work. So you should write your systems to handle UTF-8, but most of the time the data is going to be ASCII, and if you can avoid the more expensive versions of these checks, it's a big difference. And then in this other graph, they're measuring the runtime per row in nanoseconds. Again, avoiding the UTF-8 path is a big win, but you get about another 50% improvement on top of that if you can reuse the buffers, because again, not having to call malloc is always a big deal. All right, so now I want to talk a little bit about how they're using Velox. There was a project they mentioned called Spark CPP, which is basically like the Photon stuff we saw with Databricks, where they're replacing the Java-based and Scala-based runtime of Spark with Velox. We've already basically covered that idea, so I want to talk about Prestissimo a bit. Again, PrestoDB, think of it as Facebook's version of the Dremel system we talked about: a query engine that can run on top of disaggregated storage. The full story of Presto: Facebook started it, and it used to be called just Presto, and then there was a fork. I think the reason for the fork was that a lot of people were working on Presto, but Facebook was getting kind of weird about the ownership of it, even though it was supposed to be open source. So the guys at Starburst, which is a spin-off of Teradata, had a fork that was called PrestoSQL, while Facebook had PrestoDB. So you had PrestoDB and PrestoSQL, and obviously that's confusing, so the PrestoSQL guys renamed theirs to Trino. But at the end of the day, it's all Presto, with its provenance in the Facebook project. Since then, Facebook has transferred stewardship of the PrestoDB project to the Linux Foundation, which I think of as another version of the Apache Foundation: it's not my problem anymore, let them deal with it. Anyway, PrestoDB is the one still tied to Facebook; Trino is the one that's separate now. So again, just like at Databricks, they went through and replaced the core runtime engine, the kernels of the operators and the expressions, all the stuff we just talked about, with Velox, invoking it through JNI the same way Databricks did with Photon. The paper talks about the core functionality Velox provides, the built-in things it supports, like reading from HDFS and encoding and decoding Parquet and ORC. But to make it completely compatible with PrestoDB, they had to use the API that Velox exposes to add all the built-in functions that PrestoDB expects. And I don't know the status of Prestissimo, whether it's actually open source yet, but I know it's being used by Intel and by ByteDance, the TikTok people. This is actually a real thing.
So you say, okay, PrestoDB is using Velox, and the paper reports something like a two-to-four-x performance improvement for PrestoDB on Velox versus the Java stuff. And in public talks they make the same argument the Databricks guys did: if you want to stick with Java and really eke out performance, you've got to have people who really know the internals of the JVM and can basically abuse it to get the bare-metal performance you would get with C++, and you're better off just writing the engine in C++ and using JNI to hook it into the overall system. It's the same argument Databricks made before. The Trino guys disagree with this approach. This is from, I think, a podcast they did last year, where they were asked whether Trino would build a vectorized C++ engine and integrate it the same way that Velox or Databricks did. And they said the direction of Trino, while still including bare-metal performance improvements on the JVM, was to instead focus on not wasting time with suboptimal query plans, before or during runtime. So they're saying: we're going to stick with Java, and rather than spending the effort to rewrite the core engine in C++ like Velox or Photon, you're better off making the query optimizer better and having better runtime adaptivity to avoid crappy query plans. So who's right? On one side you have the Velox people saying Java sucks, let's rewrite everything in C++ or Rust. On the other you have the Trino guys saying no, no, we're okay with the JVM, let's make the query optimizer better. He says both. Sorry, what do you mean? Oh, you'd use the query optimizer as a service, and it calls into C++? You mean something like Calcite? Okay. All right, so he says both. Raise your hand if you think Velox is right. Abby? Raise your hand if you think Trino is right. Raise your hand if you think they're both right. They're both right, right? But the question is, which is easier? Yes, that's the irony of this: the rewrite seems like a major undertaking, but rewriting in C++ is probably the easier task from an engineering perspective. Her point is that making your optimizer good is basically an unending quest, because it's super, super hard, whereas with the rewrite there's a path forward where you know what you need to do. So they're both right; the question is which one is easier. I think, again, the rewrite is easier. Let's see how far they get with the optimizer approach, right? But he's absolutely right that if you have a crappy query plan, you can have the fastest C++ engine in the world, and if you get a crappy join order, who cares, right? You're still going to be in exponential time. So this is a good example of, hopefully, the main takeaway you get from this class: you have two opposing views, and you can understand the trade-offs between one versus the other. Now, there are other aspects of this discussion that we're not privy to. Maybe a company only has a bunch of Java developers, so if you don't have any C++ developers, you've got to go hire them, right? And that costs time and money.
Or maybe you have, I don't know, your own German, right, who could build you a better query optimizer; you put them on that, and they're better off doing that than rewriting the engine. So again, there are other engineering decisions you have to make beyond just what is easier and how much time it takes, right? I like this example because there are two clear directions you could go. Well, the Ahana guys got bought by IBM, so we'll see how that works out for them. And the Trino guys have raised a ton of money too; they haven't raised as much, and I don't think their valuation is as high as Databricks', but they have a lot of money, they could pursue anything, and they're deciding to go with the query optimizer. Okay. So I want to talk now about some additional projects that may have gotten mentioned throughout the semester; I don't think I spent time on them, but I'll mention them now so you've been exposed to them, because this is the future of where these systems are going. Then I'll talk about two other projects that look and smell a lot like Velox and are considered viable alternatives. The first is a project called Substrait. As far as I know, this is like one dude spearheading it. He's the creator of Apache Drill, he helped start Apache Arrow, and he was a co-founder of Dremio. I don't know if he's independently wealthy, but this is his hobby right now; this is what he's working on. The idea with Substrait is that, in the same way the database community, at least in industry for these data-lake systems, has galvanized around open-source formats like ORC and Parquet, and potentially Alpha in the future, so there's an open-source specification for what the data looks like, there should be an equivalent specification for what query plans look like. We haven't really talked about this, but there's this notion of federated databases: a middleware system where you write one query to it, and it can talk to a bunch of different databases, converting your query into, say, a MongoDB query or a Postgres query, then taking the results and combining them together to produce a single answer. It's a way to integrate different databases that weren't really meant to work together. And you could sort of say the connector API that Presto and Trino support does this: you can write Presto queries that connect to Postgres and retrieve data from it. But what Presto is actually doing is taking whatever query plan it generates internally and converting it into SQL that Postgres can understand, and producing the results that way. So instead of having to do that conversion from one SQL dialect or API dialect to another, with Substrait the idea is that you could have a single physical plan, in a standard format, be usable by any number of different database systems, as long as they support the Substrait API.
And it could either be the entire query plan itself, or, if you know you need to retrieve data from Postgres and Mongo, you could take the Substrait plan you have and pull out portions of it to send down to Postgres and Mongo, basically predicate pushdown or join-filter pushdown, whatever you want to do, and then combine the results. As far as I know, no system is making full use of this just yet. I think it's still in the works; there are prototypes out there. But to me, this is the right way to go. This is the clear direction; this is what's coming next. In the same way as Parquet and ORC, systems that support Substrait would be able to integrate into the big-data ecosystem in a way that existing systems can't right now. Postgres is an island unto itself; it doesn't know about other database systems. I mean, foreign data wrappers can sort of make that work, but it doesn't natively talk to other database systems, whereas something like Substrait could make this happen. So I think you'll see a lot more of this in the future. All right, so two projects that look like Velox and are considered alternatives: DataFusion and Polars. So what is DataFusion? It's almost exactly like Velox: an extensible vectorized execution engine for building a full-fledged database system, meant to operate on Apache Arrow data. And, to make her happy, it's written in Rust. DataFusion provides more of a complete end-to-end system than Velox. Velox doesn't have a SQL parser, doesn't have a query optimizer, doesn't have a catalog; DataFusion does have a SQL interface, and it does have a query optimizer, though it's not cost-based, it just applies heuristic rules. They also expose a data frame API, if you want to operate on it using pandas-style syntax. But again, this could be the core engine that you build a larger system around. Probably the most famous example of using DataFusion as this core engine is InfluxDB: two or three years ago they announced a new engine written in Rust called IOx, and they use DataFusion on the inside. InfluxDB was originally written in Go, and I think this is the third version of the system they've built. Long story short, they started off using SQL, then they abandoned SQL, and I told them that was a bad idea; I'm not saying they changed it because of me, but then they came back to SQL. They also used mmap, and I told them that was a bad idea, and they got rid of that too. I'll take credit for that one, not the SQL one. And then CeresDB and CnosDB, I think they're out of China, use DataFusion as their engine as well. I think those two might be companies, or one might be a hobby project, and obviously InfluxDB is a big company. So again, it's the same idea as Velox: they have an API you can implement to extend it, but they also support additional things out of the box that Velox doesn't have. All right, the next one is Polars. Now, you may be thinking, okay, wait a minute, isn't this the exact same slide? And the answer is yes, because as far as I can tell, it's not the same code, but it provides all the same basic features as DataFusion. It's written in Rust. It's an extensible vectorized execution library for Apache Arrow data. They have a SQL interface, they have data frames, and they have a query optimizer.
Now, one difference I want to call out is that DataFusion is sort of pushing the SQL route more, while Polars is pushing data frames more. Polars has a command-line interface, and its data frame API is not compatible with pandas, but it's an alternative to pandas. And DuckDB is awesome: DuckDB talks to all these things. You can read and write data from Polars and DataFusion just fine, because it's all Apache Arrow. I can't find any big system that actually builds on Polars, though. There's a company called Polar Signals; they had a database called ArcticDB, but that name collided with something else, so they had to rename it to FrostDB. But they're not even using Polars; they just have "Polar" in the company name. Not great naming, but that's their problem, not mine. Anyway, so Polars is another alternative to DataFusion, another alternative to Velox, but the same idea applies: it's an engine, a library that you extend to build a larger system. And the reason I'm bringing all this up is that even though we just spent the entire semester talking about how you build a high-performance OLAP engine, I think for this decade we've reached the point where it's not advisable to do it from scratch, other than as a hobby project or a one-off thing. Certainly, if you're doing a startup, it doesn't make sense to build an OLAP system from scratch anymore. It doesn't even make sense to fork ClickHouse or DuckDB. Something like Velox provides all the functionality you need to have a modern OLAP system, everything we've talked about this entire semester, and you should use that as a starting point rather than building everything from scratch. Also, with ByteDance behind it, with Intel, and other people, there are a lot of people working on the open-source version, so if you build your larger system around it, then as they improve the runtime kernels in the engine itself, you get all that benefit, assuming you stay compatible. It's sort of like when Dremel talked about how they decided not to build their own storage layer: they just used Colossus, the file system Google had internally, so as Colossus got better, the overarching system got better without any changes. In the same way, if you build your system around something like Velox, then as Velox improves, your system gets better, and you spend all your engineering time on the parts surrounding it. So the query engines are getting commoditized. Now, I'm not suggesting Snowflake go back and throw away all their code and switch to Velox; that'd be insane, you wouldn't do that. But I think the ship has sailed on whether you could build a viable new system, at least for now, by writing everything from scratch: the query engine plus all the other stuff you actually need to have. That's not to say that everything I've told you in this class is obsolete. No, you can go get jobs working on the systems that do have money, like Snowflake and Redshift and Databricks, building these things. It's just that if you're trying to build a new startup, I don't think you'd want to write it from scratch, and I think the engineering effort to fork something like ClickHouse would be a major undertaking and not worth it. You're better off using something like Velox.
And I have no connection to Facebook; it's not like they're paying me or anything like that. This is my opinion. So if vectorized execution engines become commoditized, then what actually makes one system better than another? This goes back to the discussion of Trino versus PrestoDB: the UI and UX, the user experience of how people use your database system, the interface, what it's compatible with, whether you can interface with it from Python or whatever, and how good your query plans are. Those are the things that are actually going to matter in the next decade, not how fast you can do push-based vectorized query execution, because everyone's doing that now. So I think this is the big battle going forward, and that's why I think the Trino guys might be onto something. But, you know, it's a trade-off, and they certainly have their own telemetry telling them how far queries are from the optimal query plan when they actually run, and whether fixing that is always going to beat the 4x that Facebook sees by switching over to Velox, right? Again, we're database people: we can measure this and make a decision based on that. Question in the back: what if you need an index? That's not something Velox can do. So the question is, what if you need an index, and something to decide what index you need? Assuming something has figured out what index you need, assuming that exists, how do you integrate it into something like Velox? Well, it's just another operator: you do an index scan instead of a file scan. Do you need to fork Velox? No, I don't think you need to fork Velox. You'd have your own build, yes, but you're adding additional functionality; it's not a hard fork, you're just implementing it against their API.