Welcome to another quarantine database talk. Today we have CK Tan, who is the co-founder of Vitesse Data. CK has been involved in databases for a long time and has worked on some very influential and seminal systems in addition to Greenplum. When he was a master's student at the University of Wisconsin-Madison working with David DeWitt, he worked on Shore, the early object storage manager. It predates a lot of the systems that exist today, but it had a lot of modern database technology in an academic project, which was a big deal. He also worked on EXODUS, again under DeWitt. Vitesse Data has been around for several years now, and I first became aware of them when they gave a talk at PGCon in 2014 that laid out the column store and the LLVM stuff they were doing on top of Postgres. To me this really hit a sweet spot, because these are all the things I wanted to do in the system we were building. It sucks that it takes a quarantine, a pandemic, to finally get CK to come give a talk at CMU, but we're super happy that he's here. We'll follow the same rules we normally do: if you have a question, just unmute yourself, say who you are and where you're coming from, and ask. Just interrupt him as we go along. All right, CK, go for it.

Thanks, Andy, for those kind words. So, yeah, I'm the founder of Vitesse Data. We started in 2014 with my other co-founder, Feng Tian; we were both early engineers at Greenplum. We built up that product, and then we left. In 2014 we read this really influential paper by Neumann, and we essentially redid the whole thing in Postgres, and Q1 was like eight times faster.
So we felt we had to start a company, and that's what we did. Then, at a later time, when Greenplum went open source, we moved the whole thing onto the Greenplum source code and started this Deepgreen DB product. Commercially, we can talk about that later if you're interested, but let's focus on the technical part here.

How much time do we have? We'll go to about 5:30.
[Andy] Do you want to put it in present mode, or just go like this?
A: Let's go to present mode. All right, here we go.

Here's the talk outline. We'll talk about what Deepgreen DB and GPDB are, then we'll talk about the performance characteristics of Deepgreen DB. We'll go through some source code and show you how ugly it can look; or, you know, maybe it's easy enough to read. Then there are some experiences that we picked up running the company that I think may be very different from what students think, especially in terms of transaction stuff, but we'll get into that later.

So, Deepgreen DB is built on top of the open-source Greenplum database. From the get-go we wanted to maintain compatibility at the byte level, because we think this is how you get customers. It is one hundred percent compatible at the byte level, and even with the binaries you don't need to change your programs. If you're a customer, you can take our binary, shut down Greenplum, swap in Deepgreen DB, and start it up, and all your programs should still work; you would just notice the speedup of some queries.
OK, so you don't have to move the data, and you don't have to make any code changes to your SQL programs. psql still connects; ODBC, JDBC, whatever, all still function the same.

So we address inefficiencies in Greenplum in the computation, in the network layer, and in some connectivity stuff. For compute, essentially LLVM, codegen in LLVM, addresses that. The network... for database engineers, networking is very different; we'll talk about how Greenplum implemented it and why it was not too good.

Q: By the network do you mean between nodes, not between the client?
A: Yes, between nodes: the interconnect. You need it to be very fast and very optimized for bulk sending of data. And then, in terms of connectivity, to S3 and all these other places that people really need now, to Kafka, to S3. You don't hear about Hadoop anymore, which is a good thing.

I'll do a quick demo now, just to show you how it does. This is just psql connected to a database, and it just looks like this. I'll show you the lineitem table; it's just like normal Postgres. We can do a count(*) just to see the speedup: select count(*) from lineitem... Do it again; it's about seven seconds. This is, by the way, a 20 GB TPC-H lineitem table. If you want to see the Greenplum speed, you run the same query; the difference is about five times. And this lineitem table is a column-store lineitem table with compression on, LZ4, and it's the same query plan, same optimizer, everything the same.

Q: And then when you execute the physical plan, it branches between Vitesse and the regular executor?
A: Yeah, let's talk about that later.

So now let's do Q1; I made a view out of it.
So this is essentially TPC-H Q1; I just do a select * from the q1 view. This is without Vitesse, without the codegen. This is a very boring demo; time for coffee. It takes a few seconds more... there you go, 25 seconds. With Vitesse it's about five seconds, so that's the difference: about five times on Q1.

OK, let's go back to the slides; we can come back to the demo later and do more if we have time.

So this is the architecture of Greenplum and Deepgreen. What you have is a distributed, MPP, shared-nothing architecture. We have the master host here, and these are all the segment hosts; each box here is a machine. On the master machine you have one primary Postgres instance, and on each segment host you have half primary segments and half mirror segments, so a primary segment on one host is mirrored by a secondary segment on another. Can you see the arrow? It's a bit small. It's striping the backups, pairing them up. The master gets the SQL and makes the plan, then distributes the plan to all the hosts to work on. The standby master host is just there in case the master goes down; psql then connects to the standby via some DNS magic, or, depending on the customer and how they do things, it may be manual, may be by DNS.

Here is a blow-up picture of a segment host, because on each segment host we put in additional components. In addition to the Postgres instances for the primary and mirror segments, we have something called XDrive, a component we put in for connectivity. The primary segments connect to XDrive, which then connects to S3, the S3 pool, or these FPGA cards. The thing to take away is that it is stateless.
So there's no concept of a transaction, but you can connect to it and get data.

Q: This might be getting into the weeds of Postgres, but are these foreign data wrappers? What does this look like?
A: We have foreign data wrappers, adapters, that connect to it. It's just a network connection, but the protocol, the way they put it, is built for transporting data. It's not built for very chatty stuff; it's built to send big chunks of data. And the stuff that goes between the primary segments and XDrive is all columnar: vectors of columns.

The S3 pool is essentially a local disk cache: if we find the data in the local cache, we can avoid going out to hit AWS. You can find more about this in the Snowflake paper; they do something like this too. Going to S3 all the time is very expensive; with a local cache it's faster and also less expensive.

Q: So when you say stateless: the query shows up, and if you're touching the foreign-data-wrapper table, the query plan gets pushed down to the segments?
A: Yeah. In the segment, it's just a connection to XDrive that says: give me the list of files. Then we hash the file names, select those that map to this segment, and just do a scan on those files.

Q: But if it's just "give me this file," what is the FPGA actually doing? Because if you wanted to push down a predicate...
A: XDrive is a general interface, so it supports all this other external data too. What it does is allow you to mount a logical disk. One of the disks may be your S3 pool; another may be your local files. It's a mount. In the config file you specify
So you you in the config file you specify, you know Amount by this name. Yeah, it's connecting to S3 or it's connecting to Kafka or it's connecting to S3 pull and you know, you can write You can write tokens that that would get called based on those mount points But are you doing predicate pushdown when you would want to use the FPGA? Yeah, so there's something called Fung wrote something that would that would have a Simple executor like syntax, okay that you can push push things down into the FPGA You know, essentially we're just scanning files here. So, you know, it's filters and it can do some x if you if you have You know bigger FPGA cards So a little bit more about interconnect. So interconnect is really really crucial You know in a small setup, you probably won't see it but When you when you scale big then it becomes a bigger and bigger problem every day So remember postgres is a it's not threaded, right? So Each process needs its own connections tcp connections everywhere Right. So if you have something like this, you know 10 machines with eight segments per machine and you know Your query has three three joins and each join has distributions Under them motions or exchanges under them Then, you know, you need to make a lot of connections per postgres executable Right. So each of them may be connecting to if you if you just multiply it It's like 2000 connections per machine per query. If you have 10 queries, you know, it's 20 000 connections per machine per query and eventually that becomes TCP would not be able to handle And so eventually green palm switched to tcp But then, you know, we are Database engineers, right? 
So we don't really write very good network code. The implementation in Greenplum has a network window of one, and it times out and resets if it doesn't get the ack within one second. With some packet loss, not even a lot, you can get huge delays in your query, and that happens exactly when the system is busiest. That's not very good.

Also, in Greenplum's interconnect, data flows only one way; the protocol doesn't allow you to send any kind of hint from the receiver back to the sender. That turns out to be quite a big improvement here. When you're building a hash table, once you have the left side, once you have built up the hash table, you can essentially compute a bloom filter and send it to all the scans and say: don't send me these tuples, they will never join. That has a huge effect on some queries, just because you avoid a lot of the network traffic.

Q: So you're saying this is the way Greenplum currently works, that you can't do these things, and the next slide is how you made it better?
A: Yeah, we did all this stuff. We have something called Deepmesh, an interconnect that Xiao Yan, an ex-Cisco engineer, built for us. Here we have TCP connections between hosts; between any two hosts there is just one TCP connection. So in the previous case with ten machines, you have about a hundred connections in total between the machines, and inside each host everything is Unix domain sockets. That's really pretty good at reducing all the collisions and packet loss; TCP does a good job. And between the Deepmesh agent and the segments we have Unix domain sockets, which are pretty fast too.
So that's how we address it.

OK, here are the performance numbers on TPC-H. We just ran this on some machines on Alibaba. Here is the speedup compared to open-source Greenplum: a 12-billion-row lineitem table, one master machine and four segment hosts, each host running 12 segments, meaning 12 Postgres instances. The data is stored in the column store with LZ4 compression, running on Deepmesh; the speed there is six gigabytes per second, and the network is a bit slower compared to AWS. On average we are about three times faster, as you can see from this table. Q1, the one we just ran, is five times, because there's very little network traffic in it, so you really see the speedup, like what I showed on my local disk just now.

I want to highlight Q1, Q5, and Q17 in this talk; I think they show most of the improvements that we see.

Q: Does Greenplum default to a column store or a row store?
A: It supports everything.
Q: So this is the Vitesse column store compared to the Greenplum column store?
A: The only thing we add there is LZ4 compression.
Q: Got it.

Any questions on the performance numbers? Nothing should jump out; it's not surprising. Some of those small improvements are on queries that run really fast, and if the query runs really fast, your compilation cost is a much bigger percentage of the whole, so you see 1.x speedups when the query is really, really fast.

Q: Is this using the Orca optimizer? And in general, do you see customers using Orca, or is it always the regular planner?
A: Let's talk about that later.

OK, so how do we codegen?
Yeah, so we wanted to maintain 100 percent compatibility. If you build a database from the ground up, there's no way you can build everything on day one, so what we wanted to do is use the regular Greenplum or Postgres code as a backup. What happens is, you submit a query, it goes to the Postgres SQL processor, which generates the AST, optimizes it, and gives you the plan tree. We feed the plan tree to Vitesse JIT, and then we determine whether we want to JIT it, and whether we can JIT it. If there are constructs we don't support, or if the query runs so fast that JITting would not be economical, we just kick it to the Postgres side and it runs the normal path. If we determine that we want to JIT it and we can JIT it, then we run our executable, which at that point is essentially just a call to a function, since you have already JITed the new function.

Q: What does your JIT look like? Are you generating the IR directly, or do you have an intermediate DSL?
A: IR directly. The code is coming, it's coming.

So during this step in the middle, the Vitesse JIT step, this is what we do. We have some quick heuristics: if it's just an index scan on a primary key with a filter, we kick back to the regular executor, because codegen can take 50 to hundreds of milliseconds, while a single-row select on a primary-key index takes less than one millisecond; JITting would absolutely kill it. Then we convert the Postgres plan tree into the Vitesse plan tree in C++. If there are any plan nodes that we haven't coded yet, we back out and give it to Postgres again; some of the more exotic plan nodes we don't have yet, so those just go back to Postgres.

Then we call codegen on the Vitesse plan tree. Essentially the design is what was outlined in the Neumann paper: you have producers and consumers, and they generate code that pushes tuples up. About the scan node: in Postgres the tuples are all serialized, and the code to extract attributes out of a tuple is general-purpose code; if you profile Postgres, you'll see it show up very high in the cost tree. In our case the scan node generates a customized record deserializer for those tables. Because we know exactly which attribute is at which position, there are no more decisions at runtime; the decisions were all made during codegen, and at runtime you just go directly to those bytes, retrieve them, and store them in registers.

I have a note here that says "except really wide tables." Really wide tables are problematic, because we load into LLVM registers, and if you have a thousand attributes, a thousand integers, I'm going to need a thousand registers. You won't find a thousand registers anywhere in the world, right?
So those actually get spilled to memory, and then you see the performance degrade. Again, there's a heuristic that determines the point at which we don't generate the custom deserializer and back out to the Postgres deserializer.

When we generate the code, nested plans are also supported. Essentially a new context is created for each level: if you have a nested plan, you just create a new context and ask that context to codegen, and things stay separate.

One thing to note here about C and C++: we ran into some trouble at first, because some programmers prefer to write C++, but Postgres is all in C. In Postgres, especially during error handling, they use setjmp and longjmp. The C code does a setjmp before it does all this work, and then you call into C++. If you encounter an error there and call elog, the Postgres error-reporting function, it does a longjmp back to the point of the setjmp. When it jumps back, you lose the whole C++ stack in between: C++ never gets to unwind, so it cannot free memory, and you get memory leaks all over the place. You don't see it until you run into trouble. So it's very important to keep the two worlds separate.

[Andy] We actually hit the exception problem with LLVM yesterday, or today; we just fixed it. I feel your pain. The nested query plan part: this would be like a SubPlan in the query plan tree? You basically say it's a whole new plan; you don't try to inline anything, you just compile it separately?
A: Yeah, separately. It's a function call.
[Andy] OK. C++ is getting scary.
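The setjmp/longjmp hazard just described is commonly handled by a boundary function: all C++ work happens behind a function that converts exceptions into a status code, so no longjmp ever crosses a C++ frame and no exception ever escapes into C. This is a generic sketch of that discipline, not Vitesse's actual code; all names here are made up.

```cpp
#include <cassert>
#include <stdexcept>

// Generic C/C++ boundary sketch: the C caller only ever sees a status code.
// It can then report the error with elog()/longjmp *after* every C++ frame
// has unwound, so destructors always run and nothing leaks.

static bool cleanup_ran = false;

struct Resource {                 // stands in for any RAII-managed allocation
    ~Resource() { cleanup_ran = true; }
};

// Returns 0 on success, nonzero on error. No exception escapes to C.
extern "C" int jit_exec_boundary(bool fail) {
    try {
        Resource r;               // destructor must run on all paths
        if (fail)
            throw std::runtime_error("codegen failed");
        return 0;
    } catch (...) {
        return 1;                 // caller maps this to elog(ERROR)
    }
}
```

The key point is the order of operations: unwind C++ first, longjmp second. Going the other way (elog inside a C++ frame) skips every destructor between the throw point and the setjmp.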
[Andy] Did it matter? I mean, you started writing this in 2014, so it's presumably C++11?
A: Yeah, but we stick to very minimal C++. We started writing C++ in Shore, back when C++ was just a templatizer. Now it's pretty scary; I got scared looking at that code.

So here's some codegen stuff. In Postgres you see all these functions, int4pl and so on: plus, minus, multiply, divide for the abstract data types. Here's our code to generate them, and these are all generated. You see the 32-and-32 and the 32-and-16 cases; int42pl is adding a four-byte integer to a two-byte integer, and all of them call this function to generate the LLVM code. And this is the worker function. The first line normalizes the arguments, then we get the type; we limit the bit widths to 16, 32, and 64. Ignore the division and mod here first and look at the addition. For addition we just call llvm.sadd.with.overflow with the two parameters, then we retrieve the value and the overflow bit, and we generate code to check the overflow bit and also to check whether the result is null, whether either input is null. We always do the addition, no matter whether the inputs are null or not: it's fast enough that an extra check in front of it doesn't really help, since you'd be checking all the time. So that's how the code looks.

Q: How do you debug this?
A: You don't. You look at the results.
Q: So if there's a mistake and you segfault inside the JIT code, there's no stack trace?
A: Inside the JIT code, yeah. Well, eventually you get good at it; it doesn't happen.
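What the generated addition does can be mimicked in plain C++. `__builtin_add_overflow` (GCC/Clang) is the C-level counterpart of the `llvm.sadd.with.overflow` intrinsic the JIT emits; the function and struct names below are made up for illustration, not Vitesse's actual code. As in the talk, the addition runs unconditionally and the null indicators are just OR'd together.

```cpp
#include <cassert>
#include <cstdint>

// Mimics the generated code for int42pl (int4 + int2): widen the int2 to
// 32 bits, always perform the add, capture the overflow flag, and OR the
// null indicators. __builtin_add_overflow is the C-level equivalent of
// the llvm.sadd.with.overflow intrinsic.
struct Int4Result {
    int32_t value;
    bool    is_null;
    bool    overflow;
};

Int4Result int42pl(int32_t a, bool a_null, int16_t b, bool b_null) {
    Int4Result r;
    r.is_null = a_null || b_null;              // SQL null propagation
    // Add unconditionally: it is cheap enough that branching on the null
    // flags first would not pay off.
    r.overflow = __builtin_add_overflow(a, static_cast<int32_t>(b), &r.value);
    return r;
}
```

In the real generated code the overflow flag feeds a branch to the error path, and the null flag simply propagates with the value.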
Q: So you're just reading assembly?
A: Yeah. It takes a while, but yeah.

[Andy] We had this in our system before we abandoned it, because, you know, I have students. You generate intermediate code and then compile it?
A: We do it directly; it's still a huge difference. I worked with Postgres 11 after they added JIT. Before the JIT, they had the same thing: they generate some intermediate, Postgres-specific code. I tried converting that into what we have, and theirs is a bit slower, because their model doesn't let them use registers; they are always reading and writing memory. Each step is a read and a write to memory.

[Andy] The way my PhD student Prashanth (he's here, he can tell you a bit about it), the way we get by with that is: instead of JITting everything, we have some precompiled primitives, sort of like VectorWise, and those are super optimized to take advantage of SIMD. So you avoid some of that JITting of everything, like the entire query plan.
A: You guys are doing what the HyPer guys did in Germany. OK, yeah.
[Andy] Yeah. But I would say part of the reason Prashanth made this change away from what you did is engineering: I have a rotating cast of students, and very few of them can actually work on this kind of code.
A: Well, this is just the arithmetic part. We even have the whole hash table written in IR, and the overflow mechanism in it as well.

[Prashanth] Hey, this is Prashanth. There were a couple of reasons why we went away from doing this type of direct IR codegen. One was that it's very difficult to debug. Another was that compilation times can be quite significant: for short-running queries you may generate a lot of code, and that kind of negates the benefit of codegen. So in the new model we always generate code, but we don't generate LLVM IR; we generate a higher-level IR that's efficient to generate and also efficient to execute, and over the course of time we'll eventually compile that using LLVM. We have this tiered approach, so we get the benefits of compilation and the benefits of debuggability, because we have an intermediate representation you can step through in GDB.

A: I think we have essentially the same thing, not here, but in the FPGA stuff that I talked about just now; those have higher-level instructions.
[Andy] One additional thing Prashanth mentioned: because he has this interpreter of the opcodes, before you get into the compiled LLVM stuff you can actually start running the query using the opcodes, and when the compilation is done you slide it in. The HyPer guys had a paper on this last year.
A: I think it's crazy. But does it work?
[Andy] It works. Well, we don't have it in the full system, but it works in Prashanth's experiments.
A: You can slide it in because it's just a function call, right? A function call.
[Andy] It's a bit different.
A: Well, our function call is essentially the whole query.
[Andy] No, you're reading blocks of data, so there's a function call for a given block. You call that function, and that invocation is either the interpreter or the compiled code.
A: I see. Oh, so for each block you can restart. Interesting. OK.
[Andy] Your thing makes money; mine just doesn't.

So here's a little bit of the Q1 codegen that we talked about just now. In Greenplum, right?
You scan lineitem, bring it up to the agg, and then there's a redistribute motion that moves the tuples to group them together based on those four flags; you want to aggregate them. Then you do the agg, and then there's a gather that sends everything to the master, which emits the result. Each segment sends four rows to the master, and the master aggregates, so there's not much network traffic; but there's still network. The speedup on plain Postgres was 8x when there was no network, and when we added the network layer in Greenplum it went down to five times. LLVM and all this codegen stuff is very sensitive to the speed of the underlying I/O: if you have slow disks, don't even bother with codegen. You can see that quite clearly here.

Here's Q5. We join six tables with three broadcasts. For some reason Greenplum didn't support fully replicated tables from the beginning; I guess the reasoning was that if the table is small, the broadcast is cheap. That makes sense for regular Greenplum, but with the LLVM speedup you can see the impact: the speedup was five times on Postgres, and here we see about two to three times, after all this redistribution of data over the network.

Q: How much is Pivotal actually maintaining Greenplum?
A: They're doing mostly the merging; I think the effort is in catching up to Postgres 12 now. So the latest version of Greenplum will have the basic, limited compilation of expressions that Postgres added.
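The Q1 plan shape described above (a partial aggregate on each segment, a redistribute/gather of only a handful of rows, then a final merge) can be sketched like this. The names are hypothetical; TPC-H Q1 groups on `(l_returnflag, l_linestatus)`, which is why each segment emits only about four rows regardless of input size.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Two-phase aggregation sketch: each segment aggregates its slice of
// lineitem locally, then the few partial rows are merged (the gather to
// the master in the plan above).
using GroupKey = std::pair<char, char>;          // (returnflag, linestatus)
struct Partial { double sum_qty = 0; long count = 0; };
using PartialAgg = std::map<GroupKey, Partial>;

PartialAgg partial_agg(const std::vector<std::tuple<char, char, double>>& rows) {
    PartialAgg out;
    for (const auto& [rf, ls, qty] : rows) {
        Partial& p = out[{rf, ls}];
        p.sum_qty += qty;
        p.count   += 1;
    }
    return out;
}

PartialAgg merge_partials(const std::vector<PartialAgg>& per_segment) {
    PartialAgg out;
    for (const auto& seg : per_segment)
        for (const auto& [k, p] : seg) {
            out[k].sum_qty += p.sum_qty;
            out[k].count   += p.count;
        }
    return out;
}
```

Because only the partial rows travel, the network cost of Q1 is tiny, which is why the codegen speedup dominates on that query.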
It's mostly a lot of TP stuff that they're adding.
[Andy] Oh, interesting.
A: Yeah. All these Postgres improvements don't really address OLAP. If you look at the hash join code in Postgres... it doesn't handle overflow, it doesn't have spills, it doesn't support spilling. I was amazed at first.

So here's Q17, where the bloom filter really makes a big difference. You scan lineitem, aggregate, redistribute, and aggregate again before pushing into the hash join. Here, when you scan the part table with a very heavy filter and build up the hash table, you're able to push the bloom filter down to this agg, which eliminates a lot of the traffic coming up. Then you see a big improvement in the speedup; this is like nine times faster. Originally it was about two times, and with the bloom filter it goes up to nine times. So codegen makes a difference, but something like this makes an even bigger difference.

Q: How often do you see that for real customers, that the bloom filter makes a huge difference?
A: Very, very often. Usually you have a pretty heavy filter somewhere, and you can avoid a lot of the scans, especially on lineitem, on the fact table.

OK, here's something about fact tables that we picked up along the way, talking to customers. Greenplum has this thing where you can have a parent partition and child partitions, and each of those can be either a row table or a column table, or even a Parquet table on S3. You can mix and match them, and you can merge them, depending on your workload.
So the row tables may be the ones being written very recently, so you keep those in the row store; then every month you move them to a column table, and at the end of every quarter you move all the column tables into a quarterly Parquet table on S3. A lot of customers are doing this, because when you do scans and you're only interested in this year's data, you can avoid going to S3 at all; you just use the most recent tables. Better cache awareness. This is pretty good, and it's what most customers are using, especially for fact tables.

The other thing we picked up is that transactions are bad. Some of our customers have fact tables that are generated in the factory, at the assembly lines. The machines just generate CSV files and push them up to S3, and you want to scan those files. It's not controlled by you: any file whose name matches some pattern can appear and disappear at any time. If we had to copy the CSV files into a database table and do the import into the database, that's a lot of work, and it generates a lot of locking issues and a bottleneck in the database. So the external table really helps here: you just have an external table definition that points to a URL, and you call that URL to get the data. The whole table becomes like a service to you.

Q: That can certainly mess up query planning, right? Because you don't know how much data you're actually going to read, your join order could be all out of whack.
A: It actually provides statistics; there's a statistics API as well, for the query planner to obtain statistics on the external table.
Q: But if the files can appear and disappear at any time and you don't run ANALYZE, your stats could get way off.
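The external-table scan just described depends on each segment independently picking its share of an ever-changing file listing, the same hash-the-file-names scheme mentioned earlier for XDrive. A minimal sketch of that assignment, under the assumption of simple hash partitioning (the actual XDrive protocol is not public; the names below are made up):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Every segment fetches the same file listing, hashes each file name, and
// scans only the files that map to its own segment id. Because the mapping
// is a pure function of the name, segments need no coordination even when
// files appear and disappear between queries.
std::vector<std::string> files_for_segment(const std::vector<std::string>& listing,
                                           int segment_id, int num_segments) {
    std::vector<std::string> mine;
    for (const auto& f : listing) {
        size_t h = std::hash<std::string>{}(f);   // stable within one run
        if (static_cast<int>(h % num_segments) == segment_id)
            mine.push_back(f);
    }
    return mine;
}
```

Together the segments cover the whole listing exactly once, which is what makes the "table as a service" view safe to scan in parallel.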
Yeah, you still have to run analyze From time to time But fact tables distribution don't really change a lot Yeah So this is the the worst part So hake if you look at the definition I you know, if you if you find a sea library, let me know I've been searching for that forever and There's no sea library because they they have so many dependencies and the The documentation doesn't have I I just don't understand why why you have a storage document a storage format and there's no byte level documentation. There's It's crazy stuff And then the the dependencies is also, you know, there's supports multiple algorithms and each algorithm may not compile to the platform that you You run on All right, and then you they use drift and then they have nested fields and Just adds a lot to to something that we don't don't use So we actually created something simpler and we have essentially the same design as pocket, but it's It's a lot simpler. You know, we do lz4 only And essentially every time you create a column We would lz for it and if we If we save more than 40 percent Then we store it in lz4. If we don't then we just store it in native format Why 40 percent? That's just some arbitrary number. Yeah You know, if it I think if it's don't save more than 30 doesn't make sense, right? We just store it in the regular form sure Because you're gonna pay to deserialize later Um, and then we support primary types only no nested type no no arrays no nested fields We still support zone maps. So that library is it's pretty good. We may look to open source it later What is that called? Sorry, what is the what is the format called? Simple pocket Oh, so it's a subset of parquet Yeah, yeah It's it it It took some design from pocket It's not actually a subset, you know, but um The the design of the file is essentially the same In in in the big pictures A clarification question. This is panos christiansis. We use Facebook So this parquet is compatible with the standard parquet. 
So it can be read by Impala or Vertica, or is it just... No, only on the design. It took the design from Parquet; the design and the idea are the same. Thank you. And, you know, we support vectorized filters out of the box on those columns, right? Thank you.

Oh, okay. So, ten minutes on FPGAs. So the FPGA has a lot of compute power, but it's attached to the PCIe bus, so you're going to do a lot of packing and unpacking. So usually we would have the CPU pack the records and send them to the FPGA; the FPGA would need to unpack, compute, and then repack the results and send them back to the CPU; and the CPU would unpack the results. Most of the time is spent doing all this packing and unpacking, the round trip. It's a lot of work. Sending the data to the compute engine, that's the problem with the whole thing.

We view the FPGA basically as a chip on which you can create your own instructions. Think about an Intel instruction: it takes only two registers, right? Here you can make instructions that take a vector of, you know, 10 megabytes of data, do everything, and then send it back. Very wide instructions, that's what we call them. So we sort of use it in our regular joins, and it speeds things up by a little, maybe two times, even compared to LLVM. But if you add in the cost of the FPGA, it's kind of hard to swallow.

Let's see. So if we do a join like this, where you scan the lineitem table twice and do two hash joins... so what we do during the hash join... well, it's not a hash join anymore. My co-founder Feng actually invented this, right?
So you take the records from the left side, and you send them to the FPGA to compute the hashes, and that's very fast. You just send the key, and an index to the record, to the FPGA, and it will compute the hashes, and then it will sort them. So what you get back is a sorted list of indexes with their corresponding hashes. Then you do the same thing to the right side, radix sort both sides, and then you can merge them.

Right. So for the radix sort, are you doing the sort in the FPGA, or is the FPGA only computing the hash? Yeah, yeah.

How much faster is the FPGA hash versus, like, you know, xxHash3? It's very fast, right? There are two things here. Number one, it's faster because it's not code; it's a circuit. Yeah, but you have to send the data down to the FPGA. Yeah, that's the problem; as I said before, it's the packing and unpacking. At some point the equation shifts toward the computation. Yeah, if you have big enough data. And the second thing is, compute is never enough on the host. So if you can shift those to the FPGA card, you can do more on the Intel chips for other queries, right? Okay. Not really for the current query, because you're... well, yeah. Yeah, okay. Right, it makes sense to do this. Interesting. Okay, cool.

So when you merge, then you get the results of the join. That's one usage that we implemented. The second one has to do with spatial joins. So here's something from a customer. It's a telco, right? They have towers, and each tower has a polygon of the area that it covers. And every night they want to sum up the cell phones that are within the area covered by the cell towers. So now this reduces to a spatial join of polygons and the points within the polygons. We tried to do this in Greenplum, and, you know, it never finished, right?
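The hash-sort-merge join described above can be sketched on the CPU. In the talk, hashing and sorting the (hash, index) pairs are offloaded to the FPGA; here that step is plain Python, and the function names are made up for the sketch.

```python
def hash_side(rows, key):
    """The FPGA's job in the talk: hash each key and return (hash, index)
    pairs sorted by hash. Here it is ordinary Python for illustration."""
    return sorted((hash(r[key]), i) for i, r in enumerate(rows))

def merge_join(left, right, key):
    """Merge the two sorted (hash, index) lists; re-check the real keys
    inside each run of equal hashes to guard against collisions."""
    ls, rs = hash_side(left, key), hash_side(right, key)
    out, i, j = [], 0, 0
    while i < len(ls) and j < len(rs):
        if ls[i][0] < rs[j][0]:
            i += 1
        elif ls[i][0] > rs[j][0]:
            j += 1
        else:
            h = ls[i][0]
            i2, j2 = i, j
            while i2 < len(ls) and ls[i2][0] == h:
                i2 += 1
            while j2 < len(rs) and rs[j2][0] == h:
                j2 += 1
            for _, li in ls[i:i2]:
                for _, rj in rs[j:j2]:
                    if left[li][key] == right[rj][key]:  # collision check
                        out.append((li, rj))
            i, j = i2, j2
    return out
```

Note there is no hash table at all, which is why the speaker says "it's not a hash join anymore": both sides are reduced to sorted hash streams and merged.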
We used PostGIS, and there's an R-tree, and now you have to do a nested loop join using the R-tree. And whenever you see a nested loop join in Greenplum, it never finishes. Hash join is the only thing that really works. So instead we built some custom thing for this customer. We scan all the towers and build an in-memory filter: a very coarse circle that covers each polygon. It's almost like a Bloom filter; you would qualify maybe 5 percent more records than you should. You probe it to disqualify records that could never fit in the polygons, and then you pack up the remaining records and send them to the FPGA to do the final pruning for each polygon. That's how it works.

So is this custom stuff invoked when you call the PostGIS intersect function, or is it some other special Vitesse Data SQL stuff you've added? We give them a function that they call. Okay. So yeah, it's about 50 times faster after we do this.

So FPGAs are pretty useful. It's just very hard for computer science students, I mean people who code, to know the hardware stuff. I don't even really get into it; it's pretty tough. All this is because we got help from Xilinx, right? We tell them what we want, and they give us a function that we can call and use.

What is it, so Xilinx wrote all this, the three bullets at the bottom? No, just the last one at the bottom. Yeah, I got it. Sure. That's cool. Yeah.

Okay, I mean, have you thought about... I mean, I know how hard it would be to replace the entire PostGIS API with, you know, Vitesse magic. We don't have a customer yet asking for that. Okay, yeah. We have very limited bandwidth.
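The coarse-circle prefilter plus exact pruning can be sketched like this. It is only an illustration of the two-stage idea, assuming a simple bounding circle (vertex centroid plus farthest-vertex radius) and a standard ray-casting point-in-polygon test; in the talk, the exact stage runs on the FPGA.

```python
import math

def bounding_circle(poly):
    """Coarse cover for a polygon: centroid of the vertices, with the
    farthest vertex distance as radius (the in-memory prefilter)."""
    cx = sum(x for x, _ in poly) / len(poly)
    cy = sum(y for _, y in poly) / len(poly)
    r = max(math.hypot(x - cx, y - cy) for x, y in poly)
    return cx, cy, r

def in_polygon(pt, poly):
    """Exact ray-casting test (the stage the talk offloads to the FPGA)."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def points_in_polygon(points, poly):
    cx, cy, r = bounding_circle(poly)
    # cheap distance check disqualifies points that could never be inside;
    # like a Bloom filter, it over-qualifies a little but never misses
    candidates = [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= r]
    return [p for p in candidates if in_polygon(p, poly)]
```

The prefilter can say "maybe inside" for a point that is actually outside, but never the reverse, so the exact stage only sees a small superset of the true matches.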
We do what customers ask. Of course, yes.

So, yeah, what I see as really exciting on the horizon is this computational storage card from Samsung, right? It's an FPGA plus storage in one card, and the storage could be four to eight terabytes. You can plug like 12 of them into one machine, so a lot of computation and a lot of storage. Each card has a CPU and memory; I think they have about four gigabytes on each card. So essentially each of these cards is pretty powerful; it's a MacBook of storage that you have in your machine. So now you can really push the compute to the disk, and you cannot go any lower than that. Our idea is to let the FPGA run aggregation on each disk. But because of the limited memory, you cannot scan the whole thing; you probably don't want to do the whole thing. In aggregating, you may, you know, go until you run out of memory, and then you send the partial aggregates to the CPU, and the CPU completes the partial aggregation. Again, you know, the nightmare scenario is the Parquet files; there's no way in hell we can get those working in the FPGA.

Well, right on time. Thank you. Okay, awesome. Again, we will do virtual claps or whatever. So we have time for a few questions for CK. So again, unmute yourself, say who you are and where you're from.

So maybe I have a question related to the FPGA part. I'm Lin, a PhD student here. Very interesting talk, right, like many interesting topics. One thing I'm curious about with the FPGA is that you made one comment, you know, on an earlier slide, you said that compute is never enough on the host, and you want to have the FPGA to kind of offload the compute, right? So what I'm wondering is, well, if you say the compute is not enough, then you can add more CPUs, right? You can add more cores, right? I mean, we can also get compute that way.
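The per-disk partial aggregation idea (aggregate until memory runs out, flush a partial result, and let the host CPU complete the merge) can be sketched like this. A toy group-count cap stands in for the card's limited memory; the function names and the count aggregate are made up for the sketch.

```python
from collections import Counter

def partial_aggregate(rows, max_groups):
    """What each storage card would do (hypothetically): count per group,
    but flush a partial result whenever the in-memory table hits its cap."""
    partials, acc = [], Counter()
    for key in rows:
        acc[key] += 1
        if len(acc) >= max_groups:
            partials.append(acc)  # out of memory: ship this partial upstream
            acc = Counter()
    if acc:
        partials.append(acc)
    return partials

def host_merge(all_partials):
    """What the host CPU does: complete the aggregation by merging partials."""
    total = Counter()
    for p in all_partials:
        total.update(p)
    return total

# two "disks", each limited to 2 groups of aggregation memory
disk1 = partial_aggregate(["a", "b", "a", "c", "b"], max_groups=2)
disk2 = partial_aggregate(["c", "c", "a"], max_groups=2)
```

Because a group may be flushed more than once per card, the host must re-aggregate rather than just concatenate; that final merge is the "complete the partial agg" step in the talk.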
So I'm wondering, is this for FPGAs because it's much cheaper, or...? Yeah, so that's a good question. It's not so much a computer science question as an economics question, right? So whenever we try to sell this solution to a customer... each of those FPGA cards could cost like seven to ten thousand dollars, right? And the question is always: you have an MPP, right, so why do I want to add cards? Why don't I just add a machine? So that depends on the customer. Some customers have very limited data storage, I mean, sorry, data center space, heat issues, and all these things, right? So they want very compact things, and then it's worth it to them. And the second thing is, sometimes they use the FPGA cards for other usage as well. So maybe they are doing Deepgreen DB... I have something on my website where, you know, they do something like Deepgreen DB on video images that they need to decode, and run some machine-learning image recognizer on those MP4 files as well. So in that case, you know, they would opt for the FPGA cards. I see, interesting.

Maybe another minor question, on naming: does the word "deep" in Deepgreen have anything to do with the popular deep learning here? No, it's a name that I think Luke, the ex-CTO of Greenplum, thought of and thought was cool, and we just used it. Sure, I was just joking. Thank you. All right. Yeah, thank you. Thanks for the answer. Yeah.

Anybody else? I also had a question. So this is Prashant; I'm a PhD student as well. So on your last slide, you touched a little bit on FPGAs, and I think toward the end, in the last few bullet points... I'm inferring that the idea you have is that, per query, you synthesize some FPGA blocks based on the query that you're seeing. Is that right?
No. So Feng did something that is essentially a stack machine, something like Forth. That's what he said: you could just give it instructions. You know, think of it like a Scheme or Lisp machine that you can just give instructions, and it's a simple compute engine. I see, okay. Yeah, something like what you guys did, right? It's simple: instructions that are meant for computation, but at a higher level than LLVM. Mm-hmm, okay. A stack machine: push left column, push right column, add. No.

Do most of your customers run on premise, or do they run on, like, EC2 or Azure? All sorts. Okay. I mean, in recent years, do you see a trend moving towards self-hosted Greenplum in the cloud versus on premise, or is it still all random? Still random, still random. Yeah, so the selling pitch from a cloud database is not really the functionality of the database, it's the convenience, right? So, you know, you don't need to hire a DBA, you don't need to move machines, you don't need to start and stop machines, all this stuff, right? But I mean, there's no... I don't think there is a Greenplum-as-a-service, right? It's either someone running it themselves, on premise. Yeah, it's essentially a rental; you still need to start and stop it. Sure, yeah. Yeah, we found this thing needs administration. It's a complicated piece of software, right? So the reason they can open-source it is because you would be crazy to run this, you know, without support.

Yeah. So, I mean, the impression I have is... as you sort of said at the very beginning, you took Thomas Neumann's paper, the HyPer guy, and you're like, all right.
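The "push left column, push right column, add" description might look like this as a toy interpreter. This is only a sketch of the idea: the opcode names and `run` function are made up, and the real engine runs inside the FPGA on much larger vectors.

```python
def run(program, columns):
    """Tiny vector stack machine: each instruction operates on a whole
    column at once, the 'very wide instructions' idea from the talk."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(columns[args[0]])   # push a whole column by name
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append([x + y for x, y in zip(a, b)])
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append([x * y for x, y in zip(a, b)])
        else:
            raise ValueError(f"unknown op {op}")
    return stack.pop()

cols = {"left": [1, 2, 3], "right": [10, 20, 30]}
result = run([("push", "left"), ("push", "right"), ("add",)], cols)
```

The appeal of this design is that one "instruction" amortizes the PCIe round-trip cost over an entire column instead of a pair of registers.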
Well, this is what we're going to build: the column store plus the LLVM. But, like, the mesh part, you know, what's called D-mesh, that's certainly not in that paper. It seems like, as you sort of said, the customer has these problems and it's whack-a-mole: all right, this part's slow, how do we fix that; this part's slow, how do we fix that. Is that a fair way to describe how you've worked on this thing? Yeah, because we have no choice, right? We're not a fresh start, you know, in a university, right? Yeah. So we need 100 percent compatibility, and all we can do is sort out things in ways that still maintain compatibility.

So I guess my question would be: is there anything, even though it's not being driven by customer requests, anything that you look at in what Greenplum is today and say, all right, if we could fix one thing... because it would make a bunch of queries go faster, or make your lives easier? Is there anything like that? No, I think if I had the bandwidth, I would do Snowflake. Sure, yes, okay. Well, wouldn't we all, right? Yeah, yeah, okay.

Any last questions? We can talk about anything else if you guys are interested later. I have a short question: since Andy brought up the market space, what is the reach of Greenplum? I saw that you run on Alibaba and on AWS. Does that mean it's China and the United States, or do you have reach in
Europe and other places? We don't have the bandwidth to support Europe. But I think in any country where you do on-prem solutions, it's pretty popular, with a lot of local small businesses coming up and claiming that they support it. I think you see it a lot in Russia as well; Russia is especially good at this. Russia, China.

But I guess another way to think about it is, what is the sales process? Is it people coming to you after being exasperated with Pivotal support, or trying to make Greenplum go faster? Or are you actually out selling: hey, you're doing Greenplum, we can make it 5x faster? Or do you find that more of your customers are showing up at your website saying, oh, I want this because I have been having problems? Well, we're such a small company, and we don't have really good marketing, so not many people know about us; that's number one. We do have customers that have a performance problem and come knock on the door. But a lot of it has to be a salesperson going to the customer, because the customer doesn't know that it can be faster. Right. Yes. Yeah, the customer thought this was already the best they could have for open source. Yes.

I'm doing a quick test to see what happens if I search for Greenplum acceleration. Yeah. Oh, yeah, so here's another story that you guys may find interesting. A lot of the customers are doing batch jobs at night. So, you know, think of banks and trading firms: every night they have to import files, do ETL, and then they have some window to run the jobs, and then they have to update the dashboard at 7 a.m. Okay, so the window is pretty small, and, you know, if you can shave two hours off the window, it's... yeah, yeah, yeah.

All right, so just Googling Greenplum acceleration...
You don't show up. Yeah, I know. We need a lot of help. Yeah. Come on, that can't cost that much for that ad. Right. Yeah.

All right, with that, guys, let's thank CK, you know, virtually, for doing this. We really appreciate you spending the time with us today and talking about this.