The Databases for Machine Learning and Machine Learning for Databases Seminar Series at Carnegie Mellon University is recorded in front of a live studio audience. Funding for this program is made possible by Google and from contributions from viewers like you. Thank you. Welcome guys, it's another seminar here at Carnegie Mellon. So excited today to have Pat O'Keefe, he's the Senior Vice President at FeatureBase, the only bitmap database, as we were discussing before the call. So we're super excited for him to talk today about what a bitmap database is and why that makes it special and different from everyone else. As always, if you have a question for Pat as he's giving his talk, please unmute yourself and say who you are, and feel free to do this at any time; that way Pat's not talking to himself for an hour on Zoom. Pat, we appreciate you being here from down in Texas, the floor is yours, go for it. Thank you Andy, thanks for having me and thanks everybody else for coming along to listen. I don't have an About Me slide, but a little bit about me: I come from a software engineering background, mostly in and around databases for most of my career. I used to work for Quest Software, did a stint at Dell. If you've been in and around the Oracle world and you're aware of a product called Toad for Oracle, I worked on that as an engineer and then eventually was VP for the engineering team that built it. So I love me a database. So what is FeatureBase? I thought I'd break the talk into three sections, if you like. The first one is that it's a database where bitmaps are the primary storage mechanism. They're not the only storage mechanism, but the primary one. One of the things that we always try to keep in mind as we build our product, because we're a business, is that if you're a database, your product strategy probably should be: make it easy to get data in, make it easy to get value out of the data while it's in there, and then make it easy to get the data out. So we'll talk a little bit about that. And then there's this idea, which probably shouldn't be a new idea to anybody who's been following this series, but which is particularly true in the ML and AI world right now, of bringing the compute to the data and not taking the data to the compute. And as Andy said, if anyone's got any questions, feel free to interrupt. So let's talk about bitmaps as databases. At the highest level, FeatureBase is a hosted database process, and we're written in Go. In fact, there is an open source version of FeatureBase, if anybody is interested in spelunking the code, though for various business reasons, which I will not go into on this talk, we haven't contributed back since April or so. But certainly feel free to go and have a look; featurebasedb/featurebase is the repo on GitHub. Like I said, we're clustered. The consensus mechanism is Raft, or etcd in our case, and data is sharded across nodes. You can do all the sorts of things you'd normally expect: you can set up a replication factor, so the data exists on more than one node, and reads can be serviced from any node that has that data. And like I said, we shard across the nodes. The shard key is basically the bitmap offset in the table, which we call _id. Transaction-wise, the best way to think of FeatureBase is that we've come from a heavily analytics-biased background, so we're more BASE than ACID.
Though we're trending towards ACID more and more over time, mostly because the use cases to which our customers tend to put us are cases where there's a large volume of data and they want to do fast segmentation-type queries, so queries with a lot of filters on pretty high-cardinality sorts of terms. And they also want to keep it updated, so this idea of real-time analytics, because you want to participate in an ad auction or something, and you want to do that as close to real time as possible. Drilling down a little bit inside a node, there are again three layers here. There's a language layer, SQL, over the top, which is in fact relatively new. We used to have a language called PQL, the Pilosa Query Language, referring to the older version of both the product and the company. It was very assembler-like, if you like, and we found a lot of customers had difficulty with adoption, trying to understand a way of querying which wasn't familiar to them. So we thought, let's add a SQL layer and let people reuse their industry knowledge. There's a compute layer, so that's the thing that obviously executes query plans when they're created. And then there's a storage layer. I've said that bitmaps are the primary data storage mechanism, but not the only one. We broadly have two data stores underneath, and though we don't talk about this quite so much publicly, there's a so-called B-store and a so-called T-store. The way to think about those is that the B-store is roaring containers in B-trees. What we basically do is use the higher bits of the bitmap offset for a table, along with some other information, as a key into the B-tree. Once you get down to the leaf level in the B-tree, either the leaf is big enough that we can store the RBF container in the page, or otherwise we'll do overflow pages. As it so happens, a roaring container's default size is 8K, so we just have multiple overflow pages that contain roaring bitmap containers. And that's basically how we access the data on that side. What is the bitmap keyed on, record IDs or the actual values in the B-tree? So think of the bitmap notionally as a bit list and an offset, and the offset is the number. For example, the way it's modeled is you have a table of users, and its _id column, or its key column, is numeric. The actual value is the offset into the bit list. If it's a string value, we effectively just do dictionary compression: we store the string in another B-tree to generate the next number and then use that number as the offset. Does that make sense? Yeah, makes sense. What happens if there are a large number of values in that dictionary? So just about every string, let's say in a string column that you're trying to put into the bitmap, is unique and you've got a billion rows. Do you do something special for cases like that? Or maybe you talk about it later and you could defer it. Yeah, so I will talk about that later. That is one specific case for which you really do not want to store that stuff in a bitmap. Okay, cool, I'll wait for it. Thank you. Yeah, and then the T-store is basically, as you'd expect, tuples on pages in a B-tree. If you looked at the code and then looked at Postgres, you'd see a haunting similarity in the implementation.
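To make the B-store addressing described a moment ago concrete, here is a minimal Go sketch of the idea: the high bits of a record's bitmap offset become the key that locates a container in the B-tree, and the low bits locate the bit inside that container. The 16-bit split and the map standing in for the B-tree are illustrative assumptions, not FeatureBase's actual on-disk layout.

package main

import "fmt"

const containerBits = 16 // each container covers 2^16 offsets, roaring-style

type containerKey uint64

// splitOffset breaks a record offset into (B-tree key, position in container).
func splitOffset(offset uint64) (containerKey, uint16) {
	return containerKey(offset >> containerBits), uint16(offset & ((1 << containerBits) - 1))
}

// btree stands in for the real B-tree: container key -> bitmap container.
type btree map[containerKey][]uint64 // container = 2^16 bits = 1024 words

func (t btree) setBit(offset uint64) {
	key, pos := splitOffset(offset)
	c, ok := t[key]
	if !ok {
		c = make([]uint64, (1<<containerBits)/64)
		t[key] = c
	}
	c[pos/64] |= 1 << (pos % 64)
}

func main() {
	t := btree{}
	t.setBit(1_000_000_007) // record a bit for a large offset
	key, pos := splitOffset(1_000_000_007)
	fmt.Printf("container key %d, position %d, containers stored: %d\n", key, pos, len(t))
}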
So when I call insert, what do I land in, the B-store or the T-store? Yeah, it depends on the column storage specifier, but it will always at least write to the exists bitmap in the B-store and then go write the value, using the bitmap offset key as the key, in the T-store. Okay, so for any tuple, ignoring replication, every attribute within that tuple will exist in only one of those two stores, right? It's not like the T-store has a copy of everything. Yeah, correct. We don't basically make a copy, though you could semantically do that at the application level if you wanted to. Got it. And then it also sounds like you were saying you're using B+ trees for the dictionary as well? Yes, the current implementation we have under the hood for that is BoltDB, but we're going to switch that to the T-store implementation at some point in the next few months, mostly because BoltDB has a specific set of problems that we'd like to sidestep. It's also dead too, right? Like the guy's not working on it anymore. Yeah. Well, there's the revival of it, but yeah, I understand the point. Yeah. Okay, sorry, keep going. Okay, so yeah, two storage layers. So let's talk about bitmaps. For some of you this will be nothing new; for others it's worth going through in a little bit of detail. Bitmaps are good at encoding relationships, right? You have a thing, and then either something happened, or something exists, or there's some membership of values or whatever. So we tend to think of everything we're storing as sets of something. If you think of a table like I've illustrated on the slide, where you've got people and pages they've visited on the web or on your website, you can see Fred's visited index and pricing, and Mary's visited index and careers. When we encode that as a bitmap, that pages-visited column becomes what we call in this case a string set. So it's a set that's got that dictionary compression mechanism in front of it to turn the string into a number. Each person gets an offset in the bitmap, and then basically we have a different bit list for each of the values that we see in pages visited. And so this set can get very, very, very wide. This idea of a set of pages visited becomes a set of bit lists, and then each of those bit lists is stored in a series of roaring containers. Does that make sense? Cool. Yeah, this is basically Pat O'Neil's traditional bitmap index, right? Would you say it's different than that, or is it pretty much exactly the same? Pretty similar. Got it. Most of the challenges that we've had have been — I mean, it's a very simple idea, right? — most of the concerns are practical ones. It's like, okay, so how do you then update this after the fact in a performant way? And how do you deal with the concept of null, particularly as you move into a relational model, and empty sets, and a whole variety of other stuff at the periphery?
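A rough Go sketch of that string-set encoding, with plain in-memory maps standing in for the dictionary B-tree and the roaring containers; all names here are illustrative, not FeatureBase code.

package main

import "fmt"

// stringSetColumn sketches the "pages visited" example: each distinct string
// gets an ID via dictionary compression (key translation), and each distinct
// value owns a bit list indexed by record offset.
type stringSetColumn struct {
	dict     map[string]uint64   // string -> value ID
	bitlists map[uint64][]uint64 // value ID -> bitmap over record offsets
}

func newStringSetColumn() *stringSetColumn {
	return &stringSetColumn{dict: map[string]uint64{}, bitlists: map[uint64][]uint64{}}
}

func (c *stringSetColumn) translate(s string) uint64 {
	if id, ok := c.dict[s]; ok {
		return id
	}
	id := uint64(len(c.dict)) // "generate the next number", like the B-tree-backed dictionary
	c.dict[s] = id
	return id
}

// add records "record `offset` has value `s` in its set".
func (c *stringSetColumn) add(offset uint64, s string) {
	id := c.translate(s)
	bl := c.bitlists[id]
	for uint64(len(bl)) <= offset/64 {
		bl = append(bl, 0)
	}
	bl[offset/64] |= 1 << (offset % 64)
	c.bitlists[id] = bl
}

func main() {
	pages := newStringSetColumn()
	pages.add(0, "index") // Fred -> offset 0
	pages.add(0, "pricing")
	pages.add(1, "index") // Mary -> offset 1
	pages.add(1, "careers")
	fmt.Println("distinct values:", len(pages.dict), "bit lists:", len(pages.bitlists))
}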
Yeah, and will you talk about what data types you support? So obviously strings are a big part of it, but do you support floating point numbers and the whole set of data types you see in SQL, including datetime, floats, numerics? Yes, so I don't have a specific slide or table on that, but it's pretty simple on the B-store side. The way to think about it is that all storage types, with a couple of exceptions, can be stored in either the tuple store or the bitmap store. We have int, so a 64-bit int; it is bit-sliced, plus you can store it in the T-store. We've got a scaled integer, so like a decimal number, which is also bit-sliced. There's a timestamp; funnily enough, the internal representation of a timestamp is a 64-bit integer, which is also bit-sliced. And we have two variants of a set. There's the set, or what we call an ID set, which is a set of numbers, which means you don't have to do the dictionary compression, what we call the key translation step. There's a string set, which gives you the key translation step. And what am I missing? Then there are what we call mutex variants of those, a set that can only have one value at a time; if you go and update it, it clears all the other bits and then sets the bit that you want. So that example you talked about before, of a billion-row table with a bunch of strings in it: that would be a string type, which internally is a mutex. We also support, obviously, non-bitmap strings in the tuple store. We added vector support, and we also have single and double in the T-store. We may do a bit-sliced floating point, but that's an interesting one too; it has all the problems of the high-cardinality string and it's only really good for range queries. So, yeah, we don't yet have that one. Cool, thank you. Okay. So we talked about sets, right, so relationships. As it happens, I used the term bit-sliced before, and this is the example of that. Integers can be stored as a set of bit slices, right? If you have, in this case, a 64-bit integer, you get 64 sets of bits, each representing a bit position. And each bit list is, as I said, a roaring bitmap. Roaring is a very interesting technique: it gives you all the benefits of compression, but you can still do bitwise operations on the compressed representation. That's really the core part that I think gives databases that do this the latency edge on these types of queries, because you just do not have to shift anywhere near as much data to compute filter predicates and stuff like that. So you'll see in this slide here, I'm inserting two rows into a table, and I have two offsets, one and two, and then the appropriate bits set in the bit lists. So why is this interesting? Well, this makes range queries really fast. Think about a query — this one's missing its FROM clause — select _id from my table where age is between 18 and 35, where I'm storing ages as a 64-bit integer. The interesting part is that we're storing ages, right? So one of the things we can do in FeatureBase when we create that table is hint and say, hey, the min value of this is zero, if you like, and the max value is, oh, 150, assuming that no human being within any meaningful timeframe will live longer than 150 years. We only need eight bits for that, right? So whenever you do a query on that age column, the most bit positions you'll ever have to worry about is eight. And even better, if you're doing a range query between 18 and 35, you only need six bits to represent 35, so you can ignore every bit list above the sixth. So you only need to read six bit lists, or six sets of roaring bitmaps. And even at a billion rows, if that data is relatively sparse, that can be only a few pages that you actually have to go read to calculate whether or not an offset meets that filter condition. And so where this also becomes interesting is that filter combinations can be fast.
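Here is a minimal Go sketch of the textbook bit-sliced comparison that makes this kind of range predicate cheap: only the stored bit positions are read, and every step is a bitwise operation over whole bit lists. The uncompressed word bitmaps and function names are illustrative assumptions; FeatureBase's actual operator will differ.

package main

import "fmt"

// Plain word-aligned bitmaps; the real store would use roaring containers.
type bitmap []uint64

func newBitmap(rows int) bitmap   { return make(bitmap, (rows+63)/64) }
func (b bitmap) set(row int)      { b[row/64] |= 1 << (row % 64) }
func (b bitmap) get(row int) bool { return b[row/64]&(1<<(row%64)) != 0 }

// combine applies a word-wise bitwise operation; this inner loop is the part
// that SIMD can accelerate.
func combine(a, b bitmap, f func(x, y uint64) uint64) bitmap {
	out := make(bitmap, len(a))
	for i := range a {
		out[i] = f(a[i], b[i])
	}
	return out
}

func and(a, b bitmap) bitmap    { return combine(a, b, func(x, y uint64) uint64 { return x & y }) }
func andNot(a, b bitmap) bitmap { return combine(a, b, func(x, y uint64) uint64 { return x &^ y }) }
func or(a, b bitmap) bitmap     { return combine(a, b, func(x, y uint64) uint64 { return x | y }) }

// bsiLessEq returns the rows whose bit-sliced value is <= c. slices[i] is the
// bit list for bit position i; exists marks rows that have a value at all.
func bsiLessEq(slices []bitmap, exists bitmap, c uint64) bitmap {
	lt := make(bitmap, len(exists)) // rows already known to be strictly less than c
	eq := make(bitmap, len(exists)) // rows still matching c on all higher bits
	copy(eq, exists)
	for i := len(slices) - 1; i >= 0; i-- {
		if c&(1<<uint(i)) != 0 {
			lt = or(lt, andNot(eq, slices[i])) // c has 1 here, row has 0: strictly less
			eq = and(eq, slices[i])
		} else {
			eq = andNot(eq, slices[i]) // c has 0 here, row has 1: strictly greater, drop it
		}
	}
	return or(lt, eq)
}

func main() {
	exists := newBitmap(2)
	slices := make([]bitmap, 8) // age hinted to 0..150, so only 8 slices exist at all
	for i := range slices {
		slices[i] = newBitmap(2)
	}
	insert := func(row int, age uint64) {
		exists.set(row)
		for i := range slices {
			if age&(1<<uint(i)) != 0 {
				slices[i].set(row)
			}
		}
	}
	insert(0, 20)
	insert(1, 40)
	le := bsiLessEq(slices, exists, 35)
	fmt.Println(le.get(0), le.get(1)) // true false: only age 20 is <= 35
}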
So if you think about this — my query is still missing a FROM clause — where you've got an AND in the filter, right? The output of expression one, which is between 18 and 35, is a bitmap of IDs matching expression one. Expression two is a bitmap of IDs where the credit limit is greater than 5,000. Intersecting the two is just a bitwise operation to get the matching IDs, and bitwise operations we can vectorize with SIMD. So that becomes very, very, very fast. We saved IO and we saved a bunch of compute time. Where this is slow is the case you talked about before. If I have an integer column like age and I want to do something like select distinct age, it's got to read everything and then reconstruct all the values out of all the bit lists. And if you think about how that actually works, it's basically a scan down the ID list, the offset list: for each offset, go read bit position 63, then go read bit position 62, and all the way down to zero; now I have my number after a bunch of shift-lefts, and then go on to the next one. And if you then have to do a key translation step to get the string back, it's the worst possible way to do it. So when I talk about this particular case, this is why we went from being the only all-bitmap database to not all bitmaps all the time. Because sometimes you just want to store a string, and this particular case is one of those. I want to do some filter conditions and I want to return the user's ID and email address, and I don't need to go spelunking through a set of bit lists to do that; I can just read the bytes off the page in the T-store. Similarly, sometimes you want to store a vector. Quick question: presumably you can — sorry, do you support, like, ALTER TABLE? Like, if someone realizes, oh, I don't want to be — actually, maybe let me rephrase it. The decision of whether a column goes in the T-store or the bitmap store, is that something you expose, or something that FeatureBase does automatically? It's something that we expose. So you'll see here: device, user ID, string, stored in the T-store. So I guess, how often do people realize that they're wrong, that they shouldn't have made something a bitmap? Maybe it goes back to my question before, where the people coming to you are more sophisticated because they realize they want FeatureBase, and therefore they know upfront they should make a decision about what the cardinality looks like. So the reconstruction cost is expensive, but most people get it right the first time — is that the way to think about it? Sometimes. We've got some features in the — we have a rules-based query optimizer currently; we're heading rapidly towards a cost-based one. There's a project going on right now around group bys to do stuff like: well, if you know in a group by that it's high cardinality, it goes exponential very quickly, and it doesn't take much. So at that point you're better off actually just doing the filter, then doing a scan, pulling the whole set value out and grouping at a higher level, and not turning it into a series of intersect operations under the hood. The optimizer is now smart enough that it will say, yeah, you really shouldn't do this, maybe you want to consider making this a tuple-store column; it'll warn you, and you can then just do the insert-select to create the new column. Awesome, thanks.
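To see why that reconstruction path hurts, here is a small Go sketch of what SELECT DISTINCT over a bit-sliced column has to do: rebuild every row's value bit position by bit position before it can deduplicate anything. The types here are illustrative, not the real storage code.

package main

import "fmt"

type bitmap []uint64

func (b bitmap) get(row int) bool { return b[row/64]&(1<<(row%64)) != 0 }

// reconstruct rebuilds the integer stored for one row from its bit slices,
// reading slice 63 (or however many there are) down to 0 with shift-lefts.
func reconstruct(slices []bitmap, row int) uint64 {
	var v uint64
	for i := len(slices) - 1; i >= 0; i-- {
		v <<= 1
		if slices[i].get(row) {
			v |= 1
		}
	}
	return v
}

// distinctValues walks every existing row and reconstructs it: O(rows x slices)
// bit reads, plus a key-translation lookup per value if the column was a string.
func distinctValues(slices []bitmap, exists bitmap, rows int) map[uint64]struct{} {
	out := map[uint64]struct{}{}
	for row := 0; row < rows; row++ {
		if exists.get(row) {
			out[reconstruct(slices, row)] = struct{}{}
		}
	}
	return out
}

func main() {
	// two rows, both storing the value 7, kept in 4 slices for brevity
	slices := []bitmap{{0b11}, {0b11}, {0b11}, {0b00}}
	exists := bitmap{0b11}
	fmt.Println(len(distinctValues(slices, exists, 2))) // 1 distinct value
}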
And then obviously this one: you can specify tuple store for a vector, but you can't store vectors in the B-store, so we just do a smart default. So that's sort of the low-level storage stuff. I want to talk a little bit about easy in, easy value, easy out, but if there are questions, let me know. So again, I did watch all the videos in this series up until this point, so no, this is not the first time we've seen this chart either. This is a nightmare. It's interesting to me, just industry-wide — we've clearly identified all of this, because all these things have boxes — how much time we actually spend over here still as an industry astounds me. You'd think we would have solved this problem by now, but no, we haven't. And I think a lot of the issue is that you have to make a lot of decisions about how you can implement all of this before you can implement any of this, and that intersects with all of the normal organizational challenges you have within companies just to get access to various datasets to do anything. So one of the things we wanted to spend a lot of time on with our customers was just making ingest as easy as possible, and we put an enormous amount of work into this. When I first started at the company, we didn't have SQL at all, and so we built a lot of this sort of stuff. We took advantage of the fact that within each of the compute nodes — that's the slide from before — there's an executor and a SQL layer, and so we can do a bunch of stuff both on the node and, in some cases, by deploying additional compute as necessary. This happens a lot in our cloud variant. So you can see here there's a table I just created, and we implemented a bulk insert statement which has this idea of being able to map data from a source format declaratively and then transform it during the same step. If you think of this as effectively an execution graph — source, target, a map step and a transform step — we can also decompose it into an execution graph that can be distributed, and in fact we do. In this particular case, this was from a demo I did ages ago, but basically it just takes some data out of a CSV file, reads the text in, generates a UUID for the ID column, puts the text in the voucher column and then embeds it with OpenAI. The problem of course with this — that's why it says rows limit 50 — is that you're going to run into OpenAI's throttling limits really fast if you do this. But this was just an example of a lot of the stuff that we built behind the scenes to make this really, really easy for customers. And then the thinking is that once you have a SQL layer, a language layer, to do all this stuff, you can put a tooling layer on top. So we have some command line tools, for example, that can pull data out of a Kafka queue, format it into bulk inserts like this and then execute them against the server. This is another variant. That variant was, I have data in a file and I want to push it into the server; this is some stuff which illustrates the idea of being able to pull from the server, or the compute part if you like. Because again, this top thing is basically just a table-valued function which represents a remote query, in this case into Redshift, that's pulling some product review data. I can pass in a high watermark, so I can say, give me the next 50 rows from this particular starting point.
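As a rough illustration of that source, map, transform, sink graph, here is a hedged Go sketch that reads a CSV, maps the rows, and calls a stand-in embedding function once per batch, which is essentially the batching idea he gets to in a moment. The embedBatch and insertRows functions are hypothetical placeholders, not FeatureBase or OpenAI APIs; the real product expresses this as a bulk insert statement.

package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

type row struct {
	id, text string
	vector   []float32
}

// embedBatch is a stand-in for an embeddings call; one request per batch of text.
func embedBatch(texts []string) [][]float32 {
	out := make([][]float32, len(texts))
	for i := range texts {
		out[i] = make([]float32, 3) // pretend 3-dimensional embedding
	}
	return out
}

// insertRows is a stand-in for handing a batch to the database.
func insertRows(rows []row) { fmt.Printf("would bulk insert %d rows\n", len(rows)) }

func ingest(path string, batchSize int) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	records, err := csv.NewReader(f).ReadAll() // source step
	if err != nil {
		return err
	}
	for start := 0; start < len(records); start += batchSize {
		end := start + batchSize
		if end > len(records) {
			end = len(records)
		}
		batch := make([]row, 0, end-start)
		texts := make([]string, 0, end-start)
		for i, rec := range records[start:end] { // map step: generate an ID, keep the text
			batch = append(batch, row{id: fmt.Sprintf("row-%d", start+i), text: rec[0]})
			texts = append(texts, rec[0])
		}
		for i, v := range embedBatch(texts) { // transform step: one call per batch
			batch[i].vector = v
		}
		insertRows(batch) // sink step
	}
	return nil
}

func main() {
	if err := ingest("reviews.csv", 50); err != nil {
		fmt.Println("ingest:", err)
	}
}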
And then there's this idea down the bottom of a pipeline: something that can be like a schema object or an infrastructure object, created and managed separately from the database inside the infra, but still accessible via SQL to manage it. All this thing is doing is saying, I'm going to move some data from a source to a destination. So in this case, every three minutes it seems, I am going to insert into a product reviews table the result of querying my product review data from a foreign table, if you like. And then I'm going to do a batch apply, and that was the solution to the throttling problem. Batch apply basically does this function call but batches the rows up in a way that lets you call OpenAI with an array of text and get back an array of vectors, and then it inserts them into the right spot. Then you can start the pipeline with, again, just another SQL statement, stop it through a SQL statement, and you can go look at the system tables and see the status of this thing running. So again, it's this idea of making it really, really easy to get data in and work with data, and not let all of the complexities overtake you, because it just lets you use SQL, a language that everyone knows how to use, to do this stuff. Bringing compute to the data: this was another variant of some stuff that was obvious for customers that wanted to be able to do this. It's a contrived example, but let's assume for a minute that I have some demographic data about crime in cities, and I'm a city planner. I'm building a new city somewhere and I want to predict, based on some demographic data, what the murder rate might be. This query encapsulates a bunch of ideas that we have around doing this, again from the SQL level. This is sort of not new — we've seen this before — but it's really important, I think, back to why bitmaps matter to us: if you think that the way to define the training set is with a where clause — oops, go back — and a where clause is a bitmap operation that's very, very fast, it means that I can pull the data off the disk with a low-latency query, push it into the training step, in this case for a linear regression, save off the weights into a system table, and I'm done. And if I want to retrain, I just do an alter model and it automatically does it with the same training set. So we don't have to run a query, pull the data out into a CSV, export the data, put it into a Python notebook, do the thing, get the numbers, put them back, and do all that sort of stuff. It becomes easy. And again, we're trying to give our customers training and inference in the database, or at least as close as we can get to the database — not possible in all cases. What is it under the covers — is it PyTorch or something custom? In this particular implementation, I think this is just native Golang code for a linear regression. We link other stuff for other models, so XGBoost and scikit-learn and all that stuff, right? For us, that stuff's possible right now, but Golang has a specific set of challenges with calling into native code that don't exist in other languages, so that's an issue we're dealing with right now. I'll show you another example — let me come back to that in just one second. This is basically inference, the other side of the model.
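To make the "train on a WHERE clause" idea concrete, here is a small Go sketch: the filter has already been evaluated into a bitmap, and the trainer fits a one-feature least-squares line over only the selected rows, keeping just the weights. This is an illustrative stand-in for what the create model statement does, not the actual implementation; the column names and layout are assumptions.

package main

import "fmt"

type bitmap []uint64

func (b bitmap) get(row int) bool { return b[row/64]&(1<<(row%64)) != 0 }

// fitLine returns (intercept, slope) for y ~ a + b*x over the filtered rows,
// using the closed-form ordinary-least-squares solution.
func fitLine(x, y []float64, filter bitmap) (a, b float64) {
	var n, sx, sy, sxx, sxy float64
	for i := range x {
		if !filter.get(i) {
			continue // row was excluded by the WHERE clause bitmap
		}
		n++
		sx += x[i]
		sy += y[i]
		sxx += x[i] * x[i]
		sxy += x[i] * y[i]
	}
	b = (n*sxy - sx*sy) / (n*sxx - sx*sx)
	a = (sy - b*sx) / n
	return a, b
}

func main() {
	x := []float64{1, 2, 3, 4}   // e.g. a demographic feature
	y := []float64{2, 4, 6, 100} // e.g. murder rate; the last row is filtered out
	filter := bitmap{0b0111}     // the WHERE clause kept rows 0..2
	a, b := fitLine(x, y, filter)
	fmt.Printf("weights saved to the system table: intercept=%.2f slope=%.2f\n", a, b)
}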
Another question I have, actually, before you jump to the next slide. So you're showing this create model statement, and you had the pipeline one before, and this is all done in the SQL layer, which I like. But is there anything about it being a bitmap that makes this kind of ML stuff better? Because you guys own the bitmap store — how does that help for this kind of stuff? So the answer is, I think, a matter of degrees. I wouldn't say bitmaps make it better; they make some of the queries that you want to do as part of this better. But there is some higher-order stuff, around some RAG work, that some other folks on our team are doing internally. I'm not intimately familiar with it, but basically the gist is that they break up PDFs, go and embed the chunks, and look for key terms — go ask the machine learning model, hey, what are the key terms out of this chunk of text? — and you get back a set of strings. So that's a string set, and we go store the string set in the bitmap. Then one of the things they do as part of the inference workflow, before you re-prompt the large language model, is a similarity search, but it uses Tanimoto on the string sets: go find similar sets of key terms, and then add those documents to the prompt as well. And that's really, really fast, because there's no floating point involved in any of it. All right, that's helpful, thanks. So yeah, here's inference. This syntax is going to change; if you saw the Postgres ML stuff, that's a better way to do it. This version here is not just pure SQL. In this particular case, this is just a UDF that calls out to Python — well, in this case Python; it uses a sandbox that basically uses standard in and standard out to orchestrate executing external code. So that's Python here, but it could be anything, right? It could be Bash or Rust or whatever. And in this particular case, there's some stuff in the runtime — oops, why is my thing jumping? — that lets you do things like load model, and it'll work out that that name maps to some schema object that actually has that bin file stored somewhere, where it can be loaded into this sandbox before it's executed. In this particular case this was fastText, which we trained on a bunch of SQL, and we just want to get embeddings back for SQL that we pass in. So that's another variant of, again, this idea of bringing compute as close to the data as we can possibly get. And as a lot of this stuff evolves, we think we won't have to shell out as often; we can actually have it in-process inside. And that's all I have.
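As an aside on the key-term search mentioned above: the Tanimoto similarity over two string sets reduces to popcounts of an AND and an OR of their bit sets, with a single division at the end. A minimal Go sketch, using plain word bitmaps for illustration only (not FeatureBase's implementation; the two bitmaps are assumed to be the same length for brevity):

package main

import (
	"fmt"
	"math/bits"
)

type bitmap []uint64

// tanimoto computes |A AND B| / |A OR B| over two key-term bit sets.
func tanimoto(a, b bitmap) float64 {
	var inter, union int
	for i := range a {
		inter += bits.OnesCount64(a[i] & b[i])
		union += bits.OnesCount64(a[i] | b[i])
	}
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

func main() {
	docA := bitmap{0b1011} // key-term IDs 0, 1, 3
	docB := bitmap{0b1110} // key-term IDs 1, 2, 3
	fmt.Printf("%.2f\n", tanimoto(docA, docB)) // 2 shared / 4 total = 0.50
}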
That's great. I have a whole bunch of questions, so maybe I'll ask the first one — Andy, unless you want to go first, or someone else on the call. You're the bitmap expert, go for it. This is a different question, about that bigger picture you showed of how complex the enterprise data ecosystem is, with the query engine layer being just a small portion of it. You mentioned that your platform is written in Go. I'm guessing all the bit manipulation stuff is probably written in C and you hooked it up through cgo. How do you make Go work fast with all of this, or are you using some native Go libraries that let you do stuff like that? I know there are native Go roaring bitmap libraries and stuff like that. Yeah. So the core storage engine is all native Go. For performance, there's a tool called Goat, I want to say, and basically what it does — we've done this recently for SIMD stuff on floating point vector operations — is you write the function in C using intrinsics, then there's a post-process on that which says, okay, generate assembler from this, and then you turn that assembler into the Plan 9 assembler that the Go compiler can understand, and then you compile that into your Go code. Got it. The second question is — because we've been battling with this and haven't quite solved this problem — if I've got two tables and I'm doing a join across both of them, and those columns have high cardinality, a high number of distinct values, so they're not going to be in a bitmap form but in a bit-sliced form: do you do anything special when joining columns where both of them are in large, like 32- or 40-bit, bit-sliced land? Yes. So at the executor level — we haven't quite pushed this up into the SQL level yet; in other words, there's no SQL plan operator, no join operator, that's aware that if my input is a bitmap and the other side is a bitmap, I can just do a bit... Bitmaps are fine; it's when it's bit-sliced because it's high cardinality. So bitmaps are fine, right, those are easy, but what if it's bit-sliced columns as the join keys on both sides? Yes — again, not in SQL but at the lower level, if it's two bitmaps, it's an AND or whatever the bitmap operation is, depending on what you're trying to do. So if it's an equality or a range, it'll do it as a bitmap operation. Oh, sorry, I didn't ask the question correctly. Imagine I've got two tables and I'm doing a join — I'm presuming your layer allows joins between different tables, right? Yes. And so what's the join algorithm if the two join keys are in a bit-sliced form? Oh, I see. Yeah. So imagine I've got a date on one side and another date column on the other side, so high cardinality, a high number of distinct values, and I'm doing date equal to date. And in the underlying 64-bit date space, even after stripping all the zeros at the head, I still have 40 bits of useful data. Do you convert it into the tuple-store version and do the join in that native representation space, or do you do something fancy? Yeah, nothing fancy on this yet. Even though the underlying storage mechanism is bit-sliced, for a join, at least right now, they get materialized back to integers, and then whatever the operator in the join expression is just gets applied to the two numbers. Yeah, this is fascinating how far you're pushing it, I love it. One last question, then I'll let others go. Same thing with aggregation: if the group-by column is in a bit-sliced representation and there's a large number of group-by values — I know you alluded to that earlier — how do you do that type of aggregation when you're managing that many groups? And a related part is, do you do anything special with the math? So imagine I've got a sum of A times B, and A is in a bitmap form and the other is in a bit-sliced form. How do you mix and match expressions, and how do you deal with a large number of grouping key values? So for grouping key values on a bit-sliced column, basically the way to think about it is: go do any filtering that you need to do first, because then you eliminate as much as possible; then it's basically count the bits. Where the problem comes in is if it's grouped by this thing and this thing and this thing, and the thing is a set — so it's not a mutex — then
if there's a large percentage of the rows in the table that aren't eliminated by the filter, you get sort of a combinatorial explosion of CPU time. So we're looking for better ways to do that right now, because the way we're doing it is not ideal. And then expressions — do you do native bit arithmetic? And what do you do with two different columns, let's say A times B, where both of them are in different formats, so one might be a bitmap and the other might be bit-sliced? Do you convert one of them into something? Of course, if both are in the same format, then you can work out the math; if they're in different formats, what do you end up doing? If they're in different formats, think of it as there being a conversion that goes on, and then we just use native arithmetic on basically the materialized values. Got it. Awesome, thank you. I have so many questions, but I'll see if others want to go, or I'll fire away. All right, anybody else? I have a question. Go for it. So I'm new to bitmaps, and I'm just curious: this idea sounds so awesome, and it just seems like everybody should be doing it. Why isn't everybody doing it? If you could give some context on that, I would love that. Yeah, so it's hard. And hard, I think, for a couple of different reasons. One is, we've talked a lot about reducing query latency; bitmaps are particularly well suited to solving some of the query latency problems if your data looks a certain way, but the problem on the other side is updates — having to maintain a bitmap. In other words, if you have a table with a column in it that's a bit-sliced int, right, so a timestamp or an age, when you want to go and update a value — update my table set age equals age plus one where some condition — even if it's one row, you've basically got to go to the offset in the bitmap and then, 64 times, go and basically write out 64 new bit lists. Because for that column you've got bit lists that could notionally be a billion bits long, right? And so to change one row in the middle, you've got to write out a whole new bit list. Now, that's one of the reasons we store this stuff in a B-tree: you might not have to write out the whole bit list, you can just write out a piece of the key space as a roaring container, but you've still got to write the whole page each time. So for an update, the minimum number of writes that you're going to do, at least for us right now, is like 64 pages, plus one for the exists bitmap if that changes — because you might set the value to null, so you've got to clear all the bits and then write the exists. And if it's more rows than that, you're going to write a lot more pages. And then, at least for the B-store today, we use copy-on-write for the transaction, like shadow pages for the write-ahead log, so multiply the number of pages we just wrote by two. That's where it starts to become challenging, and we've put a lot of time into making that as efficient as we possibly can. But yeah, it's not surprising to me that it's not a widespread technique, because like I said, it's hard.
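A small Go sketch of that update cost, under the simplifying assumption of one page per bit slice: changing a single row's bit-sliced value means touching every slice plus the exists bitmap, before the write-ahead-log copies are counted. The widths and types are illustrative, not the real storage layer.

package main

import "fmt"

type bitmap []uint64

func (b bitmap) set(row int)   { b[row/64] |= 1 << (row % 64) }
func (b bitmap) clear(row int) { b[row/64] &^= 1 << (row % 64) }

// updateValue rewrites one row's value across a bit-sliced column and returns
// how many bit lists (roughly, pages) it had to touch.
func updateValue(slices []bitmap, exists bitmap, row int, newVal uint64) int {
	touched := 0
	for i := range slices { // one write per bit position, whether the bit is 0 or 1
		if newVal&(1<<uint(i)) != 0 {
			slices[i].set(row)
		} else {
			slices[i].clear(row)
		}
		touched++
	}
	exists.set(row) // plus the exists bitmap
	return touched + 1
}

func main() {
	exists := bitmap{0}
	slices := make([]bitmap, 64) // a 64-bit int column
	for i := range slices {
		slices[i] = bitmap{0}
	}
	fmt.Println("bit lists touched for one row:", updateValue(slices, exists, 42, 19)) // 65
}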
And then one more thing to add, I think, on that: beyond the technical challenges of implementing all that, there are some data modeling type things that people coming new to this stuff have to wrap their heads around, because everyone that comes from a relational background wants to normalize everything, right? So you have: don't store the data twice. But for us, if you're doing an events table — back to bitmaps being a way to encode relationships — you want wide set columns, right? Sure, put every website you've ever visited into one row in one column. That's normal for us. So there's an educational sort of thing around data modeling, too, that sometimes takes people a little while to grow into. So it's more about identifying the exact situations in which this will work better than others, and usually people are not so good at that. Okay, thank you. And then I think a lot of our time has been spent trying to relieve the user from having to do that and just making the smart decisions where we can. Thank you very much. Question from Martin. All right, thanks for the talk, really interesting stuff, hearing from a bitmap expert. As a broad question — because Professor Patel mentioned the bit slicing and we've got that form — have you considered something coarser than a pure bit slice, but still slicing up values? So maybe you've got 32-bit integers chopped into two halves of 16 bits or so. Have you looked into any schemes of that sort? That opens up a whole can of worms of different kinds of mutation problems, of course — just any broad thoughts on this? Yeah, off the top of my head, I think the answer is yes. One of the interesting things about timestamps, when you think about querying them, is that people want to group around subsets of things, right? So you'll have these scalar functions which say, here's a timestamp, give me the month out of that, and then use that as a group-by expression. Being able to do that natively — so for the optimizer to say, you're doing a constant expression on a timestamp that happens to be just pulling a component out of it, we know that the day is in these bits, can we optimize that for you? — we've given that a lot of thought, to solve those sorts of problems. So basically doing partitioning of values on the fly, so that people don't have to say, here's my timestamp and here's my day column, and do a bunch of pre-compute and storage and then maybe have to maintain that after the fact. I'm sure there are other examples, but they're escaping me right now. No, the insight on the date is really great, thank you so much. You're welcome. By the way, Pat, that was Martin. He's doing his PhD essentially in this whole area, trying to build all kinds of crazy bitmap stores, so he loves this space. So it's awesome to see you guys pushing it to the edge. All right, any other questions? Otherwise, I'll have at it — Jignesh probably wants another round too. All right. My first question is: why shared nothing and why not shared disk? Are you talking about at a cluster level or at the storage level? The storage level. Unless I was mistaken, it looked like — is it a shared-nothing or a shared-disk architecture? So right now we are shared nothing. Have I got that right? Yes, shared nothing. In the cloud product, we are moving towards shared disk. Okay. Was it a legacy of being on-prem at some point? Yeah, pretty much. I think ultimately the goal is certainly to not only move to shared disk, but probably a shared buffer pool.
There are benefits for us in doing that, even if we have to pay a network penalty — every time we want to get a page out of the pool, we've got to go across the wire. One of the other things to note, too, is that the T-store has the most up-to-date buffer pool implementation; on the B-store, the RBF side, we still use mmap. So we're slowly migrating away from that, and making sure that we have a unified buffer pool, unified data file, unified write-ahead log, all of that sort of stuff. It's, if you like, a strangler-fig-pattern way to refactor the B-store storage layer. And most of that's for performance: we think we can probably halve the number of IOPS we do on the B-store if we move to all the machinery that surrounds the T-store today. You're the first person we've ever had say they want to go shared memory. Are you going to rely on something like RDMA or CXL to make it go faster, or just regular TCP? We don't know yet, it's just an idea. And it may well be... You used to work with Oracle, that's why. Okay. Yeah, it's like RAC. Yeah, RAC is probably a reasonable model for thinking about where we're starting from. Like I said, no firm decisions have been made, but certainly shared disk, and maybe it's worth doing shared memory. Yep. Okay. And then for you guys, is it a custom roaring bitmap implementation, or is it based on an open source one? And have you ever found its compression scheme to be insufficient for some data sets or so forth, and therefore had to make custom changes? Yeah, it's pretty customized. Most of the implementations out there are 32-bit; I think ours is now 48-bit. And yes, there are some cases we find that will eat CPU like there's no tomorrow, once you trend towards every bit being set. Got it. Okay, interesting. Do you know what percentage of the columns for your FeatureBase customers are using the bitmap store? Is it like up to 90%, or is it somewhere around 50-50? Yeah. Okay. And then can you talk a little bit more about — you mentioned you're using SIMD in some cases. I'm assuming you're processing vectors at a time, or batches of tuples. Is pretty much everything — all the operators, all the manipulations on the bitmaps themselves — vectorized? Not all, not all yet, but as time passes the goal is to get to all of them. And is that just an engineering limitation, you haven't got around to doing it, or is there a fundamental reason? No, we just haven't got around to doing it yet. And do you target AVX-512 or AVX2? AVX2, 256, for now. Most of our customers are running in AWS and Azure, and 512 for both of those providers is not there yet. Yep, okay, cool. All right, so I'll pass the baton to Jignesh if you want to go next. Yeah, I think that workload question is exactly where I was going to go. It sounds like a lot of your applications are probably data-science oriented, trying to do ETL and other types of feature prep in SQL. Is that accurate, or are people doing warehousing workloads on your platform? Yeah, most people aren't doing warehousing. Actually, that's not quite true — let me rephrase that. Most people are not doing traditional warehousing; in other words, throw everything into FeatureBase and point Power BI or whatever at it and do ad hoc stuff. No one's doing that, and I don't think that would be the best way to use our stuff.
What most people have in common is that they're trying to feed a lot of changes in at the same time as doing low-latency queries out. And the nature of the queries tends to be small result sets, so mostly not big scans of stuff. If you think of the traditional segmentation-type use cases, there's a lot of that sort of stuff, and a lot of driving anomaly-detection-type applications. We have a customer who's storing a bunch of fingerprints of binaries and then looking for patterns with CVEs, and then saying, if I have a vulnerability somewhere that's occurring in this fingerprint in this binary, show me all the systems worldwide that have that binary on their system, and do it for me in 40 milliseconds. Yeah, that makes sense. So a lot of IR-like work — that's what I was guessing — where set-valued attributes and that type of pattern and set-intersection type of stuff would go blazingly fast on it. Last question and then I'll stop, I know we are running short on time. Do you do any logical or physical horizontal partitioning of a table before doing any of the bitmap techniques, or do you treat the whole table as one big partition? The way sharding works is that you have the key space of bitmap offsets. When you create a table, there's a number of partitions in the table, and then there's a consistent hashing mechanism that works out which shard on which cluster node a write goes to or a read comes from. So if you have a three-node cluster or a five-node cluster, the shards get basically equally distributed amongst all of those nodes. And do you ever have to worry about updating an existing value, or is the model just append-only, or do you allow updates? Yeah, updates happen all the time. Got it. And how do you deal with that when a value that's in a bitmap form changes, for example? Yeah, so it depends on how many values change. We tend to batch stuff up, so larger batches are better. We just finished a whole bunch of work around bulk insert. If you hand bulk insert a million rows of CSV, what you don't want to do is give that straight to the storage and say, here, go insert this, because it'll generally tend to thrash a lot — you'll just get duplicate writes all the time. It's like, I'm going to write this page, and I'm going to write this page again, and this page again, because we're having to maintain a bitmap. If you can pre-sort the keys and then give them to each shard and say, here's a bunch of keys pre-sorted, then that's way, way, way more efficient. So we just did a bunch of work where, at the front end of that bulk insert statement, we take in all the rows, we pre-sort them into batches, basically in our internal wire format, and then we hand the pre-sorted stuff off to the shards and say, here, insert that. And that's one of the ways in which we make inserts go faster. On the T-store, it's basically write a new row version. So, your earlier comment about trying to get closer to ACID as it makes sense — you probably have some sort of snapshot isolation or some model like that right now? Yeah, so every request gets a transaction ID, and anything that is not committed is not yet visible. And that's on the T-store. On the B-store, it's a little bit looser: we have consistency at the shard level, but you can read uncommitted on another shard from the B-store.
Again, that has tended not to be as much of an issue with customers as we thought it might be, just given the nature of the use cases that they're trying to solve for. Got it, thank you. You're welcome. Andy, back to you. All right, any last questions from anybody? So my last question is going to be: given that you guys have a very specialized storage engine that looks way different than anyone else's, is there anything at that storage layer — so not SQL syntax stuff — that you've seen in another database system that you wish FeatureBase would implement? Like, if you had a magic wand and didn't have to worry about engineering time, you just poofed and you have it — is there anything you wish you had? I'm sure there is, I just can't think of it. Can't think of it off the top of my head. No, that's fine, I mean, you're in the weeds in your own system, so it's like, you know... Yeah, I mean, there's lots of stuff that we'd like to be able to do better. Let me answer it this way: I would love to have a way to better let people take advantage of the strengths of bitmaps in the context of a database — the way that we've set out to do them and do them today — without having to write the rest of the database, because it turns out databases are hard. Yes. Yeah, of course. And then you're building one that doesn't look like anything else, and that's super hard. Yeah, that's it.