So today we have Arjun Narayan, who is the CEO and co-founder of Materialize, and as I hyped up on the mailing list, it's probably fair to say that Materialize is the hottest database startup right now. Is that a fair statement? I'll take it. Prior to that, he was a software engineer at CockroachDB. And then prior to that, in his past life, he was a PhD student at Penn, working on differential privacy, which is somewhat related to what Materialize is all about. So I met Arjun, I guess, in 2016, when I visited CockroachDB. And then they came and visited Carnegie Mellon in 2017. And I distinctly remember at dinner, you said to me point blank, oh, I think the timely data flow stuff from Frank McSherry is very cool. I want to put that in CockroachDB. You remember saying that? I do remember saying that. In fact, I said that to basically everybody at Cockroach. And they said no. Right? Well, it's not that they said no, it just wasn't, you know, aligned with the product they were building. It's sort of its own thing. Right. So he dropped the mic and then went off and built Materialize. So with that, Arjun, go for it. Thanks very much, Andy. So hi, everyone. Welcome to this quarantine talk. I hope everyone's doing okay. And before I dive in, I want to call out that, you know, you can try Materialize right now. You can download it right now, you know, from Homebrew or just by going to the GitHub. And, you know, you might want to download it before we run out of stock. The other day I went to the Oracle store and they were rationing OLTP to just two tpmC per warehouse. It's pretty bad out there. Anyway, to the Oracle lawyers listening, that was a joke. I swear on David DeWitt's honor that I didn't run any benchmarks on your database. All right. To business. For the agenda today, I'm going to first talk about streaming databases, what they are.
Then I'll cover a little bit of background on the streaming ecosystem and timely data flow, the data flow engine that's at the heart of Materialize. And then finally, I'll talk about Materialize and give you a demo of it actually in action. I want to add that I'm open to questions, heckling, interruptions, everything. I have a whiteboard set up in front of me. I can freeform draw just as if we were in a classroom lecture. So, like, just go for it. All right. So what's a streaming database? What I mean by that is that instead of being optimized for processing ad hoc transactional or analytical queries, it's optimized for view maintenance on an ongoing basis over streams of already processed transactions. And so let's sort of rewind the clock about 30, 40 years and talk about, you know, what data processing was like, and build up some intuition for why this is a thing you might even care about or want. So traditionally, the world was divided into two broad categories: OLTP, online transactional stores, and OLAP, online analytical stores. And of course, you might have some different flavors of these. You might have a time series OLAP database. You might have a graph OLAP database or a graph OLTP database. But broadly speaking, most systems pretty much fall into one of these two categories. And they're architected very differently, right? So the systems on the left, OLTP systems, are designed for high concurrency, lots of transactions happening simultaneously, you know, lots of writes, lots of reads, some of these grouped up into transactions that have to be atomically committed or rolled back. Meanwhile, on the right, you don't have the write concurrency, but you have lots of other different things. So you have things like, you know, very complex, large queries, multi-way joins, subqueries, aggregating over large dumps of historical data.
And this is sort of a different set of constraints, a different set of things to optimize for. So one could imagine that one ends up in a widely different part of the design space when you look at OLTP versus OLAP, right? And I'm sure folks in this database seminar get that and are quite familiar with it. So just to run through it and give some intuition, mostly because this is the intuition we use for benchmarking these systems: imagine you're running an online store, right? So you're taking orders, you're keeping track of inventory, you're shipping these orders out. This is the TPC-C benchmark view of the world. The OLTP systems are used to keep track of live inventory, right? The major concerns are that you don't sell something that's out of stock. You ensure that when you ship an order, it's reflected transactionally in the shipments table so that, you know, if you have multiple people concurrently shipping out outstanding orders, no two workers ship the same order. You know, that sort of constraint. And on the other hand, the OLAP systems allow business analysts to answer questions about the company as a whole. So someone at HQ, some sort of analyst, is, you know, trying to figure out, are our sales up this quarter? You want to seasonally adjust the inventory stock levels and look at them year on year, you know, ask questions like, what are North America sales versus Europe sales? That sort of thing. And that sort of query requires grinding through all of your data, potentially joining a bunch of tables, things like that. And the two systems end up with radically different physical layouts and execution engines because they're fundamentally optimizing for very different tasks, right? So the systems on the right pretty much operate on static data that's infrequently updated.
They don't really do locks; they have to worry about isolation a little bit, but not the way that OLTP systems have to. They're optimized for really, really fast reads over lots of data. And on the other hand, the OLTP systems pretty much can't handle any large transactions. They're pretty much designed for getting people in, getting them served, and getting them out, and never losing data and never lying to anyone, right? And in the old world, the stuff in the middle, the ETL, is just there to get stuff from the left to the right, right? So ETL stands for extract, transform, load, and that's exactly what's going on. You have two different formats, because they're optimized for different ways of storing things, let's just say, to paint with a broad brush, row-oriented for OLTP and column-oriented for OLAP. You're taking stuff from the format on the left and transforming it and loading it into the format on the right. Now, the main problem with this, like, why would we even want to do something different from this workhorse architecture, is that OLAP systems are fundamentally working on outdated views of the world. They're looking at day-old stuff. And sometimes, for some companies, it's several days old. And even at best, it's often hours old. And there are lots of useful things you might want to do that require more recent data than a day-old data set. And so, you know, the challenge for most people, when they're trying to architect some application or some use case with the traditional architecture, is that they can either connect directly to the OLTP system, or perhaps to a read replica or something, and do up-to-date reads, or they have to make do with stale data. And if they do the up-to-date reads, they pretty much have access to the data in the form that is optimized for those OLTP transactions. And otherwise, they just have to work with old data. Now, look, this isn't a new problem, right?
So people recognized this even in the 90s, and you definitely had some technologies for shipping data around from OLTP systems. You had enterprise service buses, you had enterprise application integration platforms, and roughly, those systems evolved into what I would call the idealized version of the architecture today. You have, broadly speaking, two pipelines. You have a streaming pipeline for getting data out in real time. And then you have a batch pipeline that's very similar to what we had before, right? The streaming pipeline sends data live, whereas the batch pipeline still takes hours. Now, this is the idealized picture. Let's look at what it is today, right? So the reality is, while the batch pipeline's gotten a lot better, the streaming pipeline is very, very nascent. You're pretty much on your own in terms of building things. The official term for this is microservices, but I think the more technically accurate term is the Wild West. A lot of writing your own join algorithms in application code. Why don't you do that? Use a NoSQL store on the side for state management, and scale it up and scale it down yourself. And these are all sort of papering over the real problem, which is that there really aren't tools that help you out here. Who's this dude? Sorry? Who's this dude? There's this YouTube video. This is a screenshot from a YouTube video. I recommend you watch it. I think it's supposed to be a parody video, but I think the more apt term for it is a documentary. It exactly captures the hellish world you get into, where you have to talk to about 14 different systems to even get the state that you need to compute the view that you're trying to compute. And so our goal really is to make all of this a lot saner. There are so many of these microservices that, morally speaking, a database audience would recognize are just computing and keeping up to date materialized views. Sorry, my slides keep flipping. I apologize for that.
Is this PowerPoint? This is Keynote. It's just that my iPad's really sensitive, so it's just like. Oh, it's not the auto thing. OK, yeah, yeah. I just have to not touch it. Now, morally speaking, like I was saying, these things are a lot of work. Not every microservice; there are microservices that are doing transactional things. But many microservices are, morally speaking, just computing materialized views over changing streams of data. And the problem is that existing databases aren't really architected for efficiently maintaining materialized views over rapidly changing data. And that's exactly what Materialize is all about. It's a database that's just purely optimized for view maintenance. And I think this is something new, with the appropriate caveats for when you're giving a database talk: everything was already invented in the 80s by somebody, et cetera, et cetera. But commercially speaking, this is fairly novel. Again, the term materialized view is not a new one. Many OLTP systems and many OLAP systems have support for materialized views. However, systems like Materialize that are designed for view maintenance can often, and this is our contention, handle higher loads for workloads that reissue the same queries over and over again against fast-changing data. And I'll get more specific about this later in the talk. But ideally, we want to do work proportional to the amount of data that we have to keep around in the final result, rather than in proportion to the number of times that the result needs to be inspected. So just because 1,000 people issued the same query, or the same service, say a dashboard, issues the same query 1,000 times, we don't have to pretend to be surprised 999 times that we've seen this join. Like, we can maybe go ahead and incrementally maintain some amount of that work.
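To make that concrete, here is a toy sketch, in Python rather than Materialize's actual machinery, of doing work proportional to changes rather than to reads: a per-key sum maintained from a changelog of (key, diff) deltas. The `MaterializedSum` class and the sample keys are invented for illustration.

```python
# Sketch: maintaining a "materialized view" (a per-key SUM) incrementally.
# Updates arrive as (key, diff) deltas; reads of the view are O(1) per key,
# so work is proportional to the changes applied, not to how many times
# the result is inspected.
from collections import defaultdict

class MaterializedSum:
    def __init__(self):
        self.state = defaultdict(int)  # the maintained view

    def apply_update(self, key, diff):
        # Work happens once, when the data changes...
        self.state[key] += diff
        if self.state[key] == 0:
            del self.state[key]  # keep only keys with nonzero sums

    def read(self, key):
        # ...so 1,000 identical reads trigger no recomputation.
        return self.state.get(key, 0)

view = MaterializedSum()
for key, diff in [("na_sales", 100), ("eu_sales", 40), ("na_sales", -25)]:
    view.apply_update(key, diff)
# view.read("na_sales") is now 75, however many times you ask.
```

A dashboard polling this view a thousand times a second costs a thousand dictionary lookups, not a thousand scans of the input.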
All right, now before we start to talk about what Materialize does, I'm going to give you some background on the streaming ecosystem and sort of where Materialize fits in. And some disclaimers are necessary here, because these are my opinions, and they're a little bit subjective. But hopefully they can help give you some intuition for the landscape. So what's fundamentally different about online view maintenance? First, queries are long-running. So in contrast to, say, OLAP systems, where queries are optimized by a query optimizer at execution time, streaming queries need to be optimal for approximately forever. OLTP systems or OLAP systems can make a best-effort guess at a query plan and then collect runtime statistics on subsequent runs of the same query plan. They can switch up the plan. And so if you're issuing 1,000 queries, you can replan 1,000 times, but in an online view maintenance system, once you create the view, replanning pretty much involves shutting down that view and starting over from scratch. So this means that query planning is a much higher-stakes game. And second, query planning is itself a lot harder, because existing query planners in, say, OLTP systems will maintain an evaluation context and oftentimes bail and just use that context to rerun a query de novo, say, a subquery. You just don't have that option in a data flow engine that needs to have a query plan statically, fully planned out physically from the very get-go. It also means that, say, something like error handling is a lot more difficult. The show must always go on. You just have to find some way to keep making forward progress. It also means that there are no table statistics. Most of the query optimization literature in OLAP systems is totally oriented around the idea of getting really, really good cardinality estimates of your tables and using those to choose the perfect query plan. But streaming is a little bit like entering the matrix.
So you can't think about picking the perfect join ordering based on all your perfect cardinality estimates. You first have to realize that there is no perfect join ordering. You have to do all the joins at once and be very robust to individual streams changing, wildly swinging in the number of events. So maybe you're joining a slow-moving stream against a fast-moving stream, and all of a sudden, the slow-moving stream is also a fast-moving stream. In traditional, sorry. With your streaming workloads, isn't it oftentimes, like, say, and I realize I'm taking a data warehouse view of it, but there's the fact table and then there are the dimension tables. The dimension tables would be static in the streaming world. And then you're getting new entries to the fact table. So in that case, you do know something about the dimension tables. So you could have some statistics. Or are you saying that in Materialize, there are no fixed static tables? So one challenge here is that in a traditional OLAP context, you'll absolutely have that. In streaming, oftentimes people will do things like reissue the dimension tables because those have been updated. And so you'll suddenly see a whole lot of changes on your dimension tables, because they just got sort of batch updated by an upstream ETL job. And this can be catastrophic for some query plans that end up being quadratic if the thing that you assumed was not changing at all suddenly changes on you. Now, an OLAP warehouse wouldn't care, because it would just throw everything away and replan everything. It doesn't really run into this problem. But in the streaming world, things that don't change can occasionally change in ways that would be catastrophic for your query plan. Does that make sense? I think so, yeah. But I'd have to sort of see. I can see people doing stupid things, but is that common enough? Like, drop the whole dimension table and then load it back in rather than doing an incremental update?
Yeah, I mean, unfortunately, it is common that that's how it happens. Sometimes it's not even your fault, right? It's an upstream system's fault that's doing a daily batch dump. What do you do if someone's restoring a backup and reissuing everything? Yeah, OK. And again, this isn't a lost cause, right? It's just a set of constraints you have to optimize for. And honestly, some things are easier here. For instance, the writes are all ordered for us by the stream processor, right? And just like in OLAP systems, there's no real concurrency control that we have to do. Events are pretty much pre-ordered. Our task is more to keep up with the events as they happen at really high data volumes, rather than to do tricky admission control, locking, maintaining serializable histories, things like that. And then finally, we're totally going to restrict ourselves to query patterns that are relatively well-known. I say relatively well-known and mostly repeated, because we do want to be able to do ad hoc queries. What I do want to say is that an OLAP warehouse will probably beat us on truly ad hoc, random query workloads on fixed batch data. But you still want to leave some room for doing ad hoc queries with similar patterns. Think queries that reuse the same indexes. All right, what do we want from streaming? Now, this is where it's no holds barred. These are my opinions, right? Some of these, folks may not agree with. First, people want SQL, right? Writing streaming applications should be as easy as writing a CRUD app. You just write declarative queries about what you want, and it should just appear. Writing imperative code should be a last resort. And when I say SQL, I really mean actual SQL, like the horrendous stuff, the stuff that's buried in the dark corners: the eight-way joins, the GROUP BYs with HAVING clauses, the subqueries, correlated, uncorrelated, all that jazz.
And hopefully I'm speaking to fellow cult members in this talk, but I bring this up because a lot of existing streaming engines don't really support SQL. Without full support for arbitrary joins, arbitrary non-windowed join conditions, it's not really SQL. Second, and this is sort of a manifesto: if there's no change in the data, don't do anything. This seems like a somewhat obvious thing to say, but existing stream processors have these massive hardware footprints even when they're handling relatively low amounts of data, or none, just because the queries are very complex. To rephrase this a little bit, it should behave like Postgres on your laptop, right? I'm running Postgres on my laptop right now. I'm not issuing any queries against it. It just sits there quietly in the background. Stream processors should also behave this way. And then third, joins are absolutely crucial. Windowing your joins should not be mandatory. A little bit about this. Existing streaming frameworks today mostly require that streaming joins be windowed along a temporal dimension. And what this means concretely is, if you have an input stream that's changing over time, the join condition is only evaluated over some fixed window of data. So I'm going to draw this out a little bit. If you have two input streams, right, and I've got time coming down, and this is input A and this is input B, you keep getting new events over time. And if we're joining these two together, the join is only evaluated on some window, and then this window moves over here, and then the window moves over here. And this is just all wrong, right? A join means, you know, any event here can match with anything in the entire history of the other stream. And do you support the streaming SQL window stuff, like the tumbling windows or the sliding windows semantics? Like, we intend to; we don't consider that hard.
We consider the hard part to be supporting joins that aren't windowed, right? The joins that are over the entire history. Like, frankly, windows can mostly be expressed as, you know, just a join condition over your streams. And if it's possible for us to only keep materialized some small subset of it, leave that to us, right? Like, we'll take care of that. You just declaratively say what you want. Yeah, absolutely. I always wondered, like, reasoning about the window types and shit like that, I don't think most people can do that. So I think not having to declare that is awesome. And fundamentally, the existing systems force you to do that because they're not capable of dealing with unwindowed joins. It's a little bit of a sleight of hand where they say, hey, you can window your joins. Like, I didn't ask if I can; I want to not do that. Does that make sense? Do you want to name names? No. I would like that. This is Panos Chrysanthis, professor at the University of Pittsburgh. I will raise some exceptions to your statements. Actually, the streaming windows, or the window operators, are there because they support a specific functionality. What you are advocating, basically, is going back to stream processing over traditional databases, where now you are creating your windows or your snapshots in order to achieve this notion of streaming windows. So there are two different functionalities. It's not that one got it wrong and the other got it right. At least the way I understand your statements, you're trying to bridge and transfer technology known from traditional databases into streaming. And of course, as you know from query processing, you have the streaming mode of evaluating queries. So this is exactly what, at least as I understand it, you are advocating. Am I wrong? I think that's fair. Thank you. Andy, any follow-ups to that? No, I don't.
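To illustrate the unwindowed-join point above, here is a toy Python sketch, not how timely or differential data flow actually implement joins: each side of an equi-join retains its full history, so a new event can match arbitrarily old events from the other stream. The `streaming_join` function and its event format are invented for the example.

```python
# Sketch: an unwindowed streaming equi-join. Each side retains its full
# history, so a new event can match arbitrarily old events on the other
# side. No temporal window is imposed on the join condition.
from collections import defaultdict

def streaming_join(events):
    """events: iterable of (side, key, value), where side is 'A' or 'B'.
    Yields (key, a_value, b_value) matches as soon as they become known."""
    history = {"A": defaultdict(list), "B": defaultdict(list)}
    for side, key, value in events:
        other = "B" if side == "A" else "A"
        # Probe the ENTIRE history of the other stream, however old.
        for match in history[other][key]:
            yield (key, value, match) if side == "A" else (key, match, value)
        history[side][key].append(value)

out = list(streaming_join([
    ("A", 1, "a0"),   # no match yet
    ("B", 1, "b0"),   # matches a0, even though a0 arrived long ago
    ("A", 1, "a1"),   # matches b0
]))
# out == [(1, "a0", "b0"), (1, "a1", "b0")]
```

A windowed engine would silently drop the first match once "a0" aged out of the window; keeping full history is what makes the state-management problem (the arrange operator, later in the talk) the hard part.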
Keep going. All right. All right, so let's get to Materialize, right? Now that we've got a little bit of the religion out of the way, let's talk about how we deliver on these desiderata. So Materialize is built on top of two projects, timely data flow and differential data flow. The way to think about this layered architecture is that timely data flow is the streaming compute engine at the heart of Materialize. It's a scale-out, stateful, cyclic data flow engine. It allows folks to write arbitrary Rust code as operators, and these operators run as part of data flows in a scale-out, cooperatively scheduled fashion over a large cluster. More on this later, but just remember that timely data flow has no opinions as to what operators you write. You write arbitrary programs that maintain arbitrary state and pass messages along. The only interesting thing about an operator in timely data flow is that, along with input data, these operators are fed timestamps, right? And these operators, you know, need to occasionally relinquish the capability to emit outputs at some times for the system as a whole to make progress. So you can think of an operator as getting told, like, inputs at T1, T2, T3, and at some point the operator makes a statement: you know what, here's some output, I'm done with T2, any output I send will occur at times T3 or greater. On top of that is differential data flow. Now, timely is unopinionated about the operators, but boy, differential is a very opinionated set of operators, right? Just because timely lets you write arbitrary operators doesn't mean you should. Differential's operators should be familiar to the audience here, right? There are operators named join and aggregate, filter and map, and things like that. Care has been taken to build minimal, tasteful operators. And one additional operator that will not be familiar to the audience here is called arrange.
You can think of it sort of as an index-building operator that takes care of efficient state management for any other operator that wants to outsource state management to the arrange operator. It's kind of a workhorse operator in differential data flow, because a lot of other operators, for instance join, have to build large stateful indexes in order to deal with the large historical windows of data that they must evaluate join conditions over. And then finally, at the top, we have Materialize. Materialize is probably the thing that looks the most familiar to this audience, right? It does things like handling client connections, maintaining a catalog of tables and views, or I shouldn't say tables, of streams and views and things like that. It does parsing, planning, and optimizing of input queries, and constructing data flow plans from these input queries. It definitely has to do some things very differently than the more familiar, traditional databases. Constructing data flow plans is a little bit different from constructing volcano plans, but it's close enough, right? Does your stream require you to have, like, punctuations, or the guaranteed delivery of timestamps at fixed intervals? Like, what are the stream semantics? I'm thinking about the early streaming stuff from the early 2000s. I'm not sure I'm familiar with that. You might have to say more, or we could also punt to Frank, who has a sort of encyclopedic understanding of the literature. I mean, punctuations, the punctuation stuff, was like: I'm guaranteeing that every so often an event shows up. There may be no data in it, but it says, this is the boundary of a window. So you mean, like, you get an event at a time T and you have a punctuation that says, you have seen every... We are now at time T.
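That contract, data tagged with timestamps plus punctuations announcing that all times below a frontier are complete, can be sketched in a few lines of Python. This is a hypothetical toy model, loosely in the spirit of timely data flow's frontiers, not its actual API.

```python
# Sketch: punctuation/frontier-driven output. Inputs carry timestamps; an
# operator may only emit its result for time t once the input has promised
# ("punctuated") that no more data at times <= t will ever arrive.
from collections import defaultdict

class CountPerTime:
    def __init__(self):
        self.pending = defaultdict(int)  # timestamp -> count so far
        self.emitted = []                # finalized (timestamp, count) pairs

    def on_data(self, timestamp, record):
        # Buffer: we can't emit yet, more data at this time may arrive.
        self.pending[timestamp] += 1

    def on_punctuation(self, frontier):
        # The input promises: no future data at times < frontier,
        # so counts for those times are final and safe to emit.
        for t in sorted(k for k in self.pending if k < frontier):
            self.emitted.append((t, self.pending.pop(t)))

op = CountPerTime()
op.on_data(1, "x"); op.on_data(1, "y"); op.on_data(2, "z")
op.on_punctuation(2)   # times < 2 now complete: emits (1, 2)
op.on_data(2, "w")     # still allowed, 2 is not yet closed
op.on_punctuation(3)   # emits (2, 2)
```

Without the punctuations, the operator could only guess when a count is final, which is exactly the "go back on your word" problem discussed next.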
Yeah, you totally need your inputs to tell you when they are done issuing you inputs as of a timestamp. And if they don't give you that, you can sort of make some assumptions, but then you may have to go back on your word, which means you don't have correct answers. Punctuation is necessary for correct answers, for isolation guarantees. In practice, there are a lot of systems that just don't have punctuation, right? If you're doing change data capture from an OLTP system that has transaction IDs, and you're passing those through as punctuations, then everything's going to be fine and dandy. But oftentimes, as we've discovered, existing middleware throws that stuff out, and then we have to make assumptions. Does that make sense? Yeah, keep going, good. All right, so to understand timely data flow, let's talk about traditional streaming systems. The way they mostly work is they take their data flow, they have data flow operators too, and they partition it, sort of one data flow operator per worker, and by worker I typically mean a compute core, like a physical CPU, and the operators pass messages to each other. And this actually can get very expensive for complex queries, even at relatively low data volumes. This way of sending data around can often mean that relatively small amounts of data are just passing no-ops through the whole system until they come out the other side. And if you have a very complex data flow graph, that's a lot of busy work for a little change. In contrast, timely data flow, hold on, yeah, timely data flow cooperatively schedules every operator on every worker and then shards the entire data flow graph by key. So this thing scales down. The interesting part here isn't that it scales up; it totally scales up to very large query volumes. But what it also does is it scales down, right?
You can have a single-core version of timely data flow that, as those of you familiar with Frank's COST paper know, outperforms other big data systems that are given an arbitrarily large number of compute nodes to compete with. Now, the other principle of timely data flow is that the timestamps drive the movement of data, and all data items are sent through operators with an attached timestamp, and the operator, as I said before, can emit data at that timestamp, or can, after it's done, say, I'm done with this timestamp, and relinquish the right to ever emit outputs at it. And there are no corrections. Like, I can't come by a minute later and say, hey, remember timestamp T1? Here's actually a corrected value for it. No, it can't do that. Materialize would reject that. Materialize would reject that. You mean as an input to the system? Yeah, like on my stream. Like, I guess you're basically saying the stream cannot send updates. The stream can totally send updates. There's a difference: the value of key K has changed from V1 to V2 at time T3, you can totally pass that through the system. Just don't tell us the time at which that happened after the fact, and don't retroactively go back and rewrite history. Does that make sense? Updates are totally possible in the system. But updates happen as time advances. But it's timestamp T10 now, and I get an update record that says, at timestamp T4, the value was six, but actually the value is seven. Why would you ever want to do that? Sometimes you don't know the value of the trade until after the trade occurs or something. I've seen that example in fintech. Well, then what you may want is a multi-temporal timestamp, where a timestamp can have both an event-time and a system-time dimension, right? That allows arbitrary historical updates. Like, we can totally support that, right?
So, sorry, let me be specific. We can support that in timely and differential data flow. Support for it in Materialize has not yet been added. Excellent, okay, cool. Yeah, so differential data flow, on top of this, builds these data flow operators that are fairly opinionated, and, you know, these execute. And on top of that, Materialize builds the SQL parsing, planning, and execution that we all know and love, the catalog, a coordinator to coordinate all of these data flows being installed and uninstalled, et cetera. So, you know, let's talk about things we really care about. Writing performant data flow programs is very hard. And this experience has taught us that that's all the more reason we can't expect application developers to actually do this in their day-to-day. They have even less time to ship than we do, and they have, you know, other constraints. Second, the workhorse of differential data flow is the arrange operator, which does efficient incremental index maintenance. Now, most other stateful data flow systems outsource state management to sort of a colocated instance of RocksDB per operator, right? Like, I love RocksDB, but RocksDB is a poor choice as an operator state manager for streaming systems, because it's optimized for objectives that are orthogonal to stream processing. It's very good at durability, at high read concurrency, and all these things that are probably orthogonal to the needs of a stream processor. Phrased another way: RocksDB, you know, has additional compaction threads, right, to efficiently maintain its log-structured merge tree. But efficiently compacting state is pretty much the primary computation task required of all data flow systems. The choice of when to schedule compaction is probably the most important task, and it must be considered alongside all of the other computation tasks.
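A toy sketch of the compaction task being discussed: collapsing a batch of (key, time, diff) updates whose times fall behind a compaction frontier into one consolidated diff per key. This is an illustrative Python model under my own simplifying assumptions, not differential data flow's actual arrangement code.

```python
# Sketch: compacting arranged state. Historical updates are (key, time, diff)
# triples; once the frontier passes a time, readers can no longer distinguish
# it, so all updates per key at times < frontier can be merged into one.
from collections import defaultdict

def compact(updates, frontier):
    """updates: list of (key, time, diff). Returns an equivalent, smaller
    history in which all times < frontier are consolidated at `frontier`."""
    merged = defaultdict(int)
    kept = []
    for key, time, diff in updates:
        if time < frontier:
            merged[key] += diff        # fold old history together
        else:
            kept.append((key, time, diff))  # recent updates stay as-is
    # Keys whose diffs cancel to zero vanish entirely from the index.
    compacted = [(k, frontier, d) for k, d in sorted(merged.items()) if d != 0]
    return compacted + kept

hist = [("k1", 0, +1), ("k1", 1, -1), ("k2", 1, +1), ("k2", 5, +1)]
result = compact(hist, 3)
# result == [("k2", 3, 1), ("k2", 5, 1)]; k1's updates cancelled out.
```

Deciding when to run `compact`, and interleaving it with the actual operator work, is the scheduling problem the talk argues cannot be outsourced to a general-purpose storage engine.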
And it has to be fused in with the other operators that you are scheduling in your cooperatively scheduled data flow system. So is RocksDB a bad choice because it's an LSM, or is it because it's a full-fledged storage manager that's meant for other purposes than what you're trying to do in Materialize? The latter. The arrange operator also maintains an LSM sort of under the hood. The problem is that, you know, most other streaming systems have punted on state management, on dealing with when to do compaction and things like that, by outsourcing it to a full-fledged storage manager that was built from the ground up to be the storage manager for an OLTP system. Yeah, okay, okay. When really, this is the hardest part of building a stream processor. And you have to be very intentional about what you compact when. Yeah, cool. And then finally, another thing we've come to believe is that SQL really requires 100% coverage. There's a lot of, you know, SQL-inspired, or almost-SQL, or SQL-except-the-joins sort of things. And that really doesn't work for the users, because abstraction layers require you to be able to forget, at least to some extent, about what's going on under the hood, right? If you have to mentally think, you know, what is the underlying implementation going to do, you'd never be able to make progress unless you fully understood the underlying system. And at that point, the abstraction layer just gets in your way, right? When I write a SQL program, I don't think about the number of, you know, x86 instructions that happen under the hood, unless I'm all the way off the deep end, writing really, really, really high-performance code. Most programmers are able to live at the abstraction layer. And we put a lot of effort into getting to very, very close SQL fidelity.
And of course there are a few instances where it's apples to oranges: the streaming setting is not the old database setting, so you do have to make some changes and you can't have full functionality. But those have to be the very, very rare exceptions. Can you give an example? Window functions. A single insert can change an entire window, changing the position of so many different rows that it's better for the user not to think that way when dealing with streaming systems. When they can replace window functions with some other way of writing their query, they're better off doing that, rather than us trying to figure out how to optimize away the massive number of internal changes that would have to be shuttled through the system. I mean, how much coverage is that? Let's be honest, the SQL standard is six books; nobody has 100% coverage. Yeah. So what's the bare minimum you support? Joins, subqueries. CTEs? We don't support CTEs yet, but we absolutely plan to, and we should. These things are necessary to port most of the education and expertise that people have built up in dealing with SQL. Right. All right, cool. All right, let's jump to a demo. I had a demo open somewhere. It's my Zoom. Oh, there we go. All right, so let's take a look at what Materialize looks like. First off, Materialize today pretends to be Postgres. This is an opinionated choice, because we love Postgres and also because a lot of existing tooling runs on top of Postgres. So the first thing you'll see is that you just connect to Materialize using psql. Second, let me pause this video a little and give you some context. I'm going to set up a demo where I'm running a MySQL instance upstream that is running the TPC-C benchmark.
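The window-function churn being described can be made concrete with a small, hypothetical sketch (illustrative Python, not how Materialize itself reasons about this): a ranking like ROW_NUMBER over an ordering is an output where a single insert near the top shifts almost every row's rank, so the maintained view must retract and re-emit many rows, whereas an aggregate like SUM changes a single output value.

```python
# Illustration: one insert into a ranked (window-function-style) output
# changes many output rows, which is expensive to maintain incrementally.
def rank_output(rows):
    """ROW_NUMBER-style output: (value, rank) ordered descending."""
    return [(v, i + 1) for i, v in enumerate(sorted(rows, reverse=True))]

before = [50, 40, 30, 20, 10]
after = before + [45]                    # a single inserted row...
old, new = rank_output(before), rank_output(after)
changed = (set(old) - set(new)) | (set(new) - set(old))
print(changed)                           # ...shifts nearly every rank
```

Compare `sum(before)` versus `sum(after)`: the same insert changes exactly one aggregate value, which is why rewriting a query to avoid the window function is often the better move in a streaming setting.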
Now, as I hinted a little earlier, in the TPC-C benchmark there's an online store and orders are happening all the time on some upstream warehouses: orders are being taken, orders are being shipped, all of these things are happening transactionally, and then all of the transactions that actually commit are flowed through Kafka into Materialize. So Kafka is just streaming all of this to us, one stream per table that's being modified. And over here, you see the tables: there's a customer table, the district table, the item table, and so on. And we're going to materialize some queries over these tables, right? For folks who are familiar with OLAP benchmarking, we're going to be materializing TPC-H queries. So if you take a look at the customer table, there's customer information, payments, things like that. And we've installed a few views. Now, we are not yet at 100% coverage, but we're working very hard toward it. Of the 22 TPC-H queries, we've installed some subset of them as views that are being incrementally maintained by the system. And when we say incrementally maintained, they are actually being materialized as the underlying data changes, with every update. They're completely queryable, and they are live, connected to Kafka. So if you just do a select star from query one, you're getting exactly TPC-H query one, absolutely up to date as of this second. Your show views, show tables is throwing me for a loop, because that's MySQL syntax. We actually support both the Postgres syntax and the MySQL syntax for these things. It's just quality of life: people type the thing and they're like, why did you want me to do a backslash-whatever? So we just added support for both in the parser.
So it's Postgres-plus-plus; again, this is just a little quality-of-life improvement. All right, so create view test one: you can create additional views downstream of these views. You can chain all these views together. And this one's really interesting, right? It's create view test one as select sum. Maybe you're selecting the sum of count_order all the time and you just want to remember it; you can create a view for that. Now, if you create a materialized view of this, you're going to get much lower latency. So for the query patterns you expect to run over and over again, install materialized views for them, and the speed at which you can query them will be much, much faster. All right, we're just going to drop those views that I spun up. All right, and this is query two; we're just going to select from that as well. Again, you can do all the things you typically might do interactively, like additional predicates when selecting from these views, or you can install that, limits and all, as its own view so that it is pre-computed for you. This is TPC-H query five. For those of you who are not familiar with the TPC-H queries, almost all of them are query patterns that are a little adversarially picked. So there are six-way joins, there are eight-way joins, there are correlated subqueries that you really can't execute efficiently unless you remove the correlation in your planning and optimization stages, things like that. This is a video, right? You can't run explain? Yeah, this is a video, because I don't trust my laptop. As CEO, I never run any code, so my laptop's total junk. So I had Frank run this for me and send me the video. But if we downloaded this, would explain work and actually show the plan?
I mean, it'll show you the dataflow plan. Explain works, and it will show you the dataflow that we run, and you can inspect that; there are multiple different flavors of explain, just as in Postgres. It's not apples to apples with Postgres, because it's not doing the same thing under the hood, right? So, as Frank has shared in the chat, you can look at the SQL docs for explain to see precisely what we will show you. Another thing I want to call out here, and I'm going to pause here for a bit: when you select this, you can also select the internal timestamp at which this query was executed. Like I mentioned before, we're flowing timestamps through the system alongside every piece of data. And when you select from a query, you are getting that query executed at a single timestamp, across the whole query, right? This is important because you're getting essentially snapshot isolation guarantees as the underlying views are being updated; every row is coming at exactly the same timestamp. And we added this other thing, which is not in the SQL standard, which we think is really cool, called tail. What tail does is: you can run select star and get the result, and press up, enter, up, enter, but you can also ask the system, hey, flow me the diffs as and when they happen. So what this is telling you is: at timestamp blah-blah-blah-474, on the second-to-last line, retract the row where I said supplier 452 had this number, and add in instead 1,956,223 as their revenue, right? So this is a slight revenue bump, of what looks like a little under $500, for supplier 452, and it's being eagerly pushed to you. And you can also do tail with snapshots: you get the select, and then you're subscribed to the changes that happen after it. This is not quite SQL standard, but we still think it's very cool.
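A client consuming this tail-style output might apply it as follows (a hypothetical Python sketch; the supplier and revenue numbers are made up to echo the demo, and the actual wire format differs): each update is (timestamp, diff, row), where diff -1 retracts a row and +1 asserts its replacement, so an in-place update arrives as a retraction/insertion pair at one timestamp.

```python
# Sketch of consuming a stream of (timestamp, diff, row) updates to keep a
# local copy of a maintained view current.
def apply_tail(state, updates):
    """state maps row -> multiplicity; diffs adjust multiplicities."""
    for ts, diff, row in updates:
        state[row] = state.get(row, 0) + diff
        if state[row] == 0:
            del state[row]       # fully retracted rows disappear
    return state

state = {("supplier_452", 1955725): 1}
updates = [
    (474, -1, ("supplier_452", 1955725)),  # retract the old revenue row
    (474, +1, ("supplier_452", 1956223)),  # assert the updated one
]
print(apply_tail(state, updates))
```

Because both diffs carry the same timestamp, a consumer that waits for a timestamp to close never observes the view "between" the retraction and the insertion.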
You can also just do the up-enter thing that folks may do. All right. Do you support listen/notify? That's on the roadmap. We don't support it right now; it's an obvious one to add support for. Do people ask for it? Yes. Yes? Yes. Like, there are people actually using that functionality in Postgres today? Yes. Interesting. Okay. Awesome. People love that. The other thing I want to call out here, sorry, let me rewind a little bit: we have these internal logging views that we keep running alongside the data views, right? And these allow you to introspect things like: how many records are in the view? And some views are very large; query five has a lot of intermediate state. But take a look at query six: it's maintaining just 229 records, right? So the memory footprint of maintaining these incrementally updatable views can be quite small. All of this is running on a single node; it's totally feasible to run this over large sets of data. There's an additional blog post you might want to check out that talks about how this can often be surprisingly small. The state for incremental updating can be proportional to the size of your output, which may end up being a very small number of things. Say you're keeping a leaderboard of the top 10: we don't have to keep very much more than that. We have to keep some amount more, but not too much more, and it's certainly not proportional to the input size. So, I mean, is there a buffer pool? Are you able to swap pages in and out? Is there actually a buffer pool where you write things out to disk, or does everything have to fit in memory? It just uses OS swap as buffering to disk. So it comfortably buffers to disk and maintains fairly clean performance even when your resident set size gets to hundreds of gigabytes.
You're using M-Map. Yeah, pretty much, yeah. I believe we're literally just allocating memory and not only using M-Map, we're just using- Do you have a single memory? How many writer friends do you support? Sorry? Like, is it a single writer thread? So one thing that's different here is that each core of the data flow system and timely data flow maintains its own independent sharded state. So each, so if you're running this on 16 threads, you have 16 separate threads of execution that are all maintaining their own state and none of them are, they're message-passing state between each other. That's why you can do M-Map. Yeah, again, we're not doing M-Map, we're just using virtual memory. I mean, it's anonymous M-Map. I don't know, it's basically, yeah, it's, yeah. Keep going. Yeah, no, then that's pretty much the end of my demo and the end of my talk, so I'm happy to open it up for questions. Okay, I have a lot of questions, but why don't everybody else want to chime in and go for it? At the end. I'm gonna leave it open here. Un-eat yourself and then say who you are and where you're coming from. You might want to run some- Oh, there we go. Go for it. My name is Sai, I'm a PhD student in the University of Buffalo. My, great talk, by the way, and my main question is, so this data flow actually uses multi-way joins, this worst case optimality to kind of do some computations, right? We do not currently use the worst case optimal joint. If you're referring to a specific worst case optimal joint algorithm, we do not. We are using multi-way joins of a different flavor. I can point you to the content that explains this. There's some on Frank McSherry's blog on sort of the precise algorithm we use. We do not currently use the worst case optimal joint algorithm if there's the specific one. The, Andy, do you know the one I'm talking about? Yeah, I mean, I think this NPR or the logic blocks stuff. Yeah, no, we don't currently use it. 
That stuff is really cool, and we do hope to sort of use that in the future. So you are using some variant that uses the best ideas from multi-way as well as Parvati, sir? I don't want to say bests. I don't want to say there's anything optimal about what we're doing, right? What we are doing is sort of, we do process multi-way joins together without having to resort to a tree of binary join operators. Beyond that, I'm not going to make any statements about whether what we do is optimal, because it's not the worst case optimal joint algorithm, which is better than what we do. Okay, thank you. It sounds like it's a quick and dirty something to get up and running. A little bit of that flavor, yeah. Yeah, that's fine. At the same time, it's like, you know, it's been able to maintain views over a TPCH query, so it's pretty good, so. I don't want to talk it down too much. Most systems don't do the multi-way join stuff, so anything you have, the basic is probably better than what's already out there. Yeah, I'd like to make that statement and no statement further than that. Yes. Okay. All right, anybody else? Hello, I'm Gastoninos Costa from University of Pittsburgh. I have two questions. The first one is about consistency. What is the idea of the consistency in your system? And the second one is about the, I think that the materialized idea is using versioning, right? And you are combining versions in some cases or not? I don't fully understand the second question, but let me ask you your first question and then throw it back to you. So consistency-wise, we compute correct answers based on the inputs that you give us. And so all answers are exactly as of a single timestamp. If the inputs you're giving us have timestamps attached to them, then we can give you exactly answers at specific timestamps that's essentially snapshot isolation at all times. 
I mean, I don't believe we can claim anything stronger given that we don't look at, we don't do concurrency in our input since they are ordered for us. And I'm gonna ask you about sort of the second question. Okay, thank you for the first answer. For the second question is about combining different results in different timestamps. Let's say I have a window of five seconds and another window of five seconds. Can I combine them to get the 10, seven window answer? I'm not sure I follow if I fully understand the question. So if you're saying that- Yeah, you're saying you have a window at five seconds and then there's another window at 10 seconds and you want to basically combine them? Yeah, this is the case. So do you want to- So I think you're talking about if you're using like streaming SQL with like the tunneling window semantics or sliding window semantics. No, we don't support that. Okay, thank you. Do people ask for that? Like the streaming SQL from calcite kind of stuff? No. Okay, that doesn't surprise me. Hi, so yeah, so I have a question. So the interesting thing about like you showed in a demo. Who are you? Sorry, who are you welcoming from? This is Don, I'm from University of Utah and I will be in Pennsylvania in a few months or whatever. So the question is- You'll be new faculty at Penn State. Come on, you got to tell us all. I don't really care that much about that anyway. So the reason why I'm asking this, I'm actually checking a project. And according to our demo, one thing is interesting, you're doing a demo on TPCH and also like awesome like pretty much, I will say a little AP queries that you get some like aggregations out is more close to the feeling that I feel more like a data queue-ish stuff. So like my understanding of like materialized view are more like, you know, I select like a slightly seared like piece of data with some like filter with some basic transformations is a group of records rather than just a few, you know, aggregations. 
So I'm wondering if you support that and if you support that, how can I actually do this, you know, like a very efficient way because a lot of times this can be really messy. So can you say a little bit more about the query pattern that you're asking about because I'm not sure I fully understand it. So let's say, let's say like you ask for like a TPCH query but you just dropped a final part. You dropped the aggregation part. You're simply just trying to get, for example, do some grouping and I don't know. Are you trying to get the whole table back out? I mean, the group by is essentially the aggregation. Like what are you at like? Yeah, group by is one case. For example, like how do you have like group by? Like it seems like when you actually showed us out is actually like one timestamp was two like numbers out and like another timestamp was another set two numbers out and another like timestamp wasn't another time like two numbers out. It can be for example, like I asked for a query that actually pulls out a subset or data or the subset of aggregations of our data. For example, I have a thousand groups and every time when you update and materialize real, you actually have a thousand Lero's what are the two numbers? That's what I'm talking about. And like even it can happen in this like weird situation when you actually add a new group by like having more and more like records coming in, which is also possible. So that's kind of my question. This because like you want to materialize something like in every single timestamp. It's not really just a few aggregations. It can be multiple records. Yeah, yeah, more. Yeah, that totally works. Right. Like the reason we picked the TPH queries for a demo is like we just pick the hardest queries we can find that to materialize. But in some sense what you, the query patterns you're saying would actually be easier for us to materialize. It's like, don't do the final grouping. It's like, yeah, we have to do all that work anyway. 
So you can do something like a versioning. For example, I asked for a particular version of this materialized view so that it kind of like piece out different versions and you'd make a like a combine all of these small pieces and go whole thing when you report to me. Is that what you're saying? Because the materialized view can't grow. The number of rows can grow from 10 rows to 20 rows to 30 rows. It's not really like a thing that you rebuild it every time you kind of do it incrementally. Is that correct? Okay, cool. Yes. Get out of my way. Thanks. Sorry, it's my cat. Awesome. No, that's it. All right, I mean, I have a ton of questions. I'll just ask the last one. What I mean, so you have like the time the data flow stuff that like the core part underneath it, but in terms of everything above above like the core execution engine and maybe also the storage. So like I'm talking like the catalogs. There's the query parser. There's the optimizer this binder like what aspect of the implementation was like surprised you the most of that like turns to be the most difficult because you're working in like a materialized view environment. This is this is your second time at the rodeo doing this because you did it at cockroach like the mimic Postgres and now you're doing it again. So we definitely benefited from like like like one of the reasons we're very confident we could do this is because you know of our team that are five of us have were at cockroach labs that that's at one point or the other. So, you know, we had we had a very good sense of like the scope of the problem and what we had to do. So, you know, there wasn't very many surprises there. 
What was sort of really challenging sort of and I hinted at this a little bit earlier was just like query planning is a lot harder in this setting because you really have to do everything possible to get out a static data flow graph and at least at cockroach and also in Postgres because we looked at the Postgres code as well like oftentimes they use an escape hatch where you know you you you get as far as you can but then eventually like you just sort of recursively call back and like push that push a new transaction through with like the remainder of the plan so it's a subquery or something and that escape hatch ends up being really useful for like these really gnarly queries and we just don't have that escape hatch because everything has to be perfectly completely unrolled to the final form before the execution can start and like once it starts there's no there's no going back and that that ends up you basically say like like if there's a subquery if you can de-correlate it or you can flatten it you like you always have to do that like in Postgres you try but if you can't say all right well I'll do the stupid thing and execute it for every single tuple. Exactly we can never do that and that's the hardest part and turns out everything can be flattened but you have to go down some fairly dark corners to beat up the query plan. Okay, interesting and so the materialized optimizer is written from scratch. Yes. Is it top down or bottom up? I have no idea. It might be both simultaneously. We have some thoughts about sort of improving the optimizer so it's not it's a little bit ad hoc at the moment. Is it so it sounds like it's not cost-based it's heuristics. Again like what cost would you use? Like what cardinal estimates do you have? I know so like a join for example like would you ever like do you support hash join versus this is like a servers join or is that not the way to think about it? I don't believe that's the right way to think about it. Okay, fair enough. 
You definitely want to make me make me. I mean there is an intermediate state you have to maintain but is it hash table or you assume it's sorted like there are some decisions you can make where a cost-based query output could help. Yeah. But it sounds like you've gotten this far without it. Yeah. Okay. It definitely things like sorry. It's what work will do for 10 years 20 years. So. You're the new Oracle. Basically the optimizer today like Frank points out on chat optimizes for worst case memory footprint and sort of it's definitely a bigger concern to not have your memory footprint grow very very fast. Right. So because you're going to have to keep that around for forever and we will trade off computation in order to keep our memory footprint same. Okay. So we will trade off doing more computation to keep our memory footprint same. Okay. Okay. All right. So it's 534. I have a small child. I am the biological father. I have to go help in this theory here. It's going to go on. So let's thank Arjun again for joining us and all of our friends from materialize. Are there materialized shirts that are available? There are materialized shirts that are available. Where do I send them? Well, I'll send that. We do have a separate email. But this is awesome. So again, everyone. So thanks for coming. We had a good turnout this time. Wait, wait, wait. There are 50 people on the call. So make sure that you have 50 shirts. Okay. I can help see a few people maybe people in Pittsburgh, but beyond that. We can produce the shirts, but distribution is somebody else's problem. I can drop by Pittsburgh and pick up one when I drive to Pennsylvania. All right. No, no, just to bring this up because it's absolutely great that we have 50 people across the country and participate in Greece. So it's fantastic. Thank you very much. Yeah. Thanks everyone for coming to this talk. I really, really appreciate it.