Hello everyone, welcome back. It's been a little while. I figured that in the middle of this international everyone-stay-at-home period, we might as well do a stream. For quite a long time people have asked me, "can you stream something about the research work that you do?", and I've been a little hesitant to, because working on a research project is a little different from the other coding projects we've done. First of all, there's a large code base there already, with a lot of intricacies to understand, but also, doing research is not just coding. A lot of what you're doing is just thinking really hard, or drawing diagrams, trying to figure out what you're even supposed to do. But now that enough people have asked, and we have this opportunity where I'm working from home anyway, I figured let's just do it and see what happens. I think the style of this stream is going to be a little different from the other streams, in that there'll probably be more explanation and less direct coding, although we will be doing both, because I'm about to dive into a pretty technical part of a pretty technical code base. I'll try to explain what I can as we go along, and I'll try to monitor chat a little, but if you feel like you're not following, let me know and I'll try to explain better where we're at. Remember, I'm very familiar with this code base and you are not, so I'll try to keep that in mind. What was the other thing I wanted to say here? Yes: the other way in which this stream is going to be different is that I have a solution in mind for the particular problem we're going to tackle, but I have no idea whether it's going to work. It might be that we try to implement it and it just doesn't work, some other problem crops up, and it takes a week of thinking before we find a solution to it.
This is sort of how research works, and if that happens, then hopefully it will still have been a useful stream, but let's keep that in mind as we dive in. I also want to mention that some of you might be watching this to see how a database works internally, and in a sense this stream is a bad example of that, because Noria is not like a normal database internally. It works fairly differently, because it's really a dataflow system that acts and pretends like it's a database when it isn't really. But I guess you're about to find out. All right, let me just see whether there's anything before we start. Oh yeah, that's another good point: how long is the stream going to be? My plan is for this one to be a little shorter than the normal streams. The normal streams are relatively open-ended, and we know the thing we're trying to solve is going to take several sessions to complete, whereas here my hope is that maybe we can solve this problem in this one stream. I'm aiming for maybe three hours, but really we'll just have to see to what extent we get stuck, and crucially how long explaining things takes, because as I mentioned, we'll be diving in really deep fairly quickly, and there's a lot of stuff for me to explain before we even get to the technical meat. All right, with all that said, how about we start the technical bit? The research project I'm working on is called Noria. Its tagline is "dynamically changing, partially stateful dataflow for web application backends", which is a mouthful to say the least, but the basic idea is this: the system arose from the observation that databases are not really built for the web applications people write today. When people write web applications, they're usually very read-heavy, not always, but often, and databases are sort of optimized for the opposite use case.
Reads have to do all this complicated processing, right? They have to parse, plan, and execute your SQL statements, whereas writes are very straightforward: a write just does an insert into a table that goes to disk. This means that the work the application does the most takes the most time and resources, which seemed backwards to us. So what Noria does is something called materialized view maintenance, and there are many other systems that do the same thing. The basic idea is to flip this model on its head and say: we're going to store the results of all queries, and then re-compute those results whenever the inputs to the system change. Think of it as, rather than executing the query when you do a read, we're going to execute the query when you do a write, and then remember that result so that subsequent reads are fast. And this is very similar to what many applications do in practice today, right?
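As an aside, that flipped model can be sketched in a few lines of toy Python. Nothing here is Noria's actual code; the names and structures are made up purely to illustrate compute-on-read versus compute-on-write for a single query, "how many votes does article X have?".

```python
# Toy contrast between the two models for one query:
# "how many votes does article X have?". Illustrative names only.

votes = []  # base table: list of (article_id, user_id) rows

# Classic database: do the aggregation work on every read.
def read_heavy_count(article_id):
    return sum(1 for (aid, _) in votes if aid == article_id)

# Materialized-view style: do the work on every write,
# and remember the result so reads become a lookup.
vote_count = {}  # article_id -> count (the "materialized" result)

def write_vote(article_id, user_id):
    votes.append((article_id, user_id))
    vote_count[article_id] = vote_count.get(article_id, 0) + 1

def fast_read_count(article_id):
    return vote_count.get(article_id, 0)
```

Both read paths return the same answer; the difference is where the work happens, which is exactly the read-heavy versus write-heavy trade-off discussed below.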
People stick Redis or memcached or something in front of their SQL server and cache the results they get out of the database, or results they compute, potentially the results of many queries. You just stick it in memcached, and then when you do a read on your next request, you first check the cache, and then check the database if it's not in the cache. One reason we wanted to not have people maintain their own cache, and have the database do it instead, is that the database knows what all the queries are, and it knows the dependencies between the different data sources and operators. So in theory it can do a better job of managing that cache than the programmer can: it has all the information it needs to do that job, and it shouldn't have to leave it to the programmer to do all that massaging, and for every application to have to reimplement it itself. So you can basically think of Noria as an automatic cache, although it's really a lot more than that.
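Here's a rough sketch of that hand-rolled cache-aside pattern, with made-up names standing in for memcached and the SQL server. It's just to make the "check cache, then database, then fill cache" flow concrete.

```python
# Sketch of the hand-rolled cache-aside pattern: check the cache,
# fall back to the "database", then populate the cache.
# `cache`, `db`, and `expensive_query` are all stand-ins.

cache = {}            # stands in for memcached/Redis
db = {7: 42, 3: 10}   # stands in for the SQL server's answer

def expensive_query(article_id):
    # pretend this runs a slow join + aggregation in the database
    return db.get(article_id, 0)

def cached_read(article_id):
    if article_id in cache:               # 1. check the cache
        return cache[article_id]
    result = expensive_query(article_id)  # 2. miss: ask the database
    cache[article_id] = result            # 3. fill the cache
    return result
```

Note that if `db` changes underneath, `cache` goes stale until someone invalidates it by hand; that bookkeeping is exactly what the database could internalize, since it knows all the queries and their dependencies.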
The other aspect, as you see here, is that it's dynamically changing. The idea is that if you add more queries to your system, then Noria will automatically start maintaining those new queries, and if you have queries that compute roughly the same thing, then Noria is going to merge as much of that compute as it can. If you have two queries that both do a join over the same two tables, then ideally Noria should only do that join once when a write comes into either of the inputs to that join. Noria is also partially stateful, and this is a big difference from most existing materialized view systems. The idea here is: imagine you have a prepared-statement query, something like this trivial example: select everything from articles joined with votes, then select out the count of votes, grouping by the article ID, where the article ID equals a question-mark parameter. It would be insane for the database to compute all of the join results for all articles ever and then maintain them, because most of the articles are not going to be viewed again. Generally there's a window where recent articles get visited and older articles do not. And so in Noria we have this notion of partially stateful dataflow, where only the results for query parameters that you have actually asked for get stored.
So if you've asked for article 7, then the current vote count for article 7 is going to be stored in Noria's caches. If you have not asked for article 16, then article 16 will not be in the caches, and if there are writes for article 16, those will also not be incrementally computed. Think of this as basically: we can support eviction. So that's the setup for what Noria is, which brings us to the problem we have to solve today. There's a bunch more stuff here, and numbers, like "we're far faster than MySQL", which is not really that much of an achievement because that doesn't take much, but it is really cool that it can do a lot of this automatically. And now we're going to get technical, to explain what the actual problem is. Let me draw you an example. I'm going to start with an example that's pretty straightforward and that does not have the problem we're going to fix today. This is an example you'll see a lot; even in our research paper we have this. Imagine that you have, think of these as base tables, article and vote, and these have the schema you would expect: article has an ID, a title, and a body, and vote has something like an article ID and a user ID. Then imagine that the user wants to compute the vote count for each article, so this is going to be an aggregation. Think of data as flowing this way: when a new vote comes in, it needs to be sent to the aggregation in order to get counted. So this is going to be vote count. Right, so this is an aggregation.
It's just a count that groups by the article ID of vote. And then there's going to be some operator down here, which is actually going to be a left join, which we indicate with this symbol, and dataflow paths from here and here. You'll notice I'm drawing these arrows, and the reason is that Noria is a dataflow system, similar to some other systems out there that use a similar materialized view maintenance strategy. The idea is that, as opposed to a regular database, where the arrows for these queries would be pointing the other way (if you did a join, the join would have to go to the article table and fetch the article, then go to the vote table and pull all the votes that match, then count them, and then produce the join result), here instead we're going to have this dataflow. There's going to be a view at the end, which is going to be the net total result of doing this join. And the idea is that if a new article comes in, that article comes down along this dataflow path and hits the join operator. The join operator does a lookup into this side of the join for the current vote count; say this was article 7, then it does a lookup for 7. If it gets back a value saying, let's say, 7's current vote count is 42, then it produces a record that flows down here saying (7, 42), and maybe some other fields about that article. Does that part make sense? The join operator gets an update from the left saying "article 7 was added", it queries the right for the current state of that key, it gets back some number, and it stitches those together into a single update that flows down from it and ultimately hits this view at the end. And this view is materialized: think of it basically as a hash map. So it's going to have essentially a giant table, and these are parameter values, right?
These are basically the question marks in your query. So the query we're really writing here is something like: select article.id and count(vote.user) from article left join vote (I guess it's going to be something like "using (id)", not quite, but sort of), group by article.id, and crucially the query the user issues is going to have something like "where article.id = ?" (my handwriting is terrible). I really should have written this in advance; it would have been easier. What you'll note here is that this question mark is what we're going to use as the key in the hash map. So inside the box here, for, let's say, the case where the user passes 7 for the question mark, we're going to have the results for 7. Ultimately, what's going to happen when we execute this query is that Noria looks at the query and goes, "oh, that's this view". It looks up into that view using the value 7, which is the value the user provided for this parameter, and looks up what the results for 7 are. If it finds an entry in this table, then it's done; the read doesn't have to do any more work. It doesn't have to do a join, it doesn't have to do an aggregation; it just returns those results. And that's great: our read is super fast. And now, to complete the picture, imagine that a user votes for an article, say they vote for article 7. A new vote comes in for article 7, it arrives at the vote count operator, and the vote count operator goes, "hmm".
Let me think for a second here; let me write it here to make it clear. The operator knows many things, but among other things it knows that 7's count is 42. It sees that this vote is for 7, and therefore it needs to add one, so it's going to emit what we call an update. The update consists of two deltas. One is a negative for (7, 42), saying "in the past I sent you the information that this tuple existed, that for the key 7 the value was 42", and it's going to send a positive for (7, 43). This combined update flows down into the join. The join does a lookup into its other side to find the corresponding article information for number 7, and then it produces two output records: a negative for (7, 42) plus the other fields for article 7, and a positive for (7, 43) plus the other fields for article 7, so title and body. In fact, most of the values in these additional columns are going to be the same for the negative and the positive, because they haven't changed, and there are various ways the system could optimize this so that it doesn't send the same values twice whenever a value changes. Ultimately both of these arrive at the view, and the view sees both records. It sees the minus, which makes it remove the result it has for 7, and then it sees the plus, which makes it add the new result for 7, which is this updated value of 43. So now, if a user issues another query that looks the same, with the value 7, it does a lookup into here again, and it sees the updated value that includes the count 43 instead of 42. Okay, so that's the very high-level overview of how Noria uses dataflow to give you materialized views, and crucially, as we see here, incrementally maintained materialized views.
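To make the negative/positive delta flow concrete, here's a toy Python version of this exact update path: the count operator emits a minus for the old row and a plus for the new one, the "join" glues on the article columns, and the view applies both. These structures are illustrative only, not Noria's internals.

```python
# Toy version of the update path just described. A vote for article 7
# makes the count operator emit a negative for the old row and a
# positive for the new one; the "join" attaches the article columns;
# the view removes the old result and adds the new one.
# Hypothetical structures, not Noria's internals.

articles = {7: ("title7", "body7")}       # join's article-side state
counts = {7: 42}                          # count operator's state
view = {7: [(7, "title7", "body7", 42)]}  # materialized results by key

def on_vote(article_id):
    old = counts[article_id]
    counts[article_id] = old + 1
    # the count operator's output: a (-) delta then a (+) delta
    deltas = [("-", article_id, old), ("+", article_id, old + 1)]
    # the join looks up the article columns and stitches them on
    title, body = articles[article_id]
    out = [(sign, (aid, title, body, n)) for (sign, aid, n) in deltas]
    # the view applies the minus (remove) and the plus (add)
    for sign, row in out:
        if sign == "-":
            view[row[0]].remove(row)
        else:
            view.setdefault(row[0], []).append(row)

on_vote(7)  # the view now holds the count 43 for article 7
```

Note that, just as in the stream, the negative and positive carry the same title and body; only the count column differs between them.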
Right, another strategy you could imagine is that when a new vote comes in for 7, we just forget everything about 7: we remove every entry in every table for 7, and then if someone asks again, we recompute it. Or you could imagine that when a vote comes in for 7, we query all votes ever for 7 and recompute the result from scratch. Of course, doing this sort of "just add one" is a lot cheaper, and that's what Noria does when it can. So I'm going to take some questions from chat here, because this gives you the model that we're going to be working with when dealing with the bug we're going to be solving later. If anything is not clear about what Noria does here, or why, now's the time to ask. "Does this work well on a write-heavy application, given all the updates?" Yeah, so that's the trade-off this is making. Noria is specifically made for read-heavy applications. If you have a write-heavy application, Noria might not be for you; it is not intended to be. It is specifically for the case where you do far more reads than writes, like orders of magnitude more reads than writes. That said, it's not that the writes are terrible here; it's that the writes are doing a lot more work.
Noria does support things like batching updates and so on, so it won't be that bad, but Noria is certainly not built for that kind of workload. "I'm wondering why Noria was not built on top of stable technologies such as Postgres or Redis instead of starting from scratch." So, it's not clear how you would implement this model on top of Postgres or Redis specifically. Noria needs full knowledge of all the queries, it has to control how data flows between all these operators, and it needs full insight into every write and every read in the system. The second part of that answer is that they're really slow by comparison: Noria is orders of magnitude faster for reads, especially, than Postgres or MySQL. There's an amendment to this, which is that Noria has much weaker consistency guarantees than, for example, Postgres. It gives consistency guarantees similar to sticking memcached in front of MySQL, as opposed to the full ACID guarantees a real database would give you. And similarly for Redis, it's not clear that it would buy us anything. That said, the base tables are actually stored in RocksDB, and our intention is that the storage engine used there should be swappable; that should be pretty straightforward, because the interface is very simple. Yeah, so this answers the question of "is it strongly consistent": it's not strongly consistent. Noria's consistency story is not well defined, and this is one of the challenges with the system: it's hard to even explain what we provide. But in general the goal is that when you do a read, it reflects past writes, and it does not go back in time for any given view. Basically, think of every view as a snapshot of you doing a query at some point in the past. We don't give a guarantee about what that point in the past is, and if the system quiesces, that is, if inputs to the system stop,
then the result you get from reading is going to be the same as if you just executed the query over the base data. "Is the caching view really a hash map, as you hint, or is it more like a bag?" So the caching view is actually an evmap; this is something I've talked about elsewhere, in one of my talks, I think it's on YouTube somewhere. It is more like a bag, and it also does some other cleverness to deduplicate the records, but essentially it's a bag, and then it has a bunch of cleverness around concurrency so that you can do reads completely in parallel without synchronization, while still doing writes concurrently. So that's a separate data structure that we're not going to talk about today. "Is Noria production ready?" No, it's a research prototype. It is definitely a research prototype. Like, I think it works, but I would not use it in production. I think it could be made production ready if a bunch of engineers sat down with it and went through the code base with a comb, but there are a bunch of things that matter in production that do not matter for research.
I know there are things I want to change about Noria's interface and inner workings and that sort of stuff that are just not worthwhile to spend time on in a research context, but that you would want if you were to use this for something real. "How far along is it?" You can totally use Noria. It supports most of SQL; I don't want to say all, because it doesn't, but it supports all the stuff like the query I've shown you here, and a bunch of others. One of the things we've done is take the Lobsters website, which is sort of like Hacker News, and run all of their real queries, the ones they actually get when you access the Lobsters website. All of those we can run, and we have benchmarks showing that in the Noria paper, which is linked from the GitHub repo. "Is write-heavy an absolute value, or relative to the number of reads?" So Noria has some number of writes per second it can tolerate, and what that number is depends essentially on the complexity of your query graph: the more work you have to do on every write, the more expensive your write path is going to be. So if you have really complex queries, your writes are going to be slower, but your reads are the same speed no matter what the queries are, which is kind of cool. Effectively, more complex queries translate to lower write throughput and higher read-update latency: the latency between when you do a write and when it becomes visible is basically the time it takes for that write to propagate through all the operators. So in that sense, there's no absolute write performance number. Noria also supports sharding, and that's the kind of thing that lets you speed up the write path. "So the only thing you could base on existing tech is the storage engine, but the query engine and its primitives cannot be based on existing tech?" Yeah, exactly.
Basically all of Noria's query machinery is custom made, because it sort of has to be; it just does not work the same way a normal database does. There are no scans, and there are no index lookups in the same way. All the queries are turned into dataflow. When you give this query to Noria, and you can write exactly this query and hand it to Noria, what happens is that Noria's internal engine turns it into the dataflow graph on the left, and from that point forward the SQL does not exist as far as Noria is concerned. It's just a string that it uses to look up node indices in the dataflow graph. "So it's a replacement for Postgres in a read-heavy application?" Yeah, you can think of it that way. We've actually implemented the MySQL binary protocol, so there's a shim we've built that you can stick in front of Noria and make it listen on, say, port 3307 or something, and then any application that uses a MySQL client library should, in theory, be able to just point at that instead of MySQL, and everything should just work. In practice it's not quite that simple, because Noria doesn't actually support all of SQL, but that's the idea: you should be able to use it that way. "RocksDB uses LSM trees, so the writes aren't too terrible?" Right, the bottleneck is not writes to base tables. The writes there are just inserts, which, even if they have to go to disk, are not that expensive compared to all the compute you have to do internally in the graph. Great, oh, there are more. "Who's working on Noria?" We're actually a bunch of people working on slightly different parts of Noria, and over time who's worked on what has changed a lot. It's primarily a research project out of MIT, out of the Parallel and Distributed Operating Systems group, although we have some contributors from Harvard, including Eddie Kohler, who's a professor there, and Jonathan, who's doing his master's
at Harvard. And then we have several master's students and PhD students at MIT in my lab, and two of the professors from PDOS are also working on it. "How do you know not to propagate the updates on a write?" That's a great question. So the question is: we only store data for keys that were previously queried, so how do we know not to propagate the updates? I'll get to that in a second; that is related to what this bug is about. "My roommate...", well, it's my girlfriend, but yes, she is playing Animal Crossing, sorry. And Malte, of course, who's at Brown. I still think of him as being at MIT, but Malte is now an assistant professor at Brown University and has also been a major contributor to the project. He's one of the co-authors on the Noria paper. He has been working primarily on the translation between SQL and dataflow, the various optimizations you can make there, and the dynamic aspect of being able to add more queries and have them reuse the parts of the dataflow that match up between queries. Think of this as multi-query optimization, which is a well-studied problem in the literature. He's also now working on policy, security, privacy, and GDPR compliance stuff on top of Noria. "So Noria is turning the queries into data structures to be written to disk?" No, that's not quite right. None of this is on-disk data structures. You can sort of think of each of these boxes in the dataflow as a separate process. You can think of them as actors. They're not actors.
But you can think of them that way: when the article "actor" gets a new insert into the article table, it stores it to disk, and then it sends a message to all of its children in the dataflow, one of which is this join, saying "a new article was added". Then the join actor goes, "oh, I need to produce a message to my children telling them about this article, so I need to ask my parent to construct what that output is going to be". So none of this is on-disk data structures; in fact, all of the state stored below the base tables is in memory. Yeah, so this other question is: "SQL query results are usually way more redundant than normalized data and the original schema; cross joins, for example, have an exponential nature, and it seems like if you store that, your memory will go through the roof." This is one of the reasons why Noria has support for partial materialization, where we only materialize the keys that someone asks for, because if we materialized all of the results for every query, your memory would just explode. When you do a join, the size of the output of that join is sort of proportional to the product of the sizes of the two tables, well, depending on how you write your query and the semantics of your data. And so this is why Noria tries very hard to not materialize everything, whenever the query allows; instead it tries to only materialize the things the user has explicitly asked for, and it allows you to evict those things over time. This is one of Noria's primary features. "Updates and deletes?" Updates and deletes are fine.
So, think of an update as sending a negative followed by a positive to the base table, and a removal as just sending a negative to the base table. No, there are no actors in Noria; it's just a useful mental model for thinking about how this dataflow stuff works, but it's not actually what's used internally. In fact, what actually happens is that each of these is more like its own future, and then we run all of these futures on a thread pool, and we spread them across machines and so on. "What happens if a hot key in the cache gets invalidated?" Also a good question. There are no invalidations in Noria in that sense. If you look at this right here: a new vote came in for article 7, which was in the cache, and the value gets updated in place. It does not get removed, so subsequent reads basically never miss. There are cases where they might miss, and we'll get to those in a second. Great. All right, so now let's get a little closer to the issue we're going to tackle today. I'm going to try to move to the side here if I can. Apparently I cannot, so... this is going to be interesting; I don't even know how to move in this. I guess I have to switch to the move tool. That did not use to be the case; very sad. All right, so here we're going to use a slightly different graph. Let's go with... that color is pretty. I'm going to go back to the graph that we had previously, but more in the abstract at this point. One of the things that's cool about Noria is that all of these edges are really just channels. They're just ways for one operator to talk to another, and so we can actually make any of them be network links if we want. This is one of the ways Noria can run in a distributed fashion. But we got the question earlier of: what happens if you have a table down here, and it starts out empty, and then a user read comes in?
If some user asks, "hey, what's the current value for 7?" in the query we saw before, and the database goes, "well, I don't have an entry for 7", what do you do? And crucially, what happens if many clients all ask for 7 at the same time, which could also happen? This is where the "partial" in Noria comes in, and this is actually where much of Noria's complexity comes from: dealing with these kinds of cases, with being able to have missing state. Specifically, what happens in Noria here is that we issue what we call an up query. An up query is a way to send a message back up through the graph to your parents, towards the roots. We generally call these the leaves and these the roots; for somewhat weird reasons, databases draw them the other way up, but it's fine. So here, if a query comes in for 7, we're going to issue an up query, which you can think of as traversing all of the edges in reverse, all the way back to the nearest place that has the data. Think of it this way: if this node has a materialization, if it stores some data that contains 7, then we're done; we can respond from the data stored here, and we don't need any additional up query. But if 7 is also missing from this operator, then we need to forward the up query to our ancestor, and our ancestor needs to look at its data and see if it has 7. If it does, it responds; otherwise the up query recurses. Ultimately, if nothing in the graph has that key, if no one has ever asked for 7 anywhere, for any operator, then you're going to end up with the query hitting the base table. At that point it doesn't become a scan, but it does become a lookup in the base table, which is guaranteed to have all of the results for 7, so it's going to get back some set of records. What we say here is that up queries recurse until they hit a materialization.

There are two types of materializations. There are partial materializations, like this one and this one, and there are full materializations, like this one. The biggest difference between them is that if you do a lookup into a partial materialization and you do not find the data, then you need to recurse, whereas if you do an up query into a full materialization and you do not find the data, then you know the result is empty. This also means that for partial materializations we actually need to store tombstones: we need to know that we know the result for a given key, so we remember a tombstone saying "7 was empty". Ultimately this recurses all the way up to where the data for 7 lives, and then the clever part is what happens when an up query finally hits some data. Let's go with this color. So in this case it found this data over here; let's make it the yellow box. This is the data for 7, and no one down here has 7. What it's going to do is forward that chunk of data for 7 along the normal dataflow path. Think of it as sending it as if it were a write, except it's essentially one batch that has all of the results for 7. When that batch of records for 7 hits the join, the join does the same thing a join normally does when it gets an update: it does a lookup into here for whatever the join key is, gets a result, and stitches together a new version of that yellow block that includes the relevant record from the right-hand side. It forwards those, and it also fills in the value for 7, because it now knows the value for 7, and it's going to forward that down here.
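Here's a toy sketch of the up-query recursion just described: partial materializations that recurse on a miss and fill themselves in on the way back (including tombstones for empty results), and a full materialization that knows a miss means "empty". In real Noria the replay flows back down the dataflow as a batched write rather than returning up a call stack; this is just the shape of the logic, with hypothetical names.

```python
# Toy sketch of up-query recursion. Partial nodes that miss recurse
# to their parent and fill themselves in on the way back (an empty
# result is remembered as a tombstone); a full node that misses knows
# the answer is empty. Hypothetical names, not Noria's actual types.

class Node:
    def __init__(self, parent=None, partial=True, state=None):
        self.parent = parent        # None for a base table
        self.partial = partial      # partial vs. full materialization
        self.state = state if state is not None else {}

    def up_query(self, key):
        if key in self.state:
            return self.state[key]        # hit: respond directly
        if not self.partial:
            return []                     # full + missing => known empty
        rows = self.parent.up_query(key)  # partial + missing => recurse
        self.state[key] = rows            # fill in (tombstone if empty)
        return rows

base = Node(partial=False, state={7: [(7, "v")]})  # has everything
mid = Node(parent=base)                            # partial, starts empty
view = Node(parent=mid)                            # partial, starts empty
```

After `view.up_query(7)`, both `mid` and `view` have the rows for 7 filled in, and `view.up_query(16)` leaves an empty tombstone at every node along the path, so neither key misses again.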
It receives seven It then fills it in because it sees that this is a response and now has the data from seven and any subsequent query for seven Is now gonna hit instead of miss So that's a that's the basic premise for up queries We should spend a little bit of time to sort of talk through this as well because Up queries are a key concept to what we're gonna be dealing with today So if you have questions about up queries now's the time I Know that I'm moving quickly through this This is a lot to take in But in order for us to get to somewhere we can solve an actual problem We need to go through all the things that work first So like ask questions at whatever sort of level you feel like you don't understand this I Kind of pull-based look up you can think of it that way Although it's important that the response to an up queries sort of treated like a normal right It this means that the operators do not need to have special code to deal with The the sort of partial case the operators is written to take records from from their ancestors and produce records for their children All right So for a filter to take sort of a simple operator the filter operator takes records from its parent and then for each record in that Batch if it matches the filter then it's included in the output it sends to its children if it does not match the filter It's not included a Join is a little bit more complicated right where for every record in the batch input you get you do a look up in your other ancestor for the What whatever is the join key and then you sort of spliced together the results and that produces the output batch But it but the nice part of this is all of the operators are basically Are more or less unaware of the fact that partial exists the operators can just can just know about their own semantics and that's all Let's see here Could an up curry upstream up curry hit two notes at the same time with different results Yeah, so there's gonna be the next step, which is how do 
you deal with consistency when you have concurrent updates in flight. It's a good question; we're going to deal with that in a second.

"Does the upquery recursion tend towards the shortest path?" In general, when you analyze a given graph, the upquery must take a given path. There might be multiple paths in the case of unions, or, in the case of joins, you get a choice. In terms of that choice, Noria tries to be smart about which path it picks. If you have a full inner join, the upquery can take either path, but it should not take both, and the details of that are relatively complicated; for a union, you have to upquery both paths. I'm only talking about things that have two ancestors; more than two ancestors generalizes. If an upquery comes to a union, it has to take both paths. If an upquery goes through a join, you pick either path, and then when the response comes back, the join is going to query the other side, and that lookup you can think of as the other part of the upquery. So you end up exploring both paths, and the join tries to be intelligent about which side it does the replay from and which side it does the lookup into.

"How do you remove tombstones?" So, you can evict entries in Noria. Currently the process is relatively stupid: it basically does randomized eviction, which is not great, but you could implement something like LRU or whatever. For us the important thing was to support eviction at all; choosing an intelligent eviction scheme is sort of not that interesting. This is one of the examples of a difference between production and research: in research it matters that we can support eviction, and it is not terribly important that the eviction is also smart.
That's sort of a separate area of research. Most materialized-view systems cannot support eviction at all, except at the level of a full query (remove an entire query's results, or keep them all), whereas here we can be much more fine-grained about what we keep and what we remove.

"Don't understand SGBDs." I don't know what SGBDs are, so you'll have to explain that to me.

"Is it worth understanding why we traverse to the left parent and then the right parent?" In the specific case where this is a left join, you must query the left side. The reasons for this are somewhat complicated and not relevant to what we're going through today. But you must choose the left side for a left join; if it's a full join, an inner join, or an outer join, you can query either ancestor, it does not matter. There might be a difference in performance, but it does not matter in terms of semantics.

"There is eviction when a materialization gets too big?" Yep.

"You have a main DB at the bottom?" There's no main DB. If you're talking about the view at the bottom, it's really just like a hash map. It has no smarts; it doesn't know about queries. All it knows is: if someone asks me for seven, I look up seven in my hash map; if I have it, I return it, otherwise I upquery. Every operator, every node in this dataflow graph, only knows about its own semantics. It does not know about queries or anything like that. None of these are databases in and of themselves; the join operator really only knows about joins. The tables at the top do talk to some kind of persistence engine, currently RocksDB, but it does not have to be RocksDB. That's only for durable storage; they don't really use the query interface for that.

"Will the upquery go down both paths from the join?" I think I've covered that.

"Reminds me of laziness."
It is sort of like laziness. You force the thunk that is seven when you get an upquery for seven. The model falls apart if you look at it too closely, but roughly: you're essentially managing this tree of dependencies that you need to explore in order to get the final result.

"What data type are you using underneath?" Underneath where?

"Would it be too hard to add pluggable eviction?" No. Eviction is sort of just separate. The way eviction actually works currently is that you can insert an eviction record into the system at any point, like you can insert one here saying "I want to evict something from here", and then it's going to flow through the dataflow, and when it hits here, the key gets evicted from there and everywhere further down. From anywhere you evict, you have to evict all the way down to the leaves for that key, because otherwise you run into consistency issues; we talk about this in the paper.

"The eviction scheme sounds conceptually similar to garbage collection." Maybe, kind of, a little bit different.

"Oh, DBMS." That's fine.

"Graph data structure?" There is a graph data structure, actually, to keep track of the full dataflow graph, although it's relatively immaterial. We don't really do any advanced analysis of the graph itself, although you could, for things like query optimization.

Great. All right, so now we're going to move on to step three, which is here. All right, so now let me pose you a problem. Now we're going to start talking about unions. I'm going to have two base tables, A and B, let's call them, and then over here
I'm going to have a union, and then down here we're going to have some kind of view. The view is going to be partial. The union does not need to be materialized, because it doesn't need any state to compute. An aggregation needs a materialization because it needs to store the current count or the current sum; a union is stateless, a stateless operator much like a filter. The view is materialized, of course, because we need to be able to read from it, and the base tables are materialized because we need to be able to query into them, and they need to be durable.

And now let me pose you the following problem. Imagine that again some query comes in for seven here. That's a terrible color; let's do a better color, let's do this color. So a query comes in for seven here, and let's say that the view does not have seven, so it needs to upquery. Well, because it's a union, we need to get the results from both A and B, right? So V here is going to send an upquery up here for seven, and it's going to send an upquery up to here for seven. Okay, that's great. Now A is going to respond with its records for seven, B is going to respond with its records for seven, and then these are going to arrive as two separate replay responses, both for seven, at the view.

Now imagine that the view starts serving reads the moment it gets the first replay response. So this initial read that came in for seven (let's draw that in a different color): the moment this first response comes in for seven, the state for seven is no longer missing, right? In V's state there is now an entry for seven. But it only has half the records: it has the records from A, and it does not have the records from B. And so if we now responded to this query, we'd be including only part of the results. Now, there's an argument that with eventual consistency this wouldn't be a problem.
That's true, although in the more general case this is a problem: we can't consider state to be present until we have all parts of that state. And so what we're going to do is have the union wait until it's gotten replay responses from both sides, and then issue only a single replay response downstream. So rather than doing what we're currently doing here, the union, when it gets the first seven (this is the first replay response, or upquery response, for seven), is not going to forward it. Instead, it's going to keep this little box where it sticks the seven, and it's going to open a similar box on the other side, and it's not going to let this replay for seven through until, eventually, the replay for seven from the other side arrives and fills that box. At that point the union has filled both of these boxes, and then it can combine, because it's a union, right? It can combine both of these upquery responses into a single response for seven, which then fills the state in V completely, all at once.

So that's why we need to do this: we can't just expose the partial state. Let's think about this for a second, because this is also important to what we're eventually going to get to fixing. Hopefully this shouldn't be too complicated yet. Obviously we can't just take the first response we get; the union basically needs to know that it's a union, and that if one upquery response came, then there will be others.

Well, that's a lot of questions at the same time. "Seven is not a primary key; results may be in both A and also B?" Yeah. So Noria does not require primary keys anywhere, but even if seven is a primary key, it might be a primary key of both A and B, in which case there'll be one entry in A and one entry in B that both need to be replayed. There's no uniqueness constraint here on the column that seven lives in.

Next question: how is it that a materialized child can
have a stateless parent from which it needs to draw data? The reason for that is: here, the view is materialized, right, and the parent, the union, is stateless, and that's fine, because if this view needs state (sorry, maybe you can't see what I'm drawing), if the V at the bottom needs to fill some state, it just asks its nearest materialized parent, or ancestor rather. The way this actually works: you could think of the upquery as just being sent along the edges until it hits something that is materialized. In reality, Noria does an analysis of the graph, and it tells V: if you are missing state, you should send an upquery to A and you should send an upquery to B. It doesn't even mention the union to V. And so V actually sends the upqueries directly to A and B; they do not hop up the graph the way I've drawn it.

"What happens if they're different values?" I think you need to expand on that.

"Can you at least start letting data stream over the network before the full state is complete?" In theory you could, although we're about to get to a complication that would probably make that hard.

Great. All right, so here is one of the places where we now run into a challenge. What I'm going to do, in red here (so red is okay), is start injecting writes into the system. So up here, imagine that there are a bunch of concurrent writes coming into the system, and some of them might be for seven, but some of them might be for some other key, like, let's say, eight. This poses a problem for upqueries. And the reason is: imagine that on this path... I need to erase a little bit here. Huh, why can I no longer draw?
Oh, that's why. Let's bring this guy back, and then let's bring back what we had here. Okay, so there's the seven response here and the seven response here. All right, and now imagine that there are some writes that come ahead (these happened, if you will, before the seven upquery) and some writes that come after. Similarly here and here. Because we don't want the system to stop: we want the system to keep going, basically at all times. If the system had to stop whenever there was an upquery and not process any writes, the system would grind to a halt whenever any read missed. So we really want the system to just keep going as fast as it can at every position, and this obviously complicates matters, but it's something we'd like to do.

Okay, so now let's imagine that this is a write for seven, this is a write for eight, this is a write for eight, and this is a write for seven. Before we get to this, let's move back to here; we need a little bit of a primer. Someone asked earlier what happens if a write comes in for... or rather, I mentioned at the very beginning of the stream that we don't do any work for a key that no one has read. If no one has asked for key seven, then when we get a write for seven, we should store it on disk, but we should not compute anything for it. Here's how this works in practice. Let's go all the way back to here, actually. For our vote-count table here, for example: its current state is that it knows the count for seven, but it doesn't know the vote count for anything else, and the reason is that no one has asked for any other key, so it hasn't bothered computing it. Imagine now that a new vote comes in for eight. So a write comes in for eight, and comes down here to the vote count. The vote count looks up in its table, and it sees: well, I don't have an entry for eight, right? It's missing; there's no value here. What's it going to do? Well, it can't forward anything, right?
It can't tell the downstream view what the new count is, because it doesn't know what the old count is. So it has two options: either it computes the current count for eight and then responds, or it just drops the update. Noria currently does the latter, and the reason is: if no one has asked for eight, why bother computing anything for eight? You could spend a bunch of resources counting the, let's say, million votes there are for eight, but why bother when no one has asked for it? Let's just do that work when someone asks instead, because it could very well be that no one will ever ask, or that they won't ask for months. So currently, when Noria gets a write for a partial entry that it does not have, it just drops that update and does no more work. This is how Noria avoids doing work for keys it doesn't care about.

Which brings us back to this problem. Consider what happens when these writes flow through the union. Ultimately these writes are going to feed into this union, and the union is stateless, right? So it doesn't know whether to drop or forward updates; it has no way to know whether we missed. So it's just going to forward everything to the view. The view down here does not have seven yet, right? The green stuff here is after we've gotten seven. So it gets a write for seven, which it drops. And then let's imagine that it gets the two things from the left, right?
So it gets seven and eight from the left. It's going to drop both of them, because it has no state for them; it doesn't know what the past state is, so it doesn't know what to do with those updates. So those just get dropped. Then, when the upquery response for seven eventually arrives, the state gets filled in. At that point we have the state for seven in the view, so any subsequent write for seven or eight, anything that logically comes after the upquery response, coming in after here, is going to be applied, because at the point when we receive it, we do have the state for seven. We do not miss when we get that write.

Okay, let's pause there again for a second before we continue.

"In the case where B stops working, does it keep processing union requests until it's out of memory?" What do you mean by "stops working"? I have not talked about failures at all yet.

"Could you put an incomplete flag on these updates, so the downstream processors can start their computation?" When you say upstream, do you mean downstream, as in physically down in the drawing? You could, although it's not clear they can do anything until they have the complete result. Often they're computing on full values, and it's not clear they can do anything with a partial response, although in some cases they could.

"What happens if A returns a different value for seven than B?" Well, the semantics of a union are that you get the results from both. It's not a distinct union; it's just a straight-up union, so it's all the records from A and all the records from B, with no deduplication.

"As in, tell me if X matching pattern Z comes in?" Yeah. Great. Let me just sit and think for a few minutes.
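The drop-on-miss behavior for a partial operator can be sketched like this. This is a hypothetical vote-count operator for illustration, not Noria's actual code:

```rust
use std::collections::HashMap;

/// A partially materialized vote-count operator (illustrative only).
/// It holds counts only for keys that someone has actually read.
struct VoteCount {
    counts: HashMap<i64, i64>,
}

impl VoteCount {
    /// Process one incoming vote (a write) for `key`. Returns the update
    /// to forward downstream, or None if the write was dropped.
    fn on_vote(&mut self, key: i64) -> Option<(i64, i64)> {
        match self.counts.get_mut(&key) {
            // We have state for this key: update it and forward the new count.
            Some(count) => {
                *count += 1;
                Some((key, *count))
            }
            // Missing state: nobody has asked for this key, so rather than
            // compute the count from scratch, drop the update entirely.
            None => None,
        }
    }
}
```

A vote for a key the operator knows about produces a new count to forward; a vote for a missing key is silently dropped, and that key's count only gets computed later, if and when an upquery for it arrives.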
That sounds about right. Okay. So hopefully this image makes sense: we drop writes for keys that are missing, but if we get updates for keys that are present, then we apply them. So now you know most of the stuff you need to know about partial and how it works, but now we're going to get to a place where it gets complicated, so I'm going to move to another drawing so this one stays less busy. In theory, if I can find my mouse... there we go. All right, I'm going to try to keep the same color scheme, so yellow are the operators. Okay, so we have an operator here, which is A, and we have an operator here, which is B. It looks more like a D, but it's a B; you'll just have to trust me on that. And we have a union, and then down here we have a view. Great.

So now let's see what happens if... we're going to have, I guess upqueries were something like this color, we're going to have an upquery response from A. Actually, here's how I'm going to draw this to make it a little clearer what's going on; let me change this drawing a little so it's clear what the order of operations is. So let's go back to this. No, nope. Do this. And think of these as being at the same time. This is the single pipeline into the union, right? There are two things that are connected to the same socket on the union, so the union has one channel of input, because the union can only process one thing at a time. And so it chooses, think of it as doing a select over an A-receive and a B-receive, right?
And so it only gets one at a time, and A and B send entirely separately. So now imagine that the order the union receives things in is: it's going to receive an upquery response from A (let's assume these are all for the same key for the time being), then it's going to get a write from B, then it's going to get... actually, let's do this. So the order here, for the union, is: an upquery response from A, then a write for the same key from B, then a write for the same key from A, and then the upquery response from B. And let's see if you start to spot the problem.

"Couldn't the node at the bottom start processing some data from the partial union, like filtering?" In some cases it could; the complexity is probably not worth it. Ultimately you're going to have to do this merging somewhere, and the union is a good place to do it. And remember, the downstream operators are still processing writes and such, so it's not as though the system stops.

"If the upqueries are tagged somehow, could you either drop all the updates before it gets tagged responses back, or start aggregating the values as well?" There are tags here; the tags are generally "which upquery path is this" and "which key is it for". But ultimately the merge just has to happen at some point, and doing it early, while still processing writes, means you avoid the complexity of having downstream operators deal with incomplete data. Computing on incomplete data is complicated, and just not having to deal with that is easier. And because there are concurrent writes anyway that the union is letting through, it doesn't really make much of a difference.

"Do you have full documentation somewhere?" There is no full documentation; most of it is in my head, sadly. There's a lot of detail in the paper.
So from the GitHub repository for Noria, there's a link to the paper from OSDI 2018, and that has a lot more detail about how the system works and why it works the way it does.

"How do you get a1, b1 and a2, b2?" So, this is an important point: every channel here is in order. We assume that if A sends (let's do) a1 and then a2, and B sends b1 and then b2, then the union will receive them in order per sender. Think of each of these as being like TCP. So the union might receive a1, then b1, then b2, then a2; that's a valid thing for it to receive. But it cannot receive, say, a2, a1, b2, b1, because that would not respect in-order delivery on each path. Across paths, it basically does a select over them, sort of a nondeterministic select.

Yeah, so the problem here, as some of you have started to observe, is what the union is going to do. Let me avoid littering here by getting rid of this stuff. The problem here is... let's name these. So this is going to be, I don't want to erase this, this is going to be a1, this is going to be b2 (just having names is easier), this is going to be b1, and this is going to be a2. And then I'm going to go ahead and erase these up here, because they're not terribly important. Right, so the problem we run into is: the union first receives a1. Think of it as constructing its left box and its right box, and it sticks a1 in here, and now it's going to be waiting for b2. Right, so, okay, we've handled this guy. Now we get b1. What does the union do? Should it forward b1? It has to make a decision. It can't skip ahead on the channel to b2, because it has to handle these in order, so it needs to handle b1 first. It also can't choose to handle a2 instead: it did a nondeterministic select over its parents, and it got b1. What does it do?
b1 is for the same key as a1. So it turns out that the right thing for the union to do is to drop b1. How does it know to do this? Well, it's going to look at its pockets on both sides, and it's going to check whether it has a non-empty pocket for the same key as the write it received, where the write is from a different side than the pocket. So here the write is from the B side, and the pocket that's filled is from the A side. What that means is that the union knows: I've gotten a replay, an upquery response, from A, and I will eventually get an upquery response from B. The thing I just got was a write for the same key, which means I still haven't received B's upquery response, and the write happened before that upquery response. So the upquery response must include this write. Or, another way to think about it: b2 must include b1. It might include a bunch of other stuff, but it must include b1, because b1 happened before b2 at B. When B took its snapshot of "what is my current state for this key" (let's say all of this is for key seven, because that's what we've been using), B had already processed b1 at the point when it got the upquery for seven, and therefore b1 must be included in b2, because they're both for seven. Does that make sense? And therefore, dropping b1 is the right thing to do. Actually, in this particular case we could forward b1 as well; it would be fine, because the view would just drop it. But there's no reason to forward b1: we know that no one downstream of us is going to care about it.

Okay, now let's imagine that we get a2. Now what do we do?
Okay, so we've already gotten a1, and by the same argument that says b2 must include b1, a1 must not include a2. At the time the a1 snapshot was taken for key seven, a2 had not yet happened at A: A produced a1 before a2, so when it produced a1, a2 was not present at A, and therefore a1 does not include a2. So dropping a2 is out of the question: if we drop a2, then the state represented in a2 is gone forever, and it never gets represented in the downstream view, which would be bad. It would be as if that record never existed. But we also can't forward a2: if we forward a2, the view is going to drop it, because it doesn't have the state for the key yet, since the upquery has not yet completed. So it turns out that what you need to do here is: if you get a write for a non-empty pocket (in this case, the A pocket has an entry for the key the write is for), then a2 should basically be unioned into the stored thing. So this is now going to contain a1 union a2. And keep in mind, a1 is a snapshot of all the state for seven, and a2 is just an update to that state, so all we have to do is apply that update to the snapshot. Notice that the behavior is different depending on which side we get the update from.

And then finally, when we do eventually get b2, the story is pretty straightforward: b2 fills this slot, and at that point we can produce an upquery response from this whole thing, which includes a1 plus a2 plus b2, but, crucially, not b1. And this will be a consistent snapshot, at some point in time, of the input sequences from A and B.

"All okay, assuming all of this is for the same key currently?" Yeah. Sorry, I hit the mic.

All right, that was a lot to go through, so: questions. "Make a box for b1?" There's no box for b1. How do we distinguish between b1 and b2?
Is there some sort of artificial timestamp? Okay, so there are two questions here. One is: how do we know that b1 is a write, an update, and b2 is an upquery response, given that I said upquery responses are basically treated like normal updates as far as the operators go? The answer is that upquery responses are tagged as such: they still carry records, just like normal updates do, but they also have, think of it as a flag, saying "I am actually an upquery response, for a query, for this key". Apart from that, you can distinguish b1 and b2 by the sequence in which they arrived. But you're right: if all you had was the sequence, you would not know which one was an upquery response and which one was just a normal update.

"Could the pocket not be a list of sorts?" Yeah, so in practice these pockets aren't pockets at all. They're really just hash maps, keyed by the address of the source. The union knows how many parents it has, so for any given update it looks up the source of the packet in the hash map and sees whether there was an entry for that source for that key; and when the length of the hash map equals the number of ancestors, it knows the upquery has completed.

"How is order determined?" I don't know what you mean by "how is order determined".

"Why can't we use b1 instead of b2?" b1 is an update; b2 is a snapshot.
Think of it as: b1 is going to be something like "plus one vote for article 72", whereas b2 is going to be "here are all the votes for article 72". So b1 is a delta, whereas b2 is the complete result set, and so b1 is not sufficient to fill in the state at the downstream nodes; b2 is. b2 is much larger than b1, because b2 includes everything for that key, the upquery key, whereas b1 only includes one particular delta.

"Is it not the same thing on both sides? b2 includes b1?" Yeah, so b2 includes b1, but a1 does not include a2. a2, similarly, is an update, not a snapshot; it only includes one delta, whereas a1 is a complete snapshot of the state from A for the upquery key.

"Writes are updates sent via a channel, which acts as a queue?" Yeah. These yellow lines here are channels if the nodes on either side are on the same machine, and they're TCP streams (with serialization and deserialization) if the parent and child are on different machines.

Okay, so hopefully this problem makes sense, and it also explains why unions have to buffer, and what they have to do in the presence of concurrent updates. You can see how deeply we've transitioned into a very deep part of Noria here. Are there questions about this before we move on, or about the general workings of this area of Noria?

"Reminds me of video codecs." In some sense you could think of it that way, yeah. a1 and b2 are, what are they called, keyframes, whereas b1 and a2 are just delta frames (P-frames). If you come from a video-editing background, that might help.

Yeah, we will eventually get to coding, but first I have to explain what it is we're going to be coding, and what the problem is. Basically, we're about to get to the bug; the next drawing I make is going to be about what the bug is.

All right. How does the union know that it's waiting for b2?
And why does that mechanism not automatically solve the problem? The union knows that it's waiting for b2 (again, assuming all of these are for the same key) because it sees that the left pocket is non-empty. It knows that there's an empty hole in the right pocket: the left pocket for that key is non-empty, and the right pocket for that key is empty, and therefore it knows it's waiting for something from the right-hand side.

"We use b2 because it's a whole thing, not because of consistency guarantees?" Correct. We must use b2 because b1 is not complete: b1 is a delta; b2 is not a delta.

"Do you foresee a lot of issues with the latency of TCP streams?" No. All of this is the write path, and you can do a lot of nice batching here. The latency of TCP doesn't really bother me, because, again, Noria is written for read-heavy applications, and the reads, in the common case, in the stable state of the system, are just reading from the view down at the bottom. They never have to traverse any of these inner edges, so the reads are super fast every time; they're not affected by any of this inner latency.

"Are we only talking about a very small amount of time creating this hash, like just the first read, when writes occur in a short span of time?" What hash are you talking about? There's no hash here.

"So Noria could be used for video streaming?" I don't think that's the takeaway.

All right, so I think we're going to move on to what the actual problem is. If this is the first stream of mine you're catching, you're catching a weird one, I can tell you that straight away. Now we're going to have to talk about sharding. But we're getting there.
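Before we leave unions: the pocket bookkeeping we just walked through, the a1/b1/a2/b2 logic for a single in-flight key with two ancestors, might look roughly like this. All names are mine for illustration, not Noria's:

```rust
use std::collections::HashMap;

/// Which ancestor a message arrived from.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Side { Left, Right }

/// What the union decides to do with a message for the in-flight key.
enum Action {
    /// Drop the write: the still-pending snapshot from that side will
    /// already contain it (the b1 case).
    Drop,
    /// Keep buffering; nothing is forwarded yet.
    Buffer,
    /// Both snapshots present: emit the combined upquery response.
    Emit(Vec<String>),
}

/// The union's pockets for one key while its upquery is in flight.
struct Pockets {
    snapshots: HashMap<Side, Vec<String>>,
}

impl Pockets {
    fn new() -> Self {
        Pockets { snapshots: HashMap::new() }
    }

    /// An upquery response from `side` fills that side's pocket.
    fn on_replay(&mut self, side: Side, rows: Vec<String>) -> Action {
        self.snapshots.insert(side, rows);
        if self.snapshots.len() == 2 {
            // Both ancestors have replied: combine into one response, so
            // the downstream view's state gets filled in all at once.
            let combined: Vec<String> =
                self.snapshots.drain().flat_map(|(_, rows)| rows).collect();
            Action::Emit(combined)
        } else {
            Action::Buffer
        }
    }

    /// A concurrent write for the in-flight key arrives from `side`.
    fn on_write(&mut self, side: Side, rows: Vec<String>) -> Action {
        match self.snapshots.get_mut(&side) {
            // This side's snapshot already arrived, so it cannot contain
            // this write: merge the write into the buffered snapshot (a2).
            Some(snapshot) => {
                snapshot.extend(rows);
                Action::Buffer
            }
            // This side's snapshot is still pending and will include this
            // write; forwarding it too would double-count, so drop it (b1).
            None => Action::Drop,
        }
    }
}
```

Running the stream's exact sequence (replay a1, write b1, write a2, replay b2) yields a single emitted response containing a1, a2, and b2, but not b1. A real implementation would union record multisets with positive and negative records rather than concatenating strings, and would key pockets per upquery tag and key.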
We're getting there. All right, so here's what we've got. Now I'm going to present you with a slightly different problem, which turns out to be similar. Imagine that you have some base table, and you want to shard it. Let's say we have two shards, so this is shard one and this is shard two, and let's say the table is sharded by some column c1. Let's denote the sharding key over here this way. I'm much better at drawing the left ones than the right ones; it's weird. Now imagine that you have some operator down here... actually, I don't really want to define what it is. Let's do this: you have some other table over here. I don't really care what it is; it's B, and it's not sharded. That's fine. Then you're going to have a join down here, and the join is also sharded, so there are two shards of the join, j1 and j2, and these are sharded by whatever the join column is. Let's say that's cj, for the join column. So if you look up by, say, article ID, then it's sharded by article ID; for various relatively uninteresting reasons it makes sense to do this. And then there's going to be some view at the bottom, and the view at the bottom is also going to be sharded, let's say by cv, and let's call the base table's sharding column ca. What I mean by "sharded" here is that we have multiple instances of the same operator, and when a particular operator is sharded by some column, a given shard receives all of the writes for which that column's value, modulo the number of shards, is equal to that shard's number.
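As a quick sketch, that routing rule is just a modulo over the shard-by column (the real system may hash the column value first; all names here are illustrative):

```rust
/// Route a record to a shard based on the value in the shard-by column.
fn shard_of(record: &[i64], shard_col: usize, num_shards: usize) -> usize {
    (record[shard_col].rem_euclid(num_shards as i64)) as usize
}

/// A sharder: split one incoming batch into one outgoing batch per shard.
fn shard_batch(batch: Vec<Vec<i64>>, shard_col: usize, num_shards: usize) -> Vec<Vec<Vec<i64>>> {
    let mut out = vec![Vec::new(); num_shards];
    for record in batch {
        let shard = shard_of(&record, shard_col, num_shards);
        out[shard].push(record);
    }
    out
}
```

So with two shards, a record whose shard-by column holds 2 goes to shard 0, and one holding 1 goes to shard 1, exactly as in the worked example that follows.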
So, for example, let's say we have some new record up here, and the value for ca — it's going to be this field — is two. Then what we're going to do is take two modulo the number of shards, in this case two, which is going to give us zero, and zero means the zeroth shard — I guess I really should have numbered these from zero. And so this entire record is going to end up going to this shard, as opposed to this shard. If we got another record where ca was one, then we'd do one modulo two, get one, and it would go to the other shard. This is roughly how sharding is going to work. And then whenever you have connections between operators, the connection logically looks something like this — notice that I'm not drawing them all the way to the boxes. This one, I guess, I can draw all the way to the box. And then you can think of there as being a sharder operator here that does this modulo computation and then sends the updates to the corresponding shard. I'll draw these as this little ball that has multiple children — right, so it's going to be this little ball that we're going to call a sharder. And so imagine that all of the updates that arrive on the path from b, and all of the updates that arrive from either a1 or a2, are all going to flow together, and then they're going to go through this sharder and be sent to the appropriate shard of the downstream operator. Of course, you can avoid this if, let's say here, cj is equal to cv — then this is much easier, right?
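The routing decision described above can be sketched in a few lines. This is a hypothetical illustration, not Noria's actual API — the function name and types are made up for the example:

```rust
// Hypothetical sketch of hash/modulo sharding as described above.
// A record whose sharding column has value `key_value` is routed to
// shard `key_value % num_shards`.
fn shard_for(key_value: u64, num_shards: u64) -> u64 {
    key_value % num_shards
}

fn main() {
    // ca = 2 with two shards: 2 % 2 = 0, so the record goes to shard 0.
    assert_eq!(shard_for(2, 2), 0);
    // ca = 1 with two shards: 1 % 2 = 1, so it goes to the other shard.
    assert_eq!(shard_for(1, 2), 1);
}
```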
Then what you could really do is: you don't need to do this extra sharding; you can just have j1 connect directly to v1, and j2 connect directly to v2. You don't need to reshard. But if you do need to reshard — so the picture we're drawing on the left here is if ca is different from cj, which is different from cv. So this is ca not equal to cj, and this is cj not equal to cv. Okay, so that helps. But if we're going to have this one sharder that all of the updates from above come to, then we need to make sure that all of the updates go into it. And so we actually need to combine the updates from multiple shards, both here and here, in order for them to go to the sharder. So we need some kind of thingamajig here. We're going to draw that with a triangle, just for fun, because it looks like a triangle. So we have this triangle here, and what I'm going to call this triangle is a shard merger, and I'm going to call this a sharder. All right, so the shard merger merges the output of different shards, and the sharder shards all the inputs it gets. So b, for example — b in the top left here — b is not sharded, but the output stream of b needs to be sharded, because j is sharded. And so there's going to be a sharder there, but not a shard merger, because there's only one shard, so to speak, of b, so it doesn't need to be merged. And it should be fairly obvious at this point that a shard merger is really just a union, because it's taking multiple inputs and just combining them. And whenever we have a union, lo and behold, we need to do this buffering — or in other words, we need to implement this pocket system that we've been talking about. Does this model of sharding make sense before we proceed? The bug we're fixing will appear on this image, I promise. "Income sharding."
Yeah, you're not wrong. "It seems to me Jon is getting better at drawing and writing on his thingy." I mean, maybe. I think it's possible to maybe understand what this drawing is supposed to be. It's unclear. "It's like the aggregator pattern, sort of." So while you think about questions about this layout, one thing that's worth observing here is that you might think there's a bottleneck here, right? There's an obvious bottleneck where, if this connection is not sharded, and the sharder is not sharded, and this is not sharded, and this is not sharded — it seems bad. And in fact this guy isn't even sharded, and this guy isn't sharded. Then are we really getting any benefits from sharding? Do we actually expect to see a speedup if people enable sharding? It's a good point. This is one of the things we'd like to get rid of in Noria if we could. Basically, this happens whenever the sharding key changes and you have to do what we call a shuffle. We call this combination of a shard merger and a sharder — right, I can draw this here somewhere, if I can find my cursor, because it's really small — this plus this is equal to a shuffle. So we don't have to do a shuffle between b and the j's, because there's no sharding of b. But we do have to do a shuffle between a and j, because their sharding keys differ, and we have to do a shuffle between j and v, because their sharding keys differ. Currently these shuffles are not sharded in Noria, and we would like to have sharded shuffles to basically avoid this bottleneck. But keep in mind that this bottleneck is entirely stateless — both the shard merger and the sharder are stateless and relatively quick operations; they're really just doing routing. And so, first of all, you could imagine doing this as a very low-level network-type function. But also, most of the compute, which is usually where the
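The "when do we need a shuffle?" rule from this discussion can be written down compactly. This is a hypothetical sketch with made-up names; it just encodes the rule that a shuffle (shard merger plus sharder) is needed when a sharded parent's key differs from the child's, while an unsharded parent like b only needs a plain sharder:

```rust
// Sketch of the shuffle rule discussed above: a shuffle (shard merger
// followed by a sharder) is only needed when the parent is itself
// sharded and its sharding key differs from the child's. Names and
// types are illustrative, not Noria's real representation.
fn needs_shuffle(parent_key: Option<&str>, child_key: Option<&str>) -> bool {
    match parent_key {
        // Unsharded parent: at most a sharder is needed, never a merger.
        None => false,
        // Sharded parent: reshard only if the key actually changes.
        Some(p) => child_key != Some(p),
    }
}

fn main() {
    assert!(needs_shuffle(Some("ca"), Some("cj"))); // a -> j: shuffle
    assert!(needs_shuffle(Some("cj"), Some("cv"))); // j -> v: shuffle
    assert!(!needs_shuffle(None, Some("cj")));      // b -> j: just a sharder
    assert!(!needs_shuffle(Some("cj"), Some("cj"))); // same key: direct edges
}
```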
resources of the system are bound up, is still sharded. This is mostly just the wiring, and so we expect this wouldn't matter too much, because you still get to shard the compute and, crucially, the storage. "Which machine does the sharder run on? Does it matter?" So keep in mind that sharding does not have to be for multi-machine; sharding also matters for multi-core. You could imagine that if you have many cores, you still want to shard operators so that you can process more join operations in parallel for the same join. Normally in Noria, every operator is only ever processed by a single thread at a time. Think of it as: every operator is a future, and that future itself will only ever run on one thread. Even though there's multiplexing of which thread runs which future, there are never two threads running one future at the same time. And so you might want to shard an operator even in the single-machine case, in order to be able to take advantage of more cores. This is one of the things that's nice about this dataflow model: it's pretty easy to map it onto both multi-core and multi-machine. As far as where the sharders and shard mergers need to run: they can run anywhere. It is nice for the shard merger to run relatively close to the parents and the sharder to run relatively close to the children, but it's not required. So yeah, it could be anywhere, really. "Shard sounds like a made-up word at this point." Yeah, you're not wrong. I have said "sharding" a lot. "Up query" is a similar one, where it just sounds weird to me now. "Sharding reminds me of when I used the CRUSH algorithm in order to know what data went where." Yeah, so that's another question here, which is: how does the sharder decide where to send each piece of data? The scheme I've given you here —
it's just simple hash partitioning, where you hash the value, take it modulo the number of shards, and send it there. There are improvements we want to make here too, where you'd really like something that's a little bit more dynamic — both so that you can add and remove shards, but also you really want something more like value-range partitioning, where you could say: if one key is particularly hot, you might want that key to go to just that machine and no other key to go there, in order to satisfy that load. That's the kind of stuff we're looking at adding but have not added yet. "What's the difference between sharding and multi-threading?" So sharding just lets you have more operators to handle the same work. Sharding in and of itself — you can really think of it as a slowdown, right? It's adding more operators to do the same work. But it enables better multi-threading, or more multi-threading, because you can run those additional operators in parallel. So sharding enables more multi-threading in your dataflow. It also enables more machines to be used, in order to take on the load of the system. "Looks like the sharder can be easily sharded, but the merger can't be sharded because it must do buffering." So this is actually one thing I'm not sure about. I think you might be able to shard the merger as well, but I don't know yet — this is one of the research questions, right? And the problem we're going to solve today is somewhat related to that, although not quite the same. "What key do I press to cycle through the autocomplete?" Ctrl-N. "How's the shard merger stateless?"
The shard merger is stateless in the sense that the only thing it needs to keep track of is these pockets for replays — for up queries, sorry. It needs to buffer an up-query response only until it gets a response from the other shards, or the other ancestors. But beyond that it's stateless: it doesn't keep persistent state per key; it only keeps relatively ephemeral state for up queries. In the current system the number of shards is fixed, which is another thing — it's sort of related to, but not quite the same as, wanting range partitioning. All right, so now let's get to what the problem is. And we're only going to solve one part of this problem today, because the other part is much more complicated. So let me bring up our up-query color here — this color. All right. Imagine — how are we going to do this? — imagine that an up query comes in for seven. Seven is our enemy here. And it goes to view two, because — so this is where cv is going to be both the column the view is sharded on and the lookup column of the view. Right, so if you want to query the view on column value seven, you only have to ask one shard. This speeds up reads a lot, because if you had to ask all shards it would be kind of unfortunate, although there are cases where you want to do that. So your query for seven goes to view two, because seven mod two is one, which is shard two, because I didn't zero-index these indices, which is annoying, but it's fine. So a query comes in for seven to v2, and v2 misses. So v2 needs to up-query. All right, so v2 has to send an up query. And in fact, I'm going to make a slight adjustment here to make this work out: I'm going to say that this is actually a right join. I should have drawn this the other way around so it was a left join, but it's fine.
And you'll see the reason for this in a bit. So this up query needs to go to both shards of j. And to see why: the lookup key here is seven — we want everything where cv is seven; that's the query we're sending up here. Well, we don't know — because of this, the fact that cj and cv are different — we don't know whether cv equals seven is at j1 or j2. In fact, there might be some records on j1 and some on j2, because j is sharded by an entirely different column. And so, unless we knew something else about the data, we have to ask both shards. All right, so j1 and j2 both get an up query for where cv is seven. This is all well and good. Now imagine that they both miss. Okay, so now they both have to up-query. I'm going to draw these in slightly different shades, because that might be useful later — one is going to be greener, one is going to be bluer. So j2 has to issue an up query, and it issues an up query for cv equals seven. But because of this thing, we don't know whether it's at a1 or a2. Right, so this up query needs to go to both a1 and a2. So far so good. But now — and this is the kicker — the up query we made to j1 has the same problem. It's looking for cv equals seven, and because of this, it needs to up-query both shards of a. See what the problem is? a1 and a2 both receive an up query for the same key, twice. This means they're going to send two responses for the same key at the same time. So what happens to our poor, poor union over here? Let's see — I guess red is writes.
So we're going to go with green — some kind of bright green, maybe. Yeah, that's good. So our little shard merger here — I like to draw them like this, with a little dot in the center, to indicate this is a shard merger, which is a special form of a union. So this union is going to receive a1, a1, a2, and a2, all of which are up-query responses. And if we think about what that looks like for a second: it conceptually has a left pocket and a right pocket, and remember, it has one of these for each value — I should say key, not value. It has one set of these pockets for each key. But the key here is the same for all of these: all of these have key cv equals seven. So when a1 arrives, that's great. But what if the input to this is such that the order is a1, a1, a2, a2? Okay, we receive the first a1, and then we receive the second a1. What do we do with this guy? What do we do when we receive this? And what do we do when we receive this? It's not clear.
Okay, so let's discuss the problem, and then I will explain how we're going to fix this particular problem. You might also observe that there's another huge problem here that I've sort of just ignored, which is — this is definitely bad and sad. You'll notice that this one innocent up query here had to ask n shards of j, each of which — so that's a multiply — has to ask n shards of a. And this, with some quick maths, is n squared. This is not great, right? A single query for a key led to n squared up queries. And this means that if you imagine you have many shards, because you wanted a fast system or you have a lot of data, you're doing a lot of up queries — and in theory this shouldn't be necessary. There's no reason for both of these up queries to happen, because they're asking for the same data. So in theory there might be a way for us to make only one of them happen. But we need to make sure that it does happen if either j1 or j2 needs the data. Imagine that j1 does not need the data and j2 does need the data — then we still need the up query to happen, exactly once. This purple problem, this problem down here — this is research. I have some ideas for how we might want to do that, and we're basically not going to be talking about it today. I'll answer some questions, but we're not going to be dealing with this problem today, because I don't know how to solve it — maybe if we find some time at the end, although it seems relatively unlikely. Whereas this problem, I think I might have a solution to. So: bug. See, I told you it was going to be on this drawing. All right, let's discuss. Oh, tomtom, hi! Stay safe, you too, man. It is a big mug. I love this mug. It's as big as my face. It's filled with tea. "This is an unsafe undefined-behavior-macro-only 1st of April stream?" I wish. No, this is my day today, just dealing with this stuff. "Quite complicated to summarize."
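The fan-out arithmetic above is worth writing down explicitly. A tiny sketch (function name made up for illustration): one miss at the view fans out to n shards of j, and each missing j shard fans out to all n shards of a:

```rust
// Illustration of the up-query explosion described above: the view
// must ask every shard of j, and each j shard that misses must ask
// every shard of a, giving n * n up queries in the worst case.
fn worst_case_upqueries_to_a(n_shards: usize) -> usize {
    let to_j = n_shards; // view -> all shards of j
    to_j * n_shards      // each j shard -> all shards of a
}

fn main() {
    // The two-way-sharded drawing: 2 * 2 = 4 up queries hit the a's.
    assert_eq!(worst_case_upqueries_to_a(2), 4);
    // With 100-way sharding, a single innocent read miss could trigger
    // 10,000 up queries — which is why this is called an explosion.
    assert_eq!(worst_case_upqueries_to_a(100), 10_000);
}
```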
Yes, I agree. This is one of the reasons why doing a stream on this is something I've been hesitant to do: I realized all of this stuff needs to be explained, and all of it is fairly technical. The fact that it seems like people have been able to follow is a great sign, and better than I expected, because this is some complicated stuff. If you think about all the explanation we had to do in order to get to what this bug is, we're very deep down the rabbit hole, right? And so summarizing at this point would basically take as much time as it took us to get here. "Add counters for both a1 and a2 and only propagate when the counters are equal." "Do you combine a1 and a2?" So remember, one thing to keep in mind here, just to complicate the picture a little bit more: these are writes — there may be writes in between here. Right, there might be a bunch of writes here. And more importantly — let's do this in, like, white — this is a1 and a1-prime, and this is a2 and a2-prime. a1 is not necessarily equal to a1-prime, because there might have been writes in between. a2 is not necessarily equal to a2-prime, because there might have been writes in between. So what do you do? It might be that there are ways to sort of coalesce these — like, you only keep the last one — but how do you even know which the last one is? So again, remember that we're in this weird position where the shard merger does not know how many up queries there were, and so it wouldn't know how many responses it needs to wait for. If j1 happened to have the records it needed — if it did not miss for cv equals seven — then there's only one up query, in which case we don't want to wait for a second one, because there is only one; we would wait forever. Again, I have a proposal for how we fix this bug. I just want to talk through what the bug even is first. "n²?" Yeah, n squared would limit your cluster size.
Yep. "Can we try to shard by the same key, or does that become complicated?" Assume that the sharding of these views cannot be changed — or, if you were to change them, it would lead to significant inefficiencies. Basically, the way to think about this in the context of this particular graph: a is a base table that's sharded. a needs to be sharded by its primary key, because otherwise doing primary-key lookups, which you do a lot of, would be extremely inefficient. So a needs to be sharded by ca. j basically needs to be sharded by the join key. If it's not sharded by the join key, then every join lookup has to talk to every shard of the join, which would be really bad, because most of what joins do is lookups on the join column. So those two are just set, and the view is sharded by the lookup key of the view, which is what you want to shard by, because otherwise every read has to talk to every shard of the view. And so this means that all of the shardings here are basically set. If you change them, you introduce enormous inefficiencies to the read or the write path, and so changing them is not really an option. "Context is key here." For sure. "If you solve the first problem of redundant up queries, wouldn't it also help the second problem, by reducing the number of up queries to n queries of j plus n queries of a?" So, 2n — yes. There is an argument here that if we solve the up-query explosion problem, which is what we refer to it as internally, that would solve this particular shard-merging problem. It turns out that even if we somehow found a way to solve that performance problem, there are still other query graphs you can come up with where a union might have to deal with this case. It's a little bit too complicated to try to work through what those cases actually are live, but just trust me that there are other cases. So even if you solve the up-query explosion, you still need to solve
this union problem. Think of it this way: this union problem is a correctness problem — it's a correctness bug — whereas the up-query explosion is a performance problem that also happens to reveal a particular correctness bug. "Can you buffer and wait with the same pocket concept in the sharder as you do in the merger for up queries?" So this is a good thought. The idea being proposed is basically: what if the up queries, rather than going directly to a, went via the shard merger, and the shard merger combined the up queries and then they followed the path? The thing is, you don't really want to do that. There are a couple of reasons. One reason is that now the shard merger has to do a lot more work, and the other is that now your path for up queries is a lot more complex, because everyone tries to talk to the shard merger and then the shard merger tries to talk to everyone — which means you have this fan-in and fan-out problem, which leads to a bunch of performance bottlenecks, essentially. The nice thing about having up queries go directly to the source is this: the up queries do not actually follow the edges of the dataflow graph; they go directly to the source and bypass any operators in between. Only the responses flow inline in the dataflow graph. And that's really nice, because it means your up query has to go through fewer hops, and it also means that you can use the network more efficiently. But it would allow the shard merger to de-duplicate — so that would be the upside. But as I mentioned, that still would not solve the other places where this union bug comes up. "You only keep the latest snapshot, but you keep track of how many snapshots you've seen." How do you know when you've seen the last one? How do you know which one is the latest, if you don't know how many up queries there are?
There might be some point where you've received an equal number of up-query responses from both sides, but there are still more coming — and then you would still produce multiple up-query responses. So a sort of non-solution to this problem is that we produce one up-query response for every up query. Because what's going to happen then is: j1 is still going to receive two up-query responses, and j2 is also going to receive two, which is not really what they're expecting. Yeah, so the up queries go directly to the shards. "What if the shard that has cv = 7 sends an update to the shard merger telling it that it found the record?" Yeah, so you could have — if the up queries went through the sharder, then the sharder could tell the shard merger how many up-query responses to wait for. That is one possible solution, but it still requires all the up queries to go through one path. Yeah, so there's a good question being raised: what exactly is the right behavior here? And it's not entirely clear. When you have these up-query responses, the union can't really distinguish a1 and a1-prime; it also can't distinguish a2 and a2-prime, except by virtue of time. And so it doesn't really know how to connect these; it doesn't know how to resolve this. That is part of the problem, and that observation is going to be part of the solution. "Can you hash the pocket key with value and up-query IDs?" So now we're getting pretty close to the solution I have in mind. So I think I'm just going to go ahead and give you my proposed solution, and then I'm going to go pee, and you can discuss it while you think about it. All right, so my proposed solution is as follows. I'm going to use — what color do I want to use? Pink? Pink is great. Let's go with pink — uh, red, green.
Let's go with yellow — a bright yellow. Beautiful. That kind of bright yellow. All right, so my proposed solution here is as follows: rather than have the pockets be per key, we're going to stop doing it per key. We're going to have the pockets be per tag — and I'll get back to what a tag is — and requesting shard. This is my proposed solution. A tag is a particular up-query path. So j1's up-query path here and j2's up-query path share the same tag, and so for the purposes of the examples we've talked through so far, you can assume that the tag is basically not there. The tag needs to be there in order to deal with cases where you have many different up-query paths, all of which are different, but some touch the same nodes. But for the purposes of explanation here, I'm basically proposing that instead of keying this by key, we're going to be keying it by — sorry, and the key, obviously: tag, requesting shard, comma, key. So instead of keying the pockets just by key, we're going to be keying them by which shard made the request and which key it's for. This also means that the up-query responses have to include this information — it has to be included in there in addition to the records and the key. This essentially means that the union buffers completely separately for the up queries that j1 does and the up queries that j2 does. The other reason why this is important, and why I'm proposing this scheme, is because ultimately, when the up-query response makes its way down here and eventually reaches the sharder, the sharder can now inspect this information in order to decide whether the response should go to the left or to the right shard. Now, you'll observe this has not fixed the problem of there being multiple identical up queries happening at the same time. It doesn't avoid the performance problem, but it fixes the correctness issue.
It basically allows us to distinguish a1 and a1-prime, and a2 and a2-prime, by having them just be completely different — they exist in, like, parallel universes. So the fact that a1 is in a pocket only affects a2 and updates if they are up-query responses for the same shard. And then the change we're going to make to handling writes is that a write has to look at all pockets for all — let me draw that, actually. So if you get an update, like this guy or this guy or this guy or this guy, then it's going to check the pockets for all requesting shards for that key, and it's going to have to update anything that's in the pockets, using the same rules that we had previously for unions. All right, so while you think about that, I'm going to go pee, and I'll be right back. All right, what do we think of this solution? "We consider adding a phantom reference to the prior shard as a means of preventing this multi-lookup." I don't see how that helps. "Rollins got game." I think you're in the wrong channel. Yeah, exactly — so the observation here is that the performance problem is definitely a problem, but the correctness problem is a correctness problem and needs to be fixed first, especially because this particular issue can come up in other contexts too. "In essence, both a1 plus a2 and a1-prime plus a2-prime will flow downstream completely separately?" Yeah, that's the idea — well, not quite. The plus — the union of them. So the union is going to completely separately handle a1 plus a2 and a1-prime plus a2-prime, where the primes are the ones that were requested by j2 and the non-primes are the ones requested by j1. "The tag includes the path of the up query."
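The proposed change to the pocket keying can be sketched as a data-structure change. This is an illustrative sketch only — the types and field names are made up and deliberately simplified relative to Noria's real union implementation:

```rust
use std::collections::HashMap;

// Illustrative sketch of the proposed buffering change. Before, a
// union's replay pockets were keyed by Key alone, so responses for the
// same key requested by different downstream shards collided. After,
// they are keyed by (tag, requesting shard, key), so a1/a1' and
// a2/a2' live in separate "parallel universes".
type Tag = u32;        // identifies a particular up-query path
type ShardIdx = usize; // which downstream shard issued the up query
type Key = u64;        // the key being replayed, e.g. cv = 7

#[derive(Default)]
struct Pockets {
    left: Vec<u64>,  // buffered records from the left ancestor
    right: Vec<u64>, // buffered records from the right ancestor
}

type Buffer = HashMap<(Tag, ShardIdx, Key), Pockets>;

fn main() {
    let mut buf: Buffer = HashMap::new();
    // a1 (requested by j1) and a1' (requested by j2) are both for key 7,
    // but now land in distinct pockets instead of being confused.
    buf.entry((0, 0, 7)).or_default().left.push(101);
    buf.entry((0, 1, 7)).or_default().left.push(102);
    assert_eq!(buf.len(), 2); // two separate pocket sets for the same key
}
```

A write for key 7 would then have to visit every pocket entry whose key component is 7, across all requesting shards, applying the usual union buffering rules — which matches the handling of writes described above.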
"Couldn't a1 and a2 inspect the tag to see if the results are going to flow to the same place downstream, and that way throw out redundant queries?" So the problem with that approach — the observation here is: a1 sees that it gets two up queries, and a2 sees that it gets two up queries. Why doesn't each just answer the first one? And the problem here is: when a1 gets its first up query, how does it decide whether or not to wait? And if it decides to wait, how long does it wait for? That's the problem you run into. We could have it wait and then sort of deduplicate, but that gets complicated. You basically need to introduce artificial latency for it to wait, and even then it could go wrong. And crucially, you could end up in a really bad place where, for a1, the first up-query request arrives, and then enough time passes that it decides to respond, and then the second one arrives, and so it sends two responses — but at a2, the two up-query requests arrive at almost the same time, so it only sends one response and deduplicates them. And now the union is going to get two a1s and one a2, which is going to be bad. "The better way is to not emit two queries." I agree, but again, it does not solve the problem. There are other graphs that do not have the up-query explosion problem, but still have the "root gets multiple queries for the same key" problem. Basically, what this is revealing is that the union pockets cannot just be per key — that is insufficient. "Deciding on which queries you should respond to or not seems a lot harder." Yeah. The other thing that's nice here, with this particular solution, is that a1 and a2 do not know anything about this. If they get an up query for a key, they respond to the up query for that key.
That's all they do. And so that basically solves this particular problem — or rather, the receivers of the up query do not need to know anything about deduplication, or looking for things that are the same, or how many queries they're going to get. They just get a query and respond to the query. And then we can tackle this problem of the up-query explosion separately, by doing research, which is like magic. All right, so now that we understand what the problem is — "Why don't j1 and j2 talk to each other to avoid sending duplicate queries?" That's a great question. One problem with that is: here I've drawn everything with two-way sharding, but if you have a particularly large system and you have a lot of load, you might have 100-way sharding. You really don't want 100 nodes to have to coordinate to answer any up query. I mean, it could be that that's part of the solution to the up-query explosion problem, but it seems pretty costly. Right now you'd be running something like consensus between 100 nodes in order to satisfy one up query. At that point the explosion might even be better, because you don't need the coordination.
It's not clear. But this is why I say this is just research — it's separate from the bug that we have to fix. "Less complexity?" Yep, this also has much less complexity: we just change the union pocket bucketing, basically. "The union could eventually try to not send the same response downstream with this approach, by sending the last one to everyone for a given time window." You're totally right that the union could eventually become smarter about this and be like: if I have two a's in pockets for the same tag and the same key but different requesting shards, and then I get the things that would fill the other pocket for both of them, then I can just send one and say that it should go to both, or something. There might be some optimization we can make there later, but that is definitely an optimization, and not something that's necessary. I totally agree with you that if you have 100 nodes, then 100 duplicate up queries — basically, n squared — is really bad. The coordination would also be really sad: if you had to run Paxos over 100 nodes for every up query, that's also really bad. I actually have some ideas for how we might solve this that do not require coordination or an up-query explosion, but that's still very much in the research phase, and not worth trying to dig through now, because it's mostly a lot of me brainstorming rather than actually trying to solve something — which is the plan for right now. All right, so let's finally, finally jump into some code. Oh no — here's the code. All right, so welcome to the Noria code base.
Noria — actually, let me give you an ls here. Let's remove all these guys, because they're not important — remove some log files and stuff. Okay, so this is the root of the Noria repository. You'll see that there's a folder called applications, which is a bunch of different benchmarks and stuff — I don't know why there's still a noria-benchmarks; that should go away. So applications has example applications and benchmarks. For example, the graph I showed you earlier of article and vote — that's actually one of the Noria benchmarks. Similarly, I mentioned that we have all the queries for the Lobsters website, and the application for that is also in this directory. Then there's the orchestration folder, which contains things like being able to run all the Noria benchmarks distributed on EC2, that sort of stuff. The noria-benchmarks folder should have gone away; it shouldn't be there — ignore it. And then there's noria and server. So noria is the client side of Noria: it includes basically the API bindings, similar to the MySQL client library or the Postgres client library or whatever. It has all of the code necessary to interface with the Noria server. And then server has the implementation of all the server-side stuff, which is the kind of stuff we've been talking about. That's where all of the dataflow stuff lives, all of the SQL-to-dataflow translation, and all the other stuff for routing requests and migrating the dataflow from representing one set of SQL queries to a new set of SQL queries, that sort of thing. All right, so let's go into server, because that's where most of this will matter. Once we're in server, we have a couple of different directories.
These are essentially sub-crates, because building Noria already takes a while, and being able to not compile all of it each time turned out to be useful. `common` mostly has shared traits and types; it's not that interesting. `dataflow` contains all of the dataflow engine of Noria — basically the stuff we've talked about so far in the stream. `mir` is the middle intermediate representation for Noria, which is what we call parsed SQL before it's turned into dataflow. So this is the layer that takes all your SQL queries — all the prepared statements we get from the user — parses them, and produces basically a query graph that it then does optimizations on; `mir` is the thing that does those optimizations at the SQL-operator layer and then produces the resulting dataflow. And then `src` contains all the server-side glue code: the HTTP server for handling controller requests, things like coordination between different Noria server instances, the thing that drives all of the futures that represent different Noria operators, and of course a bunch of tests. In this particular case I'm going to bring us into a file called `payload` — let me make this a little bit bigger. So `payload` includes basically all the stuff that gets sent over the network, primarily things that get sent between different Noria instances and between different operators. The primary one of these is `Packet`. So `Packet` is just an enum of the different packets you can send. I won't go through this in too much detail, but the ones that you need to be aware of are `Message` — a `Message` is a normal update.
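To make the shape of this concrete, here's a heavily simplified sketch of what a packet enum like this could look like. Everything here — the type names, the fields, `u32` for tags — is an illustrative stand-in, not Noria's actual definitions:

```rust
// Illustrative stand-ins for Noria's real addressing and row types.
type NodeIndex = usize;
type Record = Vec<i64>;

/// Source and destination addresses carried by every packet.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Link {
    pub src: NodeIndex,
    pub dst: NodeIndex,
}

/// Sketch of the two packet variants discussed here.
pub enum Packet {
    /// A normal forward update: a batch of row deltas.
    Message { link: Link, data: Vec<Record> },
    /// One piece of an upquery (replay) response, identified by its path tag.
    ReplayPiece { link: Link, tag: u32, data: Vec<Record> },
}

impl Packet {
    /// Replay pieces are part of an upquery response; messages are not.
    pub fn is_replay(&self) -> bool {
        matches!(self, Packet::ReplayPiece { .. })
    }
}
```

The point of the enum shape is that one channel can carry both forward updates and upquery traffic, and the receiving domain matches on the variant to decide how to process it.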
The `link` here has the source and destination address, and `data` is basically just a vector of rows — or deltas, rather. And then `ReplayPiece`: a replay piece is a part of an upquery replay. The reason it's called a piece is because, as you know, sometimes they need to be assembled; there are some other reasons too, but those are out of scope here. So you'll see that every replay piece also has a `link` — a from and to address for the upquery response. It has a `tag`: this is the thing I mentioned, where a tag is the way that we distinguish different upquery paths that happen to intersect on an edge. You might imagine multiple upquery paths that happen to cross the same edge or the same operator but are not the same upquery path, and so we need some way to tell them apart — that's what a tag is. `data` is the delta contained in the replay. And then it also has this replay piece context. In the replay piece context — you can ignore the regular case — the partial case includes information about which keys this replay includes data for. So this is basically like "c_v = 7", right? So it contains seven. `unishard` is a field we're not going to be talking about too much, but `unishard` basically means: did this upquery originally go to only one shard of the parent? As opposed to the drawing we've looked at, where the upquery went to all the shards of the parent operator — there `unishard` would be set to false. Sometimes — if A, for example, was sharded by c_v — then only one of the A's would be queried, and `unishard` would be set to true. And so you'll observe that a union only needs to buffer if `unishard` is false: that is, it only needs to buffer if the original upquery went to all the shards. So that's what `unishard` is for, and `ignore` you can ignore. Let's see if that — yeah, great. So that's most of the stuff that we need for this, and now we're going to switch to `union`. Actually, let me pull up —
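The unishard rule just described can be sketched like this — the names are my own approximation of what's being described, not necessarily Noria's exact fields:

```rust
use std::collections::HashSet;

// Illustrative stand-in: a (possibly compound) key is a vector of values.
type Key = Vec<i64>;

/// Sketch of the replay-piece context described above.
pub enum ReplayPieceContext {
    /// Partial replay: the keys this piece carries data for, and whether the
    /// original upquery went to only ONE shard of the parent (`unishard`).
    Partial { for_keys: HashSet<Key>, unishard: bool },
    /// Full replay; not relevant to the bug at hand.
    Regular { last: bool },
}

/// The rule stated above: a union only needs to buffer pieces when the
/// upquery fanned out to *all* shards of the parent.
pub fn union_must_buffer(ctx: &ReplayPieceContext) -> bool {
    match ctx {
        ReplayPieceContext::Partial { unishard, .. } => !unishard,
        ReplayPieceContext::Regular { .. } => false,
    }
}
```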
I actually wrote a test for this, because it's useful to have a failing test that we can try to fix. So this `build` just starts Noria in the testing context; it starts it with 16 shards — you can ignore the other parameters. And I have documentation here which basically explains the stuff that I've explained so far in the drawing: we're going to have a base x and a base y, both sharded by a; we're going to have a join that's sharded by some column b, and a reader that's sharded by some column c; and it talks about the problems that arise in this particular context. And then it sets up that graph. So this is not using the SQL interface to Noria — this is using the low-level dataflow interface. We add a base table, base x, that has three columns, because we're going to need three columns in order to shard by three different columns. They're called base col, join col, and reader col, and it's going to have a primary key on column zero. So base col is going to be the primary key, which is going to make Noria shard base x by the zeroth column. This basically sets c_a, right — this is basically a in our drawing; it's going to set c_a to be this column. And then we have some other base, and that other base is sharded too, but it's not really important. Then we have a join, and the join is going to join the two base tables, just like we saw. It's going to be a left join between x and y.
So x here is on the left side rather than the right side as in the drawing, and it's going to join on the first column (zero-indexed) of x — this is going to be the join column, so the join is going to end up being sharded by that column. This is how we get a sharding of the join that's different from the sharding of the base table. And finally the reader — which is what we call a view node internally in Noria — is going to have lookups happen on the second column (zero-indexed), which is going to be reader col, and Noria is going to shard this view by the lookup column, like we discussed. Therefore this view is going to end up being sharded by the third column. So what this produces is a graph that looks the same as the one we had in the drawing, and it's going to have the same problems we talked about. And then what the test does is a bunch of writes. It tries to make sure that the writes get spread across all of the shards of the base table, and that the writes hit every shard of the join. So you'll notice there are lots of different values of the first column — which is what the base is sharded by — so we should spread across all of those shards. For the join column we use — actually, this could probably be `i` as well, to be honest; it probably wouldn't have mattered — but this is going to spread over the number of shards in the join-column dimension. And then I make sure that there's only one value that appears in the column we're going to do lookups on, so if we do one lookup on key one, that's going to hit all of the records that we've inserted. We basically end up with a maximal upquery, where really all of the shards have data for this single upquery. This maximizes the chance, if you will, that we end up in this weird case — and notice that it is a matter of probability, because the union might never detect that this happens. It might be that the upquery responses arrive so that we get a one,
and then we get a two — so the union goes "oh, I have both" and sends it — and then we get a one-prime and then a two-prime, and the union goes "oh, I'm done" and sends it. So the union might never detect anything bad. But in order to try to make sure that it does happen in this particular test, I basically have everything produce lots of rows, so that that outcome is relatively unlikely. And then we do a single read. This read is just going to do a lookup by the shared value that all the records have, and then it checks that the result is correct. And if I run this test — I tried this earlier; let's see if it still works... Nice. All right, so this crashed, perhaps unsurprisingly, and it says "not implemented: detected chained union". This is an old phrase for this bug, from before we really understood what was causing it. But if we look at the implementation of union, we see that that's this `unimplemented!` over here. And if we squint at the code just above it, you see that really what it does is: it looks at all the keys that the replay was for, and then it has this `buffered`, and if `buffered` already contains an entry — so these are the pockets, essentially — for this source, for this key, then it goes "I don't know what to do". So basically it panics when it detects the case that we drew earlier: it receives an a1 and then it receives an a1-prime — that's when you're going to hit this `unimplemented!`; that's this panic. And so this test indeed reproduces the problem that we had. And if we want to, we could — I think I already have this somewhere, but let me copy it anyway. Let's write out `graph.dot`, and then we're going to run `xdot graph.dot` and see what that gives us, maybe.
Oh, there we go. Yeah, so here — this basically shows us what the graph layout of the dataflow is. You see base y, which is the thing we join with; we have base x, which has these three columns, and it says that it's sharded by base col, 16 ways. This little full circle in the top right indicates that it's a full materialization. Egress you can basically ignore — it's sort of a sender node that lets you cross boundaries; ignore it. Ingress, same thing — you can ignore them. And then you see we have this shard merger — this is the icon I drew earlier, right? You see it says it's desharded to avoid a sharded shuffle. I mentioned how we don't have sharded shuffles, so you see this operator is not sharded. And then this is a sharder. So this is a shard merger, this is a sharder, and this shards by the join column. That then arrives at the join, and — if you read this carefully — you'll see that this ends up joining on the join column, and you'll see that it is indeed sharded by the join column itself. And then at the bottom we do another shard merge and then another sharding, and we end up with a reader — the view at the bottom — which is sharded by reader col, also 16 ways, and you see this one is partial; that's what that little symbol indicates. The same goes for the join. I'm not going to go into why the ingress is materialized and not the join — that's for a different time. All right, so we have the graph we expect, and now let's look at what this crash actually was. So the crash said "detected chained union at 7". All right, so what is 7? Seven is probably the union — ugh, why is my mouse acting up?
That's very annoying. Stop acting up, mouse. All right, so seven is this shard merger up here, right? This shard merger up here is the one that detected that something was wrong, which is what we expected — this is the union in the drawing that also saw something being wrong. And what did it see being wrong? It got something from shard three on key two. So it got something from shard three of base x, specifically for key two — which key it is doesn't really matter, right? Oh, actually, that's interesting; that might be a different thing we need to keep in mind. But yeah, this just tells us that the union observed the issue that we expected it to observe. Any questions about this graph, or the problem we just ran into? "Does replay mean upquery response?" Yes — replay is the word we use for upquery responses, for relatively legacy reasons. A replay piece is a piece of an upquery response, yeah. So I think this graph should not be terribly surprising. That's weird — was it just `xdot`? Or maybe the battery is running out in my mouse; that'd be annoying. Let me plug this mouse in so that it doesn't die. There we go, much better. "Is there any particular reason why all the nodes are sharded 16 ways by default?" So by default, Noria doesn't actually shard — you specify when you run the thing... this is being difficult.
Whoa, my buffer is just being all sorts of weird today. Huh. If you look at the test, you'll see that I set the number of shards, and the reason I set the number of shards pretty high for this particular test — to 16, though whether it's higher or not doesn't really matter — is that the bug is more likely to be detectable at the union if the number of shards is high, because there are more upqueries you need to wait for. So there's a higher chance you end up observing multiple upquery responses for any given source shard. "I thought we were going to query key one." We did query key one, but the replay is for a different key. Remember, the join has a different lookup key, so what we're really seeing is the upqueries for the keys that the join needs. But you're right — that is the observation I made when I said "that's weird"; I think that's the reason why. It's a good question, actually, whether the key we use for the grouping of the pockets should be the upquery key — which is going to be like c_v = 7, or in our case one, which is what we're querying for — or whether it should be the key in the values that come back. Oh, sorry — when it says "key" there, what it means is columns. It's printing the key columns, and the key column is two. So that's okay — great, that makes me feel better. "Everything is shifted by one line and not refreshing." Yeah, I think it's a tmux bug; it's weird. So the reason it showed two was because two is the key column; the value is going to be one, if we look at it. Okay.
That makes me feel a lot better. And notice the little comment above here, which is something I wrote way back in the day when I realized that this would be a problem but didn't really know what the problem was or how to fix it. I wrote: we'd need to keep a queue of replays from each side, apply writes to all queued replays from that side, and then emit all front-of-queue replays in lockstep. So that was my proposal at the time, which is one of the ones we talked through, right? And this gets kind of weird — although you could imagine doing it this way, maybe — but I'm not convinced it's any better than the solution we've now come up with. All right, so let's try this and see what happens. Okay, so the things that need to change here are: the union needs to have some way of knowing the tag — currently it doesn't know the tag, and it needs to — but it also needs to know which shard was the one that requested the upquery in the first place. If you look back at the drawing, that's the information we're proposing we keep here: the tag, which we have to add, and the requesting shard, which also needs to be included and is not currently something the union is told about. So we're going to go back to `payload`, and what we need is — this replay piece context currently doesn't include enough information, I think. So it includes the tag, right, the tag is up here, but we also need to include something like `requesting_shard`. It's going to be a `usize`. In theory it could be an `Option<usize>`, but I think I want it to be a `usize`. Probably. Yeah, I think I want it to be a `usize`. And then we're really just going to do error-driven development here — compiler-driven development. "It could be a Neovim issue."
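The size concern behind that usize-vs-Option choice is easy to check. Every bit pattern of `usize` is a valid value, so `Option<usize>` has no niche to hide its discriminant in and must grow by a word:

```rust
use std::mem::size_of;

/// Compare the in-memory size of a bare usize field against an optional one.
/// Option<usize> needs a separate discriminant word (plus padding), which is
/// the "few bytes per packet" cost mentioned later in the stream.
pub fn field_sizes() -> (usize, usize) {
    (size_of::<usize>(), size_of::<Option<usize>>())
}
```

On a typical 64-bit target this comes out to 8 bytes versus 16 — doubled, because of alignment. A type like `Option<NonZeroUsize>` would stay at one word, but as discussed later, shard indices legitimately include zero.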
Although I've also seen the issue crop up in mutt — oh, it's hard to say. All right, so obviously now we're going to get a bunch of errors in the places where we don't mention this field, which is sort of what we want. So `node/process.rs`, line 87. Okay, so this is the code that runs when a packet arrives at an operator: it needs to call a method on that operator, and this is the code that does that. You'll see this is for node type `Internal`, which covers all the operators — there are a couple of special node types like egress, readers, and sharders, but most of the normal relational operators are `Internal`. And here it seems like it tries to extract some information — it looks like it's turning a replay piece context into a replay context. I wonder why I did that. Well, that's fine. So really what we want then is for the requesting shard to also be communicated as part of whatever that translation is. That's probably going to give us some other issue up here — yep, so this is also going to include that information, please. And we also actually want the replay context here to include the tag, which is not currently something that's included. So here we'll also want to include the tag — which hopefully it has somewhere. That's a good question: does it have the tag? `ReplayPiece` — yes. Because this is all the information that's provided to the union, and the union needs to know the tag and the requesting shard; `unishard`, for unimportant reasons; the key columns; and the keys. All right, let's see what this gives us. So now we're back at the union, line 450. So this is also going to take the requesting shard and the tag — great. So now we have that information in the union.
Let's see if that's actually enough, or whether there are other things we need to pass. Okay, so `domain/mod.rs`, line 1569. The domain is a part of Noria I haven't told you about yet — well, there are many such parts — but it's how Noria executes operator graphs, and this actually warrants a new picture, although a much simpler one. The way that Noria models these operator graphs is as follows. If you have a graph — let's go with blues; I'm going to draw circles because they're easier to draw — if you have a graph that has all sorts of funky edges and things, maybe another node over here, then a thing, and some stuff over there — imagine that this is your dataflow graph; it's abstract on purpose. I told you earlier that Noria only ever has one thread that operates on one node at a time, which is true, although Noria actually goes a little further than that: it uses something called thread domains. The basic idea is that we're going to draw these sort of artificial boundaries — how we draw them is something I'm only going to talk about a little bit. So this is sort of a connected-components diagram, except subdivided more; it's more like a Voronoi diagram. And the idea is that all of the nodes in one pink circle will be handled by one thread. The reason this is advantageous is that it means that this channel, for example, does not need to be a channel.
This is really just a function call, because there's only one thread operating on this whole pink circle. So when this node produces an update, there's no reason to send it anywhere — we don't have to serialize it, and we don't need the synchronization of a channel. We just do a function call that invokes the next operator with whatever that update was, and in some cases that can save you a lot of work. Imagine you have something like a large chunk of updates: this sort of thread-local operation can potentially save you a lot of work, and it avoids things like contention on channels. It has many other benefits too: it avoids a bunch of memcpys, and it avoids cross-core movement of data. So there are various reasons you want to do this. But you don't want these pink circles to be too large, because if they're too large, all of the other cores on your system might sit idle with nothing to do while one thread just spins, because all the work is in one domain. So you want to divide your dataflow graph into domains — that's what we call these — so that you have enough domains to saturate all your cores, but as few domains as possible given that constraint. The extreme case, of course, is every node in its own domain; that's not what Noria does, but you can imagine it. So `domain/mod.rs` is really the code that handles all of the stuff that happens in one pink box. If a pink box includes only one operator, then there's relatively less work that has to happen here. But think of this as the thing that accepts incoming connections, decides which node each message is for, does the processing, and fills in missing state.
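The thread-domain idea above can be sketched in miniature — with a made-up operator trait and domain struct, nothing resembling Noria's real API: inside a domain, handing an update to the next operator is a plain function call on the same thread, and only the edge that leaves the domain pays for a channel send:

```rust
use std::sync::mpsc;

/// Toy operator interface; Noria's real one is considerably richer.
pub trait Operator {
    fn on_update(&mut self, delta: i64) -> i64;
}

pub struct AddOne;
impl Operator for AddOne {
    fn on_update(&mut self, d: i64) -> i64 {
        d + 1
    }
}

/// One "pink circle": a chain of operators owned by a single thread, plus
/// one channel for the edge that crosses into the next domain.
pub struct Domain {
    pub ops: Vec<Box<dyn Operator>>,
    pub downstream: mpsc::Sender<i64>,
}

impl Domain {
    pub fn handle(&mut self, mut delta: i64) {
        // Inside the domain: no channels, no serialization -- just calls.
        for op in self.ops.iter_mut() {
            delta = op.on_update(delta);
        }
        // Only the domain boundary pays for synchronization.
        self.downstream.send(delta).unwrap();
    }
}
```

The trade-off described on stream falls out of this shape: the more operators you pack into one `Domain`, the fewer sends you pay for, but the less parallelism you expose.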
It handles requests from readers when there's an upquery request to them; it does all of the logic at a granularity greater than a single operator — all of the stuff that has to do with talking to other nodes, anything that's inter-node as opposed to intra-node operation. Yeah, it's basically divide-and-conquer on the resources, and there's a trade-off here — one that's not immediately obvious — in how you choose where the boundaries go. All right. So `domain/mod.rs` is a pretty large file, because it turns out there's a lot of inter-node bookkeeping and logic needed. In particular, one of the things the domain does is send out upqueries in the first place. So here, for example — this is `seed_all`, which runs when a node in a domain is asked to respond to an upquery. It's going to look in its own state to see if it can. Imagine that we've been asked to send someone the state for seven: we can only do that if we have the state for seven. So it looks up whether it has the state for the key that matters, and if that state is missing, then we need to send an upquery to our ancestor to fill in the state that someone downstream of us asked for. That's what this case is here. Oh — so the tag is already there, and here we just need to set the requesting shard. And what is the requesting shard going to be? Well, here it's actually pretty straightforward: it's just going to be `self.shard.unwrap_or(0)`. So if we are unsharded — that's the case where `self.shard` is `None` — then it doesn't really matter what we set the requesting shard to, because — if there is a union that has to do this buffering —
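That fallback can be captured in a two-line sketch — the struct here just mirrors the `self.shard: Option<usize>` field as described, not the real domain type:

```rust
/// Minimal sketch of the logic described above.
pub struct DomainShard {
    /// None when this domain is unsharded.
    pub shard: Option<usize>,
}

impl DomainShard {
    /// The shard index to stamp on outgoing upqueries. Unsharded domains
    /// can safely claim shard 0: as explained on stream, no shard merger
    /// will ever need to disambiguate their requests.
    pub fn requesting_shard(&self) -> usize {
        self.shard.unwrap_or(0)
    }
}
```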
it's basically going to ignore it. It doesn't matter whether the requesting-shard information is included, because the union will only have one ancestor if we're not sharded — you won't ever run into this shard-merger case if we're not sharded. Okay, what other cases are there? What we're doing here is basically walking through all the places where we send upqueries. This one at 1688 — what is this case? For relatively silly reasons, this is basically the same thing, so this requesting shard is also going to be `self.shard.unwrap_or(0)`. In fact, I think all of them will be, but let's check. Actually, I'm going to give you a little bit of a spoiler here: this solution is insufficient, because there are some cases where — imagine that the J's here are not materialized, and so the upquery goes all the way from V to A — then it turns out the union needs to know some additional stuff. But we're just going to ignore that problem for now, because this is still progress. 2309 — this is in `handle_replay`. We hit this branch when a domain receives an upquery response: it processes it, passes it to some node, and after the node has processed it, it needs to do some additional things. Here I think we can just ignore the requesting shard; it's not relevant. Again, there's a lot of code here that I'm just glossing over — Noria is big and complicated, and it's not worth your time to go through it all. "Ctrl-L to reclaim that lost line in the Vim command bottom?" No — it's actually a rendering bug in Neovim. I mean, I could just resize my font and that might fix it, but... Okay, great. So now it compiles. That means in theory —
we're passing the requesting shard in all the places we need it. And this also means that, at least in theory — hmm, I feel like there's something missing here. I feel like there's something missing, because when you send an upquery response, the requesting shard of the response should not be your own shard index but that of the node that requested it. So I think we actually did that wrong. Requesting shard — yeah, I know what's happening. There's a different payload. This is the forward path, right? This is the upquery response. But the upquery *request* is what needs to be tagged with what the requesting shard is. So there's another packet type, `RequestPartialReplay`, and that is the one that has to carry the requesting shard — because only the thing that sends the upquery in the first place knows which shard it is. So in other words, this requesting shard is wrong, and this is wrong — let's bring that back. All right. So 306: this is a request, so this is one of the places where this should be `self.shard.unwrap_or(0)`. This is a request, so that needs to include ours. This is a request, so that needs to include ours. This is a request — great. So now we make sure we send it everywhere. 2434, and here. And now we need to make sure that the response gets tagged with the information from the request. So what's this going to look like? This calls `seed_replay`, so that's going to have to take the requesting shard — arguably this function should be rewritten now that it takes this many arguments, but we're not going to do that here. The requesting shard is going to be a `usize` here. And — okay, so this case is easy, right, because this is in the same function, so we know —
So we know This sort of is we got asked to do A replay and we're going to send the response straight away so we can just copy the information over The other case I think it's going to be a little bit harder. I guess we're about to find out 1580 And so this is a little more complicated. Um, I'm not sure how we're going to do this. So Noria has this Noria has this feature where If it gets an up query, it'll wait a little while before satisfying that up query because it assumes that there might be more up queries For the same for different keys for the same tag and it would like to batch them together Um Which isn't really going to work. So I think what this is going to be is the Maybe what we should actually do is change the meaning of tag I think the tag here is actually going to be No, we can do this. That's fine Okay, so the question becomes how do we get requesting shard into this function? This means that when we batch Up query requests, we have to batch on them by tag and requesting shard Which is a little bit sad, but it should be fine So 1475 this does seed all And this does nasty stuff Yeah, so here you see their buffered replay requests that these up queries used to be called replays and replay requests That's why you see that in the code Arguably, we should just rename all of these to be to match the up query language um Why don't make requesting shard an option to avoid the unwrap for We could It makes the data structure larger Um, so this is a data structure that gets sent with like almost every packet So keeping it small is nice And making it an option would increase its size by a few bytes That's the only reason really I don't disagree with you that maybe it's better as an option but Um Given that it's totally fine to use the value zero if there are no shards Or if if the thing is not sharded, I don't think it's worth it um Yeah, so I think here what we need is this elapsed replays Really needs to include um Questing shard So I guess here we put that 
after the tag: so, requesting shard. All right, so elapsed replays — yeah, so the buffered replay requests are currently essentially a hash map from tag to the batch we're collecting, and I think the key will actually need to be a tuple of tag and requesting shard, so we don't accidentally collapse requests that are from different requesting shards but along the same tag. Which means that the place where we push to the buffered replay requests — actually, let's do this: that's going to be keyed by `(Tag, usize)`. And where do we push to it? `entry` — this is going to be an entry for `(tag, requesting_shard)`, which hopefully we have access to in this context. We do. Great, let's see what that does. Great, so now we actually have the wiring correct. "Oh yeah, YouTube is way worse than Twitch — it has longer latency and worse quality." "Can you make it an Option of NonZero?" No, because it's not non-zero: shard indices are zero-indexed, so if you have two shards, the shards are zero and one. In theory we could make everything one-indexed, but that would probably break things all over the place, so it's not really worth doing. Really what I want is something like a non-max usize, which serves the same purpose in that the max value can act as the `None` — but until we get something like the proposed trait for a const known-unused value, which would let you have a wrapper type that's a non-max usize, I think we're just going to leave it. Okay, so in theory we now have all the wiring we need. So now the question becomes: what does the union do? And I think what we need is for the union — let's see here, replay pieces. Yeah, that's awkward.
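The batching change just described can be sketched like this — a map keyed by (tag, requesting shard) instead of tag alone, so two shards upquerying along the same path never share a batch. The types are illustrative stand-ins:

```rust
use std::collections::HashMap;

type Tag = u32;
type Key = Vec<i64>;

/// Sketch of the buffered-upquery batches after the change discussed above.
#[derive(Default)]
pub struct BufferedReplayRequests {
    pub buffered: HashMap<(Tag, usize), Vec<Key>>,
}

impl BufferedReplayRequests {
    /// Add a key to the batch for this (tag, requesting shard) pair.
    pub fn push(&mut self, tag: Tag, requesting_shard: usize, key: Key) {
        self.buffered
            .entry((tag, requesting_shard))
            .or_default()
            .push(key);
    }
}
```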
There are a couple of reasons why this is awkward, but basically this has to be keyed by tag, usize, and the key — and the key is a `Vec<DataType>`. We have far too many Vec-of-something data types, which is a little unfortunate, but it's because the key can be a compound key — the thing can be sharded by, or keyed by, multiple columns rather than just one. `DataType` is our enum of primitive data types in the system — basically an enum of int and string and float and so on. So if you go back to the drawing: we need the key to be tag, requesting shard, and key, rather than just key. It used to be just key, and now it's (tag, requesting shard, key). Now, I guess `replay_pieces` is going to be a bit of a pain, because the key here is going to be — does this even know the tag? Oh, man. Actually, okay — so you'll see that `on_input_raw` is a thing the union implements that tells it about the replay context of the updates. You'll remember I told you that operators don't generally know that something is a replay: all they know is that something is a normal update, and that's all they know how to process. So there's an `on_input` function — that's what most operators implement — and `on_input`, you'll notice, has nothing about replays; it doesn't know whether something is a replay or not. Unions are a little special because they need to do this buffering, so they get to implement `on_input_raw`, which also gets information about whether something is a replay, and if so, what type of replay. You'll see that we match on that, and in the case where the replay context is none — so it's just a regular update — we don't have a tag.
We don't have a requesting shard either. And here you'll notice that it actually has to do a lookup for whether there are replays in any pockets related to the update. This we're definitely going to have to change, because it's going to have to check *all* of the pockets for that key — which makes me think that `replay_pieces` is actually going to have to change a little. It's going to have to be not this, but a map from `Vec<DataType>` to the replay pieces — because we need an efficient way to find... at which point this should really just be a `BTreeMap`, to be honest. We need an efficient way to find all of the (tag, usize) pairs for which this key needs to be updated. I think that ends up being right. It's going to be a little awkward — this patch probably needs a bit more infrastructure. The reason to switch to a `BTreeMap` here is that tuples are ordered by the parts of the tuple, so "give me everything where the key is equal to this and the rest of the tuple has any value" is a range lookup — an efficient query you can do against a `BTreeMap`, as opposed to having a hash map whose values are hash maps, a nested hash map. "Why is the hash table key the `Vec<DataType>` of the key and not the key itself? Wouldn't it lead to bugs when different keys have the same type?"
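Here's a small demonstration of that range trick with toy types: tuple keys sort lexicographically, so every pocket for a given key — across all tags and requesting shards — forms one contiguous range in the `BTreeMap`:

```rust
use std::collections::BTreeMap;

type Tag = u32;
type Key = Vec<i64>;

/// All pockets for `key`, across every (tag, requesting_shard) pair, found
/// with a single contiguous range scan -- no nested maps required.
pub fn pockets_for_key<'a, V>(
    pockets: &'a BTreeMap<(Key, Tag, usize), V>,
    key: &Key,
) -> Vec<&'a (Key, Tag, usize)> {
    let lo = (key.clone(), Tag::MIN, usize::MIN);
    let hi = (key.clone(), Tag::MAX, usize::MAX);
    pockets.range(lo..=hi).map(|(k, _)| k).collect()
}
```

Note the key column comes *first* in the tuple; that's what makes the prefix scan possible, since `BTreeMap` can only range over a lexicographic prefix.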
Yeah, so that's basically what I'm observing here, that really this should be... yeah, it's kind of awkward, because I think what value you look up in here by actually depends on the tag. To backtrack a little bit: unions are actually broken in a bunch of different ways, and one of them is the thing that we're looking at now, which is that unions basically assume that there's only ever one replay path through them. And that's why they could get away with this, right? If there were two replay paths, it would just crash; it would not work correctly. But if there's only one replay path, then there's only one set of columns, and therefore you're fine: this will never have a mismatch. Now that we're introducing this notion that there might be multiple, this gets iffy. So really, if we want to stick with the assumption that unions already have, which is basically "there's only one tag through me"... in fact, let's do that for now, and then we're going to have a TODO: these need to be per tag. We'll see. "Advantage of BTreeMap over HashMap?"
The advantage of BTreeMap here is actually allocation. The corresponding hash map type we would need would be a HashMap from this to a HashMap from that to the values, and this would mean that you have lots of small hash maps, as opposed to just one big BTreeMap. So this is more allocations, basically. It's not quite true that the logic is this straightforward, and it could even be that the hash maps end up performing better, but this is a little bit nicer in terms of the storage. The BTreeMap will also guarantee that these are relatively close to each other, like all the different requesting shards for the same key will be in sequence, which might be nice. It may not end up making a difference.

Yeah, so notice that we're basically ignoring the tag here for now; we're continuing with the implicit assumption that unions already had, which is that the replay paths were only ever over one set of columns. This is not true in practice, but there's no reason for us to fix that in addition to the bug at this time.

Okay, so now let's make this be Default::default() rather than what it was, in case we change it later. That can still be len; that's fine. That can still be whatever it was. Okay, so here, let's get back to this one, because this is the case for updates, which are a little more complicated. And then the other replay pieces, that's fine. This can actually be a mem::take, which exists now. replay is mem::take, which I implemented... what else do we have? So mem::take is just a mem::replace with the type's implementation of Default. So this is still going to key by the key, but then in addition it needs to key by the requesting shard. Right, so this is a BTreeMap now.
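As an aside, the mem::take pattern mentioned here is easy to show in isolation: it swaps a value for its type's Default and hands back the original, which is handy for draining a buffer you only have through &mut self, without cloning. The helper name here is made up:

```rust
// mem::take(x) is equivalent to mem::replace(x, Default::default()).
fn drain_buffer(buf: &mut Vec<&'static str>) -> Vec<&'static str> {
    std::mem::take(buf)
}

fn main() {
    let mut buffered = vec!["piece-1", "piece-2"];
    let released = drain_buffer(&mut buffered);
    assert_eq!(released.len(), 2);
    assert!(buffered.is_empty()); // the original is left empty, not moved out
}
```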
So the entry here is actually going to be (key, requesting shard). And now, in theory, this should never be entered, so this is "got two upquery responses for the same key for the same downstream shard". What... great, this is going to be something like "downstream shard issued duplicate..." How do we want to phrase this? "downstream shard double-requested key". And then it's going to include some information, like the node, the source, and the key columns. That's fine. And in theory, I think that should be all we need for this part. This can now do... I think this can now be a replace.

What is this business on eviction? Oh, the eviction is going to be all sorts of annoying. So here, I really don't want to do eviction. This is where, if we had a hash map, then we would have to iterate over all the things under that key, and since we have a BTreeMap, what we can do is... actually, we might not be able to do this, but I think what we're going to end up doing is something like this: from (key, 0) to (key, usize::max_value()), I guess with ..=. And then we're going to have to do something silly like std::collections::btree_map::Bound... what's it called? The screen is going to become bright: BTreeMap... so we've got RangeBounds here. Oh, I see, that's fine. Yeah, so this is just going to use range instead of get_mut.
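The eviction idea, sweeping every requesting shard's pocket for one key out of the BTreeMap, might look something like this. It's a sketch under the same simplified key types as before, not the actual Noria code; since BTreeMap has no remove-by-range operation, one approach is to collect the matching keys and remove them one by one:

```rust
use std::collections::BTreeMap;

// Evict all buffered pockets for `key`, across every requesting shard.
// Returns how many pockets were dropped.
fn evict_key(map: &mut BTreeMap<(u64, usize), &'static str>, key: u64) -> usize {
    // The (key, 0)..=(key, usize::MAX) range covers every shard for `key`.
    let doomed: Vec<(u64, usize)> = map
        .range((key, 0)..=(key, usize::MAX))
        .map(|(&k, _)| k)
        .collect();
    for k in &doomed {
        map.remove(k);
    }
    doomed.len()
}

fn main() {
    let mut pockets = BTreeMap::new();
    pockets.insert((7, 0), "pocket for J1");
    pockets.insert((7, 1), "pocket for J2");
    pockets.insert((8, 0), "unrelated key");
    assert_eq!(evict_key(&mut pockets, 7), 2); // both shards' pockets go
    assert_eq!(pockets.len(), 1); // key 8 survives
}
```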
It's going to be range_mut. Right, so this is then going to hit all of the different requesting shards for that key, and so this is where the BTreeMap comes in handy, although you could do the same with a hash map. Because here, when a given key needs to be evicted, we need to evict all of the buffered pockets. I'm not going to talk too much about it because it's complicated.

Yeah, and then we only have the update case left, and here actually... oh, this is going to kill me. Tuples get awkward here. This is going to be like "let key = this", and this is also going to be a range_mut, and this is not going to be an if let, this is going to be "for pieces in", and this is going to be from (key, 0) to (key, usize::max_value()). Because here too, this is the case where we get an update and we need to check all of the pockets, right? So in this case we're going to have a distinct pocket: we're going to have a version of this pocket for J1, and we're also going to have a version of this pocket for J2. And when an update comes in, it needs to update the buffered upquery response for both of the shards. Right, so it's going to look up by its key, and then it needs to update all of them, and that is what we're making this code do. It used to just look up "is there anything buffered for this key from this source?", and now it's not going to do that anymore. And sadly, for reasons I don't want to fix right now, it means we're going to have to clone the key, and it might have to stay that way.

This probably... oh man. This else can go away. Let's see what this does. Six two six... right, so this is going to be the same: for e in this. It doesn't have an else. Great. 630: no field "buffered". Did I change what the value was here?
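That update path, applying one incoming update to every shard's buffered pocket for the key, can be sketched like so (again with simplified stand-in types; in the real code the key is a Vec<DataType> that has to be cloned to build the range bounds):

```rust
use std::collections::BTreeMap;

// When a normal update for `key` flows through the union, every buffered
// pocket for that key (one per requesting shard) must absorb it.
// range_mut gives mutable access to exactly that contiguous slice.
fn apply_update(
    map: &mut BTreeMap<(u64, usize), Vec<&'static str>>,
    key: u64,
    update: &'static str,
) {
    for (_, pieces) in map.range_mut((key, 0)..=(key, usize::MAX)) {
        pieces.push(update);
    }
}

fn main() {
    let mut pockets = BTreeMap::new();
    pockets.insert((7, 0), vec![]); // pocket for shard J1
    pockets.insert((7, 1), vec![]); // pocket for shard J2
    apply_update(&mut pockets, 7, "new-row");
    // Both shards' pending upquery responses saw the update:
    assert_eq!(pockets[&(7, 0)], vec!["new-row"]);
    assert_eq!(pockets[&(7, 1)], vec!["new-row"]);
}
```

The old code did a single get_mut for the key; the range_mut form is what makes "update all of them" a one-liner.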
I did not think so. All right, I've made some silly mistake.

"If there's only one replay path at a time, why do we need the map at all? Wouldn't it contain like one key all the time?" No, the reason it might contain multiple keys is, imagine that one shard queries for, like, key seven, and the same shard later queries for key eight. The two of them need to have separate pockets, those two upquery responses, right? So we actually do need to potentially have multiple things that are being buffered.

Oh, I see why this is: it's because this also gives us the key, which we don't care about. And similarly for the other place, which was like five three four... down here somewhere, or I guess six two eight. BTreeMap's range_mut is nice to us and also gives us the key, but we don't actually care which key we're looking at. Let me do that, huh? This makes me so sad, but it's for stupid reasons.

Here, this probably needs to talk about the value, is my guess. So with BTreeMap, when you use entry... the screen is going to become bright again... entry. So what do you get if you have an entry here, and you have an occupied entry? Then there's... this should just be get. "This expression has type Entry: expected Entry, found hash_map::Entry." Ah, yes, that is true. This is now a btree_map Entry. And it's true, the tag is unused.

All right, so in theory we now have the fix, right? It now buffers based on the requesting shard as well. Let's try; I have no idea whether this is going to work. So let's try to run it.
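Before we run it: the compile error above comes from HashMap and BTreeMap having distinct Entry types, even though the API looks the same. A small sketch of the duplicate-upquery check using BTreeMap's entry API (the function and error message are made up, loosely following the panic text from earlier; the real code panics rather than returning a Result):

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

// Buffer a new pocket for (key, shard), or report the invariant violation:
// the same downstream shard should never request the same key twice.
fn buffer_upquery(
    map: &mut BTreeMap<(u64, usize), &'static str>,
    key: u64,
    shard: usize,
    pocket: &'static str,
) -> Result<(), String> {
    match map.entry((key, shard)) {
        Entry::Vacant(v) => {
            v.insert(pocket);
            Ok(())
        }
        Entry::Occupied(_) => Err(format!(
            "downstream shard {} double-requested key {}",
            shard, key
        )),
    }
}

fn main() {
    let mut pockets = BTreeMap::new();
    assert!(buffer_upquery(&mut pockets, 7, 0, "pocket").is_ok());
    assert!(buffer_upquery(&mut pockets, 7, 0, "pocket").is_err()); // duplicate
    assert!(buffer_upquery(&mut pockets, 7, 1, "pocket").is_ok()); // other shard is fine
}
```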
Well, compile it first. It's pretty close to three hours; let's see if the live-streaming gods are satisfied with us and our fixes. It didn't crash, but it's also not finishing. Let's do a quick look here and see what it's up to. My guess is this is going to be another long, fun debugging experience that I'm not going to do on stream. Yeah, see, that's what I was worried about. So basically, what this indicates is that the upquery response never arrives at the view. It's like the view does a read for one, right, which starts an upquery, and then the read for one just sort of sits there. It's waiting for the upquery response to arrive before it can answer the client, and if that upquery response never arrives, then the client's read never completes. This indicates that somewhere, basically, the buffering never ends: we never release the upquery. There might be a bunch of reasons for this. It might be that it's the second shard merger that gets stuck, but this is basically something I have to dig into. But I do think that this is certainly progress: it no longer crashes where it used to. So I think we're actually going to end the stream there, even though it's a sort of weird point where it's not fixed.
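The symptom just described, a read blocked forever on a response that's stuck in a buffer, can be modeled in a few lines. This is a toy illustration with a plain channel, not Noria code; recv_timeout just makes the "stuck waiting" observable instead of hanging the program:

```rust
use std::sync::mpsc;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel::<&'static str>();

    // The "union" holds the upquery response in its buffer forever and
    // never sends it downstream.
    let _still_buffered = tx;

    // The view's read waits for the response... and never gets one.
    let result = rx.recv_timeout(Duration::from_millis(50));
    assert!(result.is_err()); // nothing arrived: the client's read never completes
}
```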
But at least I think we've now explained what the bug is, talked through why this is a solution, and then implemented the solution, mostly, given the constraints we had. And now it's sort of debugging the solution and figuring out what the next steps are, including things like making unions aware of tags. And so I think this should give you a decent overview of the internals of Noria, at least the ones we've talked about today, and also the ways in which doing work on this kind of code base works. You've sort of gotten a shortcut where I know what the problem is and I know what the fix is, and this is usually not the case.

Yeah, this kind of deadlock is basically what happens, like in the async/await world, or just in general, when you have a problem where something is just buffered forever. The way that problem manifests is that everything is stuck waiting; it's a deadlock, and debugging them means figuring out why things are stuck. It's not terribly surprising.

So I hope that was useful. Hopefully this stream, even though it was very technical and we dove in pretty deep into a large and complicated code base, hopefully it was at least possible to follow. As usual, I'll upload this to YouTube afterwards, and that way you can watch it at your own pace and sort of rewind and stuff. If there are any closing questions, then I'm happy to take them now. And then whether we do another Noria stream, I don't know yet; it sort of depends on the response to this one, and whether another good problem comes up. So this particular bug is one where I thought I might be able to explain it and go through it and fix it in, like, three hours.
Most problems in research are not like this; this is sort of an outlier. So if something comes up, then I might... I see someone is also asking whether I can do another stream to demo the fix after the fix has landed. I might do that. What I might do is, when I land the complete fix, I will tweet out a link to the commit. That might be what I end up doing.

"You're lacking a lot of infrastructure in order to implement this fix properly during the stream." Yeah, I mean, if I were to fix this properly, first of all, I wouldn't spend two and a half hours explaining what the bug was and how to fix it, so that would shave some time off. But also, I would probably introduce a lot more debugging info and stuff, with the expectation that this is where we were going to end up. "Problems usually lead to more problems." You're not wrong; it's like an endless chain of debugging.

All right, if there aren't any more questions, I think we're going to end it there. It's like a solid three hours and fifteen minutes, which is pretty good; I think I'm happy with that. Hopefully you found it interesting, and hopefully this short insight into Noria was the kind of thing you had in mind when you were all asking me to do a stream on Noria. And maybe we'll do it again. I will see you all later. Remember to wash your hands; remember to stay apart, sadly. And then when we get through this, there'll be more streams. I don't know when, but there will be. So long, farewell. Good night, everyone. We will see you next stream.