So, hey everyone, welcome to day three of GPN 21. Nice to see all of you here in the morning. Today we have another very interesting talk for you about Matrix. And I want you all to please welcome Andrew Morgan with his talk, Solving the Historical State Problem in Matrix.

Hello there. Wow, microphone. Hi everyone. Yeah, my name is Andrew Morgan. I go by anoa online, so you might know me by that as well. I am a maintainer of Synapse, and I have been employed at Element for a while. I'm also a member of the spec core team at the Matrix.org Foundation. And I've been doing this for about four to five years now. It's my first GPN as well. I'm very excited to be here. And today I'm going to be talking about Solving the Historical State Problem in Matrix.

So, who here does not know what Matrix is yet or hasn't heard of it? I know you know what it is. Okay, that's really good. I guess for the people streaming, I'll still cover it very briefly. We can go through it quickly. Basically, Matrix is an open network for secure, decentralized, real-time communication. That's what we say. It's essentially a networking protocol. And it has a broad range of really interesting use cases, such as interoperable chat, which is probably what you're most familiar with. You can also do VoIP stuff on it, so audio and video calls. We also have some interesting stuff for AR and VR in the works, which is quite exciting. And you can also use it for some IoT applications. And of course it has end-to-end encryption built in, so you can use that across all of these.

Matrix is decentralized. It's a federated protocol. You run a Matrix client on your phone or your laptop. And you connect to a home server, which is what actually stores all of your messages. And those messages are stored in Matrix rooms. They're what you'd call channels on other services. Your home server communicates with other home servers.
And it replicates these rooms in kind of a peer-to-peer fashion between the home servers. And through that, you can exchange data and communicate with anybody else on the network. We also bridge out to third-party networks through bridges, but I won't cover that here. And anyone can host a home server. You can self-host it. Or you could use a hosted home server offering. We provide a free one at matrix.org.

So let's take a look at one of these Matrix clients. There are many out there, but this one is Element. And you can see here that we're looking at the chat room for the Godot Engine. Does my mouse work? Yes, it does. These are just screenshots. So this is the Godot Engine. This is their chat room. It's a really awesome open-source game engine, by the way. You should totally check it out if you don't know about it already. At the top here, we have the name of the room. So it's Godot Engine. Makes sense. There's also a topic, which is always shown, where people can find some other rooms that are related to Godot. And then there's the room avatar there.

But how exactly are we storing all of this information in the room? Like, how does this actually work under the hood? Well, these things in particular that are persistent are considered part of the state of the room. And in Element, I can actually enter the developer tools. And I can click this button that says explore room state. And here you can actually see the various types of room state that we have here. There are things such as m.room.name, which is the name of the room. m.room.topic. You can probably guess what that one is. And there's the avatar and so on. And if I click on one of them, like the room name, this is actually what is stored inside it. We call this a state event. And it's just a bunch of JSON with some useful information. So as you can see, the type is there. And if I click on edit, I'll focus just on the content field that was in there. And here we see the name Godot Engine.
And actually, if I had the right permissions in this room, which I don't, but if I did, I could edit this and then send a new state event. And it would update. It would overwrite the old state in the room. And that's how you could change the name of the room. Obviously, we have better UI than this to update the name of the room, but this is one way to do it.

All right. So if we go back, we can take a look at another state event type. This one is m.room.member. And wow, we've got a lot of options this time. And all of these are actually separate state events, all with the type m.room.member. But all of these have a different state key. And these are the state keys. So we uniquely identify a state event by the combination of its type and its state key. And the state key can be any string. It can even be an empty string, which is what we had with m.room.name. There's only ever one room name, so there's no need to differentiate multiple of those. But in this case, for m.room.member, we have a bunch of different state keys, and they're actually all user IDs, or Matrix IDs, here.

So if we click on the first one, which is actually mine, this state event right here is defining that I am joined to the room, that I'm in the room. And yeah, you can see the state key is just a field in this JSON. And if we take a look at the content again, then yeah, we can just see right here there's a membership property, and the value is join. So this state event defines that I'm joined to the room. This is also, by the way, where we define your display name and your avatar in the room. And yes, you can have a different one per room, even though we don't make that easy in the UI. So that's another example. If I click back, we can look at another user, say dino terminator. And they have a different membership value of leave. And this means that they were once in the room, but now they have left. So that's what state does when a client actually receives all of these state events.
It can figure out how to render the room in the UI. But if you notice here at the bottom, we actually have about 2,300 of these state events, just for membership information. And if we go back to the actual room itself and go to the sidebar, you can see that there are actually only about 1,700 people here in the room right now. And that's because a lot of the people that were in here have now left, and maybe they left a while ago. So why is that a problem? We'll get to that.

The last thing is we also have non-state events. For instance, messages. These are not state. We can look at the source of this and see that it's also just a JSON blob. But it has no state key field, which essentially means it's not a state event. So we call these just normal events, or events.

So what's the problem here? Well, let's take a look in detail. I've talked a lot about what data is stored in a Matrix room. They're JSON blobs. And that makes sense. But for this talk, we're just going to focus on state events. But what data structure are these state events actually living in? Is it just a whole pool of state events? Okay, well, that wouldn't be very efficient to iterate over. Well, I'll show you.

So let's say you want to create a Matrix room. Cool. What do we do? You start off the room with an event, specifically a state event, specifically, specifically, an m.room.create event. And it's going to have an empty string as its state key. Again, we'll only ever have one of these things. This is all going to be very simplified, by the way. And then you have an m.room.member event. And this is going to contain the information that actually joins you to the room. And on the left here, you can see that we're starting to build up some current room state. So this might be what a client saves, for instance, in its local state. And maybe you'll set the room name. And then perhaps your friend joins. And then they leave, because they were not very impressed with your room. And so on.
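To make the walkthrough above concrete, here is a minimal Python sketch of how a client might fold a sequence of events into current room state, keyed by the pair of event type and state key. It is deliberately as simplified as the example in the talk: the user IDs are made up, events are replayed in a straight line, and real servers work on a graph with state resolution rather than a simple replay.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]
StateKey = Tuple[str, str]  # (event type, state_key)


def is_state_event(event: Event) -> bool:
    # An event is a state event exactly when it has a state_key field,
    # even if that key is the empty string.
    return "state_key" in event


def current_state(events: List[Event]) -> Dict[StateKey, Event]:
    """Replay events in order; later state events overwrite earlier ones."""
    state: Dict[StateKey, Event] = {}
    for event in events:
        if is_state_event(event):
            state[(event["type"], event["state_key"])] = event
    return state


# Hypothetical room history mirroring the example: create, join, name,
# friend joins, someone says hi, friend leaves.
history: List[Event] = [
    {"type": "m.room.create", "state_key": "", "content": {}},
    {"type": "m.room.member", "state_key": "@you:example.org",
     "content": {"membership": "join"}},
    {"type": "m.room.name", "state_key": "", "content": {"name": "My room"}},
    {"type": "m.room.member", "state_key": "@friend:example.org",
     "content": {"membership": "join"}},
    {"type": "m.room.message", "content": {"msgtype": "m.text", "body": "hi"}},
    {"type": "m.room.member", "state_key": "@friend:example.org",
     "content": {"membership": "leave"}},
]

state = current_state(history)
historical_count = sum(1 for e in history if is_state_event(e))
print(len(state), "current state events out of", historical_count, "sent")
```

The friend's join is overwritten by their leave in the current state, but both events stay in the history, and the message never enters the state at all because it has no state key.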
And you can see how we're starting to build up this graph on the right, and the current room state on the left. And you can see that the membership of your friend, which used to be join, is now updated to leave. But what's interesting in this very contrived example is that we now have two state events that effectively cancel each other out. Because one was for a join, and then one was for a leave. Yet these two are always going to remain in the DAG. DAG stands for directed acyclic graph. That's what we're starting to build here. And because this is an append-only data structure, this is just going to stay in here forever. But whatever, I'm sure it's fine. Like, surely things cannot get out of hand.

So now we have a problem. These graphs can get really long. For example, Matrix HQ, or headquarters, which is one of the largest rooms on Matrix, currently has about 230,000 state events in it. And that's all the historical and current state events. If we cut out the historical ones, if we just take the current state events, which would be sent down to a client when it first joins the room, we still get about 100,000 events, which is quite a lot of JSON to both download and parse. If you download it, it's about 24 megabytes of JSON. And that's just one room. So you can see why sometimes it takes a while to log in.

There are various things we try to do on the server side that end up iterating over this entire structure. And that can be pretty slow to process. Things like state resolution, or when your home server is joining a room for the first time. This also ends up taking quite a lot of disk space. We also need to generate indexes in the database on top of it and some other tables to traverse through this quickly. So you end up with a large Postgres database. Hopefully you're using Postgres. And clients, obviously, have to download all of this data because they need to know how to render the room in their UI.
And we also want to add some features on top of state events. They have quite a useful property, because when you get into a room, you kind of have to know the state. So you can put things in there that clients need to know. That's great, but if we want to build these features and we can't clean them up afterwards, especially if we allow anyone to send a state event so that they can share their live location or something, if someone comes in and just spams a whole bunch of that and you can't delete that spam, well, that's pretty crap.

So how do we address this? Well, we've tried to deal with it in the past in various ways, but they've mostly been workarounds. We did have the faster room joins project, which ran for a few months. And actually, I can explain that quickly because I think it gives a good overview of the issue. So basically, you have server A and it wants to join a room that servers B, C and D are kind of hanging out in. To do this, if it wants to join through B, B would have to send over all of the state, including the historical state, to server A. And that would just make both servers really sad, because server B has to pull all of those events out of its database and then into its memory, and then it has to chuck all of that over the wire, and server A would get this as like a 300 megabyte JSON blob, and then it would have to hold all of that in memory and parse it, and this would take literal minutes. And in the meantime, little Billy is over here and he's really sad because he's waiting this whole time with no feedback, and the reverse proxy probably timed him out again with some unrelated error, and the whole thing was just awful.

So we solved this by only sending over the state events that you really needed in order to join the room, which basically amounted to leaving out all of the membership events, which brought it down to I think about 30 state events in the end. So significant reduction, great.
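As a rough illustration of that reduction, the Python sketch below filters a room's current state down to everything except membership events. This is only a toy model of the idea described in the talk, not how Synapse implements faster room joins; the function name `partial_join_state` and the generated user IDs are made up for the example.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def partial_join_state(state_events: List[Event]) -> List[Event]:
    """Keep only the non-membership state for the initial join response."""
    return [e for e in state_events if e["type"] != "m.room.member"]


# A hypothetical large room: a few room-level state events plus many members.
room_state: List[Event] = [
    {"type": "m.room.create", "state_key": "", "content": {}},
    {"type": "m.room.name", "state_key": "", "content": {"name": "Big room"}},
    {"type": "m.room.join_rules", "state_key": "",
     "content": {"join_rule": "public"}},
]
room_state += [
    {"type": "m.room.member", "state_key": f"@user{i}:example.org",
     "content": {"membership": "join"}}
    for i in range(1000)
]

initial = partial_join_state(room_state)
deferred = len(room_state) - len(initial)
print(f"send {len(initial)} events now, {deferred} later in the background")
```

The deferred events are not gone in this model. They still have to be transferred and processed eventually, which is why the approach spreads the cost out rather than removing it.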
And then once you were joined in the room, in the background, the rest of the state would still have to be shipped over, but you're kind of kicking this can down the road, and at least you can look at the room, but maybe you can't send messages yet. This sort of worked, but it doesn't really solve the disk problem or the CPU problem, it's just sort of spreading that problem out over a longer period of time. But it would just be really nice if we could just delete some of this stuff, maybe. Like when we look at a room, there's really not a lot of state here, right? Like a lot of this is historical churn of people leaving and joining over time, or old room names. Can't we do away with a lot of that?

Well, in the summer of 2022, an Element employee named Andy Balaam got some folks from the backend team at Element together and he said, yeah, I want to figure out deleting state, guys. And some other people said, sure. And things were set in motion. The group met every week or two to discuss how this could possibly be pulled off.

The first idea that came out of this group was an idea called epochs. What are epochs? Well, I won't go into too much detail here because we don't have a lot of time. Oh my God, we really don't. But essentially, you'd basically have a point in the room where you'd say, this is the state of the room and you can forget anything before me. Which sounded nice in theory, but there were really a lot of what-ifs and edge cases. Like, which home server actually creates this epoch? And whichever does, do they have a complete copy of the state? Like what if some state just shows up magically out of the blue afterwards, because they weren't online for a little while? Could that be an attack vector? If you just said, oh, you missed this, by the way. And it got really complicated really quickly. There were some other concepts.
There was something called generations, which would link together different related groups of state events, and they would link back to a parent and then its parent. You'd have checkpoints, but those are pretty similar to epochs. And then you would have state roots, which was, you would basically have the common ancestor of all of the recent state events, you would deduce that. And then with that, servers would only need to know the state root and the current state event and the room state at the state root to get a complete picture of the state. So you're kind of cutting out some of the state, but not a lot. And again, there's a lot of edge cases.

We looked at some other systems to see if anyone else had figured this out, like Git and blockchains. But we found in Git that you can delete old commits, but obviously then you're losing the history between them. You can actually separate these old commits out into a history repo and then merge them together again later if you need to, but in Matrix we kind of need to do that all the time to validate a new event, and those get sent often. So that separation doesn't really grant you anything, except more overhead.

And then with the blockchain, you can't really delete data. Each block contains a set of transactions and they build on top of each other. The problem was sort of addressed with light nodes in a lot of blockchains, but essentially the light node would run on your computer and it would just download metadata about each block, and that would be a subset of the full block, and you would download that from full nodes, but you're really starting to erode the whole decentralization thing here with that.

So after all this research, we started to realize that to really delete history, you have to start a new graph. And eventually you might realize that this starts to look and sound like another concept we have in Matrix. Room upgrades? They kind of work, right? They exist.
Okay, to give a brief overview of what room upgrades are, when you create a room in Matrix, it has a version. That version describes how certain things behave in a room. For instance, in room version eight, we introduced the concept of restricted rooms, restricted join rules, and that allows someone to join a room if they're already a member of another room, which is kind of useful for like, you know, hackspaces. If you have one room that everybody's in and you create another one, everyone can already join that, but if you're not in the hackspace room, you're some outsider, you can't join it. So some nice hierarchy stuff there.

And if you wanted to gain this functionality in your room that already existed, you need to perform a room upgrade. And a room upgrade copies some of the state, there's an allow list, from the old room that you had into this new room. So the room name would get copied over, the room avatar, whether the room was encrypted or not, which is also a state event, and it would all get plonked in this new room that is room version eight, and it's kind of like your old room. It also puts an m.room.tombstone state event in the old room, which would tell clients, hey, this is an old room, you shouldn't use it anymore. Here is a link to the new room, please click that.

And what this would look like in Element was some confidence from Tulier there. You would basically get this link at the bottom that would tell you something about the room having been replaced, and you would click this, which doesn't have an underline, which isn't good for accessibility. And yes, some clients do this a bit nicer, but it's still all a little confusing. Like no other apps have this sort of mechanism going on in their chat, right? And users have to manually click the link in most cases.
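The upgrade mechanism described above can be sketched roughly as follows in Python. The `replacement_room` field of the tombstone and the `predecessor` field of the create event are part of the existing room upgrade mechanism; the helper `upgrade_room`, the room and event IDs, and the shortened allow list are illustrative assumptions, not the spec's full recommended list or any server's actual code.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]

# Illustrative subset of state types that get copied on upgrade (assumption:
# the real recommended list is longer).
UPGRADE_ALLOW_LIST = {
    "m.room.name", "m.room.avatar", "m.room.topic", "m.room.encryption",
}


def upgrade_room(old_room_id: str, old_state: List[Event], new_room_id: str,
                 new_version: str, last_event_id: str
                 ) -> Tuple[List[Event], Event]:
    """Return (initial state of the new room, tombstone for the old room)."""
    create: Event = {
        "type": "m.room.create", "state_key": "",
        "content": {
            "room_version": new_version,
            # Points back at the room being replaced.
            "predecessor": {"room_id": old_room_id, "event_id": last_event_id},
        },
    }
    copied = [
        {"type": e["type"], "state_key": e["state_key"], "content": e["content"]}
        for e in old_state if e["type"] in UPGRADE_ALLOW_LIST
    ]
    tombstone: Event = {
        "type": "m.room.tombstone", "state_key": "",
        "content": {"body": "This room has been replaced",
                    "replacement_room": new_room_id},
    }
    return [create] + copied, tombstone


old_state: List[Event] = [
    {"type": "m.room.name", "state_key": "", "content": {"name": "Hackspace"}},
    {"type": "m.room.encryption", "state_key": "",
     "content": {"algorithm": "m.megolm.v1.aes-sha2"}},
    # Not on the allow list above, so it is silently left behind.
    {"type": "m.space.parent", "state_key": "!space:example.org",
     "content": {"via": ["example.org"]}},
]

new_state, tombstone = upgrade_room(
    "!old:example.org", old_state, "!new:example.org", "8", "$last_event")
print([e["type"] for e in new_state])
```

In this sketch the space relationship does not make it into the new room, which is the kind of state that an allow-list approach tends to leave behind.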
Some clients will try to automatically invite users from the old room to the new room, but then you have to first notice that invite, which also isn't great in Element, and accept it manually and blah, blah, blah, blah. So there's a lot of UX problems, along with the fact that we don't copy over that much state from the old room. For instance, we don't even copy over the fact that that room is in a space. So when you upgrade a room, it will just fall out of the space. Ideally, this would be a lot more seamless, and we've been wanting to work on this for a while. So what can we do here? How can we eat two cakes with one spoon? Or whatever the expression is.

MSC3901. Ooh, what does that mean? Well, MSC3901 refers to pull request number 3901 on the matrix-spec-proposals repo on GitHub. MSC stands for Matrix Spec Change, and if we click on this rendered link here, oh goodness, that is a very long document. Well, in a nutshell, here are the concepts that this draft MSC introduces.

Concept one. Which state should we delete? Because we don't want to delete everything, obviously, like the room name. Also, for the purposes of this, we're going to ignore historical state because room upgrades basically solve that for us. They get rid of it for us for free. But what we really care about here is what current state we can delete, and there's still a lot of that, as we saw in the Matrix HQ example. And let's give the state that we want to delete a formal name, like let's call it obsolete state. Yeah, that's great. Or obsolete state events.

What type of events do we consider obsolete? Or rather, which events would people no longer care about? Okay, leave membership events. If someone's left a room or they've been kicked from it, we have to update their membership to say it is leave. This is different from ban. We definitely still care that someone is banned. But if they've just left the room, that's the same as if they'd never really joined it at all. So we can call that obsolete.
If a state event was redacted, which doesn't happen often, but it does happen, that means it was stripped of all of its content, but it still exists in the DAG taking up space. And someone intended for that to be deleted. For instance, if someone spammed a whole bunch of state events and you want to get rid of that spam, you might redact all of those state events. Well, they might no longer show in the timeline, but they're still in the DAG. So yeah, I'll do questions at the end, because we've only got like eight minutes. So let's get rid of those.

And other events, well, like I talked about, we have the live location sharing stuff, which is pretty cool. You share your location, you say, hey, I'm going to start sharing my location, and that's done in a state event, and the reason for that is that when clients enter the room, they don't have to search through a million messages just to check if anyone shared their location. They have it right in the state. And that allows anyone to be able to send these state events. So we need some sort of mechanism to allow future features like this to say, yes, I no longer care about this event. You can delete it if we upgrade the room. So for this, we could define a top-level property that's m.obsolete, so that when that is set to true, we consider it obsolete. Simple enough. And anything can build on top of this. So if you're writing a new feature, you can say, I use this field from this MSC, and it will all just work.

Okay, so that's a definition of the state events that we want to delete, right? Okay, good, we need that. Now, what can we do with this definition? Well, one thing we can do is not send that to clients. Well, great. Specifically, when a client is joining the room or they're just logging in for the first time, they need to sync all of that state. We defined obsolete state as state we no longer care about. So there's no need to send this to clients.
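Put together, the definition of obsolete state and the "don't send it to new clients" rule might look something like the Python sketch below. It is only a sketch of the draft idea: where exactly `m.obsolete` lives is an assumption (it is read from the event content here), redaction is detected via the `redacted_because` entry that clients see under `unsigned`, and the `org.example.live_location` event type is invented for the example.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def is_obsolete(event: Event) -> bool:
    content = event.get("content", {})
    # 1. Leave memberships: equivalent to never having joined. Bans are kept.
    if event["type"] == "m.room.member" and content.get("membership") == "leave":
        return True
    # 2. Redacted state events: someone already asked for these to go away.
    if "redacted_because" in event.get("unsigned", {}):
        return True
    # 3. Anything a feature has explicitly flagged (assumed location: content).
    return content.get("m.obsolete") is True


def state_for_initial_sync(current_state: List[Event]) -> List[Event]:
    """State to send to a client that knows nothing about the room yet."""
    return [e for e in current_state if not is_obsolete(e)]


current_state: List[Event] = [
    {"type": "m.room.name", "state_key": "", "content": {"name": "HQ"}},
    {"type": "m.room.member", "state_key": "@a:example.org",
     "content": {"membership": "join"}},
    {"type": "m.room.member", "state_key": "@b:example.org",
     "content": {"membership": "leave"}},
    {"type": "m.room.member", "state_key": "@c:example.org",
     "content": {"membership": "ban"}},
    {"type": "org.example.spam", "state_key": "x", "content": {},
     "unsigned": {"redacted_because": {"type": "m.room.redaction"}}},
    {"type": "org.example.live_location", "state_key": "@a:example.org",
     "content": {"live": False, "m.obsolete": True}},
]

to_send = state_for_initial_sync(current_state)
print([(e["type"], e["state_key"]) for e in to_send])
```

Only the room name, the join and the ban survive the filter in this example; the leave, the redacted spam and the ended location share are withheld.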
The one case where we do need to send it to clients, obviously, is if they already know about some state and it wasn't obsolete, and this state has become obsolete. We need to tell them that. So that is when we would send it down to clients, and that could be a signal for them to delete it locally if they want. And that would cut down on a huge amount of these leave events that I showed in the example earlier. Live location shares that ended, or anything else the client might not care about. The client could choose to optionally delete it. For instance, you might not see in the timeline history anymore that someone's leaving a room. Some clients, well, Element probably won't do that, but maybe some embedded clients, yeah, they don't care, they'll throw that away.

So that's one thing we can do, and that brings us to concept three, leaving behind old state on the home server. We've got a definition for what state events we want to forget, and now we're trying to not send that to clients whenever we can. The final step is to actually forget that stuff on the home server, and how do we do that? Right now, room upgrades are really clunky. So how can we fix that?

Well, let's go back to that inviting problem, right? Trying to actually make sure everybody gets to the new room. This doesn't happen automatically today. You might say, oh, let's just send an invite to all of the users, yeah, problem solved, but there's a bunch of edge cases, like what if the user's on a dead home server? And it's not responding right now. You know, you went off to GPN and your home server died and you're like, ah, it's fine, I'll just fix it when I get home. But what if someone upgrades the room in the meantime? You know, you're not going to get that invite. Or what if the user gets the invite and they're like, oh, I don't know who that is, reject. Oh, I needed that. Oops. We also have rate limiting for invites, obviously.
What if you have 2,000 users in a room and your client has to invite all those people? The experience is just going to be a mess and your home server is going to say, slow down. You can do 10 a minute. And obviously the reason we have this rate limiting is so users don't spam everyone. But now we have a legitimate use case for all of those invites. So ideally, we would make all of this automated and different from normal invites. Users wouldn't need to click accept and rate limiting would not get upset at us.

How do we do that? Well, the thing that MSC3901 proposes is to differentiate these invites from regular ones by adding a new optional field to invites. And this new field will contain the event ID of the upgraded room's create event. The home server can then look at that event, which it should receive as part of the stripped state that gets sent in the invite. So they'll see that create event and they'll see that there's a predecessor field in there which actually points to the old room, to the event ID of the tombstone in the old room, which should point back to the create event, and it can just say, OK, yes, these point to each other. This is valid. Someone has sent a tombstone in the old room. They had admin and it points to this new one. This invite is from that new room. OK, someone's not trying to mislead us here.

And you do this validation step when you receive the invite. And if that all checks out, then you can automatically accept that on behalf of the user. And if it checks out, you could also exclude this invite from rate limiting. You still probably want to do a little bit of rate limiting because it takes some CPU to check this, but that's easy enough to mitigate. So if that works, that's great. And that's what the MSC currently says. It still doesn't really solve the problem of what if the user is on a dead home server though, because you're still doing this kind of two-way handshake where you send an invite and then it needs to be accepted.
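The cross-check described above could be sketched like this in Python. Several things here are assumptions made for illustration: the invite field name `org.example.upgrade_create_event` is a placeholder, since the talk does not give the real one; the create event in the stripped state is assumed to carry its event ID; and the tombstone is assumed to have already passed the old room's auth rules, which is what covers the "they had admin" part.

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]

# Placeholder name for the new optional invite field (hypothetical).
UPGRADE_FIELD = "org.example.upgrade_create_event"


def is_valid_upgrade_invite(invite: Event, stripped_state: List[Event],
                            old_room_tombstone: Optional[Event]) -> bool:
    """True if the invite provably comes from the room that replaced a room
    we know about, so it can be auto-accepted on the user's behalf."""
    create_id = invite["content"].get(UPGRADE_FIELD)
    if create_id is None or old_room_tombstone is None:
        return False  # An ordinary invite, or we never saw a tombstone.
    create = next(
        (e for e in stripped_state
         if e["type"] == "m.room.create" and e.get("event_id") == create_id),
        None,
    )
    if create is None:
        return False
    predecessor = create["content"].get("predecessor", {})
    # New room -> old room: the create event names the tombstone.
    points_back = (
        predecessor.get("room_id") == old_room_tombstone["room_id"]
        and predecessor.get("event_id") == old_room_tombstone["event_id"]
    )
    # Old room -> new room: the tombstone names the inviting room.
    points_forward = (
        old_room_tombstone["content"].get("replacement_room") == invite["room_id"]
    )
    return points_back and points_forward


tombstone = {"type": "m.room.tombstone", "state_key": "",
             "room_id": "!old:example.org", "event_id": "$tombstone",
             "content": {"replacement_room": "!new:example.org"}}
create = {"type": "m.room.create", "state_key": "", "event_id": "$create",
          "content": {"predecessor": {"room_id": "!old:example.org",
                                      "event_id": "$tombstone"}}}
invite = {"type": "m.room.member", "state_key": "@you:example.org",
          "room_id": "!new:example.org",
          "content": {"membership": "invite", UPGRADE_FIELD: "$create"}}
fake_invite = dict(invite, room_id="!evil:example.org")

print(is_valid_upgrade_invite(invite, [create], tombstone))
print(is_valid_upgrade_invite(fake_invite, [create], tombstone))
```

An invite that passes this check could then be accepted automatically and exempted from the usual invite rate limit, while anything that fails is treated as a normal invite.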
So there's currently an ongoing discussion on the MSC for an alternative mechanism where you just treat the ability to see the fact that there's a tombstone in the old room as a free pass to join the new upgraded room. Someone's nodding their head. Yes, I like that. And this would be kind of similar to restricted rooms in a way, where users are allowed to join a new room if they're a member of another room. And that would allow home servers to join whenever they come back online in the future, which effectively takes this two-way handshake down to a single action. At the moment though, that method is still being worked out, but the discussion is on the MSC if you want to check it out.

Okay, so we've made it easier for people to join the new room. Users should no longer need to manually accept invites, and clients don't have to expose as much of this machinery to users, who might find it, oh, another strange thing in Matrix.

So what else makes room upgrades hard? Well, we have this recommended list in the spec of types that you should copy over, and anything else, just forget about it. So, yeah, like the room name, the avatar, sure. But this list actually misses quite a lot and hasn't been updated for like a year or two. Yeah, for instance, spaces, and just all sorts of things that are getting left in this old room. So let's change that up a little bit. Okay, so instead of this allow list, yes, let's just copy everything that's not marked as obsolete. And that's using our new definition, that's great. And everything else that's not obsolete can be copied over. There are a couple of exceptions though. If you don't want to copy some arbitrary state, you can put in a new field called exclude from upgrade. Okay, someone will probably want to use that, let's include it. And also to be clear, the person who's performing this upgrade is the one who's going to be sending all of that new state into the new room. So they need to be able to do that. And sometimes they can't do that.
For instance, the event auth rules in Matrix say that if you are sending a state event that has a state key of some username, you are not allowed to send that unless you are that username. So that makes it really hard to send these, we call them user-scoped events, into the new room. The permissions model currently does not allow for that. So that's still something we're trying to work out.

And finally, room IDs. Okay, this one's a bit weird, but stay with me. Every room in Matrix has a unique identifier, and that's how you can tell them apart from each other at the API layer. And this can be a problem when you're upgrading a room. For instance, if you've told a bot like, hey, you should do something with all of these room IDs. When you upgrade a room, obviously this room now has a new ID and the bot's not going to know about it. It's also an issue with application services, which are basically privileged plugins for your home server. They have a list of room IDs which they might want to receive traffic from. You have to update that too, and that often requires restarting your whole home server, and that's awful. Third-party IDs, space hierarchies, user IDs, push rules, which is how you get notifications, all that gets reset on room upgrades. I mean, we try to copy over some of this manually on the home server, but it's really piecemeal. So there's just a tendency for things to get left behind.

So how could we make this better? Well, you've probably already read the title of the slide. When a room is upgraded, that new room will have the same room ID as the old room. How does that work? Well, each room can still be directly identified by a monotonically increasing integer, one, two, three, four, that we call an iteration. And we store this in the create event of the room, perhaps under a new iteration field. Storing it in the create event prevents it from ever being modified, and all the home servers in the room can see it, and they can agree on it.
And when you upgrade the room, the new room will have the same room ID, but the iteration will be the old room's iteration plus one. If it's not there, maybe it starts at zero by default. And the advantage of this system is that clients don't really need to worry about the iteration too much. Anytime a client or a home server references a room ID, the home server will just translate that to the latest iteration of the room that it knows about. And obviously if clients do want to reference an old room, they could use a new format, maybe they put a slash at the end or something to say this is the iteration I want, because you might want to leave the old room if you don't care about it as a client, and to do that you need to be able to reference it.

Right. Well, that's a lot of information. What exactly are the disadvantages to this whole thing? Or the problems we haven't solved yet? If you have an event in the old room that you want to reply to in the new room, or react to, you can't do that because references don't go across rooms. Same with threads. If you have a thread in the old room and you're now in this new room, you cannot reply to it because you can't talk in the old room anymore in most circumstances. So that's kind of a trade-off of having this hard cut between generations. We're still looking for ways to solve this one. You could solve this with nice UX in the client to say, hey, yeah, we know, sorry, but no, you can't continue this thread. But, you know, it's not ideal. We could provide some exceptions to the permissions model here, or maybe we make it so you can still talk in the old room, but only with references, something like that. But that is something that we need to figure out.

Obviously there's the user-scoped state problem I mentioned earlier. Something like this: if you're currently doing a live location share and then you upgrade the room, well, you can't copy over that state event, so it's like the live location just ends, so that's a bit jarring.
So we might need to make an exception for that. The spamming of invites that I mentioned earlier, we could solve this with bulk invites, where you invite a whole bunch of users at once. So that would cut down at least on the federation traffic, but there are definitely questions about spam with that. Or something else that we haven't yet thought of.

So a lot of this is still in the design phase, but I wanted to put it out in a talk so people could actually get an idea about what is going on with it. So if you're interested in helping us figure this problem out, obviously you can leave your thoughts on this proposal document. You don't need to be anyone special, you just need a GitHub account. And you can comment on the document, just please use a thread when you do so, which keeps the discussions organized. And yeah, obviously you can come and talk to me. That's my Matrix ID. You can DM me, or you can find me at the event. Thank you.

Okay, so let's do a little Q&A. We've got 10 minutes. No, we have quite some time, actually. Oh, yeah. We've got 25 minutes. Well, please wait for me to come to you with the mic, so we have the question on the recording. And yeah, if you're ready, then let's go. Delicious water. I'll just start over here.

What happens when there are concurrent room upgrades? Like I'd imagine there would be two rooms with the same ID, and wouldn't that be quite a big problem?

Yeah, we have thought about that. Like, currently that is a problem, right? And in some cases it's nice that you can upgrade a room multiple times, because maybe you messed up with one upgrade, or it was manually done and it's pointing somewhere you don't want it to be pointing to.
But it would be nice if there was some sort of relation between those tombstones, to say: yes, you can send another one, but you have to say it replaces this old tombstone. Then if a home server gets two of those, it would just do state resolution to figure out which is the right one, and the other one would be rejected. State resolution should make that deterministic across all of the home servers in the room, so they should all reach the same conclusion. So yeah, you would be able to send another tombstone, but the one that was later would clearly be later, and it would need to have another iteration. Or maybe, if it's replacing it, it could have the same iteration? I'm not sure. But yes, if you reference between the two tombstones, we should be able to figure out a clear upgrade path. If that makes sense.
So first off, this doesn't solve the storage issue, right? Since you still have to keep around the old room with all the old state. It means you don't have to tell everyone about it, but you still need to store it on your home server, right?
You can keep it around or you can purge it. Obviously some of your users might care about the fact that they can go in and see the old history, especially if we're tying it together nicely in the client. But this does give you the ability to go ahead and purge that old room if you like, without it being tied to a new room that's currently active. Just as today you can leave a room and then purge it, and it still exists out there on the network, you could purge that old room. You might want to configure your home server somehow to do that automatically, or maybe the room admin goes, this room is huge, yes, I'm going to delete this manually. But you no longer need to act on that room, so it's okay if you purge it. But yes, you will be losing history. Or maybe you just keep the history somehow and purge the room.
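Going back to the concurrent-upgrade answer above, the idea of a tombstone that explicitly replaces an older one could be sketched roughly as follows. This is only an illustration under stated assumptions: the `replaces` field, the `Tombstone` class and `effective_tombstone` are hypothetical names, not anything from the proposal, and the final tie-break on event ID is a stand-in for real state resolution, which is considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tombstone:
    event_id: str
    replacement_room: str
    # Hypothetical field: the event ID of the tombstone this one supersedes.
    replaces: Optional[str] = None


def effective_tombstone(tombstones: List[Tombstone]) -> Tombstone:
    """Pick the tombstone that no other tombstone claims to replace."""
    replaced = {t.replaces for t in tombstones if t.replaces is not None}
    heads = [t for t in tombstones if t.event_id not in replaced]
    if not heads:
        raise ValueError("no usable tombstone (empty input or replacement cycle)")
    # Stand-in for state resolution: any rule works here as long as every
    # home server applies the same one and so reaches the same answer.
    return min(heads, key=lambda t: t.event_id)


first = Tombstone("$a", "!new1:example.org")
second = Tombstone("$b", "!new2:example.org", replaces="$a")
print(effective_tombstone([second, first]).replacement_room)
```

Because the result depends only on the set of tombstones and not on the order in which they arrive, two home servers that have seen the same tombstones pick the same upgrade path.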
But if you have, like, a home server that's coming back after a long time and wants to check in on a room, it would eventually see the tombstone, act on that and do the upgrade automatically. What if it can't find that tombstone anywhere?
Yeah, then they won't be able to move to the new room. There might be a way to just say, hey, what's the latest iteration of this room? If they're all using the same room ID, it would be easy to find the upgraded room, because they could just ask, how do I find this room? But they might not be able to find the tombstone again, and then I guess they wouldn't be able to validate that upgrade.
Well, but they could just join as a new user rather than doing this automatic invite thing.
Yeah, I mean, whenever you delete an old room in Matrix, you are deleting the ability for other home servers to go and pull historical state after that. So that is a trade-off you're making. But you should still be able to join the new room. Maybe that won't happen automatically, though.
Yeah, so that was also my question. Would other servers be able to pull that obsolete state even though they were told not to? So, like, for archiving purposes, or for analysis if something happened, like if there's a malicious event and you want to go and deep dive into it?
Yeah, exactly. Which is why we tend not to purge old rooms on matrix.org, even though it takes up so much space, because we understand that a lot of people do actually backfill from matrix.org. So we kind of do want it to be the archive of everything. We just launched archive.matrix.org where you can check it out, yeah, it's very lovely. But obviously if you're a small home server just running on a Raspberry Pi or something, you know, you might not want to be an archive for the whole world for Matrix HQ. There are probably other people keeping it, so you can delete it.
And all that would require a new room version, right?
As in, okay, well, there's two things here.
So all of this stuff, yes, this does require a new room version, because we are changing the redaction rules. But maybe what you're asking is whether it requires upgrading the room. In both cases, yes.
Okay, I think we have another question over here. One second.
In the beginning you defined this problem that the directed graph gets too large. This is why we're large. There's a complexity to it, obviously. Do you reduce the complexity with your proposals? If you don't do that, then it's not worth it.
Yeah, so the question was, if you have this massive DAG that is complex to go and fetch everything inside of, and you need it to validate new things that are being added to it, how do we actually solve that problem here? The answer is that, well, essentially, when we upgrade to a new room you get a new DAG. You still have the old DAG, though, but you don't need to iterate or act on that old DAG anymore. It kind of just sits there as an archive, because you're no longer appending to that old DAG. You append the tombstone once, yeah. But once you're talking and everything in this new room, that is just running over this new DAG, which is nice and efficient again. So yes, you have this old one sitting around, but it's just read-only. And that's cheap and easy to do.
I think I meant something different. There's a complexity to doing it over and over, or whatever. You have an algorithm and it goes haywire because it's too complex. And does this reduce the complexity? You're cutting or deleting some stuff, but cutting and deleting is linear and the directed graph goes, like, exponential.
Okay, so it is a DAG. It is a directed acyclic graph. So it's not quite a linear graph. It does split out. But it then reconverges again, and we attempt to not let that split get more than 10 events wide at a time by sending dummy events.
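To make the point about width a little more concrete, here is a minimal sketch of how a dummy event can pull a wide DAG back together. It assumes a toy representation where the DAG is a dictionary from event ID to the list of events it points back at; the names `forward_extremities` and `maybe_add_dummy` and the `$dummy` ID are made up for illustration, and only the limit of 10 comes from the talk.

```python
from typing import Dict, List, Set

MAX_WIDTH = 10  # the width limit mentioned in the talk


def forward_extremities(dag: Dict[str, List[str]]) -> Set[str]:
    """Events that no other event points back at, i.e. the current tips."""
    referenced = {prev for prevs in dag.values() for prev in prevs}
    return set(dag) - referenced


def maybe_add_dummy(dag: Dict[str, List[str]], dummy_id: str = "$dummy") -> bool:
    """If the DAG has fanned out too far, add one event that joins all tips."""
    tips = forward_extremities(dag)
    if len(tips) <= MAX_WIDTH:
        return False
    # The dummy event carries no content; it only references every tip so
    # that the next real event has a single parent to build on.
    dag[dummy_id] = sorted(tips)
    return True


dag = {"$root": []}
for i in range(12):
    dag[f"$msg{i}"] = ["$root"]  # twelve events sent concurrently

print(len(forward_extremities(dag)))
maybe_add_dummy(dag)
print(forward_extremities(dag))
```

After the dummy event is added, the twelve concurrent tips collapse into one, so the next event only has to reference a single predecessor.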
So yeah, we're obviously reducing a little bit of the complexity for the clients, because we can say, hey, clients, for the current state, you don't really need to care about these sorts of events. So in that case, yes, the clients can parse less. But the home servers still need to keep all of it around until you do this room upgrade, and then they only need to act on this new room, which has a nice clean DAG. And maybe in the future, if it really was seamless, this could happen automatically, so the home server just does it in the background and it's transparent to clients. That would be the dream. For now, it'll probably take the form of some sort of, hey, your room's getting really slow. Do you want to garbage collect it?
Okay, I see another question in the back. Let me quickly go over here.
How does this play together with Sliding Sync? Isn't that already a project to simplify the data that is sent to clients?
Yeah, it's a good question. So Sliding Sync does help with the fact that you only need to request certain data. For instance, it helps a lot with, oh, I have 10,000 rooms that I'm in because I love Matrix, but I really don't care about 9,900 of them right now. I'm just looking at these top ones from my friends. Sliding Sync allows the clients to only request a small window of the data from the home server, but even if you're just looking at 10 rooms, you still need to know the room name and who's in the room and things like that. So the state still ends up being a large chunk. For instance, if you're looking at the member list, you only see 20 people, so you only need to request 20 things, which is great, but you still need to pull down things like old live locations and things like that. Effectively, Sliding Sync is really great for reducing the amount of state that clients need to see, and we help a little bit with that in this case.
The real thing, though, is the fact that the home server no longer needs to act on this massive amount of state. Obviously the home server is the one holding that, so we can't really make that more efficient by sending less state between home servers. As we saw with the Faster Room Joins project, you still need to get all of that state at some point to be able to validate new events coming in. But yeah, Sliding Sync does help massively in this case. We just define it as: don't send this obsolete state to clients, because, yeah, it makes sense to do that.
Okay, looks like all questions are resolved. Just kidding. No, there's another one, great.
Since we still have time: about the redactions, those are the same thing as message redactions, i.e. what a normal user would call a deleted message. Are they the same thing?
Yeah, they are. When you redact an event, there is essentially an algorithm that says which keys in that event you can keep, and everything else you should wipe. For instance, obviously you still want to know who sent that event, and you do want the signatures and everything, but you want to get rid of the actual message data. It's similar for state events. You would get rid of a lot of the stuff in the content, like the room name would just become empty.
But non-state events, when you upgrade a room? Very simply, I deleted a message, but I can still see who said something at that point. And if I upgrade the room, is that going to be marked as obsolete and not kept in the new room version?
Yeah, so when I said we're not going to carry over obsolete stuff, and then I also said that redacted stuff is obsolete, all of that refers to just state events. When you upgrade a room, none of the non-state events get carried over. Messages stay in the old room. This is why it's still important that clients can reference the old room, because obviously, like you said, you might want to go back and read some of that or search through some of it or something.
But yes, all of the non-state events will stay in the old room. State events that are not obsolete will come to the new room, but that's a copy. So we're just resending the exact same state events. They'll get new event IDs in the new room. Does that make sense?
Excellent. So, more questions. Anyone? Okay, great.
If you imagine, let's say, a moderately extreme case of having a room and splitting the servers in the room in half and saying that they can't communicate with each other, essentially an ongoing network split. And let's say that goes on for a week, and on both sides they would actually do a room upgrade, because they both feel that the room is slow and needs garbage collection. Then you would end up with essentially two rooms which have the same revision but which were created by different people. And if that week is over and the network split is finally resolved, I wonder what would happen then: whether one version of the room would be lost, or whether they would actually be able to reconcile the room into one, or if they can't reconcile and live in their parallel worlds forever.
Oh my God, it's a split of the blockchain. Yeah, so you would end up with two rooms that both have the same iteration, and at some point you would hopefully heal the network together, and both home servers would see in the old room: oh, okay, yes, there are two tombstones leading to two separate rooms. So everyone would be made aware of this situation. Maybe someone could then go ahead and copy over state from one to the other. They could say, oh, this room ID is alphabetically higher than the other one. Let's keep this.
But what happens to the messages?
Yeah, so I don't know how you would copy the messages over, really, unless you resent everything. I think you would have to pick. It's just like a split brain in any sense, where some of the history would just end up being thrown away.
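The detection step in that answer, where everyone notices two tombstones pointing at two different rooms of the same iteration, could be sketched like this. Everything here is hypothetical: tombstones are reduced to `(iteration, successor)` pairs, `find_forks` and `pick_survivor` are illustrative names, and the "alphabetically higher wins" rule is just the off-the-cuff suggestion from the answer, not a decided mechanism.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_forks(tombstones: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group tombstones by iteration and report iterations with >1 successor."""
    successors = defaultdict(set)
    for iteration, successor in tombstones:
        successors[iteration].add(successor)
    return {it: sorted(s) for it, s in successors.items() if len(s) > 1}


def pick_survivor(candidates: List[str]) -> str:
    """Deterministic pick: the alphabetically highest identifier is kept."""
    return max(candidates)


# Two halves of a partitioned room each upgraded to "iteration 1".
seen_after_heal = [
    (1, "!abc:left.example"),
    (1, "!xyz:right.example"),
]
forks = find_forks(seen_after_heal)
for iteration, rooms in forks.items():
    print(iteration, rooms, "keep:", pick_survivor(rooms))
```

Note that this only chooses which room survives; as the answer says, the messages sent in the other room are not merged by any step shown here.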
You could take the state and copy it over to the new room, and maybe override it or determine in some way which one gets kept. But I think in that case you would probably have to let go of the messages in one of the rooms, or they do continue on. At least you would be well set up to solve this problem, because everyone would be aware. As long as they haven't left the old room and deleted it. Oh, well then, how do they know that they can trust the fact that there was a fork that they'd only heard about a week later? Whoa, what a fun edge case. Yeah, I think this is alluded to currently in the MSC, saying that, oh, yes, there could be split brains when we upgrade. And it was sort of a similar problem with the epochs, sounds like Ewoks, where it was saying, oh, well, if I don't know about all the room state at the point that I made the epoch, would you make another epoch to replace that one? How would that work? I'm not sure. I think that would have to be cleaned up just by throwing away one of the rooms, though, unfortunately. But hopefully it would be enough of an edge case that it wouldn't happen too often. But yeah, if you have people that are doing this on submarines and they do disappear, then they just might have to be told that that's what happens, and maybe there's some custom solution to deal with it for them. Yeah, good question. We should write an integration test for it.
Okay. I have a question about, basically, the spam of invite events and rate limiting. In the case of unencrypted rooms, the admin of the room or the moderator of the room needs to trust their home server, because the home server obviously can act for the user. Can't we use that, basically, so that when the room is upgraded, the home server of the moderator could send those invite events itself, bypassing the need for the user to do that?
So it depends, because if one home server has upgraded the room, then if your home server is not in it yet, then obviously you can't send invite events for your own users. Right?
Yeah.
Okay. So this problem that you're talking about is that home servers can act on behalf of users, and they can just say, hey, you've invited someone now. Let them into this encrypted room or unencrypted room. And this is a problem that is addressed by another MSC, but isn't fixed yet. Basically, the client has to sign something to actually cryptographically verify that there's an invite coming in. But yes, what if we could exploit the fact that in unencrypted rooms we could just generate these invite events? You could do it for your own users and then accept them for your own users. And maybe you just automatically accept one invite, and then all the rest of them you generate yourself, and maybe that could cut down on federation traffic. But what if the home server doesn't support doing that? I don't know. Feel free to clarify.
I mean, even if you sign those requests or those invites, you can basically sign a passport for the server to do it itself, and then other home servers will see, hey, this is signed by one of the moderators, so I can forward the invite. So that's solvable.
Yeah, that's true. That is actually an interesting third approach to automating this. And I encourage you to write it on the MSC. Or I'll do it if you don't want to. But yeah, perhaps we could actually make the process a little bit more streamlined that way, rather than needing to do this validation over and over again every time. This is why we give talks about it and get feedback. Thank you.
Yeah, this looks pretty quiet now. Okay, does anybody else have any questions about this specific topic? Then I guess we are now finished. And please thank Andrew.