Hello, and welcome to my talk, Real-Time Multi-User State Management for the Cooperative Web. My name is Matt Hayes. I'm a senior software engineer at Wizards of the Coast, where I work on D&D Beyond. If you're a customer, thank you. You probably know my work from the game log: I was on the team that helped bring that thing to life. It's a WebSocket-based messaging channel that lets players share information about their characters, their dice rolls, and their campaigns within D&D Beyond.

Today I'm going to introduce a little app I built to demo an alternative set of APIs and a different kind of state strategy. I call the app SuperDuperFluity, a state management doohickey, and it's available on GitHub at kindness is the dankest meme slash SuperDuperFluity. It uses WebRTC, specifically WebRTC data channels, for transmitting actions between connected peers. It uses a WebSocket for signaling, which is part of the WebRTC protocol; we'll get into that a little later. And it uses a mechanism called rollback netcode, sometimes also called replay consistency, for state synchronization, which lets peers make optimistic updates to their local store and heal from getting out of step with the server, whenever that happens.

So before we go much further, let's look at what it looks like. Okay, here I have a local server running in the background. Don't worry, we'll come look at some code a little later. What it does is send events from one client to another. Looks like we've got a lingering pointer here. Oh, there it is. You can see that as I navigate between these different browsers, it's dispatching their pointers, and not only that, but I have my iPhone connected here, and an iPad. So it's pretty neat.

This uses WebRTC. It's a complicated API and I'm going to spend a bit of time talking about it, but it offers some really interesting performance characteristics, so please bear with me while we introduce a bunch of new concepts. The literature about WebRTC mostly talks about its use in audio/video or voice over IP; it's primarily geared toward that use case, which makes it hard to learn about when you're first exploring. It also introduces the concepts of a peer connection and a signaling channel. The metaphor I like to use is that the peer connection is a landline phone sitting on the wall in my hallway, and the signaling channel is the operator or switchboard. If I pick up my phone and dial a number, it connects to the signaling channel, which tries to find a pathway from my phone to whatever phone that number represents. Once someone picks up on the other end, we're talking. That whole process is referred to as negotiation in the WebRTC world, and the part where some peer and I are actually communicating with one another is an open data channel, in the SuperDuperFluity example and in my general scheme here. Here's a minimal sketch of those two pieces.
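This isn't the project's exact wiring, just the two objects in the metaphor: the "phone" (the peer connection) and the "switchboard" (a WebSocket used as the signaling channel). The URL is a placeholder, and the full offer/answer handshake comes later in the talk.

```js
const signaler = new WebSocket("wss://example.com/signal");
const peer = new RTCPeerConnection();

// Forward ICE candidates discovered locally to the other side...
peer.addEventListener("icecandidate", (event) => {
  if (event.candidate) {
    signaler.send(JSON.stringify({ candidate: event.candidate }));
  }
});

// ...and feed candidates relayed from the other side into the connection.
signaler.addEventListener("message", async (event) => {
  const { candidate } = JSON.parse(event.data);
  if (candidate) await peer.addIceCandidate(candidate);
});
```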
Also part of the WebRTC lingo is the notion of topologies, which is just a fancy word for the way peers are related to one another. The first and most basic setup you'll see is a peer-to-peer connection: just two computers on the internet that use the signaling channel to find a direct path to one another, so they can communicate without a central server doing any processing or relaying. From there, it's not hard to jump to the concept of a mesh network. This is sort of peer-to-peer-to-peer-to-peer. It's interesting, and it's the logical next step from a peer-to-peer connection, but for my use case here, the lack of a central authority makes conflict resolution between peers difficult to reason about. There's also an issue that's more specific to the audio/video use case, but still an issue: many ISPs are constrained on their uplink throughput. Ten-to-one download-to-upload ratios aren't uncommon, and if you're one of these peers with three or more outgoing HD video streams, you're probably consuming most of your upload bandwidth pretty immediately.

Finally, we have the SFU and MCU topologies: Selective Forwarding Unit and Multipoint Control Unit. In my little diagram here, the unit is rendered like a server, but nothing requires it to be a server; it could just be another peer on the network, a kind of supernode. Signaling is still involved in this process; I've just omitted it from the diagram because it gets a little busy with all the back and forth. The main thing is that signaling negotiates a communication channel between two peers, and the central peer relays information among all the other connected peers. The differences between SFU and MCU are a little specific to audio and video, but generally: in the SFU architecture, the central node is just a relay, rebroadcasting messages to all connected peers. In the MCU architecture, the central unit does some kind of interleaving. Imagine a Zoom call with 100 participants: the central unit would take all their video streams, stitch them together into a single stream with a single codec, and then transmit that one stream to all the connected peers. My implementation of rollback netcode via WebRTC data channels is kind of doing both and kind of doing neither, and we'll talk about that a little later on.

There's also a little glossary of acronyms we need when we talk about negotiation, which is what happens over that signaling channel and what establishes the connection between any two peers in a WebRTC setup. ICE, Interactive Connectivity Establishment, is the overall mechanism. STUN servers are essentially "what's my IP address?" servers: they find where an individual computer connected to the internet is, what its address is, and how it can be reached. TURN servers are also part of the ICE machinery; they're UDP relay servers, used as a fallback if a direct STUN-discovered connection can't be established between two peers. For my implementation here, the central unit is a server on the internet, so if a peer can't connect to that server, I'm assuming that peer just can't connect to the internet, and it doesn't really make sense to stand up yet another relay it also wouldn't be able to reach. A typical configuration looks something like this.
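A sketch of pointing a peer connection at STUN/TURN infrastructure; the URLs and credentials here are placeholders, not real servers.

```js
const peer = new RTCPeerConnection({
  iceServers: [
    // The "what's my IP address?" server: discovers a publicly reachable address.
    { urls: "stun:stun.example.com:3478" },
    // UDP relay fallback for when a direct path can't be established.
    // (SuperDuperFluity skips TURN entirely, per the reasoning above.)
    {
      urls: "turn:turn.example.com:3478",
      username: "user",
      credential: "secret",
    },
  ],
});
```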
Finally, the WebRTC negotiation process has accumulated some, let's say, complexities of its own as the spec evolved, and the new hotness, not really that new, is a mechanism called Perfect Negotiation, or the Perfect Negotiation Pattern. In that pattern there's a polite peer and an impolite peer, and it's left to the implementer to figure out who's going to be who. What I've implemented here, and we'll look at the code a little later, is an almost-perfect negotiation mechanism. In my system, clients are the only peers who ever make calls: they're always impolite, and they only make offers. The server is always polite: it accepts offers, and it always answers. The basic flow is that something triggers a negotiationneeded event, which causes that peer to create an offer; both peers exchange ICE candidates over the signaling channel; the remote peer creates an answer, which is to say, "okay, let's communicate on this path"; and once all that's in place, you have an open connection and can transmit data back and forth between the two peers. There's a cool internal page in Chrome, chrome://webrtc-internals, where you can look at the WebRTC internals. It's very technical, but interesting if you want to see the insides of what's happening.

Okay, so all that said, that's a lot to hold in your head: a lot of handshaking and negotiation and back and forth just to get a data connection between a client and a server. Why wouldn't we just use WebSockets for something like this? It's a valid question, and I think in a lot of use cases WebSockets are perfectly adequate. But WebSockets run over TCP, which means they incur some overhead: they have slightly larger headers, and they send cookies, though there are things you can do to limit that. Also, and I need to do some investigation to verify this is still true, but as of around 2016, WebSockets do not support message multiplexing. By which I mean: if I have one WebSocket channel and I send a large message, say an image, and then a small message, say "hello world", the small message has to wait for the entirety of the large message to finish sending before it can start. There's no way to split up those messages and interleave their frames. That can be unfortunate if you have a message already outgoing and a time-sensitive message you want to get across. Further, in all my research, I've only ever seen WebSocket as an HTTP/1.1 upgrade, which means the WebSocket connection rides on HTTP/1.1, and most browsers, I think this is also still true, limit the number of open HTTP connections. I'm not entirely sure whether that's per tab or per browser instance, but it means that if you open too many WebSocket connections, one of them could be closed by the browser because you've reached the limit of open connections per origin that the browser considers appropriate. There's more on this in Ilya Grigorik's High Performance Browser Networking, which is a great resource overall; I highly recommend it to everyone.

Anyway, all that said, the particular windmill at which I'm tilting is getting as near as possible to triple-A video game latency for multiple users in a shared environment in a browser. And here's where WebRTC data channels can do some stuff that sockets can't. Because data channels are transmitted over UDP, and because they have the requirement that they be secure, they implement a kind of TCP-lite mechanism that exposes certain levers and controls to people like me who want to decide whether a channel is going to be unordered and unreliable or ordered and reliable. I'm going to demonstrate an implementation that uses one of each, for different purposes, and here's roughly what creating them looks like.
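A sketch of the "one of each" setup, reusing the `peer` connection from the earlier sketch. The channel names mirror the ones in the demo walkthrough later, but treat this as illustrative rather than the project's exact source.

```js
// Unordered, unreliable: fire-and-forget input events. Dropped packets
// are never retransmitted, and arrival order is not guaranteed.
const actionChannel = peer.createDataChannel("action", {
  ordered: false,
  maxRetransmits: 0,
});

// Ordered, reliable (the defaults): behaves much like a WebSocket and
// carries the rare full-state sync messages.
const stateChannel = peer.createDataChannel("state");
```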
An interesting thing to note about the unordered, unreliable channel is that you want to limit messages to a single packet. The reason is that if your message spans, let's say, 100 packets and you have 1% packet loss, you're highly likely to lose one of those packets, and if you lose one, you lose the other 99 as well, because the message can't be reassembled on the other end. So limit your unreliable, unordered messages to a single packet. For me, as you'll see in a minute, I'm essentially sending input events over this unreliable channel, and I think they should fit handily inside that, about a kilobyte.

Okay, so that brings us to rollback netcode, also sometimes called replay consistency. It's server-authoritative: it detects desyncs and repairs them via a server-authoritative sync mechanism. It's less complicated than CRDTs or OTs; that's conflict-free replicated data types and operational transforms. Both of those are mechanisms designed for decentralized scenarios, that is, no central authority, which isn't what I want for my game-state model, and they also appear to be mostly about document editing, whereas I'm more interested in keeping a simulation mostly in sync, and very fast. Which leads me to the last reason you'd implement something like rollback netcode: it lets you be optimistic with local updates, rather than sending the update to a remote server, waiting for it to come back acknowledged, and only then playing it out, an eventual-consistency style of mechanism where your game's speed, its feel, is entirely dependent on whatever network latency you might have. Another nice thing about this rollback netcode implementation is that the same state management module runs on the client and the server. The implementation is effectively a last-writer-wins register from the CRDT world, except that in our case the server always knows who the last writer was: we essentially use server time to determine the correct order of things, so the server can decide the winner, and there's no mechanism needed for negotiating that between peers or decentralized units.

Finally, this implementation requires determinism, but for that we're going to rely on the right-out-of-the-box Redux style of determinism, which is to say: given a state and an action, we get the next state, and if you give this function the same state in and the same action in, you always get the same next state out. It's probably what you should have been doing the whole time, but it's important to call out. Also, because the unreliable, unordered data channel could theoretically deliver the same message twice, it's important that our messages be idempotent. That is, if I happen to apply a message multiple times, the effect isn't additive. I never want one of my action messages to be "add 10 to this number", because if I accidentally apply it twice, or have to reapply it, I'll wind up adding 20 or more. A tiny sketch of both properties, determinism and idempotency, follows.
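A minimal sketch of a deterministic, idempotent reducer. The action name and state shape here are made up for illustration; this isn't the project's actual reducer.

```js
// Given the same state and action, this always returns the same next
// state, and applying the same action twice is harmless because it sets
// an absolute value rather than accumulating a delta.
function reducer(state, action) {
  switch (action.type) {
    case "pointer/move":
      // Idempotent: "the pointer IS at (x, y)", never "move it BY (dx, dy)".
      return {
        ...state,
        pointers: { ...state.pointers, [action.payload.id]: action.payload },
      };
    default:
      return state;
  }
}
```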
Okay, so how does it work? Well, like I said, the client is optimistic: it applies updates immediately and sends them to the server over the unreliable channel. Those client actions are held in a pending buffer; these are the unsettled, unacknowledged updates we've made locally. When the server gets an update from a client, it essentially acknowledges it and sends it right back, and it also sends it to any other connected clients, which just receive it as a server-authoritative action or update. When a client gets its own message back, it settles that pending update, by which I mean it rolls state back through any other pending updates, applies the settled update, and then reapplies the pending updates that haven't been settled yet. All of that should happen in less than one 16-millisecond rendering frame, so we want it to be fast, and we want the number of updates to reapply to be pretty small. But that's essentially how it works.

When things go wrong, and there are ways things can go wrong, a settling update could invalidate more than one pending update. By which I mean: a user sends an update, "I click on this thing", and while that update is on its way to the server, another user has already sent an update that says "I moved the thing out of the way", so by the time the click reaches the server, it can't apply. The server has to either figure out a way to send back "no, that didn't work", or trigger a full resync, which is the worst-case scenario where the server says "no, you're way off, just start over" and sends down the full state over the reliable channel. We want full resyncs to be strictly server-to-client and pretty rare, but there are cases where they might happen. An important thing to note is that testing this system should be pretty easy: in your test suite, you set up a kind of timeout register of two actors firing actions at different intervals, and once all the actions have fired, you just validate that both of their states are the same.

A quick bibliography of resources I've found particularly useful for my implementation. "8 Frames in 16ms" is a talk specifically about Mortal Kombat and Injustice 2; their implementation is peer-to-peer, but they're doing the same kind of rollback, apply, then replay mechanism, and the talk is really well stated. Second, the Replicated Redux talk by Jim Purbrick from the React VR team at React Europe 2018 also implements rollback netcode in a Redux-like environment; there he calls it replay consistency, but it's also a great one, with some JavaScript source code, which is nice to see. Next is Figma's multiplayer technology; they have a lot of really great articles on their dev blog about the different challenges of multiplayer editing and the things that can come up. And finally, Implementing Undo History from the Redux docs themselves. Rollback netcode is surprisingly like undo/redo history, except that instead of waiting for a user to press Command-Z to initiate an undo action, we wait for a server action and then undo and redo behind the user's back.

Before we jump into some source code, a quick note about the libraries I'm using. ws is the server-side WebSocket implementation. node-webrtc is the server-side WebRTC implementation. Zustand is the state store; it implements a lot of nice middlewares, and it's geared toward React but has a vanilla implementation that works in plain JavaScript. Immer is an immutable update mechanism: it implements a function that creates producers, which wrap everything in proxies, so you can write mutations inside of a producer, and when the producer function exits, those mutations are flushed to your object. Here's a quick sketch of what that looks like.
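A minimal sketch of an Immer producer, with a made-up state shape; just the mechanism, not the project's code.

```js
import { produce } from "immer";

const state = { pointers: {} };

// Mutate the draft freely inside the producer; when the function exits,
// Immer flushes those mutations into a new immutable object.
const next = produce(state, (draft) => {
  draft.pointers["client-1"] = { x: 10, y: 20 }; // looks like mutation
});

// `state` is untouched; `next` has the update applied.
```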
And finally EVT, which actually comes from the Deno world. It's an event streaming library that I'm using almost exclusively to wrap event emitters and event targets in functionality that lets me merge their events into a single stream and treat them as one reactive stream. Again, more on that in a minute. As I mentioned before, this is all up on GitHub under kindness is the dankest meme slash SuperDuperFluity. There are four main areas of interest in the project: networking, which is the WebRTC peer connection and data channel stuff; state, which is all the Zustand and Immer code; input, which is mostly EVT; and rendering, which I'm just doing with the vanilla Canvas 2D rendering context API. We're going to pay particular attention to the negotiation logic and the withRollback function, which is a state decorator.

So let's look at some code. Okay, here's my server running, and here's my entry file: the client-side mount where the app kicks off. I create my WebSocket as my signaling channel, I create my peer connection instance, and then I pass those two off to this negotiate function, which we'll look at in a second. Since we're on the client, I call peerConnection.createDataChannel, which is what fires the negotiationneeded event and kicks off the whole negotiation process. My action channel is where I dispatch essentially input events; it's created with ordered false and maxRetransmits zero, so if it drops a packet, it won't even retry, and it makes no guarantees about the order in which messages arrive. And I have a state channel, which is another data channel on the same peer connection, using the defaults: ordered true and maxRetransmits null, which I think essentially leaves it up to the browser to figure out how many times to retry. The state channel is pretty close to a WebSocket, just implemented in a kind of UDP layer.

That takes us over to the negotiate logic. Here is EVT wrapping the peer connection's negotiationneeded event in an async function. I set an isOffering flag so we don't make two offers, or accept offers when we've already got an offer on the way. I create an offer, a local description saying "I am offering to connect", and then I send that over the signaling channel. There's the peer connection's icecandidate event, which is just a somewhat magical thing that happens inside the peer connection implementation; all I really need to do is make sure I'm forwarding those candidates over the signaling channel. And lastly, we have the signaling channel's message event. Again, that's just our WebSocket, but it accepts messages carrying either candidates or descriptions, and descriptions can be either offers or answers, which is why this handler is so conditional. If we get a candidate, we just add that ICE candidate to our peer connection. If we get a description, we want to set it as the remote description; that means it's a description coming from someone else. But if the description is an offer, we first make sure we're not a client, and if we are, or if we're already offering, or if our signaling state isn't stable, in any of those cases we just return. This is the impolite peer: it doesn't accept new offers, and it just ignores incoming ones. Otherwise, we create an answer. That's the polite server case: it answers all offers and sends the answer back out over the signaling channel, where it's handled by the same negotiation logic on the client. Condensed, the message handler looks something like this.
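A condensed sketch of the signaling message handler just described. `isClient` and `isOffering` stand in for the real module's flags, and `signaler` and `peer` are the WebSocket and peer connection from the earlier sketches.

```js
const send = (message) => signaler.send(JSON.stringify(message));

signaler.addEventListener("message", async (event) => {
  const { candidate, description } = JSON.parse(event.data);

  if (candidate) {
    await peer.addIceCandidate(candidate);
  } else if (description && description.type === "offer") {
    // The impolite client never accepts offers; neither does any peer
    // that is mid-offer or whose signaling state isn't stable.
    if (isClient || isOffering || peer.signalingState !== "stable") return;
    await peer.setRemoteDescription(description);
    // The polite server answers every offer it accepts.
    await peer.setLocalDescription(await peer.createAnswer());
    send({ description: peer.localDescription });
  } else if (description) {
    // An answer to an offer we made earlier.
    await peer.setRemoteDescription(description);
  }
});
```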
That brings us back to: once all that negotiation is done, we're here, and we have open data channels. This is a really nice feature of EVT I mentioned before: it lets me merge separate streams and then react to them as a single stream. Here, at number six, negotiation is complete, we have two open data channels, and I send a client action saying the connection is open. sendClientAction up here creates the client action, giving it a source of client, plus whatever type and payload I pass in; I'm setting a bunch of metadata about the action that's going across the wire. But importantly, I send it on the data channel up to my remote, my server, and I dispatch it to my local store so that it's immediately applied locally. Then I just let it go and wait for it to come back before I consider it validated, or however you want to think about that. Lastly, before we jump over to state: messages from the server are dispatched immediately. Inside the data channel's open event, I wire up a message handler that listens for incoming messages on both data channels and dispatches them directly to my state. These are either messages from other peers or server acknowledgments of my own messages.

So, hopping over to state, here we are. Number nine is a reducer that should be familiar to anyone who's worked with React or Redux. It's an Immer reducer, as I mentioned, so inside this function, mutations to state are allowed; that's how you do it, and once the function exits, those changes are flushed to the state object. A really nice mechanism. Mostly I'm just building up a blob of clients and their connected pointers, which might be touches or mice or pens or whatever.

The last bit of code we'll look at, and an important part of this rollback netcode implementation, is the reducer enhancer called withRollback. (Sorry, Siri thought I was talking to her.) It keeps track of settled actions and pending actions, and inside its closure it holds a reference to the settled state, the last known good state. It returns its own wrapped reducer. If the action's source is client, that means it's my local action: I add it to the pending actions and then apply it locally, immediately. Pretty simple: a pass-through reducer with a pending-action queue that builds up as I go. If the action's source is the server, and we'll come back to the sync action type in a second, I put the action into the settled actions queue. I'll call that out here: I'm not 100% sure I actually need it, but it seemed like it might be useful for debugging, so I'm keeping it around for a while. Then I overwrite the local state with my last known good state. Again, this is inside Immer, so mutations are allowed; that's how you do it. I essentially replace the current state with my last known good state, and then I apply the settled action, the server action, on top of it. So now state is the new last known good. Next, my action might be a resolved pending action, so I search my pending actions for any action that's equal to this one, and if I find one, I remove it from the pending queue or buffer. Then I stash the settled state, which, remember, is the last settled state plus this new action. The way you do that in Immer is with the current function, which essentially flushes any mutations that have been made; I store that object as the settled state. Then I loop through my pending actions and apply each of them to state. So this is: I have a new last known good, and I reapply the local changes I've made on top of that new last known good. Finally, in the case where the action type is sync, I just replace state with the payload of the sync event, stash that as the last known good settled state, and apply any pending actions. There's never going to be a pending sync action, because sync actions are only ever emitted by the server; clients only ever emit input actions. Condensed into a sketch, the whole enhancer looks something like this.
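A condensed sketch of the withRollback enhancer described above, not the project's actual source. `actionsEqual` is a stand-in for however the real module matches a server echo to its pending original, it assumes a stable top-level state shape (so `Object.assign` works as a wholesale replace), and it omits the debugging-only settled-actions queue.

```js
import { produce, current } from "immer";

function withRollback(reducer, initialState, actionsEqual) {
  let settledState = initialState; // last known good
  const pendingActions = []; // optimistic, unacknowledged local actions

  return (state, action) =>
    produce(state, (draft) => {
      if (action.source === "client") {
        // Optimistic local action: buffer it and apply it immediately.
        pendingActions.push(action);
        reducer(draft, action);
        return;
      }
      if (action.type === "sync") {
        // Worst case: the server sends the full state; start over from it.
        Object.assign(draft, action.payload);
      } else {
        // Roll back to the last known good, then apply the settled action.
        Object.assign(draft, settledState);
        reducer(draft, action);
        // If this was the echo of one of our own actions, it has settled.
        const i = pendingActions.findIndex((p) => actionsEqual(p, action));
        if (i !== -1) pendingActions.splice(i, 1);
      }
      // Stash the new last known good, then replay what's still pending.
      settledState = current(draft);
      for (const pending of pendingActions) reducer(draft, pending);
    });
}
```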
That's my quick tour of how it works. Again, I think there are pretty good notes about it in the repo, and I'm open to questions. Before I go slightly over time, a look at the future. WebRTC is a lot to set up, and there's this new spec, this new API, coming out called WebTransport. I think it's already supported in the latest version of Chrome. It allows the same kind of unreliable-to-reliable levers, and it has, I think, none of the peer connection establishment stuff. Essentially, it's a much nicer replacement for WebRTC data channels if you're not using the audio/video portion of WebRTC. There are still some cases where WebSockets make sense, so it's not necessarily a replacement for those. Also: deployment, scaling, and multi-regionality. These little infinity symbols mean there's a link here. There's a really great talk from some folks at a company called Red5 about how they're doing WebRTC scaling on DigitalOcean. Red5 is about broadcasting and video streaming, but the infrastructure problems they're solving, scaling relays and forwarding UDP traffic through ingress servers, would apply to any multi-regional, production-grade WebRTC deployment. And lastly, Deno Deploy is a cloud CDN with a Deno runtime that runs at the edge, on edge functions. The interesting thing about it here is that Deno Deploy edge nodes can communicate with one another via an implementation of the BroadcastChannel API, which comes from the browser, where it's how you talk between tabs or between windows. Deno, however, implements it so that you can talk between edge nodes; there's a quick sketch of the API below. It seems like you could set something up where edge nodes run WebRTC servers for clients that are geographically proximate to that node, and the edge nodes communicate with one another within the network, which is theoretically over fiber optic cable at roughly two-thirds the speed of light. This is just something that would be really interesting to explore: what the performance characteristics are like, whether or not Deno Deploy will forward the UDP traffic, stuff like that.
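A quick sketch of the BroadcastChannel API the talk refers to: in the browser it spans tabs and windows, and on Deno Deploy it spans edge nodes. The channel name and message shape are placeholders.

```js
const channel = new BroadcastChannel("game-state");

// Actions settled on this node get rebroadcast to every other listener...
channel.postMessage({ type: "pointer/move", payload: { x: 1, y: 2 } });

// ...and actions settled elsewhere arrive here.
channel.onmessage = (event) => {
  console.log("from another node:", event.data);
};
```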
I'm not a DevOps guy myself, so if I start having to configure ingress servers, it's going to take me a little while, but that is where I'm looking next. So, that has been my talk. Thank you very much for your time. I'm just going to run out the clock with the pretty stuff.