Alright, welcome to today's CNCF live webinar, Conflict-Free Replicated Data Types. I'm Libby Schultz and I'll be moderating today's webinar. I'm going to read our code of conduct and then hand over to Jared Dillon, CTO at Mycelial, and James Moore, principal instructor at Mycelial. A few housekeeping items before we get started. During the webinar, you're not able to speak as an attendee, but there is a Q&A box on the right-hand side of your screen in the chat box. Please feel free to drop your questions there, and we'll get to as many as we can at the end. This is an official webinar of the CNCF, and as such, is subject to the CNCF Code of Conduct. Please do not add anything to the chat or questions that would be in violation of that code of conduct, and please be respectful of all of your fellow participants and presenters. Please also note that the recording and slides will be posted later today to the CNCF online programs page at community.cncf.io, under online programs. They will also be available via the registration link you used today, and the recording will be on our online programs YouTube playlist. With that, I will hand it over to Jared and James to get things going.

Thanks, Libby. As Libby introduced, I'm Jared Dillon. I'm the CTO at Mycelial, and with me is James Moore, who's our principal instructor at Mycelial. To walk through a little of what we're going to talk about today: we're going to start off with a history of what distributed systems look like and how they've applied to the cloud data landscape over the past couple of years. Then we're going to talk a little bit about consensus on values in distributed systems, and why consensus on values is important in order to build reliable, robust, large-scale systems.
From there, we're going to talk about the challenges of building consensus-based systems at global scale, and about the use cases for why you might want to solve this particular set of challenges. Last, we'll move over to James, who will talk about what conflict-free replicated data types are, how to use them and how they're implemented, as well as examples of how they're actually being used in libraries, and ways to contribute and participate in the open-source community. So just to give a little background on distributed systems in the cloud-native environment: our goal in designing these systems, and really a large goal of the cloud-native movement, was to begin solving the problems of scale beyond single systems while ensuring some sort of ACID compliance, making sure that all of our data is atomic, consistent, isolated, and durable. We want to be able to write values and then know that we're reading a valid result back. And so that leads to some guarantees of data integrity. We can trust our systems. We know that our writes are serialized. We know certain properties of our systems hold when we go to access and read data across multiple systems. But when we're talking about multiple systems, we also need to start dealing with the realities of the world. And so partition tolerance comes into play. What happens when one or many nodes start to go offline? How do I make sure the system continues to function, either by continuing to make progress or by at least still being able to read values in the meantime? And we want to do this with some sort of reliable leader election or some sort of guaranteed consensus mechanism, to make sure that we have the correct values. And so a lot of this talk is really about the purpose of consensus and consistency in our systems.
So just to recap a little and add some more: we're really looking for a guarantee of data integrity between different servers, so that we can get some level of fault tolerance. We're looking at building highly available systems that can sustain a fault somewhere along the line, one that will ultimately get fixed later but doesn't compromise the working system in the meantime. So we want to ensure continued progress while servers are available. Typically in these systems, we can't make progress if a majority fails, but we always want to be able to return a correct result. And this is very critical, right? If we're building systems on top of Kubernetes, then even in a read-only state, we need the state of that system to return a correct read, even while writes are unavailable to us. There have been attempts at this, starting with Paxos and other consensus algorithms. But in the past decade, a paper came out describing a consensus algorithm called Raft, which was explicitly designed for understandability, with the idea that sane, reliable, robust implementations would come out of strongly understood systems. I encourage everyone to go read the Raft paper if you haven't. It is a great example of how to build a consensus-based system that agrees on values while remaining strongly consistent. And there's no better example of Raft in the CNCF than etcd. It's a graduated CNCF project. It was originally developed at CoreOS to build out what's called Fleet, which was a distributed system manager for deploying workloads on top of a CoreOS cluster in the early days of Docker. It was also used for other mechanisms inside of CoreOS and really was the beating heart of a cluster of operating systems. So what is etcd? Well, at its core, it's a distributed, strongly consistent key-value store with locking.
I write a key, and I expect to be able to read that result once etcd confirms that the key has been written. So it's ACID-compliant from that standpoint. After years of use, it has been scaled as the core data store for Kubernetes, CoreDNS, and a lot of other CNCF projects. It's based on Raft, and it's a single writer with leader election and multiple readers. So what does single writer mean in this case? Well, any node can accept a write; however, those writes get forwarded over to the leader if the Raft node being accessed is a follower. And so you really only have one node that is responsible for writing data at any given time. As systems have scaled, and we see this nowhere better than in multi-cluster Kubernetes and federation and the need to get workloads closer to users, we start to see challenges in these very consistency-bound environments, where you have single writers and where latency is a significant issue. These consistency-oriented stores really work best in very low-latency environments. A global-scale etcd cluster, for example, often suffers from heartbeat issues, because as you move across the globe, latency of course increases; you're not going to beat the speed of light. And you start to see partitioning problems. So what we've settled on is generally scaling to the region or the availability zone. Having had a background in building out very large Kubernetes clusters doing machine learning, graphics programming, and GPU-based operations, some of the worst failures that I've seen actually come from missed heartbeats due to latency, where etcd failed to get into a proper state because of these latency issues. Trying to scale that out to a global environment is incredibly difficult, as compared to running multiple clusters and trying to get data between them.
So strong consensus works best in very low-latency environments. We also have a major bottleneck in this single-writer environment. Again at large scale, and having developed KUDO and other operator-based systems, we're seeing with custom resources much more use of the Kubernetes control plane as a general-purpose store. And so scaling etcd in this environment becomes very interesting, because if you start to have more writes than your max throughput, you start to back up, you start to have issues, you start to have failures in etcd, because it can't scale beyond this multi-reader, single-writer model, and that presents a lot of challenges for write-heavy workloads. With the answer being: well, use a secondary store for other data, or spin up more Kubernetes clusters and deal with more federation. Okay, so as we're starting to think about these next systems, event sourcing comes into play, right? Event sourcing is this idea where we have a centralized bus of events, and you have multiple readers that consume those events and perform different actions; the system is decoupled because you interact with the whole of it through events. The issue is that this ultimately runs into the same core problem, and the problems don't go away just because the model switched. In order for this to be robust, events really require strongly guaranteed ordering. And so clients receiving events at different times can cause buffering and bottlenecks, because you need to be able to piece everything together in order to get to a consistent result, or consensus, about your data at the end of the day. Attempting a multi-writer system with events really forces questions about consensus and getting those values back at the end. So we're really looking at a new set of needs here. We're looking at what's called strong eventual consistency.
Now, what is that? If you've heard of the term eventual consistency, or even if you haven't: eventual consistency is a liveness guarantee, where eventually all replicas will have the same information, but any replica can return any result in the meantime, depending on what state the system is in. And so there are no guarantees of safety with that, because an eventually consistent system, as defined so far, doesn't require a node to report the new value even once it has received it. Even if a node is aware of the new value, it may still report the old one, and this really gets down into database semantics. This is opposed to strong consistency, which, if you're familiar, means that when a write is accepted, the value is the same everywhere. Now, strong eventual consistency sounds like an oxymoron. But what strong eventual consistency does, and we'll talk about this more as we get into what CRDTs are and how they operate, is add a safety guarantee on top. That is to say, every node that has received the update is able to report the new value. So it's eventually consistent, but it's also correct with the information that it currently has, and we'll see some interesting use cases for that. So data is converging to the same value across all replicas, but in the meantime, every replica that has received the data reports it consistently. Connectivity is not guaranteed, low latency is not guaranteed, and ultimately ordering is not important; we'll talk about that a little more. So let's look at some use cases for these needs. A big one is globally distributed databases. Now, we have this in some forms already: we have sharded databases with a shard key.
That's one form of scale. Or you ultimately have some sort of primary region that's responsible for accepting writes and distributing reads out globally. Neither of these really fits the bill. If you think about systems like Cassandra, things that shard based on shard keys, you're really managing scale on a different dimension than the geographic one: you're managing scale on the dimension of your data cardinality. And so if all of the replicas of a certain shard go down, now you have a partial outage for that type of data. But that says nothing about your geographic distribution; it means nothing about having a multi-writer system that is global scale. It's really talking about multi-writer systems at that shard-key cardinality. So suppose we wanted to go out and build, and there are a couple of examples of this out in the wild now, a system that's multi-writer at the regional or cluster level, where regional partitions don't cause systemic failures, and where we're still operating on a single database. This is the difference between this sort of globally distributed database, like a Spanner, and other types of databases that choose other strategies. Our goal here is to achieve order independence, and what that means is that the ordering of events is not important; we all eventually converge on the same document, no matter the order of events that come in. Another use case here is building local-first applications. Really, what a local-first application is, is the extension of user data out to the client: clients are able to operate completely offline and then synchronize data back with your cloud when they come back online.
And so building out this idea of edge native enables cloud-native use cases in very adverse or low-bandwidth environments, where you can come back online and merge your data with the whole, with guarantees that your system is always going to progress and you're not going to deal with rollbacks. And to step back, actually, one very important thing about this order independence is that prior systems in this class forced you to deal with conflicts and stop the system if there were any. So one of the goals we want to achieve here is that this is a rollback-free system. That's not to say that there won't be conflicts, but everything converges to the same document without user intervention, no matter what. We can make this guarantee without stopping the system; the system is always progressing, without a rollback. The last use case I want to talk about here is building collaborative applications. Collaboration, or multiplayer, is becoming a feature of many applications and arguably a competitive advantage, and so you're seeing a large swath of applications that are multiplayer first. If you've ever used Google Docs, and I'm sure most if not all of us have, being able to collaborate in real time with others is a feature of work in our current age and a feature of most of these applications. And so applications built for a cloud-native environment will need to be able to handle collaboration. And this should be on top of local first, right? If everything is being forwarded to a central node, some of your users are going to have high latency and others low latency, and you haven't really solved the problem.
And so it almost demands a multi-writer application in order for you to achieve this strong eventual consistency between every single user, converge on the same value, and enable all collaborators to be writing at the same time, as opposed to doing something like Subversion, where we lock the document while someone makes their writes; that's not how these document-based systems work. And very importantly, machines need to be able to collaborate, not just humans, and there are a lot of use cases that are enabled once machines are able to collaborate on values rather than falling back to more traditional operations. So we're really talking about bringing cloud data to the edge here. We need this all to be observable, we need it to be traceable, we need it to be operable. Cloud data is great for the cloud, but new solutions are needed for what we're referring to as edge-native environments. And what's nice is that there are solutions to these problems. So with that, I'll turn it over to my colleague James Moore, who will dive into what CRDTs are and how they work, and hopefully give everyone some ideas of how they can use them in their projects.

Thanks, Jared. Alright, so as Jared said, over the next few minutes I'm going to provide an introduction to CRDTs. So what exactly are CRDTs? Well, they are a collection of data types, similar to the data types you're likely familiar with. For example, there are arrays, there are maps, there are text types, there are counters, there are registers (which allow you to wrap other types), and there are sets, among other options. Now, these data types work very similarly to their conventional counterparts, but there's something special and interesting about CRDTs, which is the fact that they're effectively shared data types. We'll talk more about this concept of shared data types in a moment, but first I want to address how you use CRDTs in an application.
At a high level, using CRDTs isn't all that different from using the other data types that you're likely familiar with. For example, if you were going to write a to-do app, your application's data model might look something like this, where you're storing to-dos in an array, and then you'd add to-do items like this, where each to-do item is a record with meaningful properties. Now, composing conventional data types like this works well when you're writing an application meant for a single person, but what if we want to make this to-do app a collaborative application, where multiple people can add, edit, and remove to-dos? Okay, some of you might be thinking, well, writing a multi-user app like this isn't all that hard, right? Well, when I say I want this to-do app to be collaborative, I don't just mean it's a multi-user application; I mean something different, something deeper. Let me explain what I mean with a couple of examples. Imagine the person on the left side of the screen decides to edit the buy-milk task to make it more descriptive. And then at the same time, the person on the right side makes a different edit to the same buy-milk task. So what should happen in this scenario? Well, in a truly collaborative application, both of these concurrent edits would be merged together, like this. Now, what you just saw in this simple example should give you a sense of what I mean by collaboration in a deeper way, not just a multi-user way. Or here's a similar but slightly different scenario. What if both of these people are offline, and the person on the left decides to delete the mow-lawn task, and the person on the right decides to add a new to-do item? And again, remember, both of these edits are happening offline. First of all, writing applications that work both offline and online is not an easy task in and of itself, but CRDTs make it easier. And secondly, what should happen in this app when both of these users get back online?
Well, ideally, the mow-lawn task should be deleted on the right, and the new clean-the-garage task should get replicated to the left. Okay, so to achieve the kind of collaboration I just alluded to, this conventional data model isn't going to help us. However, with a few small changes to the data types in our model, we can make this app more collaborative. So what changes to this conventional data model would we need to make? Well, we could selectively use CRDTs. For example, instead of storing to-dos in a normal array, you could use a CRDT array. And then instead of using strings for the title, we could use a CRDT text data type. Now, using these new data types isn't all that different from using the conventional equivalents. In fact, the APIs for most CRDT libraries are very similar to their conventional counterparts. But you're probably wondering, what's the benefit of using these CRDTs? Well, this is where I want to turn back to the description I used earlier for CRDTs, when I called them shared data types. What exactly do I mean by shared data types? At a high level, it means you can use these data types on separate computers, you can make local changes to the data without any locks, and the changes that are made can be shared or replicated to the other computer and merged without conflict. Okay, I want to highlight a few subtleties in what I just described. First of all, the data replication can happen in near real time as long as the two computers are connected, or the replication can happen later in an asynchronous fashion. In other words, these two computers could be completely disconnected, and the application can still work in a local-first or offline mode. And then at a later point in time, maybe minutes or hours or even days later, the changes can be replicated and merged between these two computers, effectively synchronizing their state.
Now, this example only shows two computers synchronizing their state, but there could be many other computers involved. And I want you to notice that I never mentioned anything about servers. I mean, you could, and likely would, use servers in many scenarios, but they're not a requirement of CRDTs. In fact, I think it's more helpful to think of CRDTs as a peer-to-peer technology, where you can and should use servers where it makes sense. Okay, if you're new to CRDTs, you're probably wondering, well, how do they work? What's the secret sauce? Well, you have to create these data structures in a special way, following certain rules, which we'll talk more about in a moment. And it's also important to note that CRDTs store your application data, but they also store metadata. So what's with this metadata? Well, hold that thought, because I'm going to look at this metadata more in just a moment. Now, to fully understand how CRDTs work, you need to understand a bit of order theory, and in particular, you need to understand join-semilattices. So are you ready to do a deep dive into some order theory? Okay, well, before you stop the video: I'm just kidding. I'm not going to do a deep dive into order theory, as fun as that might sound. We're just not going to do that. But I mentioned order theory for a reason. The reason is that it's important to understand that there are mathematical proofs behind CRDTs, and we should draw confidence in the technology because of its underlying mathematical principles. But in practice, most developers don't need to understand the math behind CRDTs. You just need to understand how to interact with CRDTs. In other words, you need to understand the APIs.
Now, as I said a moment ago, most developers don't need to know much about order theory. Unless you're a developer who's writing CRDT libraries, in which case you're going to have to know the order theory, most of us developers don't need that level of understanding. Okay, so you don't need to know all the mathematical proofs behind CRDTs, but if you're new to CRDTs, you're probably going to want to know how they work. And I'd like to give you a sense of how they work by looking at one of the simpler CRDT data types, a counter. So imagine we need an application that counts something. For example, maybe our application is running on some sort of scanners that collaboratively count things that pass by on a conveyor belt, or maybe the application is meant to count people entering a venue, running on smartphones at all the entrances. And it's possible that these smartphones need to work both offline and online because of the operating location. Now, there's a subtlety in these two examples that I want to draw your attention to. When we think about collaborative applications, I suspect most of us are thinking about people collaborating on a common task, but people aren't the only things that can collaborate. The example we see on the left side of the screen represents collaboration between machines, and the example on the right side of the screen represents collaboration between people. I think a better general term to use, instead of machines on the left side and people on the right side, would be actors. In other words, actors can collaborate on a common task, and actors could be machines or people or both. And the reason I'm making this point is that we live in a world where smart computing devices are proliferating. They're all around us.
And I think we need new programming models that better support the reality of the hardware situation. The actor model is probably the best programming model for dealing with the proliferation of hardware devices, and CRDTs fit nicely into the actor model. Okay, let's look at implementing this collaborative counter. First, I want to see if we can implement it by using a primitive type, an integer, in the application's data model. But here's a spoiler alert for you: integers won't work for us, and you'll see why in just a moment. So let's say this device on the left counts the first concertgoer, which increments its local count from zero to one. Then we could somehow replicate this one on the left to the other devices on the right. And when we do that, things seem to be working so far. Next, the middle device increments the count from one to two, but at the same moment in time, the device on the right increments its count from one to two as well. So to be clear, both of the devices on the right side concurrently updated their local counts. Next, the updated count is replicated from the middle device to the left device, and the one is changed to two. Then the updated count is replicated from the right device to the left device, and the two is replaced by two. Okay, that's not right. We know the total count should be three at this point, right? This example is just one reason why primitive types in a collaborative application like this just won't work. However, if we swap out our integer-based counter with a CRDT counter, we won't experience the problem you just saw. And to increment the counter, all you need to do is call increment. Basically, this is all you need to know as a developer to interact with a CRDT counter. And again, by swapping out the integer for a CRDT counter, we can have a collaboratively maintained count that works even if a device is offline.
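The lost-update scenario above can be reproduced in a few lines. This is a minimal sketch (the device names and replication-by-overwrite scheme are just illustrative) showing why a plain integer that replicates by copying the whole value drops a concurrent increment:

```python
# Naive integer counters: replication just overwrites the local value.
# All three devices are in sync after the first count is replicated.
left, middle, right = 1, 1, 1

# Two concurrent increments happen on different devices.
middle += 1   # middle is now 2
right += 1    # right is now 2

# Replication by overwrite: middle's state reaches the left device...
left = middle  # left becomes 2
# ...then right's state arrives, replacing a 2 with another 2.
left = right   # left is still 2

# One of the three counts has been lost: the true total is 3.
print(left)  # 2, not 3
```

The overwrite throws away the information about *which* replica contributed *which* increments, and that is exactly the information the CRDT counter's metadata preserves.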
But even though you don't technically need to know how a CRDT counter works, you're probably still curious. So let's look at how a CRDT counter could be implemented. As a learning exercise, we'll talk through the implementation of a CRDT counter. Now, a counter obviously represents a number, but that doesn't mean the underlying data model has to be a number. Here's what I mean. What if the model for our counter was composed of two things: a unique ID for each replica, and a map? Then, when the count is incremented, a key-value pair is added to our map, where the key is the unique replica ID and the value is the incremented count for this replica. To get the actual count value, we just sum the values in the count map, which at this point is just one. Then, to replicate the update, we send the entire map to the other nodes and merge the maps in a special way. And at this point, we've achieved consistency across the nodes. In other words, every node, or smartphone in this case, has the same count value. Next, let's walk through the problematic scenario we saw a moment ago with concurrent updates. Both of these counters on the right get incremented at about the same time. Then each node adds a new key-value pair to its count map, where the key is the unique replica ID and the value is the count for the corresponding replica. So the first time increment is called, the value will be one. Then, to calculate the current counter value, we simply sum the map's values, which totals two for now. Next, the updated values are sent to the other replicas and merged into their local maps. And as you can see, the count value for each node is now consistent at three. Now, any subsequent increments will simply increment the appropriate partial count value in the map. So incrementing on this device will increment this key-value pair, and incrementing on this device will increment this key-value pair.
And lastly, incrementing on this device will increment this key-value pair. Okay, there are a few more details to how CRDT counters work, but hopefully this small exercise gives you a sense of how they're implemented. Do you remember how a few minutes ago I mentioned that CRDTs contain your application data and also metadata? Well, now you see what I mean by metadata. The counter needs to store extra data in the form of the replica ID and the map, and the value of the counter is derived from that metadata. But again, I'll ask: what's the point of the metadata? Why is it needed? You may intuitively know what the metadata is for, but I want to get specific. The metadata is needed so that the data types have all the information they need for changes to be replicated and merged among the peers in a conflict-free way. So, in summary, the metadata is all about allowing updates to be replicated and merged in a conflict-free way. Now, a consequence of having to store this metadata is that CRDTs require more memory than their conventional counterparts. So there are trade-offs when using CRDTs, and additional memory usage is one example of those trade-offs. And keep in mind, some data types use more metadata than others. Another trade-off is the consistency model. In particular, there's a period of time when replicas can have different values. For example, when we incremented one of the counters a moment ago, the other counters had different values until the updates arrived at the other nodes. So CRDTs offer high availability and strong eventual consistency, the key word there being eventual. Now, one of the next questions you probably have is, how do the peers in a CRDT-based system communicate? In other words, what sort of networking protocols are used? Well, CRDTs are somewhat network agnostic, meaning you have many choices and you get to decide how to connect the peers.
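The counter walkthrough above can be sketched in a few dozen lines. This is a grow-only counter (a G-Counter), written as a rough illustration of the replica-ID-plus-map scheme just described; the class and method names are my own, not any particular library's API:

```python
import uuid

class GCounter:
    """Grow-only counter sketch: a map of replica ID -> partial count.
    The visible value is derived from this metadata by summing."""

    def __init__(self, replica_id=None):
        # Each replica gets a unique ID; a UUID works if none is given.
        self.replica_id = replica_id or str(uuid.uuid4())
        self.counts = {}  # metadata: partial count per replica

    def increment(self):
        # Only ever touch our own entry in the map.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

    def value(self):
        # The actual count is the sum of all partial counts.
        return sum(self.counts.values())

    def merge(self, other):
        # Entry-wise max: this merge is idempotent, commutative, and
        # associative, so updates can arrive repeated or out of order.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

# Three devices counting concurrently, as in the walkthrough.
a, b, c = GCounter("A"), GCounter("B"), GCounter("C")
a.increment()            # device A counts one person
b.merge(a); c.merge(a)   # A's state is replicated to B and C
b.increment()            # B and C increment concurrently...
c.increment()
a.merge(b); a.merge(c)   # A receives both updates
a.merge(b)               # re-applying the same update is harmless
print(a.value())         # 3
```

Note that merge takes the max per key rather than adding, which is what makes receiving the same update twice safe.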
So you could use WebSockets and pub/sub to propagate updates, or you could use WebRTC, or you could use some sort of mesh networking, or some other gossip protocol. The next question you may have is, how do you ensure updates get propagated to the peers as appropriate? Well, there are some interesting inherent properties of CRDTs that make updating peers a bit easier and more forgiving. Let me explain. There are three important update-related attributes of all CRDTs that go back to the order theory, and in particular the join-semilattices, that CRDTs are based on. First, updates are idempotent, meaning an update from one peer to another can be applied multiple times with the same effect. Let me demonstrate what I mean by idempotent. Let's say one of these counters gets incremented, updating its local count to one. Next, the update is sent to the other devices. Now let's say the left device, maybe because it got temporarily disconnected, isn't sure whether the other device has received its latest update. So the device on the left goes ahead and sends the update again, and the same update can get sent again and again, multiple times, and the result of merging the update multiple times is the same. It's an idempotent operation. The next important attribute of all CRDTs is that they're commutative. Commutativity means changing the order of operands doesn't change the result. For example, commutative means operating on X and Y gives you the same result as operating on Y and then X. In other words, the ordering doesn't matter. The third important attribute of all CRDTs is that they are associative. Associativity means rearranging the parentheses in an expression will not change the result. So the result of this set of operations is the same as this set of operations. In other words, the grouping doesn't matter. Okay, so what do these last two attributes give us?
Well, if you think of X, Y, and Z as replica updates and you think of these operators as data merges, you realize that updates from one peer to another don't have to be propagated in any particular order. As long as a peer eventually receives all the updates, in any sequence, the peer's state will become consistent with the other peers. Okay, given these properties of CRDTs, you can see that propagating updates between peers is easier and more forgiving than you might imagine, because updates can be passed to peers multiple times without any adverse consequences, and the updates can be passed in any sequence, and eventually each peer will converge on the same value. So where do you go from here if you want to look at CRDTs and try them out? Well, CRDTs are somewhat new, so you're not going to find lots of libraries yet, but there is a lot of work happening in this space, including the work we're doing at Mycelial. The best CRDT libraries at this point are JavaScript libraries: you have Yjs and you have Automerge, both of which have been worked on for several years at this point. There are also a few up-and-coming libraries written in Rust, which are of particular interest to us, and in some other languages as well. So in summary, with CRDTs, you get shared data types that allow you to build many collaborative applications, and it's a peer-to-peer technology where you can optionally use servers if it makes sense. And in a way, you get the best of both worlds in terms of online and offline capabilities. When online, you get near real-time collaboration, but they also support offline work that can later synchronize the application state with other peers. And with that, I'm going to turn it back over to Jared. Thanks, James. Yeah, so anyone who wants to come work on these or is interested in advancing the state of the art:
At Mycelial, we're working on open-sourcing a whole bunch of tooling related to this and being able to embed it more into cloud-native and edge environments. If you're interested in discussing Mycelial, sorry, CRDTs, we have a Discord where we talk about local-first applications and CRDTs all the time, and we'll have more news there. So happy to take questions and have some discussion around CRDTs. And we have a question already. What are the implications of long-duration disconnections on CRDTs, like, for example, days? That's a great question. So CRDTs are intended to solve the question of merging towards the same document. And so at the end of the day, once all changes merge, every party has the exact same result. Now, before I talk about carrying that through, this can depend on the type of CRDT you're using, because we're really talking about embedding the conflict resolution in the type itself. And so let's say that you have a shared key-value store, a map or a dictionary, and one of those values is a string, right? You're just using a string, you're not using a text CRDT, for example, and that gets changed. The rule there might be what we would call a last-write-wins register. And so, given that a version can be part of that metadata, if two parties edit the same key, say you edited it 15 days ago on a device that was offline until now, and there's a new value elsewhere, a last-write-wins register will resolve to whichever value was written last. That said, there are other types of CRDTs that will try to better combine those two pieces of information. And this is where we're also looking at tools like WebAssembly at the point of origination, or at the point of conflict, in order to create more embedded logical decisions around that sort of data. Now, let's go to the example of adding a key to a shared map.
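As an aside, the last-write-wins register just described can be sketched as follows. The timestamp handling and the replica-ID tie-break are illustrative assumptions, not the behavior of any specific library:

```python
class LWWRegister:
    """A minimal last-write-wins (LWW) register sketch."""

    def __init__(self, value=None, timestamp=0.0, replica_id=""):
        self.value = value
        self.timestamp = timestamp    # metadata: when the value was written
        self.replica_id = replica_id  # metadata: deterministic tie-breaker

    def set(self, value, timestamp, replica_id):
        self.value, self.timestamp, self.replica_id = value, timestamp, replica_id

    def merge(self, other):
        # The newer write wins, even if it arrives weeks later;
        # equal timestamps break ties on replica ID so every
        # replica picks the same winner.
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            self.value = other.value
            self.timestamp = other.timestamp
            self.replica_id = other.replica_id
```

A write from a device that was offline for 15 days still merges cleanly: every replica deterministically keeps whichever write carries the later timestamp.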
Let's say you add that key, and there's no other conflicting party in the meantime. Now, when you hop back online, no matter the order of those updates and how old they are, everyone's going to have that new value in the key with the typical observed-remove style of sets, right? You can look at a map in this case as a set. And so in that case, the long-term disconnection is not such a big issue. Now, one interesting property is that CRDTs can be made up of other CRDTs, as James was showing. And so let's say you add a key that's a list, right? I add a key that says to-dos, and that's a list of to-dos. And someone else has done that as well. And both of us are just operating on that list. We can expect those lists to converge at the end of the day. Now, we're probably giving those items random IDs, and so, depending on how we detect duplication, you may end up with duplicates. And so you kind of get into the story-of-your-data problem with CRDTs. And one thing that I really like about this sort of data structure is that it does force you to talk about, well, what does my data actually mean? What does it actually do? In order to decide the guarantees that I want. Hopefully that answers your question. The most important thing, just to wrap that up, and awesome, I'm glad that answered your question, is that no matter what the ordering is, once all of those events are received, we've all converged on the same document. And so your long-term disconnection does not matter from the perspective of us having the same view of the world. We have achieved consensus in a strong way by the end. And you're not in a state like, for example, Git, where now you have a giant merge conflict to work through. That possibility, because the merge is monotonic, has been eliminated as one of the problems you need to solve.
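The observed-remove behavior mentioned above can be sketched roughly like this. Tagging each add with a fresh unique ID is the standard OR-set technique; the uuid-based tags here are an illustrative choice:

```python
import uuid

class ORSet:
    """A minimal observed-remove set (OR-set) sketch."""

    def __init__(self):
        self.adds = {}       # element -> set of unique add-tags
        self.removed = set() # tombstones: tags this replica has removed

    def add(self, element):
        # Every add gets a fresh tag, even for a re-added element.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Tombstone only the tags this replica has actually observed.
        self.removed |= self.adds.get(element, set())

    def contains(self, element):
        # Present if any add-tag has not been tombstoned.
        return bool(self.adds.get(element, set()) - self.removed)

    def merge(self, other):
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed
```

Because a remove only tombstones tags it has observed, an add made concurrently on another replica carries a tag the remover never saw and so survives the merge: adds win over concurrent removes.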
We're not talking about whether it's correct at the end of the day, but what do you think the largest barriers to adoption are for CRDTs at this point? That's a fantastic question. I'll give some of my views, and I'd love for James to give some of his. My biggest one, I think, is the accessibility and proliferation of libraries that are usable and just available for people to use. And I think it gets caught up in these peer-to-peer applications and building one sort of application, whereas we are seeing examples of CRDTs being used for building global-scale databases; they're just either not open source or not being heavily used and promoted at this point. There's also the issue that there is an overhead to CRDTs. The latest research has dropped that write amplification down quite a bit, and there's a lot of work going into performance as well, but these are larger than scalar values. Now, there are some advantages to that, because now we can get full causal views of our data, we can go back in history, we can kind of understand things, and so there are actually benefits to that amplification. But a lot of our systems, a lot of the ways we look at and design things, are not built to really support more than just scalar values. Now, the other thing I would add is that not every application is appropriate for CRDTs, right? Scalar values are totally okay to use, but I think there are more applications out there where you might not want just a scalar value, but the entire history. And of course, once you get into that, there are other interesting things around transactions and compression and other hard problems to solve. And so one of the things we're doing here at Mycelial is solving those hard problems and making these industrial-grade and usable. James, what do you think on that question? Yeah, I think I agree with what you said there.
Another barrier is that it's just a very big paradigm shift. And it's a very new technology, not particularly mature, but it's moving fast. And so I think we're going to have to think differently about how to build apps in a lot of ways. And being a somewhat new technology, there's a lot of maturing, and patterns that need to get figured out within the community on how to use these, and also, really getting back to libraries, that's a big one as well. But there is a surprising amount of effort going into various libraries that we see throughout the open-source community. So we're excited about that. Yeah, I'll just read this out: it seems like real-world apps using CRDTs need both CRDTs and traditional storage mechanisms, and it seems like it might be confusing to decide which is applied where. Yeah, that's some of the work that we're doing. If you think about how we've learned patterns around designing data-intensive applications, yes, that's absolutely true, right? There are very few resources right now. There's very little established practice right now. And so we want to enable people to have more platforms to test that and to figure out that practice. Like I said in the first question, the story of your data, I think that rings really true, right? Because if you look at traditional storage mechanisms, they're designed very generally. They accept scalar values. Here's Postgres, a very advanced database: here's your JSON data type, right? But you also have your varchar, you have your number types. You can fit a lot of business use cases into just a bunch of values and not necessarily have to think about it. Whereas I think looking at all of the advantages of CRDTs forces you into the conversation a lot sooner of, okay, what are the properties of this data, and how do I want to choose based on that?
And then, of course, that does need to go to a backing store. Now, there is some work on building out actual databases that store CRDTs, much like an event-sourcing database would. And we're talking to some of the people who are doing that, because ultimately a CRDT under the hood, when you're replicating it, is just a very opinionated, shaped event. And so you can go store that and create a materialized view on it. But of course, that is a lot more specialized than, say, just putting something into Redis, for example, where you just care about the value and nothing else. And so, yeah, I think that's absolutely true about that confusion, and these tools still need to be very oriented around use cases. That said, I think orienting data toward your use case is a lot more powerful than just having random data that happens to have some cardinality. And those restrictions do add some value and do add some powers. Any more questions for us? Yeah, I like that comment, and thank you. The story of your data, I think, is really important, with real implications for distribution. And I do agree with the comment that machines need to collaborate, not just humans. And the reason we bring it up so much is because you look at moving compute more and more to the local environment, and you look at clusters of machines that are actually performing tasks. I think that our CP systems are very primitive forms of collaboration. We agree on the same value, right? But that doesn't necessarily allow two actors to go perform local work. And so I think the implications, once machines can operate in a local-first way while still collaborating, are going to be huge, especially for physical industries where you have multiple machines working on similar tasks. All right. Do we have any more questions? Here we go. You mentioned that there are databases that internally use CRDTs. Are there any that you can recommend?
On the open-source front, there's some interesting movement on that. So Gun, GunDB, is one. So gun.eco is an open-source attempt at Firebase. There is also, is it SyncStore? Hang on, let me look. Redwood. Redwood is another attempt at this. It's still very early. There are a few SaaS options, but really what they're using them for is the distribution of data. I think Gun and Redwood, it's redwood slash redwood, and there's also the Redwood JavaScript framework, that's not what I'm talking about, are storing a state tree among many, many peers and using CRDTs to do so. I think there's also a good example of an IPFS-based database that stores data as CRDTs, but we definitely need more examples of this. And, you know, we're not really focused on the database side of the problem, but I think good collaboration between applications and databases is going to be more and more important. And I'll post those links in the chat as well. Okay. SaaS solutions as far as, I'm sorry, Libby. SaaS solutions as far as storage: I'm not aware of any that use CRDTs for storage. Maybe someone will chime in on Twitter or Discord later and show one, but the only thing that I'm seeing on the SaaS front is CRDTs being used for multi-writer replication between geographic regions. I think with that, we're about at time. Any other questions? We're happy to field them in our Discord or Twitter or anywhere else, please. We love the collaboration. Sure. All right, everybody. Thank you so much. Going once, one more time, any more questions? All right. Well, Jared and James, thank you both so much. Thank you, everyone, for joining us for another live webinar. Look for everything online later this afternoon, and we'll see you again next week. Thanks, everybody, so much. Thank you.