Hello everyone and welcome to the 10 a.m. session in the developer and open source track. As a reminder to our in-world and web audience, you can view the full conference schedule at conference.opensimulator.org and tweet your questions or comments to @opensimcc with the hashtag #OSCC14. This hour we are happy to introduce a terrific session called Scaling OpenSimulator Inventory Using NoSQL. Our speaker today is Tranquillity Dexler. David Daeschler is an object-oriented software developer and architect with a wide range of experience in software and hardware solutions, and is currently a partner and software architect for InWorldz, LLC. Welcome all. Let's begin the session. Hello everybody. Thanks for stopping by and checking this out. Yeah, so as mentioned, we're going to be talking about NoSQL, and specifically Apache Cassandra, and how we can use that to make sure that your inventory services stay up and running during a variety of bad situations that would normally bring down a server. So who the heck am I and why should you listen? Again, my name is David Daeschler. We've designed a lot of systems on our end to try to mitigate scaling issues that we've run into as our grid has grown: everything from an LSL compiler and a virtual machine to physics integration. We also designed a scale-out asset service that's now running over 11 servers and holds about 10 terabytes of data, and our inventory system, which is also running on top of Apache Cassandra, running on eight nodes and holding about 250 gigabytes of inventory data. So we're holding a lot of stuff, and we've run into a lot of problems that I'm hoping I can help you avoid by going straight to the systems that we ended up having to go to anyway. So we routinely handle over 300 concurrent users on the grid. We peak out just shy of 500 concurrent users without experiencing back-end faults or load issues, which we obviously have monitoring for.
So I get the awesome 2 a.m. call if anything goes wrong. Most of the implementation work that I've done is to make it so I can get sleep, so trust me when I tell you that a lot of this stuff is working very well for us. We've experienced and conquered more than a few scaling problems, and I really just want to share some of our experiences with you so that you have a chance to skip over some of the bad times that we've been through and go straight to running a grid the way that you want to run it. So why do we need to worry about inventory? Well, you're running an OpenSimulator grid and everything's going great. But then you get 150, 200 people online. They all want to have 500,000 items. They want to make sure that they can carry 10 copies of a primitive, times 100, times 1,000 as they go along. People don't really clean out their inventories. They say they do, but most of the time there's a lot of inventory cruft and items everywhere. So you're going to have to deal with the fact that inventory is basically a system that's going to keep on growing without bounds. You may start to see people complaining that they can't log in because their inventory gets too large. You may start seeing MySQL timeouts where they can't open a folder, things like that. And then of course their pony avatar may go missing, and we all know that that's a huge problem. We want to make sure the pony avatars stay in the inventory where they belong and are accessible at all times. So that's why we're looking at systems that go a little bit beyond the scalability of MySQL, and that's why I'm here. So, federation.
Hypergrid does help you a bit, because normally the people that are going to be visiting your grid are going to come from a variety of other grids, where the inventory back-end services are powered by the grids they're coming from. But it does not help when your own grid gets huge. If you're the one that's powering a ton of inventories, and you have a lot of users coming on and then going to other grids, your inventory services are still going to need to service them. So trying to predict growth and load, and sharding your grid off into your-grid-one and your-grid-two and your-grid-three, four, five to start balancing that inventory back-end load, is just not going to allow you to move very fast. It's going to be difficult, because to scale out that way you have to keep duplicating what you've already done, putting up more and more duplicated servers and more and more duplicated zones. So I think that NoSQL in these situations, with its ability to scale out so well, is a great way to ease your administration and give you a good solution that starts now and can continue into the future. We're going to talk about shard keys a little bit, but basically when you need to split things between multiple servers, keys have to be chosen that split up the data evenly between all the servers that you have. So a lot of times you can split inventory data, say, between four MySQL servers depending on the user's UUID. The first quarter of the 128-bit number goes here, the second quarter goes here, the third quarter goes here, the fourth quarter goes here. But then what happens when you need to add another server? Well, you need to manually reshard your MySQL solution again, and that's just kind of a pain in the butt. Luckily, systems like Apache Cassandra help you out with this: they do the sharding automatically.
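To make the resharding pain concrete, here's a small Python sketch (mine, not from the talk) comparing range-sharding a 128-bit user UUID across four servers against consistent hashing on a ring. The server names and the use of MD5 are illustrative assumptions.

```python
import hashlib
from uuid import UUID

def range_shard(user_id: UUID, n_servers: int) -> int:
    """Range-shard the 128-bit UUID space into n equal slices."""
    return int(user_id.int * n_servers // 2**128)

def ring_position(key: str) -> int:
    # Hash any key onto a 0..2^32 ring (MD5 is just for illustration).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def consistent_shard(user_id: UUID, servers: list) -> str:
    """Walk clockwise from the key's ring position to the next server token."""
    pos = ring_position(str(user_id))
    tokens = sorted((ring_position(s), s) for s in servers)
    for token, server in tokens:
        if pos <= token:
            return server
    return tokens[0][1]  # wrapped past the last token: back to the start

# Grow the cluster from 4 to 5 servers and count how many of 256
# evenly spaced user IDs land on a different server afterwards.
ids = [UUID(int=i * 2**120) for i in range(256)]
moved_by_range = sum(range_shard(u, 4) != range_shard(u, 5) for u in ids)

servers4 = ["node0", "node1", "node2", "node3"]
servers5 = servers4 + ["node4"]
moved_by_ring = [u for u in ids
                 if consistent_shard(u, servers4) != consistent_shard(u, servers5)]

print(moved_by_range, "of 256 moved under range sharding")
print(len(moved_by_ring), "of 256 moved under consistent hashing")
```

With manual range sharding, roughly half the keys change owners when a fifth server joins; with consistent hashing, every key that moves lands on the new node, so only the new node's slice of the ring gets reshuffled. (Cassandra layers replication and virtual nodes on top of this basic idea.)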
So what about MySQL read slaves and MySQL scale-out? MySQL does support read slaves out of the box. You can set up a MySQL master, and all your writes go to that master, and then those writes get replicated to slaves, and you can use those slaves for reads. So your slaves become read slaves, basically all your reads go to those, and the master is used specifically for writes. Well, what's the problem with this? Where does it start to run into an issue? The issue is that the writes become the bottleneck for your master MySQL server, at which point all the slaves have to be scaled up with better hardware. You have to remember, if the master can't take the writes, neither can the slaves if they're on the same hardware. So now you have to scale up the entire cluster, and you can only do that so far. Scale-up doesn't buy you forever; it only gets you so far, and that's what companies like Facebook, Google, and Amazon realized: the solution is to scale out to many, many servers to handle that load. And to do that the right way we want to use good protocols and good algorithms, because distributed systems are complicated. So what is there out there? Once you hit those bottlenecks on your MySQL cluster, and you're starting to see writes that are failing and lots of latency, one solution is Apache Cassandra. Apache Cassandra is this really, really cool solution that's based on Dynamo, which was designed by Amazon. We'll go through that. But Cassandra is in use by Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and 1,500 other companies that have active large data sets. That's right off their website. That basically means this thing is proven and people are using it at massive scales that you and I probably have never seen.
Maybe some of you guys out there have seen it, but these are huge companies that are taking a lot of writes and a lot of reads. It's a distributed, scale-out, fault-tolerant database with tunable consistency. What does that mean? Distributed means that your data is going to get distributed amongst all the nodes that you put up. It scales out, so again, you can increase your throughput by adding more servers. Fault tolerant means you can lose an entire server. We're not just talking about a RAID failure. We're talking an entire server: let's say somebody in your data center is wrestling with somebody else and they bump into one of your servers and it falls off the rack and explodes. Cassandra doesn't care. So the benefits of this: again, your data is replicated on multiple servers. You can span different data centers. You can lose one or more servers in a cluster, depending on your replication factor, and you will have zero downtime and no data loss. That's pretty cool. Also, if you see the load on your cluster increasing, it's really easy to add servers to handle it. And you don't have to necessarily add multiple servers; let's say you see a hotspot and you want to add a single server to take care of it, you can go ahead and do that. Yes, Cassandra is like the honey badger. So now I hear some of you saying, but wait, Cassandra is eventually consistent. What about ACID? And I'm not talking about the drug. I'm talking about ACID as far as it applies to databases, where once you write data, you get a nice, reliable, consistent view of the data, and you can basically guarantee that what you've written is readable right away. Well, I hate to break it to you, I really do, but a traditional RDBMS scale-out solution with a single master and one or more slaves is also eventually consistent. What do I mean by that? I'm not lying, I'm serious. In a MySQL setup, you can take a look at what's called slave lag.
And what this is, is when you have multiple servers that are handling reads from a MySQL master, you have lag time between the time that your writes hit your master server and the time they appear on the read slaves. This lag time is variable. It depends on the load and a lot of other things. And some people will tell you that slave lag is a real pain in the butt once it starts going up. So somebody may write an item to their inventory and then try to rez it right away. And guess what? If you're not aware of how slave lag works, that read is going to fail, and they will not be able to rez it, maybe for a second or two. And they're going to get confused. So that's pretty much why, if you're going to start scaling out to multiple servers, it makes sense to look at something that was designed 100% from the start to handle scale-out and distribution. Cassandra has tunable consistency. It can offer better guarantees of getting a consistent read than a traditional scale-out RDBMS. We can tell Cassandra to write to a set of nodes and not return until a quorum of them, basically a majority of them, have responded that they have written the new value. We can also, again, read at quorum consistency, and we are guaranteed to see the most up-to-date value. No slave lag. How cool is that? We totally bypassed the issue of slave lag, and we're using a distributed system that was 100% designed from the beginning to handle these problems. So a little bit of background. I mentioned Dynamo before; Cassandra is based on Dynamo. Dynamo was invented by Amazon in 2007. It's a solution to provide a highly available distributed data store. Amazon, we know, is really huge, so if they go down for a minute they lose like half a million dollars or a million dollars, or probably more. I'm probably underestimating. It's probably like 10, 20 million dollars. So this is the type of activity that these systems were written to take into account.
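The read-your-writes failure described above can be sketched with a toy master/slave model in Python. This is purely illustrative, not how MySQL replication is actually implemented.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.slaves = []

    def write(self, key, value):
        self.data[key] = value
        # Replication is asynchronous: the write returns to the client
        # before any slave has applied it. That gap is the "slave lag".
        for slave in self.slaves:
            slave.pending.append((key, value))

class ReadSlave:
    def __init__(self, master):
        self.data = {}
        self.pending = []
        master.slaves.append(self)

    def catch_up(self):
        # The replication thread eventually applies the backlog.
        while self.pending:
            key, value = self.pending.pop(0)
            self.data[key] = value

    def read(self, key):
        return self.data.get(key)

master = Master()
slave = ReadSlave(master)
master.write("item-1", "pony avatar")   # user saves an item...
stale = slave.read("item-1")            # ...and tries to rez it immediately
slave.catch_up()                        # a second or two of lag passes
fresh = slave.read("item-1")
print("before lag elapses:", stale)
print("after lag elapses: ", fresh)
```

The first read misses because the slave hasn't caught up yet; that's exactly the confused user who can't rez the item they just saved. Quorum reads, described next, are how Cassandra closes this gap.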
So when we're talking about using it for inventory, unless your grid is twice the size of SL, you're never even going to hit the amount of usage that people like Amazon, Google, and Facebook have used these types of solutions for. So the Dynamo paper has a few important implementation details Cassandra borrows from. Data is automatically sharded based on a consistent hash of the primary key, and it's replicated to n hosts on a hash ring, where n is configurable. So let's say you want to make sure your data is always on at least three servers at a time. You can do that by setting your replication factor to three, and then it makes sure that your data is written to three separate servers. You can even do it in different data centers, if you're concerned about a meteor strike on your current data center, or a flood and people boating through your servers. This will handle it for you; it can automatically replicate to other data centers. Cassandra also borrowed hinted handoff from Dynamo, which helps bring the data set into convergence during temporary failures. That means that if a node goes down temporarily, say because you need to do an operating system upgrade, the hints are kept the whole time you're out, and then when the node comes back up, it gets those hints from the other nodes, which tell it, hey, you missed these writes, here they are, and it comes back to consistency rather quickly. It also gives you the ability to add and remove storage nodes without interruption of service. This is so cool. Again: operating system upgrades, any hardware failures, or even just needing to add more nodes because your load has gone up. This gives you 100% control without having to worry about downtime. So, cool stuff. So just to explain consistent hashing again, I'm not gonna dwell on it, but here it is. You have four nodes: A, B, C, and D.
Each of the nodes A, B, C, and D, if you had a replication factor of one, let's say you only wrote the data to a single node, each of these nodes would then have 25% of your data. So 25, 50, 75, 100, easy math there. And that's how Cassandra is going to automatically shard your data. It's gonna make sure that these nodes are evenly loaded. That's what consistent hashing, and this is called a hash ring, is all about. Again, this is one of the things that Cassandra borrowed from the Dynamo paper, which is pretty cool. So we talked about Dynamo, and we talked about consistency and eventual consistency. I'm not gonna get into the CAP theorem, but that's pretty cool too; you guys should definitely take a look into distributed systems. They're awesome. So we're gonna talk about using quorum reads and writes to achieve consistency and partition tolerance. Now, don't get too worried about this. I'm telling you about the internals of how it works so you kind of get an idea; the code's, like, way simpler than how the system actually works, because there's a lot to swallow here, but once you start to see the code snippets, you're gonna understand. So don't worry if you're kind of lost in this stuff. It's something that you want to understand when running a Cassandra cluster, but it's not required for you to be able to develop against it. So when you read or write to and from a quorum of nodes, you get a consistent view of the data and you'll be able to tolerate a node or network outage. Okay, so an example is a quorum of two out of three nodes that forms a majority. Let's say we write the value "hello" to nodes A and C, and then node A dies. So one of the nodes that we wrote "hello" to is now gone, but Cassandra doesn't care, because we're gonna read at quorum and we can still read "hello" from node C and stay running.
Now, B is not gonna have that write yet, because we only succeeded in getting it to A and C, but that doesn't matter, because we still have a source of truth at C and you're gonna get that value back. So this is a very simplified view of how it all works and how it can stay up and running. So consistency is pretty cool, and we can do tunable consistency with Cassandra too. If it's not super important that you are able to write and then read the same value right away, you can even write to a single node. Your write will come back after a single node has it, but then that node will, in the background, replicate that data to all the other nodes that are responsible for that data. So, simple Cassandra setup with Docker: we're gonna go and take a look at how we'd actually code against this. I have a question: if A dies, how do you know that C is the truth and not B, since it's a 50-50 belief of truth? Because we know that we wrote to two nodes to begin with, so we know at least one of them has the actual value that we're looking for. And not only that, Cassandra also has timestamps that it uses. If you wrote a second value overwriting that row, Cassandra would use timestamps to figure out who wins. So if I write to A and B, and then A and C again, and A dies and it reads B and C, then it's gonna get the newest value from C. As long as we're writing to a quorum and reading from a quorum, we're always gonna get truth, even if one of the nodes goes down. And you can stay running like that too, in a degraded form. You are degraded a little bit, because obviously there are fewer nodes taking the reads and writes, but then as soon as you put another node up, it's gonna either bootstrap itself or repair itself to be consistent with the nodes that were up at the time when that node went down. So it fixes itself, and it's all pretty cool. I could sit here and talk about this all day, though.
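The quorum story above fits in a few lines of Python. This is a toy model of the idea, not Cassandra's actual implementation: write to every live replica but demand only a quorum of acks, read a quorum back, and let the newest timestamp win.

```python
class Node:
    """One replica: a name, a tiny key -> (timestamp, value) store, an up flag."""
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.up = True

def quorum_write(nodes, key, ts, value, quorum=2):
    # Try every replica, but only demand `quorum` acknowledgements.
    acks = sum(1 for n in nodes if n.up)
    for n in nodes:
        if n.up:
            n.store[key] = (ts, value)
    if acks < quorum:
        raise RuntimeError("quorum write failed: not enough replicas up")

def quorum_read(nodes, key, quorum=2):
    live = [n for n in nodes if n.up]
    if len(live) < quorum:
        raise RuntimeError("quorum read failed: not enough replicas up")
    # Reconcile the replies: the newest timestamp wins, exactly the
    # tiebreak rule described above.
    replies = [n.store[key] for n in live if key in n.store]
    return max(replies)[1]

a, b, c = Node("A"), Node("B"), Node("C")
ring = [a, b, c]

b.up = False                      # B is briefly unreachable...
quorum_write(ring, "greeting", ts=1, value="hello")   # ...so only A and C get it
b.up = True
a.up = False                      # now A falls off the rack and explodes
print(quorum_read(ring, "greeting"))  # C still holds the truth
```

Because the write set (A, C) and the read set (B, C) must overlap in at least one node, the read always sees the latest acknowledged value, even though B missed the write entirely.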
And unfortunately, I think I have a few more slides left, and then we'll start going through some of the questions and stuff. So, simple Cassandra setup with Docker. If you haven't heard of Docker, you need to check it out as well. Docker's really cool. It's like packaged virtualization. It allows you to package an app and all of its dependencies into a portable container, and then you take that container and you can ship it to production, you can ship it to a data center, and everything works really well. So Docker's cool; it lets us spin up clusters really fast. If you wanna test out a Cassandra cluster, an easy way to do it is by going to github.com/tobert/cassandra-docker, and you can pick up something that'll allow you to really quickly build a Cassandra ring. By the way, this is done by Al Tobey. He's an awesome guy; he works for DataStax. They're the people that have really put a lot of time into Cassandra, and their customer support's awesome. They have a startup program. You gotta talk to them if you're gonna use Cassandra in production. They're really, really cool. Again, that's DataStax. Follow Al Tobey on Twitter; he's always talking about something really, really interesting. Once Docker's downloaded and set up, you can start a single-node Cassandra cluster by just typing a few commands there. You see docker run -d -v /srv/cassandra. I think that version 2.0.10 at the bottom is now 2.0.11; he's updated that package, I believe. Alternatively, if you're on Windows, just grab the latest release from Cassandra and run cassandra.bat. I'm pretty sure that'll just start up without really having to do any major configuration at this point. So you can also just grab the release from Apache if you just wanna run a single node and get started. So that's how we can spin up a cluster. Now, once we have a single node or multiple nodes going, then what we really wanna do is start coding.
And so the code you're actually going to see is in a language called CQL. It's like SQL, but a little bit different. Originally, when Cassandra made its debut, you had to access and read and write values with Thrift calls. Using Thrift was like accessing a huge hash table: you put the values basically into a dictionary, then you sent that dictionary over the wire, and then Cassandra figured everything out based on the dictionary that you sent over. It wasn't the easiest way to do things, but once you got used to it, it wasn't too bad. However, they designed CQL so that everybody that's familiar with SQL can jump right into Cassandra development, which is pretty awesome. Things to remember, though, when you're developing in CQL: there's no joins, and there's no group by, okay? There's a few reasons for this and we'll go over them, but data in Cassandra is expected to be mostly denormalized, okay? We're not talking about normalized data sets. Cassandra writes are extremely fast, way faster than reads, and that mitigates the extra write penalty for writing something twice. You're gonna notice that Cassandra supports compound keys, and data that is inserted with a compound key is grouped together by the partition key, and this is super duper important. That means you can write a very wide row in Cassandra and read that row back; it may not be a single seek, but it's not gonna have to seek for every sub-column in the row. It's gonna read the whole thing together, and that makes it really fast even running on spinning disk. So we're gonna keep that in mind. You cannot use a where clause to filter on columns that aren't part of the row key or a secondary index, and partition keys must be queried using the equals operator or in statements. So you have to remember that you can't just go, I wanna select everything from the inventory where the name is "primitive", right?
You can't do that. Cassandra's not gonna go and trawl through 200 gigabytes of tables to try to find you that unindexed value, okay? It has to be indexed. If you need to search by something, you have to remember that as part of your design. It's gotta be indexed one way or another: it's either gotta be part of the primary key, or it has to have a secondary index. These rules and features keep you from shooting yourself in the foot, so that you're not doing entire data scans, you're not doing table scans. So, a question: what kind of data could you move here? Anything. Cassandra doesn't like super duper wide rows; if you're inserting multiple megabytes of data, you're gonna wanna split those up into chunks. But once you do that, you could definitely store assets in Cassandra. You could store your primary data, like user data, everything, anything you want. And they even have ways to update data atomically. You have to go through the docs to make sure that everything you would need is there, but at this point CQL is pretty much able to do anything you could possibly think of. They even show demos of transactional, like financial, stuff in Cassandra. So they definitely have a very wide range of uses. So, things to keep in mind. SL viewers do not request subfolders individually in an inventory fetch. The protocol can do this, but instead all folders and subfolders are retrieved as part of the skeleton during login. So when you log in, the server's sending you basically a list of all your folders and subfolders; everything is there, and that's what the viewer starts with. So even though when you open a folder it has subfolders, the viewer already knows they're there. It's not doing another fetch for that. We're gonna keep that in mind, because it's gonna be important when we design the query and everything. All items inside an individual folder are requested at once.
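To make the earlier point about indexed filters concrete, here's a short CQL sketch. The table and column names are just illustrative, following the inventory example:

```sql
-- Fine: the partition key is constrained with = (or IN),
-- so Cassandra knows exactly which nodes and rows to touch.
SELECT * FROM folder_contents WHERE folder_id = ?;

-- Rejected: name is neither part of the primary key nor a
-- secondary index, so this would mean a full table scan.
SELECT * FROM folder_contents WHERE name = 'primitive';

-- Declaring a secondary index would make that query legal,
-- with the staleness caveats about secondary indexes noted later:
CREATE INDEX ON folder_contents (name);
```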
We wanna optimize reads based on this fact and not turn every item into an individual random IO. That would be bad. We're gonna be able to get really fast speed out of this if we make sure that when somebody requests a folder, all the items come up in a single query, so we don't have to do multiple queries and also don't force Cassandra to do all sorts of jogging around the disk, even though most people that use Cassandra will recommend SSDs. So we can use a compound key to avoid those random seeks. Items are rezzed into the world based on their UUID. This gives me a sad face, because it does create one extra table for us in our implementation: we need to map item IDs back to their parent folder ID. So when you go to rez something we have, we know exactly what folder it's in, because in this design the folder itself actually holds the item data. That's a little weird coming from SQL, but I'll show you how that works. We do this explicitly and avoid secondary indexes, which seem to have issues with becoming stale, and a couple other things, judging by mailing list traffic. So I avoided secondary indexes, which would have allowed us to not have that extra table there, just because I keep seeing those issues pop up. Maybe there's still an issue there; I'm not sure. All folders have version numbers that get incremented when items or subfolders are changed, created, moved, or deleted, and we'll use a special CQL column type called a counter. So Cassandra does have built-in support for keeping track of a version number, or keeping track of anything that's incrementing. It's called a counter. It's built right in and easy to use, which is really cool: we don't have to read the old value and write the new one. In other words, Cassandra will increment the value for us. So this is what our schema looks like. You might have to zoom in. I just want to talk about this quickly.
A few things that you'll notice are our data types. We have a native UUID data type, an integer data type, a varchar ("var-char", "var-care", however you want to say it) data type for text, and a Boolean data type. And then you'll notice the big thing on this is the primary key, right? I talked about that. Especially in the folder contents, that primary key having the folder ID and the item ID be part of the key is going to keep the items grouped with the folder, okay? So whenever we ask for just the folder ID, we're also going to get all the items in one fell swoop. That's a single, real fast return. We're not querying each item individually. That's going to make things really beautiful for us and keep them fast. You'll notice we also have a skeletons table. That's the initial folder skeleton that you download. We're keeping track of whether it's a root, a top-level, or a leaf folder. That allows us to prevent the creation of multiple top-level folders, which confuse the crap out of the code when it's looking for what folder to put a texture or something else in. So only on the initial create will we allow something to create a root or a top-level folder. It also prevents the viewer from going absolutely crazy and inserting multiple root folders. I've seen that happen in the past too. This is all designed to mitigate those crazy things the viewer likes to try to do when it doesn't think it has the right view of the inventory, which is awesome. Folder versions, that's where we're tracking the folder versions, and again, this is the counter column I was talking about. Unfortunately, we couldn't put that counter into the skeletons table, which is where it really belongs, but Cassandra does not allow you to put a counter column on a non-counter table. So we had to do it this way.
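Since the slide itself isn't reproduced here, this is my reconstruction of roughly what a schema like the one described might look like in CQL. The exact table and column names are assumptions, not the speaker's verbatim DDL:

```sql
-- Items live inside their folder's partition: folder_id is the
-- partition key, item_id the clustering column, so one folder's
-- items are stored contiguously and fetched in a single query.
CREATE TABLE folder_contents (
    folder_id uuid,
    item_id uuid,
    name varchar,
    asset_id uuid,
    inv_type int,
    PRIMARY KEY (folder_id, item_id)
);

-- The folder skeleton sent at login, partitioned by user.
-- folder_level distinguishes root / top-level / leaf folders so the
-- code can refuse duplicate roots and top-level folders.
CREATE TABLE skeletons (
    user_id uuid,
    folder_id uuid,
    folder_name varchar,
    parent_id uuid,
    folder_level int,
    PRIMARY KEY (user_id, folder_id)
);

-- Counters must live in their own table, hence this one.
CREATE TABLE folder_versions (
    user_id uuid,
    folder_id uuid,
    version counter,
    PRIMARY KEY (user_id, folder_id)
);

-- Reverse mapping so an item UUID can be traced to its folder.
CREATE TABLE item_parents (
    item_id uuid PRIMARY KEY,
    parent_folder_id uuid
);
```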
So item parents, that's the reverse mapping between the item ID and the folder ID, so that we can find the item when somebody goes to read something. Which is actually a rare case, believe it or not, compared to inventory downloads, because everybody loves to clear their cache, which is awesome too. Yeah, I have a lot of stuff that's awesome. So, a bit more detail about the design. Note that the schema is geared around how the data's gonna be queried. It's not geared around the class model, okay? This is a little bit different from the SQL designs that everybody's probably done in the past, at least until you get to a certain scale and you need to end up sharding the tables off anyway: the actual data set is designed around how we're going to query it, to be most efficient. You'll notice I'm making a big deal out of this primary key being multiple columns. We have a partition key and a clustering column. And again, we're using this primary key due to the way Cassandra stores data. When you use a compound primary key, all the data matching the first component in the compound key, known as the partition key, is grouped together on disk. This means that when we query using this key alone, or with this key and a range of clustering columns, Cassandra is able to retrieve the data without seeking out each individual row for the clustering column. So once we have a folder ID, we can also very efficiently get every single item in that folder, which is what we want, because of the way that the viewers are gonna request the items in the folder anyway. So we've designed around the way that the viewer pulls the data. We've designed our schema around trying to be the most efficient that we can. I'm gonna show some code examples, but first there's a few things to remember. Since we're maintaining a denormalized data set, we wanna make sure updates to item and folder parentage and versioning are reflected in all related tables.
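The on-disk grouping just described can be mimicked in a few lines of Python (a toy model, nothing Cassandra-specific): keep the rows sorted by (partition key, clustering key), and a whole folder becomes one contiguous slice.

```python
import bisect

rows = []  # kept sorted by ((folder_id, item_id), name), like one sorted data file

def insert_item(folder_id, item_id, name):
    bisect.insort(rows, ((folder_id, item_id), name))

def fetch_folder(folder_id):
    # One range scan over the contiguous run of rows sharing the
    # partition key -- no per-item seek, which is the whole point
    # of the compound primary key.
    lo = bisect.bisect_left(rows, ((folder_id,),))
    hi = bisect.bisect_left(rows, ((folder_id, "\U0010ffff"),))
    return [name for _key, name in rows[lo:hi]]

insert_item("folder-A", "item-2", "pony avatar")
insert_item("folder-B", "item-1", "chair")
insert_item("folder-A", "item-1", "sword")
print(fetch_folder("folder-A"))
```

Even with interleaved inserts across folders, folder-A's items sit next to each other in sorted order, so fetching the folder is a single range lookup rather than one random IO per item.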
So we can make these queries via batches. As of Cassandra 1.2 (and right now I think the recommended version for production is 2.0.x), batches are atomic by default, which means there's less of a chance of inconsistency slipping in. Because one thing with these systems: normally maybe this would be two tables, and it would be really hard for any inconsistency to slip in, like where an item doesn't belong to a folder anymore but is still marked as belonging to the old folder, things like that. Because we've broken these tables out and denormalized this data set, we have to make sure ourselves that things stay consistent, so that everything keeps running well. So what we're looking at is: Cassandra 1.2 batches are atomic, so we can make sure that when we update three tables at a time, all that data either goes in or none of it is executed. It's all or nothing, and that's cool. Cassandra 1.2 guarantees that. And just a few things to remember about the design: moving a folder requires you to alter the skeletons table and update the folder versions table. Renaming a folder requires you to alter skeletons, folder contents, and folder versions. Now you're like, oh my gosh, these are simple single queries in SQL; why is this so complicated? Well, remember, this is denormalized, and also remember that while we have a few tables to worry about, it's cool, because writes are super fast in Cassandra. We don't have to worry about those extra writes being slow; they're super duper fast. Deleting folders and items requires hits to all associated tables. Moving or renaming an item requires you to alter folder contents, folder versions, and item parents. Okay, so that's just some of the technical details. All right, so this is CQL, real quick.
Yes, you can bundle all requests to be atomic, except for the ones with counters, because, again, counters are special. A counter increment cannot be bundled with other statements that don't touch counters. So when we do a counter increment, it's always gonna be by itself, but that's okay, because if the folder version is slightly inconsistent for, like, a millisecond or something, that's probably not gonna be much of an issue, and in production for us, we've actually never seen that become an issue. The folder version is only used by the viewer when, say, you log out and then you log back in: the viewer gets the skeleton and it wants to know what folders it needs to refetch. That's where that counter's gonna come into play. So it's very unlikely that a single non-atomic write is gonna cause an issue. So we have some CQL examples here. You'll notice, if you look at this, it's pretty familiar. It looks very much like SQL: insert into folder contents, then the list of columns, then values, and then we're binding to that, we're preparing it. And you'll notice the only thing that's really different with this is the folder attributes insert statement, set consistency level quorum, okay? That means we wanna write to, say, two out of three nodes, and make sure that before that write returns, before it comes back to us and says everything's okay, it's gone to at least two out of three nodes, which is cool, okay? Cassandra will actually usually write to all three nodes, but if one's down, as long as there's at least two up, we're good. It'll return, and that write will be considered successful. So that's your little bit of CQL, and there's another slide with a little bit more CQL, but remember, to insert a folder, we need to insert into skeletons, folder versions, and folder contents. So this is the batch I was talking about. This batch is gonna be atomic from 1.2 on.
So you'll see that we have — and I didn't list out the CQL because it's too big to fit on a slide — we've got batch equals new batch statement, and then we're adding the two insert statements. Those are gonna get executed atomically: either both are gonna happen or neither. Then we have a version increment, which, like I was saying, has to be separate. I bundled that version increment into a function, and that's just gonna execute another CQL statement. So you'll notice we were able to do two of those in a batch, which keeps the skeleton and the contents consistent, and then we do the version increment at the end. You're gonna see that pattern throughout the code.

And yes, all this code's available online; the last slide has the URL to the GitHub where you can grab this and start playing with it. Everything's unit tested in there, so you can see that it works, and you can almost grab it and start running with it right away, except for integrating it into your open simulator installation.

So what's up with the version increment? Like I said, we can't include a counter table as part of a batch with non-counter tables, so unfortunately we need to increment the counter separately. And this is how you actually do an increment statement; here's a little bit more CQL. You can prepare: update folder version and set version equals version plus one where user ID equals question mark and folder ID equals question mark, which means that we can fill those in. And again, we set consistency level quorum. So that shows you how the version increment works, and again, it has to be separate from the other batches because it's incrementing a counter, and counters are special.

So I guess we have 13 more minutes. This is the part where I ask you for questions, and I'll answer them as best I can about Cassandra and implementation stuff. So go ahead, if you have questions, I am watching.
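The pattern just described — two inserts in one atomic batch, then the counter bump on its own — can be illustrated with a toy in-memory model. This is not the Cassandra driver API or the speaker's code; the `apply_batch` helper and dict-of-dicts "store" are made up purely to show the all-or-nothing semantics, with table names borrowed from the talk:

```python
# Toy illustration of the all-or-nothing batch semantics described above.
# An in-memory sketch, NOT the Cassandra driver: apply_batch applies every
# statement or, if any fails, leaves the store completely untouched.

def apply_batch(store: dict, statements):
    """Apply (table, key, value) writes atomically against a dict of tables."""
    staged = {table: dict(rows) for table, rows in store.items()}  # work on a copy
    for table, key, value in statements:
        if table not in staged:
            raise KeyError(f"unknown table {table!r}")  # abort before commit
        staged[table][key] = value
    store.clear()
    store.update(staged)  # commit point: all writes land together

store = {"skeletons": {}, "folder_contents": {}, "folder_versions": {}}

# The two non-counter inserts go into one atomic batch...
apply_batch(store, [
    ("skeletons", "folder-1", {"name": "Clothing"}),
    ("folder_contents", "folder-1", {"items": []}),
])

# ...and the counter bump runs as its own separate statement afterwards,
# mirroring how the counter update cannot ride along in the batch.
store["folder_versions"]["folder-1"] = store["folder_versions"].get("folder-1", 0) + 1

print(store["folder_versions"]["folder-1"])  # 1
```

If the counter bump ever raced or briefly lagged, the worst case is the momentary version inconsistency the speaker mentions, while the skeleton and contents always stay consistent with each other.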
And again, this is a really good solution for just about anything you wanna do that you know you're gonna need to scale up. A lot of grids, you know, if it's just in your house and you're using it as a hypergrid port and you're not expecting to get thousands and thousands of users, you probably don't need this stuff. But if you think your grid's gonna get big, sometimes it's good to be prepared.

Okay, question from Justin: how long did it take you to implement your Cassandra-based inventory service? The InWorldz Cassandra-based inventory is actually still Thrift, because when I designed it, CQL was in 1.0, it was just barely out. 0.8 was the version of Cassandra being run by all the big guys, so CQL was a baby. So I had designed a Thrift-based one, and that's what we're running now. That Thrift service, I think learning Cassandra and implementing everything took me about two months, then there was probably another month of testing, and probably another month of bug fixes after that. But CQL makes things a lot easier, because it's really hard to keep a bunch of dictionaries, and how they're interacting with each other, in your head; CQL makes that a lot better. This presentation took me, I don't know, probably 40 to 50 hours to do, and it's a fully functional inventory system. I have unit tests for moving items and deleting and purging and all sorts of other good stuff in there, so it's basically ready to go.

Okay, how about binary data? Yes, you can store binary; it's a data type in CQL, so no worries about binary. Now, if you have to store huge giant blobs, you have to remember that Cassandra's going to read it all into memory first before it sends it over the network. So you don't want to create a gigabyte blob and then try to read it, because you're probably gonna create an out-of-memory condition on the server.
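A minimal sketch of the chunking approach the speaker recommends for large blobs — split on write, reassemble on read. The 64 KiB chunk size is an illustrative choice, not a value from the talk, and the two helpers are hypothetical names:

```python
# Sketch of splitting a large binary blob into fixed-size chunks before
# storing it, so no single column value is big enough to exhaust server
# memory when Cassandra reads it back. 64 KiB is an illustrative size.

CHUNK_SIZE = 64 * 1024

def split_blob(blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (sequence_number, chunk) pairs; each would go in one column."""
    return [(i // chunk_size, blob[i:i + chunk_size])
            for i in range(0, len(blob), chunk_size)]

def join_blob(chunks) -> bytes:
    """Reassemble chunks read back, in sequence order."""
    return b"".join(chunk for _, chunk in sorted(chunks))

blob = b"x" * (200 * 1024)   # a 200 KiB payload
chunks = split_blob(blob)
print(len(chunks))           # 4: three full 64 KiB chunks plus a remainder
assert join_blob(chunks) == blob
```

In a real schema the sequence number would typically be a clustering column, so the chunks of one blob come back in order from a single partition.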
So you want to split huge binary blobs up into chunks. If it's small, though, you know, you're talking like 16K, 64K, under a megabyte, you can pretty much just jam it into a column and be done.

So next question: how do you configure the nodes to know about each other? Is there a master server? No, and here's something really cool about Cassandra: there are no masters, there are no special roles on any server. Every server is equal. You don't have to worry about configuring, like, this server is the special one that knows everything and these servers are the babies that don't know anything; Cassandra doesn't work like that, everything's equal. What you do is you create a cluster, and then you give each of the nodes in the cluster what are called seed nodes. So say you know that your base Cassandra cluster is four nodes out of eight: your first four nodes become seed nodes, and then the nodes talk to each other to figure out the configuration of the cluster. That's it. They gossip with each other, they figure out how the cluster is built, they figure out who owns what, and boom, they're done. When you add nodes, it's the same way: you specify a cluster name, you add the node, you tell it what its seed nodes are, and it's gonna grab the configuration and everything else from the other nodes and start going as soon as it's bootstrapped. And this stuff happens fast. Let's see, we went from a four to an eight node Cassandra cluster, and I put up four nodes in about an hour and nobody noticed. It was so cool. These nodes just do what they're supposed to: they talk to each other, they figure things out, and your cluster starts moving. So it's really awesome.

Can SQL be converted to CQL?
Not directly. You definitely wanna understand how Cassandra works on the inside before you just start writing CQL, because you don't wanna end up writing an inefficient implementation. So it's definitely worth your time to understand the implementation and then start writing some CQL. But you'll notice how similar CQL is to SQL, and that allows you to convert back and forth pretty easily.

How easy is it to add new columns? Very easy. There's an alter table statement just like you're used to, and it's not like MySQL with InnoDB and the others, where adding a column has to rebuild the table; it doesn't destroy your cluster while it's working, it just works. So it's really cool.

Does the Cassandra cluster rebalance? Yes, when you add nodes, the Cassandra cluster automatically rebalances your data. It shuffles stuff around, and based on the partition keys it moves data to the nodes where it belongs. So yes, it does automatically rebalance. Now, the cool thing is, if you double the size of your cluster, you don't have any shuffling to do besides draining old data from nodes that no longer handle it. So if you go from, like, two to four — well, two doesn't make a lot of sense, so let's say three to six: half of the data that's currently on those three nodes is gonna go to the new nodes, and then you'll actually see the disk space freed from that. If you just add a single node, then all the nodes have to sort of rebalance, because it affects the entire range, so you'll see data moving around. But doubling, adding twice as many, like going from three to six or six to twelve, is actually pretty easy for Cassandra. And now it's even easier, because Cassandra uses vnodes.
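Why doubling is the cheap case can be shown with a toy single-token ring. This is a simplified model of Cassandra's original token scheme, with a tiny ring instead of the real 128-bit space: when three nodes become six, every key either stays on its old owner or moves to a brand-new node, so the original nodes never shuffle data among themselves.

```python
# Toy single-token ring showing why doubling a cluster is cheap: going from
# 3 to 6 evenly spaced tokens, every key either keeps its old owner or lands
# on one of the NEW tokens -- old nodes never exchange data with each other.

RING = 120  # tiny stand-in for Cassandra's real 128-bit token space

def tokens(n):
    """Evenly spaced tokens for n nodes, as in the original (pre-vnode) scheme."""
    return [i * RING // n for i in range(n)]

def owner(h, toks):
    """A key hashing to h belongs to the node with the next token at or above h."""
    for t in toks:
        if h <= t:
            return t
    return toks[0]  # past the last token: wrap around the ring

old, new = tokens(3), tokens(6)
for h in range(RING):
    before, after = owner(h, old), owner(h, new)
    # Either ownership is unchanged, or the key moved to a newly added token.
    assert after == before or after not in old

print("no data moved between the original three nodes")
```

Adding a single node instead would insert one new token and shift the ranges of every neighbor, which is the "everything rebalances" case the speaker describes; vnodes later softened that by giving each server many small token ranges.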
Yeah, so that's slightly dated information I'm giving you, because Cassandra now uses vnodes, which means that even if you just add a single server, it will automatically partition the ranges and make sure there are no hotspots and things like that. So definitely check out the docs on vnodes and how that works, because they're pretty cool. Right now we're still using the original design, which was: you divide the 128 bit token space into N nodes, you pick the keys, and each node is responsible for its range. But vnodes make that a little bit easier; even just adding a single server, Cassandra will figure out how to partition the data and move it around.

All right, I haven't seen a question in a little bit, so thanks for coming. You can find the full source code, with unit test coverage, on GitHub at github.com slash inworlds slash open sim CQL inventory. If you go there, you'll see the entire source code, and it all works; you can go ahead and do what you want with it. It's under the BSD license, which should be entirely compatible with OpenSimulator's BSD license. So do what you want with it and have fun. And hopefully we can start to see some more Cassandra solutions come out, because trust me, Cassandra's not going anywhere, especially with DataStax behind it; they have some big customers, Fortune 100s and all that kind of stuff. So good stuff. You can reach me if you have any other questions, probably via GitHub; just let me know. So thanks a lot, everybody.

Thank you, David, for a terrific presentation. As a reminder to our audience, you can see what's coming up on the conference schedule at conference.opensimulator.org. Following this session at 11 a.m., we have a break in the schedule for lunch or dinner, wherever you may be in the physical world.
In addition, if you're a crowdfunder at the exclusive access level or above, you are invited to a VIP question and answer session with today's keynote speakers in the staff zone auditorium at 11 a.m. Finally, we'll return after the lunch break for an hour exploring the conference grid, from 12 noon Pacific until 12:45. If you haven't yet had a chance to visit the expo regions or play the open meta quest game, this will be your chance. Thank you again to our speaker and the audience. We'll be back after lunch. Have a great break.