Okay, I feel so much better. Great. Well, thanks everyone for coming, and welcome to my talk. We're going to talk about re-architecting CouchDB's secondary indexes on top of FoundationDB. Hopefully from Adam's talk you now know that we're building CouchDB on top of FoundationDB.

My name is Garren. I'm part of the Apache CouchDB team, on the project management committee. I come from South Africa, and if you like sports, especially rugby: we're the world champions, and it feels amazing. I'm going to gloat about it; I'm not ashamed. I'm a freelance developer, but the main work I do is for IBM Cloudant, up there. I get to work on Apache CouchDB because it's the main underlying database that IBM Cloudant is built on.

CouchDB has three main secondary indexes: Mango, MapReduce, and Search. We're going to look at all three of them and how we're implementing them on top of FoundationDB.

The first one is Mango. Mango is very much inspired by MongoDB. It's a much simpler querying syntax than what we have for MapReduce, and it follows the same querying syntax as MongoDB: you can create indexes, and you query them using the same selectors that MongoDB uses. Currently in CouchDB, Mango is really just a wrapper on top of CouchDB's MapReduce indexes; every time you make a query, we're using MapReduce indexes underneath, with a thin wrapper layer on top. With the move to FoundationDB, we want to make it its own complete index, and we've taken a bit of inspiration from FoundationDB's Document Layer. One of the big things we're really excited about with Mango is that index updates are going to happen in the same transaction as the document update. That means that if you update documents and then run a query straight afterwards in a new transaction, the index is already up to date. We keep it up to date all the time, so we don't have to catch it up in the background before serving a query.

The basic data model for storing Mango indexes in FoundationDB looks a little bit like this. We have a section where we keep a list of all the indexes, along with the keys each index stores. The first one here is index ID 1, and it stores keys on name and age. We also keep a build status, whether the index is building or running, and a sequence from the changes feed. What actually happens is that, as much as we want to update the index in the document update transaction, if you've got an existing database that's really big, you can't start using the index immediately; you need to build it first. That's what those two fields are for. The moment an index is created, we set its build status to building and record the changes-feed sequence at which it was created. In the background we build the index up to that point, while any new document updates that come in are indexed as part of their own transactions. As soon as we've built the index up to that stored sequence, it's ready to be used and we can change its status to running.
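To make the shape of that concrete, here is a minimal sketch of that metadata layout using the FoundationDB Python bindings. The subspace and field names are hypothetical, purely to illustrate the data model just described; the real CouchDB layer is written in Erlang and lays things out its own way.

```python
import fdb

fdb.api_version(620)
db = fdb.open()

# Hypothetical subspace for Mango index metadata; illustrative only.
META = fdb.Subspace(("mango_meta",))

@fdb.transactional
def create_index(tr, index_id, fields, changes_seq):
    # Which document fields the index covers, e.g. ("name", "age").
    tr[META[index_id]["fields"]] = fdb.tuple.pack(tuple(fields))
    # A new index starts out "building"...
    tr[META[index_id]["build_status"]] = b"building"
    # ...and records the changes-feed sequence at creation time, so the
    # background build knows how far it has to catch up.
    tr[META[index_id]["build_seq"]] = fdb.tuple.pack((changes_seq,))

@fdb.transactional
def mark_running(tr, index_id):
    # Called once the background build reaches build_seq.
    tr[META[index_id]["build_status"]] = b"running"
```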
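And, continuing that sketch, this is roughly what the headline property looks like in code: the document write and the index maintenance committed in a single transaction. The index-entry format here is deliberately oversimplified; the actual key encoding is what I'll describe next.

```python
import json

DOCS = fdb.Subspace(("docs",))        # hypothetical document subspace
INDEX = fdb.Subspace(("mango_idx",))  # hypothetical index-entry subspace

@fdb.transactional
def update_doc(tr, index_id, fields, doc_id, old_doc, new_doc):
    # Write the new document body.
    tr[DOCS[doc_id]] = json.dumps(new_doc).encode()

    # Remove the index entry derived from the old version, if any...
    if old_doc is not None:
        old_key = tuple(old_doc[f] for f in fields)
        del tr[INDEX[index_id][old_key][doc_id]]

    # ...and write the entry for the new version. The document and the
    # index land in one transaction, so any query in a later
    # transaction sees an index that is already up to date.
    new_key = tuple(new_doc[f] for f in fields)
    tr[INDEX[index_id][new_key][doc_id]] = b""
```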
How we store the index entries is fairly straightforward. The first part of the key is the index ID, then the actual keys we're storing for that index, which is what we query on, and finally the document's ID, because for Mango we return the document as well in most cases. So we can take the document ID out of the key stored in FoundationDB, fetch the document, and return it to the user.

Now, one of the big things we've had to look at is ordering. FoundationDB orders keys as byte strings, but CouchDB has its own history and its own way of storing things, and it has something called view collation. That's how CouchDB orders its keys, and it looks like this: the special values null, false, and true always come first, then numbers (and you'll notice we don't differentiate between integers, floats, and doubles), then strings, arrays, and objects. Anything stored in an index needs to be ordered this way, so we had to adjust how we store keys in FoundationDB to get this ordering. The way we've solved it is that every time we store a key, we create a tuple whose first element is a number tagging the type of the value, special, number, string, and so on, and that gives us the ordering between types. The second element of the tuple is the original key value itself. With strings we go a step further: we use the ICU library to order them. Every time we have a string key, we run it through ICU, which creates a sort key, and we store that sort key in FoundationDB.
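Here is a rough sketch of that encoding in Python. The numeric type tags are made up for illustration, and I'm using PyICU for the sort keys; the real implementation is Erlang and will differ in detail.

```python
from icu import Collator, Locale  # PyICU; assumed available

# Hypothetical type tags reproducing CouchDB's view collation order
# when placed first in a FoundationDB tuple:
# null < false < true < numbers < strings < arrays < objects.
NULL, FALSE, TRUE, NUMBER, STRING, ARRAY, OBJECT = range(7)

collator = Collator.createInstance(Locale(""))  # root-locale collation

def encode_key(value):
    """Encode one JSON key as a tuple that sorts per view collation."""
    if value is None:
        return (NULL,)
    if value is False:
        return (FALSE,)
    if value is True:
        return (TRUE,)
    if isinstance(value, (int, float)):
        # Ints and floats must collate together, so store everything as
        # a double; the tuple layer orders floats correctly, whereas
        # mixing its int and float type codes would not sort numerically.
        return (NUMBER, float(value))
    if isinstance(value, str):
        # An ICU sort key is a byte string whose byte order matches the
        # collator's string order.
        return (STRING, bytes(collator.getSortKey(value)))
    if isinstance(value, list):
        return (ARRAY,) + tuple(encode_key(v) for v in value)
    if isinstance(value, dict):
        return (OBJECT,) + tuple(
            (encode_key(k), encode_key(v)) for k, v in value.items())
    raise TypeError(value)
```

Note that a sort key can't be turned back into the original string, which is exactly why the original keys get kept on the value side, as we'll see with MapReduce.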
So now let's look at MapReduce indexes. MapReduce indexes are created like this: you create a design document, and in that design document you create a view. MapReduce indexes are also called views. In this case we have a map function and we emit something; you can see we're emitting an array as the key, a class and a name, with a value of 1. We also define a reduce function, and in this case we're using one of CouchDB's built-in reduce functions, _count. It's also possible to write your own JavaScript reduce functions, but we really try to discourage that, because most people find them really difficult to get right.

Once you've created a design doc like this and saved it into CouchDB, CouchDB builds your index, and the process looks a little bit like this: it reads from the changes feed, fetches all the documents that need to go into the index, and runs those documents through the JavaScript query server. From the JavaScript query server we get back a bunch of key-values, and those key-values are stored in FoundationDB. Because of this process, because we always have to send the documents through the JavaScript query server, we can't do this in the document update transaction; it has to run in the background. I'll show you how we do that in a moment.

What we store in FoundationDB looks a little bit like this. We store the index ID first, then the keys, in this case the name and the age, using the same encoding process I described for Mango. We store the document ID, and then on the value side, which is everything in red there, we store the keys again. Once we've encoded the keys to get the correct sorting in FoundationDB, we can't decode them back into the exact keys the user emitted, so we store those original keys on the value side, along with the value. Every time we run a query, we fetch those values and return both back to the user.

Now, with Mango indexes, because we update the index in the document update transaction, we can fetch the old document, look at the keys it put into the index, remove those old keys, look at the new document being added or updated, and add its new keys to the index, all in that one transaction. With MapReduce, because the index is built in the background, we can't fetch the old full document body, so we keep another structure we call the ID index. The ID index keeps the list of keys that a specific document contributed to an index. Every time we update an existing index, we look in the ID index for the old set of keys this document contributed, remove those from the index, get the new keys, add them to the index, and update the ID index as well. That's how we keep track of it.
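To pin down that bookkeeping, here is a minimal sketch of the diff-style update, again with the Python bindings and made-up subspace names; the real implementation is Erlang and structured differently.

```python
import fdb

fdb.api_version(620)

VIEW = fdb.Subspace(("mrview",))       # hypothetical: (view, key, doc_id) -> value
BY_ID = fdb.Subspace(("mrview_ids",))  # hypothetical: (view, doc_id) -> emitted keys

@fdb.transactional
def update_view_for_doc(tr, view_id, doc_id, new_kvs):
    # new_kvs is the list of (key, value) pairs the JavaScript query
    # server emitted for the latest version of this document.

    # The background indexer never sees the old document body, so the
    # ID index remembers which keys the document previously emitted.
    packed = tr[BY_ID[view_id][doc_id]]
    if packed.present():
        for old_key in fdb.tuple.unpack(packed):
            del tr[VIEW[view_id][old_key][doc_id]]

    # Write the new entries and refresh the ID index.
    for key, value in new_kvs:
        tr[VIEW[view_id][key][doc_id]] = fdb.tuple.pack((value,))
    tr[BY_ID[view_id][doc_id]] = fdb.tuple.pack(
        tuple(key for key, _ in new_kvs))
```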
So how do we build it? In a traditional CouchDB 2 or CouchDB 3 setup, every time there's a query for a view, a coordinating node contacts each of the nodes in the cluster. Those nodes look at each shard for the view, update that shard of the view, and send their results back to the coordinating node, which collates everything and returns it to the client. The big problem is that each node has to hold a lot of state: it has to know how many nodes are in the cluster, where all the shards are and which nodes they live on, and, as Adam mentioned earlier, there's a lot of scatter-gather going on. We want to avoid that with FoundationDB and move to a situation where a node doesn't know how many nodes are in the system, and all the state lives in FoundationDB. Then, if you're running somewhere like Kubernetes, you can add nodes when the cluster is under heavy load and remove them later when you don't need them.

A great example of moving all the state into FoundationDB is couch_jobs. couch_jobs is a global queue for background jobs, and all of its state is kept in FoundationDB, the whole queue and the status of everything, which means we can spin up multiple workers. Workers spin up, connect through the couch_jobs API, see what jobs they can work on, and build the index. What's also nice is that, again through FoundationDB, we can monitor for failed jobs: if a worker hasn't updated its state in FoundationDB within a set amount of time, that job is considered failed, we put the job item back on the couch_jobs queue, and another worker can pick it up and continue working on it. Along with that, we've added pub/sub progress, so anything that needs to know how a job is progressing can listen in, get feedback, and know when it's ready.

So the way we want this to work with FoundationDB is: a view query comes in, and it immediately puts a job on the background queue to build the index and bring it up to date. A worker accepts that job, builds the view, and keeps reporting status back to the node handling the view query. That node then knows when the view is up to date, queries FoundationDB, and returns the results to the user. This is one of the areas where we've still got a bit of work to do on optimizing and improving. There's a trade-off with the original CouchDB: because each index was split into shards, we could build a whole view in parallel and build it really quickly, but now, because we're building into a single keyspace in FoundationDB, at the moment we build the index in one stream, one step at a time. So building is a little slower, but querying is significantly faster with FoundationDB. That's definitely one of the areas we're going to improve in the near future.

Now let's look at reduce indexes. Reduce indexes let us aggregate the map index results. If you've got a map index that emits keys like these dates, reduce lets you aggregate them based on your group level, which is how many leading elements of the key array to group on. In this case there are three items in the array, year, month, and day, so group level three gives you the full keys. Set it to group level two and we aggregate by year and month; set it to group level one and we aggregate by year. As I said, reduce allows aggregation of map results, and we support it in two ways: the built-in reduce functions CouchDB ships, and custom JavaScript functions.

One thing about reduce is that it relies heavily on the fact that internally CouchDB uses a B-tree to store the index. While the leaf nodes store all the map key-values, the non-leaf nodes store aggregations of the results below them. So when we run a reduce query, we don't have to go all the way down to the leaf nodes, pull everything out, reduce it in real time, and return the results; we can read the higher non-leaf nodes and use the values stored there, which makes querying a lot quicker and more efficient. Moving to FoundationDB we don't have a B-tree anymore, so we have to adapt that way of querying to work in FoundationDB, and the way we think it's going to work is with a skip list. The skip list idea comes from the FoundationDB Record Layer's ranked-set index design. A skip list has multiple levels: level zero holds all of your reduce results, and each level above holds a smaller subset of them, fewer and fewer results the higher up you go. Each time a key-value is not added to a level, it's aggregated into the previous key that did make that level, which is how we get the aggregation. So when we query a reduce index, we don't have to go all the way down to level zero and scan the whole index; we can start higher up, read a certain number of key-values, and only jump down a level where we need finer detail. We're hoping that gives us roughly the same query efficiency we have with the B-tree design today.
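To show the idea, here is a small self-contained Python sketch of building those levels for a _sum-style reduce. The level count, the promotion rule, and the hash are all invented for illustration; the real design stores the levels in FoundationDB and maintains them incrementally rather than rebuilding them like this.

```python
import hashlib

MAX_LEVEL = 3  # made up for illustration; the real design picks its own

def level_of(key):
    # Deterministically "promote" roughly 1/4 of keys to each higher
    # level, like a probabilistic skip list, but stable across rebuilds.
    h = int(hashlib.md5(repr(key).encode()).hexdigest(), 16)
    lvl = 0
    while lvl < MAX_LEVEL and h % 4 == 0:
        h //= 4
        lvl += 1
    return lvl

def build_levels(map_rows):
    # map_rows: [(key, value)] from the map phase, already sorted by key.
    # Level 0 keeps every row; on each higher level, a row that is not
    # promoted is aggregated (here: summed) into the nearest promoted
    # key before it, so higher levels hold coarser aggregations.
    levels = [dict(map_rows)]
    for lvl in range(1, MAX_LEVEL + 1):
        level, anchor = {}, None
        for key, value in map_rows:
            if level_of(key) >= lvl:
                level[key] = value
                anchor = key
            elif anchor is not None:
                level[anchor] += value  # fold into the previous promoted key
            # (a real implementation keeps a sentinel head entry, so rows
            # before the first promoted key are never dropped like this)
        levels.append(level)
    return levels
```

A range query can then read a handful of entries at a high level and only drop down to lower levels near the edges of the range, instead of scanning all of level zero.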
Finally, we have text search. CouchDB 3, which will be our next release and the final release before we do the FoundationDB-based release, is going to include text search. Cloudant has had it for quite a while, and we've open-sourced it, but it's never been an official part of an Apache CouchDB release. From 3 onwards it will be, which is quite nice. It follows a very similar design to the MapReduce indexes: you create a JavaScript function, you say what you want to add to the text index and what you want to use to query, and internally we use Lucene. So under the hood, Lucene manages the individual shards of the whole text index and we manage those, and that has worked pretty well. But moving to FoundationDB, we didn't want that same situation where FoundationDB holds everything else and Lucene separately manages the text search, with us managing those shards on the side; we want FoundationDB to manage everything. So the idea we've come up with, and the design we're doing, is that we still use Lucene, but we cut out a piece and implement Lucene's directory abstraction, so that instead of writing the index files to disk, Lucene writes to FoundationDB through that directory implementation. We then add some extra optimizations on top of that: we run multiple Java nodes that hold the indexes in memory, so each query goes to the specific nodes that have that index in memory, which makes querying significantly faster, and we use the same nodes for building, so an index can be built in memory and queried. And again, couch_jobs is used to build the index.

So, the current status of where we are: Adam gave a nice list of all the features we've implemented, but the CouchDB layer has very much just begun. We're still perfecting things, learning where we can improve, and learning how to think in a FoundationDB way, which is a different way of solving our problems. And everything we're doing is open source, so you can follow along completely: you can get the source code from the Apache CouchDB repo. We also write RFCs for everything we've talked about today, so if you're interested in any of the ways we've done it, there's a much more detailed RFC that really dives into how we're implementing everything, and that's in our documentation repo. And please join our mailing lists, or we've got a Slack channel as well where we talk about all of this. If you're interested in any of this, please join us; it's great to get new ideas and new people involved, sharing how we build this all together.

And so thank you, thank you for coming to the talk. I just want to say a big thank you to the CouchDB community; it's been really fun working with everyone as we build this all out, and it's a really nice community to be part of. So thanks a lot.