Great. After these technical hiccups, let's get started. My name is Chris Immelman. I'm here to talk about RedisJSON, a document DB in Rust. I have about 40-45 minutes, right? So, questions please at the end. I'm available in the hallway afterwards, so feel free to grab me if you have any topics or discussion points. First of all, I'm going to dedicate this talk to my son, Luca, who cannot be here. If you're watching the stream, Luca, this is for you. The second dedication actually goes to the CTO team at Redis Labs, who are the people behind the code base. Who am I? I won't go through all of the details. Suffice it to say, I hail from Frankfurt. I'm an Arch package maintainer. Blatant, blatant plug: I'm going to run the Linux Bier Wanderung this year in Kronberg. So if you have a week at the end of August, early September, to join about 20, 30 people for discussions, hackathons, presentations, hiking, and of course beer, check out the website. It's actually linuxbierwanderung.com, or monochromec.com/lbw2020. Second plug, and then I'm going to finish with the plugs: I run a podcast. We just posted the first episode. It's called Linux Inlaws. The domain went live yesterday. The first episode will be aired on Hacker Public Radio — HPR is your go-to source — on the 13th of February. It's open source with a dark humour twist. Plugs over. Hobbies include the software development lifecycle — that's what I've been dealing with for the last, what, 20, 25 years. I've been using open source for the last 30-plus years. I'm also dabbling in IT security. And if I still have time left, I work — for disclosure — for a company called Redis Labs, as a solution architect and liaison. A couple of things, basically, on what this talk is all about. First of all, how many of you have used Redis or know what Redis is? Wow, OK. There are a couple of intro slides.
I'm going to go through them fairly quickly, because if the majority is already familiar with Redis, there's no point in repeating these details. Then I'm going to talk a little bit about the architecture, with a special focus on how applications perceive this document DB written in Rust. And of course, a summary and outlook will conclude this talk. As I said, I'm going to go through this fairly quickly. Redis is, so to say, one of the leading in-memory databases. It was founded about 10 years ago. We have more than 25,000 GitHub stars. There are more than 162 clients written in more than 57 programming languages. I reckon that makes Redis one of the most loved databases when it comes down to client connectivity. How many of you have actually programmed against Redis? And I reckon most of you did this in a caching use case, right? Or as part of a caching use case. This is basically where Redis comes from. About 10 years ago, the project initiator, somebody called Salvatore Sanfilippo, was looking for a performant reporting database for one of his web projects. He checked out Memcached, he checked out other solutions, and they didn't check out. So he went off and wrote his own database, as a key-value store initially. This is basically where Redis comes from. Does anybody know what Redis stands for? Remote dictionary server — that's what it is. But over the years, Redis has evolved into much, much more. In about 2015, Salvatore introduced something called the module SDK to the code base, which allows the initial Redis server implementation to be extended with so-called modules. And RedisJSON is one of them. The idea is to take your native Redis implementation and to plug any gaps that you may perceive from an application perspective with functionality implemented in modules. So over the years, there have been modules in the area of graphs, for example: RedisGraph turns Redis into a very performant graph database.
It has a Cypher-compliant interface, similar to Neo4j. There's something called RedisTimeSeries that turns Redis into a time series database, comparable in functionality to something called InfluxDB. And so forth. The beauty is that all these extensions are on GitHub. So if you're looking for a performant graph database, simply clone the Redis server side, clone the graph module, compile the whole thing, load the module when you start up the server, and then you have a Cypher-compliant graph database in memory at your disposal. The idea was, with these modules, to turn Redis into something very application-specific while maintaining the advantages that native Redis brings with it, namely low latency and high throughput. This is the kind of idea. And the modules that you see on the screen are just the ones provided by Redis Labs. There are many more modules out there in the world living on GitHub. So if you think Redis doesn't offer something that you need, simply deploy your favorite search engine and google for a module extension. Chances are somebody has written something that you can either clone or deploy right away. OK. Again, this is native Redis. I won't spend too much time on it, but this is what native Redis offers out of the box. And the majority of the modules fall back on these generic data types, including strings, sets, sorted sets, HyperLogLog — which is a probabilistic data structure — and so forth. As I said, this comes with native Redis. This has been in the code base since pretty much day one. Most of the client-side implementations reflect these data structures, either as part of the client library itself or via an extension. And what is RedisJSON all about?
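As a side note, loading a module at startup as described above is typically a one-liner. A minimal redis.conf fragment might look like this — the path to the shared object is just an example, point it at wherever your build landed:

```conf
# redis.conf: load a compiled module at server startup
loadmodule /path/to/librejson.so
```

Alternatively, passing --loadmodule /path/to/librejson.so on the redis-server command line does the same thing.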
You have your native Redis, and then you have something called RedisJSON, which is essentially an ECMA-404 compliant module that turns Redis into a document-oriented DB, comparable in functionality to other document-oriented databases. Think MongoDB, think Couchbase — you know what I mean. OK, you have your typical JSON commands: one that allows you to insert documents, which would be JSON.SET, and one that allows you to retrieve documents from the database, which would be JSON.GET. But also, because JSON offers arrays and all the rest of it, you have JSON.ARRAPPEND and JSON.ARRINSERT, which essentially allow you to insert elements into an array or append them at the end. The navigation is done by JSONPath. Who of you has used JSONPath in the past? OK, not that many. OK, so I'm going to spend a little bit more time on this. Again, just deploy your favorite search engine of choice if you're looking for the specification. JSONPath essentially is a standardized way — similar to a document object model, if you know what that means, as in a DOM — that allows you to access data in a JSON document. So essentially, all the levels are separated by simple dots. Or you can use bracket notation to access these levels. So the following are equivalent: .foo.bar in dot notation is essentially ['foo']['bar'] in bracket notation — or you can mix the two. By the way, an initial period always reflects the root of the DOM, of the document object model. That's something very important to keep in mind. Simple as that. And something very important, which you also find in DOM query languages, is the support for wildcards. So if you are unsure about a specific selector, just insert a star, for example, and you will get back the corresponding array, the corresponding document, that matches that path. How does it look like from the server side?
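To make those path equivalences concrete, here is a tiny, hedged sketch in Python — this is not RedisJSON's actual implementation, just a toy resolver covering only the dot steps, bracket steps, and the star wildcard mentioned above:

```python
def resolve(doc, path):
    """Toy JSONPath-style lookup over nested dicts/lists.

    Handles dot steps, bracket steps like ['foo'] or [0],
    and the '*' wildcard -- nothing more.
    """
    # Normalise bracket notation into dot steps: ['foo'] / [0] -> .foo / .0
    path = path.replace("['", ".").replace("']", "")
    path = path.replace("[", ".").replace("]", "")
    steps = [s for s in path.split(".") if s]  # a leading '.' is the root
    current = [doc]
    for step in steps:
        next_level = []
        for node in current:
            if step == "*":  # wildcard: fan out over all children
                next_level.extend(node.values() if isinstance(node, dict) else node)
            elif isinstance(node, list):
                next_level.append(node[int(step)])
            else:
                next_level.append(node[step])
        current = next_level
    return current[0] if len(current) == 1 else current

doc = {"foo": {"bar": 42, "baz": [1, 2, 3]}}
assert resolve(doc, ".foo.bar") == resolve(doc, "['foo']['bar']") == 42
assert resolve(doc, ".foo.baz[0]") == 1
assert resolve(doc, ".foo.*") == [42, [1, 2, 3]]
```

Real JSONPath implementations add filters, slices, and recursive descent on top of this; the toy above only covers the constructs used in the examples.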
You have the native server — and this is generic, this is not RedisJSON-specific. You have essentially the Redis server that is running on — by the way, what port? Can anybody tell me? 6379, right? Something very important: if you want to access Redis from the outside, make sure that your firewall is open for that port, or configured correctly. All communication goes through a wire protocol called RESP between the server and the implementation in the client-side libraries. And the client side pretty much looks similar to what is on the screen now in pretty much any programming language for which there is a Redis client. Essentially, you have a small wrapper around a socket interface, which is called hiredis. Written in C, highly performant, it does little more than wrap socket access. On top of this, you have language-specific bindings. As I said earlier, Redis is supported by more than 57 programming languages. So each and every programming language has at least one, if not more, clients. The problem, or the issue, the challenge with these programming languages: they're all different. You have Go, which is compiler-based. You have Python — Python 3, very important these days — which is interpreter-based. You have native C, or you have C++. For all of these programming languages there are client library implementations, but they're all different. So there's an abstraction layer essentially implementing the interface layer to hiredis, but one that understands the specific semantics, the specific interfaces, of the programming language of choice, including the runtime environment that the programming language uses. On top of this, you also have module-specific bindings. If you go to oss.redislabs.com, you'll see a list of modules, as I said, that Redis Labs put out there on GitHub. And most of them have a Python, a Go, and maybe a JavaScript implementation right from day one.
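To give a feel for what RESP actually looks like on the wire, here is a minimal Python sketch of the request encoding — a command goes out as an array of bulk strings. This is just the client-to-server half; parsing replies is a separate exercise:

```python
def encode_resp(args):
    """Encode a Redis command as a RESP array of bulk strings."""
    out = b"*%d\r\n" % len(args)  # array header: element count
    for arg in args:
        data = arg.encode() if isinstance(arg, str) else arg
        # bulk string: byte length, CRLF, payload, CRLF
        out += b"$%d\r\n%s\r\n" % (len(data), data)
    return out

# 'GET foo' on the wire:
assert encode_resp(["GET", "foo"]) == b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
```

hiredis does essentially this encoding — plus reply parsing and socket handling — in C; everything stacked above it is language-specific convenience.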
So these are the kind of basic interface libraries you find for the modules. Needless to say, these module-specific client-side implementations use the language-specific bindings in order to talk to the server — which, of course, then requires the module to be loaded if the module-specific bindings are to work. Let's take a look at how this is done for RedisJSON. Essentially, this architecture depicts a performance benchmark that I'm going to go into in a later part of the presentation. So I'm just going to spend some time describing the architecture. You have the module-specific bindings, the language-specific bindings, and then hiredis, as explained before. On top of that, you have a small layer called JRedisJSON. And this is then used by a database performance benchmark framework called YCSB, probably known to those of you who are looking for a performant database, because essentially it stands for Yahoo! Cloud Serving Benchmark. It's a standard benchmarking framework for databases — similar to TPC, if that rings a bell — only concentrating on NoSQL databases, because Redis is a NoSQL database, right? So the idea is to extend YCSB with a thin driver that talks to the document-oriented database, and it does so through the ordinary architecture-specific stack. On the server side, this is mirrored. You have the native Redis server written in C — as in, you can pull it down from GitHub, it's all written in C, it's there. I used version 5.0.5 for this, but more on that later. Then you have something called the module SDK, which, written in C, has been there for the last couple of years, as I explained. The trouble with that, of course, is that you cannot use this from Rust right away. This is the reason why Redis Labs created something called the module crate, which essentially wraps the module SDK in Rust bindings. As you probably know, to call C from Rust, you have to tweak it a little bit.
Essentially, you have to say: now look, Rust compiler, this API is not safe, because the usual memory management techniques — as in the concepts of ownership and borrowing; you Rust experts know this — do not apply to C code. So the idea behind the module crate is essentially to wrap the module SDK in something that Rust can understand. And then you have the remaining code base written in Rust talking to this module crate, which in turn talks to the module SDK, which in turn talks to the server internals. By the way, this is RedisJSON2. The first implementation, called RedisJSON, was written in C, as we will see in a couple of minutes when I go through the performance comparison between these two code bases. OK, some figures. The original implementation had about 5.2 kilolines of code. The new implementation is about 3.2 kilolines of code. What were the main decision points for re-implementing the already existing RedisJSON extension in Rust? First of all, yes, the native code base of Redis is written in C. But as probably most of you know, aging C code is sometimes or somewhat hard to maintain, especially if it's been around for a while. So C code bases bring, especially for new members of a team, a learning curve with them, and also do not necessarily help with the overall total cost of ownership, or with something called technical debt. So a decision was made that, going forward, the implementation language for any new modules, or re-implementations of existing ones like RedisJSON, would be Rust. The idea was to have a more compact code base — and I think the figures kind of reflect this — with lower technical debt and, very important, with a lower QA effort, because there are fewer lines of code that need to be tested. And of course, that leads to a lower overall total cost of ownership when it comes down to maintaining and extending the code base.
And of course, something important: when you're working for a company that sells support around this and other services, time to market comes down to something that may be important for the business side of things. This is the reason why a conscious decision was made about two years ago to go forward with Rust rather than C when it comes down to the native implementation of modules. A little bit on the experience we had when we engaged in this re-implementation of the code base. The team had quite a diverse background. Many of them were coming from C, some had a Java background, and Golang was also present. So the background was pretty diverse when it comes down to programming languages. And the reason why, up to then, the main implementation language of choice was C: because A, the module SDK, being present already, was written in C, and B, the remaining server code base has been written in C. That's the reason why it was pretty much a no-brainer in the beginning to use C as the implementation language for any modules being developed inside Redis Labs. But going forward, for the reasons explained, Rust was chosen as the new technology to implement modules. So, some lessons learned from the team that embarked on re-implementing RedisJSON as RedisJSON2. Yes, Rust does have a steep learning curve. Just hands up: how many have used Rust for more than two or three years? So you can probably relate to this — especially if you haven't put up your hands; that means you're probably still sitting on the learning curve, I reckon. So you know what this is all about: memory management can be tricky at times. I'm talking about the borrow checker. Somebody about a year ago told me: if you have convinced the borrow checker — or rather, if you have convinced the Rust compiler — to generate code, you're halfway there. In contrast to C, where this is slightly different. OK, but now the plus sides. A pretty comprehensive ecosystem.
If you take a look at what's out there on crates.io, that's a lot. You have at least five web frameworks to choose from. You have crates for socket access. Rust itself is self-hosted, so you actually have crates for ASTs — as in abstract syntax trees — and all the rest of it. So chances are, like with Python, you take a look at what's out there before you program a new code base and simply reuse the stuff that has already been written. This is a major advantage when it comes down to implementing new systems, because essentially, as with any other open source project, you are standing on the shoulders of giants. Simple. A responsive community. If you take a look at rust-lang.org, especially at the forums, it's amazing. When I started to learn Rust, there was always somebody out there helping me, never mind how stupid the question was — if there are any stupid questions. You will get support. So in contrast to other communities, Rust — and I think this is one of the big advantages of the community and the language — has been pretty responsive and pretty supportive. And this is reflected in what the development team experienced when they first started on this enterprise of programming something in Rust. And of course, the toolchain support is awesome. Not only do you have different toolchains at your disposal — beta, stable, and of course nightly; if you want to check out new features, you simply switch to a new toolchain version and off you go. But something pretty important: cargo is the best example, right? It's not only a build system, it's also a package management system, all wrapped into one. Maybe apart from Golang, I've yet to find a programming language that does it all in one go. Maven for Java came later, and it took the Java people a long time to get it right, in my opinion — now don't any Java people kill me after this presentation. No, jokes aside.
Mozilla decided about 13 years ago that they needed a new programming language, because C and C++ — as in the code base then in place for the rendering engine — didn't cut it anymore. So Rust was developed; the first commit was, I think, about 11 years ago, if I'm not completely mistaken. But they did it with intelligence this time around. And you'll see this if you take a look at the toolchain support and at the ecosystem — people picked up on it pretty quickly. It took Python, I reckon, 15 to 20 years to get to where Rust is now within the short span of 10 years. And that's pretty amazing, I think, for a programming language. More info on our beloved Yahoo! Cloud Serving Benchmark. As I already said, it's written in Java. It's a standard framework. The idea is that you have quite a few DB integration layers as part of the native code base when you clone it from GitHub right away. Redis is, of course, supported out of the box. So are Hadoop, Mongo, Couchbase, even some graph databases, and all the rest of it. So the idea is basically: if you want to take a look at how your implementation, your ecosystem, is performing when it comes down to NoSQL, you simply clone the code base, you compile it, and then you can start testing. If — and that was the case when we started off on this performance benchmark exercise — there's no integration with your new NoSQL database yet, it's not that difficult, because you simply implement four, actually five, methods in Java that talk to the respective client library implementation, and then you're good to go. So essentially you have inserts, you have updates, you have deletes, and then you have reads and you have scans — plus maybe initialization and finalization of the database connection, but that's about it. The implementation of the YCSB extension that talks to RedisJSON is about 200 lines of Java code. It's as simple as that. So not a big deal.
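The real interface is Java, but to illustrate the shape of that handful of methods, here is a hypothetical Python analogue. The method names mirror the YCSB DB binding in spirit only — everything below (names, signatures, the in-memory dict standing in for Redis) is an assumption for illustration, not the actual API:

```python
class ToyBinding:
    """Toy stand-in for a YCSB DB binding, backed by a plain dict."""

    def init(self):
        # In a real binding this would open the client connection.
        self.store = {}

    def insert(self, table, key, values):
        self.store[(table, key)] = dict(values)

    def read(self, table, key, fields=None):
        record = self.store[(table, key)]
        return {f: record[f] for f in fields} if fields else dict(record)

    def update(self, table, key, values):
        self.store[(table, key)].update(values)

    def delete(self, table, key):
        del self.store[(table, key)]

    def scan(self, table, start_key, count):
        keys = sorted(k for (t, k) in self.store if t == table)
        return [self.store[(table, k)] for k in keys if k >= start_key][:count]
```

Swap the dict operations for JSON.SET / JSON.GET calls through a Redis client and you have, conceptually, the ~200-line driver mentioned above.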
Don't fret if your database is not on the list of already supported NoSQL databases — writing that interface layer is not that hard. Apart from having to use Java, of course, but that's a different story. OK, YCSB has the concept of workloads. There are six workloads, ranging from A to F — goes without saying — that all reflect different use cases. For the purpose of this benchmark, I used three of them. They all reflect different kinds of access patterns on the client side — as I said, different use cases. So workload A reflects your vanilla cycle: 50% writes and 50% reads hammering onto the database. Then you have workload B, which has a more caching-oriented access pattern, namely about 95% of the database accesses are reads and only 5% are writes. And then you have your bread-and-butter workload, which is F, and which reflects a typical CRUD cycle: you read a data record, you modify it, and then you write it back. So for the purpose of this benchmark, this is what I focused on, being especially interested in workload B, because this is where this caching thing comes from, originally, about 10 years ago. How much time do I have left? 20 minutes. OK, plenty of time. OK, cool. So we can spend some time on the analysis. First of all, some specs. I used a stock Eoan, as in the latest Ubuntu release. And the machine I ran it on is actually a Dell XPS 13 with a mobile i7, about 16 gigabytes of RAM, and a 512-gigabyte SSD. And the number of records that I used is around 1 million. Bear in mind that these figures, expressed in seconds, can of course be scaled if you move to a different server specification — goes without saying. I travel a lot, so I use the machine that I'm presenting from as my go-to server. I used Redis 5.0.5 as a kind of reference; you see that in the first line.
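The workload mixes described above can be written down as a tiny table. The sketch below turns them into concrete operation counts — the A and B proportions are from the talk; stock YCSB's workload F actually mixes plain reads with read-modify-write, so treat the F line as an approximation:

```python
# Read/write mixes for the three workloads used in the benchmark.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},   # vanilla 50/50 mix
    "B": {"read": 0.95, "update": 0.05},   # caching-style, read-mostly
    "F": {"read-modify-write": 1.00},      # CRUD cycle (approximation)
}

def plan_ops(workload, total_ops):
    """Turn a proportion mix into deterministic operation counts."""
    return {op: round(total_ops * share)
            for op, share in WORKLOADS[workload].items()}

assert plan_ops("B", 1_000_000) == {"read": 950_000, "update": 50_000}
```

With the ~1 million records used in the benchmark, workload B thus issues roughly 950,000 reads for every 50,000 writes.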
Basically, to see how RedisJSON scales up against the native Redis implementation. The native Redis integration is already, as I said, part of the YCSB code base. Then I used RedisJSON, as in the C implementation, which I cloned on the 2nd of January 2020. And one day later, I cloned the Rust version. The reason for the one-day delay was that my internet connection essentially broke down on the second, after I cloned the C code base. Then I was traveling, and I had internet access back the next day. That's the reason for that one-day delay. But I took a look at the commits, and there were none in between, so that's fine. Pretty important, because I'm measuring only Redis, or rather RedisJSON: I'm not measuring MongoDB, I'm not measuring Couchbase, I'm not measuring any other NoSQL DB as part of this benchmark. And I left it at in-memory only, so there was no persistence configured. If you have used Redis, you know that Redis has two types of persistence, namely append-only files as well as snapshots. I didn't use either of them, because I wanted to keep it straight and confine it to Redis. So I simply said: now look, Redis, do your thing in memory, as you have been doing for the last 10 years. Let's take a look at the native implementation first to get some sort of feeling for how Redis measures up. The number of threads you can configure when you run your YCSB invocation. The number of threads essentially reflects or models the number of applications hammering onto that database instance. This is implemented by pure Java threads on the client side, because this is what YCSB uses when it comes down to accessing, or simulating access to, the server-side implementation. So: one thread, four threads, and eight threads.
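For reference, disabling both persistence mechanisms as described comes down to two directives in redis.conf (or the equivalent CONFIG SET calls at runtime):

```conf
# redis.conf: run purely in memory for the benchmark
save ""          # disable RDB snapshots entirely
appendonly no    # disable the append-only file
```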
Bear in mind that this is only a mobile quad core, so you're looking at essentially a dual core with hyperthreading — that's important to keep in mind. Already at the native Redis level, we see a spike in performance when you move from one to four threads. This goes up slightly further when you move to eight threads, because that's what you see when you hammer with multiple clients onto a single database instance. That's workload A, and this is also reflected in the remaining workloads. What I'm a little bit surprised about — and I haven't done a thorough root cause analysis yet — is why the figure for eight threads is actually higher at the native Redis level than for four threads. I reckon it's down to something called the Jedis interface that is used when accessing native Redis from YCSB. Jedis is one of the standard Java clients, apart from Lettuce and something called Redisson. And as I said, I suspect that this is down to the Jedis implementation. I used version 3.2; the tagging of the release was fairly recent, so I reckon there's maybe something fishy in that Jedis version. Switching over, or comparing this to RedisJSON, as in the native C implementation of the document DB: you see the price you pay when switching from native Redis to RedisJSON — i.e. the overhead, the performance penalty that you pay, in other words, when you actually use the module that gives you document-oriented functionality. And as you can see, it's not that much at the end of the day, because essentially you're looking at about 30% multi-threaded, and even less when it comes down to a single thread. And now it gets really interesting, because the comparison between RedisJSON and RedisJSON2 essentially tells you the performance penalty of a Rust code base in comparison to a C code base. So this is the impact that you have when you go from C to Rust.
I'm slightly simplifying, but you get the drift. Let's take a look at the numbers. Let's pick a random workload — workload A. The performance penalty that you pay is quite minimal, because essentially you're looking at 44 versus 49 seconds. That's five seconds difference. But on the other side, you gain all the benefits that come with Rust. This is why this comparison, for this particular use case, is quite revealing, I think, when you are facing the decision of what implementation language to use for your next project. Would that be Rust? Would that be C or C++? I'm going to leave it here, because we have about, what, 10 minutes left, something like that. There are some more slides; the slides of course will be on the website, so feel free to take a second look and do some further analysis. The code will be on GitHub very soon — more on that later. And feel free to get in touch if you have further questions on this. Okay, short recap. RedisJSON, of course, is your document-oriented DB extension of Redis, and it is purely aimed at — the primary use case is — in-memory processing. So the focus is on what Redis comes with natively: high performance and low latency, as in maximum throughput. And of course, because it's based on Redis, you have the full CAP triangle at your disposal. Sorry — CAP, anybody who doesn't know what CAP is? Okay, great. CAP, or Brewer's theorem, is the triangle between consistency, availability, and partition tolerance — essentially from ACID right up to eventual consistency, something like this. ACID, of course, should ring a bell: ACID is your typical SQL-based use case, as in you have a transaction that either fails or succeeds, nothing in between. As it turns out, many applications do not need that strong consistency notion. That's the reason why you see, especially in the NoSQL space, more and more applications moving away from this strong ACID compliance.
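As a quick sanity check on the numbers quoted for workload A, the relative penalty of the Rust rewrite works out to roughly 11%:

```python
# Workload A runtimes from the talk: RedisJSON (C) vs RedisJSON2 (Rust).
c_seconds, rust_seconds = 44, 49

# Relative overhead of the Rust rewrite over the C baseline.
overhead_pct = (rust_seconds - c_seconds) / c_seconds * 100
assert round(overhead_pct, 1) == 11.4   # ~11% slower, per the quoted figures
```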
And Redis, with its different types of persistence, the in-memory focus, and the other benefits, allows you to move around in that triangle pretty freely, based on your use case. So this is basically the advantage of this NoSQL approach. The outlook is: we're going to integrate this — and you see this already when you take a look at the code base on GitHub — with a module called RediSearch, which essentially is a full-text search engine, also based on Redis of course, that gives you the functionality of a real-time index in memory at your disposal. Many people use it, as far as I've seen, to implement something like an in-memory search engine, because that's what it is for. So you take a document, you let RediSearch index the whole thing, and then you can search for your index terms, for your search terms, on that index. So the idea is to combine RediSearch with RedisJSON, which up to now only supported JSONPath as a query language. And of course there are other functionality improvements in the works, regarding extending the functionality currently implemented in RedisJSON2, plus of course API extensions — yielding, at the end of the day, a document DB that is fully indexable in real time. That's the overall idea moving forward. Before we come to the questions, a couple of links. On redis.io you'll find the full Redis documentation — no need to tell you that, because you know this already. redisjson.io contains the reference documentation that is out there for RedisJSON and RedisJSON2. By the way, the API is the same: whether you are talking to RedisJSON or RedisJSON2, it doesn't matter, it will always be the same. And if we extend RedisJSON2, these changes will also be reflected in the reference documentation. For the time being, you will find the code base on GitHub.
There's of course the code base for Redis, maintained by Salvatore Sanfilippo, on GitHub as well, under his handle antirez. You'll find the YCSB Cloud Serving Benchmark also on GitHub. As soon as I get around to it, I'm going to issue a pull request for the RedisJSON interface layer. So if Brian accepts this, you'll have a native YCSB integration out of the box on GitHub. And of course, there's also something called Redis Labs University — just two sentences on that, because this is not a commercial presentation. It's essentially an online university where you can get to know more about Redis, both from a developer as well as an administrative perspective. It's free of charge: simply register, create an account, enroll in a course, pass the exams, do the homework, and you get a certificate. Any questions before we close it off? Yes. Can you make a module so you can move part of your application into the server? Yes. Yes, of course, thank you. The question was: can you nest modules, essentially — as in, can you move business logic onto the server side? Yes, you can. There's something called Lua, which is a scripting extension for Redis, but there's also an upcoming module — it's in technical preview at the moment — called RedisGears. RedisGears allows you to take your business logic code — at the moment it's Python only — and move it off to the server. So your Python script is then executed as part of a Python implementation running on the server. You can also do this with module functionality, although I haven't seen it yet. I was specifically thinking about Rust. Rust. The question is whether it has been done in Rust. From a functional perspective, I don't see any reason why not, as long as you stick to the module crate — goes without saying, because essentially this is what you use when you access the module SDK. But as I said, the POC still remains to be done.
Any other questions? You had a question, right? How production-ready is RedisJSON2? Is it just functionally complete? Is it already optimized? Where is it in the life cycle? Yes, thank you. The question was: how ready is RedisJSON2? We, as in Redis Labs, released this last year; the commits since have been mostly bug fixes. I know quite a few projects that already use this in production, so the answer is: it's ready for prime time now. Okay. Needless to say, PRs are also accepted, I think. So if you have further improvements, let us know. Other questions before we close it off? Yes. The question was: is the performance equal between creating a new JSON document and just updating one? Essentially, if you are updating one, you have to modify the contents, which of course includes a query in terms of where you want to stick the content. I don't have exact performance benchmarks for this, but based on the native performance figures, I wouldn't expect that overhead to be really significant. But as I said, I haven't done the performance benchmark for this particular use case. Let's see — another question? Another question? Okay, in that case, I would like to thank you for your time, and enjoy the code.