 Just a quick reminder, guys: if you've got your mobile phones on, switch them off and we'll get underway. Cool. Okay, well, about six or so months ago I started using Redis, which is one of the newfangled NoSQL data stores. Why? Probably because I'm a web developer and we like shiny new things, but I did have a genuine use case for it in one of our production apps. That use case is a bit boring, so I'll talk about something a little bit different instead, but essentially it's been in production for over six months on a smaller-size website with no problems so far. I'm going to try and keep this quick and simple if I can, but as I say, here's something completely different: my top five movies of all time. Now, you may be wondering why I'm inflicting my rather poor taste in movies on you, and that's primarily because I thought: what can I build to demo Redis? URL shorteners? Everyone's got one of those. Twitter clones? Yeah, heaps of them around; everyone loves them as a hello-world app. But how about a rating service for your DVDs? Googling around, I couldn't find anyone who'd done that so far with Redis or many of the other NoSQL databases, so I thought, let's give it a shot. Now, the first thing you want to do is, of course, log into the app. I'm a web developer, so I'm going to build it as a web app, not a native app. That slide is roughly what the code would look like, because you don't want to make someone manage yet another set of credentials for your website — people hate that now. So OAuth is the way to go, and you have to deal with all the providers; that's about the minimum amount of code using a library. But it's got very little to do with Redis, so rather than that, I'm going to cut all my code examples down to the bare basics of what you need to do with Redis. Now, I'm using server-side JavaScript, because that's pretty much what I do. I saw a few blank faces wondering what that even is. 
If anyone's done any browser-side JavaScript, it's the same kind of thing — it just runs on the server the way PHP or Python or any other language does. The framework I use is called Ringo.js, which actually runs inside a JVM; it's just another JVM language. But I'm hoping the syntax is C-ish enough that you'll be able to follow along. I'll also cover using Redis on the command line later, because pretty much everything you can do, you can do from the command-line client when you're developing and debugging. So what did we just do? Basically, those two lines there use the GET and SET commands in Redis. Does anyone use things like memcached? Anyone familiar with that? One, two people. Okay, so essentially Redis is a key store with benefits — some of my analogies might not be so crash hot. It's got the basic GET and SET commands, because that's really what key stores are about: you've got a simple string for a key, and a simple string for the value. So basically what I'm doing is grabbing the key for the particular user who's logging in. If that syntax looks a bit funny: rather than doing string concatenation, I'm just using the join function you get in JavaScript to concatenate an array of strings together. And likewise in the SET, I've got the same kind of thing. So I'm looking up, based on the user's email, some value that I've stored there, and I'm also setting a session key for them when they log in. Now, in Redis it's just convention — you can name your keys whatever you like — but a lot of people seem to use this value:value:value format for key names. In this case I've got the type (user), the field name (email), and the value (the user's email address) as the key to look up whatever piece of data I'm stashing for that user. 
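To make that concrete, here's a minimal sketch of the login lookup pattern, using a tiny in-memory stand-in for Redis GET/SET rather than a real client library; the email address, user id, and session token are all made up for illustration.

```javascript
// Hypothetical in-memory stand-in for Redis GET and SET, just to show
// the key-naming pattern from the slides. A real app would use a Redis
// client library; this only mimics the semantics.
const store = {};
const set = (key, value) => { store[key] = String(value); };
const get = (key) => (key in store ? store[key] : null);

// The "type:field:value" key convention, built with Array.join rather
// than string concatenation:
const email = "alice@example.com";
const userKey = ["user", "email", email].join(":");

set(userKey, "1");                            // map email -> user id
const userId = get(userKey);                  // look the user up at login
set(["session", userId].join(":"), "abc123"); // stash a session token
```

A missing key simply comes back as null from GET, which is why there's no existence check before writing.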
No, I don't bother checking for key existence or anything like that. With Redis, you just write to a key — it'll create it for you if it doesn't exist. Now, the first thing we want to do is get a movie to rate, and a handy way to do that is to grab one of the DVDs from the user's existing library. I'm presupposing a few things in the web app — for instance, that the user has already put a list of the DVDs in their library in there previously. Now, the first thing this illustrates is that Redis has more complex data structures than some of the other key-value stores out there. And this is great because the previous talk was all about set theory, so I can cut down on that a bit: it's got a data structure called a set. Basically a collection of unique values — they can be anything — more values, all stored under one key. And a handy command you can run on sets pulls out a random member of the set. I thought I'd demo this because I came across a post on the net where Simon Willison was writing about using Redis, and he said he'd had a whole bunch of problems doing a really heavily loaded web app with — I think it was MySQL, it could have been Postgres, I can't quite remember — but apparently a lot of the load came from the way he had to pull a random piece of data out of a very large table. I don't know if that's true; I would have thought there'd be a command for that. [Audience comment] That's right — it performs really badly. No, no, evil, evil — you shouldn't do that on your web pages. One thing I also want to note is that in this example I'm just pulling out an ID number for the DVD. I'm storing the actual data for the DVD — all the fields, like title and so forth — in a separate key. 
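The set behaviour described above can be sketched like this, using a JavaScript Set as an in-memory stand-in for a Redis set with SADD/SRANDMEMBER semantics; the DVD ids are made up, and real Redis does the random pick server-side.

```javascript
// Stand-in for a Redis set holding a user's DVD library.
// SADD ignores duplicates; SRANDMEMBER returns a random member.
const library = new Set();
const sadd = (s, member) => s.add(String(member));
const srandmember = (s) => {
  const members = [...s];
  return members[Math.floor(Math.random() * members.length)];
};

sadd(library, 42);
sadd(library, 7);
sadd(library, 42); // duplicate: sets only hold unique members

const dvdId = srandmember(library); // a random DVD id to rate
```

The point is that the random pick is a single cheap command against the set, rather than a scan over a large table.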
So it's not SQL normalization, but you still don't have to put everything into the one blob. Now, getting that extra data out — all the attributes you have, say a customer's name, address, phone number; in this case, all the attributes of the DVD — they're stored under another key. The old-school way of doing it — well, not necessarily old school, but one of the simple ways, especially if you're a web-app developer — is to use whatever serialization tool you have in your language to dump all the data into a string that you stash in the one key. That's what this code does: it grabs whatever string is at the key dvd:id, which is in JSON format, then parses it and converts it into an object I can use in JavaScript. I should note that even though I said values are simple strings, they're actually binary-safe strings in Redis, so a value can be any sequence of bytes you want to stick in there — no worries about null characters and so forth. And as of recent versions the keys are binary-safe as well, so if you want to use a bizarre naming scheme that might have nulls in your keys, it should be fine. As I said, the common approach is that most languages have a way of serializing your data out into a string format like JSON. Now, there's a big disadvantage with this approach: you have to read in the whole bloody object to get at one field. So if you've got, say, 500K of data for a particular object — a bit of an extreme example, but you might if you're storing serialized images or other binary data — and you just want one of the fields, you're going to have to read in the whole thing and deserialize it just to get out the one piece of data you want. 
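A minimal sketch of that serialize-into-one-key approach, again with a plain object standing in for the Redis string store; the title and year are made-up example data.

```javascript
// One JSON blob per object, stored under a "dvd:<id>" key.
const store = {};
store["dvd:42"] = JSON.stringify({ title: "Blade Runner", year: 1982 });

// The downside: to read even one field you must fetch and
// deserialize the whole blob.
const dvd = JSON.parse(store["dvd:42"]);
const title = dvd.title;
```

With a large object this means shipping and parsing everything just to get one field, which is what the hash type on the next slide fixes.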
So an alternative approach is to use a separate key for each field of an object's data. What I mean is that you have one key per field. For instance, to store user 1's email, you use user:1:email; to store their first name, user:1:firstname. So you have a whole bunch of keys and values for all the fields of an object, and you tie them together only by the naming scheme — Redis doesn't care what you've called your keys, so it's simply up to your naming convention to keep things sane and tie them all together. Now, one of the big downsides to this is poor memory utilization: it's a fairly simple way of getting Redis to store your data, and on the next slide I'll talk about the better way of doing things. The new way, in the newer versions — 2.0 onwards, I think — is to use a new data type they introduced called hashes, and they're pretty much what the name suggests. Essentially what this code does is grab a hash key. Hash keys are, again, an arbitrary collection of key-value pairs, but stored under one key name. So you've got operations to, for instance, get all the fields out of a hash, or all the values out of a hash. And I just wrote a little generic piece of code that basically iterates over them. If you know specifically what fields you're putting in — you know your data schema — you'd write specific code; this is the kind of generic code you might put into your data layer. [Audience question] Yes, yes — and I have simplified this code; I will get to exactly that point in a few slides. Very well anticipated. Because one thing I haven't mentioned so far is that Redis, just like MySQL, is a server running on a port, and any number of clients can connect simultaneously. 
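Here's a sketch of the hash idea, with HSET/HGET/HGETALL mimicked by a nested object; the field names are made up, and a real client would issue these as commands over the wire.

```javascript
// Stand-in for Redis hashes: many field/value pairs under one key.
const hashes = {};
const hset = (key, field, value) => {
  (hashes[key] = hashes[key] || {})[field] = String(value);
};
const hget = (key, field) => hashes[key][field];
const hgetall = (key) => ({ ...hashes[key] });

hset("dvd:42", "title", "Blade Runner");
hset("dvd:42", "year", "1982");

const year = hget("dvd:42", "year"); // one field, no full deserialize
const all = hgetall("dvd:42");       // or the whole object at once
```

Compared with the JSON-blob approach, you can now fetch or update a single field without touching the rest of the object.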
So why are hashes better than the previous method I showed you, which was just storing a separate key-value pair for each field? It's because the main developer of Redis has done a fancy trick — which I admit I'm not really smart enough to understand — where, when you have small keys together in a hash, he does some compression tricks to greatly optimize the amount of memory it takes to store them. I can point you at the end of the talk to his nice blog post that describes all about it, and I'm sure people who are a bit more into data storage, or a bit more clued in than I am about that kind of thing, might find it interesting. So, moving right along: now that we've got a DVD to rate, we want to store the rating for the user. But you also don't want to have to calculate everything on the fly. Okay, cool, I've got two users in my system — piece of cake to calculate the average rating for the DVDs from those two users. If you've got 10,000, it's still pretty easy. But if you're talking about an app you've put up on Facebook with a million people using it, you probably don't want to be calculating on the fly the average rating everyone's given a particular title. So what you can do is pre-calculate them: as new ratings come in, you update the scores for each DVD as you go. So the first line — oh, sorry, second line — saves the rating for the user, and then we add to the running total for that DVD title. You might have noticed the commands are ZADD and ZINCRBY. The Z indicates they work on another data type in Redis, the sorted set. It's a set, but each value in the set also has a numeric score associated with it. 
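The ZADD/ZINCRBY pattern can be sketched like this, with a plain member-to-score map standing in for a sorted set (real Redis also keeps the members ordered by score; the key names and rating values here are made up).

```javascript
// Two stand-in sorted sets: one of per-user ratings for a DVD,
// and one of running rating totals keyed by DVD id.
const ratings = {};
const totals = {};

const zadd = (zset, score, member) => { zset[member] = score; };
const zincrby = (zset, incr, member) => {
  zset[member] = (zset[member] || 0) + incr;
  return zset[member]; // ZINCRBY replies with the new score
};

zadd(ratings, 4, "user:1");                 // save this user's rating
const total = zincrby(totals, 4, "dvd:42"); // bump the title's total
```

Incrementing a missing member starts it from zero, so there's no separate initialization step when the first rating for a title comes in.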
So it's really built for these kinds of applications, because when you want to count things, keep ratings, or take measurements, it's really handy. ZADD adds a new member to the sorted set with a particular score — a numeric value — against it, and every time you add one, the set is updated to stay in sorted order. Then ZINCRBY lets you atomically increment the score of a particular item in the set, by a default of one or by any other number. So that's just covering what I talked about. Now, you might be asking about the two commands I didn't talk about: MULTI and EXEC. And that comes back to your question — transactions. So yep, it's all cool if, like Aaron said, you've got one user; you're testing your web app — oh, beautiful. But then you start doing load testing, or, God forbid, you have real testers testing at the same time, doing things simultaneously. And most web apps are either multi-threaded, or running in multiple processes, or clustered across multiple machines, so there's parallel access to the data. Now, Redis does give you these two commands, MULTI and EXEC, to help with transactions. They are not your SQL type of transactions — not what you're used to from a SQL database. All these two commands guarantee is that the commands you add between the MULTI and the EXEC go into a queue. MULTI says: I'm starting a command queue. You add your commands, then EXEC says: execute them atomically, one after the other, in the order I gave them, and no other client will have their commands executed in between. That's a very helpful guarantee. It's not the same as full-blown transactions, but in a lot of cases it's exactly what you need. 
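The queueing behaviour just described can be sketched as follows. This is a toy model of the semantics, not the real client API: `multi`, `enqueue`, and `exec` are made-up names, and the "commands" are just closures bumping a counter.

```javascript
// Toy model of MULTI/EXEC: commands are buffered into a queue, then
// run back-to-back with no other client's commands interleaved.
const queue = [];
const multi = () => { queue.length = 0; };            // start buffering
const enqueue = (fn) => { queue.push(fn); return "QUEUED"; };
const exec = () => queue.map((fn) => fn());           // replies, in order

let score = 0;
multi();
enqueue(() => { score += 4; return score; }); // e.g. a ZINCRBY
enqueue(() => { score += 1; return score; }); // e.g. another ZINCRBY
const replies = exec(); // nothing actually ran until this point
```

Note the server sees none of the buffered commands until EXEC fires, which is why a reply like "QUEUED" comes back for each command in the meantime.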
If you've got a few commands that need to run sequentially, you don't want someone changing the data in between them; you want them to happen atomically, and that's what this guarantees. There's no rollback, though. In your application you'll get all those commands happening at once, but if something then goes wrong in the application layer and you want to roll back — that's it, there's no going back. Now, one thing I haven't talked about yet is Redis's speed, but that's why a lot of people use it — and it's not just me talking it up. It's built to be fast, and it is very fast. Benchmarks on your average laptop-class hardware usually get around 100,000 operations per second for GETs and SETs. It's slower for some of the more complex operations, but that's still very fast compared with a lot of other products out there. The reason I used it was also that I had a really smallish problem, but I knew that if, for some reason, the app I was working on suddenly did get a load spike, it would be able to scale reasonably well — I'd be able to fairly easily scale it up to, well, not huge loads, but any spike our website's network connection was ever likely to have to cope with. Yes? [Audience question, partly inaudible, about whether other clients see your changes while a transaction is in flight.] I think I know what you're asking: if you read one key and then you're going to write into another key, and someone else is reading the value from the first key? Yes. 
What it's guaranteeing is that the commands you've queued up will be done atomically, with no other client access in between. But the reads from other clients could be inconsistent if you're depending on checking a value. Take the classic banking application: you're moving money out of account A into account B, and a separate client reading account A partway through might see the balance in both — because reads don't take a lock; there's no read lock, basically, is what I'm trying to say. I don't know if that answers it. [Further audience question about the replies] Yes — you get one set of results, because the EXEC is what actually executes the commands. That's exactly right: all MULTI does is literally queue things up — it buffers the commands, and Redis never sees or executes any of those commands until the EXEC. And there is actually a command to discard the queue, so if you're in the middle of buffering up commands and something goes wrong in your application logic, you can just throw it away. Exactly, yes. [Audience question] Is that a limitation of these transactions? No, it's not. If you modified one value — say you modified a score in the set, then asked for the sorted version of that set, then changed something else, then asked for it again — the database serializes the commands, so it ensures the order is correct. This is getting a bit into the details of how it's implemented, but from what I understand it serializes all the clients; I think it's event-driven. 
So it's basically an event-driven main loop, from what I understand. I don't really know too much about the internals, but that's how it works, and that's how it can do this in a really straightforward manner in terms of its implementation. Kind of. I wasn't going to cover it, but there is a way of doing an optimistic transaction, where instead of asking for the full guarantee — it's more of a performance thing — you ask it to proceed optimistically, letting other clients try to write to the keys that you're using, and if another client does write to one of those keys, it aborts your set of operations. Does that answer your question? But you can't explicitly set a condition, I think, is what you're asking. That's fine. Yeah. Okay. It might be possible — to be honest, I haven't used all of its functionality a lot. Sorry? SETNX? You could probably do it with that, because there is an explicit atomic test-and-set operation: a set-if-not-exists and a set-if-exists, so you set a key only if that key doesn't exist, or only if it does. Combined with one of these MULTIs, it might be possible. So I'll keep tracking along because I don't want to run out of time. Now, sorting: because sorted sets are pre-sorted, doing things like league tables — showing ratings, or high scores for a game — is very straightforward, because it's all there for you. The commands make it pretty trivial, because they do exactly — or almost exactly — what you're asking for. The command here is ZRANGEBYSCORE, and with the WITHSCORES option you can ask it to give back not just the values but also the scores that go along with them. The one catch is that the sorting is the other way around from what you'd normally want: it's lowest to highest, rather than highest to lowest, which is what you normally want for a top score or a top rating. 
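The optimistic approach just mentioned can be sketched like this. It's a toy model of the WATCH-style check, not the real protocol: I'm using a made-up version counter per key to stand in for the server noticing that a watched key changed before EXEC.

```javascript
// Toy model of optimistic concurrency: remember a version when you
// read, and refuse to commit if another client wrote in between.
const store = { "dvd:42:score": 10 };
const versions = { "dvd:42:score": 0 };
const write = (key, value) => { store[key] = value; versions[key] += 1; };

const watched = versions["dvd:42:score"];  // like WATCH-ing the key
const planned = store["dvd:42:score"] + 1; // read, plan a modification...

write("dvd:42:score", 99); // ...but another client writes first

let committed;
if (versions["dvd:42:score"] === watched) {
  write("dvd:42:score", planned); // the "EXEC" succeeds
  committed = true;
} else {
  committed = false; // the "EXEC" aborts; the app retries from the read
}
```

The application is expected to loop: re-read, re-plan, and try the commit again until no conflicting write intervenes.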
The latest development version in Git actually adds a command to do it in the opposite order — I'm not sure why they never did it like that to begin with. And if you want the rating of one particular element — the score of one of the elements in the sorted set — you just pull it out with a command: you give the key and the member of the set you want the score for. There are also, as was covered in the previous talk, set operations. Because they're sets, there's an intersection command for two or more sets — exactly the Venn diagrams we saw before — and also union. There's no SQL; there's no pretense of giving you a full query language, but these commands are helpful for some of the query-type functionality you might need. The most flexible and powerful bit of that is the SORT command, which gives you the ability to sort sets. There are also lists, which I haven't covered just because I didn't have enough time — essentially an array, with no guarantee of unique values, unlike a set, which does guarantee that all the members are unique. And you can change the sort order, and ask for it to be alpha-sorted (the ALPHA option) or numeric, that kind of thing. Now, getting to the scaling bit: the nice thing is that if you've got a small problem, Redis is really very handy, very easy to use. From a developer's point of view, anyway, it's not much different from just stashing things in memory, in arrays and hashes within my programming language. One way to think of it is that it's really like giving network access to your app's memory, and being able to share it around in an easy way without having to write all that boilerplate yourself. 
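The league-table idea can be sketched as follows, again with a member-to-score map standing in for a sorted set; the DVD ids and scores are made up, and the descending sort mimics what a reverse range with scores would return.

```javascript
// Stand-in sorted set of average ratings per DVD.
const zset = { "dvd:1": 3.5, "dvd:2": 4.8, "dvd:3": 4.1 };

// Members highest-score-first, like a reverse range WITHSCORES:
const zrevrangeWithScores = (s) =>
  Object.entries(s).sort((a, b) => b[1] - a[1]);

// Score of a single member, like ZSCORE:
const zscore = (s, member) => s[member];

const top = zrevrangeWithScores(zset);
const best = top[0][0];             // the highest-rated title
const one = zscore(zset, "dvd:3");  // one title's rating
```

In real Redis the set is kept sorted as you write, so the range query is just a read, not a sort on demand.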
The nice thing is it has a really simple replication mechanism. If you've got one instance running, you fire up another instance, and in its config file you just say slaveof <ip> <port>, pointing at where the first instance is running, and it will just magically — well, not magically — stream all the data from the first to the second and then stay in sync, in an eventually consistent manner, depending on how fast it can send the changes from one to the other across your network. No, single master: you can only have one master for each slave. The other restriction — it's not a showstopper — is that the slaves are read-only. There's no two-way communication; the slaves are basically read-only copies. Similar, I think, to how a lot of people set up MySQL: single master, multiple slaves. But the nice thing is, if, say, your master falls over, you could have a simple automated script that changes the config files around, so one of the slaves becomes the master and the other slaves start pointing at it. And I'm pretty sure that as long as your application knows the master has fallen over — as long as you've got code in your application layer to detect when you need to fail over — you shouldn't lose any operations in that switchover. If you've got a slave, you make it the master, and then it takes a few seconds for the other slaves to start treating it as the new master, but they won't lose any updates in the process. They are working on proper clustering support; apparently it's coming soon, as in the next major version, in July or something. 
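For what that looks like concretely, here's a minimal sketch of the relevant line in the slave instance's config file; the IP address and port are placeholders for wherever your master happens to be running.

```
# In the slave instance's redis.conf -- point it at the master:
slaveof 192.168.1.10 6379
```

Once the instance starts with that line, it pulls a full copy of the master's data and then streams subsequent changes to stay in sync.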
So for the moment, you need to handle failover and so on in your application layer, or most of the libraries provide some kind of support. Because one thing I haven't mentioned — I don't know if this is much use to the people here, but for application developers — is that since it talks a simple text-based protocol across a socket, there are libraries for pretty much every language people use to write web apps: PHP, Python, Ruby, Java, JavaScript. In fact, for a lot of languages it's sometimes actually hard to pick which library to use, because it's so easy to write one that several people have often written their own at roughly the same time, so several have shown up and you have to pick the best one. Getting started with it is about as simple as I've seen for a database — short of just typing, you know, yum install mysql. I recommend grabbing the latest version from the Redis website rather than what comes in your distro, because they've been moving very quickly over the last year and a half or so, and all the distro packages are rather out of date. There's not even a configure step — you just run make, which takes about 10 seconds even on my teeny-weeny laptop — then you can start the server, and in a new terminal start the client. And the command-line client takes pretty much all those commands I showed you in caps; those are the ones you can run on the command line. A question people often ask, when I give them my whole spiel, is why I started using Redis. I came across a nice quote on the web from a guy who's been using it for a while: basically, when you want something that can store data really fast, but the data doesn't have to be 100% consistent for your app, that's a good use case for Redis. 
One thing I haven't mentioned so far is that there is one really big limitation if you're talking about big data sets: everything has to fit into memory. Up to version 2, all the data had to fit in memory. Redis has two different strategies for writing it out to disk, so it's not like memcached, where if the daemon dies everything in memory is gone — it does write everything back to disk, and you can actually choose how quickly it does that, as a trade-off between durability and speed. But everything does have to fit into memory. The improvement in the new version 2 stream of Redis is that only the keys now need to fit. The main developer actually implemented his own virtual memory system, because he didn't think the operating system's quite suited the use case; he's written his own VM so that only the keys are required to be in memory, and the values can sit on disk — with, of course, the performance trade-off of essentially swapping out to disk. There are lots more features I haven't had a chance to cover. The website has just recently been redone, so it's got a nice command reference where you can find listings of all the commands. And that's it — if there are any other questions? Thank you.