 Hi, I'm Derek. I have no introductionary slides, so I'll just jump straight into the subject, which is a subject of storing non-scalar data. So non-scalar data is more complicated things than just your standard strings, numbers, and booleans. So they tend to be arrays, your nested objects, your arrays of objects. It tends to be a more natural way how you would store application objects in a database in different ways. And traditionally people with relational databases, you get to do that with different types of patterns. If any of you have seen AEV or something like that, it's a way of storing a single object with lots of not necessarily defined properties in a relational database. But in the last years, decades maybe even, more and more different types of databases showed up, known as call databases, as they're called, in which many different flavors exist, with the intention of storing data in a more natural way as it would show up for applications. So that is the non-scalar data that I'm having here. So a few examples here. For example, if any of you've ever written a block application, I certainly have a long time ago, it's often a common thing that you tag a specific post with multiple tags, right? So in this case, the tags that I picked here from OpenStreetMap data. OpenStreetMap data is free data. Each object you see on the map can be tagged with many different tags. But storing this kind of information in a relational database, what you would often do is, well, you have to create an extra table with this to store the tags in that you can then link to your articles. So you actually storing an array of tags separately from where you usually store your data, which is more complicated in a way of doing it. And then separately, there's examples like you have product catalogs. If you have a product catalog, I always take Amazon's example here. I'm pretty sure not many of you operate it with the skill of Amazon here. But of course, those product catalogs, you can store both books as well as, in my example, Whiskey's here at Amazon sells Whiskey's in the UK. Sorry, handy. And of course, all these different products will have different properties. Like a whiskey doesn't have a number of pages, makes no sense, right? And similarly, a book doesn't have an alcohol percentage, for example, or a distillery attached to that necessarily, right? So you have different properties, but it's still part of something that is a product. And traditionally, relational database don't handle storing that data particularly very well. Or unless, yeah, it gets very complicated sometimes. All right, so what is this presentation not about? It's also important. I'm not going to talk about how you store this data in a scalable way. I'm not going to talk about high availability. I'm not going to go into an in-depth technology covering of all the things I'm mentioning here because I'll be talking for days. And I only have 57 minutes left. And I'm also not going to talk about benchmarks. So benchmark is always a bit of a tricky point because you can't really benchmark the different-nosed goal solutions against each other because they are all there to do something very different. So it makes no sense to compare these things together. So I won't be talking about that. All right, so the things that we're going to have a look at in the next 50 minutes or so is we're going to look at the different database types for storing non-scaled data. Go a little bit into what sort of data you can store in these different types and what they're good for. We're then going to have a look at querying and manipulating data indexes a little bit, and then we have a conclusion at the end, and then some Q&A as well. All right, so different database types. How many database types do you think there are different database types? You've heard of relational databases, right? Key value stores? There's a document data stores, there's XML data stores, there's so many of them, I don't know, right? There's an article on Wikipedia, I believe, that there's 57 different types. Now, to be honest, I think they went a little bit too far with that, and I tend to want to group them together in about four different categories, of which I will be talking about three of them and not talking about relational databases in this presentation. So the simple one is key value stores. Key value stores, what are really good at, they're usually really small, really fast, and all the operations that you can do with these are on the key only, but minor variations, but usually on key only. But what they do allow for often is that the values that you store with these keys, they don't have to be simple numbers or strings or booleans or something like that. They allow you to store more rich data as a single value for a specific key. An example that we'll be looking at here is, for example, Redis. Memcache is another variant of this group of databases. Then, so diving a little bit deeper into Redis, so in Redis, the keys are binary safe strings. The values in Redis are also binary safe strings, but it's strings between quotes because it does do clever things with numbers. Numbers are also stored as strings, but you can also do calculations with them. So they're strings, strings. But beyond these, it also allows you to store lists and sets and hashes, sorted sets and a whole bunch of other things that are sometimes a lot more natural to how an application deals with the data. So the interaction with this happens through Redis CLI or PRIDIS PRIDIS, which is a composer-installable library that helps you do these things or you can talk to it with Telnet if you really wanted to. The examples that I've been showing here are with Redis CLI, which is a very simple way of doing it. If you're doing this through PHP application, you'd be using PRIDIS slash PRIDIS. And I've spoken about the normal string. So the way how the Redis protocol works is kind of very, very simple. It's a command line interface that you talk to over network connection, except that there's some binary information in here that you can't see because I'm using this through the Redis CLI tool. So the data that comes back is a little bit more rich than just strings, which are the commands that you give it to. So all the commands in Redis start often with a single letter or multiple letters to indicate what type of data we are operating on. So the S in S add stands for set. And a set of data is a collection of data that belongs to a specific key. So what I'm doing in this first example here is that I'm using the key whiskey colon, glen fitter, colon tag. That is one single key because all operations you do are on a single key. You need to do something clever about being able to store properties of objects. So what the key here basically says is that our data type is a whiskey. Our unique identifier is glen fitter dash 12. And the property that we're storing with us are the tags. So the set name is whiskey colon glen fitter 12 colon tags. And then the values that we're adding to the set is what we're doing as a second argument here. So in this case, we're tagging that this whiskey is fruity that has vanilla in it. And if you add another element to the set that's already in there, it will tell you by using the zero as return, that this element is already there. So a set doesn't allow you to store the same value multiple times. Because that won't be a set anymore. Or you can add things to it. If you want to have spaces in your names, then you need to also double quotes from PHP itself. You interact slightly different whether they have nice names for all the operators. I don't have an example of that. So after adding elements to the sets, you can of course do operations again system trying to figure out whether a tag or a word string is a member of the set. So if I use the s is member operator, and I'm basically I'm trying here is the whiskey tag with the word PT. And as you can see in the first bit here, we haven't added that sort of return value of this is going to be zero. As members return to all the members of the set. So that's all the ones we have set in the first example. So these are rather sets. It's a way of storing this information. And you can then then operations against this data type and those operations are atomic operations. You don't have to pull the data down. Compared in your PHP application or all application, modify them and store it back. All those operations are atomic operations, which is important to know because that is how you interact with almost all of the other soil database. You need to figure out the best optimal operations you can do on your data type in order to not run into race conditions and to make optimal storage, make use of the capabilities of the databases. So rather sets are very good way of quickly doing this. Whereas usually runs into memory in memory, although you can back it up to disk as well. And there's ways of doing that. But it's a very quick way of doing this kind of operations. It's really, really fast to do so. Similar things. There's also a hash data type. They work like associative arrays in PHP. So H, the first letter here stands now for hash. And when you do a hash set, again, we use the key. In this case, it's the props key of the Ben-Nevis whiskey. And we say that the distillery associated with this is called Ben-Nevis. And then with HM set, M is a letter that often comes after the first letter. So it stands for multi. Means we can set multiple values in one go. So in this case, we're setting the properties region Scotland Highlands as well as age 19 at the same time. Again, one operation doing multiple things. And then retrieving is also possible by using HGADL. If you ask for the properties, and then what you get on the Redis command line script is you get key value, key value, all the different values in your result, which is kind of ugly. P Redis handles that a lot better than you get in associative array back. So that's better. Beyond that, you can also retrieve single elements of this hash by using HGADL. And you can see that even though we set the value as 19 as a number, it still comes back as a string 19. So the strings are a bit tricky here. All right. So that was a quick overview of Redis. It's a very simple way of dealing with very specific data types. But because it is all in memory, it is a really fast way of handling that information. So Redis is often used for either kind of caching information or doing on-the-fly calculations of quick things that are always available. And it doesn't particularly matter very much. Sorry. The intention is that you can always build back this data for more information. Sorry, give me one second. This always happens about five minutes in. Sorry. So a little bit further than, who's got some music going on here? So a little bit more feature-rich than what Redis or key value stores are, are the document data stores. And the document data stores are often feature parity with relational databases. Not always. Not in all specific cases. But it tends to be built for having a, to be able to run on distributed networks. Sorry, I really have to do this, because otherwise it's going to annoy me. Please sort this out. So some humor in there. All right. So the document data stores allow for, to store more rich information that you can interact with in many other things than just the key. So it is now possible to do operations on keys and values. You can do searchers on values and not just on the keys like you can do with key value stores. Often if you see examples of document stores, they all will show you the data in a JSON format, because the JSON format is a slightly richer data type. However, most document, most document stores will not actually store the information as JSON on this, because it's a text format that you need to parse all the time, which is really inefficient, right? So examples of document stores are, for example, MongoDB, or CouchDB, or Lasik Search, and they all fit you much work in the same sort of way. Or at least the way how you can, how your data modeling for these databases works. Of course, things like MongoDB and CouchDB, they are, they are like, I call it, they're databases, right? They're storing data. Lasik Search has a different aspect to it, where it is really good as a full text search index. You can also use it as a database sort of, but it isn't meant for it. I mean, unless you search, your search indexes aren't always necessarily up to date immediately, which is something you expect from a database, right? I mean, if you write data and you read data, it should be there immediately. From the same server, if you talk to it. The difference between MongoDB and CouchDB are mostly around how you write data to it. MongoDB being single master, CouchDB being multi-master. But by having these differences, you get other points of contention in other different ways on how you store data. Or rather, how you distribute this data among multiple nodes. So, although they're all a little bit similar, they're also a little bit different. Again, I can't go into detail between all of it, because I'll be talking for days. So I have to focus on one database that I'm talking about MongoDB, mostly because I know it's the best thing. All right, so the first line is a bit controversial at the moment, so I won't talk to that about too much. But MongoDB is like any other things I've been talking about. It's something you can use for free. It doesn't cost you any money. So the documents are stored as Beeson on this. So Beeson is a binary way of storing JSON information, except that it has a few additional data types. So it's like a date timestamp, for example, or there's binary data. And those data types are something that normally JSON doesn't have. So there's a few extra data types thrown in with that. So the interaction from within, say PHP or any other language you can think of, what you do is you build data structures inside the application, just like there were normal variables. And then you can transfer those through a client through PHP extension, like you have the MySQL extension, there's a MongoDB extension, or an Elastic Search extension, or things like that. It's the way they will take this information, put it in a different format over the network, and then the database stores it in its own format. So the documents that you interact with from your application will be as natural PHP objects. You can just say, sort this PHP object for me, and it'll get sorted for you. I mean, there's always going to be some gotchas in here, of course. But what it does mean is that, of course, in a PHP, if you have an object, a single property can be an array, right? And in an array, traditionally in a relational database, you can't really store an array of values for a specific field. Well, you can do that if you serialize it, PHP serialize it. I'm sure some of you have done in the past. I know I have. So, yeah, those documents are richer because they can store these arrays. They can also store arrays of objects, or nested objects, and so on and so on. MongoDB interact with through a PHP extension called MongoDB, and to make things easier to use, you also use an optional compose installable library called MongoDB slash MongoDB. All right, so some examples here. As I said, most of the examples that you see in online documentation will be with JSON documents. The things that I'm talking about here, how those documents look like, will also apply for CouchDB and Elasticsearch. Virtually no differences in whatsoever. So what are the things in here? What are the important things in there? Well, first of all, you have a unique ID. A unique ID is basically your primary key. If you don't set one yourself, it will automatically generate a complicated number. We usually advise to use, like, if you have a unique key already, use it yourself. Don't come up with something new that you don't have to invent. So, this underscore ID field is your primary key. But then there are values that have extra things in there. So the word field has four words in it and they are stored as an array. They're not going to be serialized or anything like that. They really just sort as an array. And similarly, the field badges is an array and each element in the array is an object with key value pairs again. This is also that you can just store and you don't have to worry about this and you can even query this directly through the query languages. So as an example here, just to look at how simple this insert data is, first thing we do is we require the library, the compose installed library, because it provides a nice interface to it. We define a variable called whiskies, which we define as we make a connection with MongoDB client, use the DRAM IO database and the whisky collection. Collection in MongoDB is your table. It's equivalent to a table. So the whiskies variable here is now the table or collection object that we can do operations on. Like insert, obey, delete, your quad operation, we can think of some documents. The basic element that you store are often either represented in associative arrays like I've done here, or there are PHP objects and the driver allows you to map different data types depending on which field to all the data types when you're hydrating or dehydrating. Again, can't really go in there because I'm not going to have the time for that. But there's a lot more things possible than just looking these at normal PHP objects or associative arrays. So but what I'm doing here is that with insert one I'm inserting a single document and with insert many, you insert multiple documents. It isn't that complicated to make use of this. All right, so once many no-skill databases started adding support for these richer data types that I've shown you but either whereas or the document stores with MongoDB. Many relational databases also started adding data types to its functionality to store that kind of extra data because they must have seen, well, lots of people find this useful. That's how this functionality to relational databases is on. Because I mean, how many of you have had formal relational database theory in uni at some point? Must be some, yeah. Right. And that is really nice and it's allowed you to make the most optimal database scheme to store in your database. But it isn't always the most optimal way of interacting with your data from your application. Right. I mean, I'm pretty sure many of you will have taken shortcuts to make data easy to query or easy to store from application or to avoid having 20-way joints. All right. I mean, they happen. So these additional data types that I've added are good for storing this unstructured data that we have seen with both, say, Redis as well as MongoDB. And examples of databases that have done so are One MySQL and PostgreSQL. If you go even back, traditionally, IBM DB2 would allow you to store similar things in its XML database. Right. I mean, XML database is a whole different can of worms again, but I mean XMLs, of course, is a richer data format than just your standard numbers and strings. So the examples of this group are MySQL and PostgreSQL. I'll be talking mostly about PostgreSQL here because, in my opinion, it has a slightly better implementation or it allows a little bit more functionality, which makes it a better target for comparing it. All right. So just like Redis, PostgreSQL has actually different types of data types to store different types of data. Primarily, there's HStore, which is your associative array kind of variant. There's JSON and JSON-B. So you can see that I haven't highlighted JSON because nobody should ever use this anymore, but it was the first one that added to it. The JSON-B variant, the B selling for binary, is an optimized version of storing this information. Very similar to the B send for MongoDB, for example. And PostgreSQL has a much richer querying index support than MySQL for this data type. I also like that PostgreSQL has also an elephant as a logo, so I prefer to use that. So how does an HStore look like? Well, basically what you do is you store a JSON document into a single data type. So in order to make use of this, you need to activate the HStore extension, which I have to omit. Some of these slides are written maybe half a year ago. This technology in PostgreSQL moves very fast, so I might be slightly outdated here. When I looked at this, you still had to activate the HStore extension. It is quite possible that you don't have to do this anymore now. So with this extension available, we then create a table. We create a table of users with a single field called HStore and the type of this data type is HStore. This is not something that most people do because at least you'd have a unique primary key and things like that, right? But I tried to make this as the example as simple as possible, so I left all these things out. And in the last block here is I'm inserting into users a disassociative array. But how am I inserting this? As you can see, I'm storing this as a string. Now I know there's a little bit hard to see somewhere, but there are single, single quotes here. The single quotes here to show that you store it as a string because interactions with a relational database are always going to be through strings. But probably the exception of sending values through prepared queries where they will have the possibility of not using strings. But most of the interactions, especially from PHP, if you don't do anything with it, any query that will come back from, say, MySQL, the data types will still be strings, right? I mean, PDO does that a little bit better, so there's other things to it, but traditionally you would only get a string back because that was a standard way of interacting through SQL. Now, interestingly, there are a few limitations to what you can do with hStore. So hStore, although you can see that we're storing it as a string and a number, it basically stores it as a string anyway. So you can't really do operations on these values. You can't, for example, increment the number by one for account if you view or log in an extra time. You can't just simply bump that number. But it will have things like strings and booleans and numbers in that. So what PostgreSQL wants to do with hStore is that this is some extra information that we can tag to things that we're already storing in the database. So as an example, I was talking a little bit about Open StreamMap and storing tags with it, right? Because in Open StreamMap the data model is like, well, everything has a type, which is either a note or a way or a relation. It has an ID, a number from one to four and a half billion or whatever the number is at the moment, and then it has tags that you can just cook up yourself. There is a standard set of tags that people agree with to using, because if you have a road, well, you need to know what type of road it is and what its name is, and people need to agree on what the names for these tags are, otherwise you can't make a nice map out of it, right? But as other people then added information for, say, cycling infrastructure on these roads, so they came up with a scheme to add additional tags to put an addition to this object, to this road, for this information, and nothing stops any other user from adding additional information to it with the own tag schemes that I come up with. Now, if you want to draw maps out of this, the way how this usually works is you do a query, and then with this information, you search for roads or you search for cycling infrastructure, you can then draw something. But because the amount of tags that a specific object can have is unbound and upfront undefined, how are you going to index all that information, right? I mean, you can't really create a column for every single tag that you come up with because you end up having hundreds of thousands of columns. Databases don't handle that. There's no database and other can handle that. But what you can do is, for the tags that are not necessarily agreed upon, you can still chuck them in an h-store because you can still query against this to do your map drawing without having to create a column for every single tag that you come up with. So an h-store is really good for handling this kind of data storage where you have a common set of information that everybody agrees on and then a whole bunch of other things that you don't. So the querying against an h-store is not going to be as fast as doing an index query against a normal field value combination, but it's still possible to do. And you can also, if you retrieve this, fill back in any application, then you can do other things with this, like render the information in this h-store without too much complications. So h-store is used in OpenStreetMap projects to do this kind of stuff. The other data type that people use with PostgreSQL is the JSON-B data type. And the JSON-B data type is very similar, how you can interact with that as well document data stores like MongoDB and Couch to be an elastic search gym. And it can be, you can create indexes on it, unlike MySQL's JSON type or PostgreSQL's JSON type. The indexes you can set on it are very different from what you would normally do on relational database data. Like you don't really set things up like unique indexes or an index on a specific field or compound field. You set them on a specific data type and there's different indexes. I'll get back to that in a moment. So JSON-B data type supports richer data types than the values compared to h-store because an h-store is just really key value. And the values can't be things like arrays or nested objects whereas with the JSON-B data type you can store that information. So again let's have a look at how that looks like. So instead of h-store we're using the JSON-B data type and in there we now insert a string which is a JSON document and it's formatted according exactly to JSON. Basically what I've done, I created this example by copying, pasting the information I had in my Mongo example and I just put insert into user's values quotation mark in front of it. That's the only thing I've changed here. And then with this data after stored PostgreSQL will parse this when you send it to the database stored in a binary format so that you can create indexes on it and do queries against all these different data types. All right so now we have stored data. It's probably useful to have a look at how we can query for this data because database is pretty useless if you can't get the data out. All right so with Redis we already looked a little bit at this how we can do this right. But you remember the example about the sets that you can use as is member to see whether a specific value is a member of a set or you can get all of them back by as members. Similarly for the hashes there's an h get all to get all the key value combinations and hm gets to get only specific hash keys out of that. But there's a few other data types that allow for more specific query types so L range and Z range allow you to retrieve a list of elements from a left position the starting position in your array to your right position the end position in the array. And as different variants there's a sorted version where all the values are sorted already for you so you don't have to do it yourself or insert order so that's the difference between the L and the Z here. So you can do optimize queries against these very specific data types. Looking at MongoDB, MongoDB has a much richer data query language. It is it it isn't SQL but it has nearly or it has functionalities on par with query functionality from what you used to in SQL. But it looks very different. But once you look at it it is actually that complicated. So this example is an example from the MongoDB shell so I'm not doing this through PHP. But if you want to transfer this to how PHP does this what you change is you add a dollar sign in front of DB and you change the dots to an error. And then of course you don't type JSON you use a PHP variable data structure. So it's not I mean the APIs are exactly the same it's just a different language. So what we're doing here in this query what we're finding is we're finding all the documents where region underscore slug equals Scotland isla and the rating is larger than three or greater than or equal to three. These operations with a dollar sign are query operators. These are a so because this is done as in javascript syntax or PHP syntax you don't build up a string to do queries. Like in a SQL query there are all strings right. I understand there's prepared queries where you can replace your question marks with values that are not being put in as a string before it gets sent to the database. They're different part of the protocol. But in general you always have to build strings to interact with it by writing a SQL query. So a slight benefit here is by doing it in the declarative way is that it tends to be a bit harder to get like SQL or no SQL injections because the values are just values in sample strings. So you don't have to think about escaping yourself and things like that. Having said that you can still get no SQL injections by doing queries that you shouldn't do so. All the requirements are for dealing with input data correctly still hold true for no SQL databases too. All right so the query here is refining all the documents where region is Sculpted Isla and the rating is larger than 3 or larger than greater than 3 and we only want the fields back that are called whiskey rating and H and pretty makes it looks nicely instead of picking out of one string. So we have this equality match. We have this match with a query operator and then we have the projection and the projection is you list basically all the fields that in SQL you'd select star but instead of the star you now have star again. Oh whiskey rating and H. Okay that's a good question because the projection because it is Java syntax you can't just have a single word and it's a hash right so you need to have a key and a value. Now the key here is the name of the field the value is either one or zero. If it's one that means included if it's zero it means excluded. So you can you can either include fields that you want or you can exclude some fields that you don't want. That's for the both different things. So if there's some water that'd be great. He's solved. Thank you very much. All right so if I rewrite this query in SQL you get a query like this right you get select whiskey rating H from check-ins where regions like equals stock Scotland Euler and then rating is larger equals to three. All right so an additional way of doing queries in MongoDB is the aggregation framework. And the aggregation framework is a set of operations that you define that are run in order on a collection or a table. So what aggregate does it we've defined two operations here this match and group. So the match stage sorry in the match stage what we're now matching is not a specific value but we're using a value prefix. So it allows you to use regular expressions which you anchor a bit of carrot to the start of the string. You need to do that otherwise it doesn't make use of an index in a similar way as if in SQL you use find oh sorry select blah blah from table where field like something. If you don't have if you have the percent at the start of that string it can also not make use of index because the ways how strings are indexed they usually you can only do that when they're anchored to the start of the string. In any case what I'm trying to find is all the whiskies where the region is in Scotland. All the countries also make whisky. So that's what we're doing this match and then the second step the group stage we then group all the all the things that we found after the first step we group them by region and then create a list of all the whiskies that we find by this region and put that in the whisky field. This looks kind of complicated if you want to know more about how the aggregation framework works come talk to me later I'm going to run out of time if I go in there too deep. So for each of the matched documents we're creating this bucket indexed by the region name and then we're adding the data to it. So the result of this query would be something like this. I've prettified this a little bit because the ID the region slug has gone but the ID value is now the name of the of the region itself. But you can see that the result of this is in an array of all the different whiskies. So you can do quite a bit richer queries with this that's you well you can't do an SQL right you can SQL you usually traditionally you can't return a value that is an array of values. Of course unless you use the json b data type in PostgreSQL. All right so queries with the json b data type in PostgreSQL or hstore in this case are by using specific extra SQL operators. I don't necessarily remember the names of these operators because they have funny names and yeah I just don't remember the names of them. So in this example we have a document with four fields we have ID location count unique in this admin to get the value for a key you see that we use this error operator and the error operator basically says give me the field is admin for the field age doc and age doc was the name that from the previous examples. So that will basically give me for every document in users the is admin flag and the only thing is output is pretty useless but this is an example here. So what you can also do is you can use normal SQL operators on these filters as well. So in the second example here is that we're going to show all the underscore d values for all the users where the field count unique is larger than 10. So basically what it says users that have logged in more than 10 times and then for those we want the underscore d values which is the unique identifier for a user. Now because of course upfront you never know which fields and values you have it is also possible to find out whether those fields actually exist in the data. So in this case the third example here where we do select age doc question mark age basically says if this field exists return return boolean through I think it actually does. So you can text so you can test for keys existing which is well you sometimes need to know that right. And then in the last example which is probably the most unstun that SQL that I've seen in a while you can use age doc the name of the field and then we can check by this M present greater than sign where the key value per location London UK notice the double quotes around the key and the value and the single quotes around the whole thing which we need to cast as an age store which is the colon colon age during the end basically what this does is it selects all the documents and users where there's a field in the age doc sorry where there is a key value combination in the age doc field that matches location London UK that is what it says this gets more complicated I think it doesn't look as nice in my opinion as how no school databases do this because their query language is made to do this kind of thing whereas SQL sort of has this bolted on and I'm also not entirely sure of the standardization process of these additional operators I know people are trying to standardize what all these things mean but if it's new technology and databases implement them in a slightly different way is kind of pairing from what I know both my school and post-grad school do implement the same data types so that's for age store if you look at Jason B it is similar you have both the arrow as well as the double arrow and the double arrow is a little bit better because instead of returning a string with the value unit it will actually pull the data out of the string and return it as a normal data type so most of them people use the dash arrow arrow instead of the dash arrow operator um and by doing this kind of queries it becomes important what kind of indexes you use now normally what you have on a index fields if you do normal create index you create a b3 index you get a b3 index on a on field and values in order to this kind of queries to make it go fast without having to scan the whole table you have to create an index on age dog email so post fiscal knows if there's a json b data type you can actually set an index on sub fields in there without you having to do anything about that so in this case you just like you normally would have an index on email you now have an index on the email field inside this json b data type all right so we again find the id for the email address which is my email address second example we're finding we're now using the is not null as called operator operator is that how you call this let's call him operator which is the same thing as you how you otherwise would do it if to check whether this value is null or not which you can also use to see whether the field actually exists because a non-existing field has a value of null in this case now then in the third example we're going to be more complicated and we're using this greater than operator to find out whether there's a word the word London whether that is part of the array associated with the words field okay this is a bit complicated so as you see the words field here has four values the query that we do here is a subset of this because it only matches words London which is a subset of the document and hence we'll then return it as a result if I would have used words Paris for example because it doesn't match it wouldn't return it additionally if I would have done words London comma UK it would also find it because it's still subsets of this document in order to to this kind of queries Postgres core recommends that you use a gin index and a gin stands for the G sorry because my genitive index something like that genitors inverted index basically is a index type that allows you to map multiple values to a single single document sorry single record which is a way of basically indexing all the key and values in this JSONB data type in one bucket so you can do queries against that in one go without having to do it on specific fields like we do for in the first example here where we look specifically at data against the email field in the age doc field I know I'm using the same word a lot over and over again that's confusing after all and so just in the last example what I want to show is that how would you update or add a field I already mentioned when we looked at Redis before that it is important to use atomic operators to do operations on the data what you don't want to do is retrieve the record in your application then add an extra field to the age doc type which in order to do that you basically have to rewrite a string not ideal way of doing that so what PostgreSQL allows you to do is actually update or add fields by using the colon not the pipe pipe operator which basically adds a field value combination country Europe that's not a country sorry or if you want to remove a filter you can use minus instead of pipe pipe okay I'm going to have to change that country to either continent or change the name of the country for Europe or something else all right all right so that's the interaction with with the different data types so a bit a bit more about indexes so indexes and Redis they're only going to be on your primary key if you want to do more if you want to have a compound index as I've shown you before you use it is a a preference to use the colon there's no there's no hard requirement for using the colon but this is what people tend to do but some of the data structures provide additional indexes right if you have a set for example even though your primary key is still the first one there's still sort of index on the value because it knows not to add multiple multiple the same values to the set so there's some indexing going in there and similarly if you have like a sorted list or something it's also just that kind of additional indexing to make the queries be faster without you having specifically having to create an index it is part of the data type that you are using for storing the data which is kind of handy so among the indexes they are on normal fields as you'd expect you can set unique keys you can set indexes sorry I have this one you can set indexes on a nested field like if you remember from the PostgreSQL for example where we had HDOC arrow email you set an index on the email field with inside HDOC which is what MongoDB uses with the dot instead of the arrow and you can set an index on words on the field words the field words will now have multiple values associated with that field unlike PostgreSQL which allows you to do this on the whole data type only in MongoDB you can do some specific fields inside the documents that you're storing so it's a little bit more precise on what you store your indexes on making it of course if you're being more precise with your index you end up storing indexes of a smaller size because you have to store fewer information or less information in there so PostgreSQL has this gin index oh the contrast really doesn't work here apologies then again picking red and blue on purple should have seen that coming but this gin index is a really quick way of creating this index on all the data instead of the h-store and it is you're going to have to do something in order to not have a table scan so yeah so there are two types for JSONB fields there is a kv or just a v key value or just a value I think in general people will use the standard which is key value and it allows you to map multiple values to a single document so in this case all the the combination like name glen25 or slug glenalbin or words glen words albin and words 25 that will all show up in this index so that you can quickly relate to a single record that contours that information in this data type so with this as I showed you before you can do these key value matches so we're creating the index here with this using gin on properties so you have to specify which type of index you use for it if you want to create a b2e index you don't have to specify the gin thing so it will then index this whole whole properties fields and similarly this index is also used when you do this array containing a match right so so we create the index so we create the index own properties words so that is a more specific type of index to then be able to match the properties to see whether in this case with the question mark we're trying to find whether the value album actually matches inside the words fields I've shown you only very simple operations you can do on this data I mean I can probably give it all that last several hours on all the different operation that you can do with this data but again not having the time for that so we'll get it going here a few more words about manipulating data so I've already mentioned is a few times the way how you interact with data you need to make sure that you do this in an atomic way you can do this either with transactions or you can make use of all the operations that work on all the specific data type but it's always going to be your preferred way of doing it so that's the atomicity so you shouldn't do retrieve manipulate and store this is true for any database interaction you ever have with any database you can think of so this should not be anything new so Redis has a bit of extra operations here that allows you to work on things that you've already stored so remember that we have the in Redis we have the properties and in this case the property for H is 19 returned as a string but you can do in core H incur they get difficult to pronounce those operations H incur B yes I pronounce it I don't know they need to come up with something slightly better than I think but what it does allow you to is take the field and then use a value that you want to add to the field so even though it is stored as a string you can still do numerical operations to strings that look like numbers and because it's an atomic operation it's also really fast that happens in place and is basically a free thing to do nothing is ever free but it's just a really fast operation and this is used in many applications where you keep counters like visitor counters or anything like that or if anybody still uses four square like how many people have checked into specific venue right all the kind of things Redis is a really good database we're doing this for also if you lose the data it's kind of annoying which you should always be able to build it back from data you've already stored on this and then there's other things like the atomic operational sets I already spoke about this as assets and elements to set or as REM from removing them and so on and then there's transactions as you saw so among them to be a similar thing never do this never do a query update your value and then store it again you get race conditions and among them to be has operations starting with a dollar sign which if you're using PHP is kind of annoying because you really need to make sure you're not using double quotes you need to use single quotes for everything and in this case what this query says in the first example it says increment the subfield 2017 0202 in a field steps underscore made and increase it by 7124 as a atomic operation is kind of handy it also has operations for sets although MongoDB doesn't have a set data type per se there are operations in the query language that allow you to deal with specific arrays as they were sets by using these operations so adds to sets basically says add the word open stream map to tags unless it's already there and then just ignore the operation similarly can you do things like push to push things to the end of an array you remember that in Postgres so you could do that with a pipe operator to add an extra field to it similarly MongoDB allows you to say push tags so what we're doing here is that we're adding to the tags property the values MongoDB and XDBug and then after we've done that we only take the last two elements and everything else that we discard okay in this example that is kind of silly because we've just added two elements and then we only keep the last two elements that we have but what you can use this for is that you add a commons to a blog post and then by using slides minus 10 you make sure that you only store the last 10 commons with a specific blog post and this is all one single operation without you from the application having to create transactions and do this kind of thing so as I said all those no-score databases as well as Postgres so it's a json vdase type have these operations that you really want to use okay schema validation in a relational database well that's schema validation because you have to create a schema upfront right but with all those extra modern data types now you don't describe this schema of the data that you store in there upfront in relational database with these data types you generally cannot enforce this schema on the data that you store in there it is really translated treated as a bucket of data but of course the things that you want to enforce the structure of you wouldn't store in this json vdase type anyway you would store that as normal fields in your in your record with no-score databases it isn't generally possible to enforce a schema although the database have been adding functionality to this I can't remember but a couch should be as anything for it but MongoDB allows you for example to set up a validator and a validator is basically a way of running a query against a document after it's been updated or inserted to see whether it matches specific query the validator and if it matches if the document would be found by running this query it's considered being validated so this is a way of enforcing a schema by not by still not having to create a schema upfront for example so that is a possible thing to do I think the latest version of MongoDB actually extend this language to json schema so if you use that it is a lot richer than what MongoDB originally had before json schema existed for example all right so conclusions the way how you store non-scaler data on different databases is very swalding it's important to pick the right tool for the job right I would usually recommend people use Redis for this very quick in-memory operations because it's really good at that but Redis isn't particularly very good at scaling it beyond a single machine a relational database such as Postgres with a json b data type or MySQL but it's json type which is also getting better and better over time if you're already having an application using relational database and you need to store some extra unstructured data that doesn't happen very often then use a relational database and use the json b or json type as additional to store additional information if your application is already either highly unstructured with some data it's just like that using a NoSQL database like CouchDB or Elasticsearch or MongoDB is the right tool for the job there right because it's already architected in such a way in addition most NoSQL solutions are also built to be run on a distributed platform so if you need to have easier failover capabilities or distributed writes and recent things like that relational databases tend to be less good for that because they have not been architected for that right they're 30 40 years old where the way how you scale your database which you get a bigger machine most of the time and then there are some data types that work better with some database like having the sets it's probably rather would be a much more natural fit for that than you do that in a relational database or even in a document database so those are the different things that you need to look at many different possibilities there as I said Wikipedia list 57 different types of NoSQL databases with others with more exotic features than others so yeah if you have things that you don't think that fits with the things I just spoken about today have a look at what else is out there and there's a lot of different things out there having said that are there any queries or you have two minutes that's one right up front here use the mic please would you recommend any of these for storing events if you're using libraries such as Broadway for or go into the complexity of a dedicated event store so when you say events so you don't mean time series data right is it just events but everything that happens through your application and you want to store in a big log basically you see that right yes so something like event driven architecture where you have your domain events I don't think it particularly matters much but what you need to I don't think I'd use Redis for it because Redis is not good as a system of records neither elastic search in this case both relational database as well as say couch to be among going to be as no school data so would be perfect fit for that right I mean at that point you just need to start figuring out I need to you need to run some benchmarks what is the fastest data store for you or similarly could point about thinking about which one to use which one I'm most familiar with or do I really fancy playing with something else and figuring out how how that works for me right so these are the concerns there but because in the events store itself they don't really have unstructured data very much it doesn't particularly matter in my opinion anything else no questions oh right in front again so I've not implemented or used a no sequel database before but we had we had a technical problem which we are trying to solve using no sequel but I wanted to know for example the way no sequel works is there's no definite schema so one json object might differ from others say just an object for example one json object might have field called name the other one might not so is there it's possible to have an index on the key called name and then tell the no sequel database that if you find this key called name index it and then I want to pull out everything which has that name and then if it doesn't have it ignore it yes so couch do we handle that in a different way of course you need to create different sorry for MongoDB I know you can create an index on the field and you can create sparse index on this field which means that any documents where this field doesn't exist just don't show up in your index or you can make a non sparse index where the fields will show up but the value will be null yeah that should be possible anything else one right here and then I've run over time so this is going to be the last one hey there so I noticed at the beginning of the talk you said there's four database types but you only mentioned three from what I've graph databases is the one I missed ah cool thanks yeah yes I realized that later during the talk I meant forgot to mention graph databases but there's always an eagle eared person in the front row right you're forgiven yes yes graph databases is the other big type big type all right one more slide QR codes who doesn't love QR codes the QR code will go to my my website where you can download the slides of this presentation it also has a list of resources if you want to do a little bit more reading if you have further questions come and find me at a conference email me I'm getting better at answering emails no there's no loss I'm getting better at doing that having said that thank you very much