Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of Data Diversity. We'd like to thank you for joining this Data Diversity webinar, "Improving How to Handle NoSQL with a Relational Database," sponsored today by MariaDB. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A panel in the bottom right-hand corner of your screen. Or, if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag Data Diversity. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle of your screen for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me introduce our speaker for today, Thomas Boyd. Thomas is the Director of Technical Marketing at MariaDB and joined the company in 2017. He brings 25 years of experience in the IT industry with a strategic focus on databases and data storage technologies. Prior to MariaDB, Thomas worked at Oracle, Hewlett Packard, and MongoDB, as well as with smaller startups as an independent consultant in the San Francisco Bay Area. Thomas, hello and welcome.

Thank you very much, Shannon, and thank you everyone for joining today. Let's go ahead and jump right in. I'm excited to bring this talk to you. Our agenda looks as follows. We're going to talk a little bit about relational databases, which I suspect is a review for pretty much everyone. Then we'll talk about NoSQL in the larger sense, and then specifically about document databases, which I believe are the variety of NoSQL that most closely overlaps with relational databases. And then I'll talk about how MariaDB brings these two things together, specifically in the three areas that are bullet-pointed there, and we'll jump into Q&A.

So, relational databases and SQL. As most of you on the call know, this being a very data-focused audience, relational databases have been around for a long time. They have several advantages, and one of them really is that longevity combined with a well-defined standard, the Structured Query Language standard for interacting with the database, which has allowed a really large ecosystem of different RDBMS vendors and tools to develop around this defined technology. When I say tools, we're talking about things like business intelligence tools, reporting tools, extract-transform-load tools, all of those things people have been working with over the past several decades. It has also developed a large talent pool of people who understand how to model that data. We'll talk a little bit about how data modeling works in relational databases and how to build functional applications against a relational database, and the database administrator and operations aspects are also well understood, with a large talent pool to draw from.
For the modeling that happens in a relational database, I think the spreadsheet analogy works relatively well for those who aren't familiar with it. We're really talking about defining different pages in your spreadsheet, which would be analogous to the tables in a relational database, that hold rows for each item in the table and have individual columns for the attributes of each of those items. So this is the classic example where we're modeling sales orders. We have an order header table (that's the orders table, the first one on the left) with some kind of a primary ID, and then multiple columns that define things at the header level of the order: the order number, perhaps the sales rep that sold it, and so on. Sub-elements are broken out into a separate table; in this case, each individual line in the sales order is represented by one row inside the order lines table, and it references the order it's attached to by an order ID column that matches the ID in the orders table. I don't want to spend a lot of time discussing this, because I'm assuming most people are very familiar with it, but it's important to note the fundamental difference between this and document databases, which is this separation of the data of one order into separate tables. That has pluses and minuses. One of the great pluses is that it really allows us to do analysis and queries directly against those order lines without incurring a lot of overhead pulling in order header information. And vice versa: if I want to do analysis at the order level, I don't have to pull in information about all the order lines that exist on that order. I can just focus on, say, my number of orders per sales rep in a particular quarter, and that's a very efficient operation in a relational database. Depending on how you modeled it in NoSQL, it may or may not be efficient. So again, that was just meant to be a quick review and a point of contrast with NoSQL and document databases.
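To make the two-table layout concrete, here is a minimal sketch of the schema being described. The table and column names are my own illustration, not taken from the slide:

```sql
-- Classic order-header / order-lines modeling (illustrative names).
CREATE TABLE orders (
  id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  order_no   VARCHAR(20) NOT NULL,
  sales_rep  VARCHAR(100),
  order_date DATE
);

CREATE TABLE order_lines (
  id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  order_id INT UNSIGNED NOT NULL,          -- points back to the header row
  product  VARCHAR(100),
  qty      INT,
  price    DECIMAL(10,2),
  FOREIGN KEY (order_id) REFERENCES orders (id)
);

-- Line-level analysis never has to touch the header table:
SELECT product, SUM(qty) AS units_sold
FROM order_lines
GROUP BY product;
```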
Now, with respect to NoSQL: I was at a NoSQL company around the time people started changing the definition from "NoSQL" to "not only SQL," which was really an explicit admission from the NoSQL vendors that they were going to live in a world that has both SQL and other mechanisms for storing data. For better or worse, NoSQL has come to be a catch-all that includes straight key-value stores, wide column stores, document stores, and graph stores. Some people might even define some of the Hadoop solutions as NoSQL databases, depending on the layers you put on top of those Hadoop systems. As I mentioned at the beginning, I'm going to spend a lot of time talking about document databases, partially because that's where a lot of my background comes from, but also because document databases have proven their applicability to some of the use cases that relational databases have served in the past, and I think that's the most valuable place to do a compare and contrast. We've got MongoDB, for example, which has grown in the marketplace and has really defined that document database niche. From my years working with document databases, I would say there are three main drivers for their popularity, in no particular order.

The first one I'm going to talk about is scalability. Document databases came to the fore around the time that big data, the explosion of data sizes, and the famous Vs of big data became popular, and they got bucketed into this big data area where scalability was really at a premium. Because they were built in that era, they had scale-out capabilities as part of their core functionality. Part of what I'm going to talk about today is that, yes, many of the NoSQL offerings have that, but MariaDB also has very similar functionality built into our storage engine technologies. The second driver I would list is direct support for JSON in a schema-less fashion, where there isn't a strict definition of the schema up front. The perceived weakness of relational databases was the inability to be agile and adapt quickly as your application needs changed over time, as you had to model different data and evolve that schema. I'll talk a little bit about that; I think this is a valid area where some of the document databases still have some advantages, but as we'll see today, MariaDB in particular has caught up quite a bit. And the third is simply that those NoSQL databases worked well with a modern application stack. Node.js, another technology that inherently uses JSON, became popular around the same time that some of these document databases were growing. So we'll talk about those pieces as well.

If you're not familiar with document databases and modeling data in a document or JSON-like format, this is how that same sales order would look modeled as a document. We'd have the order ID, then the user-facing order number. Perhaps we would have the sales rep stored in here as a sub-document: that's where you see the sales rep attribute followed by a sub-document, opened by a curly brace, that stores the first name, the last name, and some additional fields. Order lines might be modeled in a similar fashion; in this case, because we have multiple elements, the value of the order lines attribute is an array opened with square braces, and inside that array you have curly braces for each line. As I mentioned before, there are pros and cons to this. The first thing to understand is that everything you see in the document, critically including the names of the fields, is stored in the database on a per-document basis. Every time I save a new sales order, literally the text "order_id" gets saved, and the next time I save the next sales order, the text "order_id" gets saved again along with everything else. This is what you have to do when each document can define its own schema and the next one could have a few extra fields in it, but I want to point out that it comes with the inherent overhead of storing all of that, and I'll talk about that a little later on. And again, as I mentioned, if I really just wanted to focus on, say, the sales reps' sales, then perhaps all these order line details are data I don't need to access and pull into main memory, potentially decompressing them out of compressed storage on disk, as I do that kind of analysis.
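Here is a rough reconstruction of the document being described; the field names and values are invented for illustration:

```json
{
  "order_id": 1001,
  "order_no": "SO-2019-0042",
  "sales_rep": {
    "first_name": "Jane",
    "last_name": "Smith"
  },
  "order_lines": [
    { "product": "Widget", "qty": 2, "price": 9.99 },
    { "product": "Gadget", "qty": 1, "price": 24.50 }
  ]
}
```

Note how the header, the rep, and the lines all travel together in one document, and how every field name is repeated in every stored order.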
To be fair, when we're trying to render the whole document, the whole sales order, and I have a use case where I need the whole thing exploded out and displayed on a GUI front end, then this is pretty nice, and there's no extra work that has to happen behind the scenes to rejoin the sales order lines with the sales order header. So there are pros and cons to this approach, but some of them are maybe not as well understood, as we pointed out.

I touched on some of these points on the last slide, but here are a few caveats to be aware of. The first is that document databases, and NoSQL in general, really have limited transaction support. I know that some of the vendors have been layering on the ability to have transactions that cross multiple documents, but if you look closely: first, it's brand new. Second, it has some performance costs, so some of the performance gains you thought you were getting out of NoSQL in the first place, you give away when you enable that transaction support. And third, you'll find that it doesn't work well with their scale-out technologies. So I would say that if you're really looking for that flexibility, and if your ability to scale out over time is important, then you effectively have no transaction support in these popular NoSQL solutions. Just be aware of that. There are applications that work well without transaction support. But what I've seen, in my experience working for some of the NoSQL vendors, is that what looks like a really simple development model, where the JSON object in my application code stores natively down into my document database and I don't have to go through marshaling and unmarshaling or an excessive amount of defining a data access layer, breaks down. Some of the small gains you get there go out the window when you have to write all of this special code to handle what should have been executed as a single transaction but fails halfway through, and you have to unwind the multiple updates you were doing. That's complex code to write. It's easy to get wrong. It's corner-case code that people may not think about up front when developing their application, but when you're hardening your application for production deployment in a mission-critical system, you have to think about those things if you don't have transaction support. So, very important to keep front of mind.

The second point I would make is that, as these document database vendors matured and found they had to compete against the SQL ecosystem, with those great tools built on top of the ANSI-standard SQL language, they layered some pretty inefficient SQL layers on top of their databases. Because they weren't built from the ground up on standard SQL, those layers are relatively inefficient. And if you look closely, a number of them are only available as a proprietary add-on to what is nominally an open source solution. So be aware: if the ability to use this massive ecosystem of SQL tools (ETL, reporting, BI, et cetera) is important to you, make sure that as you evaluate the SQL layers for those document databases, you understand some of their drawbacks. I definitely recommend benchmarking.
The third point is pretty similar to the first: I already talked about the talent pool. NoSQL solutions are becoming more popular over time, so that talent pool is growing, but it's still a tiny fraction of the expertise available for things like MariaDB and standard relational databases. The last point, and I think this is proving to be more and more true over time, is that the schema-less nature of document databases does enable some nice flexibility and agility, but over time it runs the risk of really poor data hygiene. The fact that there aren't inherent constraints on what's getting inserted into your document database just naturally leads to dirty data. The document databases have added some constraint checking and some requirements for saying, okay, this field must exist and it must be this type. And I think once you start to do that, you really are defining a schema; it's just not defined down at the document level, it's defined in a separate layer.

I already talked a bit about storage size and query inefficiency. The document vendors will tell you that, because they have such great compression, it doesn't really matter that they're carrying those field names I mentioned inside every document. That is true and not true. They do get good compression when you have repeated field names in every document, but remember that compression costs CPU. If I'm going to compress every time I go to disk and decompress every time I come back from disk, that's soaking up CPU that I maybe didn't have to spend if my schema were defined separately and enforced, as relational databases do.

Lastly, consider denormalized data: those sales reps, for instance, stored across all of my sales orders, with the first name and the last name stored directly in each order document. What happens when one of my sales reps changes their name? Somebody got married and decided to take another person's last name. Now I have a massive set of updates that need to be done across all of my orders. There are all sorts of modeling tricks that the document databases will sell you on for handling that; maybe it's okay in a document to have a reference to another document. What I find is that the story of "it's a really nice native match between my application object and what I'm storing in my database" starts to break down when you're doing this kind of specialized data modeling in document databases. And I have found that the well-defined third normal form that relational databases move toward, and the rigor around defining your data and your storage that way, really makes for cleaner data. It makes the data modeling exercise much easier, and all in all it's a more rigorous approach to modeling your data.

This might be a good time to tell the one story that I like to tell. My previous job was with a document database vendor, and at the time I was in a solutions architecture role where we spent a lot of time talking to potential users of our database. There's a lot of dissatisfaction with the really big relational vendors out there, one in particular.
And so we would walk into customers who said: wow, we have a strategic initiative to get off of our big legacy relational database vendor. Here's the list of applications we're targeting to port over to a new technology. How well does this fit with your document database? And as we went through the list from the last company I was interacting with, which was literally hundreds of applications, maybe 5% to 10% of them felt like a good fit for the document database. The rest were a much easier port to a more modern open source relational database. The bottom line is that porting these applications from an existing relational database into a document database is a lot of work. You're going to be changing the way you model and store the data. If you already have a reason to crack open the code and do a lot of surgery on the existing application, maybe that's a good time to consider that kind of large-scale change to your data storage layer. But if you're really looking to do a port and quickly get off an expensive legacy database, oftentimes that work is just not cost-effective, and you should be looking at something like MariaDB.

With all that said, let's get to the meat of the presentation today, which is really where MariaDB overlaps with SQL and NoSQL, and I'm going to start with a real quick introduction to MariaDB for those who aren't familiar with us. We are really the default database on all of the leading Linux distros at this point. Whether we're talking Red Hat, Ubuntu, or SUSE, when you execute the MySQL service, even though the executable itself and the service name are called MySQL, if you look at the log files being produced under the covers, it is in fact MariaDB that's running. There's a long history of how MariaDB assumed the mantle of the default database in these Linux distributions; I just wanted to let people know that we're widely used, maybe more widely than you would expect if you're used to kicking off those services under a different name. In addition to the Linux distros, we're widely available in cloud services and stacks, which I don't think should surprise anybody. We're also used in a giant variety of different use cases, and that's really true of relational databases in general. Again, they have that 40-plus year history of proving themselves very adaptable: a general-purpose database that can serve small use cases, little startups that don't have a lot of scale, all the way up to supporting applications inside telecom, where we're dealing with call detail records in the billions. We do equally well at OLTP and at analytics use cases, where people are looking to do big data, business intelligence queries, even machine learning; we've done quite well there with some of our specialized storage engines, which I'll come back to. Maybe the one logo that I often call out is DBS Bank in Singapore.
DBS is the largest bank in Singapore: financial services, a heavily regulated industry, where we had to go through all of the checks for security and business processes to be accepted into their organization as a standard. They're in the process of replacing all of their Oracle installations. It's a multi-year process, but a real C-level initiative for that company to move to a modern open source database.

So let's talk about scalability. Again, this is something the NoSQL vendors get high marks for, and deservedly so, and my real point here is that there are many ways to scale MariaDB; we'll talk about each of these in turn. Maybe the one that isn't often mentioned is simple vertical scaling: you can support very large workloads, in the hundreds of thousands of operations per second, on the big machines you could be running either in your data center or in cloud offerings, where you can now get yourself onto some really big iron.

But let's talk about read scalability via replicas. Those who've been in the data management space are familiar with the idea of maintaining read replicas that hold a copy of the database. If you have a really read-intensive application, replicas are often the best way to scale out: you just direct your reads across those multiple replicas, which MariaDB can maintain as a core part of its functionality. There's not a lot of extra setup you have to do to run in this mode.

What people are maybe not as aware of is the ability of MariaDB to actually partition the data and distribute it across multiple physical nodes. And when I say physical nodes, they could really be virtual machines, containers, or separate nodes in your cloud environment. This is done by leveraging an API layer that has existed inside MariaDB for a long time, our storage engine API, which allows us to change storage formats, introduce compression, and many other things. The Spider storage engine is a rather clever usage of that storage API that spreads the data out across multiple nodes behind the scenes. Critically, this is all done transparently to the application. The application isn't even aware that the data is spread out across these nodes; it just does its inserts, updates, and selects the same as before, and MariaDB server and the storage engine, by virtue of those tables having been declared with the Spider storage engine, handle everything transparently for you. Critically, it's transparent with respect to transactions as well. If you issue a transaction that spans multiple of those nodes, MariaDB will do a two-phase commit to make sure the transaction is an all-or-nothing operation, so you don't have to write specialized code to roll back what should be a single transaction in the event of a failure mid-execution. We also have a lot of choices in the way the data is spread out across those servers for the tables you've chosen to store in the Spider storage engine: you can do range, hash, key, or list partitioning, and there are pros and cons to each in terms of which performs better for single-item lookups versus range scans versus other access patterns.
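As a rough illustration of what declaring a Spider table looks like, here is a minimal sketch following the pattern in the MariaDB documentation; the server names, hosts, and credentials are invented:

```sql
-- Define the backend data nodes (hosts and credentials are placeholders).
CREATE SERVER backend1 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.1', PORT 3306, USER 'spider',
           PASSWORD 'secret', DATABASE 'sales');
CREATE SERVER backend2 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.2', PORT 3306, USER 'spider',
           PASSWORD 'secret', DATABASE 'sales');

-- The application sees one logical table; Spider hash-partitions the
-- rows across the two backends behind the scenes.
CREATE TABLE orders (
  id       INT NOT NULL,
  order_no VARCHAR(20),
  PRIMARY KEY (id)
) ENGINE=SPIDER
  COMMENT='wrapper "mysql", table "orders"'
  PARTITION BY HASH (id) (
    PARTITION pt1 COMMENT = 'srv "backend1"',
    PARTITION pt2 COMMENT = 'srv "backend2"'
  );
```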
The whole topic of the Spider storage engine merits its own separate webinar, and I'll point you to some links at the very end on how to find more information about Spider. I just wanted you to know that this is very similar to the picture you'll see for scalability in various NoSQL offerings.

One thing you won't see from most of the NoSQL offerings is the ability to serve OLTP workloads and analytic workloads out of a single platform. We do this, again, using that storage engine API I talked about: behind the scenes, as you read and write data into the MariaDB platform, you can decide which of that data should also be stored in a columnar format that's well suited to large-scale analytic queries. For those of you who haven't worked with columnar databases before: when you're doing summarizations across multiple years of data, billions of rows, maybe looking at only a handful of columns, say we're summarizing sales over quarters across a massive organization with hundreds of millions or billions of individual items stored in these tables, having the data stored in columnar format can increase the speed of those queries by several orders of magnitude. If you're like me, coming more from the OLTP side of the world, you may not fully appreciate the power of these columnar data stores; I didn't until I was actually here at MariaDB. And again, the nice thing is this can be done transparently to your application. You just make a couple of extra calls to set up the tables that you want to also be stored in this columnar format, and you're off and running. I'll say this is a somewhat simplified picture: of course you can have the Spider storage engine involved on the left-hand side here, the OLTP or transactional side, and the ColumnStore side has its own ability to spread data out across multiple nodes. So, in summary, there are multiple ways to address the scalability question, and I think this one is actually unique to us, where we move above and beyond what you'll find in most of the NoSQL offerings out there, in particular the document-based offerings.
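To give a flavor of the columnar side, here is a minimal sketch assuming MariaDB ColumnStore is installed. The table is my own example, and the replication from the transactional side is configured separately rather than shown here:

```sql
-- An analytic copy of the order data, stored column by column.
CREATE TABLE orders_analytics (
  order_date DATE,
  sales_rep  VARCHAR(100),
  amount     DECIMAL(12,2)
) ENGINE=ColumnStore;

-- A summarization over billions of rows only touches the referenced
-- columns, which is where the orders-of-magnitude speedup comes from.
SELECT sales_rep, SUM(amount) AS total_sales
FROM orders_analytics
WHERE order_date >= '2019-01-01'
GROUP BY sales_rep;
```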
So let's talk a bit about JSON support inside MariaDB. It is possible in MariaDB to define a column that's meant to store JSON and then do a meaningful set of queries, a meaningful interrogation, of the data inside that JSON column. We've seen this adopted where people keep their structured, well-known data inside traditional relational columns, which you see on the left-hand side here in blue. In this case we're modeling a list of products, so a product might have a name, a format, and a price. But there may be an extra set of attributes that differ depending on whether it's a Blu-ray, a DVD, a book, or a completely different kind of product, and that's on the right-hand side, the part we're calling semi-structured, in a field that is actually represented by a JSON document in curly braces. Each entry is a key-value pair: the attribute name comes first, then the attribute value. So in this case we have a name of "resolution" and a value of "1080p".

You can see the critical aspect here: the schema for the structured parts of this row is defined at the table level, whereas the unstructured or semi-structured portion is described by the data itself, and that really is the core difference. You will sometimes see this approach to modeling data in a relational database called the anchor schema approach. You have a set of anchor fields that you know are always going to exist (or maybe not always; some of these fields could still be nullable, but they're largely common across all the items in the table), and you explicitly declare those as anchors, and then you might have one or more JSON columns where the semi-structured, flexible portion of the data is allowed to live. This is accomplished with a regular CREATE TABLE statement: the first five columns here are declared as regular relational columns, and the last one, the attribute field, is declared with the JSON type. Interestingly, because it's a generic column meant to hold different kinds of information depending on the row, you'll often find that JSON column named something like "attr", short for attributes, or sometimes "data" (not really my favorite name), some name that's a little generic to reflect the fact that the actual contents of the column can vary from row to row. Then there's an example of doing an insert into the table we just declared: a standard INSERT statement, really, until you get to the very last field, the attribute field, where we're inserting what is essentially a string, but that string itself is formatted as a valid JSON document. So, a couple of examples of doing inserts, relatively straightforward, and now we have the flexibility you see in the NoSQL databases: the ability to have different rows with different items in them. This might look like it was planned up front, but to me the real advantage comes later, when we have a new product type we weren't thinking of before. Now we're going to start selling books and we want a page count? Fine, we can simply insert that. We don't have to do any separate schema definition; we don't have to do anything differently in order to start storing that information.

This is a good point for me to mention one thing that probably should be in the slide deck. Relational databases, or MariaDB specifically, used to have a bit of an Achilles' heel: when you wanted to add a new standard relational column, it was an expensive locking operation that could take a while. There is now an instant ADD COLUMN feature that allows you to drop a new standard relational column into an existing table on a running database without the horribly expensive locking that would cause your system to crawl for a period of time. So while you get this great flexibility out of JSON, there are also options to evolve the relational portion of your schema over time without some of the locking that used to be an issue with relational databases. If you're curious, what you want to look up is the instant ADD COLUMN feature; it's just an extra bit of syntax you add when adding a column to a table.
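Putting those pieces together, here is a hedged sketch of the kind of table and inserts being described; the product data is invented, and the exact column list on the slide may differ:

```sql
CREATE TABLE products (
  id     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  type   CHAR(1) NOT NULL,        -- e.g. 'M' for movie, 'B' for book
  name   VARCHAR(100) NOT NULL,
  format VARCHAR(20),
  price  DECIMAL(8,2),
  attr   JSON                     -- the semi-structured portion
);

-- The last value is just a string, but it must be valid JSON:
INSERT INTO products (type, name, format, price, attr) VALUES
('M', 'Aliens', 'Blu-ray', 13.99,
 '{"video": {"resolution": "1080p", "aspectRatio": "1.85:1"},
   "audio": ["DTS HD", "Dolby Surround"],
   "languages": ["English", "French"],
   "discs": 1}');

-- A new product type needs no schema change at all:
INSERT INTO products (type, name, format, price, attr) VALUES
('B', 'Dune', 'Paperback', 9.99, '{"pages": 896}');

-- And the instant ADD COLUMN feature (MariaDB 10.3+) adds a regular
-- relational column without a blocking table rebuild:
ALTER TABLE products ADD COLUMN stock INT DEFAULT 0, ALGORITHM=INSTANT;
```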
So what about when you want to read data back out of a table where it's been stored this way? When you do queries, you can choose to extract data stored deep inside that JSON field using the various operators we provide. Here you see an example of pulling the name, format, and price, which are the standard relational columns, and then for the fourth field returned by this query, all we really wanted was the video aspect ratio portion of that attribute column. That's where you use the JSON_VALUE operator to extract that information and present it as part of the result set as if it were just a regular relational piece of data. Again, anybody can store strings of JSON inside their database, but without these JSON operators it isn't all that useful. Here's another example where we're again pulling the aspect ratio; in this case what I'm really highlighting is that for the document that didn't have an aspect ratio defined, it's simply returned as NULL, as I think you would expect from a regular relational query. Here's one where you're using information about the JSON stored inside that attribute column as part of the WHERE clause: in this case we wanted to return only movies that had DTS HD audio as one of the options in the audio stream. Pretty straightforward; I'm just highlighting that you can interrogate your JSON data both in the SELECT list and in the WHERE clause. Here's an example where, instead of returning a regular relational field, we really wanted to return a JSON object, in this case an array of the information stored in that audio attribute inside your attributes column. What I'm showing right now is how to present your information as standard relational query results; of course, we can also get to this data while still presenting it in a JSON-like format, which I'll show in a second. Here's an example where we're pulling a specific array element: if you look inside that JSON_VALUE call, we're saying look at the attribute field, look at the audio sub-field, take the first element in that array, and return it as the default audio. I would say that proper interaction with arrays and sub-documents is a critical feature for anybody claiming proper JSON support. If all of your documents have to be flat, just simple key-value pairs, without the ability to have arrays and sub-documents stored in there, then the solution isn't really presenting a mature JSON offering. Here's an example where we're returning a JSON object in that video field, opening and closing with curly braces, so now we're getting closer to returning results that look like natural JSON, if that's what your application requires. In my world we see a lot of data stores where, yes, there's my front-end application generating the data, and maybe it wants to interact with the information in a more native JSON format, but I might also have some reporting functions or other service layers hitting the same data store that would like to interact with it in a more ANSI-standard SQL format.
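For reference, here are hedged reconstructions of the query shapes being described, written against the hypothetical products table from the earlier sketch:

```sql
-- Extract one scalar from the JSON column alongside relational columns;
-- rows without that path simply return NULL:
SELECT name, format, price,
       JSON_VALUE(attr, '$.video.aspectRatio') AS aspect_ratio
FROM products;

-- Use the JSON contents in the WHERE clause:
SELECT name FROM products
WHERE JSON_CONTAINS(attr, '"DTS HD"', '$.audio');

-- Return a whole sub-structure (an array) rather than a scalar:
SELECT name, JSON_QUERY(attr, '$.audio') AS audio_options
FROM products;

-- Pull a specific array element: the first audio stream:
SELECT name, JSON_VALUE(attr, '$.audio[0]') AS default_audio
FROM products;
```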
And so we have the ability to go both ways here, which I think is really the great strength, and one of the points I'm trying to get across in this presentation. Here's just an example of drilling further down in the WHERE clause, where we're looking at sub-documents: not just video, but the resolution portion of the video sub-document.

You can declare indexes on fields stored deep inside those JSON columns, but it is a little bit trickier. What you're going to do is create a virtual column, using those same JSON operators I talked about before; it can even be an invisible column that doesn't show up in SELECT * queries. That virtual column pulls the data out and presents it as a standard relational value, and then you can build an index on top of it. So here you see a case where we've created that virtual column called video_resolution, but we haven't yet declared an index against it, so if we EXPLAIN a select that filters on that video_resolution column, there are no keys we can use; we're going to end up doing a full table scan if we execute the query as you see it now. If we then create an index on that new virtual column, then, as you see in this EXPLAIN plan, we can actually do a lookup on that index. So basically we're using this virtual column mechanism, and the virtual column is automatically maintained: when you insert into the table and the JSON you insert has a video resolution in it, behind the scenes the virtual column and the index are automatically updated.

Here's an example of doing an update where you're actually looking to insert additional information into an existing JSON document. What we're saying in this case is: go find the attribute field (remember, that's our JSON field) and set a sub-field inside it called discs to the value 1. Again, you need this flexibility; you need to be able to interact in a meaningful way with the JSON that's been stored down inside your attributes column, and that includes not just the queries we were showing but updates too. Also, as I said, any mature JSON solution is going to let you interact with arrays, so you can do sets and manipulations on arrays down inside your JSON column. Here's an example of adding an array element: before, we just had English and French; now we're saying please append a new language, Spanish, into that array. A nice, compact operation. When you don't have these kinds of operators, you end up seeing a pattern where any change to the JSON document requires you to rewrite the whole field, or maybe even the whole row. When you start to have relatively large JSON fields, it's very expensive to ship the whole JSON document all the way out to your application tier, do all the work there to reformat the document the way you want it, putting Spanish inside your languages sub-field, and then issue an update that replaces the entire JSON field. That's very expensive and adds to network traffic; you need these kinds of in-place abilities to interact with your JSON fields down in the database if you're going to avoid a lot of penalties in network traffic, performance, and latency.
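Here is a hedged sketch of the virtual-column indexing and the in-place JSON updates just described, again using the hypothetical products table; column and index names are illustrative:

```sql
-- A generated (virtual) column extracting the resolution, then an index:
ALTER TABLE products
  ADD COLUMN video_resolution VARCHAR(10)
  AS (JSON_VALUE(attr, '$.video.resolution')) VIRTUAL;
CREATE INDEX idx_video_resolution ON products (video_resolution);

-- EXPLAIN now shows an index lookup instead of a full table scan:
EXPLAIN SELECT name FROM products WHERE video_resolution = '1080p';

-- Set a sub-field inside the JSON document without rewriting the row:
UPDATE products SET attr = JSON_SET(attr, '$.discs', 1) WHERE id = 1;

-- Append an element to an array inside the document:
UPDATE products
SET attr = JSON_ARRAY_APPEND(attr, '$.languages', 'Spanish')
WHERE id = 1;
```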
Here's an example of removing an array element. Here's an example of taking relational fields (remember, we had name, format, and price declared as regular relational fields) and saying: I'd rather present those to my application as JSON, so you use the JSON_OBJECT operator to do that. Another example combines in some of the fields that were stored as native JSON, so you can do a merge, if you will, where you're putting the data together, and here's an even more complex merge example. I do apologize, I'm moving a little fast here because I want to make sure we have some time for Q&A at the end.

Lastly, the last topic in the JSON section: we do have the ability to define constraints on the JSON fields you've declared in your table. This is a check constraint that's basically saying: for every row we insert where the type is 'M', which stands for movie, I want to enforce that we have a video field inside the attribute column, and a cuts field, and an audio field, and a discs field. You can create arbitrarily complex check constraints using boolean logic, those same operators you saw before, and some others I haven't mentioned, to build checks as complex as you need them to be. In this case, where we said that movies have to have these various attributes defined, and maybe they have to be a particular length, if you do an insert that violates those check constraints, we'll throw an error. So here's an example where we didn't supply a required field: we had required the resolution field, and this insert wasn't supplying the resolution field inside the JSON column. Here's an example where we had a data type defined: the discs field needed to be a number, and we were trying to insert it as the text "one", o-n-e. I'll say again that these kinds of check constraints are really important for data hygiene. In any number of document databases, back in my document-database consulting days, we would wander in and find: wow, this field here, while it's declared as price, is an integer in 30% of the cases and a float in 70% of the cases. Trying to reconcile that is a lot of extra work, and having these kinds of check constraints is really important to maintaining data hygiene. A sketch of this kind of constraint follows.
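A hedged sketch of such a constraint against the hypothetical products table from earlier; the exact fields checked on the slide may differ:

```sql
-- Movies ('M') must carry video, audio, and discs sub-fields, and
-- discs must be an integer, not text.
ALTER TABLE products ADD CONSTRAINT check_movie_attrs
CHECK (
  type != 'M' OR (
    JSON_EXISTS(attr, '$.video')
    AND JSON_EXISTS(attr, '$.audio')
    AND JSON_EXISTS(attr, '$.discs')
    AND JSON_TYPE(JSON_EXTRACT(attr, '$.discs')) = 'INTEGER'
  )
);

-- This insert now throws an error: discs is the text "one".
INSERT INTO products (type, name, format, price, attr) VALUES
('M', 'Tron', 'Blu-ray', 29.99,
 '{"video": {"resolution": "1080p"}, "audio": ["DTS HD"], "discs": "one"}');
```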
All right, I'm briefly going to talk about the third point where I felt NoSQL had gained some traction, which is really around modern programming languages and Node.js. For those who aren't familiar with it, Node.js has been around for a long time. It's a somewhat different mindset in programming: doing asynchronous calls, working in an event loop, having non-blocking I/O. There are a lot of reasons why Node.js became popular; people found their application stack could support much higher throughput when they moved to this asynchronous model. MariaDB has been able to work with Node.js for a long time, since at least 2014 and really before that, by virtue of using a standard MySQL Node.js driver. But it was only at the beginning of this year that we released our own Node.js driver, produced by MariaDB Corporation, with some great performance improvements. It gives us the ability to properly support the Node.js application stack as our customers build more and more critical applications on top of Node.js. It has pipelining capabilities, some extra asynchronous capabilities, and it's promise-based, for those of you who are familiar with the latest asynchronous programming inside Node.js.

I've covered most of these points already. With that promise-based Node.js driver, you really still end up writing what look like relatively standard SQL queries inside Node.js. A lot of folks are looking for a higher level of integration: an object-relational mapper, or even higher-level frameworks that help you define a CRUD (create, read, update, delete) API on top of your data. That's what these last two points are. Sequelize is a very popular ORM framework that maps Node.js objects down to relational databases, and their latest release directly supports that MariaDB driver I talked about. Feathers.js is a higher-level framework that helps you define those APIs, like a full CRUD API, on top of the tables you declare in your relational database. This is an example of Sequelize defining a model, straight from their documentation. In your Node.js code you declare a class called Task, and then the Task init call defines the fields that are meant to be inside that Task object: we're going to have a string, we're going to have a text field (which is really just a longer string), and the deadline field is going to be a date. Sequelize takes care of all of that behind the scenes, doing your schema definition, and then when it comes time to save the Task object you've been manipulating inside your Node.js application down to the database, you just issue a standard call to save the object and the Sequelize ORM handles all that magic. Relatively straightforward. Those of you who are Node.js programmers will be familiar with the Node package manager, npm; you can install that MariaDB driver using a standard npm install.

I did go into a lot of detail here, and we've got documentation on our website. Before we jump to Q&A, I just wanted to point you to a couple of places to get started. Rather than trying to drill down into individual webinars and white papers, I'll just remind you that at MariaDB.com, if you go to MariaDB.com/resources, you'll find the index of all sorts of white papers, webinars, and other resources that will really help you dive deep, whether it's Node.js or the Spider storage engine or anything else we talked about today. And lastly, we're always happy to take questions. The best way is to go right to MariaDB.com/contact and fill out the little web form there. It could be a question very specific to the material I presented today, or very general, or a support question; that's what we're here for, this is our business, to help make people successful on MariaDB. So don't hesitate to submit there. With that said, I think I'm ready to jump into the Q&A session.

Thomas, thank you so much for this great presentation. And just to answer the most commonly asked question: a reminder that I will send a follow-up email by end of day Monday for this webinar, with links to the slides and links to the recording. So, diving right in here, Thomas. This particular attendee says: I joined late; can you confirm whether MariaDB can store structured as well as semi-structured data, like documents? Also, is the structure definition in JSON only? So the short answer to that question, as hopefully you saw if you didn't join too late,
is yes: you can declare a JSON field inside a MariaDB table and pack as much or as little information, and variable information, into that field as you add new rows to the table.

And so, does MariaDB support both SQL and NoSQL features? Again, the NoSQL features are wildly different across the various NoSQL offerings, but if you focus on those three areas I focused on (scalability, the ability to natively support JSON, and the ability to interact with a modern programming language like Node.js in a more native way), then I think the short answer is yes, we can support both.

All right. And so, are all of the JSON operators basically parsers of the JSON content, is that correct? Let's see; that's correct: they're parsers, and potentially updaters, of the JSON information that's stored inside that JSON field.

All right, we should give people a couple of minutes here; they're typing their questions. Okay, I know there's some coming in. All righty, lots of questions on JSON here. This one: can it do more complex SQL, like joins, et cetera? Can it query across DBMS applications? So yes, you can do regular joins. You can use those JSON operators to tease out a resolution field, for example, and if you wanted to join that to a separate lookup table that stored detailed information about what that resolution really means, you can absolutely do joins.

There is one thing I'll talk about while we're waiting for a few more questions. Anytime you scale out your database and partition it across multiple nodes, you're going to have the possibility of arbitrarily complex join queries that cross multiple of those nodes. That's part of the reason why document databases have this reputation for better scale-out performance, with the caveat that it meant modeling the data so that you were using that rich document structure and not having to do distributed joins across those nodes. In fact, the early iterations of the document databases didn't have the ability to do a join at all; that was something you would have to do in your application code. So it is possible, in both document databases and in MariaDB, to have a data structure that's not going to be very friendly to the kinds of queries your application wants to execute, and that's where, in both cases, you might choose to denormalize some of the data and have a richer, more document-like structure so that you don't have to do joins that cross multiple nodes in the database. I just wanted to point that out on the scalability front: it's really not different between relational databases and NoSQL databases; you have to be aware of distributed joins in both.

And Thomas, can you speak to encryption capabilities, encryption at rest and in transit? So, inside MariaDB it's very easy to set up both of those. Encryption in transit uses the standard SSL approach; it's just a matter of making the right settings inside your database. We can encrypt the traffic coming from the client into the database, and if you're running a distributed system with replication or the Spider storage engine, you can make sure that all of the data being shuttled back and forth there is encrypted. Likewise, the underlying storage engines have the option to turn on encryption, so that data at rest is protected as well.
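Circling back to the join question above, here is a hedged illustration using the hypothetical products table from earlier and an invented lookup table:

```sql
-- A lookup table holding detail about each resolution value.
CREATE TABLE resolutions (
  name        VARCHAR(10) PRIMARY KEY,   -- e.g. '1080p'
  description VARCHAR(200)
);

-- Join on a value teased out of the JSON column with JSON_VALUE:
SELECT p.name, r.description
FROM products p
JOIN resolutions r
  ON r.name = JSON_VALUE(p.attr, '$.video.resolution');
```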
So, on the slide showing data replication to a column store to create a read-only copy: how do you maintain the data transformations, and won't you have the same issue around metadata changes that impact the denormalized data sets on the read-only copy? Actually, can you repeat that question? Sure, yeah, there's a lot of info in it. On the slide showing data replication to a column store to create a read-only copy: how do you maintain the data transformations, to start there; and additionally, won't you have the same issue around metadata changes that impact the denormalized data sets on the read-only copy? So that's a very good question, and it is true that when you're running in that MariaDB platform setup, with the transactional side and the columnar side, then as you make modifications to your schema, there are some manual steps you have to take to keep the columnar side in sync as well. That's really the short answer; I think it's a very astute question.

And is there a happy medium between resolving a typical relational database problem, with many-to-many relationships, and utilizing a JSON array? For example, the Blu-ray example: in a typical relational database design you'd have (1) a table for Blu-rays, (2) a table for languages, (3) etc. So, many-to-many relationships are typically modeled in a document as an array of sub-documents, and to me it really depends on the access patterns. If you feel that, for example, you're not going to have a lot of updates to the non-transactional side of the many-to-many, the lookup side of the relationship, then maybe it is okay to store it inside the main transactional document, because I'm not going to have those expensive updates. Likewise, if it's not important for me to query across just that lookup table, if analyzing the lookup table on its own is relatively meaningless, then I'm fine embedding it inside my document, whether it's a document database or a relational database. If the answers to those questions are the opposite, then maybe I do want to model it as a traditional many-to-many with an intersection table in between, which gives me that nice flexibility: I only have to do one update when I need to change the lookup information, and if I want to analyze the lookup information I don't have to involve the transactional information. That said, it means that anytime I want to do a query that pulls more of that information in, it has to navigate the intersection table, which becomes more expensive. There is a halfway point, where sometimes you'll see just a few pieces of the lookup table pulled in as well: if I know that very often I'll be querying not just the resolution name but something else from the lookup (I don't know what that would be, maybe the set of TVs it plays best on, or something like that), you might choose to denormalize just those few fields into your main transactional table.

You mentioned something about two-phase commit when using the Spider storage engine; can you elaborate on what that was for? Sure. The promise of the Spider storage engine is that it doesn't change the way your application has to interact with the data, and that includes when your transaction is doing a multi-document operation where the documents are spread out across multiple of the underlying nodes.
Maybe I'm doing a mass update to all of the sales orders for a particular sales rep, and those orders are spread across many of the nodes. If that update, that commit, fails halfway through, then the changes that were in the process of being committed are automatically rolled back by the system. By contrast, if you don't have a solution that supports transactions across your partitioned data, it becomes the application's responsibility to find out where in the process it failed and either pick up again from there and apply the rest of the changes, or undo the changes that had already been done. That already means your application, at least for a period of time, was operating in an inconsistent state, which is usually a very bad thing. And the logic of first detecting that there was a failure and then finding where it failed can be very difficult; in particular, if it's an update transaction, it can be extremely hard for the application itself to unroll where that failure happened. If it's an insert-only workload, it's maybe a little more reasonable to expect the application to be able to undo a failure like that.

So, is Spider part of the MariaDB default installation? It is; I believe it's shipped, but it has to be separately enabled, so there's a setup call to enable it. Then, when you declare your table, instead of declaring it with the InnoDB storage engine, the default, you declare it as a Spider table. There's obviously more setup that happens: you have to declare the underlying nodes and relate them. So there are some setup steps to enable it.

All right, and what modeling tools have capabilities for, or work with, structured and unstructured JSON columns? So, that's an area that is not well developed at this point. There are a ton of modeling tools that handle standard relational modeling, but I should say I'm not really that familiar with the current state of those tools, so I'm not able to answer that question in great detail. Sure, fair enough.

Well, Thomas, this has been a great presentation and great Q&A. Thank you so much for joining us today, but I'm afraid that is all the time we have for today's webinar. Just a reminder: I will send a follow-up email to everybody by end of day Monday with links to the slides and links to the recording of this session. Again, thank you so much, thanks to MariaDB for sponsoring today, and Thomas, thanks for the presentation. Hope everybody has a great day. Thanks, all.

Thanks, Shannon, I hope that went well. Went great, thanks, Thomas. Have a good one. Yep, thanks. Bye.