 Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of Data Diversity. We'd like to thank you for joining this Data Diversity webinar based on data modeling and document database sponsored today by Couchface. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them by the Q&A in the bottom right hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag dataversity. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To access the Q&A or the chat panels, you can find those icons in the bottom middle of your screen for those features. And as always, we will send a follow-up email within two business days, containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me introduce to you our speaker for today, Matthew Groves. Matthew is a guy who loves to code. It doesn't matter if it's C-sharp, jQuery, or PHP, he'll submit requests for anything. He has been coding professionally ever since. He wrote a quick basic point of scale app for his parents' pizza shop back in the 90s. He currently works as the product marketing manager for Couchspace. And with that, I will give the floor to Matthew to get today's webinar started. Hello and welcome. Hi, thanks very much, Shannon. Can you hear me okay? Yeah, you sound great. All right, excellent. Well, welcome everybody. We're going to talk about some JSON data modeling today, because if you're like me, and I know I am, if you think data modeling and databases, you usually think of modeling it like this, like a relational table-based model. And if you want the benefits of a non-relational database and the benefits of a JSON database, you may have to start thinking a little differently about how you model data. Now, fortunately, Couchbase, today's sponsor, is going to give us some familiar tools of the relational world with the NoSQL JSON world. So we're going to go into that today. We're going to spend a little bit of time talking about why NoSQL is important, why people use NoSQL, why you might want to consider using NoSQL. And then we'll get into the JSON data modeling and how it's different than relational modeling. And along with that, along with just modeling, we have to think about how we're going to access our data, because that might also affect how we model data as well. And then we'll spend a little bit of time on migrating data. I have a short demo as well showing some data migration from relational world to Couchbase and use any time we have left to answer your questions or respond to your comments. So please leave those in the Q&A or the chat or Twitter or wherever you'd like to, and I'll take a look at those when I get to them. So let's talk about why NoSQL in the first place, why even look at a NoSQL database. So let's just talk about what NoSQL is first, because it's kind of a broad category. It doesn't really have much technical meaning to it, but generally it refers to databases that either lack SQL or don't use a relational model. Now, once the SQL language becomes optional, historically speaking, and transactions become optional, there's been a huge flurry of databases that were using distinct approaches for very common use cases. So, for instance, key value provides quick access to data for a given key. Wide column can store a large number of arbitrary columns in each row. Graph databases store data as relationships and relationships as first class concepts. Document databases, they aggregate data into a hierarchical structure. What JSON is a means to an end there. So document databases is what we're going to be focusing mainly on today. It's probably the closest to a all-purpose NoSQL database and the closest to a relational database. So it's a good starting point to talk about JSON modeling there. So when we look at document databases, they originally came from kind of the key value world, a minimal set of APIs to access data by keys, but then some indexing added onto that. If we're storing data as JSON in a regular format or XML, usually JSON, then we can start to index that data and query that data by a secondary attribute, not just the key, but what is actually in the data itself. So this gives us the buzzword NoSQL because there's NoSQL involved in operating with any of those. But as we're going to see, that's actually not really accurate anymore. SQL has come to the NoSQL world, making it even less useful of a technical term. So it basically means all data can be stored and retrieved differently than relational databases. So think of a document database as a very simple type of key value store where the value is in a known format. And generally speaking, you're going to code where you start with a key and you ask the database to return that document, that corresponds to that key. That's how key value database works. The difference here is that those brackets there that show doc one, doc two, doc n, those contain JSON and we can do some lookups based on the values of JSON as well. So with that in mind, a very quick overview of what NoSQL is. What actually are the reasons that people go to NoSQL? And the original main reason that NoSQL came about was for scalability. And what's going on is something called horizontal scaling. It's where we have a distributed database, distributed computing amongst multiple machines that communicate and coordinate over a network. And if we need more capacity, we add more nodes, more servers to that cluster. And we keep doing that as we need to add more capacity. So we're scaling out instead of scaling a single database up to a more expensive single still single node machine. And so this can provide a lot of benefits in terms of elastic scaling, responding to demand, cost effective scaling. So for example, if a popular TV show premieres on Sunday nights, then a media service like Sky TV, for instance, or Peacock or something may need, may have a lot more demand Sunday night than it does on say Tuesday afternoon. And so they need to scale up to meet that demand. But on Tuesday afternoon, they don't necessarily need all that capacity, so they can scale back down like so to save on costs. So that's one reason is scaling. Flexibility. And I'm kind of seeing a little bit of a sarcastic comment here. And this is kind of the reason that appeals to developers is a flexibility. It's the easier management of change in business requirements and easier management of change in the structure of the data. And this is a benefit if you're doing some sort of agile development. So you can just change things as you go. Sometimes it's a benefit if you're pulling data together from multiple sources. And that flexibility really helps data ingestion as well. Now a document database, generally speaking, you're storing data as JSON. There's a required key that has to be unique. Otherwise, there's no rigid schema involved with this data. Each piece of data is independent. So you can do whatever the heck you want with the data, right? But that being said, you shouldn't. There still is a schema there, even though it's implied. So you should still have some discipline about your data. And we'll talk about that today. Availability is another important one. This is kind of related to scaling. If one machine goes down in a cluster of machines, then the system can stay online because other systems still available. This is a picture I took of my wife. Jesus, it's got to be six years ago at this point. Five, six years ago at this point, we were watching Wonder Woman. So you can you can date it to that. But we were at a movie theater and one of the machines was undergoing some sort of maintenance. It was broken or needed refilled or something like that. But because they had multiple machines there, a cluster of Coke machines in this case, we could still get our service before the movie. We didn't have to stand there and wait for the machine to get serviced because there's only one machine or potentially not get a drink, right? Not be able to make a purchase because the machine was down. And so this is a very nice thing to have for availability because it means less downtime if hardware fails, or if a cloud data center fails, something like that. And for maintenance and upgrading, we can upgrade or maintain or fix one machine without any downtime. The other machines can still service your customers. And of course, speed is one people talk about a lot. I'd like to be more specific when I talk about speed, we're talking about latency, we're talking about how much concurrency is involved. No SQL systems are optimized for a specific access pattern. We'll talk about that today. And Couchbase specifically has a built in cache and memory first architecture, which gives it a really, really good speed. So that's all I can talk about. We can certainly talk about benchmarks, things like that. But the architecture, integrated cache, these are the things that give it the speed. And that's what I want to focus on more is, okay, it's faster, why is it faster? And how can I take advantage of that? Use cases for NoSQL. I've been giving a version of this presentation. It's changed a lot over the years, but a version of this presentation for five plus years. And this list is something I keep adding to all the time because NoSQL keeps maturing and adding more features, adding more use cases. So I didn't start with finance inventory and healthcare on this list, but they're there now because NoSQL databases, especially Couchbase, are maturing and adding the features that these sectors, these use cases need. Now, using a NoSQL database does not mean you have to abandon relational databases. Most large companies, most large websites use a combination. And it's also worth pointing out that plenty of companies that are doing most of these things with relational, right? So relational usually is at least a mediocre choice for something. NoSQL may be better for some things, but usually the catalyst for NoSQL was one of those reasons I mentioned earlier. And just one of my favorite quotes here, just to bring home that point that different isn't always better, but better is always different. So it's definitely worth keeping an open mind about NoSQL and the benefits of it. So with that in mind, let's get into some modeling. That's what we're all here for, right? So, you know, some of the comments are talking about making up as you go. Planning is definitely still required, no matter if using NoSQL or relational, right? So let's look at, we're planning out a model of a customer on a white board, perhaps. We might do this as part of a proof of concept or discovery, requirements gathering, you know, some sort of planning meeting. And there are some things that we know about this customer as we've drawn them. This crude picture that I drew myself by hand, there's four attributes here I want to focus on. And one is that there is a rich structure to this customer, right? So we have, each customer has a number of attributes, like name or date of birth and so on, and potentially sub attributes. So name might be first name, last name, date of birth might be month, day, year, something like that. We have relationships in this data. You know, potentially to other data, like other customers in this case, we've got, Helen here has connections to three other customers in the system. Maybe they're sharing an account on, you know, a streaming service or Amazon or something like that. We have a value evolution that's possible, right? So we can start with Helen has one purchase of a laptop and she, as she makes another purchase, we can evolve those values to contain multiple purchases. I call that value evolution. And the last one is structure evolution. This one's a little trickier to do with relational databases. So perhaps we start with Helen having a credit card number as one of her attributes, right? But a requirement comes down that, hey, we need multiple payment options for each customer, multiple billing options. Well, if we made the mistake of, you know, creating a column in the customer table, for instance, of credit card number, then we have to go back and refactor our data to accommodate that and change our schema and update the code that deals with that. So this is an example of, I showed you this earlier, this example of how we might model that customer, right? So we start out with a rich structure, right? So we've got attributes inside the customer table. We've got relationships to other data, you know, via contacts, via purchases, via connections. We've got a value evolution so we can add additional rows to any of these tables and evolve the value. The structure evolution is more difficult. I want you to imagine if that billing table wasn't here and we put a credit card number column inside of customer. If we want to evolve that structure to accept multiple credit cards, multiple billing options, then we have to remove that column from customer, add a billing table, add a foreign key constraint, and, you know, make all those changes to the schema. And that can be very time consuming, especially if these are large tables that can involve some downtime with our relational databases. So that's a very difficult part of evolving structure in the relational world. And that's why I think developers really value the flexibility of a document database. So let's go through a little more concrete example. We're going to have a relational here on the left and we're going to have what it might look like in a document database on the right. So this is a pretty straightforward translation here, if you will. We've got a table with a primary key and the relational database, we have a document as a document key. And those can match up exactly. We can go from the column name to a key value pair in JSON. And again, it matches up pretty straightforward there. So that is a very easy translation going from flat data to flat data, not a big deal at all. Let's start making it a little more complex though. We've got now a purchases table. So this customer has one or more purchases, right? So we've got a separate table called purchases that has a foreign key that points back to the customer. Over on the document, however, we still got just the one document. And I've got a purchases array now that contains those items. And so notice in those purchases, there is no foreign key because that is no longer foreign data. It's now domestic data. So we have two pieces of data in two tables versus one document with one JSON object in it. And if we want to evolve the value, add more purchases, then we have three rows of data in two tables versus our document is still just one piece of data in JSON over there. So you can see where that's going. Pretty straightforward to just embed those purchases into the customer document. If we want to get more complex and have a connection from one customer to another customer, well, we can't really embed that customer inside of another customer because then it would just be embedding forever. It's an N plus one type problem. So in the relational world, we typically have what's called a mapping table or many to many table, whatever you want to call it, that has a primary key that contains two foreign keys and then maybe some information about that relationship. We can do a similar thing in JSON world. We can add a connections array that points to another document by the key there. So in this case, our customer, Jane Smith is connected to the father via another document that the father has SKR 007. So again, we don't have to have both keys because it is domestic now, but we can still connect to those other documents and just look them up by their key. And so finally, we have a JSON document that represents a customer. Now, in a single JSON document, the relationship between the data is implicit by use of substructures and arrays of substructures, right? And this is what I was talking about earlier with evolving the data now. Notice we've got the card number there in the customer table. That's a rookie mistake, right? But it's something we can demonstrate here in a simple way. If we want to change that to support multiple methods of payment, we have to go and add a billing table and move that data from the customer table into the billing table. We'll probably have to change a lot of code as well that involves customer and billing information. But over in the JSON world, I've used billing as an array. And so it's embedded right there in the document. Now, one thing you might not notice because it's a little subtle is that on this slide, the key is CBL 2015. On this slide, the key is CBL 2016. Now, both those documents can exist in the same database. So we're not defining upfront that we have to have card number, expiry, and card type or billing. We can just change those as we need to. We have flexibility to change the way we read and write documents as we go. So that is something that coming from a relational background may bore you a little bit because you're thinking, well, now data is just going to be all over the place. It's going to have, I'm not going to be able to rely on how this data is going to be stored. And so that's a valid concern. There's certainly ways to deal with that. But just remember that this is not, this is a tradeoff involved here. We're gaining the flexibility in exchange for more responsibility of how we read and write our data. So I want to go quickly through three different versioning approaches because I get asked about this a lot. So I eventually added it into this slide deck, just some high level ideas about how you might approach this issue of your data changing going forward and how to be more robust about that. So one approach is just by using version numbers in your document. So for example, I've got version one on the left, it just has a name field. And I want to break those out in the first name and last name. So starting with version two, now we have first name and last name. And we can now look at that version number in our application code and say, well, if it's version one, we have to use this class and if it's version two, we have to use this class or we can build some logic into the classes. If our classes are a little more intelligent, we can put some logic in those classes. And they're not just straightforward DTOs anymore, they actually have some of this logic in them. And they'll figure it out as it goes. And name then becomes a computed field in our class, for instance, our Java class or C sharp class. Another approach is a big bang reversing. So we just go in and we do a big update statement on all the data to split it out. Now that's easier said than done for first name and last name, right? But it's not completely impossible or intractable. But other types of data, it's a little more straightforward. So we just go ahead and update document 12902. And we remove the name field and we add first name and last name fields with the appropriate values. Now, this can be a big update or series of batch updates. And this is something that you typically have to do in relational as well, right? Although this might require, again, might require downtime to make this kind of change, or you'll end up with kind of obsolete fields littered throughout your tables, which is not uncommon to see in the relational world. Another one is what I'm calling cooperative reversing. So let's say you have a web application that accesses a piece of data in the course of normal operations. So it may access a user profile. But while we're accessing that data, when we go to save it, go back to the database, let's change it to the new format. So as people log in and use the system, the data gets reversing to the new model gradually. And the data that's in the old format is basically data that's not even in use right now. So you can kind of take this approach with relational as well. It gets a little messy because you'll end up with a name and a first name and a last name column all in the same table, and a lot of null values, which I'm sure no one here has ever seen before, right? But in a document database, you don't need a bunch of those nulls hanging around. You can split name into first, last name, and then ditch name on a per document basis. So these are just three approaches. There's others that are more complex out there. You can use any of these three or a combination of them or whatever you think is the best approach to dealing with that. And I'd love to hear your comments if you've gone through this process before, how you dealt with this issue. And again, to go back into some of the discussion going on about just changing data willy-nilly and not planning and just going like it's a Wild West, you know, that may be an organizational or team issue. There are some tools out there that help your team get on the same page about modeling. So something like Hackalate supports Couchbase and many other NoSQL tools. And Irwin DM NoSQL, which also supports Couchbase. Even something as simple as like JSON editor online.org, which is a, it's a no frills, but totally free offering. This can help you with modeling, can help you with diffing. These help you maintain a little more robustness about how you're modeling your data. So it might be worthwhile if you're concerned about this to, you know, put a process in place, put some tools in place to help manage the structure of your data. Okay, so we want kind of a whiteboard exercise there. What I want to look at now is some other things that may affect how you model your data. And one of them is, you know, how do you plan to access your data? Because this can also affect the way you model your data. So with relational, you really only have the one way to access data. That's a SQL query. No matter what you're doing, it's ultimately going to be a SQL query. That's a reading and writing data. With a no SQL database, you have multiple ways to access data, including ironically, these days, SQL, you can use on no SQL database. So just to talk about key value, I mentioned this earlier on, if you know the key already, it's really simple and extremely fast to access that piece of data by its key directly. This is a C sharp example. You know, I'm not going to go into too much more detail of C sharp, but kind of get the idea here is that we get an ID as a parameter, we look up that ID in a collection and return that as a shopping cart class. We didn't use any SQL to do this. We're just going right through the, in this case, couch based SDKs to read data by its key. And the same thing down there at the second function, we're creating a shopping cart. And I'm just passing in a new shopping cart object that gets serialized to JSON ultimately from a C sharp object. And in this case, it's got a GUID as the ID. So I'm not using an insert there either. I'm using the key value API directly. So it's not being turned into an insert or being turned into a select. It's actually using the no SQL APIs directly. Some recommendations, if you are going the key value routes, since key value is so fast and easy, it's going to benefit you to use it as much as possible. And if you're going the key value routes, some of the things you can do is use a natural key. So I've showed GUID in the last slide, and you can use a GUID. It doesn't really have any meaning though. But if you can use a natural key, something that's not going to be changed very often in your system, an account number, for instance, a driver's license number and ISBN for a book, once it's assigned, it doesn't change very often. And make them human readable. So a GUID is not human readable at all. We can't tell anything about it from that GUID. So maybe user 123 or something like that tells us that it's a user document in there. And if you construct your keys in a way like this that they're human readable and have natural keys, we can make them deterministic. And I'll show you what I mean by that in the next slide, how this can benefit you. And ultimately, all these things are going to give your keys some meaning. They're going to make them semantic. And what I'm not showing yet is that no SQL databases like Couchbase, we'll show you later, have concepts of scopes and collections that allow you to further organize your data. So we'll talk about that. So let's go ahead and get to the demo. So here's a quick example if you're using key value of how deterministic keys are going to be helpful. I'm not going to read this all to you. Maybe we want to take a screenshot and look at it later. But basically, we're walking our way through a graph of different pieces of data without using any sort of joins, any sort of SQL queries, we're just going straight through key value lookups. So we have a username given to us, we can look up that user. And from there, we can go to that user's blogs. And from there, we can go to the blog itself. And from there, we can go to the comments of the blog. And we can do this with just this particular path here has just four lookups. And four operations may seem like a lot from the relational world. We don't want to execute four different select queries. But the key value lookup is so much faster that chances are it's going to be a lot quicker to go through this than a single type of select or join overhead. Some other things to think about when you're modeling your data. And these are just some rules of thumb, some strategies that I would recommend you go this way. So relationship is, let's say it's one to one, or it's one to many, then it's going to benefit you to store that related data as nested objects. Right. So just like we had with the customer earlier with the purchases array. If your data, however, is many to one or many to many, then it makes sense to store them as separate documents. As we showed with connections earlier, we have Jane Smith connected to other customers. We don't want to nest those because it's a many to many to one or many to many. And if we're doing a read, that's mostly a parent fields, for instance, then we can store those children as separate documents. We don't need to pull all that data back in one operation. We can store them separately. And just kind of see where this is going. If we're doing a parent and child read together, then it might make sense to store them as nested objects. So a shopping cart, for instance, might make sense to store all those items as nested objects. We're often pulling back the entire shopping cart in an operation. If we're doing writes, and again, hopefully you see where this is going, if we're doing writes to mostly parent or child, then we can store them separate documents because we want to change them on their own individually without having to modify the whole document. And if we're doing a write to a parent and child both together, then again, it makes sense to store those as nested objects there as one document instead of multiple. So I just went through these six really quickly here. These are, again, not hard and fast rules, but they're a good place to start. So it's a good rule of thumb to think about how am I going to read and write my data? And we'll start with these rules and adjust them as we need to. And one other thing you think about is does your database have a sub document access? So this can vary from database to database. So one thing to consider is, is there a sub document API or whatever it's called available? And if you do have that, then you can even more flexibility with how you model your data. The idea is this, if I just need to update an address of a person, let's say the person document is very large, we don't want to necessarily transport all across the wire just to make one change like the address. So if we can just use a sub document API and specify just the address is what I want to update. I want to read just the address or I want to write just the address. Then we can go ahead and use a sub document API to just work on that part. So this is helpful for very large documents. If you only need a small portion of the data. So keep that in mind. If your database has that, that's another way you can, another thing to take in mind when you're modeling your data. So I've been mostly talking about key value access. Most no SQL databases will have at least one other way to access data besides key value. And what does that have to do with modeling? Well, because modeling doesn't exist in a vacuum. You have to think about how you're going to interact with your data. So I'm going to show you some examples from Couchbase. In Couchbase we have what's called SQL plus plus. That's an ANSI SQL implementation for JSON data. And I'm also going to briefly show full tech search, FTS today. There's other options. I'm not going to cover like analytics and mobile. I'm not going to have time to get into those, but notice they all operate on the same documents, the same set of data. So we're not copying data to different services. It's all in the same core data. And these services are all operating on that. So key value, as you can see, goes directly to the documents versus a SQL query has to go through, usually has to go through an index service and full tech search has to go through an index service and so on. So there's some overhead involved with those. SQL plus plus is a powerful, flexible, declarative nature. It's familiar to developers. You have joins available there as well. But notice once we step out of key value access, we need to involve other processes. We've got to parse that query. We've got to use an index service most likely. And in the end, what it's ultimately going to do, in the case of Couchbase, is going to, again, behind the scenes, it's going to use that key value lookup to retrieve the data. So there is overhead involved, but oftentimes this is a necessity. We can't look up data just by its key for our use case. So as an example here, if we wanted to find all users that have a Visa or Mastercard, that's not going to be stored in the key of the document. So we have to look at those secondary attributes. We have to look inside the data itself, index, in this case, the type, which is Visa or Mastercard in this query. And so of course, just like in relational, if you're in SQL, you have to understand your query plan. This is true for any database, right, including no SQL. So this is a Couchbase SQL plus plus query. I execute this and it ran in, I think, 1.2 seconds, I ran that locally. It's using an index on the, on the name field, I think, or no, maybe it's using a primary index in this case. And I can bring up a visualization of that query plan to see which parts are taking up the most time, right? So the green ones are relatively fast as they approach orange and dark orange and red, they're the ones that take up most of the percentage of the execution plan. So we can see here that the fetch here is taking a lot of the time, taking 79.1% of that computation time. Now in Couchbase, there's something called an index advisor. This is something you're familiar with in the, probably from Relational World as well. But I can click on this button and it'll select or suggest an index for me to add to improve the performance of that query. All right? So this is saying I should index the name field there that might give us some better performance. It's not going perfect, but we can, we can look into that. A covering index might make this even faster, right? So notice we're doing select star up there in line one, probably select star unless you absolutely need everything should probably avoid that and just select the fields that you want. And if we only need two or three fields from this query, we can create a covering index. So we don't even have to go, we can skip that fetch step completely, get the data right from the index, right? Just cross that right out. So that would take out our slowest part of the query to do that. Again, assuming you, you only need a few, you know, a handful of those, right? If you actually do need a dot star, well, then we have to look into other index strategies. Okay. So that's a standard query, SQL query. Another thing we can do is a full text search. All right? So this is like a Google search for your data where we're searching for text, you know, language aware queries, we might want to search and rank them in results in order of their, you know, how relevant they are, relevancy scores, right? So just an example here, I've got a full text search, I've created an index already, and I'm searching for the keyword submarine. We might get back as some results. We're searching landmarks in this case, and we can see that it's highlighting the word submarine, and it's ranked that first landmark there as the highest, you know, probably because it has submarine in there twice. And there's all kinds of interesting syntax things we can do. This is a very simple example. We could do stemming and it's, you know, language aware. So this is English, we can change it to other languages, pluralizations, and all kinds of other filters. We can even use full text search for geospatial queries to search by, you know, points on a map, that sort of thing. So just sum up here. You know, this is something you'd want to go through as you're building out your proof of concept to see if no SQL is the right fit. So, you know, we're looking at these things are all going to affect modeling, right? I would say go key value operations if you can. If you're doing a text based search, go to full text search. And if you're doing anything beyond that, go down to, well, used to be called a nickel, but SQL plus plus is what we're calling it now. And you can, those are different strategies you can use depending on how you want to access your data. Okay. So let's talk a little about migrating data here. And this is a large, a large area of questions I usually get. And there's a spectrum of approaches. These are some of the typical options for migration, and I've mapped these out from risk and effort. The highest risk and highest effort are the ones on top and the lowest risk, lowest effort is one on the bottom. And of course, the trade-off there is that I'm not going to necessarily get all the benefits of no SQL if I start down at the lower end of this spectrum five and four, but it might be where I want to start just to prove a concept or to get comfortable with that. And again, I'm using the term migration here, but it might not be the case that you're abandoning a database in favor of another one. You might want to sync data between them. You might want to make a copy into a more suitable database for your use case, offload from a mainframe, that sort of thing. There are some tools out there in place that allow you to do this and help you go through this process. So these are a few. Glue Sync is one that's very interesting. It's a real-time synchronization tool that goes between, let's say, I think popular databases like Oracle and SQL Server over to Couchbase. NiFi is a very cool data flow tool that you can use to sync data or migrate data as well. Some of these other ones, Informatica, Talenz, you may already be familiar with these, and you can use those to help migrate your data over. You can also build your own. This will give you some more flexibility as you're doing some migration of data. So it may be something as simple as a Python script. Couchbase has an import tool called CB Import. SQL Server has integration services, SSIS, Oracle has Golden Gate, and so on. And so these are all tools you can use to help build your migration to NoSQL. So some migration options, if you're, again, just doing a proof of concept or just trying to get started, I would suggest keeping it simple. This is that level five approach I mentioned with an i towards the level four approach. So you can start, an approaching start with is just to treat the NoSQL database as a relational database. This is what I'm going to show you in the demo here. So just don't do any modeling up front. Just import that data and query it in the same way with lots of joins, lots of pieces of data everywhere. And once you've got that in place, you can start to transform that data using the modeling techniques we discussed to improve performance, reduce the need for joins, and possibly reduce the need for querying, switch over to key value in some cases. So maybe you just do this one entity, one aggregate route at a time. Don't start with a user model or start with a user model if that's where your performance is the slowest you want to start there. And we'll think about transforming that data in the future. So that might be the next step is to actually go the levels four, three or two anywhere up the chain possibly. Again, we're doing a proof of concept here. So we export from relational to CSV. You can just keep this really simple. We import that raw data into a staging bucket, say in couch base. And then we can transform the data via SQL queries or whatever code we want to run into our optimized NoSQL bucket. So those are the, that's the really simple approach we can take, and especially for a proof of concept, that's maybe where you want to go. It's going to help you adjust that model. Ultimately, I think the key thing I want to drive home here is that we need to align your data model, your migration approach, and your expectations all together. So if you don't model your data to take advantage of JSON and NoSQL, then you're probably not going to realize that benefit right away. Nothing magical about putting data in NoSQL is going to make it automatically faster. We still have to think about how we want to model that data. So I want to take you into a demo here. This will be, we're just going to keep this one relatively simple here. I'm going to start with a SQL server database here that I've got running on my machine locally. And I'm using the Adventure Works sample data set here. And this is, if you're familiar with SQL server, you've heard of Adventure Works before. It's a very, it's a sample data set. It's got lots of SQL server features in it. So schemas and tables in there. So for instance, we've got human resources schema, a person schema. Inside the person schema, we've got address, address type, and so on. So if we go ahead and query the address table here, you can see we've got just a result of address data. So address, city, state, postal code, spatial location, and so on. That's all in relational data there. Now, I'm not going to go through the whole process of actually making you wait on this data to move to over to couch base. I did that ahead of time. And I've got a couch base cluster running here in couch base Capella. This is a database as a service, say a cloud based version of couch base. I've got a, you can get a free trial. I'll show you how to do that here later on. But I've got in here, I've got a cluster already created and an Adventure Works bucket. And so you can see this bucket corresponds to the database here in SQL server. All right. So if I go into that, you can see I've got scopes here. So I've got inside couch base, I've got these scopes, and those all correspond to the schemas. So human resources, person production, and so on. See, we've got those all modeled out here. And if we go into, you can see this 13 collections here in person, we'll go into that. And each collection roughly corresponds to a table. So we've got person address, person address type. So we've got address and address type here as collections. So that's roughly the way you map from relational directly to couch base. You can map bucket to database, you can map scope to schema, map collection to table. And then inside of these collections, we've got documents. And these roughly correspond the rows of data in SQL server to a document in couch base. So if I click on one of these, you'll see that there's a JSON data right there in couch base. All right. And so that corresponds pretty much directly to the data here in SQL server. Now notice facial location in SQL server is this kind of binary thing that's harder to read. And utility has changed that into a JSON object that contains latitude and longitude as you can see directly there. All right. So now let's go over to a query here. So I showed you, let's see, I showed you the address table, address data move directly. I want to show you a join query in action here. So I'm joining in this case, person to email address. Let's go back to SQL server just briefly here and run these three queries. We've got person, person phone, and person email address. So in relational, we're going to model those as three different tables, right? So person, if you want to get the phone number, you got to join the phone number to that person, because we want to have in this case, multiple phone numbers for the person. Same with email address, could have multiple email addresses for the person. And so we have three different tables and we join those together typically. And those generally go through in an application to go through an ORM object relational mapper to handle that impedance mismatch. So over here in couch base, the data is being stored exactly the same, right? So we still have a person collection, email address collection, right? So if we still want to do a join between that data, we can do that in couch base. We can execute and we get the results. It is JSON data still, but we could view that as a table. The same kind of results we'd see over in SQL server. So it's still relational style data with a SQL query with a join here in couch base. Now, what we can do is I did mention it was SQL plus plus. So it's more than just SQL. We have this other keyword here called nest. And this is going to take the results of the join and nest them in an array within the first result set, right? So it'll look a little different. The result here is going to be the first name, last name and an ID, an array of email addresses for Ken Sanchez. And if Ken had more than one email address, it'd be multiple email addresses here that are embedded in the JSON. Now, this is a query that's still doing a join to get that result, right? What we might actually want to consider is do we take this email address and actually put it inside the person documents and just eliminate that email address's collection completely, right? Something we couldn't do in the relational world. And I've actually already done that as well ahead of time just again to save some time here. So we'll go to collections and person. And we go over to this person collection. And in there we've got, this is I think Ken Sanchez's document, if I'm not mistaken. So I've already, through my utility, I've automatically used those foreign keys to map the joins there. So it embeds the email addresses into an array. It embeds the phone numbers into an array and stores them inside this person document. Right? So Ken Sanchez, I can just do one get now to get this complete document as opposed to, you know, the two or more joins that I need to create this document to get the same results in my application. Okay? So that is kind of the decision you have to make. That is really the ultimate core of modeling in JSON data. It's this decision, do I embed or do I keep my data separate and relational style? That really is the key tradeoff there and the key decision in modeling relational data. One other thing I should mention is, I did mention joins. I think I already mentioned asset as well. This is an important part is if I have two, if I'm still modeling my data in relational style where it's in multiple pieces, an asset transaction is still going to be important if I want to update those all together. So that is also available here in Couch Basin. It's very common in NoSQL today to have an asset transaction. All right. So let's head on back over to the slides and wrap up here. So just a few points I want to drive home and then we can get to your questions is, I do want to say pick the right application and those use cases are definitely growing. But, you know, make sure it's, you know, the reason you're getting to NoSQL is something that NoSQL is designed for, right? Is it for scaling? Is it for flexibility? Are there certain performance requirements you need to meet, right? And if you have an architecture that's SOA or microservice or some sort of use case specific application, you don't have to rewrite your entire app, right? If you're a monolith, it's going to be more challenging. But you can slice off one of those microservices and say, well, this is going to be served better by NoSQL database. And let's try Couch Basin. Let's try to prove a concept with Couch Basin. That's the second thing is try to prove a concept. This is where you can really get into, you know, how you want to model your data. Do we want to embed? Do we want to keep the data separately? And have a focus on your proof of concept. Just, you know, have some success criteria. What are we trying to accomplish? Let's work through it with a proof of concept. Let's review the architecture. Let's consider some of those modeling tools like Hackalade that I showed earlier, Erwin. And you can just collaborate on that in a rigorous way and say, is this meeting our requirements or not? We don't have to go all in. We can just try a proof of concept. And then one of the things I definitely want to say is a key point of success with modeling data is matching the data access method you're using to the requirements, right? In the relational world, we use SQL for everything. In those SQL, we have choices. So we don't have to use SQL. We don't have to incur the overhead of SQL and indexes for every single time we interact with data, right? So key value is the fastest way, not as flexible, but it's the fastest way. Let's get as much in the key value as we can. Full text search for our text and geographic searches. And then finally, we can settle on querying because it's flexible and it's familiar. So we can always fall back to that. Next steps for you. What I recommend is you check out the couch-based playground. It's completely free, interactive, online experience, no download required, right in your browser, couch-based.live to start running some code samples right now. If you want to try couch-based Capella, 30-day free trials going on, you can go sign up for that. No credit card required. It's what I showed today. Everything I showed you in that demo there was done in couch-based Capella. So you can all try that there. If you're interested in the SQL server to couch-based, it's a totally open-source tool. It's my own kind of personal project because I come from a SQL server background. So if you are interested in that, feel free to drop by, leave a criticism, leave a suggestion. That's where I can really respond and improve the tool is with that kind of interaction. So that's it I have for today. I think there's a lot of activity that scrolled by in the chat there. Happy to answer any of those questions. Shannon, have any of you picked out? You want to go over? Yeah, absolutely. Matthew, thank you so much for this great presentation. If you have questions you'd like me to ask Matthew, it is helpful if you get your questions into the Q&A portion of your screen so I can queue those up there. And just to answer the most commonly asked questions, just a reminder, I will send a follow-up email by end of day Thursday for this webinar with links to the slides and links to the recording along with anything else here. And again, so many, as you mentioned, so many great questions coming through in the chat here. Is there any limitation for document size? What about blob data types? Okay, so for couch base specifically, document databases in general definitely have a size limit. Couch basis size limit per document is 20 megabytes. As far as blob storage, it's not really meant for that. It's not that kind of database. If you're interested in storing large files, large binaries, couch base can store binaries, right? But again, there's a 20 megabyte limit. So you probably want to think about something like your Azure blob storage or your AWS S3 kind of thing for that. Awesome. Again, trying to get through these. Okay, we got some great questions coming in here. Could you please expand on the deterministic data access? Yes. So that was all about having a strategy with a key value. So let me just go back to that. Maybe I can walk through it here. This is something that is, if you want to go, if you want to take advantage of key value as much as possible, this is one way you can do it. If this is not important to you, if you think SQL is going to be the way you want to go, this isn't going to matter as much. You can just use GUIDs for the keys. But this is just a way of constructing keys in such a way as going to allow you to stay in key value as much as possible. So this is just an example. If I log into a site with my, I type in my username, then I can go look up that document in the author collection by the username. And from there, I have the username token now. And I can look up another document. If that user wants to click on view my blogs, then I know the username already. So I can just look for a document called author colon colon mat colon colon blogs. And that will give me a document that contains a list of my blogs perhaps and so on and so on. So it's deterministic in the way that I can create these keys the same way every time. I don't have to go through a index process through SQL query to get to a list of blogs for instance. So that's kind of what I was going with there just as an example. And this is something that is not couch-based specific. This strategy can apply to any key-based, key-value document or document database. Awesome. So just defining the carnally in the SQL really matter and is it not easy to determine visually in the SQL as an ERD? Well, so to answer the second question, visual ERD, I did mention a couple of tools. And I used to demo these, but they weren't, most people weren't really interested in them. So I kind of skipped over this, but a hack-a-laid demo right here. Hack-a-laid is what I used to demo because it is a very visual tool that shows those type of diagrams in a no SQL environment that shows the relationships between the different types of documents. So I definitely encourage you to check that out. A hack-a-laid and Erwin both do that kind of visual representation of the traditional ERD sort of thing. As far as cardinality, I think it does really matter. I guess it depends on your use case, but if we're talking an unbounded type of many or one-to-many type of relationship, then that's something you don't want to necessarily embed. I went through that here like a one-to-many, right? So if there's a lot of purchases, it's probably something we, if it's going to be unbounded, we probably don't want to embed that because we're going to run into that 20 megabyte limit, right? So if that's what you're getting out with cardinality, if you have something else you want to specifically talk about, I'm happy to do that. Awesome. Also, can you please elaborate on semantic with no SQL database? So I'm seeing that question. I'm seeing a capital S there. So there may be something called semantic that I'm not familiar with, but I did talk about, yeah, key value recommendations. And semantic is just a fancy word that says it has meaning to it, right? As opposed to a GUID, which is just kind of randomly assigned or some sort of ID number, which is arbitrarily assigned, having a semantic key is going to give you some of these benefits where it's human readable, where it's, you know, in some cases, a natural key or you can use it to build deterministic key value access. That's all I meant by that. Unless you mean some specific semantic product, which I don't know about, that's what I meant with semantic. I love it. Well, you let the questioner add any additional comments if they would like to expand on that. So, Matthew, what types of joins does Couchbase support? So Couchbase supports all the typical joins that you'd used to from relational world, inner join, outer join. I showed you Nest, which is an interesting type of join that's available to JSON data. So inner join, outer, left outer join, you know, I think that about covers it. I don't think we support right technically, which is just the opposite of left. Yeah, I don't I'm just there's also, you know, multi, you can join on multiple fields, right, just like you couldn't relational, you can join on the keys, you can join on the combination of those things. Yeah, someone else just followed up sub query, table expression supported. Yes, you can absolutely do sub queries, you can do common table expressions, the width syntax is there, for instance. Very cool. So I think we have time for a couple more questions here. So do we have an option to restrict data access? Restrict data access. Okay, so as far as authorization, yes, you can create users that have permissions to, you know, just certain buckets, just certain scopes, just certain collections. Absolutely, that's something that's built into Couchbase. And you can restrict, you know, do I want read access, write access, do I query access, all those sorts of, you know, full text search access, you can restrict all those things individually. And has Couchbase, have you been using it in the healthcare medical patient database, or does the 20 megabyte limit become a problem for certain imaging types? Well, so I know Couchbase has been used in the healthcare industry, we have several customers. And I believe if you're familiar with the FHIR data format, FHIR, which is a standard JSON format for the healthcare industry, we, you know, we're a natural fit for that because it's a JSON database. Now, as far as imaging docs, the 20 megabyte limit would be a problem, you know, if you wanted to store, you know, large binaries in Couchbase, for instance, which again, I would recommend against using Couchbase for that. And I would say Couchbase would be good for metadata, right? And actual text, you know, data that you can put in JSON format, be very useful for that. But the actual images and videos and scans and things like that, I would, I wouldn't necessarily want to use those for Couchbase. Not saying our customers don't, don't store that type of stuff in Couchbase, but it's just really, I don't think it's the best use case for them. I love it. So I'm going to slip in one last question here. What, or how do you split the data across nodes? Do you use partitioning to define which data is stored in each node? So that's a really excellent question. And thank you for asking it. Couchbase is kind of unique in those SQL databases in that we do use sharding, which is another word for partitioning. However, the sharding is completely automatic. So as a developer, as a DBA, you don't have to go in and define those shards, those partitions and manage those and rebalance those yourself. Couchbase will actually use a hashing algorithm based on the document key that will then automatically assign it to one of the shards. And those shards, it's a fixed number of shards that's spread across all the data nodes. So it's completely automatic and not something you have to deal with as a developer. And that also means that every data node in Couchbase can handle both reads and writes. So it's not a primary, secondary type of architecture. It is a shared nothing architecture or master list architecture, just relatively unique in the database world. That is very cool. I think that's the first time we've been asked that in since you've been doing these webinars with us. It's very cool. And actually, there's time for one more question, because I got to get them all in Matthew, you know. Data masking and internal encryption, data at rest capabilities. Okay, yes. So with Couchbase Capella, this is a fully managed, a cloud based, you know, database as a service. And so this takes advantage of whatever data at rest capabilities that the cloud provides, right? Whatever cloud operator is on provides, right? So I'm on AWS. So I'm not an AWS guy. I just, I don't know what it's called, but there is a data at rest service that uses to store that rest. Yes, data masking. If you want to get more granular, the Couchbase SDKs support something called field level encryption. So if you're storing, you know, say 20 fields of JSON and you want to encrypt one of them, you can also do that. And that's another form of data at rest encryption of a specific field, right? So even if that, all that data gets somehow elite as the raw JSON, it's still going to be that field is going to be encrypted. Yes. And of course, data in motion will be encrypted as well TLS between the nodes, to and from Couchbase cluster itself. Yeah. Perfect. Well, Matthew, thank you so much for another great presentation with us. And thanks to our community for being so engaged in everything we do. I just love it. So just again, a reminder, I will send a follow up email by end of day, Thursday for this webinar with links to the slides and links to the recording. Matthew, thank you so much. And thanks to Couchbase for sponsoring another great webinar. Hope you all have a great day. Thanks, Shannon. Thanks.