Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DataVersity. We'd like to thank you for joining the DataVersity webinar today, "Data Modeling in Relational and NoSQL," sponsored by Couchbase. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them through the Q&A section, or if you'd like to tweet, we encourage you to share your questions on Twitter using hashtag DataVersity. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To find the Q&A and chat panels, click the icons at the bottom of your screen to activate those features. Just note that Zoom defaults the chat section to send to just the panelists, but you may absolutely change that to chat with everyone. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me introduce our speaker for today, Matthew Groves. Matthew is a Product Marketing Manager at Couchbase and a guy who loves to code. He's been coding professionally ever since he wrote a QuickBASIC point-of-sale app for his parents' pizza shop back in the 90s. He's the author of "AOP in .NET," published by Manning, a Pluralsight author, and a Microsoft MVP. And with that, I will turn the floor over to Matthew to get the webinar started.

Thank you very much, Shannon, and welcome, everybody; thank you for attending. Sorry, I started coughing right as I started, but everything's all right. We're going to talk about JSON data modeling here.
We're going to focus a lot on moving from a relational data model over to the more flexible JSON data model you might see in a document database like Couchbase, the company I work for. The relational view is probably the one most developers are familiar with when it comes to modeling data. But if you're interested in some of the non-relational databases out there, such as Couchbase, you have to start thinking a little differently about modeling. Fortunately, Couchbase gives us a lot of familiar tools and concepts that map directly from the relational world into Couchbase. Ultimately, we're going to want to take advantage of the unique capabilities of JSON, NoSQL, and Couchbase and do some refactoring. So that's the journey we're going on today. I'm going to spend a little bit of time talking about why NoSQL is important and why people are using it. We're going to talk about JSON data modeling in general, and then we're going to look at the things that affect the way we model our data: one is how we access data, and another is how, if we're going to move data from another source like a relational database, we might go about migrating that data. I've also got a live demo for you today that shows some of these tools and concepts in practice. But why NoSQL? We've been using relational databases for a long time, and SQL is a really great language for dealing with data. But there are some things relational doesn't handle that well. It doesn't handle change well: changes to the schema, for instance, both logical and physical, changes of hardware, changes of capacity. NoSQL is designed to help solve those problems: the problems of agility and scalability, performance and availability. I'm seeing some great jokes here in the chat. Three relational databases walk into a NoSQL bar.
They left after five minutes because they couldn't find a table. That's great. So yes, there are no tables in a non-relational database, but as you're going to see, tables map very neatly to some concepts that are new in Couchbase 7. Let's first talk about what NoSQL is. NoSQL is such a broad umbrella term that I really don't like using it; I'd rather use something a little more specific. Even the term itself, NoSQL, meaning databases that lack SQL, isn't really true anymore, as we're going to see today. It's a broad landscape with different types of NoSQL out there, and most of the popular NoSQL databases usually take a multi-model approach, supporting more than one of these models. We're going to focus on document today, but as you'll see, Couchbase can also support a relational-style model, a text search model, analytics, and all kinds of other things. So it's really not as simple and straightforward as it used to be. As a bit of history, NoSQL was called that because those databases didn't use SQL to access your data: no SQL. But of course that buzzword has lost a lot of its meaning and changed a lot. Let's think about document databases at the very simplest level, as a specialized key-value store. In fact, you can treat most document databases as key-value stores as well; it's just that the value is in a known format, typically JSON. So you're going to write some code: you start with a known key, you ask the database to give you the document for that key, and you go in reverse when you're updating and deleting and so on. That's the really basic intro. There's a lot more built on top of these simple key-value stores that makes document databases much more valuable as general-purpose databases, and we'll talk about that as we go along.
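To make that idea concrete, here's a minimal sketch in Python (the webinar's own examples are in C#) of treating a document database as a key-value store whose values are JSON. The `DocumentStore` class and the `customer::123` key are purely illustrative stand-ins, not a real database client.

```python
import json

class DocumentStore:
    """A toy document store: a key-value store whose values are JSON."""

    def __init__(self):
        self._data = {}  # key -> JSON string

    def upsert(self, key, document):
        # The value is always stored in a known format: JSON.
        self._data[key] = json.dumps(document)

    def get(self, key):
        # Start with a known key, ask for the document behind it.
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

    def remove(self, key):
        # And "go in reverse" for deletes: same key, opposite direction.
        self._data.pop(key, None)

store = DocumentStore()
store.upsert("customer::123", {"name": "Jane Smith", "city": "Columbus"})
doc = store.get("customer::123")
```

Everything else a document database offers (queries, indexes, search) is layered on top of this basic get-by-key, write-by-key contract.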
One of the reasons NoSQL has taken off is scalability. By scalability I mean the ability to increase the resources of your database to deal with more web traffic, more operations, more reads and writes, larger amounts of data, and so on. One thing NoSQL does well is called horizontal scaling: instead of getting a bigger machine with more processors, we just add additional machines to what's called a cluster. As we need more capacity, we add more nodes to the cluster, and we can keep doing that as long as we need to. We can use relatively inexpensive commodity servers to keep adding to this cluster (commodity was the word I was looking for; thank you, great minds think alike), and we can scale down as well if we don't need as much capacity at certain times of the year or certain times of the day. Another reason is flexibility. This allows you to more easily manage change in business requirements and change in the structure of data, which can be very useful in lots of situations. When you're pulling data together, integrating data from different sources, that flexibility really helps: you don't have to worry that if someone makes a small change, it's going to blow up the schema and blow up the process. A document database means you don't have a rigid schema that the database forces you to follow, so the developer gets more responsibility in that situation. That being said, what we're seeing on the screen here is not an ideal situation: we have documents where some have a single name field and some have first name and last name fields.
So we do need discipline about our data, and a plan to either handle that situation or make the data more uniform. Availability is another one. This is a picture I took of my wife at a movie theater many years ago, and I told her I was going to put her in my slide decks because she's demonstrating availability so well. The idea is that if you have a cluster of databases, as we do, and one of those machines goes down, the others will still be in place to take over the workload. You can see in the movie theater here, one of the soda machines is down for maintenance of some sort, but we're still able to get soda from the other machines available in the lobby. That's what we're talking about when it comes to availability, and there are different levels of it: if a whole data center goes down, we can still be available in other data centers; if the internet goes down, we can be available on mobile devices; and so on. Performance is another one; latency and concurrency are typically what's talked about here. NoSQL can be optimized for certain access patterns. Couchbase specifically has a memory-first architecture that makes it very, very fast with those key-value operations. I'm not going to go into a lot of benchmarks today; we have several on the Couchbase website if you want to check those out against competitors, competitors in the cloud, and so on. And use cases for NoSQL are increasing all the time; it's becoming more and more versatile, with more features that let it handle use cases it couldn't in the past. Some of the more traditional ones are caching, session storage, and user profiles, and for document databases, certainly catalog, content management, and personalization are important. But over in the bottom right of the list you'll see some interesting use cases showing up as well, like finance and fraud monitoring.
We're seeing some of that: features of NoSQL, especially in Couchbase with ACID transactions, allow all kinds of use cases to run on a NoSQL database. Those are a lot of the use cases, but usually the catalyst is one of those earlier reasons: performance, flexibility, scale, and so on. And just a quick quote before we finish up this section: different isn't always better, but better is always different. With that in mind, we're going to get on to actual modeling. A quick reminder that if you have questions, you can throw them in the Q&A box there on Zoom and we'll get to those at the end, or throw them in the chat and we'll try to get to those toward the end as well. So let's start talking about data modeling. Storing data as JSON is a little different than storing it in tables, and I want to take us through an exercise on this. Let's look at a whiteboard exercise with some customer data. We haven't even decided on a technology yet; we just want to get it onto a whiteboard and have something to discuss, as you might do as part of a proof of concept, for instance. I'd recommend a proof of concept for evaluating any technology, including databases. There are four different things we see here. First, we have a rich structure: our customer has attributes and potentially sub-attributes, so name is an attribute, or if we had first name and last name, those would be like sub-attributes. Then we have relationships to other data: to other customers via connections, or to products via purchases.
Then we have value evolution: maybe we start with one purchase and add more as Helen continues to make purchases; we'd add those to the purchases collection there. And structure evolution is the last one: maybe we start with a single credit card number, but we want to add multiple credit card options, multiple billing options, for Helen, so we need to evolve the structure to deal with that. I'll show what I mean on the next slide. Let's look at modeling customer data in a relational database. I'm not going to go into too much detail because I'm sure many of you are familiar with this process, but we have all those things represented here in a relational database. Rich structure: the customer table has columns that contain attributes or sub-attributes. Relationships: the customer can go through the connections table to other customers. Value evolution: we can add more rows to the purchases table. And structure evolution is perhaps the trickiest one for relational data, the least flexible part. Again, maybe we started with a credit card number column in the customer table; not a great design, but maybe it starts out that way. Then we get a requirement to take multiple billing options, so now what we have to do is create a new table, migrate that data over, add a foreign key, and remove the column from customer: a lot of work to change the schema, and that's for a relatively minor change. If we have large amounts of data in our relational database, this could mean downtime, or coming in at 2am on a Saturday, which I know everyone loves to do, to make that schema change. Whereas, as you're going to see, the document database has no such requirements to change the structure of data. So let's take a look.
We'll do a side by side: we've got the relational model on the left-hand side of the screen and the document model on the right-hand side, and I'll take you through it. This first one is a very simple translation: we go from a row of data in a table to a piece of JSON data in a document. The document key roughly corresponds to the primary key, and the values just go into JSON values, so it's relatively straightforward; if all our data were this easy, this would be a very, very easy transition. Now let's add a purchases table into the mix, so we've got a customer, and this customer has one or more purchases. This is where we introduce a foreign key into the purchases table that points back to the customer the purchase belongs to. So now we've got two pieces of data in two different tables. In the document model, we still have just the one document, but now it has a purchases array containing those purchases, in this case just the one. It's a single piece of data now, with no need for a foreign key, because the purchase data is nested inside the customer document. And if we want to do a value evolution, we just add another row to the purchases table, or another item to the purchases array in the document. But again, we're still at one piece of data in the document database, versus three pieces of data in two tables in the relational model. It's kind of like taking your data, your entity, and putting it through a paper shredder, and then having to reassemble it every time you want to access that whole entity of customer and purchases. There are lots of tools for this you may have heard of, like Hibernate or Entity Framework; ORMs, they're called, that take all the shredded-up data and put it back into entity form. But with a document database we don't need such mapping; we can just deserialize the document directly to our in-memory object. So that's a benefit of storing documents like this as well. Next, structure evolution.
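Here's a small Python sketch of that "shred and reassemble" point: two relational-style rows joined by a foreign key get folded into one nested JSON document. The table shapes, column names, and prices are made up for illustration.

```python
# Relational-style rows: a customer table row, and purchase rows that
# point back to it with a customer_id foreign key.
customer_row = {"id": 123, "name": "Jane Smith"}
purchase_rows = [
    {"id": 1, "customer_id": 123, "item": "widget", "price": 9.99},
    {"id": 2, "customer_id": 123, "item": "gadget", "price": 19.99},
]

def to_document(customer, purchases):
    """Reassemble the 'shredded' entity into a single nested document."""
    doc = {"name": customer["name"], "purchases": []}
    for p in purchases:
        if p["customer_id"] == customer["id"]:
            # No foreign key inside the document: nesting implies ownership.
            doc["purchases"].append({"item": p["item"], "price": p["price"]})
    return doc

document = to_document(customer_row, purchase_rows)
```

Value evolution is then just appending one more element to `document["purchases"]`, instead of inserting a new row in a second table.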
Actually, not structure evolution yet; we're talking about connections now. We have a customer mapped to another customer, perhaps with some relationship information like brother or father. We do something very similar in the document database: we have a connections array. Again, we don't need a foreign key, because the array is nested in the document itself, but its entries can point to separate customer documents. So for instance, xyz987 points to another document, for Jane Smith's brother. That's what we're looking at here: just an array holding another document's key and any information about that relationship. All right, let's put it all together in one big picture. We have the customer, contacts, purchases, and connections tables: seven rows of data in four different tables in the relational world, and just a single document on the right in the non-relational document world. So that's the comparison. Now, I mentioned structure evolution. We want to add the multiple billing options, so we have to go through and refactor our relational schema to do that, and that can be a very time-consuming process. This example is relatively simple, only five tables, but I'm sure many of you have worked in schemas with many more tables and much more complexity, and when you have to change those, it can be very disruptive to production. For a document database, all I have to do is start writing documents with the billing array in them instead. One thing you might not have noticed, and this is kind of subtle, is that the document key on the top right is CBL2016, while CBL2015 is still in the database. Both documents can exist in our database. That's what I mean by flexibility: we can start making changes and just write the new model as we go.
Now, of course, there is some logic to deal with in terms of what we do with the data that's in the older shape, and there are lots of options: logic in the application that can handle it, and there are also migration tools that can do it. But the important thing is that to deploy this to production we have zero database downtime. We may have to deploy a new application, and maybe run some scripts, but we have zero downtime; this can all run without anyone having to come in on a Saturday. Some of those approaches to versioning, by the way: I want to go through these, because I usually get asked how you handle data being in different formats. The first is a simple approach based on version numbers. Both documents exist in the same database, but you can do different things in your code depending on the explicit version number. And the version can be more than just a number; it could correspond to a .NET type or an assembly version, however complicated you want to make it. This is an approach you can't really take in a relational database, because both shapes can't exist in a relational table at once. The second approach is a big bang versioning: a big update command we can run, or a series of batch updates, to change the data from one format to the other. We can kind of do this in relational too, but it's multiple steps, and again, possible downtime. The third approach I like to call cooperative versioning. This is a little more complicated, but it's very cool if you can pull it off: the web application accesses a piece of data in the course of normal operations, and while we're at it, we go ahead and change it to the new format. So as people log in and use the system, the data gets gradually converted to the new model.
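A sketch of the first and third approaches in Python: branch on an explicit version field when reading, and optionally rewrite the document to the new shape while you have it in hand. The `version`, `name`, and `firstName`/`lastName` field layouts are hypothetical, chosen just to illustrate the pattern.

```python
def read_name(doc):
    """Version-number approach: both document shapes coexist in the
    database, and the code branches on the explicit version field.
    Version 1 has a single 'name'; version 2 has first/last names.
    """
    version = doc.get("version", 1)
    if version == 1:
        return doc["name"]
    if version == 2:
        return f"{doc['firstName']} {doc['lastName']}"
    raise ValueError(f"unknown document version: {version}")

def upgrade(doc):
    """Cooperative versioning: while we're touching the document anyway,
    rewrite the old shape into the new one and return it."""
    if doc.get("version", 1) == 1:
        first, _, last = doc["name"].partition(" ")
        doc = {"version": 2, "firstName": first, "lastName": last}
    return doc

old_doc = {"version": 1, "name": "Jane Smith"}
new_doc = {"version": 2, "firstName": "Jane", "lastName": "Smith"}
```

In a real application, `upgrade` would be followed by writing the document back, so the data gradually converges on the new model as users touch it.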
You can kind of take this approach with relational too, but it gets messy, because you end up with a middle state where you might have a name column plus first name and last name columns, and a lot of nulls, in the process. So those are some examples of different approaches you can take, or a combination, or something else. There is a conference coming up, the Couchbase Connect conference, and there will be at least one or two sessions on just this: the versioning of data, the migration of data, to handle the flexibility of JSON. If you're interested, I'll have information for you at the end of today's session. There are some modeling tools to help you along the way if you want to take a more rigorous approach: something like Hackolade supports lots of NoSQL tools, including Couchbase; erwin also supports Couchbase; there are tools you may have in your enterprise already; or something as simple as jsoneditoronline.org, a no-frills offering that can help you with modeling and diffing and things like that. Okay, so that's a crash course in JSON data modeling compared to relational data. One important thing to point out, though, is that the way your data is modeled is not just a whiteboard exercise; it's also about how your application accesses the data, and that can affect how you want to model it. With relational, you really only have one way to access data, and that's SQL. Someone commented earlier that NoSQL stands for "not only SQL," and that's a good point, because with a NoSQL database we have multiple ways to access data, including, somewhat ironically, SQL. We'll get to that. Some examples of this: key-value, as I mentioned earlier with the happy face diagram. If you have the key already, it's a really simple and extremely fast way to access that piece of data. This is a C# example here, but it's going to be similar in other languages.
The first function takes an ID, gets the JSON document by that ID, and returns it as a ShoppingCart object; in C# it's going to be deserialized to that. Similarly, CreateShoppingCart just calls collection.Insert, giving it a new ID, in this case a GUID, and the ShoppingCart object. Notice I didn't have to write SELECT or INSERT; I'm just using the key-value API directly. Since key-value is so fast and easy, it's going to benefit us to use it as much as possible; if you can do this, this is what I'd recommend. Some guidelines for doing this: use a natural key, a key that makes sense, that has some semantic meaning to it. It might be an invoice number or a driver's license number, for instance: something that's really not going to change, and that naturally fits the piece of data it's a key to. That also makes it human-readable; when you're browsing, seeing a bunch of GUIDs doesn't really help us much. I'm not saying you can't use GUIDs, you certainly can if you want to, but natural keys are a good way to go. And make your keys deterministic as well. What do I mean by deterministic? Let's think about a website that's a blog, for instance, with multiple authors. I'm going to log in to the back end, and my username is matt. Just from that information, the system can look up a document called author::matt. From there we can follow a trail of other documents; these documents would contain keys to other documents, like Matt's blogs. Implicitly we can say: I'm logged in as matt, and matt wants to see his blogs, so let's find a document called author::matt::blogs, and maybe that contains a list of blog slugs like csharp-9-features.
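Here's a hedged Python sketch of that deterministic-key idea. The key patterns (`author::matt`, `author::matt::blogs`, `blog::<slug>`) come from the talk; the `store` dictionary and document contents are invented stand-ins for a real database.

```python
# A stand-in for the database: document keys mapped to JSON-like values.
store = {
    "author::matt": {"name": "Matt", "role": "author"},
    "author::matt::blogs": {"slugs": ["csharp-9-features"]},
    "blog::csharp-9-features": {"title": "C# 9 Features", "body": "..."},
}

def author_key(username):
    # Deterministic: the same input always yields the same key,
    # so no query is needed to find the document.
    return f"author::{username.lower()}"

def blogs_for(username):
    """Walk the chain of documents with key-value lookups only."""
    listing = store[author_key(username) + "::blogs"]
    return [store[f"blog::{slug}"] for slug in listing["slugs"]]

posts = blogs_for("Matt")
```

The point is that every step is a direct get-by-key: from the username we derive the author key, from there the blogs-listing key, and from each slug the blog key, without ever issuing a query.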
From there we could look up comments, and so on; we can walk through this whole chain of documents with only key-value access. We haven't touched any sort of SQL query yet. This is a good strategy for just about any NoSQL database that uses keys and JSON values. Now, some rules of thumb I want to go through when you're modeling data; these especially apply to the key-value API. If a relationship is one-to-one or one-to-many, it's probably a good idea to start with your related data as nested objects. Here we have Jane Smith and her purchases, nested: the purchases belong to Jane Smith, and that's a good strategy to take. If the relationship is many-to-one or many-to-many, you can store the data as separate documents: instead of having those documents nested, we just have keys to those documents in the connections array. Another thing to look at is your data reads. If your reads are mostly going to be the parent fields of the relationship, then again it might make sense to store the children as separate documents; otherwise we'd be getting Jane Smith's profile and all of her connections and purchases and everything, and maybe we don't want to transfer that much data over the wire. If your reads are mostly parent and child together, and a shopping cart might be a good example, it makes sense to store them as nested objects; that means we can get them all with one lookup instead of multiple lookups. And I think you see where this is going: if our writes are mostly parent or child, not both, we can store them as separate documents, because we're going to be writing one or the other, and there's no sense sending both over the wire. And finally, just to finish up for completeness: if the writes are mostly parent and child together, again a shopping cart or user profile, that sort of thing, store them as nested objects. Here's the screen I usually pause on when I'm presenting in person to let people get their cell phones out and take a picture. It's a good rule-of-thumb guide to modeling your data to best optimize for a key-value lookup strategy. It's not the only way to access data in a document database, but it is a very fast and efficient way, so if you can go this route, go this route as much as possible. Since you're all at home, you can just hit the print screen button and get a screenshot. Okay, one other thing to consider with key-value lookup is something called sub-document. Now we're getting into an area that can vary from database to database: does the database support a sub-document API? If it does, you have even more flexibility. Some databases have it and call it something different, some don't have it at all, so make sure you know that going in; I know Couchbase does have sub-document access. The idea is this: if I only need to read or update a person's address, I can identify just the portion of the document that I want to read or write, leave the rest on the server, and just change or read that specific part. This can be very helpful if you have large documents and you're doing a lot of reads and writes on only a small portion of the data. If you just want to flip one bit from true to false, for instance, there's no sense getting the entire large document, flipping the bit, and writing the whole document back. So that's something else to look for. Okay, we've been talking about key-value access up until now; I want to start talking about other ways to access data, since most NoSQL databases will have at least one way to access data besides key-value.
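In Couchbase the real sub-document operations live in the SDKs (for example, `lookup_in` and `mutate_in` in the Python SDK). Since those need a running server, here's a hedged pure-Python simulation of the concept instead: reading or writing one path inside a document without touching the rest. The function names echo the SDK but this is not the SDK API; the profile document is invented.

```python
def lookup_in(doc, path):
    """Read just one part of a document, addressed by a dotted path."""
    node = doc
    for part in path.split("."):
        node = node[part]
    return node

def mutate_in(doc, path, value):
    """Write just one part of a document, leaving the rest untouched."""
    parts = path.split(".")
    node = doc
    for part in parts[:-1]:
        node = node[part]
    node[parts[-1]] = value

profile = {
    "name": "Jane Smith",
    "address": {"city": "Columbus", "state": "OH"},
    "emailVerified": False,
}
mutate_in(profile, "emailVerified", True)   # flip one bit...
city = lookup_in(profile, "address.city")   # ...or read one field
```

With a real sub-document API, only the addressed path travels over the wire, which is the whole benefit when documents are large and changes are small.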
And what does that have to do with modeling? Again, modeling doesn't exist in a vacuum; you have to think about how you're going to interact with your data. Here are some examples from Couchbase; there are three I want to highlight in green, and the other two exist, but I'm not going to spend time on them today. Key-value, as you see there, goes directly to the documents, with no overhead involved. In Couchbase we also have something called N1QL (pronounced "nickel"), which is a full SQL implementation; it's my favorite thing about Couchbase. It generally can't go directly to the documents (in some cases it can), but most of the time it has to go through indexes and a query parser and things like that, so there's some overhead involved. Full text search I'm going to talk about very briefly today as well; it also has to go through indexes. Couchbase also has mobile capabilities and analytics capabilities, and the analytics stuff is very cool, actually. My favorite baseball team, the Cincinnati Reds, who just got eliminated, actually, are using analytics in their organization, and Domino's Pizza is also using analytics. These are things you can learn more about at the Connect conference that's coming up. Anyway, on to the query language, Couchbase's N1QL. It's powerful, it's very flexible, it's declarative in nature, and it's very familiar to developers, because developers are used to joins and selects and CTEs and subqueries and all those kinds of things. Just for example: if we want to look up users that have either a Visa or a Mastercard as a payment type, we really can't do that via a key-value lookup, so we need some sort of query that looks at the secondary values of the JSON data. That's what's happening here: we're saying, give us all the users who have a card type of Visa or Mastercard.
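The talk doesn't show the query text on this slide, so here's a hedged sketch: an N1QL-style statement of the sort described (the bucket and field names are assumptions), followed by a plain-Python equivalent of the filter the query engine would apply over the secondary values of each document.

```python
# Roughly what such a N1QL query might express (names are assumptions):
#   SELECT u.* FROM users u
#   WHERE ANY c IN u.cards SATISFIES
#     c.cardType IN ["visa", "mastercard"] END;

users = [
    {"name": "Jane", "cards": [{"cardType": "visa"}]},
    {"name": "Sam",  "cards": [{"cardType": "amex"}]},
    {"name": "Pat",  "cards": [{"cardType": "mastercard"},
                               {"cardType": "amex"}]},
]

def has_card_type(user, wanted=("visa", "mastercard")):
    # This is exactly what key-value access cannot do: we have to inspect
    # values *inside* each document, which is the query engine's job.
    return any(c["cardType"] in wanted for c in user.get("cards", []))

matches = [u["name"] for u in users if has_card_type(u)]
```

The Python loop makes the cost visible: without an index, the engine would have to examine every document the same way this list comprehension does.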
Once we step out of key-value access, we have to involve other processes: parsing the query, most likely using indexes, and in the end it's going to use key-value lookups behind the scenes anyway to get the data. So there's some overhead involved, but sometimes that's necessary to have a flexible query that gets you the information you want. It's important to dive into these queries and understand how they work so you can optimize them, just like in the relational world. As an example, here's a Couchbase SQL query. I executed this one, and I think it ran in about 1.2 seconds; the benchmark is down there. It's using an index on the type field, but notice that on line 4 it's also looking at the name field. We can look at the visualization of this query plan in Couchbase to see which parts of the query are taking the most time. We can see here that the fetch step is taking the most time, 187 fetches, and unfortunately the next step, the filter, is filtering most of those out, so that may be an area of waste to look at if the query is slowing us down. Couchbase also has an index advisor: you can execute a query and say, give me some advice on this query, and it'll tell you what indexes it's using right now and what indexes it recommends to speed the query up and make it more efficient. So there's a recommendation there; I went ahead and created that index, and the same query went to 146 milliseconds, about eight times faster than we started with, so that's good. And if we wanted to eliminate that fetch step entirely, look at line 1, which most of you who are familiar with SQL are probably cringing at anyway: we're selecting star. If we select just the fields we need, maybe just one or two, we can index those fields and skip the whole fetch step, getting the data right from the index.
So we can speed it up even more. Okay, so that's SQL. Full text search is also available in Couchbase, and this revolves around text, as you might guess from the name, but also some things you might not guess from the name: it can also do geospatial searching, so you can search by, say, a radius around a latitude and longitude. Full text search actually uses a different kind of index than SQL uses, called an inverted index, whereas the SQL side uses a modified B-tree. Anyway, things like stemming, language awareness, facets, and ranking can all be used with full text search. In the end, the results are ranked and language-aware. Here I searched for the word "submarine," and I got these results; you can see the matches highlighted, and it's determined that the first result is more relevant than the second. This is something I could use instead of a LIKE keyword, for instance, if I'm doing text-based searches. All right, here's another cell phone screen for you. If you're looking at a NoSQL database, these are the three steps I'd go through when figuring out how best to access data. I'd start with key-value; if I can do that, that's where I'd go. Then if I'm doing something with text, I'd look into full text search instead of a LIKE, and, this is kind of a weird edge case, if there's a large number of disjunctions I might use the search engine as well. Finally, N1QL, or SQL, is going to provide the most flexibility; it can query pretty much everything else, but you must have good indexes in there, so make sure you're checking your indexes and optimizing them the best you can. Speaking of optimization, we're going to get into that a little bit today; we're going to go through an example of migrating data over.
And I say migrating, but it can also mean synchronization; we're not necessarily throwing out the relational database, or at least not right away, so we often have to sync them. Either way, think of it as migration or synchronization. There are a lot of options to migrate, and one thing Couchbase 7, the most recent release, has done is enable number five and number four on this list. Oftentimes you'll see: we need a new database, so we have to rewrite the whole application all over again to take advantage of it. That's a high-effort, high-risk, potentially expensive thing to do. Not always a bad thing, because you get to take full advantage of the capabilities, but it might be nice if we could just start with the data as it is in the relational database and host it in a NoSQL database with as few changes as possible, and just see how that works. And since Couchbase has SQL, has joins, and has ACID transactions, that's actually possible right now. Then we can optimize later, and I'm going to show you that in this demo: we can take a look at our model and combine and nest those objects when it makes sense to. That's going to improve our data access, with some data modeling changes to meet our performance goals. Anyway, there are a lot of tools you can use to do this kind of migration. Some of these you may have heard of or may already have at your organization; some you may not have heard of. One I want to point out is GlueSync: that's a commercially supported tool that supports real-time synchronization between relational databases, such as SQL Server and Oracle, and Couchbase, which is a very cool tool to throw in there if you need to sync or you want to start migrating away.
Another one I want to point out is Apache NiFi. It's a very cool open source project; I feel like it doesn't get as much attention as it should, but it's a UI that allows you to visualize the data flow in your enterprise and set up processes to stream data around different databases and data sources in real time. Very cool. There are lots of other tools available as well. You can also build your own, which is kind of what I've done: use some coding or scripting to build your own migration or syncing process. It can be as simple as a command-line tool; cbimport comes with Couchbase and can import CSV or JSON data right into Couchbase. SQL Server has SSIS, Oracle has GoldenGate, and there are lots of other command-line tools, Python, PowerShell, bash, those sorts of things, and they can take advantage of REST APIs as well. But here's how I think of the process, back to that list of five different levels. This is the level-5 approach, the keep-it-really-simple approach, the KISS approach if you will. You can start with this approach and just treat the NoSQL database as if it were a relational database. Don't do any modeling or remodeling; just import the data as it is, query it the same way with SQL, with as many joins as you need, and have lots of that shredded data, those pieces of shredded paper, everywhere in your data. Once you've got that in place, you can start to transform the data, usually using those modeling techniques we've discussed, to improve performance, reduce the need for joins, and possibly even reduce the need for queries, so you can go back to the key-value API. And maybe you just want to start with one root entity at a time; maybe start with the user profile model and go from there.
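A minimal sketch of that level-5 lift-and-shift, assuming a CSV export of a relational table: each row becomes one JSON document, keyed by table name and primary key. The column names are illustrative; in a real migration the documents would go to cbimport or an SDK rather than a dict:

```python
# Level-5 "keep it simple" migration sketch: CSV rows -> JSON documents, as-is.
# Table and column names are illustrative stand-ins for a relational export.
import csv
import io
import json

# Stand-in for a CSV export of an Address table.
csv_export = io.StringIO(
    "AddressID,City,PostalCode\n"
    "1,Bothell,98011\n"
    "2,Seattle,98104\n"
)

def rows_to_documents(csv_file, table_name, pk_column):
    """Turn each CSV row into a (key, JSON document) pair -- no remodeling."""
    docs = {}
    for row in csv.DictReader(csv_file):
        key = f"{table_name}_{row[pk_column]}"  # e.g. "Address_1"
        docs[key] = json.dumps(row)
    return docs

docs = rows_to_documents(csv_export, "Address", "AddressID")
print(docs["Address_1"])  # {"AddressID": "1", "City": "Bothell", "PostalCode": "98011"}
```

The shredded, table-shaped structure survives intact here, which is exactly the point of level 5: get the data hosted first, remodel later.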
You know, start where it hurts the most: what part of your application is the slowest and needs refactoring? That's where I'd start with this kind of remodeling. Then we get to level four, and possibly even levels three and two, where we transform the data as we import it into something like a staging area and transform it into a more optimized form. So that's a really simple way to do this: export from relational to CSV, import from CSV into a staging bucket, use some N1QL scripts to transform that data, and insert the results of that transformation into the new, optimized bucket. That's the really simple type of process you could go through, and it can get more complex than that; you can of course use more complex tooling if you want to. The last thing, in terms of migrating your application over: we need to think about aligning your data model, your migration approach, and your expectations with what you're actually doing in the database. If you don't model your data to take advantage of JSON and NoSQL, you're probably not going to realize the full benefit right away. That's fine for a proof of concept, but ultimately you have to start optimizing that model. So with that in mind, I want to show you something I've been working on. It's a build-your-own tool, although it's open source, so anyone can use it. It started as a level-5 approach, and then I started making it into a level-4 approach. It's a tool I've written called SqlServerToCouchbase. So let's start here; hopefully you can still see my screen. This is SQL Server Management Studio, and I've got the AdventureWorks database loaded up, which is a popular SQL Server sample. It contains a lot of data about, I think it's a bicycle shop. You can see here we've got all these tables.
And each table lives in a schema, like HumanResources, Person, or Production. Each of those tables has its own name, so the Address table lives in the Person schema, AddressType lives in the Person schema, and so on. Now, over here in Couchbase, I've got what's called a bucket. A bucket is roughly analogous to a SQL Server catalog or database. You can see it has 760,000 items in it, which is what's in AdventureWorks as well. My process has copied this data over as-is, so it hasn't done any remodeling; it has just lifted and shifted that data from tables over to Couchbase. So let's dig into the scopes and collections. In Couchbase, instead of a schema we have something called a scope, but it's a very similar organizational concept. You can see I have a Person scope, a HumanResources scope, and so on, even a _default scope that matches the dbo schema in SQL Server. And inside each of these scopes there are collections, which correspond roughly to the SQL Server tables. So I have a Person collection inside the Person scope, an EmailAddress collection inside the Person scope corresponding to the EmailAddress table, and so on. Now let's go back over to SQL Server, and I'm going to query some data out of the Address table; this is just getting 1,000 records. You can see each row represents an address, with city, state, zip code, a geography location, a GUID, and a modified date. Right. Now, I have a program that runs and copies this data over, and if we look at the Person scope, Address collection in Couchbase, we have documents, and each document will roughly correspond to a row of data in the relational database. But it is JSON: if I edit one of these, you can see it's JSON data instead of a row of data.
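The naming correspondence described here (schema maps to scope, table maps to collection, dbo maps to _default) can be written down as a one-line rule. This mapping is inferred from the talk, not lifted from the tool's source:

```python
# Sketch of the structural mapping: SQL Server schema.table ->
# Couchbase scope.collection, with dbo landing in the _default scope.
# Inferred from the webinar's description, not the actual tool code.

def to_scope_and_collection(schema, table):
    scope = "_default" if schema == "dbo" else schema
    return scope, table

print(to_scope_and_collection("Person", "Address"))  # ('Person', 'Address')
print(to_scope_and_collection("dbo", "ErrorLog"))    # ('_default', 'ErrorLog')
```

Because the organizational concepts line up one-to-one, the lift-and-shift step needs no human modeling decisions at all.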
And the data looks pretty much the same, except it's JSON data types instead of SQL Server relational data types. So something like the geography location in SQL Server gets broken down into a JSON object with latitude and longitude, which will be perfect, by the way, for full text search, for doing a geographical radius search, for instance. But anyway, all the data is still there; it has just moved into JSON format in Couchbase. I'm not going to walk through this program; we don't have time, as much as I'd love to, but I would love to get any of your thoughts or comments on it, so I'll give you a GitHub link when we get back to the slides where you can check out this code and run it for yourself. Make suggestions, criticisms; I would absolutely love your feedback on this tool, especially if you're looking to go through this process yourself. I'm creating this library to help developers move their data from relational to Couchbase for whatever purpose: a proof of concept, for fun, for serious research, any of those reasons. I would love for you to check out this tool and try it. So what I've got here, if you look on the right: a SqlServerToCouchbase project, which is a .NET library that you can bring in and use in whatever console application or web service you want, and then a console example that uses the AdventureWorks data set. I'm just giving it this configuration file here. I'm telling it: here's how to connect to SQL Server, the connection string, which is my localhost; here's how to connect to Couchbase, again localhost, with some credentials and some other defaults. And then there's another setting we're not going to get into, but basically it's going to start looking at SQL Server and say, okay, I see a HumanResources.Department table.
I need to create a HumanResources scope if it doesn't exist, and a Department collection if it doesn't exist. Then I'm going to go through each row of data and make it into a JSON document in Couchbase. It does all of this for you; you don't have to create the structures by hand, it creates them automatically. By the way, and I'm not going to go into this much either, it also looks at your SQL Server indexes and creates versions of those inside Couchbase, because again, Couchbase uses a SQL query language, so it can create indexes in a very similar way. As you can see here, it also creates users, but we're not going to get into that; I'm getting a little sidetracked. Okay. One other thing I can do: at this point we're basically at level five, copying the data and structure right from SQL Server and dropping it into Couchbase as-is. Now, in the long run, as I said, you might want to consider refactoring at least some of that data to take advantage of JSON, nesting it when it makes sense to. That requires some human thought; it's really hard to automate. But once you've figured out that, hey, I want to collapse these tables into a single document, this program gives you the ability to specify that in a declarative way. This is called the denormalize maps setting. We're going to skip the one-to-one case, which isn't terribly interesting. Let's say we have a many-to-one relationship here: we have a PersonPhone table and a Person table, and basically AdventureWorks is telling us that a person can have more than one phone number. The way this is represented is that we have a Person table and a PersonPhone table right here: two tables, joined together by a foreign key, and every time we need that data, we have to join those pieces of data together.
So if we want to optimize that toward a NoSQL model, we can just specify here: this is the "from," move the data from PersonPhone to the Person document, and this is the foreign key to do it with. I've actually run this already; again, I don't want to make you sit around and watch my script run, that's not terribly exciting, but I'll show you the end result. If we drill into the Person documents and edit this document, you can see that lines 2 through 14 are the original data from the row in the Person table, and lines 15 through 25 are the phone numbers that have been embedded into this document, just by nesting the data. I've also done the same thing with email addresses; I'm nesting those as well. Notice that it's an array: lines 15 through 25 are an array, so if this person had more than one phone number, they would all be listed here. What we've done is combine a person, their phone numbers, and their email addresses into one single document. So if I want all the information to show to a user, perhaps "hey, you can edit your information, keep it updated," I can get all of this data by key (the key happens to be just "1" here). I can do a single operation to get this document by key, present it to the user, allow them to make changes, and write it back with just that one key. I'm avoiding SQL here, avoiding the need for joins or ACID transactions, speeding up my application, taking pressure off my query engine, and avoiding the overhead involved with ACID transactions. Although, I do want to reiterate: this is something you should do eventually, but if you can't do it just yet, you can definitely still use SQL, joins, and ACID transactions in Couchbase. Couchbase has had joins for a long time and introduced ACID transactions just recently.
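The many-to-one nesting step can be sketched in a few lines: fold the child rows into their parent document via the foreign key, so one key-value get returns everything. The row data below is illustrative, loosely modeled on AdventureWorks column names:

```python
# Sketch of the many-to-one denormalization described above: embed PersonPhone
# rows as an array inside the matching Person document. Data is illustrative.

person_rows = [
    {"BusinessEntityID": 1, "FirstName": "Ken", "LastName": "Sanchez"},
]
phone_rows = [
    {"BusinessEntityID": 1, "PhoneNumber": "697-555-0142", "PhoneNumberTypeID": 1},
    {"BusinessEntityID": 1, "PhoneNumber": "819-555-0175", "PhoneNumberTypeID": 3},
]

def denormalize(parents, children, fk, nested_field):
    """Embed child rows as an array inside the matching parent document."""
    docs = {}
    for parent in parents:
        doc = dict(parent)
        doc[nested_field] = [
            {k: v for k, v in child.items() if k != fk}  # fk is implied by nesting
            for child in children
            if child[fk] == parent[fk]
        ]
        docs[str(parent[fk])] = doc  # document key = primary key, e.g. "1"
    return docs

docs = denormalize(person_rows, phone_rows, "BusinessEntityID", "phones")
# One key-value get now returns the person plus all their phone numbers:
print(len(docs["1"]["phones"]))  # 2
```

Notice the foreign key column disappears from the nested rows: the parent-child relationship is now expressed by containment instead of a join.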
So that's the process in a nutshell. We've seen level-5 and level-4 migrations from relational over to Couchbase. I'm going to take us back over to the slides to finish up, and we'll have a few minutes for questions; it looks like there are a lot of them, so I'm interested to see what we've got. I guess Shannon has recovered okay, so we're just about ready to wrap up here. They say you only remember three or four things from any given presentation, so here they are. One: pick the right application. Focus on an application, or part of an application like a microservice, that would be aided by NoSQL. Do you need flexibility, higher scale, better performance, or even lower costs? Then that's the right place to try NoSQL. And before you dive right in, I'd recommend doing a proof of concept: go through the modeling exercises, come up with a focus on a very specific area, and define success criteria so you know what it is you want to accomplish and whether you've accomplished it yet. You can use this opportunity to review the architecture and say, okay, here's what we've learned by going through this process that we can apply when we do the full refactoring or full changeover to Couchbase. You can use tools like Hackolade or erwin to help define those models, especially with a larger team, and feel free to use the tool I've open sourced on GitHub to do this as well. Always match the data access to the requirements. Remember, in the relational world you can only use SQL; with Couchbase you can use key-value, full text search, SQL, and some others I didn't even cover today. Use the one that's the right fit for the best performance. As next steps, I would love it if you download Couchbase 7; we just released it to GA recently. The Enterprise Edition is free for you to download and try out.
And it's great for your larger applications; we also have a free Community Edition that's fine for smaller applications. We have a free conference coming up, a completely online, completely free conference, diving into a lot more detail on some of the things I've touched on today; I highly recommend you check it out. And finally, if you're interested in this tool, or you're going through something similar, I would love for you to try it, or at least leave some feedback: criticism, suggestions, whatever you'd like. Some of the coolest features I've built into this tool have come from the community making suggestions and asking questions, so even if you just have time to throw a two-sentence comment up on GitHub, that's very much appreciated. You can find it on GitHub under mgroves; it's called SqlServerToCouchbase. Right now I'm in the middle of a refactor that will allow it to work with databases other than SQL Server, maybe Oracle or Postgres for instance, but for right now it's just SQL Server. Okay, I think I've exhausted all the slides; I'm open for questions now if Shannon is available. I am, and I did not risk taking a drink of water, which usually clears my throat but was not so successful earlier. Lots of great questions coming in; if you have questions for Matthew, please feel free to put them in the Q&A portion of your screen. So, diving in here: can we build logical models on top of NoSQL data models? I'm not sure exactly what you mean; I think I need some more information there, what you mean by logical models. But generally speaking, document databases, the ones I've shown you today, are probably the best general-purpose databases in the NoSQL world. Anything you could do with relational data, you could do the same thing with non-relational JSON data. I like this one: are NoSQL models only suited for a product data model?
No, I wouldn't say that. As I mentioned in the slides, there are lots of different use cases. I just happened to use AdventureWorks because it's a bicycle shop, and I use that in my examples because I think everyone can relate to going to a website, purchasing things, seeing their history, seeing the catalog, and things like that. But yes, there are lots of different use cases for NoSQL out there. Next: Matthew, one big difference between document and relational: in your customer example, you have to read the whole document to update, say, a customer connection, then write back the document. Slower for a heavy OLTP application, but fast for a system needing everything about a customer. Is that correct? Yeah, right. And Robert had a follow-up here, because I think he asked this question before he saw the sub-document slide. You can model your data in a way that reduces the total size of the documents, or you can use the sub-document API, as he mentioned, to get just a small portion of the document. And finally, since we have SQL available, you can use a SELECT to identify just the few fields you want; you don't have to return everything from those documents. So yes, I think Robert kind of answered his own question there, but absolutely, that makes sense. Next question: since NoSQL provides a lot of flexibility, do we have tools available that ensure data quality natively in the database, like in the relational world, things like allowed values, referential integrity, and so on, or does this need to be addressed at the application level? Yeah, a very common question I see a lot. The answer is that there are some NoSQL databases that provide some tooling for that, but it's not quite the same as referential integrity.
Referential integrity is one of the things that makes scaling so difficult, and one of the things NoSQL was invented to work around in the first place. So yes, you can definitely address it at the application level. That being said, there are some things in the NoSQL world that are coming along: Couchbase has something called the Eventing service that allows you to write logic on the server side that responds to data events, like data being created, updated, or deleted, which could potentially be used to accomplish the same sort of thing, or might in the future be used as a basis for server-side validation. That's the best way I'd describe it: not so much the referential integrity sort of thing you need in a relational database because your data is split up into those shreds. The questioner added a comment as I was giving the answer: something like a trigger? Yeah, something like a trigger; in Couchbase specifically, it's the Eventing service. I wouldn't call them triggers, there are some differences, but yes, it can respond to data-change events with custom code that you write, and that code lives on the database cluster itself, not in the client code. Next: is a NoSQL database not suited for creating a data warehouse model? Well, one thing I didn't cover today is Couchbase Analytics. That is something that can be used for something like a data warehouse. Think of it as a spectrum between operational workloads and completely analytic, data-warehouse-type operations; I think Couchbase Analytics sits at a good place right in the middle, where it can execute complicated queries on your data without impacting the performance of your operational data. We've also started adding features that allow it to query Azure blobs and Amazon S3 stores, so you can combine different sources of data and query them all with SQL.
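Couchbase Eventing handlers are actually written in JavaScript and run on the cluster; purely to illustrate the trigger-like shape being described (a handler invoked on each mutation that can validate the data), here is a Python sketch with entirely made-up names:

```python
# Illustrative only: real Couchbase Eventing functions are JavaScript
# (e.g. an OnUpdate handler) deployed to the cluster. This sketch just
# shows the trigger-like pattern: validate a document on create/update.

def on_update(doc, meta):
    """Pretend server-side handler: return a list of validation problems."""
    problems = []
    if not doc.get("name"):
        problems.append("missing name")
    if doc.get("type") not in {"person", "address"}:
        problems.append("unknown type")
    return problems  # a real handler might log these or write to another bucket

# Simulate the event stream: each mutation invokes the handler.
mutations = [
    ({"type": "person", "name": "Ken"}, {"id": "person_1"}),
    ({"type": "widget"}, {"id": "widget_9"}),
]
for doc, meta in mutations:
    print(meta["id"], on_update(doc, meta))
```

As the speaker notes, this isn't referential integrity, but it gives you a server-side hook for the kind of data-quality checks that would otherwise live in application code.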
And this is SQL that's made for flexible JSON data, so it's really SQL++: SQL plus more functionality to deal with JSON. So I think we're getting there. I wouldn't say it's not suited; I certainly think there are lots of things you can do with Analytics that you might have relied on a traditional data warehouse for in the past. Next question: how would you handle metadata, document definitions, and attribute definitions, the stuff that would be in a relational catalog if the DDL was generated with comments? Yeah, I don't think I've gotten this question before. The way I did it with my tool was to just go ahead and create the structures, but one idea, and maybe this is something you should create an issue for in my GitHub repository, is to not actually do the work but generate scripts that would do the work, and those scripts might be somewhere we could put comments, in the same way we do with DDL. That might actually be a very good suggestion. In terms of metadata living in the database: if you look at a document, there is actually a metadata field here, and you can put metadata on each document. I don't know if it's really meant for this, and it's not really recommended unless you have a framework-specific reason to, but it is possible to store metadata at the document level if you want to. I love these questions; I love it when you bring me something new. That's great. Next: do all teams use analytics in MLB? Probably. Oh gosh, someone must not like the shift, I guess. The Cincinnati Reds actually use Couchbase Analytics, not for on-field analytics but for marketing analytics, I believe.
So it's things like offering promotions and coupons to ticket holders and buyers; that's what the Reds are using it for. But yeah, don't get me started on the Reds or baseball, I'll talk about it all day with you. Next: is there compression in Couchbase? I believe there is; let me just double-check, I think it's actually an advanced setting. Yes, we have different compression modes here. You can turn it off if you want to, and you have passive and active modes, which have different performance implications, but yes, it's basically just a checkbox for compression. Okay, are you still there, Shannon? Is everything still good? I'm still good, I was just talking into the mute button. Any more questions coming in? I think we have time to slip in one more here, and any questions we don't have a chance to get to, we'll make sure to get over to Matthew and Couchbase, so keep those coming in. In an environment using messaging like Kafka, does Couchbase have a facility to get/put in Avro? Okay, so I know Couchbase has a Kafka connector available, so you can use Kafka as both a source and a sink. I'm not well versed enough in Kafka to know if it handles Avro specifically. I do know I mentioned Apache NiFi earlier today; the Couchbase plugin there, which is a community plugin, not officially supported, I think either supports Avro or NiFi can do the Avro conversion. So really the answer is: I don't know, but look at the Couchbase Kafka connector documentation, and that will probably give you the answer you want. I think that leaves us time for one more. Can you talk a bit about Couchbase and ACID transactions? Can Couchbase be deployed across multiple Azure clouds, like DataStax? So how is ACID employed? Okay, yeah, I could talk about ACID and Couchbase for probably a whole hour; I've got a whole different session on it.
Maybe that'll be a future webinar, but yes, Couchbase supports ACID transactions. It's kind of a unique implementation in that it is all client-driven, with the exception of the N1QL (SQL) language, which just added BEGIN, COMMIT, and ROLLBACK. Can Couchbase be deployed across multiple Azure clouds? It can be deployed across multiple clouds of any kind, really. In fact, that's one of the great things about Couchbase: this thing called XDCR, cross data center replication. So you can have a cluster running in Azure East and Amazon West, for instance, and sync between those data centers if you want to. I don't know how DataStax does it, but with Couchbase you would deploy two different clusters across different data centers and sync between them. Then the question is: how is ACID employed? I think the implication is, how is it employed across multiple data centers, and the answer is: it's not. That is an extremely hard problem, to lock data centers across a planet, for instance, and we're not there yet. So you'd have ACID within one cluster, but not across both clusters. Matthew, thank you so much for this fantastic presentation, as always; we really appreciate it. And thanks to our attendees, who've been so incredible and engaged in everything that we do. We'll also follow up with an email by end of day Monday to all registrants with links to the slides and the recording of this presentation. Thank you, Matthew, thanks to Couchbase for sponsoring today's webinar, and I hope everyone has a great day. Thank you. Thanks, all.