Hello and welcome. My name is Shannon Kemp and I'm the executive editor of DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, The Why, When and How of NoSQL: A Practical Approach, sponsored today by Couchbase. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A panel in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share your questions via Twitter using the hashtag #DATAVERSITY. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar. Now let me introduce our speaker for today, David Segleau. David is Director of Technical Product Marketing at Couchbase and a 30-year veteran of the database industry. He previously spent over eight years in senior product management roles at Oracle. Most recently, he was the product lead for the Oracle NoSQL Database and Oracle Database Mobile Server products. Before that, he was VP of Engineering at Sleepycat, the company behind Berkeley DB, and held senior technical roles at Informix, Illustra, and Versata. David started his career as a developer in the oil and gas industry. And with that, let me turn the floor over to David to get today's webinar started. Hello and welcome. Well, thank you, Shannon, for the introduction, and thank you, DATAVERSITY, for hosting this webinar. As Shannon mentioned, I'm an old-time database guy. I've been around databases since about the mid-80s, so I've seen a lot of database technologies come and go. This is actually my first marketing job; I'm a techie at heart. Name a technical job, and I've probably done it.
Having been around for a while, I see it this way: technology is only as useful as it is when it's deployed solving real-world problems. Otherwise, new technology can be an interesting research project, but it's not really changing the landscape. That's part of the reason I'm involved in NoSQL. I think NoSQL really does change how we look at databases and how we use them, and that's why I'm excited to be in this space and have been for a while now. Today, we're going to be speaking about NoSQL from a practical standpoint and giving people touch points on how to get started in this space and what it means, especially for someone familiar with relational technology: how do I make the shift from relational to NoSQL? I work for a company called Couchbase. If you're not already familiar with it, we're the folks that make Couchbase Server and Couchbase Mobile. Those are NoSQL database products for both server-side and device-side data management, and there's a synchronization server, Sync Gateway, that can move data from mobile devices to the server as well. Both are open source JavaScript Object Notation (JSON) document databases. We've been around for about six years now, and we're happy to provide technology to a wide range of enterprise customers across the globe who are doing very interesting things with customer-facing NoSQL applications that have significant technical requirements. Historically, if you're familiar with Couchbase from a few years ago, we were often deployed as either a distributed cache or a key-value database. Over the last couple of years, though, almost all of our deployments have been as a document database, with querying, indexing, and views on top of documents. We'll talk about that in more detail.
Today I'm going to talk about what NoSQL is, why people are using it and why they care about it, how to make the transition from a relational mindset to NoSQL, and some practical tips on how to think about migrating the data. At the end we'll have open Q&A. So without further ado, I'll dive into what NoSQL is. Unfortunately, we have one of the worst names in the industry. It doesn't mean that we don't have a query language, and it doesn't mean "no SQL." It doesn't mean that we're somehow better than SQL; we offer more than just SQL functionality. What it fundamentally means is a non-relational approach to database management. For the last 30 years, relational databases have defined how we think about and manage data, and relational databases are great. But there are some things that relational databases don't do terribly well. NoSQL tries to address that space, the work that could be done better with a different approach. What's different about the NoSQL approach is that databases are not monolithic instances; they are distributed. NoSQL databases are built to scale out across a set of commodity servers. Instead of living in one place, the data is partitioned into smaller groups and replicated across those commodity servers for both performance and high availability. So unlike relational databases, where all your data is stored in one place, even on network-attached storage, you distribute your data across many different machines. NoSQL was also founded on the principle of a schema-less or schema-flexible model, where you can change how the data is stored, or what kinds of data you're storing, pretty much on the fly.
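To make "partitioned into smaller groups and replicated across commodity servers" concrete, here is a minimal Python sketch of the idea. It assumes a fixed, hypothetical number of partitions, a CRC32 hash of the document key, and simple round-robin placement of active and replica copies. It is a toy illustration of hash partitioning with replication, not Couchbase's actual partitioning scheme. The node names and the `ClusterMap` class are hypothetical.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical partition count, kept tiny so the layout is easy to inspect.
NUM_PARTITIONS = 8


@dataclass
class ClusterMap:
    """Toy map of partitions to servers: one active copy plus N replicas."""
    nodes: List[str]
    replicas: int = 1
    layout: Dict[int, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 0 <= self.replicas < len(self.nodes):
            raise ValueError("need more nodes than replicas so copies land on distinct servers")
        n = len(self.nodes)
        for p in range(NUM_PARTITIONS):
            # Round-robin placement: the active copy goes to one node and each
            # replica to the next node along, so no two copies share a server.
            self.layout[p] = [self.nodes[(p + i) % n] for i in range(self.replicas + 1)]

    def partition_for(self, key: str) -> int:
        # Stable hash of the key, folded into the fixed set of partitions.
        return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

    def active_node(self, key: str) -> str:
        return self.layout[self.partition_for(key)][0]

    def replica_nodes(self, key: str) -> List[str]:
        return self.layout[self.partition_for(key)][1:]


if __name__ == "__main__":
    cluster = ClusterMap(["node-a", "node-b", "node-c"], replicas=1)
    for key in ["user::0101", "user::0102", "blog::0007"]:
        print(key, "->", cluster.active_node(key), "replica on", cluster.replica_nodes(key))
```

Because every copy of a partition sits on a different server, losing one node still leaves a replica of each affected partition elsewhere. Adding nodes spreads partitions over more machines, which is the scale-out property described above.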
Some NoSQL vendors, Couchbase included, store JSON (JavaScript Object Notation) documents as the fundamental unit of data, and we'll get into that in more detail. Until about two years ago, if you were talking NoSQL, you were probably talking about a database that specialized in one thing: there were key-value databases, document databases, and graph databases. Over the last year or two, though, we've seen the emergence of NoSQL vendors that span more than one model. Essentially, they've taken their core competency and expanded it to include other data models, so that the NoSQL repository is more broadly useful for different applications. For example, Couchbase is a key-value and document NoSQL database; we can store and manage both types of data. It was interesting that in 2010 and 2011, when NoSQL first started getting people's attention, the big relational vendors all said nobody's ever going to run NoSQL in production, that it was just a pipe dream. A couple of years later they backed away from that and said, well, nobody's actually going to use it for mission-critical applications, because mission-critical work needs relational. They've had to back away from that as well. Having been in the relational database world for a long time, I'm somewhat amused to see how quickly NoSQL has been adopted and rolled out in production, and the kinds of mission-critical applications across multiple industries it's being used for. Let's look at some examples. Gannett is basically a publisher of different types of media content, and their whole business revolves around their content management and publishing system. They looked at the system and said, we're using a relational database, but we really don't need a relational database for this.
A document database would be better suited to the types of data we're managing. So they completely replaced the relational system with NoSQL to better manage content and publishing. eBay, on the other hand, said: we have all of these different functions and capabilities, and in some cases NoSQL makes sense. For example, token and session management works really well in a key-value database. Our listing services, where people create and edit listings and put expiration dates on them, are all in relational. What we really need there is something that can accelerate searches for listings in that relational system. So they combined relational and non-relational, using Couchbase for queries to accelerate the performance of their application. Marriott is yet another example. They looked at infrastructure built in the 80s and 90s and said, we need to start rewriting this. We need to reimplement our infrastructure using big data and NoSQL, to move our technology from the 20th century into the 21st, and we're going to do it a piece at a time. Right now, Marriott is in the middle of a huge transformation, moving its infrastructure from predominantly relational technology to technology based on big data and NoSQL. So why are customers doing this? Why are enterprises taking on the expense and complication of moving from relational to NoSQL? Fundamentally, it has to do with the digital economy. As we all know, the number of customers and end users we have to interact with has increased astronomically. People no longer stand in line at the bank; I know I don't. People go online.
So instead of dealing with customers one at a time, companies now have to deal with thousands, tens of thousands, or hundreds of thousands of customers at a time. The volume of customers has increased dramatically, and so has the amount of data companies collect about those customers, about me as a consumer. When I stood in line at the bank, or went to my local grocery store, they might know me based on my credit card, but that's about all they knew. If I go to Amazon.com, they know tons about my buying habits, my web surfing habits, and my preferences. All of that information, that richness about customers and interactions with customers, is now being stored and leveraged to drive the customer experience. This is part and parcel of the digital economy. It's not just offering services and goods digitally, which is part of it; it's using the information we now have about customers to make that experience better. Those changes in technology and customer behavior are in turn driving technical and business changes that make people look hard at their infrastructure. The technical changes are all about how fast I can build and innovate on applications, and how fast I can change their operational characteristics. Think, for example, about Pokémon Go recently, or about Walmart. They need to go from almost a standing start to huge scale in a matter of days or weeks, and relational databases were not designed to handle that well. So they're treating extreme scalability and flexibility as requirements for a new kind of data management technology.
From a business perspective, businesses are looking to IT and saying: I can't wait 12 or 18 months for my new application. I need this application 30 days from now, or we're going to get scorched by the competition. Or: I need this modification to the application so it starts tracking this kind of information. Don't tell me about 6-month planning horizons or 9 or 12 months to deploy the change; I need it deployed in 30 days or less. So the need for agility and faster time to market, the need to reduce operational costs, and the need to increase revenue by offering new services are what's driving this. From a developer's point of view, whenever somebody says, let's start using NoSQL, the first question is: how should I think about NoSQL? Should I think about it as a replacement for my relational database management system or as a complement to it? The simple answer is: it depends. The rule of thumb I use when I talk to customers is that if your repository of record is in NoSQL, that's a good case for replacing a relational database with NoSQL. Good examples are session management and token management at eBay. They looked at all of their data for web sessions and tokens, and it was all stored in NoSQL. Clearly, they didn't need a relational database there. Same thing at Gannett. Gannett looked at their database and said, all of the data we manage for our content management system is in NoSQL, so we don't need a relational database in this picture.
The other side of the equation is this: if I have a lot of legacy or existing functionality that depends on relational, how can I add NoSQL to get better performance, scalability, and availability that enhance or complement my existing relational technology? A good example is, again, eBay. Their listing service, how they manage listings on eBay, is all done using a relational system. They said: what we really need to do is accelerate the performance and availability of this system for queries. So rather than trying to scale our relational database, we're going to add NoSQL as a front end to help with queries and with the number of people doing searches on eBay. In practice, customers are using both relational and NoSQL within the same organization, and sometimes within the same application. I have yet to find a customer who says they only use one or the other. What we see is customers saying: I use NoSQL for this because it's great for that, and I use relational for this other thing because it's great for the task I'm trying to solve over there. Adding to the confusion for developers, NoSQL vendors are adding capabilities to their products every day, every week, every month, trying to make them more generally appealing and applicable to the business problems their customers have. On the flip side, relational database vendors are starting to implement successful NoSQL features, such as sharding, JSON support, and distributed processing, to bring those functions and applications under their umbrella. So from a developer's standpoint, the question becomes: what are the technical requirements of my application, and which kind of database management system will address my needs most efficiently?
From a technical perspective, and I've been in technology for a long time, you can use either. The question comes down to which one is more appropriate and more efficient for the problem you're trying to solve. Customers who have made the migration have used NoSQL either to complement or to replace their relational databases, and there's a list of reasons why they do this. What's interesting to me is that this list was generally true and continues to be generally true, but the emphasis has shifted. The 2016 Black Duck survey of NoSQL databases asked respondents, why are you using NoSQL? In 2015 and before, the number one reason was lower cost. People were migrating to NoSQL because it was a heck of a lot cheaper than running the same kind of solution on a relational database. That's changed. In the 2016 survey, most of the enterprises that responded said their biggest issue was time to market and flexibility. The ability to innovate and change their applications quickly outweighed the actual license and operational costs. Don't get me wrong, operational and licensing costs were still a big part of their decision matrix, but agility and flexibility were the number one reason they gave for moving to NoSQL. So, you're a developer, an architect, or an enterprise, and you've decided to dip your toe in the NoSQL space. How do you get started? How do you find the application that is most amenable to NoSQL? And how do you bridge the technology gap between what you already know in relational and what you want to do in NoSQL? Let's explore that a little bit. When I talk to customers about where to start, I steer them away from the monster legacy application, the one full of inherited code that looks like spaghetti.
The first task there is really to start cleaning that code up and separating it. In fact, that was part of the problem Marriott had: a huge, complex, interwoven system where it was very hard to just say, I'm going to use NoSQL for that. What they could do was identify modules and areas of the project that were amenable to NoSQL. They looked for services they could implement where it made sense: I can implement this particular function using NoSQL and gain a lot of throughput. Looking at it from a different perspective, the first question is: which applications will benefit most from the technology advantages of NoSQL? The second is: how do those applications fit into my existing IT infrastructure? Some customers have approached this migration by first treating NoSQL as a caching service. That's eBay's listing service: we need faster queries, so rather than growing my relational database, let me put a NoSQL database in front of it and make my listing searches that much faster. I could also look at independent applications that have a specific function or narrow focus and say, that particular application would benefit from NoSQL, so let me try it and see what the benefits are. Another example: if I have a SOA-based architecture, I can pick one service within it, either one that a large application leverages or one that many applications across the spectrum use. Again, eBay is a good example. Their web session tokens and session information are all stored behind services backed by NoSQL, so they can easily change out the components behind the service without changing the overall application. They get the benefits of scalability and performance without having to modify the front end of the application.
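Here is a minimal Python sketch of the "put a NoSQL store in front of the relational system" pattern, written as a cache-aside service. It makes several simplifying assumptions. A plain dictionary stands in for the NoSQL document store, and an injected callable stands in for the relational lookup. It also only covers lookups by ID, not the search queries eBay accelerates. `ListingService`, `get_listing`, and `invalidate` are hypothetical names.

```python
from typing import Callable, Dict, Optional


class ListingService:
    """Callers only see get_listing(); the stores behind it can change freely."""

    def __init__(self, load_from_relational: Callable[[str], Optional[dict]]):
        self._load = load_from_relational   # stand-in for the system of record
        self._cache: Dict[str, dict] = {}   # stand-in for the NoSQL document store
        self.relational_reads = 0           # counts trips to the relational side

    def get_listing(self, listing_id: str) -> Optional[dict]:
        # 1. Try the fast store first.
        doc = self._cache.get(listing_id)
        if doc is not None:
            return doc
        # 2. On a miss, fall back to the relational system...
        self.relational_reads += 1
        doc = self._load(listing_id)
        # 3. ...and populate the fast store for subsequent reads.
        if doc is not None:
            self._cache[listing_id] = doc
        return doc

    def invalidate(self, listing_id: str) -> None:
        # Called when a listing is edited in the relational system.
        self._cache.pop(listing_id, None)


if __name__ == "__main__":
    relational_table = {"listing::42": {"title": "Bike", "price": 120}}
    service = ListingService(relational_table.get)
    service.get_listing("listing::42")   # miss: read from relational
    service.get_listing("listing::42")   # hit: served from the cache
    print(service.relational_reads)
```

The design point is that the caller's contract (`get_listing`) never changes. Swapping the dictionary for a real document database, or dropping the relational fallback entirely, happens behind the service boundary.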
The application still calls the service; the service just uses a different persistence layer. Once you've identified the application you want to start with, the next challenge is to think about how to model your data, especially if you're going from a relational model to a NoSQL model. One truth about NoSQL is that every implementation is different. The best place to start is by demystifying some of the differences in terminology. I'm going to use Couchbase as a specific example, because each NoSQL product tends to use terminology a little differently. In the Couchbase world, a cluster is a collection of servers, buckets contain documents, and each document has an object ID. We support counters inside documents and views on top of documents, and we support a query language, very similar to and based on SQL, called N1QL (pronounced "nickel"). So if you're coming from the relational world, that mapping should help you make the transition in terminology as we go through the following slides. The second thing you need to get used to is that in the NoSQL world, especially the document NoSQL world, you're moving away from an environment with a fixed, defined schema. It's not fair to call the new environment schema-less. What's fair to say is that it has a self-describing schema: documents consist of attribute names and values, any document can be composed of any set of them, and documents can differ from one instance to the next. So you're not dealing with an environment where every document has the same structure; each document carries the information necessary to describe what it contains. That's true even within a document type. In this example, we have two user records.
In one, the attributes include both a billing and a shipping address. In the example next to it, there are billing and shipping attributes, but the shipping attribute just says "same." So you can have documents of different types, and even within a type you can have different attributes and different values for those attributes. When you retrieve a document from the database, you get all the information necessary for the document to describe itself. Because NoSQL allows you to put highly variable document structures inside a bucket, we recommend that you include a document type, or object type if you will, to say what kind of object this is. If your documents are likely to evolve very quickly, you might want to tag them with a version number as well. That way your application can make smart choices about whether to expect a given attribute based on the version. So you're not enforcing schema in the database as you do in the relational world; you're allowing flexible schemas in the database and letting the application parse and manage the data based on what comes back. Consider schema evolution in the relational world. Let's say I want to start tracking Twitter handles in my user profiles, which I didn't track before. The first thing I have to do is an ALTER TABLE to add a column to that table. Next, I need to change the application service so that it can accept Twitter handles and write them to the database. Then I need to change the API, the web form where I ingest the information and pass it to the service. In the NoSQL world, what you tend to do is just change the web form where you take the information, and pass a different flavor of JSON document directly down to the database.
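A minimal Python sketch of the two points above: self-describing user documents that differ in shape, and application-side parsing that relies on `type` and `version` tags instead of a database-enforced schema. The attribute names (`type`, `version`, `shipping`, `twitter`) and the rule that only version 2 documents carry a Twitter handle are illustrative assumptions, not a Couchbase convention.

```python
import json

# Two user documents of the same type but different shapes.
alice = {
    "type": "user", "version": 1, "name": "Alice",
    "billing": {"street": "1 Main St", "city": "Springfield"},
    "shipping": {"street": "9 Elm St", "city": "Shelbyville"},
}

# A newer document arrives as raw JSON carrying a field the older one lacks.
# Nothing on the database side had to change to accept it.
bob = json.loads("""
{"type": "user", "version": 2, "name": "Bob",
 "billing": {"street": "5 Oak Ave", "city": "Springfield"},
 "shipping": "same",
 "twitter": "@bob"}
""")


def shipping_address(doc: dict) -> dict:
    """Return the shipping address, following the 'same' shorthand if present."""
    if doc.get("type") != "user":
        raise ValueError("expected a user document")
    shipping = doc.get("shipping")
    return doc["billing"] if shipping == "same" else shipping


def twitter_handle(doc: dict):
    """Hypothetical rule: version 1 documents predate Twitter handles."""
    if doc.get("version", 1) < 2:
        return None
    return doc.get("twitter")


if __name__ == "__main__":
    for user in (alice, bob):
        print(user["name"], shipping_address(user)["street"], twitter_handle(user))
```

The schema knowledge has moved into the application. The version tag tells the code which attributes it may expect, so old and new documents can coexist in the same bucket.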
I didn't have to do anything to the database; I just started sending it a new structure, a new JSON object with different fields in it. One of the primary access methods for a document is what the relational world would call the primary key. In the NoSQL world, in the Couchbase world, it's the object ID: the key that lets us find a particular record or instance. Because the key can contain anything you want, there are some general recommendations on making keys efficient. Interestingly, when we talk to customers, the biggest thing for them is making their keys easily readable and understandable. They avoid complex key structures that only a developer can decode and tend to make document IDs natural and human-readable, so it's very easy to find things. You can have keys with just text in them, like the examples above, or you can embed a user ID, for example. So instead of "author Shane," I might have a key like "author 0101." Embedding numeric values inside the key still lets me look up specific documents in the database very quickly. Once you've defined your primary key, the next step is to think about how you're going to model the relationships in the database. The two things to think about are: what kind of relationship am I modeling (one-to-one, one-to-many, many-to-one)? And how do I normally access the data? Many developers are used to thinking about data top-down: authors have blogs, and blogs have comments. But if I mostly access the data bottom-up, then a model where comments belong to blogs and blogs belong to authors might make more sense.
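A minimal Python sketch of human-readable keys with embedded numeric IDs, plus bottom-up references where each child stores its parent's key. The `::` separator, zero-padding width, and attribute names (`blog_id`, `author_id`) are hypothetical conventions chosen for illustration. A plain dictionary stands in for the bucket.

```python
from typing import Dict, Tuple


def make_key(doc_type: str, ident: int, width: int = 4) -> str:
    """Build a human-readable key such as 'author::0101'."""
    return f"{doc_type}::{ident:0{width}d}"


def parse_key(key: str) -> Tuple[str, int]:
    """Split a key back into its document type and numeric ID."""
    doc_type, _, ident = key.partition("::")
    return doc_type, int(ident)


# Bottom-up references: each child document stores its parent's key.
docs: Dict[str, dict] = {
    make_key("author", 101): {"type": "author", "name": "Shane"},
    make_key("blog", 7): {"type": "blog", "title": "Hello NoSQL",
                          "author_id": make_key("author", 101)},
    make_key("comment", 3): {"type": "comment", "text": "Nice post",
                             "blog_id": make_key("blog", 7)},
}


def author_of_comment(comment_key: str) -> dict:
    """Walk comment -> blog -> author using direct key lookups only."""
    blog = docs[docs[comment_key]["blog_id"]]
    return docs[blog["author_id"]]


if __name__ == "__main__":
    print(make_key("author", 101))
    print(author_of_comment(make_key("comment", 3))["name"])
```

Because the parent key is stored on the child, answering "who wrote the blog this comment belongs to?" takes two key lookups and no scan. That is the kind of access-driven placement of foreign keys described above.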
So I get to rethink the model I use for storing the data, and where I put my foreign keys, based on how I access the data. The second thing to think about with NoSQL, besides relationships, is that JSON is a great, very flexible language for describing data objects. It lets me say I want an object of this type and an object of this other type, that those objects are related, and that I can access them. For example, the left-hand document database column shows a model where we store multiple documents, each related to the others, and the primary key we've chosen reflects that relationship. There are actually three documents here: a user document, an address document for billing, and an address document for shipping. I've stored those separately, much as I would in a relational database, and I can still do joins within Couchbase to combine the different objects; we'll talk more about joins in a moment. Alternatively, instead of making separate document types and storing each document separately, I can build a nested model: a single document, in this case a user document, in which I embed all the nested structures I need to describe the object completely. So as an application developer or designer, I have a choice to make: do I want separate, referenced objects, or nested objects within my JSON? Your decision matrix comes down to (a) what kind of relationship you're modeling and (b) how you typically access the data.
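A minimal Python sketch contrasting the two layouts just described: a referenced model with three documents linked by key, and a nested model with one document. The key names and the lookup counting are illustrative assumptions. Dictionaries stand in for a bucket, and the three lookups in the referenced case approximate the work a join has to do.

```python
from typing import Dict, Tuple

# Separate (referenced) model: three documents linked by key.
separate: Dict[str, dict] = {
    "user::0101": {"type": "user", "name": "Shane",
                   "billing_id": "address::0101::billing",
                   "shipping_id": "address::0101::shipping"},
    "address::0101::billing": {"type": "address", "street": "1 Main St"},
    "address::0101::shipping": {"type": "address", "street": "9 Elm St"},
}

# Nested (embedded) model: one document holding the same information.
nested: Dict[str, dict] = {
    "user::0101": {"type": "user", "name": "Shane",
                   "billing": {"street": "1 Main St"},
                   "shipping": {"street": "9 Elm St"}},
}


def load_user_separate(store: Dict[str, dict], key: str) -> Tuple[dict, int]:
    """Assemble the full user from three documents; returns (user, lookups)."""
    user = dict(store[key])  # shallow copy so the stored document is untouched
    user["billing"] = store[user.pop("billing_id")]
    user["shipping"] = store[user.pop("shipping_id")]
    return user, 3


def load_user_nested(store: Dict[str, dict], key: str) -> Tuple[dict, int]:
    """The full user is already in one document; returns (user, lookups)."""
    return store[key], 1


if __name__ == "__main__":
    print(load_user_separate(separate, "user::0101")[1], "lookups vs",
          load_user_nested(nested, "user::0101")[1])
```

If the application almost always needs the user together with both addresses, the nested layout saves lookups. If it usually touches only the user record, or the addresses are large and shared, the separate layout keeps each read smaller.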
You could use this slide as a kind of cheat sheet: for certain types of relationships, it's often more efficient to use nested documents, and for others, separate documents. But if I'm always reading only the parent, then I don't necessarily want to embed all the children, all the nested information, with the parent, because that costs me more I/O. So if I'm always reading and writing just the parent, I get more efficient I/O by storing parent and children as separate objects. If I'm always reading and writing the parent and the child together, why would I go to disk twice, or to two different places on disk, to find the data I want? There it makes sense to nest. So how you organize your data is mostly driven by what you're modeling, what kind of relationship it is, and how you typically access the data, and there may be cases where you combine both approaches. Imagine, for example, an application with blogs and comments on blogs, and we're deciding how to model the data. I've got blog entries in the hundreds, maybe thousands, so I'm definitely going to have an object for each blog entry. Now, do I want to store every single comment inside the blog entry itself? It really depends on what my data looks like. If I have a very controversial blogger, I might end up with hundreds of thousands or millions of comments, and I probably don't want all of those in a single blog entry. So I might say: I have threads, and a given thread might have a few dozen, maybe a hundred, comments in it, but not that many more. I might then decide to have blogs and threads as separate objects, and within each thread keep a nested set of the comments associated with that thread, and that might be a way for me
to optimize access to the data, since I'm typically showing a thread at a time, as well as to better model the relationship between blogs, threads, and comments. Once you've decided how to model your data, the next question is: with a NoSQL database, how do I get to the data and how do I manage it? We're going to look at three APIs: the key-value API, the query API, and the view API. The first thing you do when you talk to a database is establish a connection, and that's very similar to relational database technology: you're essentially telling the client to go talk to this node in the cluster. Couchbase in particular, and this is not necessarily true of all NoSQL databases, handles it this way. The client talks to one node in the cluster, and that node tells it about all of the other nodes in the cluster. The client driver builds an in-memory map of all the IP addresses and ports it needs to address any node in the cluster. It then automatically directs traffic to the appropriate node without going through proxies, routers, or anything else. Most of our customers use the native Couchbase SDKs to connect to the database and interact with it, although we do have a standard certified JDBC driver for customers who would prefer that. Once you've decided on your data model and connected to the cluster, the next question is: am I going to treat documents as plain strings, or as JSON objects? It's simple to treat them as strings, but then you're left to develop your own serialization and deserialization techniques inside your application, whereas if you treat them as JSON document objects you can use standard
JSON libraries to add, update, and remove attributes in the document directly, which generally results in less code, less overhead, and a lot more flexibility. So, we mentioned three ways to access the data: key-value, query, and views. We're going to look at key-value first. As the name implies, key-value is a very simple interface: here's a key, go get me the value from the database. It depends on the primary key, so all I can pass in to look something up is the object ID, or primary key, of the document. A write is basically: here's the key, here's the value, go write it. So we're dealing with an entire document and its associated key. In Couchbase and most other NoSQL databases this is really, really fast, because it's what they're optimized to do. But it's a document-at-a-time interface, and it can only use the primary key for reading and writing records. In this example, we're dealing with a full document; the operation here, for example, assumes that the entire address object is stored under that key. We can also deal with nested objects. If we chose a model with nested structures, we would get the record, unpack the attributes of that object, update them, and then push the data back to the database as an updated object. Couchbase also offers an alternative to reading a full document, modifying it, and writing the entire document back: you can do in-place modifications using the sub-document API. This is very convenient if, for example, you have very large documents and all you want to do is add an attribute, append something to an array, or delete a particular attribute. This is an example of the functions that would be available if you were going to use the sub-document API to modify an individual attribute within an existing document. The second way of accessing data that we talked
about was the ability to do queries. Couchbase's N1QL (pronounced "nickel") is based on standard SQL, and unlike some of the other vendors in the NoSQL space, essentially anything you can do in SQL you can do in N1QL. It's not 100% compatible, but it's based on SQL++, so a lot of the structures and commands you're used to seeing in SQL will be virtually identical in N1QL. For example, I would use a SELECT to get data from the database. Couchbase supports both inner and outer joins, and we've added an ON KEYS clause that allows the application developer to tell the optimizer which key to use when joining two documents. You'll notice here that the FROM clause in the second and third queries joins two different document types, and ON KEYS specifies which attribute within the document should be used for the join. Nested structures work differently. In the join examples, users and accounts are separate objects that I join together; in this example, users is a single object, and inside each user there's a series of accounts. N1QL lets you use an ANY ... SATISFIES expression in the WHERE clause to search inside an array or a set and qualify records based on the contents of that array or set, so it's an extension to SQL that lets you look inside nested objects. You can also use the query language to perform CRUD operations (create, retrieve, update, and delete) using simple INSERT, SELECT, UPDATE, and DELETE statements. Those absolutely work, and some of our customers use them. They will not be as fast as doing CRUD operations through the key-value API, though, because the key-value API uses the primary key, or object ID, to go straight to the data, whereas in N1QL you can do CRUD operations based on any value in the document. So it is more flexible, but it's not going to be as efficient as the key-value API for CRUD-type operations. You can also use higher-level query languages like LINQ that essentially take a
specification of a query and turn it into N1QL, and that's essentially how our JDBC and ODBC drivers work as well: they take a JDBC query and turn it into N1QL. In this case you're specifying a query using LINQ, and the LINQ provider turns that into a N1QL query. You can't really have queries without indexes, so in Couchbase we support single-attribute indexes, compound indexes, functional indexes, and partial indexes with a WHERE clause. We've also recently added two optimizations: memory-optimized indexes, and what we call covering indexes, which allow a query to be completely covered by an index, so we never actually examine the source document; we resolve the entire query within the index itself. The third access method we talked about was views. Essentially, views are a way of summarizing or transforming data while automatically updating the result every time a mutation, or change, happens in the source document. Customers are used to thinking about a view as a stored query definition that gets re-executed: if I defined a view as SELECT state, COUNT(*) FROM users GROUP BY state, I'd get something that looks like the table on the right-hand side, and that query would run every time I call the view. In Couchbase, we use the map-reduce code you see on the left-hand side to produce the view output, again as JSON, containing the information we want, and since that output is itself a document, you can run all kinds of queries on top of that summarized data.
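The map-reduce view described above can be illustrated with a small in-memory sketch that produces the "count of users by state" result and keeps it current as documents change. This is an assumption-laden toy, not Couchbase's view engine: in Couchbase the map function is written in JavaScript and calls emit(), with built-in reduce functions such as _count, whereas here a Python generator plays the map role and a Counter stands in for the reduce. The names CountByStateView, upsert, and remove are hypothetical, and this sketch updates the view synchronously on every write, while the real indexer's update timing may differ.

```python
from collections import Counter


def map_fn(doc):
    """Map step: emit the user's state as the key (the value is implicitly 1)."""
    if doc and doc.get("type") == "user" and "state" in doc:
        yield doc["state"]


class CountByStateView:
    """Materialized count-by-key view, kept current as documents change."""

    def __init__(self):
        self.counts = Counter()

    def on_mutation(self, old_doc, new_doc):
        # Retract whatever the old version emitted, then add what the new one emits.
        for key in map_fn(old_doc):
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]
        for key in map_fn(new_doc):
            self.counts[key] += 1

    def query(self, key):
        # Reading the view is a lookup; no counting happens at query time.
        return self.counts.get(key, 0)


store = {}  # hypothetical document store: doc ID -> JSON-like dict
view = CountByStateView()


def upsert(doc_id, doc):
    view.on_mutation(store.get(doc_id), doc)
    store[doc_id] = doc


def remove(doc_id):
    view.on_mutation(store.pop(doc_id, None), None)


upsert("user::1", {"type": "user", "state": "CA"})
upsert("user::2", {"type": "user", "state": "NY"})
upsert("user::3", {"type": "user", "state": "CA"})
upsert("order::9", {"type": "order", "total": 12.5})  # ignored by the map step
print(view.query("CA"))  # 2
upsert("user::2", {"type": "user", "state": "CA"})  # user moves from NY to CA
print(view.query("CA"), view.query("NY"))  # 3 0
remove("user::1")
print(view.query("CA"))  # 2
```

The design choice is retract-then-emit: because each mutation first subtracts what the previous version of the document contributed, updates and deletes stay correct without rescanning every document.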
In this case, for example, we've created the view and we're querying it: get me the count of users who are in the state of California. Instead of actually doing the count, the query just goes to the view and retrieves the result, and if new users are added to the system, the view is updated automatically. The view is always kept up to date. So when you think about accessing data, think about which type of data access will give you the best functionality and performance for the requirements of your application. I'm going to talk briefly about installation and scalability, but I want to get to questions, so we're going to go through this fairly quickly. Installing Couchbase is very straightforward: you download it and install it, bring up the web-based console (this is an HTTP session), define a few things about the server, and save it. Then, as needed, you add new nodes to the cluster. Adding a node is essentially a download and install, followed by joining that node to the existing cluster. Once the node has joined, you tell the cluster to rebalance (there's a big Rebalance button in the UI) so the data is redistributed across the new nodes. This is how you both install the cluster initially and scale it: install one node, which gives you a single-node cluster, then add nodes to the cluster and rebalance as needed.
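The client-side cluster map described earlier (the driver knows which node owns which data, so it needs no proxy) and the rebalance step above can be sketched together. This is a simplified model under stated assumptions: keys hash with CRC32 into a fixed number of partitions ("vBuckets"), the partition is the unit that moves between nodes, and rebalancing moves as few partitions as possible to even out ownership. NUM_VBUCKETS = 64, the node names, and the quota-based placement policy are illustrative choices, not Couchbase's actual rebalance algorithm; real clusters use more partitions (1024 is the commonly cited figure), and the real client's hashing details may differ.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 64  # illustrative; real clusters use more partitions


def vbucket_for(key: str) -> int:
    """Hash a document key to a partition (simplified CRC32-modulo scheme)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(nodes: list[str]) -> dict[int, str]:
    """Initial placement: deal partitions out to nodes round-robin."""
    return {vb: nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)}


def route(key: str, cluster_map: dict[int, str]) -> str:
    """What a smart client does: find the owning node locally, no proxy hop."""
    return cluster_map[vbucket_for(key)]


def rebalance(old_map: dict[int, str], nodes: list[str]) -> tuple[dict[int, str], list[int]]:
    """Even out partition ownership after adding nodes, moving as few as possible.

    Assumes nodes are only added: every owner in old_map must appear in nodes.
    """
    quota = {n: NUM_VBUCKETS // len(nodes) for n in nodes}
    for n in nodes[: NUM_VBUCKETS % len(nodes)]:
        quota[n] += 1  # spread the remainder over the first few nodes

    owned = {n: [] for n in nodes}
    for vb in range(NUM_VBUCKETS):
        owned[old_map[vb]].append(vb)

    surplus = []  # partitions held above a node's quota
    for n in nodes:
        while len(owned[n]) > quota[n]:
            surplus.append(owned[n].pop())

    new_map, moved = dict(old_map), []
    for n in nodes:  # hand surplus partitions to nodes below quota
        while len(owned[n]) < quota[n]:
            vb = surplus.pop()
            new_map[vb] = n
            owned[n].append(vb)
            moved.append(vb)
    return new_map, sorted(moved)


old_map = build_cluster_map(["node-a", "node-b"])
new_map, moved = rebalance(old_map, ["node-a", "node-b", "node-c"])
print(len(moved))  # 21
print(Counter(new_map.values()))  # partitions per node after the rebalance
print(route("user::1", new_map))
```

Only the moved partitions change owner; every other key keeps routing to the same node, which is why adding a node does not require reshuffling the whole data set. After a rebalance, clients need the updated map, which is the role the server-supplied cluster map plays in the talk.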
You can also define multiple data centers, which essentially means creating multiple clusters with automatic, bidirectional replication and data synchronization between them. So you can have a cluster in New York and a cluster in London, with the two clusters talking to each other and exchanging data in the background. Couchbase also offers a choice between scaling out, the model at the top, where every node in the cluster runs the same set of services and each node you add contributes incremental capacity, and scaling up, which means designating specific nodes for specific services. An application with lots of indexes and lots of queries might have a bigger query and index service and a smaller data service than, for example, an application that manages lots and lots of data but has fewer concurrent queries or fewer indexes defined over that data. The final topic I wanted to touch on was migrating your data. So you've made the decision to go to NoSQL and you're moving from relational: how do you think about data migration? There's a little cookbook here that covers the requirements for moving data over. Paying special attention to data governance is important if you need to know where the data came from. Next, pick your strategy: whether this is an incremental load or a one-time load, and whether it's single-threaded or multi-threaded. Then you pick your tools for the migration, either existing tools like Informatica or Talend, or tools you build yourself using, say, PHP, Python, Hadoop, Spark, and so on. Probably my strongest recommendation is that if you're planning a data migration, especially a large one, plan for failures and restarts. The biggest problem customers run into is building data migration plans that don't account for: oh, I ran out of time, or I ran out of
resources, or a piece of hardware failed and now I have to restart. So building an infrastructure or a plan that includes failure modes and recovery from them is fairly crucial. There's also a section up there on keeping it simple with Couchbase: if you're just looking for the simplest way to get data from your relational system into NoSQL, here are a couple of ideas on how to approach that and get it working at your desk. There's also an interesting question we hear more and more from customers who use both relational and NoSQL for the same application: how can I synchronize data being written to NoSQL, or data being written to relational? Where's my repository of record, my database of record, and how do I get data over to the other side? Here are a couple of approaches we've seen in our customer base. When Couchbase is the repository of record, changes to Couchbase might be sent via our Database Change Protocol (DCP) stream to Kafka, and Kafka in turn sends them to a relational database; that's certainly one way to approach it. When the roles are reversed and the relational database is the repository of record, customers will often use a connector such as Oracle GoldenGate or SQL Anywhere to publish data to NoSQL on an ongoing or intermittent basis. So before we go any further, let's go ahead and go to questions. Shannon, did any questions come in while we were going through the slides? We have quite a few that have come in already, and of course the most common questions are about the slides and the recording. Just a reminder: I will send a follow-up email by end of day Friday with links to both the slides and the recording, and anything else requested throughout. So, getting to it, David: what are the qualifications we should look for in a NoSQL DBA, or is that an outdated function?
It's not really an outdated function. Managing schemas, managing performance, and managing clusters are not problems that go away with NoSQL; they just change. So understanding database organization (not necessarily database internals), and certainly the requirements of the corporation in terms of data governance and how data is managed and made available to multiple applications, is probably a key aspect of the role. In the relational world, the database controls the data: it controls the schema, it controls everything. In the NoSQL world, in many regards it's the application developer who controls the data. So the role of the DBA has changed, to some extent, from enforcer to curator, and the need for a database administrator to understand what's in the database, where it came from, and how other applications can access it is still very important, even in the NoSQL world. Love it. So why is JSON such a critical requirement? JSON has become the standard for data transmission between applications as well as for data persistence. One of the biggest problems in the relational world is that applications manage data as objects, but then they have to transform those objects into something else in order to store them in the database. It's this impedance mismatch between the way the application sees the data and the way the database sees the data that causes all kinds of performance and scalability problems. One of the nice things about NoSQL, and especially about JSON, is that the application and the database can both look at and manage the data in the same way. The application isn't transforming its objects into something else in order to send them to the database, and the database isn't constantly transforming them back, and taking that transformation out of the equation makes for a much more flexible and much more performant overall architecture. And where are traditional relational database systems still relevant, apart from simply supporting legacy applications?
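The impedance-mismatch point above, and the earlier advice to treat documents as JSON objects rather than raw strings, can be made concrete with Python's standard json module. In this sketch the User and Address classes and the three table names in to_rows are hypothetical: the same object serializes as one JSON document with its nesting intact, while a relational-style mapping splits it into rows across several tables that would have to be joined back together.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Address:
    street: str
    city: str


@dataclass
class User:
    user_id: str
    name: str
    address: Address
    tags: list[str] = field(default_factory=list)


def to_document(user: User) -> str:
    """Document model: the whole object graph becomes one JSON document."""
    return json.dumps(asdict(user), sort_keys=True)


def from_document(text: str) -> User:
    """Rebuild the object; only the nested dataclass needs explicit handling."""
    data = json.loads(text)
    data["address"] = Address(**data["address"])
    return User(**data)


def to_rows(user: User) -> dict[str, list[tuple]]:
    """Relational-style mapping of the same object into three hypothetical tables."""
    return {
        "users": [(user.user_id, user.name)],
        "addresses": [(user.user_id, user.address.street, user.address.city)],
        "user_tags": [(user.user_id, tag) for tag in user.tags],
    }


ana = User("u1", "Ana", Address("1 Main St", "Oakland"), ["vip", "beta"])
doc = to_document(ana)
print(doc)
print(from_document(doc) == ana)  # True
print({table: len(rows) for table, rows in to_rows(ana).items()})  # {'users': 1, 'addresses': 1, 'user_tags': 2}
```

The document round trip needs no mapping layer beyond the JSON library, whereas the row mapping has to be maintained by hand (or by an ORM) and reversed with joins on every read.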
If I was building a complex application that required analytics in the database, or required 10-way joins or star joins, I would probably still do that in relational. Relational databases are going to do what they do in terms of transaction atomicity, multi-table configurations, and analytical functions inside the database; it's what they've been built to do for 30 years. If I was looking for a product that let me store data in lots of tables and then do analytics on it inside the database, I would probably be looking at a relational database. That's what Oracle Database does well. So I wouldn't replace a relational database with NoSQL if what I'm fundamentally looking for in my application are relational database capabilities. If those capabilities are more than I need, and what I really need is fast data management for reads, writes, and simple joins, then I would look at NoSQL every time. Perfect, thank you. The next couple of questions we get quite often, and I've only got a few minutes left, but let's see if we can get through them. The first one is: I'd like to understand flexible schemas, but can you talk about data quality control in a NoSQL stack? It's an interesting challenge, and it hasn't gone away; it comes up often when I go to conferences and talk to folks. Because the enforcement of structure, data cleansing, and data governance has moved outside of the database and into the realm of the application developer, there's still a challenge, although our understanding of it has progressed. Some of the things you're seeing in NoSQL databases now are schema inference engines. Couchbase recently released one, where the database can look at the documents it contains and say: I have documents where 90% of them look like this.
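The schema inference idea just described ("90% of my documents look like this") can be shown with a toy version: group documents by their top-level shape and report each shape's share of the collection. This is an illustration of the concept only, not any product's inference engine; shape and infer_shapes are hypothetical names, it looks only at top-level fields, and it reports Python type names rather than JSON type names.

```python
from collections import Counter


def shape(doc: dict) -> tuple:
    """Describe a document's top-level structure as sorted (field, type name) pairs."""
    return tuple(sorted((name, type(value).__name__) for name, value in doc.items()))


def infer_shapes(docs: list[dict]) -> list[tuple[dict, float]]:
    """Group documents by shape and report each shape's share, most common first."""
    counts = Counter(shape(d) for d in docs)
    return [(dict(s), n / len(docs)) for s, n in counts.most_common()]


docs = [{"name": f"user{i}", "state": "CA"} for i in range(9)]
docs.append({"name": "user9", "zip": 94105})  # one outlier with a different shape
for fields, share in infer_shapes(docs):
    print(f"{share:.0%} of documents look like {fields}")
```

A real engine would also recurse into nested objects and arrays and sample large collections rather than scanning everything, but the output has the same character: a ranked list of observed shapes instead of a single declared schema.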
So databases, even NoSQL databases, are starting to get smarter about providing tools that summarize the structure inside the flexible documents in the database. Some products also offer the ability to say: these fields are required, and these fields are optional. So some schema is starting to seep into NoSQL products. It's still an evolving space, and I would say that data governance, data cleansing, and data standardization, if you will, are still processes that are evolving. What tends to be true about NoSQL, though, is that because it's used in SOA-like applications, the structure of the data often matches what the application requires, and that's good enough. The structure of the data I use to persist a web session is going to be different if I'm on eBay, or on PayPal, or on some other property, but the requirement to store session information doesn't change. So it's a combination: yes, this is an evolving space, and many of the applications that use NoSQL involve application-specific formats and use cases where a data model that is flexible and adapts to the application is much more useful. Thank you. One more question, and this is likewise one of the most common questions we get about NoSQL databases, in addition to the data quality one: how do you model it? What tool is used to model the data, and how is the model communicated, approved, and maintained, or is there a tool? There aren't a lot of tools yet; it's an evolving space. What's happening now is that data visualization tools are starting to emerge in the different NoSQL products that let you look at the schema and then visually, graphically say: I want to build a query that grabs this attribute, that attribute, and this other attribute, and I want to put these qualifiers on them.
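The "these fields are required, these fields are optional" idea above, applied to the web-session example from the answer, might look like the following application-side check. The field names and the REQUIRED/OPTIONAL dictionaries are hypothetical, and this is not any product's schema feature; it shows the partial-schema approach in which listed fields are checked while unlisted fields are still allowed, so the documents stay flexible.

```python
REQUIRED = {"session_id": str, "user_id": str}  # hypothetical session fields
OPTIONAL = {"cart": list, "locale": str}


def validate(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for name, expected in REQUIRED.items():
        if name not in doc:
            problems.append(f"missing required field '{name}'")
        elif not isinstance(doc[name], expected):
            problems.append(f"'{name}' should be {expected.__name__}")
    for name, expected in OPTIONAL.items():
        if name in doc and not isinstance(doc[name], expected):
            problems.append(f"'{name}' should be {expected.__name__}")
    # Fields not listed at all (like "theme" below) are accepted as-is.
    return problems


print(validate({"session_id": "s1", "user_id": "u1", "theme": "dark"}))  # []
print(validate({"session_id": "s1", "cart": "oops"}))
```

Returning a list of problems rather than raising on the first one makes it easy to log or quarantine bad documents during cleansing instead of rejecting them outright.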
So I think mostly what's happening right now is visualization of query statements and of querying the database. For the modeling itself, there aren't a whole lot of tools. I think that's a space many of the current relational tool builders will probably start to explore, to possibly normalize, and certainly visualize, what's in a NoSQL database. Unfortunately, because of the flexibility of JSON, people still have lots of options and may make different decisions, as we talked about earlier in the slides, based on the types of relationships and the primary access patterns for those relationships. David, thank you so much, but I'm afraid that's all we have time for today; we're right at the top of the hour. Thanks to our attendees for being so engaged in everything we do and for all the great questions. I'll get the questions we haven't had time for over to you and the rest of the team, to see if there's anything you want to address in the follow-up. So thank you so much, and thanks to Couchbase for sponsoring today's webinar. Again, just a reminder: I will send a follow-up email by end of day Friday with links to the slides and the recording of this session. I hope everyone has a great day. Thank you, David. Thank you.