Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor of DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, Top 10 Enterprise Use Cases for NoSQL, sponsored by Couchbase. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DATAVERSITY. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar.

Now let me introduce our speaker for today, Shane Johnson. Shane is a Senior Product Marketing Manager at Couchbase. He is a former developer and evangelist with experience in Java and distributed systems. He has consulted with organizations in the financial, retail, telecommunications, and media industries to design and implement system architectures that relied on distributed systems for high-performance data access. And with that, I will give the floor to Shane to get today's webinar started.

Thank you, Shannon, and hello, everyone. Just a small agenda here. We're going to open with a little bit of a description of Couchbase and some of the trends that we see driving enterprises to consider NoSQL databases, a little bit about our product and how it's helping, and then we'll spend the majority of the time on the use cases themselves. And I promise to go very quickly through the boilerplate here. Couchbase: NoSQL is what we do. We have offices in North America, EMEA, and APAC, with tremendous growth, et cetera, et cetera.

As for what's driving enterprises to not only consider but evaluate NoSQL, it's a lot of the same things that were challenges for the big internet companies: web and cloud, mobile in particular, and big data, certainly for everyone. And just a few snippets here to give you an idea of what that means. AOL Advertising is handling billions of impressions per day, so they need a solution that gives them not only high-performance but scalable access to that data. PayPal is storing over 10 terabytes of data in over one billion documents. One of our largest customers, whose use case we'll talk about in a little while, has 800 million user profiles stored in Couchbase Server. I believe today it's approaching a billion; it may have even exceeded one billion by now. And, no surprise, there's the ability to store unstructured and semi-structured data, largely driven from a social environment. I think these are probably fairly standard and fairly common.

The other thing I wanted to touch on is NoSQL and Hadoop. It's a conversation I've been having more and more lately. In particular, we're seeing more and more of our own customers pairing Couchbase Server with Hadoop. We have one use case in particular that we're going to talk about today where these two are working closely together to solve a particular problem, but it's actually a common theme across multiple use cases. Even though we might not call it out up front, there's a pretty good chance that in these use cases, some of these customers are using Hadoop in the background as well. But from the most basic point of view, they're using Couchbase Server as their NoSQL database for operational workloads.
We're talking about interactive applications, web or mobile, with millions of customers or consumers, where performance is extremely important. And on the other side you have this Hadoop world, which is largely analytical, in some respects offline, and tailored to a different audience. When it comes to a NoSQL database, we're primarily concerned with our users and our customers. When it comes to Hadoop, we're more concerned with data scientists and business analysts; it's kind of an internal versus external type of environment. But we'll talk a little bit more about this later.

And ultimately, we had the original trends: web, mobile, cloud, big data, et cetera. The reason those trends are driving enterprises to NoSQL, and why it was first experienced by internet companies, is that relational databases are reaching a point where they're having difficulty keeping up with the requirements. We're seeing customers who now need sub-millisecond response times. Not a couple of seconds, not a second, but less than a millisecond. The scalability is becoming unprecedented; as we just saw on one of the first slides, PayPal is storing over 10 terabytes of data, and that's not the largest amount by any means. So it's the ability to scale to not only store more data, but to serve more users as well. Speed, of course, is important, and on one hand we talk about the ability to access that data very quickly, but the other requirement is to be able to ingest that data just as quickly. If you think about all those mobile users out there, or the Internet of Things, data is being generated by both machines and users at an extraordinary rate, and you need a database that can ingest that data fast enough. And then there's the move away from structured data into more unstructured data: abandoning the schema and the fixed data model to give you much more flexibility, not only in the types of data that you can store and manage, but in your ability to build new features or new services, and to do so more quickly. There are a couple of quotes in here. I'm not going to go into them, but they're simply supporting the notion that relational databases are hitting their limit, and for those enterprises that are approaching that limit, they're looking at NoSQL to take the next step and to continue. On the flip side of it, from a customer point of view, it's the scalability and the performance. Almost every time, the single biggest drivers are going to be scalability and performance.

Here's a little bit about where we're positioned right now, and this is probably not entirely unique to us, but certainly performance and scale is the big one. High availability, especially in enterprise environments, is equally important. We talked about the flexible data model and all the benefits that's going to give you. And then, from a scalability perspective, it's not just the raw ability to scale; it's the ability to do it with ease and to do it economically. Certainly there are environments where, with unlimited money thrown at something, you can make it scale. But more important is: can you do it with less money, less time, less overhead? And that's one of the big things that we're pushing.

So, Couchbase Server: how are we solving some of these problems? To give you a little bit of an introduction here, we have two products. We have Couchbase Server, which is your back-end NoSQL database, and we have Couchbase Mobile.
If we start on the left side here, Couchbase Server is a key-value store and a document database rolled up into one, with an integrated cache as well. So this single database can effectively be a high-availability cache, a key-value store, a document database, or any combination of those. The other side of it is Couchbase Mobile, which has two components. There is an embedded database, which can run in mobile apps on Android, iOS, and Windows, or even on devices running Linux and Java. So you have this local, native database that you read and write to, and then we have the Sync Gateway, which is basically the bridge between Couchbase Server, sitting in your data center or in the cloud, and Couchbase Lite, which is sitting on your mobile phone or device. It's kind of a complete stack that goes from one end of the spectrum all the way to the other.

Some of the unique things, which we'll get into a little bit later: there's only one node type, so a single instance of Couchbase Server is responsible for all of the services that you require. Replication is a key part of availability. We talked about high availability being particularly important, and replication is a means by which we provide it. Another one that we're going to see pop up in a number of use cases, and in particular with some of the customers that fall into those use cases, is cross data center replication. Some of our very largest customers are operating multiple data centers and need to be able to use all of them. One of the ways we accomplish that is memory-to-memory replication. So whether it's a cluster within a single data center, or a cluster deployed across multiple data centers, that replication is essentially memory-to-memory and fairly quick.

Multiple industries, multiple companies, multiple use cases: there's a very broad degree of adoption that we're seeing here. The use cases that we're going to get into fall within these particular industries, and we'll reference some of these customers.

Before we get into the really good stuff, I wanted to spend just a moment on Couchbase Server 4.0. That is our new major release, coming very soon. And in addition to all of the use cases that we're going to talk about today, there are going to be new use cases that come out of Couchbase Server 4.0, so we'll give you a little sneak peek at what's going on here. Of the three major features, multi-dimensional scaling is particularly compelling from an operational standpoint. If we look at the three main services any database has to provide, they're the ability to read and write data, to index the data, and to query the data. What we've done is pull these functions apart into independent services, so when you deploy a cluster, you can specify which services run on which servers. It's particularly fascinating from an efficiency point of view and a scalability point of view, and we'll have more on that in the future. SQL for documents is the one that I think leads up to the use cases; that is a query language based on SQL, but for JSON documents, and we'll have one slide to talk about that. And ForestDB is a new high-performance storage engine, engineered specifically for multi-core processors and SSDs; we're going to be introducing ForestDB in Couchbase Server 4.0 to store the indexes. So, SQL for documents: a couple of slides here as a preview, along the lines of the sketch below.
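To give you the flavor first, a statement might look something like this. This is a minimal, hedged sketch against a hypothetical users bucket, so the exact syntax in the shipping 4.0 release may differ:

    SELECT name, email
    FROM users
    WHERE address.city = 'Mountain View'
    ORDER BY name

The interesting part is the nested path, address.city, which reaches inside each JSON document the way a column reference reaches into a row.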
When we say SQL for documents, that is precisely what we mean. The types of SQL statements you've seen before (SELECT fields FROM table WHERE field equals something, GROUP BY, all those types of keywords) are available to you. In addition to this notion of raw SQL, there's also a query API: kind of a fluent API, a DSL, that you can use to build queries. Or certainly you can simply submit SQL statements, and there are JDBC and ODBC drivers as well as framework integration. And just to give you a sneak peek at what it looks like, this is it. You have two approaches: you can simply submit a SQL string, or, as I said, you can use the DSL to build your query. So everything you could do in a relational database from the query point of view is what we're doing, but we're applying it to a document database.

And a final note on that: we just released a developer preview last week. There are two URLs here. You can check out the "Coming in Couchbase Server 4.0" page, where you can get more information on those three things I talked about: multi-dimensional scaling, SQL for documents, and ForestDB. Or you can go straight to the documentation, which will tell you where to download it, how to get started, and walk you through the quick start. We certainly encourage everybody to check it out and provide feedback.

Now on to the use cases. There are certainly a lot of use cases out there; I don't think this represents every possible use case for a NoSQL database, but these are the 10 most common that we see from our customers. Profile management: by and large, managing user profiles. Catalogs: these could be anything, but generally product or service catalogs. The 360-degree customer view is particularly interesting; we see that in enterprises where customer data is stored in different silos throughout the organization, and what they're really trying to do is aggregate all the data related to a customer into a single place so that they can have a single view of it. Real-time big data is a fun one, and one I'm very passionate about talking about: that's bringing Hadoop and NoSQL alongside each other, and in particular doing it in a real-time fashion. Personalization is a very broad term, but essentially what we're getting at is: how can we optimize the experience for customers? It's the ability to collect information on users, whether it's demographics, behavior, search history, purchases they've made, or preferences they have, and use that data to deliver something that's very unique to them. Content management is probably self-explanatory. Digital communication is particularly exciting; we'll talk about some of those. The Internet of Things and mobile applications are kind of in the same broader camp. And then fraud detection, which is particularly interesting. We'll go through each of these 10.

We'll start with profile management. As I said, it's almost central to everything you do. If you have a web or mobile application, the people that use that application register and have a user profile, and you can't really do much without it. From a business perspective, that user profile has to be immediately available to your users. When they sign in, they don't expect to wait; there's this immediate assumption that once I hit that button, I'm ready to go. The other one is evolving profiles: over time, new attributes are added to those profiles.
Maybe it's because new features are being rolled out, or new services, or we discovered that there's something new we can learn about users, and if we can know that, we can improve the service. And there's a growing customer base: whether it's thousands to millions to billions, the infrastructure has to be able to accommodate that type of growth. If we look at it from a purely technical perspective, it's the low latency (you need immediate access to that user profile), the ability to do away with a schema or fixed data model so that you can speed up how quickly that user profile can evolve, and fast, easy scalability: as the user base grows, it shouldn't be difficult to scale the database to accommodate that growth.

When we talk about what we're doing from a Couchbase point of view, a big one is our integrated cache. As I said at the very beginning, we are a key-value store and a document database, but we integrate a cache, and that integrated cache gives us really high performance and really low latency. It is a document database based on JSON, so there is no schema, and people can, you know, modify their data model on the fly. And there's what we call push-button scalability, particularly on commodity hardware: it is easy to add new nodes. You start a node, you go into the administrative interface for the cluster, and you say, yep, I want to add that node, please do so. It is very quick and very easy.

The customer that we're going to talk about for this one is a Fortune 50 company, and it was the one hinted at back on the first slide. They have a billion user profiles that they're storing. They started out storing those in Oracle, and they have multiple data centers, so they were using GoldenGate to replicate that data between data centers. But, as we also mentioned, they were running into scalability and performance issues, let alone cost issues as well. They had considered other NoSQL databases but moved in the direction of Couchbase Server: first, to begin to offload data from Oracle and let it be served by Couchbase Server; then to turn on cross data center replication so they could start to phase out GoldenGate. And once you have all the data in multiple data centers replicated between each other, you can phase out the Oracle database underneath at the end. So that's a quick overview of what happened there.

Fraud detection. From a business perspective, I think we can all safely assume why fraud detection is so important. But this is an environment where the customer data is changing very quickly: new accounts, new detection rules, new transactions. So it's fairly safe to say that they're getting a lot of new data very quickly, and that data can change. And there's real-time responsiveness: I can't exaggerate the requirement for speed, because they need to check for fraud when you make a purchase. Not the day after or the week after, but the moment you're making a purchase, they need to see if it's possibly a fraudulent purchase. And there's a very high volume of interactions, which we'll get to when we look at the example. From a technical perspective, there's again that flexible data model: they know they're getting new rules, new types of transactions, and new types of customers and accounts, and they need to be able to support that. Low latency again, which is to say performance is critical, and high throughput. These are the cornerstones in this case. So again, they're going to benefit from a flexible data model.
They're going to benefit from the integrated cache, which in particular is going to give them the very high throughput and low latency they were looking for, and from the scalability. The customer behind this particular use case is one of the largest fraud detection platforms in the world. Their customers are essentially the largest financial institutions around, and they process 65% of the world's credit and debit transactions. So, to give you an idea of the scope of what's going on: the throughput is extraordinary, and the latency is important. We talked about a purchase. When you make a debit or credit card purchase at the store or at the restaurant, in the milliseconds it takes to make that purchase, they're checking it for fraud beforehand. So certainly latency and throughput are extraordinarily important for these guys.

Our third use case is the Internet of Things, and this is a fascinating area to be in right now. The objectives might be a little bit different, in so much as rather than existing applications or services being pushed to their limits, the Internet of Things is actually opening up a world of new services and new products and new opportunities. Everyone is just finding new ways to do it right now. The area that we're looking at in particular requires people to ingest new types of data, and fast-evolving data, particularly when you look at the smart home or the industrial Internet. We're talking about new devices popping up on a daily basis, and those new devices are generating new data. So if you're in a position where you want to begin collecting all the data being generated across multiple devices, that data is not only different between devices, but new devices are also generating new data. You need to interact with numerous devices, sometimes disconnected; we'll talk about that a little bit later as well. So, from a concurrency perspective, we're not looking at the Internet of Things as connecting to one thing. You're connecting to thousands if not millions of things. And guess what? Those thousands or millions of things are generating massive data sets, on the order of billions of data points. There are very, very high requirements here. From a technical perspective, there's certainly the flexibility, because of all the new types of devices, and the throughput, because of the sheer number of them, as well as new requirements that have to do with synchronization. It's one thing to create data while you're browsing on your laptop. It's another thing if you're generating data from a mobile phone where you're out of service, or in the middle of a field of windmills, or on an oil rig in the ocean: you can't always rely on guaranteed Internet access. So we do need some sort of solution that can support online as well as offline. And from a scalability perspective, it's generating whole new degrees of scale.

Why is Couchbase chosen in this particular use case? Again, with the flexible data model, you're going to be able to adapt to new devices or new firmware or software upgrades very quickly, much faster than if you had to change the schema on a relational database. As I mentioned at the very beginning, Couchbase Server and Couchbase Mobile, when combined, give you the ability to not only handle all these devices and all this data, but to do the synchronization for devices and machines that may not be connected, or may not always be connected. And there's the push-button scalability.
We see environments that start off small with respect to the number of things and very quickly grow to many, many, many things.

Product catalog. From a business perspective, we can talk about things like being able to cross-sell, maintaining inventory, or being more efficient in how you do so. They need to store lots of different types of data; we can see some examples here, from SKUs to part numbers to metadata. But it can be tricky in that not all products have the same attributes. Sometimes the attributes are different, sometimes the values are different, and you need some flexibility there. Updates can come very, very quickly, whether you're changing the inventory, adding a new product, or putting something on sale. This is an environment where the data isn't so much reference data; it's actually much more dynamic. And the customer experience is massively important: if you look at e-commerce, or even in-person retail, you need to be able to find what you want and purchase it very, very quickly. The technical requirements, and some of these are going to be a little repetitive here, are certainly the flexible data model (as we just mentioned, they're going to need it because this catalog is very dynamic and very diverse in what it stores), very high read-write throughput, low latency access (which is important to that customer experience), and the ability to store large volumes of data. From a Couchbase perspective, once again, it's a document database with integrated caching and very good scalability. And a good example here from a customer perspective is Tesco, the largest retailer in Europe. Couchbase Server is used for many use cases there, but in particular they're managing millions of products in this catalog. It's always changing quickly, and they have a very, very large number of users and a large amount of data.

Digital communication is pretty neat; essentially, you are connecting customers, employees, friends, whatever the nature of that relationship might be, online, to hold a conversation. The datasets are massive. I mean, if you think about how many text messages you and everyone you know are sending every day, amplify that for the people that are doing this online. Performance: nearly instant, right? It's impossible to hold a conversation if those messages are delayed in any way. And zero downtime: these companies, which pull together millions and millions of users having real-time conversations, can't afford for the system to simply be down for any period of time. The negative impact and cost is extraordinary. So, from a technical perspective, it's a little bit different in that, while they do need the scalability as well as the low latency, it's the 24x365 availability that is extremely critical here. As we mentioned before, we have the integrated caching and scalability, and in this particular use case, which we'll talk about a little bit more with the customer, it's the replication and the cross data center replication that are key to availability. You need it within a particular data center, but for a company like this, operating on a much larger scale and with multiple data centers, it's important to have that cross data center replication so that you can tolerate the failure not just of a node or a rack, but of an entire data center. And the customer that we can talk about for this one is LivePerson.
So, if you've ever been to a website where a little pop-up form showed up with the picture of someone's face saying, you know, can I help you? Are you stuck? What can I do for you today? That's probably LivePerson. That is what they do, and that's essentially real-time chat. They're chatting with millions of people who are on these websites. They have about 85 enterprise customers, so those very big, popular websites are LivePerson customers. They do, I think, over 22 million engagements per month. They're bringing in terabytes of data and tracking billions of data points around those users. The sheer scale is extraordinary. Another good example in this area is Viber. They have about 600 million users (I think a little bit more now), and they generally have about 200,000 concurrent users at any point in time. So, again, the scalability is extraordinary.

Customer 360-degree view. We mentioned this one a little at the beginning. Essentially what's happening is that it's difficult to take advantage of the knowledge you've gathered on customers when it is bits and pieces spread out across the organization in different databases or different data silos. What these customers are really looking to do is aggregate all that data into one single place; then you can begin to interact with it, analyze it, and learn from it. If we look at some of the technical requirements, there's certainly data model flexibility, and this should be obvious in the sense that if you want to take data that lives in multiple repositories and aggregate it, you're going to need a lot of flexibility in your model. You can't simply have a predefined, fixed model with the assumption that everything will fit into it. Some customers may have a lot of data, some may have a little, and they may not share the same data. The flexibility is critical. Integration with Hadoop, from what we see with our customers, is particularly interesting here: they can begin to aggregate that data and then push it into Hadoop to learn about those customers. And there's low latency as well. So in this particular case, the JSON model was particularly helpful, as were the Hadoop integration through streaming and the integrated caching. This was a Fortune 200 global brand, a global apparel company, if you will, that was building a direct-to-consumer online business. So they were beginning to sell directly to those consumers, and the workload was continuously growing. They were just starting this venture, and sure enough it was picking up momentum. The number of online interactions was significantly increasing, so they needed a certain amount of scalability and a way to consolidate customer data so it could be accessed not only by external, customer-facing applications but by internal applications as well. And as they moved from a more traditional infrastructure to a cloud-based, elastic infrastructure, that was the moment when they looked at Couchbase to be that scalable database.

Personalization. As I mentioned before, there's a moment of truth where you have an opportunity to engage a user, but the only way you can really take advantage of that is if you know about them. If you have some information about them, you can offer them the right discount, the right promotion, or the right related product, and that can increase the chances of making a successful purchase, in this particular example.
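So "knowing about them" might boil down to a profile document along these lines. This is a purely hypothetical sketch, with made-up field names, of the kind of demographic and behavioral data I'm describing:

    {
      "type": "profile",
      "userId": "u-12345",
      "demographics": { "ageRange": "25-34", "region": "US-CA" },
      "preferences": ["running", "travel"],
      "recentClicks": [
        { "url": "/shoes/trail", "ts": "2015-03-02T18:04:11Z" },
        { "url": "/shoes/road", "ts": "2015-03-02T18:05:40Z" }
      ]
    }

Because it's schemaless JSON, a new attribute, say a loyalty tier, can show up on some profiles and not others without a migration.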
So there are large volumes of fast-changing data, and what I mean by that is that a particularly popular source of data is clickstream data. As a person on the Internet, every link you click, every page you visit, every time you search for something on Google: these leave a trail of breadcrumbs showing what you're doing, and that's the clickstream. And that is very fast. I mean, imagine if I was capturing every click you made in your web browser right now; think about just how much data would be generated. Then multiply that by the millions of people doing the same thing. If you want to ingest all that data, it is a lot of data. There are many concurrent users. And again, zero downtime; anything else is just not acceptable. From a technical perspective, again, there's the flexible data model, plus high throughput and low latency in both directions. So high throughput, certainly, in the ability to ingest clickstream data, and low latency in that when you want to deliver someone an offer, an ad, or a recommendation, it has to be immediate. Scalability, and of course the 24x365 availability. I won't go into the Couchbase solution; I know I've said these things before.

But if we look at a customer example behind this one, we can look at AOL Advertising. They are using Couchbase Server and Hadoop. They are one of the largest ad networks in the world, serving billions of impressions per month for hundreds of millions of visitors. That's the scale we're working at. With that data, there are kind of two things that happen. All those visitor profiles are stored in Couchbase Server. Based on what you've been clicking on, your age, your location, et cetera, they have a profile of who you are as an internet visitor, and based on that profile, they will serve the most appropriate ad for you. So as time goes on, they're collecting this clickstream data, they're serving impressions, they're finding out what's clicked on and what's not clicked on, and then it gets pushed back into Hadoop, where they can run additional analysis on it with the goal of improving how effective those profiles are. So I might browse the internet, go to some web page where an ad is served, and I don't click on it. An hour later, I go to another website and see an ad, and I do click on it. Eventually, my behavior is analyzed in Hadoop so that the next time around, they are less likely to serve an ad that I'm going to ignore, because they're learning about me and trying to figure out what I like and what I don't like. And why they chose Couchbase Server was really the flexible data model and the performance. When you're serving ads to people, I think it's safe to say that ad has to be immediate. No one's going to go to a web page and wait a few seconds, staring at a loading indicator, for an ad to show up.

Content management. We talk a lot about semi-structured and unstructured data, and that's what we're getting at here. Particularly from a raw content perspective, you're storing all different types of data, and the applications built on it expect very quick access to it. So yes, we need a very flexible data model, because we want to store a large variety of content which might not be similar to each other; we're going to need that low latency access; and we need the ability to scale as we generate more and more content. One example of this was a Fortune 500 media company. They needed to deliver their content to 50 million online visitors.
They have 90-plus media outlets. They were using Microsoft SQL Server before they migrated to Couchbase Server, and part of what was driving that is they were discovering new types of content, which were by and large semi-structured. In order to improve the online experience for all these 50 million-plus unique visitors every month, they had to move to something that could provide better performance, a more flexible data model, and a bit more scale. Today they're doing 50,000 reads and 10,000 writes per second, which is more than enough for them to keep up with their demand.

Mobile applications are a particularly fast-growing area, for certain. Customers are building mobile apps in many different ways; we can talk about a few here in a second. One of the things that we have to contend with, from a business perspective, is the reality that there may or may not be a connection. It would be nice to think that we always have the Internet in our pocket, but if you've ever been on the subway, on an airplane, or on a rural highway, or even in a very crowded area like a stadium event, you know your Internet connection can be anywhere from not very usable and pretty slow to simply not there. So we have to deal with that. Fast time to market: especially in the gaming sector, everybody wants to be the next big iOS app or the next big Android app, so time to market is critical here. Publishers have to roll out games quickly, add new features, and expand on those games quickly. And there's supporting multiple device types and platforms, especially in the mobile environment. We thought it was bad with desktop browsers; then you get into a mobile environment with iOS, Android, and Windows, three separate platforms whose native capabilities are often not compatible with the other platforms. So as we begin to look at the technical requirements: well, if we can't have a network connection, our technical requirement is that we need to be able to store data on the device until such time as it can be pushed to the database. If we're storing data on the device and we're storing it in the cloud, we need the ability to synchronize data between the two. And certainly scalability. We've had customers who release a game on day one with very few users and a handful of nodes, and within a matter of months they're moving into tens of thousands of users as it becomes a very popular game. And they just add nodes every step of the way; every couple of months, they add more nodes to support an additional 10, 20, 30,000 users. So the solution that we present, which we highlighted earlier, is Couchbase Lite, the database that's going to run in your iOS or Android app, as well as the Sync Gateway, which is going to ensure that that data is pushed to the cloud, or, if other devices push to the cloud, that it gets pulled down to your device. So it's bi-directional. And there's the scalability as well. A good example here is Ryanair; they've been in the press lately. One of Europe's largest airlines is building a brand new mobile app that supports more than 1 million travelers. They're moving from a relational database to Couchbase Server and Couchbase Mobile. In particular, they needed a remote backend database, but they also needed something in the app so that they didn't always have to be online. That way, the application itself was always working, regardless of the network connection.

Real-time big data is really, really interesting.
We could do a whole presentation on this if we had the time for it, but essentially, it's the ability to extract information from big data as it's being generated. If you look at traditional big data and the offline approach: you store some data in Couchbase Server, you push it into Hadoop later on, you learn from it, and then you go back and push that into Couchbase Server. That's what we were talking about a minute ago with personalization. In this case, it's not just the large volume of data that you have to contend with; it's the sheer speed at which it's being generated, right? And you need to be able to interact with different analytical platforms, and this goes in both directions. So you need the scale and the throughput and the flexible data model, and the defining aspect here is integration with Hadoop. We have that not just in a batch scenario, with something called Sqoop, but with streaming options as well, whether it's something like Kafka for messaging, or Storm, maybe Spark, for stream processing. There are any number of options there, and that is really allowing people to learn from that data immediately, as opposed to when it's too late, in some regards. The example we can talk about here with real-time big data is PayPal. PayPal is using Couchbase Server and Storm and Hadoop and all these tools to do real-time analytics. They're collecting clickstream and interaction data, which I mentioned earlier. They can then analyze it as it comes in, filter it, and enrich it. After it's actually been processed, it can be put into a NoSQL database like Couchbase Server so it can be accessed by different visualization tools. This all has to happen in real time. And yet the data still gets moved into Hadoop as well for some of the offline analysis.

That was a really quick run-through of ten very popular use cases for NoSQL. Do we have any questions?

Hi, Shane, thank you so much for this presentation. We definitely have questions coming in. And of course the most popular question is always whether or not people will receive a copy of the slides. So just a reminder: I will be sending out a follow-up email within two business days with links to the slides and links to the recording of this presentation, along with anything else requested throughout the webinar. So just a couple of questions here for you, Shane. One comment that came in: what about transactional and referential integrity, and recovery?

So it looks like there's a couple of things there. From a transactional perspective, Couchbase would fall into the same category as most NoSQL databases, where we tend to say that writes are ACID in the single-write sense. So if you insert your user profile, that write will be ACID. It does not, however, do multi-document writes. So if you think of having to insert three things within a transaction, so to speak, that's not going to be there. And part of the reason why is the move away from the relational model. If you made a purchase, you bought something, and we know that within that purchase is some information on you, the method of payment, maybe the line items. Those things might be broken up and stored in different tables in a relational database, so it only makes sense that you need a transaction to ensure that all those different pieces of data are written. In the NoSQL world, all of that can be rolled up into a single document, and then you just insert that document. I hope that helps.
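Just to picture that single-document approach, a purchase rolled up into one JSON document might look something like this hypothetical sketch, with made-up field names:

    {
      "type": "purchase",
      "customer": { "name": "Jane Doe", "email": "jane@example.com" },
      "payment": { "method": "credit", "last4": "1234" },
      "lineItems": [
        { "sku": "A-100", "qty": 2, "price": 9.99 },
        { "sku": "B-200", "qty": 1, "price": 24.50 }
      ],
      "total": 44.48
    }

Inserting that one document is a single write, so there's no multi-row transaction to coordinate.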
I believe so. And kind of a tough question for you, Shane. Maybe you can answer it here, or maybe it's something you can send me to include in the follow-up email. Can you show an example of what is in the key and the value for one or more of the use cases?

Key and value. I can just make one up off the top of my head here and say that for user profiles, the key is entirely up to you from an application point of view. That key could be an email address; it could be, you know, a social security number. But that key, at the end of the day, is a known key. You know something about that user, and that's how you're going to retrieve them. And then the value is just a JSON document. So when you create that user profile, it's going to include their name and their address and any of their preferences; think of it as being written as one big document that has all the information about that user. Going forward, when you want that profile, you can just fetch it by their username or their email or their social security number. Whatever identifier you've selected for users, that becomes the key.

Perfect. We've got a lot of questions coming in here, so I want to make sure we get through them. Just a follow-up from the previous question, Shane, in regards to transactional integrity and recovery. I think you answered most of it, but there were just a couple of follow-ups on that. As data is replicated, do you take advantage of the total throughput available? That is, can reporting come from all servers? Emphasis on all servers.

Yeah, and that's a great question. If you deploy, I'll just use three because it's easy, three nodes, all three nodes are doing reads and writes. If we go one level deeper, each node is the primary for some subset of data. So we could say that for Shane Johnson's profile, the primary owner is node one. Now, my profile might be stored on node two and node three as well, depending on the configuration, but every time someone wants to read or write my profile, it happens on node one. Node two might have someone else's profile; it might have my boss's profile. And node three might have some other colleague's profile. So at the end of the day, the data is distributed evenly across every node. Each one owns a particular subset of it, but it might maintain copies of someone else's data for availability reasons.

Thank you for expanding on that. The next question is: are JSON, Couchbase, and key-value the same concept? If not, what are the differences?

Yeah, I think in some respects you could probably say that every NoSQL database is a key-value store, in that the original, if not primary, way of interacting with them is that, given some key, you want to insert or update data, or read it. A document database simply makes the assumption that the value is JSON, and if we know that it's JSON, we can do things like index it or allow you to query it. In a key-value system, sometimes the value is arbitrary. It could be text, XML, an audio file, binary, a serialized Java object; we just don't know. So there's not as much we can do with it. But if it is a JSON document, we can read that document, parse out those fields, index them, and then let you query them.

How do you get data out of Couchbase for other business processes?

So there are a few ways to get data out of Couchbase. Right now we have clients for multiple languages, certainly all the big ones: Java, .NET, C/C++, Ruby, Go.
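As a rough illustration, reading and writing through the Java client looks something like this. It's a hedged sketch, since the exact API depends on your SDK version, and the bucket name and key here are hypothetical:

    import com.couchbase.client.java.Bucket;
    import com.couchbase.client.java.Cluster;
    import com.couchbase.client.java.CouchbaseCluster;
    import com.couchbase.client.java.document.JsonDocument;
    import com.couchbase.client.java.document.json.JsonObject;

    public class ProfileExample {
        public static void main(String[] args) {
            // Connect to the cluster and open a bucket (names are hypothetical).
            Cluster cluster = CouchbaseCluster.create("localhost");
            Bucket bucket = cluster.openBucket("default");

            // The key is up to the application; here, an email address.
            JsonObject profile = JsonObject.create()
                    .put("name", "Jane Doe")
                    .put("city", "Mountain View");
            bucket.upsert(JsonDocument.create("jane@example.com", profile));

            // Fetch the profile back by the same known key.
            JsonDocument doc = bucket.get("jane@example.com");
            System.out.println(doc.content());

            cluster.disconnect();
        }
    }

The same upsert-and-get pattern carries over to the other language clients.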
And there's a whole bunch of other clients, too; I'm probably forgetting some off the top of my head. But chances are, whatever your preferred language is, there's a client that lets you read and write to Couchbase Server. On top of that, there are some RESTful services for querying data. And with Couchbase Server 4.0, as I previewed in the beginning, you'll have ODBC and JDBC drivers as well. So I would say getting data in and out is really easy when you're building your own applications, and pretty soon, with 4.0, even easier if you have ODBC or JDBC requirements.

I love how technical this conversation is getting. And how do you back up databases?

Yeah, so Couchbase Server has a variety of approaches there. We do complete backups, we do cumulative backups, and we do incremental backups. So you could back up your data every Sunday, and then every Monday, Tuesday, and Wednesday you do kind of a differential backup. And then if you need to restore, you can basically restore using the full backup along with all the incremental ones. Suffice it to say, there are a lot of features there to allow you to back up and restore data.

And is there data compression?

Yes, there is. Off the top of my head, I do not recall what we're using in 3.0, but in 4.0 we'll begin using Snappy. So the data is compressed.

Very cool. And what about systematic business rules? Is, say, Visual Basic code stored with the data or the user profile?

Well, being a key-value store, what you put into the value is up to you in some respects. But in my experience, I haven't seen anyone mixing and matching code and data. That user profile is usually just the elements of that profile: the name, the address, et cetera. It's not mixed and matched with code per se.

And I love this next question. How are large objects processed more efficiently using Couchbase than associating structured, semi-structured, and unstructured data in different spaces and multiple files using relational calculus and an RDBMS?

That's a great question. You know, depending on how large your object is, a NoSQL database, whether it's Couchbase Server or any other one, may or may not be the right fit for you. If you're looking at a log file that's gigabytes large, or you're looking at a movie file that's even larger, Hadoop, and HDFS in particular, is probably going to be the way to go for you. What we have seen with some people in this environment is that you store the metadata in Couchbase Server. So you might store this very large object on a file system somewhere, but in Couchbase Server, the metadata tells you where it's located on the file system, what the title is, when it was created, et cetera, et cetera. But if it's very big, I wouldn't recommend putting it in Couchbase Server.

Interesting; I love that answer. What use cases are rationalized based on the flexible data model as opposed to the scalability requirement? In our case, we did not have the large scalability requirement, but we are interested in the data model flexibility.

Yeah, and I think that's entirely spot-on, too. There are certainly some customers that have large scalability requirements, but that's not true for all of them. A lot of them can start with as little as three nodes to get going, and one of the nice things about Couchbase Server is you get a lot of bang for your buck. So those three nodes can probably buy you many thousands of operations per second, depending on the size of the hardware.
From a flexible data model perspective, I would say that most use cases are going to benefit from a flexible data model, maybe not all. You certainly have to put some thought into how you're modeling your data. I tend to think that relational data can be hierarchical when you look at it, and if that's the case for you, if you realize that, oh, you know what, I could nest all of my related data into a single document, then you're going to benefit from it. Certainly, with 4.0 rolling around and these new query capabilities, it's not required; you can reference other documents from within a single document. But yeah, you probably just have to put a little bit of thought into it. My instinct tells me you'll benefit from it, just as many others have.

And I'm going to skip around a little bit while we're on the topic of modeling. Are there any data modeling tools, such as ERwin, used for modeling Couchbase Server?

I am not familiar with any modeling tools as far as software or an app that you could buy, but there's certainly lots of good reading out there on best practices for modeling data in a document database.

And this questioner wants to know, in reference to something you mentioned earlier: how do you query the value part? That is interesting; I thought you could just look up by the key.

You can do both. If I inserted my profile and I used my email address, I can update it or fetch it based on that email address. Or, at the same time, because it's JSON and you can create indexes, you can start to look for fields within it. There might be a location field, a city or a state, or a date-joined field. You can then index those fields so that someone else might create a report that says: give me all users who live in California, or give me all users who signed up more than a year ago. So it's the ability to do both.

I think we have time for maybe one or two more questions here. Thank you for submitting all of these; I love the activity coming in. And I love this next question. If a naughty app drops bad data into the database, is it possible to back out the updates from a single app and restore the data?

Yeah, I mean, I think that's probably safe to say. It would depend; yeah, I'd probably have to understand the scenario a little bit more. But certainly, because of the backup and restore tools available, if something went wrong, you could restore to a certain point in time if you wanted to. Otherwise, I guess it would depend on the bad data and the best way to back it out.

Makes sense, and I'm afraid that's all we have time for today. Thank you so much to the attendees for submitting so many questions and being engaged in everything that we do. Shane, thank you for this fantastic presentation. Very informative, and obviously there's a lot of interest out there in this, given the questions that came in. Just a reminder: I will send a follow-up email within two business days containing links to the slides and the recording of this session. And thanks to Couchbase for sponsoring today's webinar. And thanks, as always, for taking the time to attend and be engaged. I hope everyone has a great...