My name is Shannon Kemp, and I'm the executive editor at DATAVERSITY. We would like to thank you for joining today's webinar with Vladimir Bacvanski. We will be discussing the what, where, and how of polyglot persistence. All attendees today will be entered to win a complimentary guest pass to our upcoming NoSQL Now! 2014 Conference and Expo in San Jose, California, August 19th through the 21st, where you can meet Vladimir in person. Just a couple of points to get us started: due to the large number of people attending this session, you will be muted during the webinar. For questions, we will be collecting them through the Q&A in the bottom right-hand corner of your screen, or if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DATAVERSITY. Vladimir will be sharing his desktop, so if your screen goes to full-screen mode, just move your mouse to the top of the screen and you'll see a menu bar drop down with an icon for the chat, and there is a down arrow on the right where you can click to see the icon for the Q&A section. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar.

Let me introduce our speaker for today. Dr. Vladimir Bacvanski is the founder of SciSpike. He has two decades of engineering experience with software and data technologies in areas such as architecture and design of mission-critical and distributed enterprise systems, rule-based systems and languages, modeling tools, real-time systems, agent systems, and database technologies. Vladimir has helped a number of companies, including the U.S. Treasury, the Federal Reserve Bank, the U.S. Navy, IBM, Dell, HP, JPMorgan Chase, Nokia, Lucent, Nortel Networks, General Electric, BAE Systems, AMD, and others, to select, transition to, and apply new software technologies. Quite the impressive list. Vladimir is published worldwide and is a frequent speaker, session chair, and workshop organizer at leading industry events. I know he is one of our top speakers at many of our conferences, and with that, I'm very lucky to have him here with us today. I will give the floor to Vladimir to get the presentation started.

Hello and welcome. I will be sharing my desktop. I'm going to show you some PowerPoint slides, and we are going to do some live drawing here. In fact, I will start by sharing a story. This is a story about one of our clients; we have been developing custom applications for them. This is a company dealing with supply chain, and it happens to be in the healthcare space. In their applications, they receive a number of files from their clients, who are purchasing various things, and each file contains a lot of data. Basically, these files show what has been spent in various healthcare facilities. When we get these large files, we first need to clean them: the files contain a lot of errors, and there is a lot of garbage that needs to be filtered out. After we remove the obvious junk from these files, which can be very large, and there are thousands of clients sending them, we then figure out what the products mentioned in these files actually are. That is the product identification step. So how is this done?
The system uses a database of products; there are a lot of products in that database, and we have to match the product that has been mentioned in a file with a product that we have in our database. What is interesting is that in many cases the data that we get contains errors, and the computer, no matter how smart the algorithms are, cannot really figure out what an entry is about. So we have people with a fancy title, Data Stewards, who go through the daily residue of junk entries, filter it out, and figure out that, for example, an entry that did not match should probably actually be a 3M product from a particular supplier. That is the job of the Data Stewards. What happens next is we feed that data into a system that does analytics. Through this analytics system, we can offer our clients better products, cheaper contracts, and so on. At the end of all this, we should get some nice reports that tell us what products the company should buy. Now, this part of the system deals with very large volumes of data; it is essentially reporting and business intelligence, and there is no strict real-time demand on it.

Then there is a part of the business dealing with e-commerce. There you have the usual web store with a shopping cart, but in a specific domain. So this is the e-commerce system. What we are dealing with over here is, we are still dealing with the product database, and we also need to keep track of all the different parties and the roles they play in this business: the relationships between the many parties in our system and the buyers. These parties behave based on contracts that have been established and that have validity rules attached to them. The same entity can play the role of a buyer or a seller, we have a distributed infrastructure, and so on. So we would also like to keep track of these relationships; there should be some system for data management of the relationships between the different parties. This is our party system.

This application can get quite complex, and there are lots of different challenges here. We are dealing, for example, with very large data and with product identification, where the data is typically semi-structured. With the system that is dealing with parties, we have data that is probably not so voluminous, but it has a great deal of interconnection, so we are dealing with things like graphs that show different relationships. With the e-commerce part, the standard thing you are dealing with is session data: you put things in your shopping cart while interacting with the application.

And here is a question for you: what database should we use for this? Now, the standard answer would be: we have the relational database, the database we are already running in our enterprise, so that is what we are going to use. It turns out that the relational database has a number of challenges in this setting. Let's look into what these problems are and how we should try to address them. Imagine you are looking at the relationship between cost and capacity, and you have to figure out: all right, how much do I need to pay for these things?
The thing is, today, if you look at CPU power, it is growing really nicely, and when we look at our storage systems, they are growing nicely as well. What I mean by that is, if I need, let's say, four terabytes of space, I pay roughly two times more than for two terabytes. So I have a nice, roughly linear relationship. We don't see that much progress in clock rate, but we have multiple cores, I can get machines with many processors, I can fill a rack with these machines and have a nice system.

Now, put a relational database into this picture, and the cost curve goes somewhere like this: at the low end life is easy, but you are expected to pay more and more for your performance. You can start with a small database, maybe something like MySQL, and then you say, well, that is not good enough, and you go with something larger. Then you say, well, I need to use some other system, maybe a DB2 database, which is good for up to a couple of terabytes. You may wish to grow even beyond that, and perhaps you go for something like Teradata, and you pay a lot of money for your system. And beyond that, there is an area that is basically out of reach of relational databases. That is the relational database curve. This is what was encountered in the web space, with web applications that have a massive number of users; the first companies that ran into it were Google, Yahoo, and the like.

So what would we like to do? We would like a database with nearly linear cost growth; we want to flatten this curve. If I need two times more storage or two times better performance, I should be paying only two times more, not four or eight times. And I would also like to cover the areas I could not cover with my relational database. Besides cost and scale, there is a third problem we are trying to address, which is performance.

Now, it turns out that a group of databases, collectively known as the NoSQL databases, offer some answers in this space. These solutions are not perfect, but they allow us to move past the limitations that we have with relational databases. When we look into these NoSQL stores, we are going to see several different designs. One that turns out to work very well is what is known as the key-value store. It is a very simple design in which we have something that basically resembles a hash table. The key and the value can be anything. You can imagine that a very popular choice for the value would be, perhaps, a JSON document containing things like what we have in our shopping cart. And what could be a good choice for the key? Well, it would be some type of session ID. It is a very primitive design, a very simple organization of our data, and what we get in return is great scalability.

Now, this design comes in two flavors. One flavor is where we have a very large volume of data, and this is basically the typical organization of data for the Hadoop system, where the key-value pairs are stored in a very large file: the key is some type of ID, and the value is, very frequently, a line of text. This is for the unstructured information that we have in our system. The other type of application is mostly served by stores that follow the key-value approach in memory, and there we have a couple of representatives.
One that is very popular in web applications is a system called Redis, and it is very good for storing web sessions. Usually both keys and values are kept in memory, so you get great access times, and it also gives you the ability to move your session information from an individual web server to a place that can be shared among web servers. This is one of the very common use cases. We have a number of web servers, and rather than each of them storing session data locally, the user accessing the system interacts with one server while the session data is stored in the shared store. At some point later, the user accesses the system again, the load balancer figures out that the first server is too busy, and the request gets forwarded to the second server. The second server has the ability to access the store and grab the session data. So it is a very popular way of building a web application. Of the other systems out there, Memcached is a very popular caching layer, and there is Riak, which provides persistent storage. But the essential thing here is that the data model is very simple. So either very large data goes into Hadoop, into Hadoop's distributed file system, or we have an in-memory store that is typically used for things like web sessions and similar information. That is the key-value store: either very large data stored in files, or fast access to simple values.

The next store that emerged is known as the columnar store. Here we store the data in columns rather than in rows, whereas traditional databases today store data in rows. Back in the 80s, when the founders of relational databases were thinking about how to store data, storing it in rows seemed natural, but it turns out that is not always such a great idea. So there is a whole group of systems that store data in columns. What is interesting and different is that you can have a very large number of columns in such stores; there are systems that can support over 2 billion columns. And this is interesting: these systems also support sparse data, so you can have a lot of holes in your data, and they are stored very efficiently. One of the early systems in this space was developed by Google: the BigTable system. BigTable was used in various applications by Google; some of the most notable were to crawl the web, store it in the database, index it, and so create the index that is used during search.

Of the other systems that are very popular in this space, one is Apache Cassandra. Cassandra is a columnar store with an interesting architecture: we store the information on a peer-to-peer set of machines, which we call nodes, and they are all connected with each other. So the different machines are all connected to each other, and they excel at automatic peer-to-peer connectivity and fault tolerance. To write the data, the node you talk to will usually pass it to its peers and store the data redundantly. Typically you can have a couple of the nodes that hold your data fail and the system keeps working. The other system that is quite common in this space is HBase, which is part of the Hadoop ecosystem, and it provides an architecture that is quite similar to Cassandra in many ways. There are notable users of these systems: we have Netflix, which is running most of its stuff on Cassandra.
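Going back to the web-session example mentioned a moment ago, here is a minimal sketch of how a shopping cart keyed by session ID might look against a key-value store. It assumes a local Redis instance and the redis-py client, and the key prefix, expiry time, and field names are purely illustrative, not from the talk.

```python
# Minimal sketch: session ID as the key, the shopping cart serialized as JSON
# in the value. Assumes a local Redis server and the redis-py client
# (pip install redis). All names here are illustrative.
import json
import uuid
from typing import Optional

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def save_cart(session_id: str, cart: dict) -> None:
    # Store the cart as a JSON string and let the session expire after 30 minutes.
    r.setex(f"session:{session_id}", 1800, json.dumps(cart))

def load_cart(session_id: str) -> Optional[dict]:
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None

session_id = str(uuid.uuid4())
save_cart(session_id, {"items": [{"sku": "ABC-123", "qty": 2}]})
# Any web server behind the load balancer can retrieve the same session.
print(load_cart(session_id))
```

Because the store is shared, whichever server the load balancer picks can call load_cart with the same session ID and continue the interaction.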
eBay is also running on these stores, serving over 2 million queries per second. So great scalability, very good performance, and these systems are quite popular. They also have one really interesting characteristic, and that is that they typically have very high write capacity: these systems don't need to read the data before they write it to disk, so they are very good for writes. We often use them for time-series data. Whenever something happens, you can add a column that is going to contain the data, and you just keep appending entries. As long as you have fewer than 2 billion such entries, you are in good shape; if you have more, there are some architectures and designs that allow you to define your data models to handle even larger sets.

The columnar store is not the most popular form of NoSQL store, though. We are going to see now the most popular form, and this is the document store. Here, a document is not a PDF or a Word document; it is just a set of data, usually JSON. So you take the structure of your JSON data and you put it in your database. Usually this type of database does not require a schema, so the individual entries can vary in their structure, and systems like this end up being very easy to use; they are very popular. The most prominent system in this space is MongoDB, which is focused on great usability. You also have Apache CouchDB, and there is a commercial offering that includes a caching layer, which is Couchbase. And interestingly enough, you will also see some of the traditional XML databases now competing in this space; for example, MarkLogic is a very prominent XML store. So the interesting thing is that you can store a variety of documents with varying information, and querying is very easy. This is definitely the dominant form of NoSQL store today.

Next, we are going to see one more store, which works very well when we have nodes that are connected in many different ways: a node here, a connection there. In a system like this, you might be interested in how to get from one node to the other, from node A to node B, and there are various paths: one path goes like this, another path goes that way, and they may have different costs. You can also find the various other ways in which these nodes are connected. This kind of store is called the graph store. What is interesting about graph stores is that they can store a very complex set of relationships between the nodes. They typically do not work that great for storing huge amounts of data, but their query languages allow very efficient traversal over the graph, path finding, and discovering relationships: who is connected to whom, and similar questions. Among graph stores, we have a couple of products out there. One is the quite popular Neo4j. You also have an open source system called OrientDB; OrientDB is interesting because it combines a graph database for the relationships with nodes that are actually documents, so you get a combination of the connectivity of the graph and the ability to store documents. A graph database that runs on top of either Hadoop's HBase or Cassandra is Titan, which is interesting because it sits on top of a scalable NoSQL system. In mentioning these different systems and products, I am really not recommending one or the other; these are just some of the typical examples. So this is the graph store; it comes in handy for finding relationships between elements.
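To make the "how do I get from node A to node B" question concrete, here is a small sketch against a graph store, using Neo4j's official Python driver and a Cypher shortest-path query. The Party label, the RELATED_TO relationship, and the node names are invented for illustration; they are not from the talk.

```python
# Sketch of path finding in a graph store. Assumes a local Neo4j instance and
# the official driver (pip install neo4j). Labels and names are illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (a:Party {name: $start}), (b:Party {name: $end}),
      p = shortestPath((a)-[:RELATED_TO*..6]-(b))
RETURN [n IN nodes(p) | n.name] AS path
"""

with driver.session() as session:
    result = session.run(query, start="Mercy Hospital", end="Acme Distributors")
    for record in result:
        print(record["path"])   # the chain of parties connecting the two nodes

driver.close()
```

The point is that the traversal itself is expressed declaratively and executed by the store, which is exactly the kind of query relational joins handle poorly when the relationship depth is unknown.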
So the question is: these are the four types of NoSQL stores; when you use them, what are their characteristics when it comes to performance and scaling? We can arrange and classify them based on these characteristics. If we draw something like this, where we put scalability on the vertical axis and the complexity of the data and the access on the horizontal axis, we can arrange our systems.

If we think about the system that is going to scale to the maximum with a fairly simple data model, which data store would do well there? That would be the key-value store, and the particular realization that works well for us there is Hadoop. So this is our key-value store, and our particular implementation is Hadoop: the data lives in files, it is not a rich model, but it is the processing system that you use with data in very large volumes. If instead you are thinking about simple access with very good performance, but not as much scaling, then you will see the same representative, the key-value store, but one that runs in memory. This is going to be something like Redis, the database that stores the session data for web applications: very good performance, because everything is in RAM, but again, a very simple model.

The next thing in this picture is the columnar store. The columnar store is not as scalable as Hadoop, but it can still store very large amounts of data. These are the columnar stores, HBase and Cassandra, and they have a couple of advantages: your data model can be better defined, there are query languages, there are even SQL dialects that you can run on them. A lot of advantages. If you have a huge volume of data, you may be better off with the columnar store.

But for convenience, in many applications we see a database type that is going to be quite useful, and that is the document store, storing various JSON documents in our system. The document store is not as scalable as the columnar store, but it seems to be more convenient to use, well suited for supporting web applications, and with great popularity among developers: systems like MongoDB, CouchDB, Couchbase, and similar. So the document store is not as scalable as columnar, but it is still good for many applications.

And then finally, the database that deals with great complexity, but not such great scale, is the graph database. Graph databases have one challenge in their implementation, and that is that it is very difficult to split the graph between multiple machines and still have efficient processing. So that is the challenge they have. But for queries over relationships, the performance of graph stores is very, very quick.

So this is the hierarchy of these systems based on scale and on the complexity of data. Which one you use in your application will depend on what your demands are going to be, on what you need to have in your application. Now, thinking about the application that you have, the question is: where is your limit going to be? Say your limit is somewhere here, the limit for your application. If your limits are below the capabilities of a system, then any of these systems could do the job. If your bar is higher, you are going to find, well, if it is here, then the candidates I have are the columnar stores; that is going to cover my data.
I could think of Hadoop, but it probably does not give me the kind of processing I like, and a document store would be stretched: I could make some effort and apply various sharding strategies, but it is likely to give me more headaches than it is going to be worth. So we are always going to look for some compromise in this space, and you will find that no single system is going to be perfect for every kind of workload.

So now that we have seen these four different kinds of stores, we come to applying the best store for each task. Remember the example application for our client that we introduced, with its different tasks. In this application, we have some large data, the large-scale analytics. Then there is the part dealing with the e-commerce system, and within it we have the shopping cart. As part of the e-commerce system, we also have the data about the various parties that we are serving; this deals with customers, and it can be quite complex, because the same entity can be engaged as a supplier and a distributor and a buyer or something else. And finally, also part of our e-commerce system, we have the customer buying something in our system.

Now, in this type of application, what kind of database is going to be the best technical choice for each part of the application domain? Data analytics: we are dealing with very large amounts of data, and remember from our previous picture, the system that handles the largest volumes of data is Hadoop. When we talk about the e-commerce part of the system, dealing with the shopping cart and dealing with sessions, that is going to be well supported by some key-value store; it can store the ID of the user as the key and the content of the shopping cart in the value part of the record. The product catalog is an interesting one. For the entries that describe a product, if you go to Amazon and browse through their products, you will see every product has some set of fields like the name, the price, the manufacturer, and so on. But products are more and more getting additional descriptions: there are text documents, sometimes you will have a PDF, sometimes even movies are attached, and many pictures. So the description of one product can vary significantly from the description of some other product, and we know which type of data store is best for such variability: the document store would be very well suited for the catalog. For the parties, basically what we have there is a graph of relationships connecting our different parties, so you can see that the graph database will be suited for that; and in fact, in the industry there are already some commercial offerings for master data management of customer data that are actually built on top of a graph database.

And here is a million dollar question: which of the databases that we have seen is the best database for transactions? If you guessed that it is our old friend, the relational database, you are right. The relational database really shines when it comes to transactions, but it has the other problems that are addressed by these other database systems. Now, if you look into this, you will see that the NoSQL stores like Hadoop have a great ability to scale, and we have systems like the key-value store in Redis that provide great performance; when you run performance benchmarks, they tend to run faster than relational databases.
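Returning to the product catalog just described, here is a small sketch of why a schemaless document store suits entries whose descriptions vary from product to product. It assumes a local MongoDB instance and the pymongo client; the collection, SKUs, and field names are made up for illustration.

```python
# Sketch: two products with very different descriptions stored side by side,
# no fixed schema. Assumes a local MongoDB and pymongo (pip install pymongo).
from pymongo import MongoClient

products = MongoClient("mongodb://localhost:27017")["catalog"]["products"]

products.insert_many([
    {"sku": "GLOVE-01", "name": "Exam gloves", "price": 12.50,
     "manufacturer": "3M"},
    {"sku": "PUMP-77", "name": "Infusion pump", "price": 2400.00,
     "manufacturer": "Acme Medical",
     # Extra fields that only this product has -- no schema change needed.
     "datasheet_pdf": "pump-77.pdf",
     "videos": ["setup.mp4", "maintenance.mp4"],
     "specs": {"voltage": "110V", "weight_kg": 3.2}},
])

# Querying works the same whether or not a document carries the extra fields.
for doc in products.find({"manufacturer": "3M"}, {"_id": 0, "name": 1, "price": 1}):
    print(doc)
```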
Often by quite a margin, as many benchmarks have shown. And where is this coming from? Is there some secret? Well, there is no secret. All these stores make some compromises, and one of the most common compromises made in many NoSQL stores, I will not say all, but in many, is that there are no transactions. In many stores you will also find that there are no joins. And this changes the way you develop your applications.

Look at the traditional way of development. Typically, you start by looking into some use cases, then you build your data model, and your logical data model is going to be really nice and elegant and hopefully able to answer various queries you have about your data. In the database, you are going to store data in a fairly normalized form. You are going to use a very powerful and quite sophisticated engine, you are going to submit various SQL queries, and this engine is going to calculate at runtime what is the best way of accessing this data and grabbing it from the database. And then you get your answer. Now, what is interesting in this design is that you created this data model from the particular use cases you developed, but the model allows you to run queries about things that you did not think about when you were defining the use cases. So you can use SQL in an interactive fashion and explore different queries that could be useful in your business. The flip side is: is runtime really the best time to compute the answers for our use cases? We could precompute all of that and run it faster in the production system. This is your standard relational flow.

If we apply this to our NoSQL systems, we are going to see that the flow there is different, and we will also see some consequences for how to deal with this in a polyglot persistence setting. For NoSQL stores, you also start with use cases: what are the questions that you need to answer? But then, instead of creating a wonderful general data model, you are going to create a model that provides answers only for the very specific use cases that you have. Your data is going to be arranged so that you can avoid all the joins; you may store everything together, as is, and perhaps you will do some pre-computation and rearranging of the data. So when you read, there are no joins, nothing special needs to be done; you just read the data. And if you do that, then you get your report. But the flip side is that the system cannot answer any question that was not designed into it from the very beginning. And this is one of the big vulnerabilities that you really do not hear much about when you talk about NoSQL stores and polyglot systems. In the NoSQL setting, you must design your system for specific use cases. If you add a new use case that you did not design for from the beginning, you need to create a new data model and add it to your database, and that is a significant amount of work. And not only that: if your new use case interacts with or modifies the data behind your existing use cases, you need to make sure that you update the data in the other data models you have created. So a word of caution whenever you start combining systems and working with NoSQL stores.

Now, the integration of these systems. Let's go back a couple of pages and think about the following. If you are thinking of integrating your systems, you have your data coming from traditional sources; this is our relational database.
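To make the contrast between the two flows concrete, here is a plain-Python sketch, with no database required, of the same question answered both ways: once from normalized tables joined at query time, and once from a pre-computed, denormalized record that answers only that one use case. All the entities and field names are invented for illustration.

```python
# Relational style: normalized data, the "join" happens when the question is asked.
customers = {1: {"name": "Mercy Hospital"}}
orders = [{"order_id": 100, "customer_id": 1, "sku": "GLOVE-01", "qty": 3}]
products = {"GLOVE-01": {"name": "Exam gloves", "price": 12.50}}

def order_report_relational(order_id):
    o = next(o for o in orders if o["order_id"] == order_id)
    return {"customer": customers[o["customer_id"]]["name"],
            "product": products[o["sku"]]["name"],
            "total": o["qty"] * products[o["sku"]]["price"]}

# NoSQL style: the answer for this one use case is pre-computed and stored as is,
# so reading it needs no joins -- but it can answer only this question.
order_report_document = {
    "order_id": 100,
    "customer": "Mercy Hospital",
    "product": "Exam gloves",
    "total": 37.50,
}

print(order_report_relational(100))
print(order_report_document)
```

The denormalized record is faster and simpler to read, but a new question, say "total spend per manufacturer", requires building and maintaining another pre-computed structure.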
What we are doing here is putting the data into our relational database, running some analysis, and then getting the results. Now we also have a variety of these NoSQL stores; I am going to represent all of them just by drawing an area here. We put data into potentially one or even several of these different stores, for example a columnar store, and then we can apply some analytics tools. Now, the analytics tools in this space are not as mature as in the relational space, and they are not as popular as the conventional tools. So one of the typical things that you do in a polyglot system is a data movement that goes from your NoSQL stores into the relational database. This is very common: we extract the interesting information from our NoSQL store and move it into the relational database. We do not move everything; we move just the interesting things, and then we can use our conventional analytics tools. This is an approach that works very well and is in fact easy to organize. Note also that there is data movement that should happen in the opposite direction, so you have arrows that can go both ways; particularly at the beginning, most likely you will be extending your relational source with the NoSQL source, and you will have data movement back and forth.

Now, when we look at that picture, we find that there is also one shift that you see in industry, and that is the introduction of microservices. Microservices are one particular realization of service-oriented architecture where you break down your services into relatively small units; some of these small applications can be a rather small number of lines of code, very different from old-school SOA, where you may have a single enormous service. The traditional approach in such systems was to have one database and have these different services access that database. But imagine you are running in the cloud: it is rather problematic to find a place that is big enough to run our large database. A large database, a big Oracle system or DB2 running on a serious mainframe, is very difficult to integrate in a cloud setting. In such a case, our services will be running on individual machines somewhere in the cloud, and it is not such a good idea to have one very large machine or a mainframe participating in that cloud just to run our database. A better architectural choice is to have the ability to run a number of smaller databases in this setting: each service has the database that it accesses, and this database can be placed somewhere close to the service to get the benefits of locality of access. And once you have these several databases, you have multiple choices. These databases can be relational, but they do not need to be: for example, if you find that it is better to use a document database in one area, then you use a document database there, a relational database here, a graph database over there. This is what you would do if you need to achieve the maximum technical capabilities of speed, performance, and scale in a particular area. But one thing to note: when we introduce multiple databases, compared to our previous life, where everything was stored in a relational database, we used to have one headache.
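As a small illustration of the extract-and-load step described a moment ago, here is a sketch that takes a handful of summarized records (standing in for what an aggregation over a NoSQL store might produce) and loads them into a relational database where conventional SQL reporting works as usual. The built-in sqlite3 module stands in for the relational side, and the table and field names are invented for illustration.

```python
# Sketch: move only the interesting, summarized records from a NoSQL store
# into a relational database for conventional analytics. The extracted rows are
# hard-coded here so the example is self-contained.
import sqlite3

# Imagine these came out of an aggregation over the columnar or document store.
extracted = [
    {"facility": "Mercy Hospital", "month": "2014-06", "spend": 125000.0},
    {"facility": "St. Luke's",     "month": "2014-06", "spend":  98000.0},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_spend (facility TEXT, month TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO monthly_spend VALUES (:facility, :month, :spend)", extracted
)

# Ordinary SQL-based reporting tools can now work on the loaded data.
for row in conn.execute(
        "SELECT facility, SUM(spend) FROM monthly_spend GROUP BY facility"):
    print(row)
```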
And this headache was: how do we make the relational database work well for all these different kinds of data? We have now replaced this one big headache with a number of smaller headaches, where we have the problem of integrating these databases. So the question is: how do we keep this system consistent? One of the approaches that you will find out there is to integrate using services, or using messaging, or some type of streaming; you keep the databases consistent with some kind of integration. The typical techniques are going to be messaging and various streaming mechanisms. One such system, Apache Flume, is part of the Hadoop ecosystem; it enables you to move data from one source to a number of different sinks using different mechanisms, at a nice level of abstraction, and it runs in a transactional fashion. It is a very convenient way to integrate a variety of systems: relational, Hadoop, and a variety of NoSQL stores. So this is something that you really want to cover in your system.

One detail related to software architecture when it comes to these polyglot systems: you want to avoid having a direct dependency from your application on the database. If you have some module with business logic, I would like you to avoid going directly into the database and becoming dependent on your particular data backend. Instead, you should have a layer that decouples this logic from a particular store. So we have a data layer here; it provides an interface, and behind it we can have multiple implementations. One example of how that looks: say we are dealing with a product. You have an interface that is going to be used by our client, with an API to examine different properties of the product, and this interface is used by the client, so we have the client here. Then we have another interface here, the data access object, which is one of the standard patterns in enterprise software development. The interesting thing here is that we are encapsulating the functionality to access the data. You can imagine that we have, let's say, a relational product data access implementation with the SQL code that goes to Oracle, DB2, MySQL, one of these; and over here, we can have the implementation that goes, for example, into a document store for the product. Whoever is using our system is first going to access the product data access object; they will be using this interface, and no call that they make is going to be specific to the particular database. What we have here is a nice way of separating the functionality that we are looking for, which is accessing the product information, from the actual database that is going to be used. So here is the aspect of software architecture that is really critical for a polyglot system: always isolate your database access. Otherwise, you will have dependencies, and what can happen is that you would like to change from one database to another and you cannot do it easily.

Now, why would you want to change your database? From an operational point of view, you would like to do one thing: minimize the number of NoSQL stores in general. If you go to your CTO and you tell them that in this application we are going to use the relational database and the document store and the graph store and the key-value store and the columnar store, the question is: is that really necessary? It may be necessary if you are going to push all of these to the extremes of their technical capabilities.
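To illustrate the data access object separation just described, here is a minimal sketch: the client talks only to a ProductDAO interface, and each implementation hides one particular database behind it. The class and method names are illustrative, not from the talk, and the two implementations are only stubs of what real SQL and document-store code would look like.

```python
# Sketch of the data access object (DAO) pattern for isolating the database.
from abc import ABC, abstractmethod

class ProductDAO(ABC):
    @abstractmethod
    def find_by_sku(self, sku: str) -> dict:
        """Return the product's properties, wherever they are stored."""

class SqlProductDAO(ProductDAO):
    def __init__(self, connection):
        self.connection = connection          # e.g. an Oracle/DB2/MySQL connection

    def find_by_sku(self, sku: str) -> dict:
        cur = self.connection.execute(
            "SELECT sku, name, price FROM products WHERE sku = ?", (sku,))
        sku, name, price = cur.fetchone()
        return {"sku": sku, "name": name, "price": price}

class DocumentProductDAO(ProductDAO):
    def __init__(self, collection):
        self.collection = collection          # e.g. a MongoDB collection

    def find_by_sku(self, sku: str) -> dict:
        return self.collection.find_one({"sku": sku}, {"_id": 0})

def show_product(dao: ProductDAO, sku: str) -> None:
    # Client code never knows (or cares) which database sits behind the DAO.
    print(dao.find_by_sku(sku))
```

Swapping the relational implementation for the document-store one then changes a single wiring decision, not the client code.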
But it could be that for your organization and your volume of data, it is perfectly fine to have a combination of your relational database, which you already have, with a document store, and this is enough; this is where you should stop. The document database is not going to be optimal for other kinds of load, but it may be good enough for your application, and now you are dealing with only two databases. So from the engineering point of view, you have to be reasonable and try to keep the number of different data stores to a minimum. If these data stores cannot satisfy your requirements with regard to scale, performance, and response time, then possibly extend further, but only if it is needed. In practice, we sometimes see architects fall so much in love with these different databases that they will apply a whole range of different systems in the same application. Be easy on that: adopt only the ones that are really going to make a difference, and try to minimize the number of systems that you are dealing with. But do keep in mind that these systems have performance advantages over relational stores and over each other, and always evaluate the systems in your concrete environment. I also say that you should never trust the marketing materials of your database vendors; instead, create specific use cases and tests that illustrate the way you will use the system, and benchmark the system in your specific environment. It requires some work on a proof of concept, but do not just trust the literature; you have to try it out on your own. So how do you qualify these systems? You always try to find the best choice, and it is not only a technical choice but also an operational one. You should try to deliver real, tangible value for your client, so do not fall in love with your store, but look at how you deliver the value. And always integrate with existing systems; we have gone through some hints on how to do this.

The last slide that we have is about how to succeed with polyglot persistence. This is one of the critical things: the knowledge in this area is still relatively scarce, so make sure you either grow your champions or hire them. And things are fairly volatile in this space; it is a young field, there are always new things, and every six months there is something really interesting happening. So always be on the lookout for new and better techniques. That is all for the presentation, and I am looking forward to your questions. Thank you so much.

Thank you, Vladimir. I just love the live whiteboarding and interaction in the presentation. Attendees, I just want to remind you that you can enter any of your questions in the bottom right-hand corner of your screen in the Q&A section. And just to get started, the most common question that we receive is whether or not you will get copies of the slides. Vladimir is going to take copies of the whiteboards that he created, we will build those into the slide presentation, and we will get that out to you within two business days, so by the end of the day Thursday, with links to the slides and links to the recording of the session. Everyone is so quiet today, Vladimir; we are not getting any questions coming in. I think you answered them all. What is the most common question you get, Vladimir, in terms of polyglot persistence?

That is a good question. What is one of the difficult things in applying these new and old systems together? One of the really interesting things is dealing with people.
Dealing with people turns out to be a rather interesting challenge. What happens in many organizations is that when you look at an existing data architect or a database administrator who has spent maybe 20 years dealing with relational databases, some of them look at this new technology and they feel threatened. With every change in technology, there is a change in power within organizations, so they are afraid that they might be uprooted from their position. The key to successfully applying such polyglot systems is to reach out to these people and help them realize that this is not a fight against them, and it is not a fight against relational databases. This is something that can offload the type of load that is not ideally suited for relational databases and move it away to something else. That frees the relational system from doing unsuitable work and enables it to get better performance. And one of the key things in educating people is to realize that NoSQL systems do not have magic in them; they all achieve their advantages by abandoning something that is done in relational databases, by moving away from things like transactions and so on.

Somebody asked: so how do we run our systems without transactions? It turns out that this is something we have been doing for a while even in SQL systems, in situations where transactions are very difficult to achieve because you would need to tie together multiple systems or have transactions that run for a very long time. What we do in such cases is we explicitly perform compensatory actions. We detect whether we are on the right track; if yes, then fine, and if not, then we execute a compensatory action that has been written ahead of time and that will restore the stable state of the system. One of the consequences of such a design is that we achieve much higher throughput: we are going to be on the optimistic side, but we can always handle the problems as they occur.

We have another comment here, and of course it starts with a compliment: great presentation. This is from Ben. And the question is: are there particular use cases where you have seen Cassandra preferable to HBase?

Thank you for that question. Cassandra and HBase have rather similar architectures; they are both columnar stores. We have seen in practice, in applications with our clients, that Cassandra was much easier to deal with when it comes to setup and when it comes to operations. Also, there are claims that Cassandra responds much better when nodes fail; the failover is apparently smoother in Cassandra, and it is easier to keep the performance of the system uninterrupted. Another thing we have seen on the Cassandra side that sits well with users is the existence of the query language, CQL, which is very similar to SQL; that actually made adoption easier among our clients, who were all very experienced database people, and it made it easier for them to work with the database. With HBase, if you have to program against it, the API is simple yet powerful, but you really need to have a Java person implementing this. There are some systems that introduce a layer of SQL on top of HBase, but they do not seem to be as mature as what you see with Cassandra.

Perfect. I typically cut it off right at the top of the hour, but if you have time, Vladimir, we have one more question coming in: how do you typically handle concerns about the concept of eventual consistency?
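Before the answer, a brief illustration of the compensating-action approach Vladimir described a moment ago: proceed optimistically, and if a later step fails, run a pre-written compensating action that restores a stable state. The order and payment steps here are invented purely for illustration; they are not from the talk.

```python
# Sketch of the compensating-action pattern in place of a distributed transaction.
class PaymentFailed(Exception):
    pass

def reserve_stock(order):
    print(f"reserved {order['qty']} x {order['sku']}")

def release_stock(order):
    print(f"released {order['qty']} x {order['sku']} (compensation)")

def charge_customer(order):
    if order.get("card_declined"):
        raise PaymentFailed()
    print(f"charged customer {order['customer']}")

def place_order(order):
    reserve_stock(order)              # step 1: succeeds immediately
    try:
        charge_customer(order)        # step 2: may fail after step 1 succeeded
    except PaymentFailed:
        release_stock(order)          # pre-written compensating action for step 1
        raise

place_order({"sku": "GLOVE-01", "qty": 3, "customer": "Mercy Hospital"})
```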
When it comes to eventual consistency, there is some misconception: when are the systems going to become consistent? Is it going to be in a week, or in two weeks? Well, the answer is that in most such systems, the data becomes consistent in the range of milliseconds; in distributed systems, a range of hundreds of milliseconds is typical. And then you can also adjust your approach depending on the needs of your application: if your application absolutely needs to be consistent, you need to lock everything, but if your application can tolerate being hundreds of milliseconds or a couple of seconds off, then this is a great use case for eventual consistency.

We do have one more question, and I am going to go ahead and ask it. It is a loaded one, Vladimir, and it is one that we get all the time: is there some modeling methodology or tooling for the NoSQL types?

Excellent question. Modeling methodologies or tools for the NoSQL types: I was actually looking into this area quite a bit for various applications, and I must say that there is nothing that is really usable out there, for several reasons. One is that the systems themselves are quite different; also, the data structures that we have are rather nested in most of the systems, so you do not see any tool that supports this directly. But there are some things that you can do. First, when you are dealing with document stores, you can define your data models for many of them in JSON or JSON Schema; that works rather well in this type of environment, so in a plain editor you will be defining the schema in terms of JSON. But when it comes to stores like columnar stores and graph stores, we do not see any particular tool that helps with modeling, so usually we would do a sketch on paper or on the whiteboard and then we would go and code it.

That is perfect, and I am afraid that is all we have time for today. Vladimir, thank you so much for this fabulous presentation. Just to remind everyone, we will be hosting the recording of the webinar on DATAVERSITY within two business days, and as mentioned, I will send the follow-up email out to everyone by end of day Thursday. And don't miss the opportunity to meet Vladimir at the conference. I hope everyone has a great day. Again, Vladimir, thank you so much.

Thank you so much. I am looking forward to meeting everybody at the conference. Have a great day. Bye-bye.