Well, welcome. This is "Old SQL, NoSQL, NewSQL - what?" That was my response last year when I came to this conference and thought: what is NewSQL? And the SQL databases I've been a database architect for for years - they're old now? Am I out of a job? Is that what's going on here? So for the last year, I've been doing research on the NoSQL space.

A little more background: I've written a book, now in its second edition, on CSS and HTML. I'm a principal architect at the Church of Jesus Christ of Latter-day Saints, in the group that works with enterprise information - so I guess I'm the enterprise information data architect. The LDS church is a centralized church, so it's not like a little congregation: it's 14 million members all over the world, a very large multinational organization. Everything you can imagine in a multinational corporation, we have to do. So it's a very large IT shop, with the large bureaucracy that comes with anything big: a big infrastructure group, multiple data centers, lots of data, and lots of databases. My responsibility is the architecture for all the Oracle, SQL Server, and NoSQL databases at the church.

Almost five years ago now, one of our architects brought in a product called MarkLogic. At that time our team was just Oracle and SQL Server - we're pretty much an Oracle shop, and my background is Oracle database tuning and engineering. So my boss, who's also an Oracle guy, says: we hear about this MarkLogic thing. I don't know what it is, but kill it. I had just gotten back from Oracle OpenWorld, so I said, okay, I'll do my job. We went to town, and my job was to remove all traces of this abomination from our pure Oracle environment. In reality, after two months of trying to get Oracle XML DB to perform well - and never succeeding - I worked with a product manager, and we brought in MarkLogic.
I figured, fine, I'll even try this thing - there was another team pushing it, and we'd been going back and forth on specifications and capabilities. So I installed MarkLogic, and literally two hours later I had it up and running, doing everything it had taken me two months to try to get Oracle to do - and Oracle wasn't succeeding. Now, in all fairness, that was almost five years ago, and we haven't touched Oracle XML DB since, because we went with MarkLogic and had great success.

There were some internal things that happened. A year later, I built the MarkLogic team. We had huge wars internally - now that I was an advocate for this NoSQL thing, everyone else was against me, and for a year it was very difficult to fight those battles. But eventually it was very successful. We now have over 50 applications going to MarkLogic; all of our major websites deliver using MarkLogic. It's been a great success story, and we actually just made a long-term commitment to MarkLogic with a perpetual enterprise license.

So as you can tell, my background is kind of a conversion story to MarkLogic. But the other side of the story is that there were some politics about three years ago, and the team that ended up owning MarkLogic was not my team. So I moved off and went back to enterprise database architecture, and for three years I didn't really do anything with MarkLogic. Then, in the last year, we had a whole group of architects who wanted to research NoSQL. We all got together and started researching all the major vendors - there were about seven we looked into - and we spent a whole year doing it. Ironically, the whole time - and this is kind of sad, because I have all my MarkLogic friends here - I wasn't even thinking of MarkLogic as NoSQL. And that's bizarre, because it is.
I was thinking of MarkLogic as a document warehouse for XML documents and for website content delivery, and the use cases we had for NoSQL were very different. So I didn't even consider MarkLogic as an option. We looked at all the other major vendors and brought them in. We did extensive analysis - I can't share all of it with you, because we signed non-disclosure agreements, but in the end I'll show you the conclusions we came to.

Finally, after all that was over, it hit me: wait a minute. We already have a NoSQL database - it's called MarkLogic. And I kind of like that technology. Why am I not even thinking about it? It was really strange. In fact, Dan McCreary was talking yesterday about why MarkLogic isn't in the mindset of people who think NoSQL - and I'll explain that later in the presentation, but it really wasn't. It was really bizarre. So I almost had another conversion story. We started debating: should MarkLogic, since we already have it, be our NoSQL of choice in the enterprise? How does it compare with all the other NoSQL vendors? We did that detailed analysis, and the result we took to management was: MarkLogic outshines the others in every category, with a few minor exceptions. There was no question. So we actually decided to go strategic with MarkLogic - not only for XML and document delivery, but for all NoSQL.

This presentation will give you the results of my year's worth of research. And just so you know - it sounds like I'm a real MarkLogic advocate, and I am a fan of MarkLogic - I've had several conversion stories along the way, back and forth. When I did this research, I actually wasn't even thinking of MarkLogic, so it really wasn't biased research. So the topics: we're going to talk about the different kinds of old SQL, NoSQL, and NewSQL; the new data paradigms; the new hardware paradigms; and the real question - does the enterprise need NoSQL?
We know that Internet startups love NoSQL, and most of the people at this conference last year were Internet startup folks; there were a few enterprise folks. So the question is: is it ready for the enterprise? And which ones are ready, if any?

Here's the slide that summarizes my whole research - let me walk over here and point things out. I have two dimensions. One is the hardware dimension: at the top you have low-latency, fast, online transaction applications - it's all about velocity in this upper quadrant. The bottom one is all about high bandwidth and analytical volume. I'll explain later in the presentation why that is so important: there are physical reasons why you have to optimize one way or the other, so databases automatically choose one of these optimizations, whether they intend to or not. On the bottom, horizontally, you have five major data paradigms, plus "raw," which is not a paradigm - it's just whatever data format you've got, and we'll see what you're going to do with it.

So we're mapping these different paradigms against the hardware realities, and that's the space of NoSQL. It breaks down into graph databases, which are really optimized for RAM and very quick online transaction processing. The key thing to think of up here is high velocity, where you're also retrieving small amounts of data. You're not saying "query the entire database, summarize it, aggregate it, give me the results"; you're saying "give me a few documents that match my criteria." That's what this top area is all about, and that's true for the document databases too. These aren't all of them, but these are the ones I feel are currently the top dogs in the market - though that's always changing. Then, column and key-value stores.
I bring these together because the columnar NoSQL databases are very different from columnar data warehouses. Realize that there is a whole technology group called columnar data warehousing: instead of rows of columns, you store columns of rows and compress each column, and that's the really hot technology in the data warehouse world. Columnar NoSQL, on the other hand, is really about multi-dimensional keys. You have a key, and then you have a time dimension, which is just a segment of your key, and you have a column family, which is another segment of your key. So you have multi-dimensional keys that let you hierarchically dig down into your data more easily. Really, all Cassandra and HBase are is hierarchical, sparsely populated, multi-dimensional key-value stores.

Then you have NewSQL. What was really novel to me at last year's conference was VoltDB and some others like Clustrix. And then I remembered: that's not new - Oracle TimesTen and a lot of other in-memory databases have been around for a long time. It is not a new technology. But it is an optimization, and that's the important point: an optimization that stores your data in RAM for high-velocity, low-latency operational data storage. Then another one came out - Oracle announced Exalytics at the same time as the conference last year. Exalytics is basically an in-memory analytics database: as your data streams in very rapidly, it streams through your in-memory database, and you pick off what you care about and analyze it in real time, so you can aggregate, summarize, and deliver. That's yet another optimization using a conventional model. Down here, we have other optimizations.
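To make that "sparsely populated, multi-dimensional key-value store" idea concrete, here's a minimal sketch in Python. This is my own illustration, not HBase or Cassandra code; the class, column names, and keys are all made up for the example. The point is just that the "row" is really a compound key of (row key, column family, column, timestamp):

```python
# Sketch of a sparse, multi-dimensional key-value store in the
# HBase/Cassandra style: every cell is addressed by a compound key
# of (row key, column family, column, timestamp). All names here
# are illustrative, not any vendor's API.

class ColumnStore:
    def __init__(self):
        # Sparse by construction: only cells that exist take space.
        self.cells = {}  # (row, family, column, ts) -> value

    def put(self, row, family, column, ts, value):
        self.cells[(row, family, column, ts)] = value

    def get_latest(self, row, family, column):
        # Dig down the key hierarchy: fix three dimensions,
        # then scan the remaining time dimension for the newest version.
        versions = [(ts, v) for (r, f, c, ts), v in self.cells.items()
                    if (r, f, c) == (row, family, column)]
        return max(versions)[1] if versions else None

store = ColumnStore()
store.put("user:42", "profile", "name", 1, "Mike")
store.put("user:42", "profile", "name", 2, "Michael")   # newer version
store.put("user:42", "links", "homepage", 1, "http://example.org")

print(store.get_latest("user:42", "profile", "name"))   # prints "Michael"
```

Rows don't share a schema: "user:42" can carry columns no other row has, which is exactly the sparseness the talk describes.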
At this same time last year, we were doing research on whether we should buy a new type of data warehouse. Our current data warehouse, an Oracle data warehouse, wasn't performing very well, so we wanted to see what was better. We researched all the major vendors - the ones listed here we included in our research - and ran proofs of concept. We eventually ended up going with Exadata because we're an Oracle shop, so we didn't have the significant conversion costs we would have had with anyone else; it was mainly a conversion-cost decision. But we learned that Oracle and the marketplace are changing how they do things: they're optimizing for appliances. The future of all databases is really appliances, because we found that trying to tune all the layers - your physical servers, your operating system, the network, the storage - and getting it all to work across all the different teams in a large infrastructure group is near impossible, because everyone is different and has different needs. So vendors are now saying: we know how to tune everything to work best for our product, and we'll sell you an appliance that is pre-tuned and pre-built. We purchased Exadata, and it's true - Exadata lived up to everything they promised, because they were able to tune their software to work on their hardware in just the right way. And there's a lot more than hardware here: there's a ton of software optimization for massively parallel processing, which fits this model of analytical volume and bandwidth.

Now, here are your SQL databases. They're kind of in the middle of the road, right? They're not as high-velocity as they are high-volume, I would say - so a little lower - but they are multi-purpose.
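The massively parallel processing shape those analytical appliances rely on - scatter the scan across nodes, gather small partial aggregates at a coordinator - can be sketched in miniature. This is a toy illustration of the pattern, not any vendor's implementation; the function names and the data are invented for the example:

```python
# Toy scatter/gather aggregation, the shape of an MPP warehouse query:
# each "node" scans and pre-aggregates its own partition, and a
# coordinator merges the tiny partial results instead of raw rows.

def partition(rows, n):
    # Round-robin split of the data across n nodes.
    return [rows[i::n] for i in range(n)]

def node_aggregate(part):
    # Each node ships back a small summary, not the rows themselves.
    return {"count": len(part), "total": sum(part)}

def coordinator(partials):
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    return {"count": count, "avg": total / count}

rows = list(range(1, 101))                    # pretend this is a big fact table
partials = [node_aggregate(p) for p in partition(rows, 4)]
print(coordinator(partials))                  # {'count': 100, 'avg': 50.5}
```

The bandwidth win is that only the partials cross the network - which is why this optimization lives on the volume side of the chart, not the velocity side.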
Then you've got your object-oriented databases. They've been around forever, but they've never taken off as a technology. They're very similar to SQL databases, just more specialized for getting rid of the impedance mismatch between the objects in your Java and .NET code and your database - and that's a real big deal. But for some reason they've never taken off. I think I know why, and we'll talk about it later.

Then you have the document warehouse, with MarkLogic down here. That's a strange place to put MarkLogic in some people's minds, but MarkLogic was originally designed to be a warehouse for published XML documents, and it was originally optimized for that. It wasn't optimized for high velocity. You can optimize MarkLogic for high velocity, and they're just starting to do that, but by default it's not. If you do performance testing out of the box, it's not going to give you the velocity you can get from a MongoDB unless you do customization and special code. Well, that's actually what MongoDB requires too: customization and special code. So it's feasible for MarkLogic to get there. But right now the whole architecture is really strong right here, and I think that's the main reason you don't see MarkLogic lumped in with Mongo and Couch and all those others: high velocity is what they focus on, and it hasn't been MarkLogic's focus. And then you have Hadoop over here, where you just load anything you want in as raw data, run Java programs to extract whatever you want out of it, reduce it down, and deliver it to a warehouse.

So this is the end result of my analysis of what NoSQL, NewSQL, and old SQL are. Now we can talk about what it all means and why we care. Here's the data flow that would happen if you actually applied all of this - and this is a little scary. I showed it to my boss and he freaked out. He literally freaked out.
He goes: you mean we're going to have this nasty architecture in our enterprise? It was bad enough when we originally just had an app going to SQL, going down to a data warehouse - two moving parts. Then we brought in MarkLogic, and we've got this document warehouse, with document management systems publishing to it plus web apps publishing to it. And then its data has to get over here into the warehouse, and until the latest release that was very hard to do. But the new MarkLogic release actually supports SQL, and that's a huge win for us: now we can get data out of MarkLogic into a data warehouse over a standard protocol. There are still some corner cases. Then we brought in Hadoop, which does the large, massive reductions of data so we can get it into the warehouse. And then we had several teams actually building with graph databases and document databases - some MongoDB, some Riak, some Neo4j. All of a sudden we had all of this.

How do you integrate that? I mean, my job is data integration. How are you going to manage all these different kinds of data and different data systems? What about the cost - think about the infrastructure cost if I were to support all these different servers. We have hundreds and hundreds - thousands - of servers. How do you manage all this complexity? It adds a whole other level of complexity to our IT. So he freaked out and said: we can't do this; we need one vendor to rule them all. We can't have all these vendors. But right now there isn't one solution.

The output data flow is not much better. You have to write queries in all these different languages. You have to know XQuery. You have to know SQL. You have to know proprietary languages like MongoDB's.
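One way to put a number on what all those moving parts cost you: when rendering a page depends on several systems in series, their availabilities multiply. A quick back-of-the-envelope sketch (the 95/98/99 figures are just illustrative service levels):

```python
# Availability of a request that depends on several services in series
# is the product of the individual availabilities - so every extra
# dependency drags the total down below the weakest link.

def composite_availability(availabilities):
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# A 95% service already caps you at 95%; the 98% and 99% services
# pull the composite lower still.
pct = composite_availability([0.95, 0.98, 0.99]) * 100
print(round(pct, 1))   # prints 92.2
```

Three individually respectable services yield a composite of about 92.2% - which is the arithmetic behind the "complexity drags everything down" point.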
In fact, every single vendor has a proprietary language you use to interface with their product, and you have to learn all these different techniques to get data out of the systems. That's a lot of querying; that's a lot of development work. I know they talked in the earlier session about choosing the architecture that makes the most sense for you, but when you start doing that, here's the long-term technical debt you incur: the more vendors, the more technologies, the more your applications use multiple technologies, the more in debt you are to all those systems, the harder your systems are to update and maintain - and your availability decreases. The more servers I have to hit to get a page rendered, the more my overall availability declines, because you multiply the availability of each server you depend on against the others. If one is 95%, the best I can be is 95% - but multiply that against your 98% and your 99%, and it drags everything down. You have to think about all these things when you create a complex architecture. And complexity breeds instability and failures - software failures, because different pieces don't work together, and bugs in your code.

So this is not a trivial thing for the enterprise to embrace. For an enterprise that cares about delivering applications at low cost, embracing this complex NoSQL world is a scary thing. But it's also a cool thing, because with a graph database I can do things I can't do otherwise, and the same with a document database; NewSQL gives me fast, flexible queries; the traditional relational database is lower cost; and then there's your data warehouse. Each of these gives you different capabilities. So how do you manage all this? That's what I'm going to talk about today.

So: new data paradigms. I've touched on this several times. One of my favorite books is by Thomas S.
Kuhn - The Structure of Scientific Revolutions. When you have a paradigm, everyone holds on very strongly to the old paradigm - like my boss saying, you can't let this MarkLogic abomination in here. MarkLogic and NoSQL are a new paradigm. Graph databases and document databases are new paradigms - really legitimate paradigms - and it's hard for the establishment to accept new paradigms until, as Kuhn points out, the dissonance between the old and the new becomes so great - the new can do things the old can't - that a revolution happens. And a revolution can be bloody: hopefully not literally, but intellectually bloody. And then you can transfer over to the new technology.

So what's wrong with our SQL databases? (I found this picture on the Internet; it's a public-domain picture.) I think there are three things. You hear a lot about variety and velocity, but not enough about relevance, and that's one of the weaknesses of most NoSQL solutions: they don't leverage document relevance. If you have a document database and you're not leveraging relevance, you're missing the boat. Most of the document-oriented NoSQL solutions we've seen are trying to get into search, and that would help, but they're nowhere near MarkLogic - MarkLogic builds their whole system on relevance. How can you find the data you're interested in? That's what relevance is all about. And to me, that's not what SQL databases are good at. You can use a ConText index in Oracle - I know that technology very well - but it's very cumbersome, and it doesn't work anywhere near as well as MarkLogic. To get relevance from your SQL database, you have to take everything you care about
turn it into an XML document, store the XML document in the ConText index - so you're really merging both technologies, a document database inside of a relational database - and use triggers to ensure your XML documents are kept in sync with your relational data.

Variety is about rapid change in schemas and schema evolution - many of you are thinking through this - all kinds of data in various forms, and how these new paradigms handle new data types. Relevance is also about taking an article, a story, a document, and adding data to it. We discover in the document what the data is; then we discover the semantic relationships between the data - that's really a graph database; and then we discover the structural relationships - that's your document hierarchy. All together, you can use that contextual information. In a SQL database, we just shred all that out into a schema, and we define up front what we care about. Maybe I only care about topics, so I shred this data into a topics schema. If that's all you care about, that might be fine - but there's stuff you don't know about, and how do those topics fit into the overall context? All that is lost in a relational database, but in a document database it's all there. So you can keep discovering: oh, there are places and people in here too - let's add those. And then I discover more relationships between the data. There are an infinite number of relationships implied in this document, and every one of us in this room would probably see relationships that others in the room don't. So over time people will add relationships, and you can't build that into a relational database - it's an infinite number of possibilities. That's where we are. And then
there's structure. If you want better search results, you say: I'm going to search within my headings, because they're more important than the body text - they give better relevance, better meaning, to the end user. I'm going to search annotations, I'm going to search headings, I'm going to search people, and I'm going to rank all these different types of things in my search results so they sort well. That's the kind of thing products like Autonomy do: they allow you to search within a context, and the context is structure. The weakness of MarkLogic, though, is that it doesn't yet do a good job of semantic graph searches, and that's critical for its future as a technology; they need to do that. By the way, only graph databases do this today - typically the semantic versions, which aren't at this conference. There are graph databases and semantic databases: the graph databases are NoSQL, and the semantic ones are more document- and ontology-oriented, Semantic Web kinds of technology. Those two camps are starting to work with each other, and they're really the same kind of thing with a different focus. If we could merge that into our NoSQL solutions, we'd have better NoSQL solutions.

So what does SQL do well? The relational model is really very important - that's not a bad thing; E. F. Codd proved that decades ago. Then we have the dimensional model, which we put into warehouses. Then we have the object model, which Java, .NET, and other object-oriented languages use, which is very different - and is not a persistence model, by the way; we'll talk about that. Then the document model, and then the graph model. We'll talk about each one of these briefly.

The relational model: the focus is on the data. It's called relational,
but the relations are not the focus. You're focused on how to take data and put it together into a set - a table - and then, if you have to connect it to something else, you shred it and connect it. These were invented in the days when we called things "data processing" apps - remember that term? It was really popular; you don't hear it anymore. Really, they were data entry apps: we were entering data into our data systems. Now they genuinely are data processing apps, because we're actually processing the data. The relational model is not really good at variety or relevance, but it's really good at flexibility in querying operational data: you can see your data any way you can imagine, because the modeler takes the data and shreds it into flexible structures that support queries you couldn't anticipate when you started. But there's a cost in time: a lot of design, a lot of generic thinking to make that happen.

Dimensional modeling: the focus is not on data, it's on information. Data that's in a relational database gets restructured into a star schema, and now it's information, because the data I care about - the facts - sits in a fact table. Say we're administering drugs and performing surgeries, and I'm interested in drug doses: the doses are the facts, and the dimensions surrounding them let me slice that any way I want, very easily, because the structure is a dimensional model. So the dimensional model takes relationally shredded data - very hard to query, but very flexible - and turns it into a less flexible beast: I can't do anything but query drug doses, but I can query them any way you can imagine. And because it's so easy, end users can query it quickly and get the data they need. It surrounds data with context, and that's why
it has good relevance: users go to data warehouses to get relevant data, the data they want. It's got decent variety, because you can extract data from just about anything - the problem is you've got to write code to do it, and that code is brittle. If the source changes, your ETL breaks. I'm also the data architect for our BI team - we have 40-some people and it keeps growing - and we have hundreds of ETL jobs; if a source system changes anything, it breaks our jobs. And you can imagine, in a large enterprise they don't talk to you - they just change their system, it breaks, and they wonder what went wrong. They can even just change the data - not even the structure - and break your ETL, because of the assumptions baked in. So it is brittle, and it's not great on variety, but it's great for transforming data into contextual information with good relevance.

The object model, like I said, is not about persistence; it's what object-oriented languages are built on. Remember, C++ - the founding object-oriented language - didn't even have persistence in the language, or in the library; you had to add it. Persistence was never a thought in any of the major object-oriented languages - C#, Java - it's an afterthought. The relational paradigm and an object-oriented language are completely different paradigms, and that impedance mismatch is huge; it makes development expensive. Object-oriented programming is all about process: taking data and encapsulating it inside an object and hiding it - they literally call it information hiding. Notice that the database is the opposite: the database exposes your data so you can query it flexibly. Object-oriented methods are processes; I can inherit processes, I can group processes. So object-oriented programming is really good at managing process - what can I do to my data - and managing that complexity.

This is where the database vendors missed the boat. There was this time when
object-oriented databases started to come in so there would be less impedance mismatch, and the major vendors said: okay, we'll do object-relational features in the database. Well, they forgot what object-oriented programming is about: it's about the methods. It's not about the minor object-relational column types they added, which no one uses - most databases don't use them at all. It's about encapsulating data inside methods and managing methods, and the database vendors did nothing to encapsulate data with methods. For example, the procedural language in Oracle is PL/SQL; it's not object-oriented, and they made no real effort to make it object-oriented. Same with Microsoft's language, T-SQL (Transact-SQL): it's procedural, not object-oriented. So they missed the whole boat on bringing the object paradigm into the databases, and because of that we still have the object-relational impedance mismatch, and that's a disaster. The object model, then, is really good at simplifying development.

The document model is a nice compromise between object-oriented and relational; it brings it all together, and I really like this model. It doesn't replace relational - important point - it's complementary; it's different. The document model is about putting objects inside a context, just like the dimensional model, but the dimensional model is very restricted in the contexts you can use - it says dimensions and facts - while in a document model you can have any kind of structure you want inside the document, as complex as you want. In the real world, object-oriented programming allows really complex structures - linked lists, lookup tables, B-trees - any kind of connections with any structure; it's incredibly complex. I've been a developer for 20 years and I've built very sophisticated object models; you try to keep them as simple as possible, because they tend to explode in complexity, and then performance
tanks. But you still have lots of flexibility and power in a document: you can take whatever complexity you have in an object model and put it in your document. Object models do need to save their objects at a point in time, because they have to persist them - and what do you do? You wrap them into a document. It turns out MarkLogic does this too, but they came at it from the opposite perspective: they took documents published by people - XML books and magazines and whatever - and put them into a warehouse. It turns out it's the same thing. The difference is that in one case a program is generating your document to persist its objects, and in the other people wrote it. Both documents can be very complex; MarkLogic's heritage is more analytical, and the NoSQL document stores are more transactional, but there's no real difference in the document. And MarkLogic realizes now that JSON matters: other products persist JavaScript and Java objects as JSON, because JSON is JavaScript Object Notation - an object notation you can persist straight into your database. So MarkLogic should make the jump from being an XML document database to one that handles JSON natively as well.

XML versus JSON: they're very similar - almost identical - but XML is more complicated than JSON. XML allows text to be freely mixed with your markup: a word can sit there with no element wrapped around it, just floating among other elements. In JSON, that word cannot; it has to be part of an object. That's the primary functional difference. But you also have namespaces in XML, which complicate the language and make XQuery a
real pain to use. JSON, for its part, has no standard for precise typing - its data types are loosely defined - and that's not good for data integration at all; that's a strength of XML. But XML is harder to read and parse and carries more overhead; JSON is easy and simple. So JSON is replacing XML as the data interchange language across the enterprise - in my opinion, anyway.

The document model is really good at enabling search relevance, because you can compare what's in one document versus what's in the others. But you have to have a container to do that comparison, and that's why an object-oriented database doesn't work here: your objects are just objects that you persisted, not containers, so you can't say "compare this set of objects to that set" without running a bunch of different queries. That's why you can't easily get search relevance out of an object-oriented database, but it's very easy to get out of a document database: this document is more relevant to me than that one.

The last one is the graph model. Here we're managing relationships - relationships are the focus now - and it inverts everything. In a relational database, data was the focus; in a graph database, we care about the types of relationships, and we actually define relationship types. The data is not the point. In a graph database, I would just be some ID. What identifies me is all the relationships about me; it's the relationships that define me, not my data. That's the exact opposite of the relational paradigm, where I say: here's Mike Bowers, here's a table, and here's everything I want to know about Mike Bowers in this table or a couple of tables. In a graph database I'm just a number, and then I have properties - brown hair, blue-green eyes, birth date, whatever - and all those things together define who I am, and there's an infinite number of relationships
I can add. And the relationships are typed. Notice that in a relational database there are no typed relationships; your modeling tool might document them in metadata, but they're never used. The relationships just say this column is related to that column. In a graph database you say this relationship has a type, this is a hair-color type, and it has a data type and metadata about the relationship. So when I assign a hair-color relationship to a person and it's brown, that tells me something about the person. The relationships are the critical thing. I think graph is definitely part of the future. The problem is at the atomic data level. With a relational database, or even a hybrid, you can get your mind around a table; it's easy to look at the data in the table and figure out what it means. In a graph database I have a number. Now what does it mean? Well, I've got a bunch of other numbers that have types. What are those? Oh, that's a hair color. To figure this out there's so much indirection it blows your mind. How do you manage that? A lot of tools still have to be developed to make graph databases easier to manage. But graph is great for enhancing text search. This is a really strong statement, but I really believe it: search relevance is okay in a document database by itself, with the things I was talking about before, comparing what's in this document versus others. But if you want really good search relevance, you have to have a graph database, or graph-database techniques inside your document database. Google right now is proof of that: they purchased a graph database vendor, and when Google gives you that whole list of facts on the side about a person, that comes from a graph database, a semantic database. Semantics are the future of search relevance, but that's beyond this discussion; it's just important to note.

So here are the model takeaways. We've got five different models, each of them is very important, and they're all going to be around for a very long time. These are paradigms, not technologies. That's why we have those paradigms at the bottom, and how all of the NoSQL systems fit into the five paradigms.

Now we'll talk about hardware paradigms. We've got velocity and volume. Do we need to optimize differently for them? You bet. The difference is cost: RAM is more expensive than disk, that's the bottom line. If you look at this chart on the bottom right, I do the cost for each of these technologies, flash, RAM, and hard disk, for volume in gigabytes, for the bandwidth of getting to it, for the ops they can do, and for latency. When you do a full analysis of all these costs, it turns out you have to optimize: if you want big volume, you can't afford to do it in RAM; it just doesn't work. I tried really hard, and the economic model does not work. Maybe ten years from now we'll be there, but today we're forced into choices. So what happens is velocity gets optimized at the top, and it's going to be about small data in RAM. We'll use RAM as a big throughput mechanism to get our data onto disk, but once it's on disk we won't touch it very much; we'll keep a working set in RAM, and if we have to go to disk it's little pieces at a time, pulling things into RAM and pushing things out. Basically it's a cache: transactions with a little data going in and out of a big RAM cache. That's why some NoSQL systems evolved from memcached; it's a big cache system. The bottom, though, a warehouse, is the exact opposite: I've got all my data on disk to begin with, and I've got to process much or most of it to get an answer; I've got to crunch and aggregate, I have to go through all of it. So I'm going to optimize my technology very differently for that, and so that's what we've seen:
the vendors building appliances, and the appliances are specialized. For example, Oracle's Exalytics is optimized for RAM, for in-memory real-time analysis, and then Exadata takes the bottom part, the warehouse. So here's the problem with regular SQL databases, old SQL: they're serialized, and the serialization is killing them. This slide is from Microsoft, but the point is general: you've got four ways databases serialize their work, and it kills performance; you may be getting only a small fraction of real processing. I've seen this all the time. I'll go in to tune a database, I'll look at it, and I'll say: you're serializing so much that the CPU isn't even busy, and I can't get any more throughput out of the database. Why is that? It drives you nuts, and then you go figure out which of these four things it is. Tom Kyte, who's an Oracle guru, wrote a book called Effective Oracle by Design, and a whole point of it was how you write your queries to avoid latches. Latches are in-memory locks, and what kills Oracle performance is locking in memory.
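To make that locking cost concrete, here is a rough, purely illustrative micro-benchmark in Python. It is a sketch of the general principle (contended locks are far more expensive than uncontended work), not a model of Oracle's latch implementation, and the exact ratios vary wildly by machine and runtime.

```python
import threading
import time

N = 200_000  # iterations per thread

def single_thread():
    # One thread, no lock: each increment costs only a few nanoseconds.
    count = 0
    for _ in range(N):
        count += 1
    return count

def two_threads_with_lock():
    # Two threads contending on one mutex: every increment now pays for
    # lock acquisition, cache-line bouncing, and scheduler wakeups.
    lock = threading.Lock()
    counts = [0, 0]

    def worker(i):
        for _ in range(N):
            with lock:
                counts[i] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)

start = time.perf_counter()
single_thread()
uncontended = time.perf_counter() - start

start = time.perf_counter()
two_threads_with_lock()
contended = time.perf_counter() - start

# The contended run is typically many times slower per operation, even
# though it does "the same work" overall.
print(f"uncontended: {uncontended:.3f}s  contended: {contended:.3f}s")
```

The absolute numbers are meaningless across machines; what matters is the shape of the result, which is the same trade-off the talk describes for in-memory database latches.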
The cost of synchronization in memory is huge. A single thread can do a simple calculation in about 300 nanoseconds, but start adding locks and look at the bottom line: 224,000 nanoseconds once two threads are contending on a single mutex; in that case it's nearly a thousand times slower. And that's what Oracle is doing with its in-memory cache: you think it's fast, but everything is locking on it, so it's not. That's where NoSQL comes in. Some of them say: we're not going to have any locks at all. That's one of the secret sauces behind VoltDB: they force you to use stored procedures for everything, so by design you don't do things that need locks, and therefore they're fast.

So here are the two hardware paradigms side by side. The new parallel SQL hardware paradigm is massively parallel, high-performance computing. Why would you change the database architecture? Because the old one doesn't scale anymore. It used to: ten years ago we were riding the exponential growth of transistors, but clock speed flattened out because of heat and power, and instruction-level parallelism flattened out too. That gap between the exponential growth of transistors and the flattening of everything else forces us onto multiple cores. And on multiple cores, do you get the same speed-up for free? No. I've got multiple cores sharing the same RAM, and that means locks, and locks kill you. So for velocity and volume, the future architecture is going to have to be massively parallel and asynchronous, with RAM for velocity and disk for volume, and massively parallel functional programming languages like XQuery. Remember, SQL is not functional in that sense, and functional is the future, but I won't go deeper into that here.

So here's where we end up: high-velocity, RAM-based, low-latency systems on the top, and high-volume systems on the bottom. Now, with NoSQL, the developer gets to do everything. NoSQL kind of punted: they said, we can't handle concurrency, so it's your deal; we can't handle ACID transactions, so it's your deal. I'm being facetious, and some are more that way and some less, but it's a huge deal. I want to go briefly through the CAP theorem, which explains why they did this, and then the problems with doing it. CAP is consistency, availability, and partition tolerance; I like to think of the P as scalability more than partitioning. Here's the CAP theorem in a picture, and here's how we scale today. We used to scale vertically; now we're scaling horizontally, all the way out to multiple data centers. We're scaling within the CPU to cores, then to multiple CPUs, to multiple servers within the data center, and out to multiple data centers. Every time we scale out, we're creating communication, and communication is orders of magnitude slower: this server has to talk to that server, this data center to that data center, and that creates availability problems. And the other thing is, the further I get from a single core, the bigger my consistency problem, because I have to synchronize my data across multiple running processes, and at multiple data centers the problem is biggest of all. So my availability and my consistency scale in different directions, and because of that we have to make choices. SQL used to sit back there on one CPU; it was designed for one CPU, and it scaled great as long as the CPU got faster. Oracle and SQL Server scaled beautifully. But now we have to deal with multiple cores, multiple CPUs, and multiple servers, and that creates a problem for the database to scale; the vendors are going to appliances to solve it. But NoSQL says: we don't want to
scale vertically anymore; instead we want to scale horizontally by design. So NoSQL has to start making choices between consistency and availability, because the more you scale horizontally, the more you have to trade off between them. For example, say I have data in two different data centers and I want it consistent and right up to date: I have to commit to both before I can say it's there. If I do a two-phase commit across the data centers, latency is going to kill my performance; I have to wait for two different servers in two different data centers to come back and say yes, I committed. If instead I make it asynchronous across the two data centers, I get high performance and high availability. So the CAP theorem really is about going asynchronous to get high availability and high transaction rates, or synchronous to get consistency. And synchronous is a synonym for serialization, and serialization kills performance and kills availability. That's why we need this trade-off between consistency and availability, and that's where NoSQL is innovating. Some NoSQL systems even let you make the trade-off per transaction: the developer decides, saying I want this transaction to be consistent and that one to be available, and goes to work.

But also, most NoSQL systems don't provide ACID transactions. People forget what RDBMS stands for: relational database management system. And "management" is not administrators, DBAs, easily managing the system; management is managing concurrent transactions. It's ACID, the ACID paradigm. ACID is a consistency model. The C in ACID is actually redundant: consistency is really the byproduct of atomic, isolated, and durable. I don't have time to go into it in this presentation, but it's in the slides; those have great detail.

Developers have to work to compensate for the lack of ACID compliance. Some NoSQL systems are ACID compliant, like MarkLogic, but others, like MongoDB, Cassandra, and Riak, are not, and this is a big deal, a bigger deal than you think. We're so spoiled by relational databases managing our consistency model for us that we forget what it means. I'll just say this in a nutshell: if you've ever written multi-threaded programs, imagine a NoSQL system with thousands of concurrent users coming into your system, concurrently changing data, and you managing the consistency of that. It is not trivial at all to develop, and developers completely underestimate the cost. That's something you have to think about for enterprise applications: if you're not ACID compliant, you're not going to save money. Why do you have a NoSQL app, to save money? You'll be finding concurrency bugs like crazy; you'll go crazy, just like with multi-threaded programs. It's not easy. So the trade-offs are this: if you want higher performance and higher availability, then you'd leave the ACID model, because ACID limits those things. But if you want less data loss, more accuracy, less deadlock, more data integrity, and less code written to compensate for the lack of those things, then you want ACID compliance. Ideally you want a system where the developer can choose, and say, for this transaction I don't need ACID. That's one of the secret sauces of NoSQL that MarkLogic doesn't give you, by the way: MarkLogic is always ACID compliant. If they wanted super-high velocity and volume, letting you tune that is something they might consider.

So is the enterprise ready for NoSQL today? Well, these are nine questions we asked, and here are the nine answers; we did a year-long study of seven vendors. Yes, we need a
document model; I talked about all these things already. Yes, we need better search relevance, because relational systems basically don't do a great job of that. SPARQL, by the way, is a very important language for graph databases; the pure graph databases are still implementing it, but the semantic ones, which are really graph databases too, already have it. That's kind of the difference: if you're semantic, you've got SPARQL and OWL and ontologies, and if you're a pure graph database, it's just Java integration. Do we need global availability? No. Not that we don't need availability, but we don't need NoSQL for it, because we've had global availability in relational databases for decades; we've had multi-master systems for decades. The problem is they don't work well; people don't use them because there are problems with them, and NoSQL is no different. It's not magic: they just hand the problem to the developer, and you have to handle it. Do we need volume, big data? We do need big volume. Hadoop is great for batch graph-style processing, and MarkLogic is great for ad hoc querying; MarkLogic has a great big-data solution for ad hoc, interactive discovery. With Hadoop you program MapReduce jobs; it's not ad hoc, you have to figure out what you want, program it, and deliver it. With MarkLogic you're literally experimenting: what is this? Let me try this ad hoc query. It's iterative; it's a different approach. Do we need velocity? Sometimes. We ran the analysis and we really could not find good use cases for high velocity in our particular enterprise. Some enterprises do: if you have stock transactions, that's high velocity, so there are definitely cases in the enterprise. But I think most enterprises have less need for high velocity than they assume. And this is the key one, the assumption everybody makes all the time, and I believed it too: that NoSQL is less expensive than SQL. I know what our Oracle and our SQL Server cost; I'm in charge of the cost analysis, so I know exactly how much they cost. So I did an analysis of SQL, MarkLogic, and NoSQL to figure out which is cheaper, and I was shocked; this blew me away. Exadata, which usually has pricing in the millions, is cheaper, because the appliance gives you economies of scale and performance you can't get building it yourself. It shocked me. Every time I say it I still can't believe it, because I know the price tag on that thing, but I ran the numbers on cost per gigabyte, and that's actually easy: I know what it costs us and how many usable gigabytes are on it, and it's cheaper. Then I ran the cost analysis for all the others: our SQL Server and Oracle databases are medium cost, MarkLogic is medium cost, and NoSQL is high. Why? Because NoSQL redundancy is at the server level. For redundancy in MongoDB, I have to have three servers with the same data on them, which means I've tripled my data size. If I triple my data size, I have three times the SAN storage, and SAN storage is very expensive: we pay $5 a gigabyte for our high-end, highly available EMC VMAX storage. That's not the vendor's exact price; it's the price our storage team charges us, because it's marked up somewhere, so I don't want to quote it as a list price, but that's what my team pays for highly available VMAX storage. And that storage is already redundant. NoSQL is cheap as soon as you have local disks that don't need to be redundant: I could go to Walmart and buy a $50 disk, and that would be great for NoSQL. But you can't do that in an enterprise data center; they won't let me stick a Walmart disk in there. They won't let me buy a cheap Dell server, the kind that's really reliable and runs my whole home entertainment system for 300 bucks, and stick that in the data center for NoSQL. Instead they'll
be buying expensive blades. Why? Because of power and space, and because they assume the old paradigm. So NoSQL turns out to be way more expensive. Plus, by the way, open source is not free. When I got into the pricing models, some of them are extremely expensive, more expensive than Oracle. Everybody thought, oh, it's free, it's open source; actually price it out, because if you're replicating data across multiple servers, you're buying more servers, which means more licenses and more support costs. So I actually priced it out, and I was shocked; I really didn't expect that, I thought it would be cheaper. But I also broke it out by price per gigabyte versus price per transaction, and on price per transaction NoSQL is not so bad, because NoSQL is very fast on transactions, which keeps the price per transaction low. So that's an important consideration: are you looking for volume or velocity? If you're looking for volume, NoSQL is not going to be a cheap solution.

Is development faster? Well, it is with MarkLogic; we've proven that. We have 50 applications, and it's about twice as fast to develop an app on MarkLogic versus Java or .NET. But I'm not sure that's true for most of the other NoSQL solutions, because they're not ACID compliant. And the other thing is, we talked to all the key-value store vendors, and that includes Oracle NoSQL, Cassandra, Riak, HBase, and so on, and they all require you to store the data multiple times if you want to query it multiple ways. I want to query by date? I make a date key and store it ordered by date. Query by person? I'd better store it again, ordered by person. By geographic location? Store it again, ordered by geographic location. Because they can't search inside the document efficiently to get you the results you want quickly, their answer, when we asked this question, was to store the same data multiple times, stored in different ways by different keys. You're wasting disk space, which drives up cost. And since they aren't ACID compliant, if you have a problem writing to all those places and a write fails, the system comes back and says, well, I got it to some of them. So my queries return different results depending on whether I'm sorting by date or person or location, and I have no way to guarantee the data got everywhere. It's not a trivial thing, and they make it out like it is; if you have a trivial app it may be trivial, but otherwise it's not.

So is MarkLogic right for the enterprise? Absolutely; we've proven that. We bought it four years ago, and it has proven enterprise-worthy. We did spend a lot of money to bring it into the enterprise, and that's something to consider: you'll spend anywhere from a thousand to a million to get a product integrated into your environment, with the engineers to support it; that's a cost all by itself. But the other thing is, the other solutions are like version one or version two. We talked about basic things like security: do you have a password to protect your database? Well, some do and some don't. If you don't have a password, a hacker just breaks through my enterprise firewall, which they do all the time, by the way; no one is secure from hackers. They get through ours all the time; we have intrusion detection, but they get in, and then there's no password on your data, no secondary level of defense. No, your apps are supposed to do that, or you can put a local firewall in front of your database. The level of maturity there is very immature. Don't get me wrong, I'm not trying to be hypercritical; it's just that they're not thinking about the enterprise. MarkLogic, for example, was built from the beginning for the enterprise; their customers were big publishing companies and the government, and those customers had enterprise needs, so they built an enterprise system. Notice that most of the other NoSQL systems were built for internet startups, and those needs are very different. If you're doing an internet startup, those NoSQL solutions are great, but I'm
talking about enterprises. Will NoSQL be mainstream? Yeah, I think in a couple of years, somewhere between one and five, more and more of these other NoSQL solutions will become enterprise-ready. There's big money in the enterprise; enterprises pay lots of money for this stuff, and MarkLogic is working there. But NoSQL is still in the hype phase; Gartner puts it smack dab in the hype phase. MarkLogic is a little ahead because they were doing this for five years before everybody else started in the NoSQL space. The end result, my recommendation, is that the enterprise needs NoSQL for the reasons I listed, and those are real, legitimate reasons; that's why the LDS Church bought MarkLogic to do those things. MarkLogic is ready; it's the only one that's fully ready today for the enterprise, and the other NoSQL solutions will be there eventually. And by the way, I feel bad that this sounds like a MarkLogic commercial, but it's just the reality of the analysis. I really, honestly forgot about MarkLogic for that whole year, which was bizarre. I looked back and thought, what's wrong with me? And then it came back to me and I said, duh. It was just a different paradigm; I wasn't thinking of MarkLogic as NoSQL, those two things just didn't come together in my mind, and when they did, it was a perfect fit. So we have a long relationship with MarkLogic now, and as I explained, we chose them again because of this analysis.
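As a footnote to the XML-versus-JSON comparison earlier in the talk: the "mixed content" difference is easy to see in a few lines. This is a minimal sketch; the element names are made up for illustration, and the JSON layout is just one of several conventions people use to emulate mixed content.

```python
import json
import xml.etree.ElementTree as ET

# XML allows "mixed content": text floating directly among child elements,
# with no wrapper element of its own.
xml_doc = "<para>The word <b>the</b> is just floating here.</para>"
root = ET.fromstring(xml_doc)
print(repr(root.text))            # the free-floating text before <b>
print(root.find("b").text)        # the wrapped word

# JSON has no equivalent: every piece of text must be a value inside an
# object or array, so mixed content needs an explicit convention like this.
json_doc = json.dumps({
    "para": [
        {"text": "The word "},
        {"b": "the"},
        {"text": " is just floating here."},
    ]
})
parsed = json.loads(json_doc)
print(parsed["para"][1]["b"])     # the same word, but forced into an object
```

That forced wrapping is exactly the "a word cannot just float there in JSON" point from the talk, and it is why JSON is simpler to parse while XML remains better suited to documents where prose and markup interleave.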