Hello, my name is Shannon Kemp, and I'm the Executive Chief Digital Manager of DATAVERSITY. We would like to thank you for attending our most recent conference, Database Now Online, the first of the Now conferences produced by DATAVERSITY. We are excited to wrap up the event for today with this final session. And of course, a special thanks to our sponsors today who helped make it happen. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the event. For questions, we have a short Q&A at the end of each presentation, and we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DBNOW. If you'd like to chat with us and with each other, we certainly encourage you to do so. Just click the chat icon in the top right-hand corner for that feature. And for this event, we will send a follow-up email next Monday to all registrants containing your unique login to access the recordings and the slides from today's presentations. Now let me introduce to you Mike, who will be discussing data modeling for document and graph NoSQL databases. To give you a brief background, Mike Bowers is a principal enterprise data architect at the LDS Church. He has spent over 20 years as an enterprise architect, software developer, and database architect. In 2008, Michael brought the MarkLogic NoSQL database to the LDS Church. Today it delivers 1.8 terabytes of data to 95 million annual visitors with billions of page hits across 182 websites, transactional applications, and web services. So with that, I will give the floor to Mike to get this session started.
Hello, and welcome. Thank you. I'm excited to be here. That's a great introduction, so I can just skip my introduction slides and jump right into the content.
Today I want to share how we are turning relational database models into NoSQL models. The idea is to transition from flat tables with meaningless constraints to true business entities with meaningful relationships. We use documents and graphs to create these real-world entities and to have unlimited meaningful relationships between those entities and within those entities. There are some new indexes we'll talk briefly about, and APIs that make this possible. And this last thing is kind of controversial. I think, hopefully, you'll be surprised, and not in a negative way, at how unnatural and meaningless relational data is. That sounds kind of mean, to say relational data is meaningless. It's not that data is meaningless or that relational databases are bad, but when we get into this, you'll realize that the meaning is not in the relational data. It's actually in the mind of the DBA or the mind of the developer, in the development code. The application code has implied meaning and tries to apply meaning to the data, but the meaning is not actually in the relational data. So how can we put the meaning in the data? How can we become more natural in our models? That's what we're going to talk about today.
As Shannon mentioned, I've been in this industry for a long time. I've done relational databases for 24 years. I did years of Oracle DBA work. I've spent decades as a software developer, and we've been using NoSQL professionally for nine years. So I'm very familiar with these technologies, and we're actually doing what I'm talking about today. This isn't just theory, even though it'll sound very theoretical, because I'm ABD on a PhD in music theory. So I'm kind of a theorist, but we've actually been hands-on, doing a lot of this work at the LDS Church. The LDS Church is a large church and we're centralized. We have 15 million members, but we're centralized in Salt Lake City.
So we have a huge IT shop, thousands of applications, hundreds of databases of all kinds: relational, like Oracle and SQL Server, and non-relational, like MarkLogic and Cassandra. We have a lot of experience in all these technologies. The agenda is to talk about database modeling paradigms and how to get from relational to document slash graph. I call this the document graph model, so that's why you'll hear me talk about a document graph data model. Then I'll talk about the advantages of document graph at the end, and I'll contrast relational and document throughout the whole presentation.
So let's get into it. Here are some paradigms. I think there are basically six major data paradigms. We know about relational, with fixed, dense tables and flexible queries and joins. It's great for operational applications. We've got dimensional modeling, which is great for analytics and reporting. Those are well known, so I won't talk about them other than contrasting them with the other four models.
We have graph, and I want to take a second on graph. There are several kinds of graph databases. In the last presentation, we talked a lot about property graph, which is very mathematical: nodes, edges, and vertices. Those concepts are really the same as semantic graphs at the core level. All the graphs have the idea of this relates to that. They have different names: nodes, edges, and vertices, or in the semantic graph world you have IDs, and you have subjects, predicates, and objects, so three things. Every graph is composed of these little atoms that are three basic things. This thing relates in some way to that thing, so this relates to that. That's the core of a graph. A property graph then puts properties on the this, properties on the relationship, and properties on the that. So there are properties on everything. And the model I'm talking about is very similar.
It's a document graph, where you don't just stick properties on each part of the graph, you stick a document on each part of the graph. So this is a document, the relationship is a document, and that is a document. In other words, the subject, the predicate, and the object are all documents, and they're all related. Notice in my little example, I've got a surgeon excels at operations, a surgeon performs operations, operations are performed on a person. They're directed, but in my case, I want to say each surgeon is a document. The "excels at" is a predicate, and that's a document. The operation is a document. So everything you're seeing here is actually a document in my document graph model. But in a pure graph, they're just IDs. The surgeon is an ID, "excels at" is an ID, and operation is an ID. Sometimes you have data, so a surgeon has a name. The surgeon is the ID, "has a name of" is the relationship, and the value would be the person's name, like Jim. So everything in a graph normally is just IDs related through IDs to other IDs or pieces of data. In a document graph, a document is related to another document through another document. It's the same as a property graph, except that wherever a property would be, there is a document.
I want to make sure you understand the concept that a graph at its core is a bunch of little atoms. These little triples are like atoms. They're the most atomic piece of data you can have: this identifier, or this ID, or this identity, or this entity (these are all kind of synonyms I'm throwing out there) is related in some precise, meaningful way to something else. Identities are related with precise semantic meaning. That's a semantic graph. And then you can do other kinds of traditional mathematical graph operations on them too, like traversing the graph, or running algorithms for closest neighbors or the shortest distance between different identities. You can do all that, plus in a semantic graph, you can do queries.
And you can do pattern matching. You can say, give me all the fathers of fathers. And you can even name that and say, a father of a father is a grandfather. So give me all the grandfathers. Even if grandfather isn't in the data, if you define that a father of a father is a grandfather, then you can query for grandfathers. It's about querying for patterns, traversing the graph with algorithms to find nearest paths, and relating documents to documents. So when I say graph, I'm talking about all of those things. In this presentation I'm focused on the semantic graph, on the meaning of the relationships.
The next one is document. The document model is found in databases like MarkLogic, MongoDB, or Couchbase. Those are popular document databases. On the graph side, MarkLogic has a semantic graph database. Neo4j is a property graph database. AllegroGraph, from Franz, is a semantic graph database. There are different companies that provide these new kinds of databases.
The document database combines everything about an entity in one document. Notice that what I've drawn up is a little form, and a form is a document on paper, but you can have that same document electronically. That's the concept of a document database: take paper forms and make them electronic. And there's data throughout the whole thing. This is a little form you'd fill out if you did an operation on a person and you wanted to track what drugs were given to the person during the operation. You'd fill out where you did it, who the surgeon was, what the operation was, and what drugs were given. It's a simple form for tracking that information. That's a document. And notice there are many-to-one relationships here.
You have one operation with many drugs, and you have relationships with other things: you're relating out to the surgeon, to the hospital, and to the different drugs. But all of that is captured in one document. That's the concept of a document, and document databases are really good at storing, indexing, and querying documents with unique APIs. A document is very different from relational because you can have many-to-ones in the same document. It's sparse and variable, whereas relational is fixed and dense. I mean dense in the sense that the data is tightly packed and structured in fixed ways.
The other two models I have on here are wide column and key value. I'm not going to go into those in this presentation because I'm going to focus on getting from relational and dimensional to document and graph. But wide column and key value are performance models. You use wide column when you have high-velocity ingest and high-velocity queries. The main difference between wide column and relational is that there are no joins. Everything is denormalized. Like relational, everything is predefined and fixed, but there are no joins, and each query hits one table at a time. So you join lots of tables together into one table, you pre-join it, and you query it. But you can get some very high-velocity performance. We use Cassandra for this model with great success in our FamilySearch website. If you go to familysearch.org, you can do your genealogy there for free, and behind it is Cassandra. And it's very fast. We changed that from relational to Cassandra for that very high-velocity use case, which is the perfect fit for Cassandra.
Key value, like Redis, is very good for a variety of data structures. The structures are fixed, like linked lists. You have key values.
You can even do simple documents, as you see in the hash example. It's a simple document structure. You can have arrays of things. So there are fixed structures where you can put a variety of data in them. It's great for high-velocity updates of data. That's its main use case. We're not going to talk very much about wide column and key value, but they are data paradigms, ways of thinking about and organizing data.
We're going to focus on combinations of document and graph. We already combine paradigms all the time: for example, relational and dimensional. Almost all of our apps have dimensional models and relational models in them, or we pull the data out and load it into a dimensional data warehouse. So most of our apps use both models. And they're very different models, even though relational databases can do both relational and dimensional. These are different ways of organizing data that have specific use cases. For example, in dimensional, you have facts that are surrounded by dimensions. You can query a fact based on attributes and dimensions, like give me all the sales in a certain region of the United States at a certain time by certain types of people. Those are dimensions, and I'm limiting the sales facts to those dimensions. That's very different from relational, where we're normalizing the data so that every update occurs in one and only one place. Wide column and key value are all about performance optimization, so we're not going to talk about those any further.
I'm going to talk a lot about document and graph. When we combine the two, you have documents in your database and you join them together with graph relationships. The graph relationships are data. The documents are data. The structure is data. This is a key point. Everything is data. It's not like the relational world, where metadata lives in metadata tables inside the relational database but is separate from the data.
In a document and graph database, the data structures are data. And the relationships are data. Everything is data. That makes everything variable and very flexible. And in the graph, because all the relationships are precisely defined, everything is meaningful. So that gives you variety and meaning. And that's what we're going to focus on.
Why do we even care about document graph? Because relational modeling covers pretty much everything, except it doesn't. It was revolutionary 50 years ago. E. F. Codd did a fantastic thing by creating relational. And I love relational. I've been using it since the 80s. It's fantastic. But we are in a new revolution, and there's a new way of thinking about data. This is document graph.
Relational has two major flaws. First, it forces you to shred the business entities into multiple tables. A real-world entity is complex. A person is complex. And relational says you can't have many-to-ones in the same table. So you have to make multiple tables, and you have to join them together. You're breaking a business entity like a person into many tables. And that breaks the abstraction. You're also intentionally removing context so that entities can stand independent of other entities. And when you remove context, you remove meaning. So that's a problem.
The second flaw is that relational actually doesn't have relationships in it. I know that sounds controversial. But relational databases have constraints. A constraint is a business rule that says how many of a thing you can have, and that you can't have this if you don't have that. You can't have this row in this table unless you have a row in that table. That's not a relationship. That's a constraint, a rule. And the name of the constraint is there for the DBA. It's not there for semantic meaning.
A human might interpret some meaning from that name, but it's not meaningful data. It's a rule, and a name for managing the data, not a name to describe what it is. The table names are there to try to describe the meaning, but you get 30-some characters and you abbreviate them. Only a human can interpret the abbreviations in a meaningful way and infer some meaning in the data. You can add comments to columns and tables, but that's implied meaning. There's nothing in the data that says this has this precise semantic meaning. Therefore, everything in a relational database has implied meaning. And the only way to recover the meaning, which has been shredded into multiple tables, is to join it back together in a query. Now you've created a new context for that shredded data, with some implied meaning. The application takes that SQL statement, brings some context to it, and then the code interprets the meaning and does something with it. So the meaning is never in the data. The meaning is not even in the app. It's only in the mind of the DBA and the mind of the developer.
What I'm proposing, and what semantic graph does, is to put the meaning in the data. It changes everything. Document graph modeling improves on this because you can create an entire business entity, like a person, a product, a vendor, or a company. Any entity you can think of, or any transaction you can think of, is a business entity. And it's a single document, one document per business entity. Then you connect those business entities together with precise semantic meaning using semantic graph. Now you have a business model that represents the real world with true meaning. So let me get into exactly how that works. This is an example of an XML document combined with RDF graph.
I'm using E. F. Codd's paper, the one he wrote that transformed the world and changed the model from hierarchical mainframe databases to relational. I truly think this was revolutionary 50 years ago, and I love it. This document is a story. It's a narrative. The document itself is highly structured, and I have a pet peeve to mention briefly: people say there's structured data and unstructured data. That's absolutely false. The fact that you say the word data means it's structured. And documents are full of structure. You couldn't even read this if there wasn't structure to it, if there weren't sentences and sections and words, if there wasn't structure like nouns and verbs and objects and subjects. It's highly structured. If you've ever been a writer, or just been through school, you know how hard it is to write, because there are so many structures you have to learn and follow. Human structure is very complicated. You could say there are simple structures and complicated structures: relational is somewhat simple, while a document model like this is extremely complex. That would be more accurate.
This is highly structured data, but computers don't handle this complexity very well. So we have to mark it up. I mark up this sentence here: "A Relational Model of Data for Large Shared Data Banks" is a title. I'm saying E. F. Codd is a person. IBM Research Laboratory is an organization located in San Jose, California. This is a publication, Information Retrieval, this journal, and it was published in June of 1970. That's a date, and it's an event. Inside the text, I mark topics. These are positive topics, like goals.
Programs should be unaffected when the internal representation of data is changed. By the way, as we know, relational did not solve that problem. Even though that was the goal, our programs are always affected when we change anything in the relational database. So that was never achieved. It's a fallacy that relational solved that problem. It never did, and it cannot. That's not part of this talk, but it's a fun little point.
There are negative comments too: tree structures are bad, hierarchies are bad, networks of data are challenging. He tried all these things to solve the problem, and he ended up saying, we can't stay with these mainframe hierarchies, so let's go to relational. We can't solve it with other kinds of hierarchies or networks. The problem back then was they didn't have the types of indexes, concepts, APIs, memory, and compute power that we have today. So yes, in 1970, he couldn't solve the problem with hierarchies and graphs. But today we can. This is obviously a dated paper, but he was correct. When you have a hierarchy and you nest orders within customers and products within orders, so you have a hierarchy of customers to orders to products to vendors, you can get all the data about what customer ordered what, but you can't ask the opposite: what vendor sold what? The vendor data is nested deeply inside the customer. That's the problem he was trying to solve. He said, let's break it out into tables, and that solves the problem. That is a very true statement, but there's another way, where we can have documents and graph. It's like cake and ice cream: you have both. So we can move beyond the limit that he's talking about in the paper.
Notice how I've taken the story and marked up the data. When I do that, I get contextual information. Data inside of text is contextual.
In fact, the word context ends with text. Anytime you identify data inside of text, it's contextual. So I've turned data into information, but not just any information: information in a context. A database that knows how to query and search text with data embedded in it can do powerful things. Now the computer can ask, who wrote this article? What journal was it published in? When was it published? All these data points are marked up for the computer, so the computer can search and query them. We merge the concepts of search and query because you can search for words and you can search for data, and they're both the same thing. That's part of what a document database gives us.
But there's more, because we can now find relationships between the data points. I can say E. F. Codd is the author of this article. IBM Research Laboratory is the publisher of Information Retrieval and published this article. IBM Research Laboratory is geolocated in San Jose, California. Information Retrieval is the publisher of the article. June 1970 is the date on which this was printed. I'm giving you a high-level example to try to make it human readable. But notice everything is a triple: this relates to that. The predicates, like "printed on," "publisher of," and "published by," have to be very precise. I'm being very general in a presentation. You want to be clear about how it was printed: was it printed electronically, printed in paper form, in a magazine, in a journal? If you want a general predicate, you can use one for the general concept of printing. You can make it very general or very specific. And if you want both, you're going to have two predicates and two triples: one triple for the generic concept and one for the specific.
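To make the triples above concrete, here is a minimal Python sketch of "this relates to that" statements and a wildcard pattern match over them. It is only an illustration, not the speaker's implementation or any database's API: the `Triple` class, the `match` helper, and the predicate names (`authorOf`, `printedOn`, and so on) are all hypothetical, chosen to mirror the spoken examples.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One atomic statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


ARTICLE = "A Relational Model of Data for Large Shared Data Banks"

# Hypothetical predicate names; a real model would take them from an ontology.
TRIPLES = [
    Triple("E. F. Codd", "authorOf", ARTICLE),
    Triple("IBM Research Laboratory", "geolocatedIn", "San Jose, California"),
    Triple("Information Retrieval", "publisherOf", ARTICLE),
    Triple(ARTICLE, "printedOn", "June 1970"),           # generic predicate
    Triple(ARTICLE, "printedInJournalOn", "June 1970"),  # specific predicate
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Who wrote this article?"
authors = [t.subject for t in match(TRIPLES, predicate="authorOf", obj=ARTICLE)]
```

Keeping both the generic and the specific printing predicate as separate triples means a query can ask at either level of precision without any change to the data.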
So the level of meaning you want to communicate has to be in the exact semantic relationship. The predicate defines the meaning. There's data here, but it's the predicate, the relationship, that defines the meaning of the data, not the data itself. Put another way, when I was back in the pure document form, it was the context of the data that defined its meaning, because the text had the meaning. The data in the text was in the context of the meaning. Now I'm saying the semantic triples are defining the meaning of the relationships. It's another way of defining meaning.
We can keep going through this. There are related topics. The red topics are problems with mainframe data. The green topics are the solutions. And then there's another kind of relationship: structural. The data is in a hierarchy, and the hierarchy has meaning. So there's semantic meaning and structural meaning. For example, "Access Path Dependence" doesn't mean a whole lot until you put it in the context of the structure that says "Data Dependencies in Present Systems." And data dependencies has meaning in the context of "Relational Model and Normal Form." So if I start from the top and go to section 1, "Relational Model and Normal Form," then the topic area, "Data Dependencies in Present Systems," and then down to "Access Path Dependence," you can see the structure is giving context to each item in the structure. As humans, we do that in everything: we organize things into structures so we can give them context. And the structural context is implied meaning. Now, implied meaning is not as good as semantic meaning. So I'm trying to bring all of this together into a document graph model that brings semantics and structure together so we can have meaning.
When you take data, text, and relationships, both semantic and structural, you get meaningful knowledge. The purpose of this whole presentation is to show how we can take relational data and turn it into something that is more meaningful, more powerful, more intuitive, and more natural.
There are two types of document models that are really popular in the industry. One is XML and one is JSON. XML was designed for text. As you can see on the left, the text is in yellow, and I marked it up with the blue. It's very complex. We could spend half an hour just talking about how powerful and rich XML markup can be, because it was designed to mark up content like text in any way you can imagine. On the right-hand side is JSON. JSON is the opposite. JSON is structure. You can see the white is the data structure of the JSON document. In XML, you start with text first and you mark it up with tags. In JSON, you start with an object model and you put data into it. So JSON is ideal when you're data first, and XML is ideal when you're text first. You can have both in the same database if your database supports it, as MarkLogic does. Then you can use JSON for data and XML for content. You can also do the opposite: you can put content in JSON and data in XML, but they're designed for different use cases. So a document graph model has to understand that there's text content, there's data content, and there are graphs between it all.
Since RDF graph in XML enables us to turn content into meaningful knowledge, can we do the same for data? Can we use semantic graph with JSON documents, which are data oriented, to get meaningful data? The answer is, of course we can. If you have a database that supports semantic graph and supports documents, and they can be transactional and the engines work together, then you can build this solution.
Microsoft's Cosmos DB has done this, MarkLogic does this, OrientDB does it. Cassandra, even though it's oriented toward a different data paradigm, has some document capabilities and a graph engine, so you can even do it in Cassandra: DataStax Cassandra, not pure open-source Cassandra. Anyway, there are vendors that have a database that can do this. We're using MarkLogic for this model.
So your JSON document is a business entity. That's the number one and most important point. This is a customer JSON document. If you're talking relationally, the document contains all the tables in a relational model that together represent one business entity. For example, it's like a row in a person table, but the person table is going to be joined to the email addresses associated with the person, the phone numbers, the physical addresses, the mailing addresses, all the many rows that are joined to the person table. That is a document. You take all those rows from all those different tables, join them together, and that's a document.
The graph is between documents. Here is a customer document joined to an order document, or many customers making many orders. Notice that this is very much like a business-level model. If you're doing business modeling, you're going to model customers and orders, and they have relationships between them. So a document graph model is at the business level, because we're dealing with business entities and business transactions. An order is a business transaction, a customer is a business entity, and customers make orders. Then orders have products in them, and products have vendors. I can associate documentation, like an XML document, with these JSON data objects by saying the shipping instructions for this vendor and this product are in this XML document.
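The "join all the rows for one person and that's a document" idea described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the speaker's code or a MarkLogic API: the table contents, column names, and the `build_person_document` helper are all hypothetical, and the "tables" are plain in-memory lists.

```python
# Hypothetical relational rows: one parent table and three child tables.
person_rows = [{"person_id": 7, "name": "Pat Example"}]
email_rows = [
    {"person_id": 7, "email": "pat@example.com"},
    {"person_id": 7, "email": "pat@work.example.com"},
]
phone_rows = [{"person_id": 7, "phone": "555-0100"}]
address_rows = [{"person_id": 7, "kind": "mailing", "city": "Salt Lake City"}]


def build_person_document(person_id):
    """Gather every row that belongs to one person into a single nested document."""
    person = next(r for r in person_rows if r["person_id"] == person_id)

    def children(rows):
        # Keep the child rows for this person and drop the foreign key column,
        # since nesting inside the document now expresses the relationship.
        return [
            {k: v for k, v in r.items() if k != "person_id"}
            for r in rows
            if r["person_id"] == person_id
        ]

    return {
        "id": person_id,
        "name": person["name"],
        "emails": [c["email"] for c in children(email_rows)],
        "phones": [c["phone"] for c in children(phone_rows)],
        "addresses": children(address_rows),
    }


person_doc = build_person_document(7)
```

The one-to-many child tables become arrays inside the document, so reading the whole business entity no longer needs a join at query time.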
I can say the customer invoice for this order, the annual reports for the customer, and their tax forms are in XML documents. I can say the customer's order documents and invoices are in XML. I can have product manuals and vendor order forms. Everything that we deliver to customers as reports and documents can be in XML. So our data is in JSON, our content is in XML, and graph connects it all together. We can also capture miscellaneous things: this customer likes this product, or this vendor received an RMA on this product. Notice that there's no limit to the relationships between entities and transactions. You can have any relationship from any document to any other document for any purpose, for anything you can imagine. And it's all data. Everything on this screen is data, so you can change all of it at runtime. You don't have to define it in advance in schemas. Not that you shouldn't: you should have minimal schemas that define the minimal required set of data you need. You should have schemas on your relationships so you define the exact meaning you want. So you do have to think and you have to model, but you can connect anything to anything in any way you want. There are no boundaries.
In relational, everything is boundaries. A table has a foreign key, and that foreign key implies it works with these other tables. So it's all implied. Here you don't have to create rules around it; these are explicit relationships, real data connecting real things to each other. This gives us a way to model the complexity of the real world, because the real world is not fixed and predictable. It's full of variety and variability. Relational doesn't do that very well. It's all fixed. You have fixed data types, fixed rows, fixed structures, fixed sets of tables in a schema, fixed constraints between tables. Everything's fixed.
Everything's decided at design time, and you have to put data into this fixed structure. If the structure doesn't work with the data you want to put in it, you just can't do it. Or you break the structure and you cheat. Everyone knows that end users stick data where it's not supposed to go, because they have to capture it somehow and the model just won't let it in.
That's not true in the document model. Everything's flexible. A document can be any type you want it to be. There's no table that says this document must be this type. A document can be one or many types. A document can belong to different collections. A collection can be thought of as like a table, but there's no limit: a document can live in lots of them. This operation document that tracks drugs in operations could be in a transplant collection or an operation collection. It could be in a surgeon collection. Think of the flexibility you have here. You can categorize a business entity in any way you want. It's not one-dimensional.
Operation is an object, and inside of operation you have properties, key-value properties. But notice when you get down to administered drugs, it's an array of objects. You can nest objects within an object. This is the same thing as a many-to-many join with an associative entity table. You have a sub-document inside, which is an array. That's a many-to-many relationship, and it's simple and easy. So I can have variable, sparse collections. I can have variable data structures. I can have nested tables, nested many-to-manys. I can have variable data types. That may be bad, it may be good. The real world is messy. Drug dose size could be a number in one system and a string in another. It could be a floating point in one and an integer in another. The real world is complex, and a document database can handle the complexity.
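As a rough sketch of the operation document just described, the Python dictionary below shows a document that belongs to several collections, nests an array of administered drugs, and carries dose values of different types. The field names, collection names, and drug names are hypothetical (only the surgeon name Jim comes from the talk's earlier example), and `dose_as_float` is just one assumed way of normalizing messy values after ingest.

```python
# A hypothetical operation document: one business transaction, one document.
operation_doc = {
    # A document may live in many collections at once, unlike a row in one table.
    "collections": ["operations", "transplants", "surgeons"],
    "operation": {
        "id": 1,
        "type": "transplant",
        "surgeon": "Jim",
        # Nested array of objects: plays the role of an associative table.
        "administeredDrugs": [
            {"drug": "drug A", "dose": 5},        # integer in one source system
            {"drug": "drug B", "dose": "2.5"},    # string in another
            {"drug": "drug C", "dose": 10.0,      # float, plus a sparse property
             "note": "given after surgery"},
        ],
    },
}


def dose_as_float(entry):
    """Normalize a dose that may arrive as an int, a float, or a numeric string."""
    return float(entry["dose"])


doses = [dose_as_float(d) for d in operation_doc["operation"]["administeredDrugs"]]
```

The document is accepted as it arrives, and the cleanup into a consistent type happens afterward, which is the "take it in as is, then make it better" approach rather than rejecting data that does not fit a predefined column type.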
Now normally, like I said, you want a minimal schema that says if we're going to do good queries on this data, we want a minimal structure that is consistent. So we may not want to do what I just described, but we may not have an option. Or maybe we take the data in as is and then we transform it into a standard structure. With documents, you can take a document and bring it in just as it is, and then you can make it better. In relational, you can't even put it in if you don't have the structure to put it in. You can add sparse properties, and so you have sparse data. You don't have to have it all predefined in the schema to collect whatever data you want to collect. You can have variable relationships. Now down below here is an array of triples. I wanted to show you this is a graph model. You're looking at a document containing a graph. So this is a document graph model. It has subjects, predicates, and objects. The subject in this case is the document ID. This is ID number one, so the subject is ID number one. I made up these predicates. They're not very good, but they're simple for the presentation. So this is saying this document, which is an operation, occurred in this hospital. It's a foreign key from the operation to the hospital. The hospital document ID is number ten. So I'm saying document number one is related to document number ten, which is a hospital, through the operation-hospital predicate, which is the relationship. Now in the real world you would make that very precise, and you'd say this is an operation performed in a hospital and the hospital is a medical facility. You would define this incredibly precisely so that there is no question of what the meaning is. Notice how this is not just a foreign key saying, hey, there's a hospital relationship here. The predicate defines the precise meaning of the relationship, and I've embedded that relationship meaning in a document.
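A minimal sketch of triples embedded in a document might look like the following. The predicate name `operationHospital`, the `triples` key, and the `related` helper are all assumptions made up for illustration; a real document graph database would resolve these through its own indexes and query language rather than a Python loop.

```python
# Two hypothetical documents keyed by ID; document 1 carries its own triples.
documents = {
    1: {
        "type": "operation",
        "triples": [
            # subject (this document), predicate (the meaning), object (target)
            {"subject": 1, "predicate": "operationHospital", "object": 10},
        ],
    },
    10: {"type": "hospital", "name": "Example General Hospital"},
}


def related(doc_id, predicate):
    """Return the documents that doc_id points to through the given predicate."""
    doc = documents[doc_id]
    return [
        documents[t["object"]]
        for t in doc.get("triples", [])
        if t["subject"] == doc_id and t["predicate"] == predicate
    ]


hospitals = related(1, "operationHospital")
```

The link works like a foreign key, but the predicate travels with the data, so the meaning of the relationship is stored rather than implied.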
So now you can see there's a definite relationship between all this contextual data and all the other data that might be related to it. So this is very relational. It's not the relational model, but it's very relational. It's graph. You can have sparse denormalized properties, but denormalization is completely optional. For example, hospital address is something you might add for performance reasons or for search reasons, for better search resolution, because you might search on hospital addresses in the context of operations, but it's not necessary. Notice that this is a foreign key. This operation-hospital predicate is a foreign key to the hospital document, and I'm pulling hospital data into this operation only for denormalization purposes. I might do that in relational too. That's a performance optimization or a search optimization, completely optional. Everything you do in the document graph model has to use precise ontologies with precise meaning. When you're modeling in graph, you are modeling relationships. There are ontologies out there, vocabularies, and you can go to Linked Open Vocabularies and see all these different vocabularies. For example, there's Friend of a Friend, and it defines all these relationships between people. They're well defined and well understood. Computer programs can automatically know what the meaning of something is and automatically process it. You don't have to write that piece of code over and over again. The goal of graph modeling is to go out and find all these pre-existing ontologies that are public, defined, and well known, and bring them into your data. Use them in your data. The other part is you can go out to governments that publish their data using graphs, and you can see the defined meanings, and you can just pull that data straight into your graph database. Just load it, and then you can combine that with your data and create meaningful queries against it, because the meanings are well defined.
You don't have to guess that, oh yeah, P-R-E-N-A-M means person name. You don't have to guess. There's no inference about what the meaning is. The meaning is precise and well defined. So when you're modeling graph you have to do this. You can use the ontologies in your own industry or make up your own, make them well defined, and then use them precisely, and then you can get to meaningful knowledge. You can turn data into meaningful knowledge if you do this. So here's how you get from relational to document graph. You take a relational model like this Northwind model. This is a very dumbed-down sample database. It's actually the Northwind database, and you have customers, shippers, orders, details, products, and suppliers. That's not very real world, but you can see it is kind of broken up. It's two tables. You have order status and order detail status. You have four tables to represent a single order transaction. But in the real world you would take a customer, just the customer table you see here, and it would look like this. This is a real-world customer model. It has 13 tables in it, because relational requires every multi-valued property to be a separate table. This is a single business entity, but we have 13 tables representing it, and you have to write code to map them together to treat them as one unit. All the business rules and constraints and all the work we have to do. This is the exact same customer. The same model you just saw with 13 tables, in one document. You're not seeing the whole document, because it's a little bigger than my screen has room to show, but you're seeing two thirds of it. This is the same thing, and notice, if you look at it for a minute, which we won't have a lot of time to do, it has everything about the person in one document. In relational it's shredded and hard to understand unless you understand the data and the model. You have to recreate the model in your mind to understand what really is happening. That person is a document. A single business entity is a document.
This is the customer order model that's like Northwind, and I've modeled it in a real-world relational model. It has 43 tables. This is the same original 13 tables in a real-world situation. Now you're looking at it in the document graph model. It has customers, orders, products, and vendors, and you have graph relationships between them. There is a 10 to 1 reduction in complexity, and notice that you're aligning the business model with the physical model. In the relational world the physical model is so different from the real-world model that we have to go through three layers to get there from the business: we have a logical model and then we have to do a physical model. That's what it takes to model relationally. In the document graph model it is always at the business level. The business entities match the physical entities. You're not having to go through multiple layers with all the confusion and complexity, and you get a reduction in complexity in the process. So this is the same 43-table model in a document model. People would say, yeah, you embed the order inside the customer and you embed the product inside the order. I mean, you could embed the entire database in one document if you wanted to. That's bad. I'm not promoting that. That's very bad modeling. I'm saying each business entity is separate: the person is separate from the order, the order is separate from the product, and the product is separate from the company, which is the vendor who delivers it. And then you have lookup tables. You still have lookup tables. All that complexity you saw. Notice how the document graph matches the real world. It has the complexity of the real world, but it simplifies that complexity the same way our brain organizes it, with documents and graphs. That's how our brain works. So we normalize in relational. We actually, quote unquote, normalize in document modeling too. When you normalize in relational, the idea is everything that belongs to the key is in the table.
And nothing is in that table unless it belongs to the key. The same thing is true in the document graph model. Nothing belongs in a document unless it belongs to the identity of that document. But because a document allows you to have multi-valued relationships, multi-valued values inside the document, you can bring all those tables back together where they belong. So everything that belongs together is together, unlike relational. You denormalize if you want to. You don't have to, but you can use graph to denormalize. The triple concept is stored in indexes in these databases. There's what they call a triple index, and the triple index is a powerful tool for denormalizing. You put in it any data you want to share between entities. For example, a person's name is a piece of data you would share between all kinds of things. You want to know the name of the person who placed the order, so you want to put the person's name in the order. But that's bad, because then you have to update two documents when you change the person's name. With the triple index, it's there for any join with any document. So now I can have the order and look up the person's name in the index. So graph is a powerful tool to denormalize without doing the horrible thing of copying data everywhere. You can orthogonalize in relational and you can orthogonalize in graph. I don't have time to show details, but it's the same concept. We generalize in relational, and notice I subclassed the customer out of the person. You wouldn't even know that unless I told you, and I even put the name customer on it. I tried to describe it, but it's all implicit. I have another generalization on the right-hand side, with persons and companies. I have generalized the relationship between persons and companies. The only reason you know that is because I'm telling you. There's nothing in this model that says this is what I'm doing. But in the document graph model, it's explicit.
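The triple-index idea described above, looking up a person's name from an order instead of copying it into the order, can be sketched roughly as follows. This is an in-memory stand-in written for illustration, not how any particular database implements its triple index; the identifiers (`order:500`, `person:11`, `orderedBy`) and the `customer_name` helper are hypothetical.

```python
# Hypothetical documents; the order does not contain a copy of the name.
documents = {
    "person:11": {"name": "Pat Example"},
    "order:500": {"total": 42.0},
}

# Triples connecting documents: (subject, predicate, object).
triples = [("order:500", "orderedBy", "person:11")]

# A toy "triple index": look up the object by (subject, predicate).
triple_index = {(s, p): o for s, p, o in triples}


def customer_name(order_id):
    """Join from an order to the person document through the index."""
    person_id = triple_index[(order_id, "orderedBy")]
    return documents[person_id]["name"]


name_before = customer_name("order:500")

# Renaming the person touches exactly one document; the order is unchanged.
documents["person:11"]["name"] = "Pat Q. Example"
name_after = customer_name("order:500")
```

Because the name lives in one place, the rename is a single update, and every order that joins through the index sees the new value.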
This person document is also a customer, and I'm telling you it is by putting in a schema for person and a schema for customer. I also have this generalized company relationship, and I have a schema for it. I can subclass explicitly in the document graph model, so it's much easier to understand the data. So subclassing is beautiful in document graph. In relational it's always implicit: you have to look at the logical model to figure out the subclassing, and then you have to look at the physical model, and you have to figure all that out. In document it's right in front of you, and you see the data and the structure all together, so you can figure it out. One advantage of this is that you're connected directly to the business. Every entity is a business entity. Every relationship is a true relationship that is data with precise meaning, and that's exactly what the business wants. Another pro is simple, easy data types. Unlimited string length. You don't have VARCHAR as in Oracle and SQL Server, with 4,000-character limits. You can have megabytes. You don't have to go to a separate CLOB or BLOB to store those kinds of things. Strings are long. Names of things are long. You can have a megabyte name of something. You probably don't usually do that, but you can. You don't have to abbreviate down to 30 characters. A number is a number in some of these databases. In JSON databases the number is either integer or decimal. It can be either one. You can put schemas on it to limit it. So it's simple. You don't have to say it's this big and it has so many decimal places. It's a number. And you can have data types like sub-documents, different objects inside of objects. Person phones is an array of objects, and each phone has a different purpose. Notice the purpose is an array, so you can have many purposes for this phone. When you model this in relational it's very complicated. It's very simple here, and it's all built in together. Everything is data. Your relationships are data.
Your structure is data. Your data is data. Everything is data. Because everything is data, you can write queries to query structure. You can write queries to change structure. You can write queries to do anything. Everything is queryable. In relational you have a separate metadata database that tells you about the structure, or nothing tells you what it means. Here you don't need that. It's in the documents. You can see it. You see the data with the structure, and you have the ability to change it on the fly. You can just write a query to transform the data structure, and now you have a new structure. You don't have to remodel your database and create a new schema and then migrate your data to it. You just write a query and transform it in place and you're done. You have the multi-value thing, which is a huge thing. On the left here I'm showing you an array of addresses. Each address has a different purpose. That's the document model. Now look at the right. You have an associative entity that connects the persons and the addresses so I can have a many-to-many. A many-to-many relationship in a document is so easy you don't even think of it as a many-to-many. It just looks natural. But in relational it's complicated, and so you can see the simplicity and the variability on the document side versus the complexity on the relational side. I can change this structure on the left any time I want. It's very powerful and very simple. If I have to change the structure in relational, it's a major application change. It's a major change to the entire system. So if you want to get to E. F. Codd's goal of reducing the impact on applications when you change structure, the document graph model gets you very close to that. You still can't get away from it completely, but it's much closer than relational.
Mike, as awesome as you're doing, you've got about five minutes left.
Perfect. Good.
Well, I think I might make it. JSON data can be multi-typed. I talked about that. But here's an array of objects, and each object is a different type. The first object is the payment method for a credit card. The second object is the payment method for a checking account. That's very simple in JSON, because I can have an array and each item in the array can have a different type, and the code can just loop through it and say, it's a checking account, I'll do that. It's very easy in development to use this structure, but in relational look what I have to model. I have to have two associative entities, one for each type, and I have to have lookup data and other kinds of entities that map to the type. So it takes me five tables to do what's very simple to do in JSON, and it's intuitive to developers in JSON. What I have on the right is not intuitive to developers. You have to have someone explain to you what it means and what it's doing. And for the last slide, I want to summarize by saying document graph captures the meaning of text, the meaning of data, and the meaning of semantic relationships and structural relationships within documents and between documents. Inside documents and outside documents, you can do it with text, you can do it with data, and you can create meaning. This is what I would call true relational modeling: true relationships between data, and the relationships have meaning, and you can create the relationships because they're data. You can create the patterns in the relationships, and you can create them within documents and between documents. And so this is the next generation of data modeling, and it takes us to a whole new level of business purpose and business meaning. And you can do all the cool things where you can traverse them. You can do all these amazing algorithms, and you can't do that in relational.
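The traversal mentioned here, following relationships from one document to the next as far as they go, can be sketched as a breadth-first walk over triples. This is only an in-memory illustration with made-up identifiers and predicates; a real graph index is designed to do this kind of recursive join at a far larger scale and in a very different way.

```python
from collections import deque

# Hypothetical triples: (subject, predicate, object).
triples = [
    ("person:11", "placed", "order:500"),
    ("order:500", "contains", "product:7"),
    ("product:7", "suppliedBy", "company:3"),
    ("person:11", "likes", "product:9"),
]


def reachable(start, triples):
    """Return every node reachable from start, in breadth-first order."""
    # Build a subject -> [objects] lookup once, like a tiny index.
    outgoing = {}
    for subject, _predicate, obj in triples:
        outgoing.setdefault(subject, []).append(obj)

    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, []):
            if nxt not in seen:  # guard against cycles
                seen.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


connected = reachable("person:11", triples)
```

The same loop handles any depth of relationship, which is the part that takes a recursive query or repeated self-joins in SQL.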
And the graph indexes are designed for recursive joins. Relational is terrible at recursive joins, but in the graph, as you can imagine, it joins anything to anything, so all you're doing is recursing. And so the graph index is designed for very fast recursive joins across trillions of relationships. So this takes us to a whole new level, and I don't know what Codd would think today, but I would guess that he would love it, because it takes us closer to where he was trying to go: to get our data to have meaning and to be changeable in any way you can imagine without breaking applications. So there it is.
Mike, that's amazing. Thank you so much for this great presentation. Just as a reminder to you all, I'll be sending out the follow-up email on Monday with links to the slides and the recording. If you have to jump off before the additional Q&A, we totally understand, but let's dive into the Q&A really quick. Mike, going way back to slide seven, for those portions of document content which would be reused in multiple documents: how do document graph databases manage the integrity of their reused content?
Oh, fantastic. Integrity is a very important point, because you're so flexible you could break everything by not having schemas. The goal is actually to have schemas. So you would take a document model like this and create a schema for it. Now you could be like relational and make a structure that requires everything here to be here, and you could define the data types for everything, the subtypes for everything, the structure of the arrays, the objects. Everything you see here can be defined in a JSON schema. Or you can be more flexible, and you could say all I really need is the hospital name, the operation type, the surgeon, and the administered drugs, and for the administered drugs all I need is the name of the drug, the dose size, and the unit of measure.
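The minimal-schema idea in this answer can be sketched with a small hand-written check. This stands in for what JSON Schema would express with required properties; it is not JSON Schema itself, and the field names (`hospitalName`, `operationType`, `unitOfMeasure`, and so on) are assumptions chosen to mirror the list just given. Extra fields are allowed through untouched.

```python
# Minimal required structure; anything beyond this is optional extra data.
REQUIRED_OPERATION_FIELDS = (
    "hospitalName", "operationType", "surgeon", "administeredDrugs",
)
REQUIRED_DRUG_FIELDS = ("name", "doseSize", "unitOfMeasure")


def missing_fields(doc):
    """Return the required fields that are absent; an empty list means valid."""
    missing = [f for f in REQUIRED_OPERATION_FIELDS if f not in doc]
    for i, drug in enumerate(doc.get("administeredDrugs", [])):
        missing += [
            f"administeredDrugs[{i}].{f}"
            for f in REQUIRED_DRUG_FIELDS
            if f not in drug
        ]
    return missing


valid_doc = {
    "hospitalName": "Example General Hospital",
    "operationType": "transplant",
    "surgeon": "Dr. Example",
    "administeredDrugs": [
        {"name": "drug A", "doseSize": 5, "unitOfMeasure": "mg"},
    ],
    "patientNote": "extra data the schema does not mention",  # still accepted
}

invalid_doc = {
    "hospitalName": "Example General Hospital",
    "operationType": "transplant",
    "administeredDrugs": [{"name": "drug A", "doseSize": 5}],
}
```

Calling `missing_fields(valid_doc)` returns an empty list even though the document carries an extra property, while `missing_fields(invalid_doc)` names the surgeon and the unit of measure as missing.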
So you could create a minimal required structure or a maximal required structure, and you have total flexibility with JSON Schema or XML Schema to do that. So I'm not trying to say we don't have schemas. I really believe we should definitely have minimal schemas to define the things that must be present so that we can query them. But we don't need to define maximal schemas, because we want that ability to be flexible and add extra data when the customer says, hey, I happen to know this, can I stick it in here? That's easy to do. It's easy for programmers to write code to do it, and it's easy to store it in the database. So you don't lose data, and you don't have to do that thing where you just cram more data in some string field somewhere and then try to use it later. Applications go crazy. The data quality goes crazy. So this allows us to have data quality and structure and schemas, and be flexible when we want to.
And then moving on to the next slide, slide 8: how do you manage the integrity of the drug manufacturer if it is embedded in the table there? In other words, how do you add a drug manufacturer without creating a duplicate or a discrepancy? What about identity resolution?
Yeah. So in my model here, on the right-hand side, look at these lookup tables. You can have as many lookup tables as you want, or lookup documents. In this example I actually took all my lookups and put them in one document, because I can, and it actually simplifies things if they're small like this. Or I can break each one of these, the order lookup ID and the product order status, into separate documents. I still create lookups, so if I want to ensure that the manufacturer of a drug is from a list, I can do that. I don't have an example here showing the manufacturer of the drug, but really a drug manufacturer would be a separate document. So that would be more than just a little lookup thing.
You would be joining the drug manufacturer documents to the operation documents, and you would then create in your JSON schema a list of predicates that you can use to join, so that you can ensure that the meaning is constrained to the proper meaning. So you can use schemas to constrain items and lists, objects and arrays, and relationships between different types of documents. You can use schemas to constrain all of that. You can use triggers in your database to enforce those. Most of these databases, like MarkLogic, have triggers, so you can do that, or you can do it through schemas. So there are a lot of options. Now I will say this: these new document graph databases are not as mature as relational, because they haven't had 50 years to mature. So for everything I'm talking about, in MarkLogic, for example, you might have to create triggers for some things, whereas in relational you might just define a constraint, and it really is a trigger internally in the engine that does the same thing. So you can do it all. Sometimes you might have to write a little extra code if you want to enforce something. But that's possible, and we do that.
Certainly. Now, Mike, what you describe is inherently denormalized. Earlier presentations covered hybrid environments. Aren't there times when a normalized relational model, with its inherent rigidity, is useful to have in addition to a graph document model? Because when you denormalize, what process do you have to have in place in order to ensure all those JSON documents get updated?
I love the question, because I would say that document graph is every bit as normalized as relational. And then I would go further and say relational normalization is artificial, and document graph is actually true normalization that's based on the business. For example, if you look at this slide, the person document, everything in here is, quote unquote, normalized so that everything references the key.
So the person ID 11, for me, is everything about me. That is completely normalized. Now, when you have multiple phone numbers, multiple addresses, multiple emails, multiple payment methods, those are subtypes that are every bit as structured as those tables in relational. In fact, you can turn this right into a relational structure, because you can see there's a many-to-one relationship. There's an array of phones; that's a phone table. There's an array of addresses; that's an address table. There's an array of emails; that's an email table. So this is every bit as normalized as relational, but it's better, because what belongs together is together, and everything that belongs to the primary key is with the primary key. I'm not artificially having to put it into flat structures just because that's all the technology they had 50 years ago. I can now put them in the right structures, nested, so that I truly am normalizing my data. So there's nothing not normalized about this. The other part is denormalization. There's nothing denormalized here between order and person and product. Notice they're separate business entities and separate transactions. They're entirely normalized. I'm not sticking orders in person. I'm not sticking products in orders. I'm sticking in foreign keys to them through triples. So I'm not sticking all the product data in my order. I'm just putting in a foreign key to that product data. That's exactly what we do in relational. In a nutshell, I'm doing the same thing as in relational. I'm just doing it better, because I put what belongs together together.
I love it. So I think we have time to get a couple more questions in here. For analytics, which data model would be the most efficient?
I love that question. You can do analytics in a document graph model. If you're trying to do star schemas, the document graph model allows you to do the exact same model you do in dimensional modeling. I have a document for each row of my fact table.
So one fact row is one document. And then I have graphs out to all the dimensions. So there is one document for every row of every dimension, and one document for every fact. And then I have triples connecting each fact to each dimension it belongs to. It's the exact same model. You can do that same thing. I will say the graph document databases, like MarkLogic, are not optimized for that use case. Unlike relational databases, which build a lot of optimization into their engines to make that perform well, this model has not been optimized yet by the document graph databases. That's not saying it performs terribly. It's just not as optimized. You could do another model, which I don't like. A document could be an entire fact with all of its dimensional data. For example, if you had a fact about the amount sold in a transaction, and you connected it to the person and the place and the time and every dimension about the fact that this item was sold, you could put all of that in one document. That would be incredibly fast. It would be way faster than relational databases. But your data would be enormous, because you'd be repeating all that dimensional data in every document over and over again. So you're definitely trading space for performance there, but the space would be ginormous. So I don't recommend that approach. So that's a little challenging. But if you're not doing dimensional modeling, you can do a different kind of analytics with document graph that you can't do in star schemas. This is really exciting, because a month ago I met with a bunch of data experts about a data-first initiative that we're doing. It's fantastic. These guys are working with triples, it's all semantic, and they're doing medical predictive analytics using this idea of a knowledge graph that I was talking about. And there's another company doing the same thing for their product sales.
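The fact-and-dimension layout described earlier in this answer (one document per fact row, one document per dimension row, and triples connecting them) could be sketched as below. All identifiers, predicates, and amounts are invented for illustration, and the aggregation is a plain Python loop rather than anything a database engine would do.

```python
from collections import defaultdict

# One hypothetical document per dimension row.
dimensions = {
    "product:7": {"name": "widget"},
    "product:9": {"name": "gadget"},
    "store:1": {"city": "Example City"},
}

# One hypothetical document per fact row.
facts = {
    "sale:1": {"amount": 20.0},
    "sale:2": {"amount": 5.0},
    "sale:3": {"amount": 7.5},
}

# Triples connect each fact to each dimension row it belongs to.
triples = [
    ("sale:1", "soldProduct", "product:7"),
    ("sale:2", "soldProduct", "product:9"),
    ("sale:3", "soldProduct", "product:7"),
    ("sale:1", "soldAt", "store:1"),
    ("sale:2", "soldAt", "store:1"),
    ("sale:3", "soldAt", "store:1"),
]


def total_by(predicate):
    """Sum fact amounts grouped by the dimension reached through predicate."""
    totals = defaultdict(float)
    for fact_id, pred, dim_id in triples:
        if pred == predicate:
            totals[dim_id] += facts[fact_id]["amount"]
    return dict(totals)


by_product = total_by("soldProduct")
by_store = total_by("soldAt")
```

Dimension data is stored once and reached through the triples, in contrast to the alternative of repeating every dimension inside each fact document.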
When you have meaning in your data, not implied meaning but graph meaning, true meaning in the data, then you can do predictive analytics in a new way, not just with statistics, which is looking at averages. I don't want to demean statistics. It's very powerful, and you get great insights. But instead of looking at implied results, which is what statistics is doing, you can look at specific, precise meanings, and you can do what human brains do. You can reason over the true meaning of the data and say, hey, we see this pattern, because the pattern is precise and meaningful. You can query meaning, and you can find patterns or relationships and produce better predictive analytics. I know people who are doing this with this exact model, or with a pure graph model, I should say, not document graph. Document graph is taking it one level further. But they're doing it with pure graph and doing precise analytics that you can't do in star schemas and you can't do in relational. So when you combine that ability to do amazing analytics with semantics, and you combine it with the power of documents, which match the business, then you can take this to whole new levels. There are so many patterns you can use with this. We don't have time to go into them, but you can do analytics way beyond anything you can do today in relational.
Mike, thank you so much for anchoring today's event with such a great presentation. Just absolutely fabulous. And thanks to our attendees, especially those who've been hanging out with us all day and being so engaged in everything that we do. We just really appreciate it. But I'm afraid that is all the time that we have for today. And again, thanks to our sponsors for helping us make everything today possible. Again, Mike, thank you so much. And to everybody, thank you so much. I hope that you enjoy the rest of your day.
Thank you.