Hello and welcome. My name is Shannon Kemp and I'm the executive editor for DATAVERSITY. Today Karen will discuss graph databases: where do we do the modeling part? Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar; to do so, click the chat icon in the top right for that feature. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or, if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #heartdata. As always, we will send a follow-up email within two business days containing links to the recording of the session and additional information requested throughout the webinar.

Now let me introduce our speaker for today, Karen Lopez. She is a senior project manager and architect at InfoAdvisors. She has 20-plus years of experience in project and data management on large multi-project programs. Karen specializes in the practical application of data management principles. She is a frequent speaker, blogger and panelist. Karen is known for her fun and sometimes snarky observations on data and data management. Mostly, she wants everyone to love their data. You can follow her at @datachick on Twitter. And Karen, hello and welcome.

Hi Shannon, thank you so much for that, and thank you for being here to help make this happen. I also want to thank everyone for showing up today. I don't know what the weather is like where you are. Well, I know a little bit, for those of you who shared in the chat what the weather was like. Today my voice is a little weak, so I'm going to try to speak up, but if I get too quiet, it would be helpful if someone put something in chat to say you can't hear me. Okay, so we're going to talk about graph databases a bit today.
And please do tweet your comments; I always love live tweeting about this. So we've been through my bio, but one of the things I wanted to point out is that graph databases are really new to me. I don't have clients who are currently implementing them directly. I have in the past, and one of the things I'm going to talk about today is the maturity level of this specific type of NoSQL database. Normally I do a poll here, but I've got a lot to cover, so I'm going to ask you some questions as we go along instead of doing the formal poll. Don't forget to put the formal questions that you'd like me to answer in the Q&A, and then chat with each other, and potentially with me, in the actual chat box. Those are two separate things. And yes, there will be slides distributed early next week, along with the recording.

So why this topic? I've done a few topics in this series about NoSQL resources, meaning non-relational technologies or features within relational DBMSs that expand beyond the relational world. I just got back from EDW, which was co-hosted with NoSQL Now, and I thought that was a great move because we had mixed audiences: some people working with traditional relational databases, in the transactional world and in the data warehousing, BI or analytics world, and also people who are living and breathing and doing lots of work in the NoSQL world. One of the interesting things that has happened in the market for NoSQL databases is that if I'd given this presentation last year, there would have been just a handful of databases, data stores or database engines that one could use with graph, and we're going to talk about some of those. But something that's happened in the last few months is that more and more companies have announced either new graph databases to be downloaded and installed, or graph database services, so databases as a service, and we're going to talk about some of those as well.
I think this is a key indicator. If we have our traditional relational databases adding graph features, and if we have the big technology software companies announcing graph databases, graph services, or graph services that sit on top of their other technologies, that is a sign to the enterprise world that it's time to be thinking about what sort of data stories you have that might better fit into the graph world. So I'm going to spend some time doing the introduction to graph, as well as talking about how we've solved that problem in the relational world and how we still can, as well as how we may end up not using our relational databases, and some of the options. I'm not going to be doing any demos today, but I have some screenshots from a couple of graph products, and I'm going to talk about two major approaches to graph databases. The whole point of this is not to highlight those products but to show you how it differs from, say, traditional SQL queries or the ways that we interact with data. And then, of course, every good speaker leaves you with some resources.

The reason this is important is that I believe modern data architectures, and therefore modern data architects and modelers, will have expertise in hybrid technologies, and that includes both relational and non-relational. I already work with data architects that work with non-relational technologies like, let's say, IMS, a pre-relational system, or XML, which is a non-relational data store as well as a messaging system and standard. The ones we'll be taking a look at are just a representation of some of the ways that our jobs are going to change.
So I'd like to clarify some terminology. There are graphs: a graph is a math concept, it's a networking concept, there's the social graph, there's graph theory. And we have graph databases. Typically we use that to mean any sort of technology solution that has graph-like thinking and graph-like querying to it, but today when I say graph database I mean one that natively stores data as a graph, in a couple of major ways. Then there's graph processing, and we can do graph processing on top of all kinds of data. If we look at the major pieces of what NoSQL means, if it means "not only SQL", there's relational, there's graph, there's column or column family, key-value, document, there's also Hadoop, and there are hybrid versions of all these things. These are not necessarily types of databases, because what we're seeing now is more and more systems implementing a couple of these things together.

So, as an overview of some graph theory: one type of graph database is a property graph. In traditional graph theory, a graph has nodes, and those nodes have relationships. The most common graph that you've worked with in your own life is a hierarchy, and I'm going to talk a little bit about hierarchies in a minute.
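
As a rough illustration of the idea that a graph is just nodes plus relationships, and that a hierarchy is one special shape of graph, here is a minimal Python sketch. It is not from the webinar: the `Graph` class, its method names, the "manages" relationship and the rank names are all assumptions made up for this example, and the relationship direction is taken to run from parent to child.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny in-memory graph: named nodes plus (start, name, end) relationships."""
    nodes: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def relate(self, start, name, end):
        """Add a directed relationship, creating either node if it is missing."""
        self.nodes.update((start, end))
        self.relationships.append((start, name, end))

    def related_to(self, node):
        """Nodes that share a relationship with `node`, ignoring direction."""
        found = set()
        for start, _name, end in self.relationships:
            if start == node:
                found.add(end)
            elif end == node:
                found.add(start)
        return found

    def is_hierarchy(self):
        """True only for a strict tree: one root, one parent per other node."""
        parents = {node: 0 for node in self.nodes}
        children = {node: [] for node in self.nodes}
        for start, _name, end in self.relationships:
            parents[end] += 1
            children[start].append(end)
        roots = [node for node, count in parents.items() if count == 0]
        if len(roots) != 1 or any(count > 1 for count in parents.values()):
            return False
        # Walk down from the single root; a tree reaches every node.
        seen, stack = set(), [roots[0]]
        while stack:
            node = stack.pop()
            seen.add(node)
            stack.extend(children[node])
        return seen == self.nodes


org = Graph()
org.relate("general", "manages", "colonel")
org.relate("colonel", "manages", "captain")
print(org.is_hierarchy())  # True: a strict chain of command

# A second manager for the captain breaks the tree shape.
org.relate("major", "manages", "captain")
print(org.is_hierarchy())  # False: still a graph, no longer a hierarchy
```

One extra relationship is enough to stop the structure from being a strict hierarchy, while it remains a perfectly valid graph.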
A hierarchy is just a specialized form of a graph. If I have things that are related to each other, a hierarchy has special rules. Typically, when we think of a hierarchy we think of it as a tree, even though that's not the official definition. We think of things having parents, so we think about family hierarchies, departmental hierarchies, organizational hierarchies, any of those things. Graphs can be directed or undirected: directed means the relationships between the nodes have a direction that's meaningful; otherwise things can just be related.

If we look at this hierarchical structure, this might be a typical reporting structure. We have generals and colonels and captains and sergeants and privates, and this is normally how we'd represent position or job hierarchies, or who reports up to whom at work. But what I have always found in dealing with hierarchies is that they are so rare. It's our human psychology: we want our world to be ordered. We want to be able to say a captain always reports to a colonel and a colonel always reports to a general. But what we find is that the real world doesn't fit in a clean hierarchy unless that real world is something we construct from a hierarchy. Think of the classification of species, or certain library and record-keeping systems that have a structured hierarchy to them. The reason we're able to do that is because we make the rule first and then we wedge everything in. But even in a hierarchy of species we find that things really aren't that strict. We have this definition of mammals as animals that have live births, that breastfeed, that have endoskeletons (I can't remember, it's been a long time since I was in a biology class). But then we have things like platypuses, which don't have live births, and we still consider them mammals even though they don't fit into this
structure that we applied to it. We find in the real world that in a reporting structure we also have matrix reporting or secondment, or people sharing positions, any of those things. So we end up with these "also reports to" lines, and the way we typically handle that in our HR database is that we might create another table for all these exceptions, because we've built (and I'll show you in a little bit) a strict hierarchy into it.

The other thing we have to deal with when we think about hierarchies is that there are all these types of them. There are binary trees, where every node splits into no more than two nodes, so one node only has one parent and one parent has no more than two child nodes. Then we can have binary trees that have to be balanced, so that we can't add great-grandchildren on one branch until the other branches have been filled in; that's typically a B-tree or something like that. And we deal with this thing called ragged hierarchies, which is a huge, well, it's not a problem, it's a challenge in dealing with relational, because to go find all the leaf nodes, all the nodes at the bottom, we end up having to scan through every single row in a table to go all the way down. That's not a bad thing, it's just a costly thing.

The other thing about hierarchies: say we have a typical component breakdown, where vehicles are made up of engines and entertainment systems, and then we decide to introduce a new level, like an injection system. That means the fan and the fan belt get a new sibling called injection system, and now we have to move everything down. When I get to the data model (most of you know how we implement a hierarchy), you'll see that inserting a new level in the middle and moving everything down ends up moving a lot of data.

So, how do we model this in relational? The number one way we model a
hierarchy or even a network in relational is with a recursive relationship, also called a self-join. Some people call it a dog ear or a mouse ear. If it's really a bill-of-materials relationship, which almost all of these are, meaning a component is made up of components and those components can also be part of other components, we end up with a many-to-many relationship, and that's what we implement. But when we try to implement a purely recursive thing, where we say an employee is managed by another employee, the way we build that is by having a foreign key that points back to the same entity, the same table. That's why it's called a mouse ear, a dog ear, or recursive; we've come up with a lot of different names for it. But it also means that for the guy at the top, who doesn't report to anybody, we have to allow a null: we have to say the CEO reports to nobody. And that means we could also end up with someone down below not reporting to anybody, just because the data is missing.

Then the other thing that happens: we have employee number one, who is the CEO. He has the vice president of sales reporting to him, and no one else. Reporting to the vice president of sales, we can see the marketing manager, the North American sales manager and the Pacific sales manager, and then we just have these people reporting to each other all over the place. Now suppose that, instead of just a vice president of sales, we want to appoint directors in there. That means we would insert rows for the directors, and we would literally have to go through and move all the children of the vice president of sales to their appropriate director. This looks really simple in a short table like this, but we end up with another complete scan of the table to clean it
all up, and we also can't do updates to the table at the same time. So literally we have to not let anyone make any updates to this table, or to any related tables; think about all the other stuff that can happen. So something that happens in real life, like adding a new level, taking a level away, or promoting someone, causes all this to happen.

The way we get around some of that in a relational database is that sometimes we have special data types, or we can use techniques called adjacency lists, path enumerations, closure tables or nested sets. I'm not going to go through all of those, because we're not really here to talk about all the ways we go about doing hierarchies. But at the end of the deck, in the resources, if you want to implement these, which I highly, highly recommend, Joe Celko has a book called Trees and Hierarchies, and again that's going to be listed at the end. It tells you how to implement these for performance tuning, as well as workarounds so that you're not constantly iterating through every single row of the table to do a query. These are literally tricks that we're doing in a relational database to make what looks like a hierarchy work well and perform well.

But if we go back to this, you realize we don't really have as many hierarchies as we thought. What we really have is a many-to-many relationship, like a bill of materials: an employee can report to many employees, and an employee can have many employees reporting to them. That's a many-to-many. So here's one of the weird things about the relational model: we have to create a brand new entity in order to model this, because we're really now keeping information about that relationship, the relationship between employees. Indeed, one of the number one reasons why employees report to other employees and have many employees reporting to them is the whole concept of that reporting structure changing over time. So we have to create
this separate table and then maintain it, and then we have to do all these other tricks to make sure that we don't have, say, a director reporting to a VP while that VP reports to one of the director's reports. In theory that could happen, but now we have to do all of this stuff outside of the database, and we'll talk about why.

So here's an example of a reporting structure. We have George, who runs the small savings and loan. He's got Tilly, who reports to him as an assistant, and Clarence and Billy and Mary report to him, and Bert and Ernie and Petey and Tommy and Zuzu all report to Mary. But in the real world it's not just a simple reporting structure. We actually know that people are related to people: they report to people, they might be married to them, one of them might have a bodyguard role, one of them might be an administrative assistant. So what you can see is that this hierarchy that we have in our mind, that we've modeled, is really a graph. The reason I'm going through all this is that when people tell me, "We don't really have any graph data stories yet," I believe that every enterprise has graph data, whether it's their master data, their products and services, or their HR reporting. I can guarantee you that every enterprise data model has a great data story for graphs, and that's just from me picking on the hierarchies; there are other reasons.

The other way we traditionally deal with a recursive relationship in the relational world: I'm just showing here, from SQL Server, a CLR function that's written just to manage how you query and deal with this, with inner joins and all these UNION ALLs to go find all of the direct reports. The reason someone might do that is so that you don't have every single developer having to write this logic. And this is just a snippet of it, by the way. The other types of recursives in real life that we
have are facilities: buildings are made up of rooms, which are made up of areas, and so on, and warehouses have special storage locations that have certain properties. Documents are another thing that look hierarchical, but then we reuse parts of paragraphs or standard disclaimers across many documents, so a section of a document can appear in many documents. And then there are real, legit networks, like our IT networks, our phone networks or our people networks. The ultimate graph that everyone likes to talk about is the social graph, so Twitter and all of these things. But that's kind of a cheat, because the first time someone in the enterprise says, "Oh, you need graph so you can keep track of Twitter stuff," someone says, "Yeah, sometime in my spare time." What I'm trying to tell you is that your enterprise data is also graph.

Another trick that we do: SQL Server (and other DBMSs have this too) has a special data type called hierarchyid, which is an attempt to solve this problem. You can see that I can create this table that has an employee ID, and I've given it a specific data type called hierarchyid. Behind the scenes, SQL Server manages functions so that you can insert things into the hierarchy and then use these query methods to say: get me the ancestor of this, so the parent; get me the children of this; tell me what level this thing is at; tell me what the top level is. And then there are all these other things for reparenting it, parsing it and reading it. This was an attempt by Microsoft to make dealing with hierarchies easier. The problem is that virtually no one uses it, because the performance is not that great, so it doesn't scale well. If you had a really short hierarchy you might use it, like all the departments; by short I mean not too many levels and not too many rows.

Some people will ask me, "Well, isn't a relational database about relationships?" and I think that's one of the
biggest myths. Relational databases aren't about relationships. That word "relation" comes from the fact that in the relational model, the tables (and I know they're not really tables) are relations, and the links between our tables or entities are just constraints. We call them relationships in our modeling tools, and we give them names that work just like in a graph, "reports to" and so on, but under the covers those lines between our boxes are constraints. Constraints are wonderful. That's how we get great data quality; it's how we keep from having a whole bunch of invoice lines without an invoice, or a whole bunch of email addresses that don't belong to a customer. But it's a constraint, and one of the other features of a relational database is that, because it's a constraint, we can't put any properties or attributes on it. There are modeling techniques, notably the Chen data modeling notation, which allow us to create relationships, give them names, and have the relationships carry properties. That's all great, but when we go to implement it in a relational database, it's typical for these lines to just be implemented as constraints, or for us to create an entity between the two entities or tables so that we can store properties, or change or enforce the cardinality. So if there's one thing I want you to take away from here, it's that relational databases aren't about relationships, they're about relations. And as Graeme Simsion says in his normalization talks, the more you normalize, the more relations you get; just like in marriage, the more marriages you have, the more relations you get.

So now we've talked about relational databases and the things we struggle with in trying to model relationships there, especially networks and hierarchies. We'll talk a little bit about a property graph, a labeled property graph. I'm taking this example from the notation and the nomenclature or the
vocabulary that comes with Neo4j, which is one of the graph databases, but the concepts are the same. We have nodes, and they have relationships. Those relationships have directions, those relationships can have properties, and every node can have labels. So there can be multiple properties and labels for a node, and properties for a relationship. Think key-value pairs: I, Karen, could be a node, and I could have a name property of Karen Lopez and another property saying my Twitter ID is datachick. Nodes can be categorized or tagged via these labels. Relationships have a direction, and just like in a data model, relationships have names. They have start and end nodes; you can't just have a relationship hanging out there.

So here's an example of a graph that has this. This one's fairly easy. Unlike in a data model, that circle in the middle is not an entity, it's an instance. It's employee 4, so let's just call employee 4 Karen. That center node is Karen; again, it's not an employee entity, it's Karen. I can perform activity 4, and I work as role 4, which is a good thing because role 4 has a related activity of activity 4. I'm part of team 1, I have personal skill 1, I have degree 3, and I have another personal skill, 4. I used to work as role 6, but now I work as role 4. And then you can see that these relationships also have properties, such as how long and where I worked as role 4.

Now, one of the things that's hard to get your head around as a modeler is that some of the properties of these relationships are nouns. Normally we would think of a noun like location as an entity. You see these locations here, and your question might be: why isn't there a location node? Well, in this particular design, someone had to decide, meaning model, that we weren't going to worry about properties of locations. We weren't going to have
in our nomenclature from the data modeling world, attributes of locations; we were just going to list the location as a property right on the relationship. That's a data modeling decision for graph databases. There's no universally right or wrong answer; it was the right or wrong answer for this particular graph model and this particular graph.

Another example: Neo4j has some sample graphs. They run competitions for people to make them, and people try to make the most fun, most interesting graphs. Does anyone recognize what these nodes might be? You could put that in the chat if you want. I'm waiting for that chat to explode; this is one of my favorite things to use in a data model. Maybe that Speyside thing is an indicator. Yes: whiskeys and Scotches. There's a Scotch graph that you can go play with. You'll notice the red are the Scotches, and the purple are, okay, now I'm going to get this wrong, the purple are the locations. The teal-colored thing on the right is the class, the type, which is also related to locations, because that's how we do booze. But think of the purple as distilleries or locations, with Speyside being a larger classification. The A, B and I in the yellow circles are flavors; my guess is that they used A, B and I here because it would be too hard to describe all the flavors and have them fit in this little graph. Yes, whisky distilleries, a great data set to play with.

So I showed you a property graph. Another type of graph database, one that I haven't had a lot of experience with, so we're just going to get an overview of it, is a triple store. The triple store comes from the semantic world, and it's called that because we basically write down facts about things as really short sentences, each made of a noun, a verb, and an object or verb phrase. So Ginger
dances with Fred, Fred likes ice cream, and Karen loves data. We write all that stuff down in that structure, and each one is an individual triple, which is what we might call a row or an instance. See, the hard thing about talking about all these things is that, because we have so many types of NoSQL coming from all different parts of math or thinking or philosophy, we now have similar words that are not quite the same, and it can get confusing. I'm just going to apologize up front that in these webinars, and when I talk, I tend to put relational terminology on top of it. I know it's technically incorrect, but think of the terms as analogies. So each of these rows, "Ginger dances with Fred", "Fred likes ice cream", "Karen loves data", is called a triple, because it has a subject, then a verb, and then an object. Any one of those on its own is semantically poor. But take them all together: if it said that Karen likes ice cream, and Karen loves data, and Karen dances with Fred, you could take all of those triples and ask them some graph-like questions, like how many people does Karen dance with who also like ice cream? Or we'll do the ultimate graph story, the six degrees of Kevin Bacon, where you just want to know the shortest chain of dance partners it takes to get between Karen and Ginger. Taking this all together makes up this big graph. In triple stores we use RDF, XML and SPARQL, which are the languages we use to describe and query the triple stores.

Neo4j, like I said, is a property graph, so it uses a language called Cypher to create these things, and basically it's a big text file. For instance, in this particular example we're creating a graph based on movies, in this case The Matrix. See, this is why it's such a great pun: Neo was in The Matrix, and their sample database and their documentation have all kinds of stuff about the Matrix films. I think that's nice. So I'm creating a movie, it's got a title, it's
got a year, which I'm assuming is when it was released. I'm creating actors, those actors have names, and then I'm saying which actors acted in which movies. And now I've started to build my graph.

Now, one of the key statements that most people make about graph databases is that there's no data model. The reason we would say that (and this is one that came up in the questions) is that when I'm working with Neo4j or another graph database, there's no data model to look at; it's literally these instances. If you look at the graph on the right, those are the actors, those are the movies; they're not entities. But do you notice anything? In this other representation of the graph, while the instances are in the boxes and the actual relationships are on the lines, we have these things that look a lot like attributes: the properties and the labels. I think this means there is a data model, and I'm going to show you that in a minute.

The way we ask a question with Cypher is that, instead of saying SELECT, we say MATCH, so basically "go find all these things". In this particular model the instance is "you", who has these friends: Anna and Johan and Andrew and Julia and Rajesh and Amanda, along with what they've worked with and what they like. Notice that in this particular query there's a function called shortestPath. We don't have that in the relational world. I want to ask, what's the shortest path between me and someone who's an expert? The expert part of this is not on that graph, but if I said that someone was an expert in something, or had a role of expert, I could just ask the graph: tell me the shortest path. If we had to write this in SQL, we'd have to scan our many-to-many relationships, which means we'd have to read every single row, because to know the shortest path through something you'd have to read every row and find all the paths, which for
some networks or hierarchies could be millions and millions of rows, and then figure out which ones were the shortest. That's why a database that specializes in these types of queries is going to perform better.

IBM also has a graph database, and it's a database as a service. I said Neo4j was something that you could download and install; IBM Graph, which is brand new, is a database as a service offered through their Bluemix service. Same concepts: there are vertexes, or vertices, and they have edges. In this particular example you can see that we're creating a graph of what looks like airports. LAS is McCarran airport in Las Vegas, Nevada, and we have a latitude and longitude. This is the code that creates one node, one vertex, in their graph. Then when you query it, they're using a language, an approach, called Gremlin. In this case (I've forgotten now what this is an example of), oh, we're finding a route from an origin to a destination. This is a different graph, but we're able to do that.

So one of the things you notice with both IBM Graph and Neo4j is that it's all text right now. It's command-line stuff; it's not like opening up SQL Server Management Studio or some third-party tool and working with the database. Part of this is that these are all brand new database systems, and they are primarily database engines. There are third parties and open source projects working on tooling that goes all around them. People do this all the time through a web browser, and in fact some of the resources I've listed allow you to do just that. But the real point is that these are not replacements for Oracle, IBM or any of those, where you would rip out SQL Server and replace it with one of these graphs. The ideal situation, as I talked about with hybrid, is
that you would take maybe your transactional data and put your graph stories into a graph database, so that you could ask over there the questions that are harder, more expensive, and take longer in relational. That doesn't mean it's a read-only thing, though. Depending on which ones you're looking at, these graph engines have different levels of ACID, which is transactional support. Both of these support transactions, but there are other systems that don't have that yet.

So here's another model from IBM Graph, taken from their documentation, and you can see it's also a film-related one. We have a film, Million Dollar Baby, and it has an actor, and the actor appeared in the film. These are circles and lines, and you can see we haven't set up the relationship between Arnold and The Terminator. So it looks just like our traditional data models, with the exception that in the actual graph, the boxes and lines, or circles and lines, are actual instances of the data.

Another example that I'm showing you here is Titan. A bunch of NoSQL databases are available that use this open source product, this Titan database, and you also see Gremlin here too. This example came from DataStax, because Titan can use several different databases, data stores or data sources to serve up these graphs. For instance, Titan can work with Cassandra as its source data, and now you can do graph functions against it. In their examples, Titan uses gods and monsters as the sample set, so we can see Hercules and all these things. As a data modeler, when I look at these I see roles, I see verbs that I want to standardize, and I see properties or attributes that I want to standardize. In fact, on the right you can see that Titan has some data types. Don't those look familiar? Our existing data models have applicability here, because we've already documented at least what we
think the data types of all of these things should be and what we should be calling them. Now the big question is, what tools do we use for these? As far as I know, ERwin, ER/Studio, and PowerDesigner have no native support for these graph databases. Where they would in theory is when there is ODBC connectivity to these databases. The tradeoff with that is, if I were going to reverse engineer this graph, so the one from The Matrix, and I had ODBC connectivity, I would get back into my tools something that looked like an entity for every node. I don't want that in my data modeling tool. I have other ways of visualizing graphs than using my data modeling tool, and also that could be problematic if I had a million or a billion nodes in my graph. Think about that data model. Where I think we're going to go, and one of the things I want to play with, is the fact that I think I could take a very simple data model, and I'll show you some examples in a minute, though not in a data modeling tool, that I could create in order to generate the names of the nodes, the properties of them that we specifically want in a graph, and the relationship names maybe. I could take that information and the data types that we have already decided on, and whether we're going to call things customers or clients, and I could generate a start for this, not as structure but as a way of ensuring that we don't spend a lot of time thinking about what we're going to call things in the graph unless we have to, and that we would be able to have standardized names. So in other words, data governance and data stewardship. I already talked about how we hear a lot that there's no data model in the graph. I think there is one, it's just not the same as the ERDs that we think about. But that model isn't a structure. I don't want to say conceptual model, because that means something else. Maybe it's a type of semantic model, I'm not sure. It's something
where I want to write down how we want to talk about this data and then try to ensure that we have some consistency as we use these things across many databases. I think we can use that model to design the graphs, but we'll have the same issues that we have with traditional modeling: trying to get consistent naming, not having overlapping names like the word job, which could be a construction job, and could also be a task for a printer, and could also be someone's position in HR. We have these same issues with properties and rules and consistency. Now, one of the things you might have noticed about the graph is there are not a lot of constraints in there. There's really none. I mean, the only constraint we have is that if we go crazy with the properties, if we inconsistently call the same property something different, then we might not be able to query the data and we might get bad query results coming back. But that consistency, the way we do it in a relational model, is that everything in the same table means the same thing, and all the instances of a column, a cell in a relational table, we refer to by the same name. Not so much in all these other NoSQL solutions. So we still have the same modeling and data governance problems; we'll just need to go about dealing with them differently, especially based on what type of graph we're building. If we're building a transactional graph where it's our source and our gold record of this data, we probably want to have more governance. If we're exporting a bunch of customer interactions and web clicks and bringing in a bunch of data from a whole bunch of sources to see what sort of questions we can ask it, then we wouldn't have a lot of governance coming in. We would just bring in the data, and we would have to do sort of the forensic modeling of knowing that person and people and employee are things that we should be matching up. So I'd say that there's no logical and physical
data model to the graph, because your physical model is actually your instances. But I put an asterisk there because of some data models I'm going to show you in a minute. The graph people like to say that you do a lot of whiteboard data modeling, and that just means that the graph model itself isn't overly complex. We don't have 10,000 symbols, we don't have, you know, 20 different ways of expressing a relationship. We have one, with some properties on it. But I think traditional data models, and therefore traditional data modelers, have a role here. Where this all comes into play is, traditionally, if we do data model driven development, we have requirements, and we create this beautiful ERD, and then we generate a database, and we might add some stuff that isn't in our model, and then our requirements change, and then we go through this all again. Well, especially in the NoSQL world, but in the graph world, we might not be starting with requirements. So, you know, the recent news about the Panama Papers and the investigative journalism: those people didn't start with a model and say this is where we're going to wedge all the data. They took data from all kinds of sources to track who knew what people, who went to the same places, who had businesses with the same people, who banked at the same bank, you know, what banks had the same staff. They got that data from all kinds of sources, they bring that all into a graph, and then they have to figure out how they're going to bring all that data together. So it's kind of like working with an application package that was developed over years by lots and lots of different people with no standards, because it's data from sources that were never intended to be brought together. That type of modeling activity, I call it either forensic or archaeological data modeling, is that you're not specifying a structure over everything, you're deriving the meaning of all of these data
sources via a graph. So when they say there's no model, one of the interesting things about just the sample data models that you can go play with is that all their documentation starts out with these things to try to explain: when you go to do a query in my graph, here's what I called the things. You can see the whiskey one up in the top left-hand corner, where the purple means whiskey, and they had a flavor group and a location or region, right? That looks like a conceptual model. Not a conceptual data model, not a formal conceptual model, but these are concepts, these are things that appear in that graph and what they called them. You can see one over here with people. Oh, this is a Game of Thrones dataset that's out there, where they keep track of people and houses and groups and things that happen to them. I love this one: a complex set of relationships. And then there is a Santa one where there's a planet and sectors and regions. Those look a lot like data models to me. And maybe what happens is we will model these things after they're created, after we've done this, because we're not specifying a structure to it. One of the interesting things about graphs is that there's a lot of talk, when people talk about them, of putting your data in, playing with it, seeing if it works, if you can ask the questions that you wanted to, and if now by looking at the data you've found new questions to ask it. So you want to make an adjustment, so you blow it away and rebuild it again and bring the data in. I mean, with that type of graph usage you don't know all your data stories up front. So I'd say the modeling happens both at the beginning, if you're trying to do good data governance and good consistency to save time and to make the data more reusable, but we're also going to have a big role of perhaps documenting the decisions that were made, so that someone sitting down to write one of these queries knows, oh, it's
called a region here and not an area, and a sector, see, I know, contains a planet, and that a sector doesn't contain regions. This to me is a data model for a graph database, and I see these all the time as people talk about it, even while they're saying a graph database has no model. So some tips for everybody. Understand the use cases for graph technologies. You have them right now on your projects and you might not even realize it. You're going to evaluate and profile your data to see if it works with graphs. You need to investigate, if you're going to use one of these graph solutions, and there are many more graph solutions than what I showed here, especially in the triple store area, that transactional support is going to vary, and you'll want to test your use cases for it. You want your queries, your data stories, the questions you want to ask your data, to guide your decisions, and you may change that a lot. You need to test your current development tools for support, and your database and data modeling tools. We definitely want to leverage our existing metadata models. A lot of the people talking about graph databases think there's no use for them. I think there is. I think true hierarchies are very rare in the real world, and probably every true hierarchy has an exception to prove that it's not. There's some sort of rule that I shouldn't make up for that. You need to ask questions on the teams about exceptions to hierarchies and rules, and keep asking where the data integrity is going to happen or is relevant. I gave you lots of graph stories where we might care about the data integrity, and other stories where we don't want to impose integrity on it, not in the graph database. So before I get to resources, let me check out some of the questions. Could we discuss the graph add-on functionality that both Oracle and Teradata offer? I don't know a lot about it, but I do know that those features are embedded in Oracle. I haven't had a chance to play with them, but that's also an example
of a relational database vendor adding graph features to a relational database. Are geographic information systems based on graph databases? That's a good question. Most of the GIS systems I worked with predated all of these graph databases that are current now, so I don't have an answer to that. If anyone else knows about that, that's a great question, and I would think they would be a good case. Definitely the geographic data stories, where we consider things inside other things and related to things, would be a great case for that. Any resources or relations with ontologies? So I did say that triple stores came out of the semantic world, and I haven't had a lot of chance to play with them, but it was in my studies of RDF and other ontology things that I was first introduced to graph databases. So I'm sure there are resources out there related to that, but I don't have any for you today. Let's see. So, some resources. There's some learning stuff if you want to go play with Neo, and again, you're going to get the slides. Here's a Belgian beer one. See, I'm noticing a pattern here with these. Yeah, the fun with drafts, the ones I like. So there are fun ones, and then there are some really serious ones for forensics and access control and everything. And that's where you can have fun: you don't have to install anything to go play with these. You can even move the nodes around, and you can write your own queries, you can do all the things that you want to do with it on this website. I said that IBM has this brand new graph database as a service. You get that by signing up with Bluemix. They do have a free trial where you can go, just in minutes, set up what I think is a one-week play area. It might be longer than that, I can't remember. Or if you're an IBM shop, you might already have Bluemix credits for dealing with this. And this is database as a service, so not anything you have to install. Here's the trees and hierarchies book that I recommend
if you're working with things in the relational world, with all those relational tricks for making hierarchies and trees work. I also wrote a white paper about your master data on a graph, which is an introduction to graph, and that's available out on Dataversity as well. O'Reilly has a book that was written by people at Neo4j that you can download free as an ebook. That's in a second edition, if you want to deal with that. I mentioned that I started all this in the SQL world, and actually Joey D'Antoni and I gave a workshop at EDW just last week about Seven Databases in Seven Weeks, and I don't know that this one's been updated, I haven't gone to look, but it also has some tutorials across all kinds of NoSQL databases that I recommend for someone getting started with all this. And what all this comes down to is that every design decision includes cost, benefit, and risk. And anyone you're talking to, whether they're on the graph side, the relational side, or any of the other NoSQL ones, if they are talking to you about why relational is dead or why relational failed or why graph databases are never going to be anything, anyone who's talking to you that way instead of saying here's the use case where you would use this for this type of data story, they're not talking to you as an architect. They're either talking to you as a marketer or as someone who's on team relational and not on team data. Every design decision comes down to cost, benefit, and risk. And I'd say every data modeler who isn't within a year of retirement needs to start at least going out and understanding how these things work and what the right times to work with them are. So that's all I had for today. I'm now seeing a mention that navigation systems are graphs. So one of the things is, you know, the sticker you can get says graphs are everywhere. It's like once you start playing with these graph things, you relate everything to graph, and part of that's because we still think in terms of graphs just because of relational
databases, and we do think about customers being connected to products being connected to manufacturers. I mean, at a very high level, even in a relational database our data is always related. The difference is that relational databases are fantastic at data integrity, data quality, normalized data, fact once, preserving that integrity through all the updates and changes, and also in data warehousing and analytics for all kinds of other things. But there are these niche cases where no matter what we do to a table, we're not going to be able to easily and quickly ask these questions. And the other big thing about NoSQL is that it's schema on read, not schema on write, so we're allowed to work with dirtier data, or data we don't understand, and all these other things that a good relational design wouldn't let us do. Of course, we can always put dirty data in a bad relational design. So, oh look, one more minute. So that's all I had for today. I'm glad you joined me, I'm glad I got some interesting questions, and I'm going to stay on for a little bit longer in case you have some after the recording goes off. Karen, thank you so much for another great presentation, as always, and thanks to our attendees for the great questions and interaction. Just a reminder, I will be sending out a follow-up email by end of day Monday with links to the slides, links to the recording of the session, and anything else requested throughout. Now I will turn off the recording to have an additional pile up