Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor of DataVersity. We'd like to thank you for joining the current installment of the monthly DataVersity Smart Data Webinar Series with host Adrienne Bowles. Today Adrienne will discuss emerging data management options. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We'll be collecting questions via the Q&A in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or your questions via Twitter using the hashtag #smartdata. If you'd like to chat with us and with each other, we certainly encourage you to do so; just click the chat icon in the top right for that feature. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me introduce our series speaker, Adrienne Bowles. Adrienne is an industry analyst and recovering academic providing research and advisory services for buyers, sellers, and investors in emerging technology markets. His coverage areas include cognitive computing, big data analytics, the Internet of Things, and cloud computing. Adrienne co-authored Cognitive Computing and Big Data Analytics (Wiley, 2015) and is currently writing a book on the business and societal impact of these emerging technologies. Adrienne earned his BA in psychology and MS in computer science from SUNY Binghamton and his PhD in computer science from Northwestern University. And with that, I will turn it over to Adrienne to get us started.
Hello and welcome. Thank you, Shannon, as always, and thanks to everybody who's taken the time out of their busy schedules to join us. I've been very gratified and humbled by the international audience we've been getting for this series, so it's pretty cool.
I should say one thing about the "recovering academic" part, because I acknowledge it. I used to be a genuine professorial type, and usually when I'm preparing these webinars, that doesn't come out. I have to be careful today, since we're getting into graph databases. I've tried to stay away from anything that's too theoretical, but if we get into the weeds at any point, just pull me back a little bit, okay? This is an area where, if you have a strong background in computer science, you'll hopefully want to go into more detail than I will today, but we can consider this the beginning of the dialogue. So I'm keeping it at a business and technology level today. Okay. With that, let me advance the slide. There we go. It's always good to give some basic life advice, and since I don't have a reality TV show as a platform, I'm going to start here. One of the things we get into a lot when we're dealing with data management is modeling: in particular, what do you model, how do you model it, and how do you represent things? Of course, that's important to today's topic, data management, and specifically we're going to spend most of the time looking at graphs. One of my long-term idols is Louis Sullivan, who is often credited as the father of the skyscraper. He was a Chicago architect, he was Frank Lloyd Wright's boss for a while, and he's the one who first wrote about form following function. As we get into it today, I think you'll see that one of the real advantages of graph databases comes from graph notation itself. Even if you don't use a graph database, a lot of the world as we see it can be thought of, and modeled, in graph terms. But I do have one caveat.
Don Gause and Jerry Weinberg wrote an excellent book some years ago, Exploring Requirements, and they pointed out that in military training you're often taught that when the map and the terrain disagree, you should believe the terrain. So we're going to look, to some extent, at how to get good models, and hopefully at when it's appropriate to use a graph database and when it's appropriate to put one in the mix with other things. There we go. In that line of thought, when you're thinking about a domain, something you're working on, that domain could be your vertical industry, which has its own language. If you're in manufacturing, you'll look at things differently than if you're in medicine, and in medicine you may look at the same data differently depending on whether you're the payer, the patient, or the provider. So how you think about the domain is going to influence your choice of maps, models, and rules of representation. And in particular, when we start to look at databases: what are the required operations? What do you want to be able to do with this data? That's going to be important. Marvin Minsky, one of the early luminaries in artificial intelligence, was attacking this problem back in the 1950s, and I think this quote is a strong one: to solve really hard problems, you have to use different representations, because each one has its own virtues and deficiencies and none is going to be adequate for everything. So if it looks like I'm on the warpath to get everybody to adopt graph databases, that's just because it's the topic today. Understand that in most organizations this will be yet another tool, another representation that will augment, but not replace, everything you're doing right now. Adrienne, I do have a request for you to speak up a little bit. Sure. I'm sorry, having a volume problem today? How's that? Yes, very good. All right.
Yeah, when we switched configurations here, I lost sight of the volume meter, so thank you for that. Okay, shall I start again? No, I won't start again. So here's my view of the world of modeling, which is going to be very important today. That's why I decided not to use PowerPoint or Keynote for it. The issue here is that we need to look at the intersection of what you're modeling and the real world, and focus on that. This isn't a talk on how to model, so we won't go into that. There's a good DataVersity webinar on modeling and graph databases that Karen Lopez did recently, so if you're interested in that topic, you might look it up; I'm sure Shannon can tell you how to find the slides for that one. But understand that if we don't get this part right, it doesn't matter how much you optimize things. So we'll start with that, and let's get right into who needs a graph database, or whether you need one at all. These are the kinds of questions we hope to help you address by the end of the talk today. They're typical for looking at any new technology, but specifically for data management: What do you need to store, or want to store? How much of it is there, in terms of volume? How complex is it, and how fast is it moving? We did a session a month or two ago on streaming data and analytics for streaming data. In general, what we're looking at today is not something you would use for streaming data, but it certainly can scale up for great volumes and really complex structures. And then the third question: what do you need to do with what you store? That goes back to the earlier comment about defining the operations. Is this data that's going to be written once and retrieved rarely, or does it have defined operations? Because your choices range from the mundane: files, tables, trees. And I didn't go into detail.
I don't have slides on all of these, because I'm assuming you know these sorts of terms, whether it's a tree or a queue. You know what a list is; a queue is like lining up at the movies or at the bank teller: first in, first out, that sort of thing. So you have a lot of traditional data structures, but if you have more needs for operations, you get into database management systems. You know the traditional hierarchical and relational ones. The relational world has certainly prospered over the last several decades and seems to be the default for a lot of things because of the rigor with which you can query the database, frankly. Object databases came into vogue in the '90s, and there are some still around; we'll look at an example of one that now offers a graph-oriented view into an object database. Then, of course, there are NoSQL and actual graph databases. That's what we're going to get into today. The diagram you see now happens to be in the domain of cognitive computing, where we've got machine learning at the center. But you could put almost anything in the center, whatever application you're working on, and start to partition the types of data you're going to have to process. Graph databases are rapidly being adopted for knowledge management systems and as the underlying architecture for cognitive computing, which is why I thought this would be an appropriate way of looking at it. So we've got the input types and the response types. Again, that's all data. And we can start to break that out by how structured it is and what the level of depth, if you will, of the structure is. I'll come back to that in a couple of minutes. But basically, before we can answer the question of what's a good database, we need to categorize and organize the data, and that's going to give you a cue into how you're thinking about things. I think that's really the key.
If you start to describe your data to someone and you recognize that you're doing it in this fashion (we'll have a little Graphs 101 in a second), then it's probably a good idea to look at a graph database. There are some data categories, which we can get into, where you're probably not going to get any benefit from it. Let's get in and look at Graphs 101, just so we have some common terminology. A graph, simply put, is a structure that has vertices and edges. The blue circles are the vertices and the lines are the edges. You can use this to represent lots and lots of things, and I'll give you some examples, but in terms of formalisms, this is where it starts. Now, you can label each of those. In a particular domain, perhaps the vertices represent places and the edges represent roads, highways, paths, alleys, and so on going between them. That's one way of doing it. The vertices could be airports and the edges would be routes. They could be people, and the edges between them are lineage or inheritance or familial relationships. This part should be pretty comfortable, because we use graphs in everyday life. But let's get into a little more detail and see what it means to represent this more formally so that we can use it in a database. And I want to make sure I emphasize this: we're going to be looking at graphs, how to represent things as graphs, and perhaps how to use graph databases, but you don't need a graph database to store a graph in a database. I want to make that distinction very clear. A lot of what we're doing here you can actually accomplish in a relational database, for example. It may be more difficult and take more steps, but there's nothing conceptual here that says you have to have a graph database. So I'm not pushing that idea; it's just going to make things easier.
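To make the point that a graph can live in a relational database, here is a minimal sketch using Python's built-in sqlite3 module. The one-table schema (edge, src, dst) and the road data are hypothetical. A recursive common table expression follows the edges to find every other vertex reachable from a starting point; it takes more steps than a graph traversal API would, but nothing conceptual requires a graph database.

```python
import sqlite3


def reachable(edges, start):
    """Return the set of other vertices reachable from `start` along directed edges.

    The graph is stored as a plain relational table of (src, dst) rows;
    no graph database is involved.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per directed edge.
    conn.execute("CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL)")
    conn.executemany("INSERT INTO edge (src, dst) VALUES (?, ?)", edges)
    # Recursive CTE: seed with the start vertex, then repeatedly join on edge.
    # UNION (not UNION ALL) discards duplicate rows, so cycles terminate.
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach WHERE node <> ?
        """,
        (start, start),
    ).fetchall()
    conn.close()
    return {node for (node,) in rows}


# Hypothetical road network: vertices are places, edges are roads.
roads = [("Town", "Mill"), ("Mill", "Harbor"), ("Harbor", "Mill"), ("Farm", "Town")]
print(sorted(reachable(roads, "Town")))  # ['Harbor', 'Mill']
```

The same query works on any table shaped like (src, dst); what a graph database adds is mainly convenience and physical storage optimized for this kind of hop-by-hop access.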
So in this case, we've got the graph, and the edges are labeled and the nodes are labeled. The actual name of an edge or a vertex is itself a property. There are five nodes and five edges. They happen to be directed, although a graph doesn't have to have directed edges. So this shows that you can get from A to B, because there is that line, but there's no line back from B to A, or from anywhere else to A. So on this graph, A and C are where you start: you can get somewhere from them, but you can't get to them from anywhere else. On the right-hand side, I've listed the names for some of these, and each has an attribute, or property, associated with it. These can be represented as key-value pairs, and that's where we get the idea of a property graph: representing all the information on the graph in terms of its properties. You'll see that Old Post Road, for example, has two entries. It would have two entries in the database, one to tell you that it's a paved road and one to tell you its length. You define the properties when you're defining the data model. They could be age, average temperature, the materials used, whatever makes sense in the context of what you're modeling; that's what you're going to capture. Obviously, if you're going to be doing operations on this, there has to be some common theme for the properties. You wouldn't typically have a graph where some nodes were people and some nodes were buildings and they didn't share any operations; that's a whole different kind of thing. I will share one example, though, where you can have nodes that are very different in terms of what they represent and still have a meaningful graph. So here we've got the properties in this key-value system. Okay.
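The property-graph idea can be sketched in a few lines of Python: each vertex and each edge carries its own dictionary of key-value properties. This is a toy in-memory structure, not any particular product's data model; the Old Post Road entry mirrors the slide (a surface and a length), but the endpoints, the Mill Lane edge, and all values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """A minimal directed property graph: vertices and edges both carry key-value properties."""
    vertices: dict = field(default_factory=dict)  # vertex name -> {key: value}
    edges: dict = field(default_factory=dict)     # edge name -> (src, dst, {key: value})

    def add_vertex(self, name, **props):
        self.vertices[name] = dict(props)

    def add_edge(self, name, src, dst, **props):
        # Refuse edges whose endpoints have not been added as vertices.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError(f"unknown endpoint for edge {name!r}")
        self.edges[name] = (src, dst, dict(props))

    def edges_where(self, key, value):
        """Sorted names of edges whose property `key` equals `value`."""
        return sorted(n for n, (_, _, p) in self.edges.items() if p.get(key) == value)


g = PropertyGraph()
for town in ("A", "B", "C"):
    g.add_vertex(town, kind="town")
# Hypothetical property values; the slide shows a surface and a length for Old Post Road.
g.add_edge("Old Post Road", "A", "B", surface="paved", length_km=12.5)
g.add_edge("Mill Lane", "C", "B", surface="gravel", length_km=3.0)
print(g.edges_where("surface", "paved"))  # ['Old Post Road']
```

Because every edge shares the same property keys (surface, length_km), operations such as "find all paved roads" make sense across the whole graph, which is the "common theme" point above.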
Now, I'd just point out that when we're dealing with data in general, and in particular when we get into the smart data arena and things like machine learning, people often talk about unstructured data and the requirement that systems be able to process it. As those of you who have been with us on numerous occasions have heard me say, I don't think there really is anything that's unstructured. It's a question of whether you know the structure and how difficult it is to determine the structure. The reason I'm using this chart today is that if we're dealing with a database organized around the properties of a graph, it's important that what's captured in the vertices and the edges is something whose level of structure is fairly high. We may have to come back to that if there are questions, but keep in mind that if you have, let's say, a blob in a database that represents a video (I'm shooting a lot of video these days, and I can end up with a five-gigabyte file for a minute of video), and you don't have the processing power to determine its structure and pull out what's useful, you would think of that as having a deep structure, and it's probably not something you're going to want to deal with in the graph. But on the bright side, I'm going to assume that you're probably already thinking in graphs, even if you're not doing so explicitly. Here are four examples: you're probably thinking in graphs if you ever took a biology class, if you ever watched a detective show, if you care about or know any entertainment trivia, or if you have occasion to remember, value, use, or process, if you will, relationships between people. So hopefully that includes everybody on the call, if not everybody you know.
So we'll start with: you're thinking in graphs if you took a biology class or played the game Twenty Questions. You're trying to determine something and you get to ask twenty questions. Generally the first one is "Is it animal, mineral, or vegetable?", and that classification of the natural world into a taxonomy, which can in fact be represented as a graph, goes back hundreds of years. You ask it first because it's one of the questions that prunes away as much of the world as possible in one question and lets you move on. Next, I don't know how many of you watch detective shows, but there's usually a scene like the one on the left with what's known as the crazy wall. This one happens to be from the show Fargo, where they put up the suspects and start to draw lines and figure out what the relationships are. You might be thinking, well, that's kind of like an entity-relationship diagram, and in fact it is. You've got entities; the people are the nodes here, and the edges are the relationships. You can see on this one they're all labeled, and each label represents something that would be a property in a property graph. I know it's just about impossible to read the one on the right. This is actually a real screen from a system: IBM acquired i2, and its COPLINK system has been used in some pretty sophisticated criminal monitoring and fugitive apprehension cases. What you have here is people who are being tracked and the attributes known about them. They're all classified, and I don't mean classified as in secret; I mean they're categorized, and similar attributes are coded with the same color, that type of thing. But basically what they're doing is building a graph of the people and the relationships. And you've probably heard about things like social network monitoring and looking at who knows whom in those sorts of networks.
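A case board like the one described is essentially a list of labeled, directed edges between people. The sketch below, with hypothetical names and relationship labels, indexes each labeled edge by both endpoints so you can ask "how is this person connected to everyone else?" in either direction, which is the basic question the investigators are drawing lines to answer.

```python
from collections import defaultdict

# Hypothetical case-board data: (person, relationship label, person).
BOARD = [
    ("Ava", "employs", "Ben"),
    ("Ben", "phoned", "Cal"),
    ("Cal", "sibling_of", "Dee"),
    ("Ava", "paid", "Dee"),
]


def build_index(triples):
    """Map each person to their labeled links, recording the direction explicitly."""
    index = defaultdict(list)
    for src, label, dst in triples:
        index[src].append(("out", label, dst))  # src --label--> dst
        index[dst].append(("in", label, src))   # dst <--label-- src
    return index


def links_of(index, person):
    """All labeled links touching `person`, sorted for stable output."""
    return sorted(index.get(person, []))


idx = build_index(BOARD)
for direction, label, other in links_of(idx, "Dee"):
    print(direction, label, other)
# in paid Ava
# in sibling_of Cal
```

Keeping the label on each edge is what turns a plain diagram into something queryable: "who paid Dee?" is just a filter on the label.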
That's all done with graph analysis, whether or not there's a graph database underneath. Next one: if you know trivia about movies. I was happy to hear Karen Lopez the other day talking about the Kevin Bacon game, because looking at data and trying to figure out how many links there are goes back to Stanley Milgram's small-world experiment, which tried to determine how many degrees of separation there are between you and anyone else. And I think that as we get more connected via social networks, the number of links obviously goes down. In this case, I just picked an actor, David Sars, in the IMDb database. You can go in and look at it. This is the kind of thing where you might say, well, I think I saw this guy in this thing, but I don't remember who he was. So I can go in and look at the entry, which would be a node, for the show where you perhaps saw the person, and then start to trace back. It's also a good example of what we call a multi-hop query, which is typical in this environment. You start with a person or an event or a show or a movie, something like that, and then you move from one node to the next, and so on. Even if the database itself is organized as, let's say, triples or relations, which would make the graph harder to traverse, you would still be able to do a multi-hop query. And then, if you remember relationships between people: I had to include this, because I never get to put up pictures of my kids. Those are my three sons. If you think about the popularity of people going online and tracing their heritage, that's all done with graphs; family trees, for example. A tree is a particular type of graph. Then there's LinkedIn; most people on the call today probably use it. I won't say who, but I went on LinkedIn and selected somebody who I happen to know is on the call today.
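The Kevin Bacon game is a multi-hop query: alternate between actor and film nodes until you reach the target. A breadth-first search finds the shortest such chain. This sketch uses made-up cast lists (only the Kevin Bacon name comes from the talk) and treats actor and film as two kinds of vertices in one undirected graph; it is a standard BFS, not how IMDb itself answers the question.

```python
from collections import deque


def shortest_chain(graph, start, goal):
    """Breadth-first search: return the shortest list of vertices from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent pointers back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical cast lists; each film links to its actors and vice versa.
CASTS = {
    "Film X": ["Actor 1", "Actor 2"],
    "Film Y": ["Actor 2", "Actor 3"],
    "Film Z": ["Actor 3", "Kevin Bacon"],
}


def build_graph(casts):
    graph = {}
    for film, actors in casts.items():
        for actor in actors:
            graph.setdefault(film, []).append(actor)
            graph.setdefault(actor, []).append(film)
    return graph


g = build_graph(CASTS)
chain = shortest_chain(g, "Actor 1", "Kevin Bacon")
print(chain)
# ['Actor 1', 'Film X', 'Actor 2', 'Film Y', 'Actor 3', 'Film Z', 'Kevin Bacon']
print((len(chain) - 1) // 2)  # 3: each degree is one actor -> film -> actor step
```

Starting from a film instead of an actor (the "I saw this guy in this thing" case) works the same way; only the start vertex changes.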
And these are the eight people we know in common. Shannon, it's you. So if you look at that and say, well, I'm interested in finding a connection to the person with a particular role at a particular company, and you have a tool like LinkedIn, or some similar graph even without LinkedIn, that's how we get that information. The reason I'm going through this in this level of detail is so you can start to think in terms of the types of queries, the types of searches, the types of graph traversal, if you will, that you're going to want, because that's going to help you understand how to organize things. One last one on this. This is an anonymized look at my typical day when I'm in sales mode. You've got nodes that represent companies that are already my clients, and nodes, in this case the purple ones, that represent companies I want to sell to. I look at it and I say, well, how do I get to the right person at Bank Zero? I can say, all right, I know Jim, who used to have Ellen working for him, and I can trace through that. The way you organize the data and the way you think about the data are going to help you create your model, and physically organizing the data the way you logically think about it is what makes the difference between something that is merely possible with other types of organization or data management and something that is optimized for it. One more example. We're going to look at a couple of different ways to organize data. This one happens to be a cognitive computing system. I put it in here because if you look at the different steps (presenting the results, getting the feedback, starting the statement), you can see that the processing is what's going on at the edges.
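Both LinkedIn-style questions above reduce to neighbor-set operations. "People we know in common" is the intersection of two neighbor sets, and "who can introduce me to someone at Bank Zero" is a two-hop walk filtered by a property (the employer). The network, the employer table, and the function names below are hypothetical; only Jim, Ellen, and Bank Zero come from the talk.

```python
from collections import defaultdict

# Hypothetical network: who knows whom (undirected) and where people work.
KNOWS = [("me", "Jim"), ("me", "Rosa"), ("Jim", "Ellen"), ("Rosa", "Ellen"), ("Rosa", "Tom")]
EMPLOYER = {"Ellen": "Bank Zero", "Tom": "Acme", "Jim": "Client Co", "Rosa": "Client Co"}


def adjacency(pairs):
    """Build undirected neighbor sets from (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def mutual_connections(adj, p, q):
    """People both p and q know directly."""
    return adj[p] & adj[q]


def introducers(adj, employer, me, company):
    """Map each person at `company` I don't already know to my contacts who know them (two hops)."""
    result = {}
    for friend in adj[me]:
        for target in adj[friend]:
            if target != me and target not in adj[me] and employer.get(target) == company:
                result.setdefault(target, set()).add(friend)
    return result


adj = adjacency(KNOWS)
print(sorted(mutual_connections(adj, "me", "Ellen")))  # ['Jim', 'Rosa']
print({k: sorted(v) for k, v in introducers(adj, EMPLOYER, "me", "Bank Zero").items()})
# {'Ellen': ['Jim', 'Rosa']}
```

Stored as a graph, each of these is a short walk from one vertex; in a relational layout the same answers need self-joins on a connections table.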
So you can have something that looks like a project management chart, but you recognize that you go from one state to the next, where a state means a set of data points. So if I'm looking for similar solved problems, the data has to be organized that way before it can get to the next step. This is just a representation showing that the processes themselves, and the relationships between processes, can be represented in graph form. Among the types of data and relationships that are a natural fit for graphs, taxonomies and ontologies (I'll say a word about each here) are naturally represented because they are typically hierarchical. The relationships are generally inherited, and you can think of them as either trees or balanced trees; it doesn't really matter in the abstract. With a taxonomy, you have the formal structure within your domain, and the critical part is that if you identify a new node you haven't seen before, it should fit within the classification scheme. You should know where it goes. If, in this case, a motorcycle is defined to be a motorized vehicle with fewer than four wheels, whether two or three, then it's automatically going to go into one of these categories. Where you get into issues is when something is ambiguous. In general, if we're describing something in natural language, there are ambiguities that need to be resolved, and they will be resolved in the definition of the nodes, or of the properties of the nodes, if you will, so you know which properties are valid. So you can see here the weight: I have weight at the top, under motor vehicle, because it's an attribute, or property, of every type of vehicle we could measure.
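The claim that a new item "should know where it goes" in a taxonomy can be sketched as a walk down a tree of categories, where each category carries a rule over the item's properties. Only the motorcycle rule (motorized, fewer than four wheels) comes from the talk; the other categories, the first-match policy, and the property names are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Item = Dict[str, object]


@dataclass
class Category:
    name: str
    rule: Callable[[Item], bool]  # predicate over an item's properties
    children: List["Category"] = field(default_factory=list)


def classify(node, item):
    """Descend the taxonomy, following the first child whose rule accepts the item.

    Returns the path of category names from the root to the most specific match.
    """
    path = [node.name]
    while True:
        match = next((c for c in node.children if c.rule(item)), None)
        if match is None:
            return path
        path.append(match.name)
        node = match


# Hypothetical vehicle taxonomy; only the motorcycle rule comes from the talk.
taxonomy = Category("vehicle", lambda v: True, [
    Category("motor vehicle", lambda v: v.get("motorized", False), [
        Category("motorcycle", lambda v: v["wheels"] < 4),
        Category("car", lambda v: v["wheels"] == 4),
    ]),
    Category("bicycle", lambda v: not v.get("motorized", False) and v["wheels"] == 2),
])

print(classify(taxonomy, {"motorized": True, "wheels": 3}))
# ['vehicle', 'motor vehicle', 'motorcycle']
```

An item that no child rule accepts simply stops at the most specific category that still fits, which is where the ambiguity problems mentioned above would surface.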
But if this taxonomy were being used by, let's say, an insurance company, perhaps weight is irrelevant to them unless it's a commercial vehicle, because that's how they rate it. So the same data can be structured differently based on its intended use. That's important to see when you start to create the design, because if you've used any kind of database, you probably recognize that there are lots of ways you could represent this data. It doesn't have to be in a graph. But if you're trying to describe data to someone before you put it in a database, you typically sketch out something like this, the way I did on my desk with all the Post-it notes. What's interesting to me about a graph and a graph database is this: we have a lot of money being invested these days in tools to help you visualize data, particularly in analytics. Basically, that's trying to create a way of looking at something that makes sense through the organization you see or the symbols that come out of it. Off the top of my head, graph databases are the one approach where you can visualize the data before it goes in. So visualization is almost bi-directional with a graph. If you're writing this information out in another format, say you're constrained to take this model and put it in a relational database management system, you have to start to flatten it out. You have to create relations and tables and tuples and all that good stuff. It's not a natural transition. This goes back to my form-follows-function comment: if what you're looking at is something you think of as a structure like a tree, then it makes much more sense to keep the tree representation when it's big and when it's being stored physically. Okay, so that's the idea of a taxonomy.
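Here is what "flattening it out" for a relational system typically looks like: the nested taxonomy becomes rows of (name, parent), often called an adjacency-list table. The sketch assumes category names are unique, and the small vehicle tree is hypothetical. Rebuilding the tree from the rows takes a separate pass, which is the extra work the talk describes as "not a natural transition".

```python
def flatten(tree, parent=None, rows=None):
    """Flatten a nested {name: {child: {...}}} taxonomy into (name, parent) rows."""
    if rows is None:
        rows = []
    for name, children in tree.items():
        rows.append((name, parent))
        flatten(children, name, rows)  # depth-first, so parents precede children
    return rows


def rebuild(rows):
    """Reverse the flattening: reconstruct the nested dict from (name, parent) rows.

    Assumes names are unique, so each row can be attached by name.
    """
    nodes = {name: {} for name, _ in rows}
    root = {}
    for name, parent in rows:
        (root if parent is None else nodes[parent])[name] = nodes[name]
    return root


# Hypothetical taxonomy fragment.
TAXONOMY = {"vehicle": {"motor vehicle": {"motorcycle": {}, "car": {}}, "bicycle": {}}}
rows = flatten(TAXONOMY)
print(rows)
# [('vehicle', None), ('motor vehicle', 'vehicle'), ('motorcycle', 'motor vehicle'), ('car', 'motor vehicle'), ('bicycle', 'vehicle')]
print(rebuild(rows) == TAXONOMY)  # True
```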
And taxonomies in general are a natural fit for graph databases. The little plug for the book on the side is just to say that these diagrams actually come out of the book that Judith, Marcia, and I did on cognitive computing. The reason I like this chart is that it shows that, over time, the actual hierarchy will change. That doesn't mean reality changed. If you look here, this is the Diagnostic and Statistical Manual of Mental Disorders, the DSM, from the American Psychiatric Association. Over the years, the same set of symptoms, or properties, if you will, has been refactored and moved into a different view of the world. That's very important, because we tend to think of a fact as something that doesn't change. I would say that, in general, a fact is unchanging only if it is specified so carefully that it includes temporal parameters, for example, so you can say this was true at a particular time. That would be another parameter, another attribute that needs to be captured. The other reason I have it here is to show that as taxonomies evolve, so does the ontology, which is the next slide. An ontology built on this would include everything the taxonomy has, but it would also include more detail: the shared common understanding, with the rules as they're used within that community or domain. So if you look here, going from DSM-I to II to III to IV and up to the most recent one, DSM-5, the common understanding changed. So the way these things are interpreted, the rules that go along with them, and why they're classified the way they are all change. That would be part of the ontology, but not of the taxonomy, which is a little simpler. Okay.
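Capturing "this was true at a particular time" usually means putting a validity interval on the classification edge itself, so the graph can be queried as of any date. In this sketch the item, the categories, and the years are all hypothetical placeholders, not actual DSM classifications; the half-open interval convention (start inclusive, end exclusive) is a design choice, not something from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassifiedAs:
    """An 'is classified under' edge that holds only for a span of years."""
    item: str
    category: str
    valid_from: int          # first year the edge holds (inclusive)
    valid_to: Optional[int]  # year it stopped holding (exclusive); None means still current

    def holds_in(self, year: int) -> bool:
        return self.valid_from <= year and (self.valid_to is None or year < self.valid_to)


# Hypothetical reclassification: the same item moves between categories across editions.
EDGES = [
    ClassifiedAs("condition X", "category A", 1968, 1980),
    ClassifiedAs("condition X", "category B", 1980, 2013),
    ClassifiedAs("condition X", "category C", 2013, None),
]


def category_of(item, year, edges=EDGES):
    """The category `item` belonged to in `year`, or None if no edge was valid then."""
    matches = [e.category for e in edges if e.item == item and e.holds_in(year)]
    return matches[0] if matches else None


print(category_of("condition X", 1975))  # category A
print(category_of("condition X", 2020))  # category C
print(category_of("condition X", 1950))  # None
```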
I wanted to get this concept in without making it too cluttered, because it's one thing to think about graphs, look for graphs in daily life, and say, well, this is how I would store it. This is just one example. One of the reasons people like relational databases, those who do, is that there's a set of mathematical properties and principles that goes along with relations: how they fit together, how you normalize, and how you do your various operations. If you're really dealing with a graph, there are mathematical properties you can leverage too, and I'm going to give you one example here. A lot of people don't realize that a graph like the one we've been using can be represented as a matrix. In this case, we have five vertices, A through E. Imagine that this is completely filled out, so that where there aren't ones there would be zeros; it's just really difficult to look at on a small computer screen. The way you interpret this matrix is that the one at row A, column B means there is one path from A to B. If there were no arrow on that edge and it were bi-directional, you would also have a one at row B, column A. But because the matrix isn't symmetric, it shows you that this is a directed graph. So this shows you all the paths of length one between any pair of vertices. Now, the interesting property is what happens when you multiply the matrix (and those of you who have seen the movie The Matrix might not want to think about that). If you multiply the matrix that represents the graph by itself, raising it to a power, let's say squaring it, which is on the next slide, the squared matrix has a nonzero entry wherever there is a path of length two. In general, whatever the exponent is, an entry in the resulting matrix gives the number of paths of that length. I'll take one more step if I can here.
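The matrix-power property can be checked directly. The edges below are inferred from the talk (A to B, plus the B to D to E to B cycle described with the next slide); the fifth edge, C to D, is a guess, because the transcript doesn't say where C's edge goes. Strictly speaking, entry (i, j) of the k-th power counts walks of exactly k edges, which may revisit vertices. Plain Python lists are used so no libraries are needed.

```python
VERTICES = ["A", "B", "C", "D", "E"]
# Edges inferred from the talk; ("C", "D") is a hypothetical stand-in for the fifth edge.
EDGES = [("A", "B"), ("B", "D"), ("D", "E"), ("E", "B"), ("C", "D")]


def adjacency_matrix(vertices, edges):
    """Row = from, column = to; a 1 marks a directed edge."""
    idx = {v: i for i, v in enumerate(vertices)}
    m = [[0] * len(vertices) for _ in vertices]
    for src, dst in edges:
        m[idx[src]][idx[dst]] = 1
    return m


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(m, k):
    """m to the power k (k >= 1); entry [i][j] counts walks of exactly k edges from i to j."""
    result = m
    for _ in range(k - 1):
        result = mat_mul(result, m)
    return result


M = adjacency_matrix(VERTICES, EDGES)
M2 = mat_pow(M, 2)
M3 = mat_pow(M, 3)
pos = VERTICES.index
print(M2[pos("A")][pos("D")])  # 1  (A -> B -> D)
print(M3[pos("B")][pos("B")])  # 1  (B -> D -> E -> B)
print(M3[pos("A")][pos("B")])  # 0  (no walk of exactly three edges from A to B)
```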
So if you look, this says there are paths from B to B: we have one path from B back to B of length three. It goes from B to D, D to E, and back from E to B. All of these go through that little cycle there. But there's no path from A to B of exactly length three. There are two reasons I bring this up. One is that if you represent the world you're modeling with a graph and you can put that graph into a matrix, there are a lot of things you can do just by virtue of knowing these basic properties. So if this graph represents, let's say, the stops on your UPS driver's route, and we have all the edges, we can add other things into the properties, but this alone will tell you, for example, the fewest hops between any two stops, assuming the edges are otherwise equivalent. The other reason I bring this up is that the representation we have here, this matrix, is obviously not a graph. It's a table, a matrix. So we don't necessarily need to store everything, for every operation, in the same form we use to represent the graph itself, which is the property graph I talked about on one of the early slides. But if you're considering a graph as your representation, it's crucial to look for the types of problems you can optimize precisely because it is a graph. Now I want to go quickly through where the market is, now that we understand what it is we're trying to do, and I'm going to tell you flat out: the market's ready. There are commercial options, there are open source options, and emerging right now are graph-data-management-as-a-service options. So let's take a look. I'm sure if you're looking on a small screen this is like an eye chart, so I'm just going to summarize it.
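Once edges carry different costs as properties (drive times, say), counting hops is no longer enough, and a weighted shortest-path algorithm is the usual tool. The sketch below swaps in Dijkstra's algorithm using Python's heapq; the stops and minute values are hypothetical. Note that planning one route that visits every stop is a harder problem (a variant of the traveling salesman problem) that this does not solve.

```python
import heapq


def dijkstra(graph, source):
    """Shortest total edge weight from `source` to every reachable vertex.

    `graph` maps vertex -> list of (neighbor, weight); weights must be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter distance was already found
        for nxt, w in graph.get(node, ()):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


# Hypothetical delivery stops, with drive time in minutes stored as an edge property.
STOPS = {
    "depot": [("A", 7), ("B", 2)],
    "B": [("A", 3), ("C", 8)],
    "A": [("C", 1)],
}
print(dijkstra(STOPS, "depot"))  # {'depot': 0, 'A': 5, 'B': 2, 'C': 6}
```

Here the fewest-hops route from the depot to A (one edge, 7 minutes) is not the fastest one (via B, 5 minutes), which is why the edge properties matter.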
Whenever I pull a chart like this from Wikipedia, besides giving an attribution, because it's always useful to have this, I have to tell you that these charts are generally incomplete, and I'll probably give you an example. But what I wanted to do is show a list of database systems that are listed today as being based on the graph model. Out of all of these, only a couple are commercially viable, and we'll talk a little bit about that. What the chart shows is the name of the product (or of the project, if it's an open source project that isn't a commercial product) and which graph model it uses, whether it's RDF or the property graph I mentioned. You can start to look at it and ask, okay, what's the trend? What are people starting to use? Because you don't want to be in a situation where you're an early adopter (and you would still be a fairly early adopter if you did this today) and you get stuck using something that's not going to last. So I've highlighted the ones that use the property graph model, which I described earlier, or RDF, the Resource Description Framework, which comes from the World Wide Web Consortium. Originally it was developed as a specification for metadata modeling, but now it's sometimes used to describe knowledge stores in a knowledge management environment. So that's pretty cool. In the API column, you'll see that a lot of these require, or at least give you, access to the data using Java, C++, or Python. What's also interesting is that there's an emerging class of technologies, or tools, to help you access and traverse these graphs. That's why I say that if you can't get to them through SQL (that's "no SQL" with a space, as opposed to a NoSQL database), it's no problem, because there are these emerging languages. Gremlin, which I'll come back to in a little more detail, is completely open source.
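RDF stores everything as (subject, predicate, object) triples, and queries match patterns against them, which is the core idea SPARQL builds on. This toy is only an illustration of that model: the "ex:" prefixed names are hypothetical strings, and a real RDF store and SPARQL engine would handle namespaces, indexing, and full query syntax. The two facts about Cypher/Neo4j and Gremlin/TinkerPop come from the talk.

```python
# Hypothetical triples in the RDF style: (subject, predicate, object).
TRIPLES = {
    ("ex:Neo4j", "ex:queryLanguage", "ex:Cypher"),
    ("ex:TinkerPop", "ex:queryLanguage", "ex:Gremlin"),
    ("ex:Cypher", "rdf:type", "ex:QueryLanguage"),
    ("ex:Gremlin", "rdf:type", "ex:TraversalLanguage"),
}


def match(triples, s=None, p=None, o=None):
    """Sorted triples matching one pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def systems_using(triples, lang_type):
    """Join two patterns: ?sys queryLanguage ?lang . ?lang rdf:type lang_type."""
    langs = {s for s, _, _ in match(triples, p="rdf:type", o=lang_type)}
    return sorted(s for s, _, lang in match(triples, p="ex:queryLanguage") if lang in langs)


print(systems_using(TRIPLES, "ex:TraversalLanguage"))  # ['ex:TinkerPop']
```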
It's a traversal language, and it's supported by many of the companies and projects on that chart. Cypher was developed by Neo4j. I had a talk with folks at Objectivity recently. They've been around for probably nearly 30 years. Objectivity is an object-oriented database, and now they're providing a Cypher-oriented layer, if you will, so that you can get to the data in an Objectivity database and view it, or treat it, as a graph database. Then there's SPARQL, the W3C standard query language for RDF; the name is a recursive acronym for SPARQL Protocol and RDF Query Language. So I'll just come back and highlight those three: Gremlin, SPARQL, and Cypher probably account for the majority of the emerging market, and with anything that doesn't give you access via one of those, you're going to be doing a little more coding. So that's the detail there. Okay. I'm mindful that I need to wrap up in a few minutes. Again, if you've heard any of the talks in this series, I always talk about the open source implications, and Apache is certainly the leading organization, if you will, that promotes these projects and development environments. Apache TinkerPop is an open source graph computing framework, and it's tied to Gremlin as its graph traversal language. Okay. I need to leave some time for questions. In terms of who's using it today, I'll say that I took the easy way out on this, because in terms of market penetration and adoption, Neo4j is currently the market leader for standalone graph databases. These are just some examples of companies that are using Neo4j as the database in commercial systems, with either Cypher or one of the other approaches to actually get to the data.
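To give a feel for what a traversal language does, here is a toy Python imitation of Gremlin-style step chaining. It is not TinkerPop's API: the `Traversal` class, its methods, and the sample vertices are hypothetical. It only mirrors the idea behind a Gremlin query such as `g.V().has('name', 'Ann').out('knows').values('name')`, where each step filters or walks the current set of vertices and hands the result to the next step.

```python
class Traversal:
    """Toy, hypothetical imitation of Gremlin-style chained steps (not the TinkerPop API)."""

    def __init__(self, vertices, edges, current=None):
        self.vertices = vertices  # vertex id -> property dict
        self.edges = edges        # list of (src, label, dst)
        # Starting with every vertex plays the role of g.V().
        self.current = list(vertices) if current is None else current

    def _step(self, current):
        return Traversal(self.vertices, self.edges, current)

    def has(self, key, value):
        """Keep only vertices whose property `key` equals `value`."""
        return self._step([v for v in self.current if self.vertices[v].get(key) == value])

    def out(self, label=None):
        """Walk outgoing edges (optionally only those with `label`) to the adjacent vertices."""
        return self._step([dst for v in self.current for (src, lab, dst) in self.edges
                           if src == v and (label is None or lab == label)])

    def values(self, key):
        """End the traversal by reading one property from each current vertex."""
        return [self.vertices[v][key] for v in self.current if key in self.vertices[v]]


# Hypothetical sample graph.
VERTICES = {"v1": {"name": "Ann"}, "v2": {"name": "Bob"},
            "v3": {"name": "Cy"}, "v4": {"name": "GraphProject"}}
EDGES = [("v1", "knows", "v2"), ("v1", "knows", "v3"), ("v3", "created", "v4")]
g = Traversal(VERTICES, EDGES)

print(g.has("name", "Ann").out("knows").values("name"))                   # ['Bob', 'Cy']
print(g.has("name", "Ann").out("knows").out("created").values("name"))    # ['GraphProject']
```

Each added `out()` step is another hop, which is why these languages read naturally for the multi-hop questions that come up next; doing the same thing in SQL means another join per hop.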
And so, particularly if you're looking at something like eBay doing service routing, which is what I was talking about before with the directed graph, or Walmart doing recommendations based on data stored this way, it's at the point now where adopting it is no longer as much of an adventure, if you will. So, getting started: what are the key things? Why would you choose a graph database? I think the three keys here are: it speeds delivery; if you can naturally model the data you're looking at as a graph, it simplifies these multi-hop queries; and visualization. Think again of the IMDb database. If you start with something that's in a completely different category from what you're looking for (you're looking for the name of an actor, and what you have is the name of a movie, or you know that somebody else was in it) and you start to go through it, you're going in multiple steps, and that's really nicely modeled as a graph. And finally, visualization. I think it's baked in, because if you think about the graph as the way the data looks before it goes in, then when you get to the visualization tools, it's going to be very familiar. So the questions you have to ask today, if you're going to adopt it, are, first of all: do you need an on-premises solution, or are you going to manage your own database even if it's in the cloud? In that case, there are lots of options on that chart, and Neo4j is clearly the market leader. If you want something managed for you as a service, right now probably the closest thing to a scalable, large-scale solution as a service is what IBM is offering through Bluemix. It's in beta now, but it'll be out there as a commercial product within a couple of weeks. So we're going to close it out and open up the questions. I put in a quick hat tip to Camille Nixon.
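The IMDb example is a classic multi-hop query: get from a movie to an actor by alternating between films and cast members. Here is a minimal breadth-first search sketch over a movie/actor graph. The cast lists are invented placeholders rather than real IMDb data, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical cast lists; a real IMDb extract would be vastly larger.
CASTS = {
    "Movie 1": ["Actor A", "Actor B"],
    "Movie 2": ["Actor B", "Actor C"],
    "Movie 3": ["Actor C", "Actor D"],
}


def build_graph(casts):
    """Undirected graph: every movie is linked to each actor in its cast, and vice versa."""
    graph = {}
    for movie, actors in casts.items():
        for actor in actors:
            graph.setdefault(movie, set()).add(actor)
            graph.setdefault(actor, set()).add(movie)
    return graph


def shortest_chain(graph, start, goal):
    """Breadth-first search: the node sequence with the fewest hops, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start through recorded predecessors
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted only to make the output deterministic
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


G = build_graph(CASTS)
print(shortest_chain(G, "Movie 1", "Actor D"))
# ['Movie 1', 'Actor B', 'Movie 2', 'Actor C', 'Movie 3', 'Actor D']
```

Each hop here is just a neighbor lookup; in a purely relational layout the same question typically becomes a self-join per hop, which is the simplification the talk is pointing at.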
Camille and I did a panel together at an event recently, and I had a good conversation with her last night to test some of these ideas. So I just wanted to thank her for that. She was at Neo4j; now she's at IBM. We've got a couple of upcoming webinars, and I'm sure Shannon will tell you about the July webinar, if you don't know about it, which is part of the Smart Data Online Conference. And with that, I'm going to turn it back to Shannon. Adrienne, thank you so much for another great presentation. Certainly one of the most common questions we get is about the slides. Just a reminder: I will send a follow-up email within two business days with links to the slides, the recording, and anything else requested throughout the webinar. Feel free to submit your questions in the Q&A in the bottom right-hand corner. So there was a comment here, Adrienne, that the motor vehicle taxonomy you presented is possibly incorrect. For example, a commercial vehicle can be a passenger car. I appreciate the comment, and let me explain how I almost got arrested over that one day. I was driving through a toll booth on the Henry Hudson Parkway in New York, for those of you who are familiar with it, and it's for non-commercial vehicles. I was in a van that was registered as a multi-purpose vehicle in Connecticut, but in New York it was considered a commercial vehicle. I should actually probably use a picture of that in here. I think the issue is that whether this is correct or incorrect depends on who's creating it and for what purpose. And I appreciate the question and the clarification in pointing it out. As to whether it's universally an error, I think for a taxonomy like that, it depends on who is using it and what it's being used for.
So what was interesting to me that day, as I was harshly stopped in my forward progress, was that law enforcement in two different jurisdictions treated the same vehicle differently on one road. If you were to graph this out, going back to that example, point A was where I started in Manhattan and point B was where I was hoping to get to in Connecticut. That's one road, but somewhere in the middle, the rules of the road changed. I'm not sure if that addresses the question, but I think the issue is that it's really about context. So I appreciate someone bringing that up. And there was a question here about visualization skills, but before I get to that, let me just stay with graphs for a little bit. Do you see many folks converting from relational to graph? And can you mention a few use cases? Do I find them converting? I would say not many are going to go through the conversion process. I think right now it's mostly new applications where graph is a more natural modeling approach. What was the second part of that, use cases? Yeah, do you have any use case examples? Well, I think if you look at the areas shown in that Neo4j slide, a lot of it has to do with transportation, logistics, or movement from one point to another. Some of those certainly predate the popularity of formal graph databases, and so that would be more the case I talked about before, where you don't need a graph database to represent a graph. It's just a lot easier. The way I look at it, you could represent everything we're doing here in another form. But why would you want to? So I think we're just starting to see that shift. I don't have a lot of good use cases I can comment on where people have actually replaced an existing data store and done a transformation, if you will, from a relational representation to a graph representation. But I'll look into that, and if the person who asked the question wants to shoot me an email, I'd be happy to see what I can share.
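As a rough illustration of what a relational-to-graph transformation involves at its simplest, here is a sketch using Python's built-in sqlite3 module: rows become nodes, and each non-null foreign key becomes an edge. The employees table, its self-referencing manager_id column, the REPORTS_TO label, and the names are all hypothetical. A real migration also has to deal with join tables, composite keys, and deciding which columns become properties.

```python
import sqlite3


def load_sample_db():
    """Hypothetical relational schema: employees with a self-referencing manager_id foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, "
                 "manager_id INTEGER REFERENCES employees(id))")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                     [(1, "Dana", None), (2, "Eli", 1), (3, "Fay", 2), (4, "Gus", 2)])
    return conn


def to_graph(conn):
    """Each row becomes a node; each non-null manager_id becomes a REPORTS_TO edge."""
    nodes = {row_id: {"name": name}
             for row_id, name in conn.execute("SELECT id, name FROM employees ORDER BY id")}
    edges = [(emp, "REPORTS_TO", mgr) for emp, mgr in conn.execute(
        "SELECT id, manager_id FROM employees WHERE manager_id IS NOT NULL ORDER BY id")]
    return nodes, edges


def chain_of_command(nodes, edges, start):
    """Follow REPORTS_TO edges upward (assumes no cycles). In SQL this needs a recursive query."""
    manager_of = {src: dst for src, label, dst in edges if label == "REPORTS_TO"}
    chain, current = [], start
    while current in manager_of:
        current = manager_of[current]
        chain.append(nodes[current]["name"])
    return chain


conn = load_sample_db()
nodes, edges = to_graph(conn)
print(chain_of_command(nodes, edges, 3))  # ['Eli', 'Dana']
```

The data is identical in both forms; the point, as in the talk, is that once the relationships are explicit edges, a question like "who is above this person?" becomes a simple walk rather than a recursive join.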
Certainly. And so, talking to you as our cognitive computing expert, do you find many folks converting... or, excuse me, oh, I think I just lost my mic. Great. Cool. Well, I can't see any of the questions today, so you can ask me anything, and I'll believe that somebody else actually asked it. Are graph databases the foundation for most cognitive computing platforms? For most? I would say no, but right now cognitive is still in the early stages. One of the issues is that the label "cognitive" is being applied to a broad area, anything from predictive analytics to machine learning. If you look at that wide range, I would say that most of the fairly large-scale systems I'm familiar with will have some graph attributes. They may use a graph database. I don't know of many that are using graph exclusively at the core. It seems to be a mixed environment. Part of that is what you're actually trying to store. If you go back to one of the slides where I talked about taxonomies and ontologies, you may have a large volume of historical data coming in. Let's take oncology, for example. That's a well-studied area in cognitive computing. The data itself does not originate in a graph database. It originates as text in journals or patient records or case files, and then it goes through some machine translation. We get into natural language processing, which I think is actually the topic we're going to cover next month. So, in terms of preparation for query management, many of them are now putting that into a graph database, but it's not the exclusive way that something is represented. It's sort of a tiered approach, which is how we generally handle storage management: depending on the usage, the complexity, and the speed with which something is needed, we may put it in at different levels or tiers and just bring it in as needed.
So for the actual traversal, right now I think that's more likely where you're going to see it. Interesting. Well, I accidentally marked this one as answered, so let me make sure I get to it. Any recommended visualization tools that can be talked about? There's a lot of talk about graph presentations. Yeah. I don't have a resource on that, because most of the tools sold today as visualization tools (and there's a whole list of those; if anybody's interested, I have a chart that I think I used in an earlier presentation listing data management and visualization tools, and I'll be happy to send it to you if you want to circulate it, because there are a few dozen) are actually aimed at the more conventional, I hate to say conventional, relational world. We're typically at the stage of development where the visualization tools you're going to see are driven by the database vendors themselves. Just as it was with visualization for relational databases, you didn't see a lot of third-party tools that improved on what the vendors were doing until there was a large enough user base. So I don't think there's enough user demand for that today. If you're looking at NoSQL, and I did cover some of these last time, companies like Stream Analytics or Zoom Data are where you'll find people doing some very sophisticated work at extracting, abstracting, and presenting data in visualization tools. But those typically aren't aimed at the graph market today. I don't think it'll be long. Sure. How do you see master data vendors' existing solutions working with, or enriching, the client information with graph solutions for social data? For example, identity and household. Any cases or examples already in production? I'm sorry. Could you repeat that? It's a mouthful for sure.
How do you see master data vendor solutions working with, or enriching, the client information with graph solutions for social data, for example, identity and household? Any examples? I don't have anything I can share today. Conceptually, what I can talk about is that one of the things we're dealing with today in general is that lots of large, rich data sets are becoming available from third parties: social data, social network data, that sort of stuff. In general, there are issues there in terms of privacy and security, but we are starting to see it, and I've used examples in the past couple of months of Google and Yahoo providing these data sets. And I think where we're getting to ties in to the previous question about visualization and third-party tools, as people try to rationalize, if you will, data from different sources. Because right now, if you go back to that Wikipedia chart, there are a couple of different approaches to how the data is actually stored. I talked about RDF and property graphs, et cetera. But even within those, if you're talking about property graphs, getting from one vendor's database that claims to manage the data as property graphs to another that also says it's done as property graphs still isn't something where there's a standard, so a third party can't just come in and work with both. So I think we're probably... I hate to put a time on it, but I would say that with things like the IBM system I mentioned under Bluemix, which is going to be out in general availability in a couple of weeks, once you start to get a lot of people using something like that, the third-party tools will emerge. On a separate but related issue, there's quantum computing, which is something we're going to talk about toward the end of the year. IBM just made a five-qubit quantum computer available with free access.
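Neither the question nor the answer spells out a method, but as one hedged illustration of what "household" linking can look like in graph terms, here is a sketch that treats customer records as nodes, connects any two records that share an email or a street address, and reports each connected component as a candidate household using a union-find structure. The records, field names, and exact-match rule are hypothetical; real master data matching uses much fuzzier comparisons and would over-merge badly with a rule this naive.

```python
# Hypothetical customer records; fields and values are invented for illustration.
RECORDS = [
    {"id": "c1", "email": "pat@example.com", "address": "12 Elm St"},
    {"id": "c2", "email": "sam@example.com", "address": "12 Elm St"},
    {"id": "c3", "email": "sam@example.com", "address": "9 Oak Ave"},
    {"id": "c4", "email": "lee@example.com", "address": "77 Pine Rd"},
]


def find(parent, x):
    """Return the representative of x's component, halving the path as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def group_households(records, keys=("email", "address")):
    """Link records sharing any value in `keys`; return connected components as sorted id lists."""
    parent = {r["id"]: r["id"] for r in records}
    first_seen = {}  # (key, value) -> first record id carrying that value
    for r in records:
        for key in keys:
            token = (key, r[key])
            if token in first_seen:
                union(parent, first_seen[token], r["id"])  # an implicit edge between two records
            else:
                first_seen[token] = r["id"]
    groups = {}
    for rid in parent:
        groups.setdefault(find(parent, rid), []).append(rid)
    return sorted(sorted(g) for g in groups.values())


print(group_households(RECORDS))  # [['c1', 'c2', 'c3'], ['c4']]
```

Note that c1 and c3 share nothing directly; they end up together only through c2, which is exactly the kind of transitive, multi-hop link that is awkward to express as a single relational query.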
And they had something like 30,000 people sign up in the first 24 to 48 hours. So as we start to get people to understand why you would want to represent something in graph form, and as you have platforms (and IBM is just one of them; typically in this space, whenever something scales up like this, you'll find that IBM, Amazon, and Microsoft all end up with platforms that let you do it), that's the point at which the ecosystems evolve, and then you'll start to see those kinds of tools. Sure. And we actually hosted a white paper for Neo Technology for a while, written by Karen, on MDM and graph databases. Oh, did you? Okay. We don't have it on the page anymore, but I'm sure you can find it on Neo Technology's website. Adrienne, thank you so much. This has been another fantastic presentation on a great topic. Thanks to all of our attendees for being so engaged in everything we do and offering up such great questions. And as you mentioned, we'll be meeting again next month to talk about... whatever you can talk about. Perception to personality, yeah. It's either that or NLP. I have both of those on my list, and it'll be one of those topics. It's a mystery topic. You and I can talk offline, and we'll get it straightened out. Yes, it'll be June 9th. Also, as you mentioned, we have Smart Data Online happening June 13th, our first Smart Data Online conference. I'm very excited, and thank you for participating. We hope to see you all next month, and I hope everyone has a great day. Thank you so much. Thanks.