I'm Peter Burris, welcome to Wikibon's Action Item. Once again, we're broadcasting from our beautiful theCUBE studios in Palo Alto, California. Here in the studio with me is George Gilbert, and remote we have Neil Raden, Jim Kobielus, and David Floyer. Welcome, guys.
Hey.
So we've got a really interesting topic today. We're going to be talking about graph databases, which probably just immediately turned off everybody. But we're actually not going to talk so much about it from a technology standpoint. We're really going to spend most of our time talking about it from the standpoint of the business problems that IT and technology are being asked to address, and the degree to which graph databases, in fact, can help us address those problems. And what do we need to do to actually address them? Human beings tend to think in terms of relationships of things to each other. So what the graph community talks about is graph-shaped problems. By graph-shaped problem we might mean that someone owns something and someone owns something else, or someone shares an asset, or it could be any number of different things. But we tend to think in terms of things and the relationships that those things have to other things. Now, the relational model has been an extremely successful way of representing data for a lot of different applications over the course of the last 30 years. And it's not likely to go away. But the question is, do these graph-shaped problems actually lend themselves to a new technology that can work with relational technology to accelerate the rate at which we can address new problems, accelerate the performance of those new problems, and ensure the flexibility and plasticity that we need within the application set, so that we can consistently use this as a basis for going out and extending the quality of our applications as we take on even more complex problems in the future? So let's start here, Jim Kobielus.
When we think about graph databases, give us a little hint on the technology and where we are today.
Yeah, well, graph databases have been around for quite a while in various forms, addressing various core use cases, such as social network analysis, recommendation engines, fraud detection, semantic search, and so on. Graph database technology is very closely related to relational, but it's specialized. When you think about it, Peter, at the very heart of a graph-shaped business problem is the entity-relationship diagram, and anybody who's studied databases has mastered, at least at a high level, entity-relationship diagrams. The more complex these relationships grow among a growing range of entities, the more complex the network structure becomes in terms of linking them together at a logical level. So graph database technology was developed a while back to be able to support very complex graphs of entities and relationships in order to do analytics. A lot of it is very focused on fast queries, what they call query traversal, among very large graphs, to find quick answers to questions that might involve who owns which products, which stores in which cities they bought them from, which support contractors service them, and which relationships those products have with other products they may have bought from us and our partners, and so forth and so on. When you have very complex questions of this sort, they lend themselves to graph modeling. And to the extent that you need to perform very complex queries of this sort very rapidly, graph databases, and there's a wide range of those on the market, are optimized for that. But we also have graph abstraction layers over RDBMSs and multi-model databases. You'll find them, you know, running on IBM's databases or Microsoft Cosmos DB and so forth.
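The multi-hop question described here (who owns which products, bought at which stores, in which cities) can be sketched in a few lines of Python. This is only an illustration of query traversal over typed relationships: the data, the node names, the relationship labels, and the helper functions `build_index` and `traverse` are all hypothetical and do not correspond to any particular graph database's API.

```python
from collections import defaultdict

# Hypothetical toy data: (source, relationship, target) triples.
EDGES = [
    ("alice", "OWNS", "laptop"),
    ("alice", "OWNS", "phone"),
    ("bob", "OWNS", "phone"),
    ("laptop", "BOUGHT_AT", "store_1"),
    ("phone", "BOUGHT_AT", "store_2"),
    ("store_1", "LOCATED_IN", "palo_alto"),
    ("store_2", "LOCATED_IN", "san_jose"),
    ("laptop", "SERVICED_BY", "acme_support"),
]


def build_index(edges):
    """Index outgoing edges by (node, relationship) so each hop is a direct lookup."""
    index = defaultdict(list)
    for source, relationship, target in edges:
        index[(source, relationship)].append(target)
    return index


def traverse(index, start, relationships):
    """Follow a fixed chain of relationship types from `start`.

    Returns every complete path as a list of node names.
    """
    paths = [[start]]
    for relationship in relationships:
        paths = [
            path + [target]
            for path in paths
            for target in index.get((path[-1], relationship), [])
        ]
    return paths


INDEX = build_index(EDGES)

# Which cities are the stores in where alice bought the things she owns?
for path in traverse(INDEX, "alice", ["OWNS", "BOUGHT_AT", "LOCATED_IN"]):
    print(" -> ".join(path))
```

Each hop here is a dictionary lookup from the current node rather than a join across whole tables, which is the intuition behind traversal-oriented storage. A real graph database adds persistence, indexing, and a query language on top of that idea.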
So you don't need graph-specialized databases in order to do graph queries or to manipulate graphs. So that's the issue here. When does a specialized graph database serve your needs better than a non-graph-optimized but nonetheless graph-enabling database? That's the core question.
So Neil Raden, let's talk a little bit about the classes of business problems that could in fact be served by representing data utilizing a graph model. So these graph-shaped problems, independent of the underlying technology. Let's start there. What kinds of problems can business people start thinking about solving by thinking in terms of graphs of things and relationships amongst things?
It all comes down to connectedness. That's the basis of a graph database: how things are connected, either weakly or strongly. And these connected relationships can be very complicated. They can be based on very complex properties. A relational database is not based on connectedness at all. I'd like to say it's based on unconnectedness. And the whole idea in a relational database is that the intelligence about connectedness is buried in the predicate of a query. It's not in the database itself. So I don't know that overlaying graph abstractions on top of a relational database is a good idea. On the other hand, I don't know how stitching a relational database into your existing operation is going to work either. We're going to have to see, but I can tell you that a major part of data science, machine learning, and AI is going to need to address the issue of causality, not just what's related to each other. And there's a lot of science behind using graphs to get at the causality problem. And we've seen, for example...
Well, let's come back to that. I want to come back to that. But George Gilbert, we kind of experienced a similar type of thing back in the 90s with the whole concept of object-oriented databases.
They were represented as a way of re-conceiving data. The problem was that they had to go from the concept all the way down to the physical thing, and it didn't seem to work. What happened?
Well, the big argument with object-oriented databases was that we can model anything, and it's so much richer, especially since we're programming with objects. It turns out, though, that theoretically, especially at that time, you could model anything down at the physical level or even the logical level in a relational database. And so those code bases were able to handle both ends of the spectrum of use cases. But now that we have such extreme demands on our data management, rather than look at a whole application, or multiple applications even sharing a single relational database like some of the big enterprise apps, we have workloads within apps, like recommendation engines, or a knowledge graph, which explains the relationship between people, places, and things, or digital twins, or mapping your IT infrastructure and applications and how they all hold together. You could do that in a relational database, but in a graph database you can organize it so that you can have really fast analysis of these structures. But the trade-off is, you're going to be much more restricted in how you can update the stuff.
All right, so we think about what happened then with some of the object-oriented technology. In the original database world, the database was bound to the application, and the developer used the database to tell the application where to go find the data. Relational allowed us not to tell the applications where to find things, but rather how to find things. And that persisted and was very successful for a long time. Object-oriented technologies in many respects went back to the idea that the developer had to be very concrete about telling the application where the data was. But we didn't want to do things that way.
Now, something's happened. David Floyer, it used to be that one of the reasons we had this challenge of representing data in a more abstract way across a lot of different forms, without also having it represented physically in a lot of different copies and a lot of different representations of the data, which broke systems of record and everything else, was that the underlying technology was focused on just persisting data and not necessarily delivering it into these new types of databases, data models, et cetera. But flash changes that, doesn't it? Can't we imagine a world in which we have our data in flash, which is a technology that's more focused on delivering data, and then have that data delivered to a lot of different representations, including things like graph databases and graph models? Is that accurate?
Absolutely, and in a moment I'll take it even further. I think the first point is that when we were designing real-time applications, transactional applications, we were very constrained indeed by the amount of data that we could get to. So as a database administrator, I used to have a rule that database developers could not issue more than a hundred database calls. And the reason was that they could always do more than that, but then the applications became very unstable and they became very difficult to maintain. The costs of maintenance went up a lot. The whole area of flash allows us to do a number of things, and the area of UniGrid enables us to do a number of things very differently. So we can, for example, share data and have many different views of it. We can use UniGrid to bring far greater amounts of compute power, GPUs, et cetera, to bear on specific workloads.
And I think the most useful way to think about this is that this type of architecture can be used to create systems of intelligence, where you have the traditional relational databases dealing with systems of record, and then you can have the AI systems, graph systems, and all the other components there looking at the best way of providing data and making decisions in real time that can be fed back into the systems of record.
All right, all right. So Neil, let me come back to you very quickly. Sorry, George, let me come back to Neil. I want to go back to this question of what a graph-shaped problem looks like. Let's kind of run down it. We talked about AI. What about IoT, guys? Is IoT going to help us? Is IoT going to drive this notion of looking at the world in terms of graphs more or less? What do you think, Neil?
I don't know. I haven't really thought about it, Peter, to tell you the truth. I think one thing that we leave out when we talk about graphs is that we talk about, you know, nodes and edges and relationships and so forth, but you can also build a graph with very rich properties. And one thing you can get from a graph query that you can't get from a relational query, unless you write, you know, character predicate, is that it can actually do some thinking for you. It can tell you something you don't know. And I think that's important. So without being too specific about IoT, I have to say that for streaming data, and trying to relate it to other data and getting down very quickly to what's going on, root cause analysis, I think graph would be very helpful.
Great, and Jim Kobielus, how about you?
Yeah, I think that IoT is tailor-made for, or I should say graph modeling and graph databases are tailor-made for, the IoT. Let me explain. Graph is very much a metadata technology for expressing context in a connected universe.
Where the IoT is concerned, it's all about connectivity. And so you have increasingly complex graphs of, say, individuals and the devices and the apps they use, and locations and various contexts and so forth. These are increasingly graph-based. They're hierarchical and shifting and changing. And so in order to contextualize and personalize experience in an IoT world, I think graph databases will be embedded in the very fabric of these environments. Microsoft has a strategy they announced about a year ago to build more of an intelligent edge around a distributed graph across all their offerings. So I think graphs will become more important in this era, undoubtedly.
George, what do you think, business problems?
Business problems on IoT, and the knowledge graph that holds together a digital twin: both of these lend themselves to graph modeling. But to use the object-oriented databases as an example, where object modeling took off was in the application server, where you had the ability to program in an object-oriented language, and that mapped to a relational database. And that is an option, not the only one, but it's an option for handling graph-modeled data like a digital twin or IT operations.
Well, that's just what we're thinking about here if we talk about graphs and metadata. And I think, Neil, this partly answers a question that you had about why anybody would want to do this: we're representing the output of a relational query as a node in a network of data types or data forms, so that the data itself may still be relationally structured, but from an application standpoint, the output of that query is itself a thing that is then used within the application.
But to expand on that, if you store it underneath as fully normalized relational data, laid out so that there are no duplicates and things like that, it gives you much faster update performance, but the really complex queries typical of graph data models would be very, very slow.
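The trade-off described here can be made concrete with a small Python sketch that uses the standard-library `sqlite3` module. It keeps a dependency graph in one normalized edge table, where an update touches a single row, and answers a graph-style question with a recursive common table expression. The table name `depends_on`, the component names, and the helpers `update_dependency` and `all_dependencies` are hypothetical. The sketch does not measure performance; it only shows that every additional hop in the traversal is another self-join over the edge table.

```python
import sqlite3

# Hypothetical IT-infrastructure map stored as a normalized edge table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE depends_on (component TEXT, dependency TEXT);
    INSERT INTO depends_on VALUES
        ('web_app', 'api_server'),
        ('api_server', 'database'),
        ('api_server', 'cache'),
        ('database', 'storage_array');
    """
)


def update_dependency(connection, component, old, new):
    """Updating the normalized form is a single-row change."""
    connection.execute(
        "UPDATE depends_on SET dependency = ? "
        "WHERE component = ? AND dependency = ?",
        (new, component, old),
    )


def all_dependencies(connection, root):
    """Return everything `root` depends on, directly or transitively.

    The recursive CTE re-joins the edge table once per hop; UNION
    removes duplicates, which also stops cycles from looping forever.
    """
    rows = connection.execute(
        """
        WITH RECURSIVE reachable(name) AS (
            SELECT dependency FROM depends_on WHERE component = ?
            UNION
            SELECT d.dependency
            FROM depends_on AS d
            JOIN reachable AS r ON d.component = r.name
        )
        SELECT name FROM reachable ORDER BY name
        """,
        (root,),
    )
    return [name for (name,) in rows]


print(all_dependencies(conn, "web_app"))
```

On a handful of rows this is instant. The concern raised in the discussion is what happens when the same repeated join runs over very large, deeply connected data.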
So once we have, say, more in-memory technology, or we can manage under the covers the multiple representations of the data...
That's what flash is going to allow us to do. What David Floyer just talked about is that we can have a single persistent physical storage, but it can be represented in a lot of different ways, so that we avoid some of the problems that you're starting to raise. If we had to copy the data and have physical copies of the data on disk in a lot of different places, then we would run into all kinds of consistency and update problems, and that would probably break the model. We'd probably come back to a notion of a single data store. I want to move on here, guys. One really quick thing, David Floyer, I want to ask you. You mentioned that when you were a database administrator, you put restrictions on how many database actions an application or transaction was allowed to generate. When we think about what a business is going to have to do to take advantage of this, is there any one particular thing that we need to think about? What's going to change within an IT organization to take advantage of graph databases? And then we'll do the action items.
So the key here is that the number of database calls can grow by a factor of probably 1,000 times what it is now, with what we can see coming as technologies over the next couple of years.
So let me put that in context, David. That's a single transaction now generating 100,000 database calls.
Well, access calls to data, whatever type of database. And the important thing here is that a lot of that is going to move out, with the discussion of IoT, to where the data is coming in. Because the quicker you can do that, the earlier you can analyze that data. And you talked about IoT with multiple different sources coming in. Take a simple one like traffic lights, for example. The traffic lights are being affected by the traffic lights around them within the city.
Those sorts of problems are ideal for this sort of graph database. And having all of that data locally and being processed locally in memory, very, very close to where those sensors are, is going to be the key to developing solutions in this area.
So Neil, I've got one question for you. I'm going to put you on the spot, I just had a thought. And here's the thought. We talk a lot about some of the new technologies that could in fact be employed here, whether it be blockchain or even going back to SOA, and about what a system is going to have the authority to do: the idea of writing contracts that describe very, very discretely what a system is or is not going to do. I have a feeling that those contracts are not going to be written in relational terms. I have a feeling that, like most legal documents, they will be written in what looks more like graph terms. I'm extending that a little bit, but this has rights to do this at this point in time. Is that also part of this notion of incorporating more contracts directly into how systems work, to assure that we have the appropriate authorities laid out? What do you think? Is that going to be easier or harder as a consequence of thinking in terms of these graph-shaped models?
Boy, I don't know. Again, another thing I haven't really thought about, but I do see some real gaps in thinking. Let me give you an analogy. OLAP databases came on the scene back in the 90s or whatever, and people in finance departments and whatever, they loved OLAP. What they hated was the lack of scalability. And what we see now is that scalability isn't a problem, and OLAP solutions are suddenly bursting out all over the place. So I think there is a role for a mental model of how you model your data and how you use it. It's different from the relational model. I think the relational model has prominence and has that advantage of, what's it called, occupancy or something.
But I think that the graph is going to show some real capabilities that people are lacking right now. I think some of them are at the very high end, things like I said, getting to causality. But I think that graph theory itself is so much richer than the simple concept of graphs that's implemented in graph databases today.
Now, I agree with that totally.
Okay, let's do the action item round. Jim Kobielus, I want to start with you. Jim, action item.
Yeah, for data professionals and analytic professionals, focus on what graphs cannot do, because you hear a lot of hyperbole. They're not optimal, they're not useful for unstructured data or for machine learning in database. They're not useful for schema on read. What they are useful for is the same core thing that relational is useful for, which is schema on write applied to structured data. That's number one. Number two, and I'll be quick on this, focus on the core use cases that are already proven out for graph databases. We've already ticked them off here: social network analysis, recommendation engines, influencer analysis, semantic web. There's a rich range of mature use cases for which semantic techniques are suited. And then finally, and I'll be very quick here, bear in mind that relational databases have been supporting graph modeling and graph traversal and so forth for quite some time, including pretty much all the core mature enterprise databases. If you're using those databases already and they can perform graph traversals and so forth reasonably well for your intended application, stick with that. No need to investigate the pure-play, graph-optimized databases on the market. However, that said, there are plenty of good ones, including AWS, which is coming out with Neptune. Please explore the other alternatives, but don't feel like you have to go to a graph database first and foremost.
All right, David Floyer, action item.
Action item: you are going to need to move your data center and your applications from the traditional way of thinking about handling data, which is sequential copies going around, usually taking two or three weeks. You're going to have to move towards a shared data model where the same set of data can have multiple views of it and multiple uses from multiple different types of databases.
George Gilbert, action item.
Okay, so when you have a graph-oriented problem, in other words, the data is shaped like a graph, the question is, what type of database do you use? If you have really complex query and analysis use cases, it's probably best to use a graph database. If you have really complex update requirements, it's best to use a combination, perhaps, of relational and graph, or something like multi-model. We can learn from Facebook, where for years they've built their source of truth for the social graph on a bunch of sharded MySQL databases with some layers on top. That's for updating the graph and maintaining its integrity. But for reading the graph, they have an entirely different layer for comprehensive queries and for manipulating and traversing all those relationships. So you don't get a free lunch either way. You have to choose your sweet spots and the trade-offs associated with them.
All right, Neil Raden, action item.
Well, first of all, I don't think that graph databases are subject to a lot of hype. I think it's just the opposite. I think they haven't got much hype at all, and maybe we're going to see that. But another thing is a fundamental difference: when you're looking at a graph and a graph query, it uses something called open world reasoning. A relational database uses closed world reasoning. I'll give you an example. Country has capital city. Now you have in your graph that China has capital city Peking, and China has capital city Beijing. That doesn't violate the graph.
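The capital-city example can be sketched in Python. This is a toy illustration, not the behavior of any specific graph database. The assumption, borrowed from OWL-style reasoning, is that `has_capital_city` is declared a functional property (at most one value per subject) and that different names are not assumed to denote different things. Under those assumptions a second value is treated as evidence that the two names refer to the same thing, whereas a relational table with a uniqueness constraint rejects the second row. The class `TripleStore` and the function `closed_world_insert_fails` are hypothetical names.

```python
import sqlite3


class TripleStore:
    """Toy open-world store (hypothetical, for illustration only)."""

    def __init__(self, functional_properties):
        self.functional = set(functional_properties)
        self.triples = set()
        self.parent = {}  # union-find forest over names known to be "the same"

    def _find(self, name):
        """Return the representative of the group of names that `name` belongs to."""
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def _merge(self, a, b):
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add(self, subject, prop, value):
        """Accept the fact; for a functional property, infer that values coincide."""
        if prop in self.functional:
            for s, p, v in self.triples:
                if p == prop and self._find(s) == self._find(subject):
                    self._merge(v, value)
        self.triples.add((subject, prop, value))

    def same_thing(self, a, b):
        return self._find(a) == self._find(b)


def closed_world_insert_fails():
    """Relational contrast: a key constraint rejects the second capital."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE capital (country TEXT PRIMARY KEY, city TEXT)")
    conn.execute("INSERT INTO capital VALUES ('China', 'Peking')")
    try:
        conn.execute("INSERT INTO capital VALUES ('China', 'Beijing')")
    except sqlite3.IntegrityError:
        return True
    return False


store = TripleStore(functional_properties={"has_capital_city"})
store.add("China", "has_capital_city", "Peking")
store.add("China", "has_capital_city", "Beijing")

print(store.same_thing("Peking", "Beijing"))
print(closed_world_insert_fails())
```

Both facts stay in the store, and the equivalence is derived rather than entered by hand. The relational version would need the application to resolve the two names before the insert.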
The graph simply understands and intuits that the different names are the same thing. Now, if you love to write correlated subqueries, then for many, many of your relationships, I'd say stick to your relational database. I see unique capabilities in the graph that would be difficult to implement in a relational database.
All right, thank you very much, guys. Let's talk about what the action item is for all of us. This week we talked about graph databases. We do believe that they have enormous potential, but first we all have to draw a distinction between graph theory, which is a way of looking at the world and envisioning and conceptualizing solutions to problems, and graph database technology, which has the advantage, for certain classes of data models, of being able to very quickly both write and read data that is based on relationships and hierarchies and network structures that are difficult to represent in a normalized relational database manager. Ultimately, our expectation is that over the next few years we're going to see an explosion in the class of business problems that lend themselves to a graph modeling orientation. IoT is an example; very complex analytics systems will be an example. But it is not the only approach or the only way of doing things. What is especially interesting is that over the last few years, a change in the underlying hardware technology is allowing us to utilize and expand the range of tools that we might use to support these new classes of applications. Specifically, the move to flash allows us to sustain a single physical copy of data and then have that be represented in a lot of different ways, to support a lot of different model forms and a lot of different application types, without undermining the fundamental consistency and integrity of the data itself.
So that is going to allow us to utilize new types of technologies in ways that we haven't utilized before, because before, whether it was object-oriented technology or OLAP technology, there was always this problem of having to create new physical copies of data, which led to enormous data administration nightmares. So looking forward, the ability to use flash as a basis for physically storing the data and delivering it out to a lot of different model and tool forms creates an opportunity for us to use technologies that in fact may more naturally map to the way that human beings think about things. Now, where is this likely to really play? Well, we talked about IoT, we talked about other types of technologies. Where it's really likely to play is when the domain expertise of a business person is really pushing the envelope on the nature of the business problem. Historically, applications like accounting or whatnot were very focused on highly stylized data models, things that didn't necessarily exist in the real world. You don't have double-entry bookkeeping running in the wild. You do have it in the legal code. But some of the things that we want to build in the future, people, the devices they own, where they are, how they're doing things, lend themselves to a real-world experience, and human beings tend to look at those using a graph orientation. And the expectation is that over the next few years, because of the changes in the physical technology and how we can store data, we will be able to utilize a new set of tools that are going to allow us to more quickly bring up applications, more naturally manage the data associated with those applications, and, very important, utilize targeted technology in a broader set of complex application portfolios that is appropriate to solve that particular part of the problem, whether it's a recommendation engine or something else.
All right, so once again, I want to thank the remote guys, Jim Kobielus, Neil Raden, and David Floyer. Thank you very much for being here. George Gilbert is here in the studio with me. And once again, I'm Peter Burris and you've been listening to Action Item. Thank you for joining us and we'll talk to you again soon.