Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DATAVERSITY. We would like to thank you for joining the latest installment in the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight, sponsored today by data.world. Today, William will be discussing graph database use cases. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #ADVAnalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just note that the chat defaults to sending to just the panelists, but you may absolutely change that to network with everyone. To open the Q&A panel or the chat panel, you'll find those icons in the bottom middle of your screen. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar. Now let me turn it over to Trent from data.world for a brief word from our sponsor. Trent, hello and welcome.
Thank you very much, Shannon. I will start to share here. Let me know if you can't see it; otherwise, I'll assume you can, and we'll get started. I wanted to start with this quote that I find really compelling, because it tells you that it's a great time to be learning about graph databases. This is from Rita Sallam, an important analyst at Gartner, and she says that they believe that by 2025, graph technologies will be used in 80% of data and analytics innovations, up from just 10% in 2021. So it's a real growth time in this area, and it's a really interesting topic. So what we will, let's see. Oh, there we go.
Hang on a second. Technical issues. There's just a little bit of a delay here. Okay. All right. So today you'll be hearing from William McKnight, president of McKnight Consulting Group. My name is Trent Merra. I'm a sales engineer with data.world. We are the enterprise data catalog for your modern data stack, and I will be explaining more about what that means. But first, let me tell you what a data catalog is. Right now, we really are in a time when we have so many options in terms of data platforms, ETL tools, BI tools, and lots of places to keep our data, whether it's in the cloud, on-prem, or even in desktop tools. The challenge is that it's sometimes hard to even know what data you have, much less where it is, whether you can trust it, or how to collaborate on it. So what we do as a catalog at data.world is go out to all of those data sources and catalog them for you. We let you know what you have and where it is. You can specify whether the data is trustworthy, whether it contains something confidential or something that has to be kept private, all of those kinds of things. That gives everybody a common source of truth across the whole enterprise. And once they have that, they can start to collaborate and derive value from all of the data they're storing in these different places. So that's what a catalog does. And specifically, the reason we wanted to sponsor this webinar is that we are the only data catalog powered by an enterprise knowledge graph. I will also be telling you what that means and why that's important. First, just a quick peek at what our tool looks like.
Our tool is designed to be very inviting to all personas in the business, not just technical users but also the analysts and the business domain experts, because what we found is that to get adoption of a catalog, you really need the involvement of everybody. So we built a tool that is very inviting. It doesn't scare non-technical users away. It still has all of the technical bona fides, but it invites everybody to participate. It's really a lot like Google: when you're looking for a data resource, you get a knowledge panel that gives you context about the thing you're looking for, who is connected to it, and what other objects in your environment are related to it. The search is also very similar in that it has search facets to filter down, so it's a very intuitive search. Everybody can suggest changes to the catalog, so they can all contribute their domain knowledge to it. That's really how you get this collaboration and build a really useful tool. And it's very social, with tagged discussion threads throughout, because we find that people plus technology is what's required to make this kind of undertaking work, cataloging all of the company's data. So first, let me talk about why you would want a catalog that uses a graph. Ours in particular is called a knowledge graph, but it's all really the same type of thing at heart. The point is that when we're building a catalog, we have to be able to account for the wide variety of data that companies have now. Relational databases don't work well for wide varieties of data. They work great for large volumes, but not so much for variety. If you think about a relational database, it stores things in tables. So say we have a customer table on the left and a product table on the right.
We can put a product with a customer, and that's an order, so now we have these tables that are storing these things. If you have a relatively limited schema, relational databases are great for that. It's all you need. But what happens if you want to track lots of different types of information about your customer? You want to know their profession, how they prefer to be communicated with, what they subscribe to, what they have browsed. The more of this information you want to track about a customer and connect together, the more tables you have to create, and it's cumbersome to build and maintain all those tables. When you want to extract information from them and make use of it, you have to join those tables, and as the number of joins grows, that becomes a poorly performing way to query the data. In a catalog we need to connect all kinds of things. We need to connect a column of data to a report that uses it, or a policy that applies to it, or a process that builds it. What we might want to connect is unlimited, so it doesn't work to put things in all kinds of tables. So what if we store it like this? These are called triples. This isn't a table. It's just a series of independent statements. They're called triples because they have three parts. You can see Jane Ward at the top; that's the subject. Then "bought" is the predicate; it's what we're saying about that subject. And then the object is the cycling jersey. It's pretty intuitive: Jane Ward bought a cycling jersey. And then on down the line: so did Frank, and Jane Ward subscribes to the New York Times, et cetera. These are independent statements called triples. We don't need to create tables to store them, and we can create these triples about anything that we want to connect, and that's really the power of it.
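To make the triple idea concrete, here is a minimal Python sketch, assuming triples are held as plain (subject, predicate, object) tuples. The predicate names, the `match` helper, and the facts marked hypothetical are illustrative and do not come from the slide or from any particular graph product.

```python
# Each triple is an independent (subject, predicate, object) statement.
triples = [
    ("Jane Ward", "bought", "cycling jersey"),
    ("Frank", "bought", "cycling jersey"),
    ("Jane Ward", "subscribes_to", "New York Times"),
    ("Frank", "has_profession", "engineer"),  # hypothetical fact
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# A brand-new kind of fact needs no new table, only another statement.
triples.append(("Jane Ward", "prefers_contact_by", "email"))  # hypothetical fact

# Who bought a cycling jersey?
buyers = sorted(s for (s, _, _) in match(triples, predicate="bought", obj="cycling jersey"))
print(buyers)
```

The point of the sketch is the append: tracking a new type of information about a customer means adding a statement rather than designing and joining another table.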
And that's really the heart of how graphs work. What's nice is that when you start to build that list of triples, what emerges from it is a schema, and we can visualize it like this. This is that same data in a visual form. And what's nice about the graph type of data model is that it has query languages. The one that we use at data.world is the W3C standard SPARQL language. It lets us ask questions about connections in this graph. So, which customers have a matching purchase and subscription, or a matching profession and subscription, or either a purchase or a profession match? Short path queries is what we call these. The longer those paths get, the more important it is to have a suitable model for them, and that's really what's so great about graph. So again, relational databases are great for large set operations, grouping a million rows of data on a particular dimension. But if you want to do those kinds of path operations to find connections in a complex data set, you want to use a graph. That applies to catalogs because of all the different things that we need to connect. The last item in that list is lineage. That's something we've added recently to the data.world platform, because lineage is really just a graph of connections among the various tables that were derived from each other, with upstream and downstream. That's the visual example we have in the tool, because lineage is also part of the knowledge graph. So anyway, come check us out. We are at data.world. We're extensible, we're simple, and we're automated in terms of getting things up and running.
We're great for data discovery, agile data governance, DataOps, and actually a few other things as well, cloud migration for example. We have some other good content on the data.world site, blogs, webinars, and white papers, if you want to learn more about the topic. We have a great podcast that's really popular as well. It's called Catalog & Cocktails. I recommend going to YouTube, starting at the first season, and looking at whatever interests you. The lower left there shows the kinds of topics that we talk about on that podcast, so if any of those look interesting to you, go check it out. We're on the fourth season now, so people are really liking it. You can see recorded demos of data.world at the data.world website under resources, demos. We've got customer testimonials. And if you get interested in the really technical part of things, like I do, we have a nice SPARQL tutorial as well that teaches you a little bit of the guts of a knowledge graph. If you were to just Google "data.world SPARQL tutorial," that would get you there. So I want to thank you, and I will now pass it back over to Shannon.
Thank you so much for kicking us off, and thanks to data.world for sponsoring and helping to make these webinars happen. If you have questions for Trent, feel free to submit them in the Q&A section of your screen, as he'll be joining us in the Q&A at the end of the webinar. Now let me introduce the speaker for the series, William McKnight. William has advised many of the world's best-known organizations. His strategies form the information management plan for leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. And with that, I will give the floor to William to get his presentation started. Hello, and welcome.
You're muted, William.
Sorry about that. Oh my. Okay, thank you for everything. Thank you, Trent, for that excellent presentation, and thank you, data.world, for the sponsorship. Today we are here to talk about graph database use cases. As Trent was saying, there are compelling reasons why you might want to think about graph databases, not only for some of the workloads you have today but also for workloads you may not be doing today because you don't feel like you have the right technology in place. At least that's where a lot of my clients have been and still are. We are seeing graph databases become an accepted part of the standard stack within the enterprise, but it does take some time, it takes some knowledge, and it takes a great fit to a workload. That's what I'm here to help you with today, focusing on the enterprise: what are some of the workloads out there that people are running successfully on graph databases? Maybe that will stimulate some thinking on your part about where you can take your graph database journey, whether you've started it or not. First, I wanted to mention that we will be continuing this webinar series with DATAVERSITY through 2023. These are my topics for next year. As you can see, I hope they are pretty relevant to the modern data architect, the modern data analyst, the director, the VP, the CIO, the CTO, and so on. They are all things that I come across in my practice, and I've tried to time them for when we have related projects going on, when they may be hot and heavy and I may have some things to share from industry. So I look forward to seeing you all back here in 2023. I know it's only September and we shouldn't be talking about that, but I'm excited because that'll be my fourth year doing this. So now, moving on. Trent really expounded on this. I only have a slide to talk about this idea that relational databases cannot handle data relationships well.
Trent talked about how relational databases may not be appropriate for some things. I'm focusing on data relationships because that is the essence of a graph database: it handles relationship management very well. We're going to see that as we go along here. But first, if Trent hasn't already shot this dog dead, I will go ahead and bury it. With relational databases, you cannot model and store data and relationships without complexity. Performance degrades with the number and levels of relationships and with the database size. Query complexity grows with the need for joins. And adding new types of data and relationships requires schema redesign, increasing the time to market. This also gives me an opportunity to talk about the graph add-ons, as I call them, that a lot of relational databases have added. Relational databases are becoming multimodal, which is another topic we could talk about, and there are a lot of things they are incorporating. Graph capabilities, graph algorithms and so on, are certainly one big area that they are focusing on. However, so are graph databases, and I feel like the regular databases are really not keeping up with the graph add-ons they're doing. So I'm going to focus on graph databases today, and in my practice for the foreseeable future. However, if you have something on a small scale, or if you just want to try out some graph algorithms and they're available, I do encourage you to try out the capabilities of your enterprise database. Okay, use the right database for the right job. This is basically a theme of advanced analytics, right? Use the right database for the right job. Over here on the graph side, the side we're focusing on today, it's connected data. And that can mean a lot of things. We'll talk about it.
It's focused on data relationships. Discrete data or minimally connected data, such as some of what Trent showed you, is good for other databases. It might be good for another NoSQL database, in other words a column store, a key-value store, or a document store. And then, of course, relational databases are the catch-all for everything. Truly, they make sense for a good 80% of your data, 80% of your workloads, and so on, if you include the data lake component. They do make sense for a lot. But graph databases are designed for data relationships. And it's not just about relationship exploration, it's also performance. Performance is superior in graph for these connected-data queries. In one experiment, you take 1,000 people who average 50 friends per person. Think about LinkedIn or Twitter or something like that. You run a path-exists query and limit it to a depth of four. In a relational database, it ends up taking something on the order of 2,000 milliseconds. In a graph database, only a couple of milliseconds. We are still definitely in the era where performance matters, and performance varies to such a wide degree depending on the style of data store that it makes sense to have a bunch of them in any enterprise. Graph databases definitely fit in there, somewhere near the top. Yes, there are development benefits and deployment benefits as well. I just mentioned performance. So let's talk about some of the basics, and maybe drill down a tad on what Trent was saying. Before we get to the use cases, we've got to know what a vertex is, because this word comes up time and time again in graph database terminology, so you've just got to know it. Vertices are basically your major nouns. These are things. These are tangible items of your enterprise, be it people, places, or things.
There I list out a bunch of people, places, and things, and hopefully one or more of them resonates with you for your particular set of workloads. But what sets them apart for the purpose of graph analysis is the interaction of relationships. These things aren't isolated in your enterprise. They are related to other things, other nouns, other vertices of the enterprise. In other words, the edges that connect these vertices must have a push-pull effect. That is, a change in one vertex tends to have an influence on adjacent vertices. A person with a virus passes it to another person. A fraudulent bank account passes money to other fraudulent accounts. An action in one part of the network affects adjacent parts, not unlike the depiction of gravity in physics videos. So before we race out and put all our nouns into a graph database, keep in mind that it's going to be the ones that you want to do some graph analytics on. You want to ask a fundamental question, how does action in one vertex affect another, to make sure that it's a great vertex for your enterprise. A fundamental point here is that in a network setting you should evaluate your actions not in isolation, but with the expectation that the world will react to what you do: the cause and effect of the enterprise. The other thing you have to know about is edges. If you get vertices and edges, you're a long way into the terminology of graph databases. Edges are simply what connects those people, places, and things of your enterprise. I've listed out some relationships here. People have family relationships, business relationships, reference relationships, proximity relationships, et cetera, that you might want to start keeping track of, and a graph database would be the way to go for all of these types of relationships. So between the vertices and the edges, that's what makes up the fundamental part of a graph database.
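To make vertices and edges concrete, here is a small Python sketch of the kind of depth-limited path-exists question from the friends experiment mentioned earlier. It assumes an in-memory adjacency dictionary and a plain breadth-first search; the names and the friendship edges are hypothetical, and this illustrates the question being asked, not how any graph database implements it.

```python
from collections import deque

# Hypothetical "friend of" edges between people (the vertices).
edges = [("Ann", "Bob"), ("Bob", "Cal"), ("Cal", "Dee"), ("Dee", "Eve"), ("Eve", "Fay")]

def build_adjacency(edges):
    """Map each vertex to the set of vertices it shares an edge with."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def path_exists(adjacency, start, goal, max_depth=4):
    """Breadth-first search that stops expanding beyond max_depth hops."""
    if start == goal:
        return True
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow edges past the depth limit
        for neighbor in adjacency.get(vertex, ()):
            if neighbor == goal:
                return True
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return False

adjacency = build_adjacency(edges)
print(path_exists(adjacency, "Ann", "Eve"))  # Eve is four hops from Ann
print(path_exists(adjacency, "Ann", "Fay"))  # Fay is five hops from Ann
```

Each hop here is a dictionary lookup on the current vertex, which is roughly the intuition for why following relationships is cheap when they are stored as direct connections rather than reconstructed through joins.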
Now, Trent showed you a triple, and I'm going to show you a couple of different ways to go about that. That's sort of the basic record layout of a graph database. I'll come back to that in a moment, but how you model actions depends on what you want as vertices. In this case you could have "Bill sent this email to Jim," and you could lay it out just like that. So Bill and Jim are vertices, and the email is a vertex as well in the first example. "Sent" would be the edge, because that's the action. That's the connection between Bill and the email. Notice Bill and the email are not the same kind of thing. They're not two people. That's okay. That happens all the time. So you can either do it that way, or the simpler way is to just have email be the edge: Bill emailed Jim. Same thing, just two different ways to go about skinning that cat, and usually we do things the second way. By the way, I failed to mention that of all the data platforms out there, I find graph databases and this kind of modeling a lot of fun. If I can get my hands on a graph database application, I am in hog heaven, because this part of it is creative, and you can learn a lot in doing it. You learn a lot about the business, and the business is going to learn a lot from this type of model. I'll also mention that we've recently done a paper where we presented the first demonstration of a massive knowledge graph that consists of materialized and virtual graphs spanning multiple hybrid clouds. We showed that it's possible to have a one-trillion-edge knowledge graph with sub-second query times. So if you're concerned about scale, there have been a few of these. We were one of them, but a few trillion-edge graphs have been demonstrated out there, and I'm happy to share more information on that with you. These things definitely scale, speaking of scale.
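The two ways of modeling "Bill sent this email to Jim" described above can be sketched in Python as follows. This is only an illustration using tuples; the `email_1` identifier, the `received_by` edge name, and the `collapse_emails` helper are hypothetical and not from the slides.

```python
# Option 1: the email is its own vertex, connected by two edges.
email_as_vertex = [
    ("Bill", "sent", "email_1"),
    ("email_1", "received_by", "Jim"),  # hypothetical edge name
]

# Option 2: the email collapses into a single edge between the two people.
email_as_edge = [("Bill", "emailed", "Jim")]

def collapse_emails(triples):
    """Derive option 2 edges from option 1 by joining sender and recipient on the email vertex."""
    senders = {o: s for (s, p, o) in triples if p == "sent"}
    return [
        (senders[s], "emailed", o)
        for (s, p, o) in triples
        if p == "received_by" and s in senders
    ]

print(collapse_emails(email_as_vertex))
```

Option 1 leaves room to hang more facts on the email itself, such as a date or a subject line, while option 2 answers "who emailed whom" in a single hop.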
First of all, I'm going to show you one of the two types of models for a graph database. If you're wading into graph databases, you've got to choose between a property graph and a semantic graph, which is also known by some other names that I'll get to on the next slide. But first, the property graph. Here we see a domain model for a property graph. This is Northwind, so you should be somewhat familiar with all these things here. In this property graph, you have entities, which are the vertices, also called nodes, and you have the links between them, which are called relationships, also known as edges, as I talked about before. Nodes and relationships, or vertices and edges, can also contain properties and attributes, as you see in my domain model. That is different from the semantic graph, which I'll show you in a minute, and for a developer, this is the big difference. You can put all these attributes here. It's a matter of convenience. I'll show you how we do it in the other style in just a minute. But before we leave here, I'll say that I'm not going to do a competitive teardown for you, but Neo4j and TigerGraph are very much in the property graph area, and others are in the other one. And really, you can get there from here for most graph workloads within the enterprise, including, I think, all the ones that I'm going to share with you today as use cases, whether you choose a property graph or a semantic graph. Speaking of the semantic graph, let's look at that. So, John knows Frank. Great. I would really love to say some more about that "knows." I would like to say, well, how does he know him? What's the provenance of that relationship? And how confident am I in this relationship? What we do there is create additional triples, and the subject of those additional triples is the relationship. So, yeah.
Or you could say the triple itself. Here are a few basic triples that aren't related to the diagram. Bob is 35. Bob knows Fred. William likes running. On and on and on. Those are your records in a semantic, or RDF, or knowledge graph. Let me come back to that in a moment and draw this out a little bit more. In the image here you have the subject, John R. Peterson. The predicate is "knows." Whom does he know? He knows the object, and the object is Frank T. Smith. So, in a triple, it's subject, predicate, and object. And the subject of the second triple that makes up this diagram is the first triple. So the first triple is the subject, the predicate is confidence percent, and the object is 70. And we have one more along those lines. Hopefully that makes a little bit of sense, but that is the basic difference, in development anyway, between the two. I would say that these types of graphs are more numerous. Probably a good 75% or more of the so-called graph databases out there subscribe to this. This is actually based on a published paper, which sort of facilitates the development of companies around it, and certainly it has in this case. Now, let me talk about that term, knowledge graph. We see it thrown around a lot today, and it's confusing. There are a couple of different places that people could be coming from when they talk about a knowledge graph. The first is that it's simply a semantic, or RDF, graph. That's a knowledge graph, and that's the meaning I'm going to use. But there are also graphs that might be property graphs that provide knowledge to the enterprise, and some people call those knowledge graphs because they're full of knowledge. Certainly semantic graphs, RDF graphs, can be full of knowledge as well. So, anyway, once you leave the webinar and go back out there in the real world, people are going to be talking about these things.
I always try to help you understand that there's not necessarily a one-size-fits-all for a lot of the terminology in our industry. I want to help you understand and navigate that and continue to learn, and not be handcuffed by "William gave me a definition and I can't think of anything else, I'm trying to fit what you just said into that and it's not working." I don't want that. So I'm giving you both. The other strong aspect of graph databases, beyond the modeling, the performance, and all that, is the visualization. Now, I'm not a visualization expert. My data analysts certainly enjoy this, and I mean business analysts, the analysts at our enterprises who are using the graph databases that are built, and they can do so much with this. I'm not going to focus on it today, but they can zoom, drill in, and get all kinds of things. As you can see here, by default, this graph has color-coded different sections of the graph, which would be impossible with a relational database. These are just wonderful features for an analyst to learn more about the business, the relationships, the paths, and so on. I'll leave you with that. Now, let's talk a little bit about the algorithms. I'm not going to draw out all the algorithms here today. That would be a full day or two of training. But you need to know something about the basic algorithms, and the reason I always want you to know more about algorithms when it comes to graph is that they demonstrate the capability to show you the relative importance of the vertices and edges in the model. That is a strong but often overlooked value of a graph database. So let's start with PageRank. I'll go quick, because I feel like by now most of you have probably seen something like this or learned about PageRank in one way, shape, or form. But I've got to lay it out. Okay, you've got these four web pages.
And by the way, I don't want to say I know how Google does it, but this is kind of the foundation of how Google does its relative-importance searching. When it determines what's going to show up near the top when you search for a keyword, it does something like this for us. So page A references page B. This could be, I don't know, pick a website: IBM.com references McKnightCG.com. Yay for McKnightCG.com, right? If I'm page B, I might reference page C, which is, you name it, Oracle.com. Oracle.com might also be referenced by DATAVERSITY.net, page D. And at the same time, page A, which I said was IBM, also references Oracle. Oracle also references IBM, et cetera, et cetera. We know the web is really confusing out there in that way. However, if we can model it in a graph database, sort of automatically we get a lot of value out of that. But the question becomes this. I do a Google search, or, bringing it to the enterprise, let's say I try to find out which products are more important out there. I'm not necessarily doing website references here, I'm doing something else, to try to figure out which product, which page, which person, which vertex is most important in my network. All right, so I'm going to do something here. I'm going to take 85%. You might say each page is given a dollar when it starts out, and it can start spending that dollar however it likes, but that's all it has to spend. Page A references a couple of pages, so it's going to give 42 and a half cents to each of page B and page C. That's 85 cents divided by two, because it has two outgoing links, right? See, this is the mechanism that a Google would use to make sure that we're not out here creating a bunch of dummy websites that have no reference from any other website.
And we're pointing those at our website to try to get credibility for the website. Yes, that's how people did SEO in the early days, but that's sort of been stamped out by now, because you've got to have credibility around those pages, and PageRank will smoke that out. So let's keep going. After the first round of results, if we do all the math that's inherent within this diagram, page A has a total of 1.00. Oh, by the way, everybody gets 15 cents, or 15 points if you will, 0.15. That just comes by default, because they figure that in a website situation, some people are going to enter your URL directly and come here that way, not by reference. Anyway, going on through about 20 iterations, you find out at the end of the day that page C was the most important page, right? Because we keep iteratively finding out: well, that page referenced me, but how important is that page? Well, that page is important because other pages referenced it. Well, how important are those pages that referenced it? You can kind of go crazy without a graph database. So we find out that page C is the most important, at 1.577. It's the most important one because it has all these other pages referencing it, A, B, and D, and they're not all dummy pages. Page D looks like it's not very important, but page A is important and page B is somewhat important, and they reference page C. Okay, so that's a great example of a graph algorithm. It's always my go-to example because it brings out so much. The rest of these are kind of plays on that, where we're trying to figure out what's most important here. Betweenness is another one, where we find bridges across what the graph database might classify as communities, clusters, if you will, of vertices. If you get a high score, that means the edge links different communities. So between the communities, what you kind of see here are the orange dots.
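The PageRank walk-through above (four pages, 85% passed along, a 0.15 baseline, about 20 iterations) can be reproduced with a short Python sketch. It assumes the links as described in the talk, A to B and C, B to C, C to A, and D to C, and uses the simple non-normalized form in which every page starts at 1.0 and the ranks sum to the number of pages. It is an illustration of the iteration, not Google's implementation or any graph database's built-in function.

```python
# Outgoing links for the four pages in the walk-through.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=20):
    """Iterate rank = (1 - damping) + damping * sum(rank of referrer / its outgoing links).

    Assumes every page has at least one outgoing link, which holds for this example.
    """
    ranks = {page: 1.0 for page in links}  # each page starts with "a dollar"
    for _ in range(iterations):
        new_ranks = {page: 1.0 - damping for page in links}  # the 0.15 everybody gets
        for page, targets in links.items():
            share = damping * ranks[page] / len(targets)  # 0.425 each for page A at the start
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks

ranks = pagerank(links)
most_important = max(ranks, key=ranks.get)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
print("most important:", most_important)
```

Page D ends up with only the 0.15 baseline because nothing references it, which is the "dummy website" effect described in the talk: linking out from a page that nobody references passes along very little.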
Those are the great connection points between the different so-called communities. So think about Google Maps. This is how Google Maps does it. They find the distance between two locations and all the alternate routes. Then they figure out, okay, where are the traffic jams? That's going to make the edge a lot thicker, because there's a lot more going on there, and we want to avoid that. We want to go down nice thin edges to get to where we're going. Betweenness is the way to do that. It's a so-called centrality measure of a vertex within a graph. Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. Hopefully that makes sense. So you can get there from here. Closeness is similar. It's about the shortest path between any two vertices. So what is the great connecting point between me and you on, let's say, Twitter? We're all connected. If you're on Twitter, we're connected somehow. Maybe we're directly connected. Maybe we're connected through somebody else who's connected to me, through somebody else, et cetera. Some of us have heard of the parlor game called Six Degrees of Kevin Bacon. This is that played out in the enterprise, right? It's where players try to find the shortest set of connections from any actor to Kevin Bacon based upon the movies that actors have starred in together, and you can throw in "directed by" and so on. An actor who has co-starred with Kevin Bacon is considered to be one degree away, while an actor who has co-starred with that actor is two degrees away, and so on, and the same can be applied to any social network. So we find out how close people are, how close any two vertices are. How about products, when we want to pitch the next best product? When Amazon or eBay wants to pitch us the next best product based upon our purchase pattern.
Well, it's not just our purchase pattern, right? It's everybody's purchase pattern to date. So what products are closely connected through sales? Pitch us that one. Eigencentrality measures the importance of a vertex by the importance of its neighbors. Hopefully you see some themes here, different ways to skin the cat in terms of finding out who the important vertices are that we need to act upon. Let's say we need to get a policy out or get some PR out. Who do we go to? Which network news or which news outlet do we go to? Which one will get the word out the best? Well, it's probably the one that is connected to the ones that are connected to the most important and the most numerous vertices out there. Hopefully that makes sense. One more algorithm. How about cascading churn? Now, before I jump into that, there's something on this slide I want to point out. Look at that query on the slide. That is a query from a relational database that we work with that can simulate graph algorithms in a relational database. And how they do that is you use the terms "as edges" and "as vertices". So you say, OK, that table right there in the relational database, that's going to be my edges. In this case, it's the calls table. And that table over there is going to be the "caller from" table. Those are going to be my vertices, et cetera, et cetera. So you see how that all plays out. Now, obviously you get the advantage of the graph algorithm there, but you don't get the advantage of everything being stored as a triple store, which highly facilitates these graph algorithms. So the performance, for one thing, is going to be pretty slow. Anyway, if that's what you need and that's all you've got right now, then you can check your relational database to see if it has anything for you like that. Now back to cascading churn. If two people churn, what is the likelihood others will?
Who are they connected to that might churn? The two churners affect the central influencer because they're directly related. This is something that telcos are very interested in, and they go by who people call and who people text with and so on, anything they can get their hands on, to know who people are connected to. That way they can effectively take care of those who are close to someone who either looks like they're going to churn or has churned. But finally, what may happen here, which is the worst-case scenario, is that all the contacts churn, because the churn just sort of churns all the way through the network. Now, an individual-focused model will underestimate churn by 6x. That's been shown. So what you want to do is use algorithms like this to figure out your churn. So these are great questions for graph databases. And when I show you the use cases, you'll see these are some of the questions that are asked. A lot of it has to do with network workloads, hierarchies, trees, ancestries, structures like that. So this is how you identify a graph workload. If you're talking about a use case and you use any of these flash words, as I call them, then you ought to think about a graph database for it. Yes, I know you're going to have to duplicate some of the data out of a relational database, but it's probably worth it. Another sign is that you are planning to use relational performance tricks, the self-referencing tables and so on. If you're trying to do something pretty complicated to get your relationship information out of a relational database, you might want to consider a graph database. Your queries will be about pathing, and you are limiting queries by their complexity. It's back to what I was saying at the outset: maybe you're not doing some things that you should be doing, like discovering relationships, because you don't have the capabilities. So let's look at some use cases for graph.
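As a rough illustration of the PageRank walkthrough from earlier in the talk, here is a minimal Python sketch. The link structure (A links to B and C, B links to C, C links to A, D links to C) is an assumption chosen to fit the description of the slide, not a copy of it. The 0.15 default follows the "15 points" everybody gets; the damping factor of 0.85 and the function name `pagerank` are likewise assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Iteratively score pages: each page gets a default (1 - damping)
    plus a damped share of the score of every page that links to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    ranks = {page: 1.0 for page in pages}  # everybody starts at 1.0
    for _ in range(iterations):
        new_ranks = {page: 1.0 - damping for page in pages}  # the default 0.15
        for source, targets in links.items():
            if not targets:
                continue  # a page with no outbound links passes nothing on
            share = damping * ranks[source] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical four-page web, assumed for illustration only.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS, iterations=100).items()):
        print(page, round(score, 3))
```

With this assumed link structure, the scores settle with page C on top and page D stuck at the default, which matches the shape of the result described in the talk. A graph database would run this kind of iteration natively instead of looping in application code.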
There are so many use cases, and I'm going to bring in just one from each major industry. Hopefully I hit your industry in one way, shape or form here today and you can find your way into the graph database world. Healthcare. Fraud is huge in healthcare. So what they do is monitor drugs and treatments and patients, doctors, pharmacies, medications and so on and find out what the outliers are. What edges are super thick compared to all the other edges? What edges would be considered excessive? So is a doctor over-prescribing a certain medication? Is a patient seemingly being over-prescribed a certain medication? Is a facility over-prescribing a certain medication? And these are things that the network, be it the insurance company or the hospital or whomever, can intervene on once they know. And you just can't really get there from here when it comes to a relational database. I'd just say it's really hard. And I've done this particular use case both with and without graph databases over the years. So I can attest that it's a lot easier and a lot more informative to do it with a graph database. So yeah, some of us can relate to some of this. It's a very relatable thing in terms of healthcare fraud with prescribers, medications, doctors and so on. Online shopping. A lot of us are doing online shopping, or at least that's one aspect of our business. So what online shopping needs to do is act fast, always. You're online. You need to be pitched correctly and with the right product. The company needs to recall past similar interactions. And they need to look at the breadth and the girth of the relationships and have probabilistic models based upon that, which draw from the product catalog and the shopper attributes.
And so what other things might we need to know from a shopper in order to understand where to drive them in our product catalog and where to drive them into our promotions, etc. So when a shopper searches for, let's say, red purses, for example, the app needs to know what details to ask about next, such as type, style, brand, budget or size. And these are nodes in a property graph. That's it. And it accumulates this information by traversing through the graph. The application is continuously checking inventory for the best match. That's also another factor that comes into what you're going to pitch. What do you have available and what's going to get there in the timeframe that this customer will expect. Great stuff. Now here's the major insurer. They needed insight into their risk environment. Now there's a lot of risk in insurance, as you might imagine. It's all about risk, right? So some of the big risks that they're looking at are people appearing in multiple policies and claims, submitting multiple claims for the same thing or having multiple policies to submit on for some of the same things. So people with those really thick relationships between their claim and their policy, that's an outlier that you can look into more. Premium leakage, i.e. underestimated mileage, undeclared drivers, false garaging. That's where you say the car is parked in that good neighborhood over there, but really is parked out front here in a not so good neighborhood, trying to get a better rate. Okay. So there's a lot of factors that you can bring to play here in a graph database to understand what's normal. What's some of the normal attributes for a car in this area, for an insurance policy in this area? And padded claims, what's normal for a claim of this nature? And that's evolving, believe it or not. And so padded claims are clearly a huge risk to an insurance company. 
They can save millions by taking one percentage point off of any of these major risks, and they're doing so largely with graph databases and third-party data. By the way, you might be thinking, well, how do they get a lot of that data? They only know so much. That's true. But there's a lot of third-party data that they're bringing to bear and getting into their graph databases. They have a policyholder graph with all the risk indicators on there. And the risk indicators are spread around in the graph. Another major risk is workers' compensation fraud, along the same lines as padded claims. What's normal for a risk of this nature? And let's bring not one attribute to bear on it, but let's bring 100 attributes to bear on this claim to see what it should be. Now we have television, magazine and media, so a modern big media company. They obviously need to pitch the right stuff. They need to schedule the right stuff that's going to draw people in. They increasingly have the opportunity to be much more personalized in their approaches, and that's only going to drive much more of this personalization through graph databases. They analyze the content and the consumption for personalization. Keep in mind that online, at least, most users don't log in. If they do, they log in with a cookie, but cookies have been proven to be quite unstable, not dependable. So what they will do is take a cookie out to a third party to enrich it with other information like name, contact information, etc. And through this whole thing, they need to continually vet the third-party data. They can't just take it and go with it. And graph databases are great for that. They're great for determining the valuable connected providers and audience segments, that is, which providers and which audience segments are driving the most eyeballs, the most viewership, the most revenue. And they enable evaluation of the accuracy of vendor data.
So vendors will provide data on their supposed viewing. I had a great client, Nielsen Media Research. So I got right in the center of a lot of this kind of activity, because they're still on top in terms of informing providers what the viewership was, right? So yeah, these companies want to cut the cost of using unreliable data. So they're double-checking on Nielsen. They're double-checking on all the data that they get in that kind of way. So this particular company began leveraging a community detection algorithm called Weakly Connected Components, or WCC. And they used that to find subgraphs within their multi-billion node dataset that can be attributed to distinct profiles. And then they used the more accurate profiles to create audience segments, which is the holy grail of any media property's advertising business. So a weakly connected component is a subgraph of the original graph where all vertices are connected to each other by some path, ignoring the direction of the edges. Okay, I hope that made a little bit of sense. That's another of the 100 algorithms that I didn't show you within a graph database that is particularly relevant to this industry. Now let's look at cybersecurity. Now this is probably my only example of a vendor company and how they use graph databases to do what they do for their customers. Of course, Data.World does this with their data catalog. And I think that is great. Hopefully you can see the power of a graph database. It's not going to be easy for some to change over from a relational database foundation to a graph database foundation. So I think Data.World is absolutely onto something there, and I look forward to seeing more about them and learning more. But anyway, in the cybersecurity world, the services they provide are threat intelligence, endpoint protection, disaster recovery, etc. And they use third parties and, heck, these companies are third parties. The cybersecurity company in this case is a third party, and their IP is really understanding the web.
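To make the weakly connected components idea mentioned above a bit more concrete, here is a minimal Python sketch. It is an assumption-laden toy and not the media company's actual pipeline: the identifiers (`cookie:1`, `device:9`, and so on) and the function name `weakly_connected_components` are hypothetical. It treats every edge as undirected and groups together all identifiers reachable from each other, so each group could stand in for one profile.

```python
from collections import defaultdict, deque


def weakly_connected_components(edges):
    """Group vertices that are connected by some path,
    ignoring the direction of each edge."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)  # drop direction: link both ways
    neighbors = dict(neighbors)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        component = set()
        queue = deque([start])
        while queue:  # breadth-first walk over one component
            vertex = queue.popleft()
            component.add(vertex)
            for other in neighbors[vertex]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


# Hypothetical identifier links, assumed for illustration only.
EDGES = [
    ("cookie:1", "device:9"),
    ("email:a", "device:9"),
    ("cookie:2", "device:7"),
]

if __name__ == "__main__":
    for component in weakly_connected_components(EDGES):
        print(sorted(component))
```

Here the first three identifiers end up in one component and the last two in another, which is the sense in which a component can be attributed to a distinct profile.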
Cybersecurity companies don't do it alone. They all kind of partner and share information and so on. So cybersecurity companies need to understand the web. They need to understand what websites are starting to alert because of cyber threats, what people are behind those websites, what people are connected to the people behind those websites, where the troublesome individuals are, where the troublesome websites are, who works for whom, and upon what devices and what IP groupings some of the bad activity is happening, and they need to protect their clients from that activity. And it's a judgment call. You don't want to over-protect. You don't want to under-protect. So they need information, and graph databases are how they get the information that they use to do their job. Automotive. Wow. We're all doing predictive maintenance out there if we have any kind of manufacturing, and automotive is no different. They need to identify which robotic parts are about to fail so they can replace the failing parts all at once. So what parts are connected? It's a virtual model of a physical car. And you can do this with anything mechanical. So the radiator, for example, in your car dissipates heat that builds up in the cooling system, right? As coolant runs through the radiator, the walls of the internal passageways start to develop a thick layer of residue, and debris running through the cooling system may also cause a blockage. When this happens, the radiator's cooling ability drastically decreases. And when one component fails to work properly, other parts throughout the cooling system also run that risk of failure. So the parts that commonly cease working after the radiator goes bad are the thermostat, the water pump, and the heater core. And that's the extent of my automotive knowledge. But anyway, there's probably a hundred other connections like that under the hood of my car and your car.
And just imagine taking this up to an airplane level or a ship level or something like that. So what I just said about the radiator is all kind of known, right? But with the graph, you can identify more subtle connections. And you can identify similar parts as well throughout the car that may or may not be connected, but they're similar. Maybe they need to be replaced when you're replacing something that's related. One more for you, actually two more. Pharmaceutical research. This is a lot of fun because we're talking major big data sets here. We're talking DNA now. And DNA contains all the genetic information for an organism to develop, function, and reproduce. It encodes this information as a specific sequence of nucleotide bases. So scientists have found the first genetic instructions hardwired into human DNA that are linked to being left-handed. And everything else that we are is eventually coming out in the wash in terms of all the DNA research that is going on. But there's so many cells inside DNA. I've worked in pharmaceutical research companies where scientists will completely focus on a very small subset of the overall DNA. And that will be their career. Okay, but you need to put those pieces together across the organization. So sharing that knowledge, having a common place, a graph database where all the information can go and be shared inside the organization, really facilitates research and accelerates progress. So graph allowed bioinformaticians to more easily identify useful signals within large sets of noisy data and to answer highly specific questions. And this is going to really help us all when it comes to all these diseases that are based in what this can solve, like cystic fibrosis and so many others. And finally, I'll give you a financial services example, anti-money laundering. That's huge.
And it's been, I won't say it's been stamped out, but it's been suppressed a bit anyway due to graph databases. But those graph databases will have to stay in play because this activity will absolutely continue. So we're talking about looking at companies with unknown owners, trading partners that exist in high-risk geographies, customers that are associated with a high-risk entity, creditors in a high-risk geography, etc. How do you know those things? You know those things through connections. You know those things through using some of the algorithms that I showed you before. So anti-money laundering is all about identifying connections. And looking at things like: are two activities targeting the same bank account happening at the same time across the world? Things like that too, of course. But I want to summarize here and get Trent back on and see if you have any questions about all of this. My summary is that graph is a fast-growing data category. It's all about the use case, and these are some of the ones that are good for graph. There's one out there in your industry. I didn't even get to things like real-time fraud detection, care path recommendations, personalized offers, or energy infrastructure optimization, another huge area for graph databases. Think about all the connected components in our energy grids. So reimagine your data as a graph. The whiteboard model is the physical model. You can go to a whiteboard, do that category graph, and then you can drill in and fill it out with all of the vertices and edges of your enterprise. And remember PageRank if you forget all the other algorithms. Remember PageRank, because that will help you remember that graph databases are good for prioritizing the components of your business, the vertices of your business. And back to you, Shannon, for any Q&A. William, thank you so much for another great presentation.
I'll try and fit in as many questions as I can here in the next few minutes, and there are great questions coming in. How are you differentiating knowledge graph and graph database? Yeah, that's the big question, right? So I tried to do that on the slide where I was talking about semantic graphs and RDF graphs. I am distinguishing that by saying that semantic and RDF graphs are knowledge graphs. Because that's how a lot of people out there, I would say probably most people, are referring to knowledge graph. That's what they mean. They don't mean a property graph, they mean a semantic graph. However, keep your head on a swivel on that one, because it's quite possible that somebody could refer to a property graph as a knowledge graph because it generates knowledge. And that's okay too. I'm just saying where I come from. No, I think that covers it. Awesome. So how do you deal with circular dependencies, loops? Are they okay or not? Do they have performance penalties? My quick take on that is that I think we even saw some circular references within the examples that I gave, maybe the PageRank one. And that's perfectly fine. Graph databases are a great way to manage that and understand that kind of reference. So it's perfectly normal. As for performance hits, I mean, it is what it is, right? The data connections are defined by your business. And if that's how your business is, that's what you need. And then the only question really becomes: given what I need to do to that data, which represents the business, what is the best way to go about doing that? And I think if you ask that, you find out that it's graph databases. All right, and Trent, I'll let you jump in whenever you want to. Assuming a graph database is not good for OLTP transactions, would it be wise to say a relational database is a must for OLTP transactions, and then another graph database built on top of that gets refreshed daily to do things like fraud analysis, etc.? I mean, yeah, I think that's a good way to think about it.
Although I will caveat that for OLTP, you should also think about NoSQL databases that are non-graph, because they make a lot of sense for a lot of OLTP. So I'm talking about document stores, key-value stores, wide-column stores. It just depends on your workload. I've actually probably given a presentation or two in this series on that. So you can use those for your OLTP. But the reality is it's mostly relational databases today. Things are moving in that direction, but it's mostly relational databases. So yeah, the graph database would be cultivated from your OLTP relational database. I don't know of any instance where a graph database is the only place in the enterprise where a piece of data is kept, although I could be wrong about that. And by the way, quickly, graph databases do perform somewhat of a real-time function. So some would call that OLTP as well. So I would say they kind of sit in the gray area between transactions and analytics, if I were placing them somewhere in there. I'm going to see if I can slip in one more question in this one minute we've got left. What is an RDF-based graph versus a property graph? And I believe this came in, Trent, during your presentation. Yeah, RDF is the Resource Description Framework. And that is the standard that the W3C has defined for a triple store, for the way of modeling data in terms of triples. That's really what RDF is. And then there are different serializations, as they call them, different ways that can be constructed syntactically, but it really is just that idea of these triples that you've been seeing.
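For a feel of what "modeling data in terms of triples" means, here is a minimal Python sketch. It is a plain in-memory toy and not a real RDF library or any vendor's API: the `ex:` names, the `Triple` type, and the `match` helper are all hypothetical, made up for illustration. Each fact is one subject, predicate, object statement, and a query is a pattern with some positions left open.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical facts, assumed for illustration only.
TRIPLES = [
    Triple("ex:alice", "ex:follows", "ex:bob"),
    Triple("ex:bob", "ex:follows", "ex:carol"),
    Triple("ex:alice", "ex:name", "Alice"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None means 'anything'."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # Who does ex:alice follow?
    for t in match(TRIPLES, subject="ex:alice", predicate="ex:follows"):
        print(t.obj)
```

Note that even a simple attribute such as a name is just another triple here, rather than a property stored on the vertex.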
Yeah, and I'll just add on that, for reasons which we don't have time to go into, in RDF you cannot have attributes on the relationships or on the vertices themselves. And so that's why we did what may look odd at first on that one particular slide, where the relationship itself had relationships out to other vertices. That's how you have to model in the RDF world. So William and Trent, thank you so much for these great presentations, but I'm afraid that is all the time that we have slated for this webinar. Just a reminder to all attendees, I will send a follow-up email by end of day Monday to everybody with links to the slides and links to the recording from today's session. And thanks to Data.World for sponsoring today's webinar and helping make these happen. Trent mentioned the podcast. It's a great podcast, check it out. I hope you all do. It's very entertaining and very educational. So thanks everybody, I hope you all have a great day. Thanks guys.