And here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DATAVERSITY. We'd like to thank you for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight, sponsored today by Cambridge Semantics and TigerGraph. William will be discussing graph databases on the edge today. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag ADVAnalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle of your screen for that feature. And if you'd like to continue the conversation after the webinar, you can follow William and each other at community.dataversity.net. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now, let me turn it over to Steve for a brief word from our sponsor, Cambridge Semantics. Steve, hello and welcome.

Well, hello, Shannon. Thank you very much for inviting me. Let me just get my slides up and running here for you. Thanks very much to William and to Shannon for giving me a couple of minutes here. I work for Cambridge Semantics. We make a graph database that's an RDF triple store. In the graph database world, there are a couple of different types of graph databases: there are property graphs and there are RDF triple stores. We make an RDF triple store that also supports property graphs, and it supports property graphs under the new proposed standard from the W3C. So really, that's my commercial for today.
But one of the use cases for using a graph, an RDF triple store with property graphs, is that you can create a knowledge graph. And knowledge graphs are a really great thing. Not all graph databases are really good at creating knowledge graphs, so let's talk a little bit about this use case. A knowledge graph is really put together when you want to integrate data from multiple data sets, both structured and unstructured, and leverage ontologies and standards in order to put that data together. That data may come from text or other unstructured sources; it may come from a standard Oracle database or an object store like an S3 bucket; it may come from Elasticsearch or NLP processing. You want to take all that data, put it together, and build a knowledge graph with a combined understanding of all of it. We've been trying to do this for a long time. We've done it with data lakes; we've done it with data warehouses. But this is a new way to create a knowledge graph and integrate these data sources together, and I think it's a superior way. When you're building a knowledge graph, there's some functionality that you'll need. You'll need to leverage diverse data sets: you want to take the output of an NLP pipeline, take structured and unstructured data, and use it as part of your knowledge graph. Data is never perfect; there are nulls, and there is data that doesn't necessarily conform to standards, so you want to make sure you can leverage those standards and that context when putting the data together. Of course, you want to perform analytics on it, and not only graph analytics but deep, BI-style analytics: averages, aggregate functions, maybe even some machine learning and data science.
So you want to be able to do a wide variety of analytics, and of course you want to be able to scale. Now, in my use case here, I have a very simple graph where we have a customer records data set: Sue and John are in our customer records, and Sue is married to John. For my knowledge graph, I'm going to pull in DMV data, so I know that John drives a Ford; that's information from that data source. And then I may also want to pull in credit bureau data, so here I see that John has a credit score of 710. From my knowledge graph, I might want to ask: give me all the information about John. Tell me all the people who have a credit score of 710. Tell me all the people who drive Fords. I want to be able to spin around and look at this data from different angles. Now, if I were to do this in a standard relational database, there are a lot of steps I would need to go through. I have to design and implement a schema, understand which data sources I want to integrate, build business glossaries or functions around that, plan for data lineage, and make sure that when I do these joins and these analytics, I'm not over-taxing the system; my metadata management has to be structured that way. Then I can import the data, and I have to think about workloads. I have to think about how I handle the relationship data that's in the data. John is married to Sue: do I put that in a separate table? Do I throw it away? I want to be able to handle all of that. Then I do the analytics, usually the analytics that I thought about in step one. If I try to do analytics that are new, that I hadn't thought about in the beginning, it may take a really long time; I have to go through an extra step or some tuning of the database. Then if I want to do something like PageRank or shortest path, again, that's going to be a lot of machinations, a lot of tough stuff.
If someone then comes along and says, hey, let's add an additional data source to this, I might even have to go back and refactor or create an operational data store in order to get that analytics done. Now with graph databases, particularly knowledge graphs created with RDF like ours, you can throw away a lot of those steps. We are storing subject-predicate-object; we're storing in a very simple data model. In my case, we can store away the fact that John has a credit score of 710 in subject-predicate-object style: John is the subject, this concept of having a credit score is the predicate, and the score of 710 is the object. With the addition of properties, we can also add things like provenance. A property might say "according to TransUnion," or it might say John has a credit score of 710 on X date. Those are the properties we can add to this as well. We don't really have to worry about designing schema. We don't have to plan as much for joins or data lineage. Data quality comes as part of the process of using ontologies; I'll talk about that in a second. Then in terms of importing data, I can import all the data in this triple format; there are no specific tables needed, and the processes are really simplified. When I come to the analytics part and I start asking questions, I can ask anything, turn it on its head, and ask from any angle. I don't have to worry as much about how the data is optimized, and therefore I don't really have to refactor as much. Then there is this whole concept, with RDF stores in particular, of using ontologies to provide extra context, classification, and equivalencies. I can leverage existing ontologies or build my own, come up with standard nomenclature for things, come up with classifications for things, and then use inferencing as part of the ontologies.
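The subject-predicate-object model and the inferencing idea described here can be sketched in a few lines of plain Python. This is an illustrative toy, not AnzoGraph's API; the predicate names are made up, and only the facts from the example (John's marriage, car, and credit score) are used.

```python
# Illustrative toy triple store: a set of (subject, predicate, object) tuples.
triples = {
    ("John", "marriedTo", "Sue"),
    ("John", "drives", "Ford"),
    ("John", "hasCreditScore", 710),
}

def query(s=None, p=None, o=None):
    """Match triples from any angle by fixing any mix of subject,
    predicate, and object."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Inferencing: marriedTo is symmetric, so derive the reversed triple.
symmetric = {"marriedTo"}
triples |= {(o, p, s) for (s, p, o) in set(triples) if p in symmetric}

print(query(s="John"))                   # everything about John
print(query(p="hasCreditScore", o=710))  # everyone with a 710 score
print(query(p="drives", o="Ford"))       # everyone who drives a Ford
```

The same set of triples answers "tell me about John," "who has a 710 score," and "who drives Fords" from any angle, with no schema designed up front, which is the point of the comparison with the relational approach.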
Some examples of that here: I can call a cat a feline; those terms are equivalent. I can also know that a cat is a mammal and a carnivore, and I can use that in the context of the database when I'm doing analytics. If I want to see all the carnivores in my database, a cat will come up as part of that; that's the classification part of ontologies. I can also do inferencing: if I know that John is married to Sue, I can infer that Sue is also married to John, and that will be relevant for my analytics as well. The good part is that DATAVERSITY has a lot of great information on this. You can do a search on DATAVERSITY on ontologies; great information, great articles on there. There was an article just this week from Thomas Frisendal on data modeling and creating knowledge graphs. Really good timing, Thomas. If you get a chance, take a look at that article; that information is there on DATAVERSITY. Just to wrap up, we provide AnzoGraph DB, which is one of the fastest RDF triple stores on the market. It supports data harmonization through the RDF data model. It supports analytics, both graph algorithms and BI-style analytics. It supports ontology reasoning and context through RDFS and OWL. We can help you with building that process. If you want to try it, you can go to anzograph.com and download it, or feel free to shoot me an email and we can talk about your project as well.

Thank you so much for kicking us off, Steve. If you have questions, go to the Q&A section in the bottom right-hand corner of your screen, as Steve will be joining us in the Q&A at the end of the webinar. Now, let me turn it over to our second sponsor for today to hear Gaurav talk about TigerGraph. Gaurav, thank you so much for joining us.

Excellent. Thanks a lot, everybody. I hope you can see my screen clearly. Yep, looks good. Let's start.
I'm Gaurav Deshpande, Vice President of Marketing. Thank you to DATAVERSITY for this opportunity, and thank you to the hundreds of you who have taken time out of your busy schedules to join us today. First I'm going to talk quickly about the evolution of databases, because one of the most common questions I get when I say we have a graph database and analytics solution is: what is this graph database? How is it different from a relational database like Oracle or DB2? How is it different from a key-value database like MongoDB, for example? This is a simple chart that shows you those differences. The first one, the relational database, was built in the 1970s. Whenever you're trying to understand relationships, essentially what you're doing is this: each data domain, like product, customer, order, supplier, location, is stored in a separate table, and you're doing complex table joins. As the length of a table grows in terms of number of rows, and as you join across multiple of these tables, it becomes computationally very expensive, and that's why analytics has been so slow with relational databases, especially deeper analytics. Key-value databases came into their own over the last decade, and solutions like MongoDB are fantastic because they don't require any hard schema; you can store any type of data you want in there. The problem is that everything is typically stored in one large table with billions of rows, so when you're trying to do analytics, you're essentially scanning the same table multiple times, and that basically means it's slow for deep analytics. When you look at graph databases, all of these business entities, products, customers, payments, orders, suppliers, are pre-connected, and therefore it's much, much faster to do the relationship analysis and get new insights off of that. We use graphs every single day.
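The point about join depth can be sketched with a toy in Python. The data here is random and purely illustrative; the two functions compute the same answer, but the relational-style one rescans the whole 50,000-row table once per level of depth, which is the cost being described, while the graph-style one follows pre-connected neighbors directly.

```python
import random

# Toy data: 1,000 entities, each connected to 50 others.
random.seed(7)
edges = [(a, b) for a in range(1000)
                for b in random.sample(range(1000), 50)]

# Relational flavor: each level of depth is one more self-join,
# i.e. another full scan of the whole edge table.
def reachable_by_joins(start, depth):
    current = {start}
    for _ in range(depth):                 # one "join" per level
        current = {b for (a, b) in edges if a in current}
    return current

# Graph flavor: neighbors are pre-connected, so a level is a direct lookup.
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)

def reachable_by_edges(start, depth):
    current = {start}
    for _ in range(depth):
        current = {n for c in current for n in adjacency[c]}
    return current

# Same answer either way; the work per level differs enormously.
print(reachable_by_joins(0, 2) == reachable_by_edges(0, 2))
```

Each extra level of relationship multiplies the join cost in the relational version, while the adjacency version's cost is proportional only to the edges it actually touches.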
Every time you're using Facebook, LinkedIn, or Twitter, you're using a graph database in the back end. When you search for someone on LinkedIn and you see that you're connected with them at a second or third level of connection, that second- or third-level connection information is coming from a graph database. Every time you run a search on Google, it's using PageRank, which is a classic graph algorithm. Every time you're shopping on Amazon or Wish.com, the product recommendations that come up, saying hey, you might be interested in these products, come directly off of a graph database. Now, Wish.com is a TigerGraph client, and with that, my next chart is that not only do you use a graph database every single day, you use TigerGraph every single day. A lot of you, over 300 million customers, shop at Wish.com; it's a very popular website for casual items around the $50 price point, and if you're shopping at Wish.com, the recommendation engine powering that site is TigerGraph. If you go to your local restaurants, two out of three use Intuit's QuickBooks Payments, which is a payment gateway, and the fraud detection for that is with TigerGraph, so every day millions of customers are benefiting from fraud detection with TigerGraph. If you're using Zillow, the popular website, to look for a home, the recommendations for the homes that come back to you, the homes you might like based on your prior browsing and search history, that's TigerGraph. When you get emails from Zillow that say here are homes you might also like, tailored to your particular browsing and searching, that's TigerGraph. And if you watch HBO at home, if you watch Game of Thrones, Westworld, Chernobyl, any of those, we are also working in the back end to do entity resolution and recommendation for you.
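As a toy sketch of the graph-style recommendations mentioned above ("customers who bought what you bought also bought..."), with made-up shoppers and products rather than any real product data:

```python
# Illustrative purchase graph: customer -> set of products bought.
bought = {
    "alice": {"lamp", "rug"},
    "bob":   {"lamp", "chair"},
    "carol": {"rug", "chair", "desk"},
}

def recommend(user):
    """Two hops on the graph: user -> their products -> other buyers
    of those products -> products those buyers own that the user doesn't.
    Rank by how many co-buyers own each candidate."""
    mine = bought[user]
    counts = {}
    for other, items in bought.items():
        if other != user and items & mine:       # shares a product with us
            for item in items - mine:
                counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

print(recommend("alice"))  # chair is owned by both co-buyers, so it ranks first
```

A production engine adds weights, recency, and scale, but the core is exactly this kind of short multi-hop walk over pre-connected entities.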
So with that, let me take an example from the financial services industry: fraud detection, one of the most popular graph use cases. Take any payment provider, Venmo, PayPal, Square, it doesn't matter. You typically have a user; user one creates account one. They typically do two-factor authentication, so they use phone number one as well as an email to set up their account, and that account is linked to an American Express card. So far everything is okay. That particular user initiates a payment, payment one. Payment one is initiated using a device, an Apple iPhone 6, and it's associated with the phone number and an account. The payment goes to account two, which is linked to Bank of Montreal. When you look at this information, there's nothing alarming about it. And when you look at the user for the account receiving the money from user one, there's still nothing concerning. So regular analytics will go three hops: looking at the payment going from account one to account two and then on to the user for that account. Since user one and user two are both brand new, there is no prior history for either of these users, phone numbers, or emails; nothing is flagged. So in this particular scenario, regular analytics says everything is good, no fraud here.
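This scenario, and the deeper six-hop check the presentation turns to next, can be sketched as a hop-limited graph search. The node names and edges here are made up to mirror the slide; "FRAUD" marks a known-fraudulent prior payment six hops away from the new one.

```python
from collections import deque

# Made-up edges mirroring the slide: the trail from the new payment to a
# prior fraudulent payment is six hops long.
edges = {
    "payment1": ["account1", "account2"],
    "account2": ["user2"],
    "user2": ["phone2"],
    "phone2": ["device1"],
    "device1": ["payment_old"],
    "payment_old": ["FRAUD"],
}

def reachable_within(start, target, max_hops):
    """Breadth-first search that stops expanding past max_hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return True
        if depth < max_hops:
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return False

print(reachable_within("payment1", "FRAUD", 3))  # shallow check: False
print(reachable_within("payment1", "FRAUD", 6))  # deep check: True
```

The three-hop check stops at the phone number and sees nothing suspicious; only the deeper traversal reaches the device's prior payment history.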
Now, when you drill down deeper, when you go from user two to the phone number that was used for two-factor authentication, to the device which was used with that particular phone number, and you go back and look at the history of the device in a graph database like TigerGraph, you find that this particular device was used for a fraudulent transaction. So starting with the user, going to the phone number, going to the device, and then looking at the prior payment history with that device, you find in six hops, with deep analytics with TigerGraph, that you've actually found the culprit, that this particular transaction is likely to be fraudulent, and it's stopped in real time. That's what we do for customers among the largest banks in the world; in fact, four out of the five largest banks in the world use TigerGraph for fraud detection. That's one of the most popular use cases, along with recommendation engines and others. Now, with respect to performance, we are many times faster than other graph databases; you can find the benchmark right here, and I'm not going to dwell too much on it. The fact that the performance is faster basically means lower cost of ownership for you, that's the first part, since you need a lot less hardware; and the second part is that you can do things in real time on large data sets that you can't do with other graph databases. Last but not least, how do you get started with TigerGraph? We have a public cloud offering, and the best part is you can go to tigergraph.com/cloud and register for free. We have a free lifetime tier for non-commercial usage, so if you're looking to just learn graph databases and graph analytics, you can literally start in minutes. In five minutes you can register at tigergraph.com/cloud and select a starter kit for a use case like fraud detection,
recommendation engine, customer 360, or enterprise knowledge graph, and you can start to explore the data with our data science workbench, GraphStudio. You can build your proof of concept literally in hours. Along with the free tier that is free for life, you can start to build your POC for free right now at tigergraph.com. And actually, that number of 12 is old; we have over 15 starter kits now, with a pre-built schema, a pre-built set of best-practice queries, and a sample data set. All of that is included, and you can literally instantiate all of it in five minutes at tigergraph.com. Thank you so much.

And if you have any questions, Gaurav will be joining us as well in the Q&A portion of our presentation at the end of the webinar today. Thanks to both sponsors for helping make these webinars possible. I just want to introduce our speaker for the series, William McKnight. William is the president of McKnight Consulting Group, which focuses on delivering business value and solving business problems utilizing proven, streamlined approaches in information management. And with that, I will give the floor to William to get today's webinar started. William, hello and welcome.

Hello, and thank you. I trust you can see my screen; if not, please let me know. All right. I'm just really excited to be here today sharing with you this very important topic of graph databases. I have been able to take some clients from absolute zero into the graph world, and they are over the moon about some of the things they're able to do with graph, even things we didn't envision when we got into the graph database initially. My passion, and maybe you've picked up on this over the past year of these events, my passion is getting clients into the right architecture, the right platform, and the right tools to succeed with data, and that passion has led me very distinctly into the graph database world. So I'm really
excited about what graphs can do. I think the idea of everything being in rows and columns is going to be, in short order, a thing of the past. As SQL becomes less important and the technology evolves, the technology behind data is going to keep evolving, and graph is going to be highly relevant in that future. I think of it like a Venn diagram: there are things that relational databases can do, and there are things graph databases can do really well. Now, let me say, you can get there from here if you're just in a relational world. You can do almost, well, I'll say you can do all the things I'm going to be talking about here today, and all the things you heard from Gaurav and from Steve. However, you're doing it the hard way, and doing it the hard way means you may not even get to any kind of great end result. Corporations aren't great at doing things that are hard; they want to do things the easy way, the easiest way, and I hope to show you some of the things you can do the easy way with graph databases today. And speaking of that Venn diagram, if anything, the graph side of it is now pushing pretty hard into the relational side. So let's see if you agree as I go on here. I'm basically double-clicking into some of the things that Steve and Gaurav talked about today, and I think you've come to the right place if you want a total hour's immersion in graphs, based upon what I heard from them. So here are some examples; there are many. I just want you to know that this is proven technology. It's out there; it has been running in production for many years in many different types of industries. But it is centered around a set of use cases, and Gaurav kind of got into this; I think these clients represent some of those use cases. Take gaming: a game company is going to know, in its games, who's connected to who, which enables them to do the right thing by their customer to reduce churn. Crunchbase is a
purveyor of business connections and business information. Obviously that environment has gotten nothing but more complicated, and they use graph databases to keep track of all of it. Then we have various retailers like Walmart and eBay. Any time you see "items you might like" because of your pattern of doing this, that, or the other thing, that's probably coming from a graph database, where those connections are made much more readily. eBay, I don't know if you know this or not, but every screen they render is unique, and they keep track of its performance, and all of that goes into the mix when they decide the next screen to produce for you and everyone. So that's a lot of information. Telenor is a telecom, and telecoms are actually very big on graph databases. They're able to know, obviously, who you're calling, who you're texting, and that sort of thing. They know the impact of doing various things to customers in terms of how that's going to ripple through the network. They also know, based upon patterns, where to shore up their network, because a network plays right into the graph database concept. So some of these are recommendation engines and fraud detection; Gaurav talked a lot about this, and by the way, a lot of fraud detection today, it's not individuals working in isolation anymore, it's groups of criminals working in real time trying to fool the system, so graph databases need to be fast to get right on that right away, and we've seen fraud go down, I think, as a result of graph databases. Another one I'll mention is route optimization; that's a big one, along with social network analysis, and genome and other forms of scientific research. As a matter of fact, I think the genesis of graph databases came from the scientific community, and Steve kind of touched on this when he talked about shared ontologies: graphs were built to share ontologies in a format that lends itself to those ontologies, and now we have so many more applications of it.
So don't take my word for it; here are some quotes from Gartner recently about the expected growth in graph databases. I think they're going to grow something like this; I think they're going to grow in your shop, in every shop, that is. And I think what's going to make them grow is education, as people get more exposed to graph, what it's all about, and the algorithms behind it, and they learn that, number one, it's more than the social graph. I've talked about the LinkedIn graph; I might do that two or three more times in the next half hour. Yeah, that's important, but that's only the tip of the iceberg in terms of what graph can do. Also, some people think about graph as that great visualization. It's a great way to see data; yes it is, and that's part of the value proposition, but there's so much more. The thing I like to share is the algorithms, because they help you see the relative importance of the actors in your network. Those might be people, products, sites, parts; all kinds of things can go into a graph database. And by the way, not every graph database is filled with homogeneous nodes, in other words, nodes that all look alike. Okay, so let me back up: graphs are based on vertices and edges, so let's talk about that. I want to get the terminology down for you as we go into this a little bit more. So what can be vertices? Let's start with that. These are basically the nouns of your organization, the major nouns of your organization. Now, you don't just want to put every noun you can find in your company into a graph database as a vertex, but you do want to put in the ones where a change in one vertex has an impact on adjacent and other vertices. We're trying to determine impact, relative importance, and things like that. So this ends up being a lot of the people, your customers, employees, and so on; a lot of B2C customers, that is, people in companies, or the companies themselves; different
places, obviously, since mapping types of applications lend themselves very well to graph databases; and the various things of your business: bank accounts, all kinds of contracts, your products, and so on. These so-called vertices are connected by edges, which are the relationships. Edges are, for example, a passenger that takes a bunch of airline flights, or a piece of information that's passed from one person to another in a social network, or a computer user or piece of software visiting a sequence of web pages by following links. One of the examples that really demonstrates this to me: I had the opportunity a few years ago to be part of a scientific company doing graph databases, and they had this worldwide reach, and they work on DNA. Okay, way over my head, but the point is, there are various sections of the DNA, and they wanted to understand "who's working on a section of the DNA similar to the one I am, so we can share information." That was captured in a graph database and shared, so that the scientists working on various areas of the genome could share their research and make it go up exponentially as a result. Speaking of employees doing things, another way to represent some graph capabilities is with the employee graph, because a lot of these systems that manage employees, you know, manage benefits and so on, also keep track of all the connecting points within the employee group. You go to meetings with certain people, you call certain people, you text certain people, you're on emails with certain people, so that's a way to understand what the various clusters of people are within an organization. Sometimes graph databases and graph database possibilities are built into the products we're now buying. Okay, here's another example; this is an example of the speed of graph databases. There's the functionality, but there's also the speed, and this goes to show you that
every node doesn't have to be heterogeneous; in this example they're homogeneous, and you may still enjoy the performance of the graph database. So this represents that. In this experiment we've got a thousand people with an average of 50 friends per person. That sounds like a lot to me, but maybe I just need to get out more; let's just say they have 50 friends per person, and there's an algorithm for "path exists," limited to a depth of 4. So what we're getting into here is four levels of friends: we want to report, with up to four levels of friends, for each person. In a relational database, as you can see, that takes up to, say, 2,000 milliseconds; in a graph database, 2 milliseconds. And here's the kicker: even if the number of persons were raised, I don't know, a thousandfold, the performance doesn't change; it's still a couple of milliseconds to do this. That's one of the ways that I think it scales, because the performance scales no matter how many nodes you get in the network. So you might say, well, okay, that's a little bit more of the social graph example; I know about the social graph, I've talked about the social graph, the LinkedIn graph, and so on. But what other things can it do? So here's an example with heterogeneous nodes. This one is healthcare fraud. We're monitoring drugs and treatments; we've got prescribers, we've got consumers, we've got some different-looking vertices in this graph, and some different edges as a result, because patients are connected to doctors and pharmacies and so on. We're trying to find the excessive relationships here; we're trying to find doctors who are over-prescribing certain drugs, and as you can imagine, that's pretty important. So graph databases are used to look at relationships, look at which relationships are well beyond the norm, and those are the relationships the company will want to drill into to get a handle on the situation. So there are some examples; I'll have more, but
relational databases can't handle data relationships very well. Before graph databases came along, and I've been in this business a long time, I tried to do some of these things; clients demanded it. A lot of it comes down to self-referencing tables, which get pretty complicated and which do not perform very well. I don't have an example of it here, but the SQL behind that can get pretty long and complicated, whereas the corresponding data access layer for a graph database would be far smaller than that. So relational database options are not suited to model or store data as a network of relationships; the performance degrades with the number and levels of relationships, and that makes them harder to use for real-time applications. To build this out, here are some of the things I wanted to say about it: they're also not flexible for adding or changing relationships in real time. So again, I think the thing that's going to drive graph databases out there is education, because this resonates with a lot of people at a technical level: "Oh yeah, I've been trying to do these things, but I've been trying to do them with relational databases, which is hard and slow. Maybe there's a better way." And I actually think there is a better way, with graph databases, that you can jump onto pretty quickly. You might want to try some of the free versions from our sponsors here, Cambridge Semantics and TigerGraph; I'm so proud to have two of the leading graph databases as sponsors for this. What we're looking to do is find fit-for-purpose platforms, fit-for-purpose platforms for the job. Now, nobody here is saying use graph databases for everything. There's a place for the data warehouse, there's a place for cloud storage, there's a place for other forms of relational databases in the enterprise. But when it comes to connected data, and I'm going to give you some tips here to help you know where to find the workloads that make sense for graph
databases, when it's connected data, you can't beat the graph database. Now, before I jump into the algorithms, I just want to show you this as an example of graph visualization, because a lot of you may have seen stuff like this, and now you're going to know it comes from a graph database. I really enjoy the visualization; my analysts enjoy the visualization aspect of graph. You can drill into these vertices and learn more, you can pinch, you can focus in on certain areas. Where this is really good is that you can pull out and see the big picture, and then say, well, I want to see more in this area; draw a square around it and you're drilling in, and it's a beautiful thing. Now let me talk about the graph algorithms, because to me this is what sets graphs apart. Not the visualization; that's great. Not necessarily the homogeneous-nodes, LinkedIn kind of thing, the social graph; that's great too. But these algorithms can help you determine, again, the important nodes in your network, the important components of your business, for a given situation or just in general. The first one I want to talk about is PageRank, and I think Steve alluded to this. Let me first say it's an art form to describe PageRank, and I hope I do it justice here; it's easy to get your tongue twisted, but I've done it a few times, so let me see. I will start by saying that this is one of the things that really made Google a success. Larry Page is one of the founders of Google, recently stepping down, and this was his brainchild back in the day. It's also a coincidence, I guess, that the pages are actually web pages, so for those two reasons we are now stuck with the name PageRank, but it makes sense. This, again, is what set Google apart: when determining what websites to show you for a search, they set about doing it in an automated way, this way, as opposed to Yahoo, which did it the old manual way, and it was a
very inconsistent way so what Google decided is hey you got a website out there and you got a bunch of websites pointing to your website well your website must be important but it goes well beyond that because it it depends how important those sites are that point to your site to determine the importance I could have a hundred websites pointing to my webpage but if they're not important neither is mine some of you may remember the back in the day when if you ever had a website and you were ever doing any kind of SEO on it right some people said well hey let's just create a bunch of these dummy websites and point to my website and it'll look important to Google and that's the goal here well Google smoked that out with stuff like this and that is no longer going to work for you now quickly page rank each of these pages is given some level of importance they're also given a and I'm going to throw this in there the damping factor and so there is some merit to the fact that somebody might just type in my website McKnightCG.com and come on to my site directly without clicking through on somewhere else and so the magic number for that is was 0.15 it was sort of a magic number for Google back in the day so every website is going to get that now putting that aside and if you didn't follow that that's okay you'll still get the gist of this we're trying to determine which of these web pages is most important so if I'm doing a search Google is trying to determine which websites to show me alright so page A points to page B page B points to page C so in other words page A is well page A points to page C so page A is giving half of its importance to page B and the other half to page C whereas page B and D are giving all of their importance to page C isn't that nice so let's go through an iteration of this and there you see the damping factor is included which is 0.15 so everybody has 85 cents to give out let's think of it that way 85 cents to give out well page C decided to give 
it all to page A. Page A decided to give half of that, what is that, 42.5 cents, to page C, and the same amount to page B, and so on. So if we get into the results of this after one iteration, we see that page A has its damping factor plus the 85 cents from page C, so its total is 1. Okay, kind of boring. Page C has its damping factor, but it also has all of the 85 cents from page D, all of it from page B, and half of it from page A. Wow, nice sum there; it looks pretty important. Well, you go on and on, you iterate, and hopefully the light bulbs are going off on this. After a certain number of iterations it gets to a point where it's not changing anymore; we call that convergence. And after 20 iterations we get to convergence, and we find that the most important web page, probably no surprise if you were paying attention, is page C, with a number of 1.58. Page A is shortly behind, because page C increases page A's importance: page C, being so important and giving all of that to page A, makes page A important. So if you can get some of the bigger websites pointing to your website, good for you; that's going to be nothing but good. But that's PageRank, and that's websites. We're not all dealing with that, but think about the other vertices of your business, how PageRank might apply there, and how it might help determine which of those are most important.

Now let's get into some of the other main algorithms. Betweenness is a centrality measure. Centrality is a big word in graph databases, yes, there is a little terminology, and centrality means that that node, that vertex, is central to the network: a lot of other paths come through that vertex, so it's central, it has high centrality. Betweenness is a centrality measure of a vertex within a graph. Betweenness quantifies the number of times a node acts as a bridge along the shortest path, and I'll come back to that, the shortest path between two other nodes. So it measures the importance of a vertex by counting the number of shortest paths that pass through it. So if this were the LinkedIn graph, some of the nodes with high centrality would be, well, let's not say LinkedIn, let's say Twitter, it's more relatable: the Oprah Winfrey node would be very important because of all the connections, and all the important people that connect to her as well. So if you're highly popular in that way, you're going to have high centrality, high betweenness, and so on.

Betweenness, another way to say it, measures the degree to which a participant controls the flow of information, or money, or disease, or whatever it is that you're passing; such nodes act as brokers. The higher the value, the more of the traffic flowing between every pair of nodes in the network moves through that node. An example of this might be the highway map of the United States: cities in the middle of the country, like here in Dallas, or Chicago, or St. Louis, have higher centrality because so many of the shortest paths connecting cities on the east and west coasts have to pass through them. So that's a way to look at centrality, a way to look at importance.

Closeness is about the shortest paths between vertices. Some of you know the game Six Degrees of Kevin Bacon: apparently every other actor in Hollywood is somehow connected to Kevin Bacon by no more than six degrees, either through movies they've been in together, or shared directors, and so on and so forth. Now, if we divide one by the average shortest path from an individual to all other individuals in the network, then we have calculated their closeness centrality. So with closeness, the closer you are to one, the more centrality, the more importance, you have in this network; individuals who connect to most others only through many intermediaries get closeness scores that are increasingly nearer to zero. So this is a way to show how central you are to the flow of information in this network. And we all know the old saying about marketing dollars: half of them are wasted, we just don't know which half. That's old; we can't operate that way anymore. We have to know where to focus, and measures like closeness help us focus.

Then there's eigenvector centrality. Funny words, but it's a measure of the importance of a node in the network as well. It assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. So for example, here we see the orange node in the middle. It must be pretty important. Why? It has only three connections, but those connect to nodes that have a lot of connections otherwise. So you might say, well, in the middle there we've got the CEO; he only talks to the sales managers, and they talk to all the members of the sales team, and therefore the CEO is most important.

Then we have things like the clustering coefficient, with cascading churn: if two people churn, what is the likelihood that others will? There you see it's not SQL, it's one of the graph query languages, but it's very, very similar to SQL; you'll catch on to it if you know SQL. I'm trying to show you here how short the query language is for doing some pretty complicated things. So think of the question: if two people churn, what is the likelihood others will? Gone are the days when that's going to take a multi-month IT project and ultimately not get done, because in a graph database that can get done.

And then we have loopy belief propagation, which I wanted to share with you. I know, another funny term; there is some terminology here. This is where we might compute data into the graph that doesn't exist but probably should, so that we can do all the other things I just talked about. It's a way to get at missing data and impute it into the network. Loopy belief propagation is one method of filling in
missing information, so that we can utilize these great networks even when we are missing information.

So those were some of the main algorithms that I want you to know about as you think about your workloads and what might fit in a graph database. Here are some of the questions; if you're asking these questions, then you may have a strong case for a graph database. In what order did a specific set of related events happen? Are there patterns of events in our data that seem to be related by time? And so on. I keep thinking I'm about to say "you might be a redneck if": well, your workload might be a fit for a graph database if any of these questions make sense to you. If the workload is described by network, hierarchy, tree, or ancestry structure, I've got to tell you, if you're describing it that way to me, some warning bells are going off, and I'm rubbing my hands together thinking about a graph database solving that workload. If you're planning to use the relational performance tricks, self-referencing tables and so on, to try to get there the hard way, you might think about the easier way here. If your queries are going to be about paths: what is the path between two of the players, the vertices, the nouns of my business? If you are limiting queries by their complexity, not doing things you would like to be doing because it's too complicated, that's another sign. Or you're looking for non-obvious patterns in the data. A quick POC with a graph database may impress you to where you'll want to go forward with it.

Now let me talk a little bit about graph modeling, because this is how we model: we model at the domain level. We have to know what some of the vertices are going to be and what some of the edges are going to be. So an employee might sell an order, an order might have a product, a supplier might supply that product, and that product is part of a category, and so on and so forth. This is the level of modeling that you want to do before you get into your graph database application. The employee, product, supplier, order, category, and so on are going to be the vertices, and the edges are going to be things like sold, product ordered, supplied, or part of. It's a beautiful thing.

You model actions depending on what you want as vertices, so there are different ways to skin this cat. I don't want to get into too great a detail here today, but if you're thinking, well, I don't know if it's this or that, and therefore I'm confused: it might be either one, and that's perfectly okay. So Bill might send an email to GM. The email might be a vertex, or "emailed" might just be an edge, if you don't want to keep track of the email as a vertex. Both are acceptable. It all depends on you: what you want to be a vertex, what things you want to see the importance of.

Then you have the semantic graph, and both of our sponsors are way into this; this is the model that they are. So you might have "John knows Frank." How does John know Frank? What is the provenance of that? What is the confidence percent that I can apply to this relationship, how certain am I that John knows Frank? It may not be a hundred percent. And there may be ten other types of things that you might want to put there as edges. We store this in what's called a triple. Okay, triple; Steve alluded to this: subject, predicate, object. We have a few going on here, and I'll point them out. One subject is John Peterson, "knows" is the predicate, and the object is Frank T. Smith. That's actually how it's stored. You know, a relational database stores data with all of the columns of the table; a graph database stores triples, sometimes known as a quad store, which is effectively the same thing, in case you hear that term. So here we have defined a triple, and we call it triple number one. A triple can itself be a subject: triple number one is the subject here, the predicate can be confidence percent, and the object of that is 70, and that's just what is stored. All these triples get stored, get rendered into the graph, and get thrown into the algorithms that you run on it. A little bit more on triples: as I mentioned, they're comprised of subject, predicate, object, and you can have a lot of fun with this. Bob is 35, Bob knows Fred, William likes running, and so on and so forth.

Now, if you have any questions for myself or for Steve on graph databases, feel free to be putting those into the Q&A now, which we're getting really close to. But here's my conclusion. Graph is a fast-growing data category, and it's all about the use case. Here's what's good for graph, among others; these are some of the big ones: real-time recommendations; fraud detection; network and IT operations, actually mapping your network; identity and access management, looking up permissions, supporting security in that way; doing different forms of graph-based search, maybe based on some of the algorithms that I shared with you today; and identifying relative importance, which is the thing I keep coming back to. That's what these algorithms do, and that's why I spent a little time on them here today. So you can see it's maybe more than what you thought. Reimagine your data as a graph; it's easy to do. You can go to a whiteboard and do that domain model, and that's it, that's your physical model, and then the data naturally falls under that model as triples. And if you remember nothing else from the algorithms section, remember PageRank, remember how Google does it, and how you can do a little piece of what Google does with these algorithms in your shop. PageRank to me is just the quintessential graph database algorithm, because it brings out what's important, and that allows me to focus as a business. We know we must focus, with our limited time and resources, and graph is really great for helping us get to that.

That has been my part of today, and now I'll turn it back to Shannon, who's going to share your Q&A with us.

William, thank you so much for this great presentation, as always. If you have questions for William or for either of our sponsors today, feel free to submit them in the bottom right-hand corner, in the Q&A section of your screen. And just to answer the most commonly asked question: as a reminder, I will send a follow-up email by end of day Monday for this webinar, with links to the slides and links to the recording of the session. So let's dive in. Will graph databases be a suitable data source for data science, since data scientists usually prefer flat tables? William, let's kick it off with you, and then we'll open it up to everybody.
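As a concrete aside before the answers, the PageRank walkthrough from the talk also speaks to this question, because graph scores flatten naturally into the table shape data scientists prefer. Below is a minimal sketch in plain Python, reproducing the four-page example from the presentation (A links to B and C; B and D link to C; C links to A) with the 0.15 damping factor. This is illustrative code only, not any vendor's implementation, and the names in it are my own.

```python
# PageRank as walked through in the talk: pages A..D, damping factor 0.15,
# so each page has "85 cents" to split evenly across its outgoing links.
links = {                     # page -> pages it points to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
DAMPING = 0.15                # the "magic number" from the talk

def pagerank(links, damping=DAMPING, tol=1e-9, max_iter=200):
    rank = {page: 1.0 for page in links}
    for _ in range(max_iter):
        new_rank = {}
        for page in links:
            # every page gets the damping constant, plus a share of the
            # rank of each page that links to it
            incoming = sum(
                rank[src] / len(outs)
                for src, outs in links.items() if page in outs
            )
            new_rank[page] = damping + (1 - damping) * incoming
        if max(abs(new_rank[p] - rank[p]) for p in links) < tol:
            return new_rank
        rank = new_rank
    return rank

scores = pagerank(links)

# Flatten to rows: the "flat table" shape data-science tooling prefers.
table = sorted(
    ({"page": p, "pagerank": round(s, 2)} for p, s in scores.items()),
    key=lambda row: -row["pagerank"],
)
for row in table:
    print(row["page"], row["pagerank"])
```

Run to convergence, this reproduces the numbers from the talk: page C comes out on top at 1.58, page A shortly behind at 1.49, and the sorted rows are exactly the kind of flat feature table you could hand to a data science pipeline.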
I'm going to say that science is a broad category, so let's break that down. As I did mention in my presentation, a lot of the initial application of graph databases was in the scientific community, that continues to be true, and the scientific community continues to lead in the use of graph databases. But in terms of data science, in terms of drilling in: you saw some of the algorithms here today, and they get way more complicated than what I showed you, and way more nuanced, and that to me is really representative of data science, trying to understand things in the network that are hard to understand otherwise. I'm not sure where the question was coming from, whether it's the science domain or data science, but I see graph as being part of both: the science domain, and a really strong part of data science too. And I welcome Steve to add to that.

Yeah, if it's okay, I'll add a couple of client examples from TigerGraph. We have one of the Fortune 10 healthcare providers, actually the data science team within it, using TigerGraph to map out customers' wellness journeys and understand where people have gaps in those journeys, where people are going off what is recommended, in terms of medications, in terms of physiotherapy after a surgery, and taking proactive actions to make sure that the members stay on the path and don't have a recurrence, for example. We are also being used by the pharmaceutical industry, on the clinical side, for targeting drugs better, because ultimately genomics is a network, a connected set of entities, and on the commercial side, to make sure the right drug is delivered to the right audience who needs it. So those are some examples of the scientific community actively using graph databases in pharmaceuticals as well as healthcare.

Yeah, and William, I'll just say that I think you nailed it by really focusing on the algorithms. One of the things that is really required with data science is that flexibility around the algorithms, using relationship information in the algorithms. So, will graph databases be a suitable data source for data science? I think they'll be a suitable place in which to do your data science and perform your customized algorithms. That's one of the things we're looking at expanding quite a bit in our platform, and I know most of the graph databases have a pretty wide selection of both algorithms that are included as part of the platform and ones that you can customize and build yourself. So yes, I think it's really important to data science.

Great. Shannon, any other questions? Oh yeah, lots of questions coming in here. We already covered that first one, so: what should be the approach to identifying nodes, vertices, and relationships? And I know, Rob, you've already supplied an answer here in written form. Steve, do you have anything to add? That's an interesting one. I mean, that's natively what a graph database does, right, identify nodes and vertices and relationships in data, so I think I'd need more clarification on the question to answer it; that's what we do. Absolutely. Indeed, anything you want to add? Yeah, I'll add something to that. Usually the confusion is what to model as a vertex, and your workload will help you see what to model as a vertex and what to model as an edge or an attribute. You can model the same thing, something like a name, email address, or phone number, as an attribute, or you could also model it as a vertex; it depends on the type of search that you want to conduct and the algorithm that you want to run. Yep, it's a little bit of context; it depends on your workload. Yeah, it depends on your workload, and I'll make it even more complicated: you can do subject-predicate-object, or you can do properties as well, and that will make it even more complicated. But I wouldn't say it's complicated; I would say, on the positive side, it's flexible. So
it depends on your workload, what you want to identify.

So, is there any advantage to building machine learning on graph databases at all, considering the technology? Again, Steve, I'll kick it to you first. I think the challenge with machine learning, and I touched upon the data harmonization aspect of it, is that you can put X amount of data into machine learning algorithms, and they may or may not get you the results that you want. So part of doing data science and machine learning is saying, let's go out and try to bring in additional data sources and see if they can be better predictors for the machine learning algorithms. I think it's the combination of the algorithms and the harmonization that really gets you the value with graph, and that's how I'll answer that.

Yeah, there are three things that people do with TigerGraph and machine learning. The first is that we generate machine learning features: we generate graph-based analysis features of the network data. We do it for healthcare, we do it for the telecom industry, we do it for financial services. That training data is fed into an ML solution to make it more accurate for fraud, for recommendation engines, and a host of other use cases. The second thing we do with machine learning, natively inside TigerGraph, is community detection, PageRank, minimum spanning tree, and a host of other algorithms; and as Steve pointed out, and as William talked about, the algorithms are the key. PageRank, community detection: doing those at scale on the graph data is what customers do with our product. And the third one is explainable AI, where you're trying to understand what decision was made by a machine learning solution and how it came about. The features that we compute based on graph analytics are actually being used by people to explain why a particular customer was flagged as high risk for fraud, or why a particular recommendation was made for a customer in terms of a product or service.

William, anything you want to add? I like that answer. I would just add that what I've seen is that graph databases can provide focus to machine learning algorithms, and I think the two are kind of growing up together in organizations and working together.

And William, can you comment on graph database usage in the context of GDPR? Well, GDPR is all about being able to understand what customer information you have, who has it, and all sorts of attributes about the customer, so that you can get rid of it, and so that you understand completely how that data is being used. I talked a little bit about how networks are modeled internally and the connecting points there, and I think that is one place where graph databases can support GDPR: by giving you a better sense, a better understanding, of how the networks are connected in an organization, where data might be flowing, and the places to chase down customer data, for GDPR and other things. The right to be forgotten, right. Steve or Rob, anything you want to add? Well, I think the next generation of using a graph database for this would be to create a data fabric; you might have heard that term, and maybe that's a topic for another webinar, but very often graph databases are the engine behind a data fabric. We can actually take a look at all of our data sources and say which ones we want to make applicable to this analysis. Essentially, when you have that right to be forgotten, you want to make sure that you look at all of your data sources, create this fabric, and then make sure the rules apply to all the sources, all the data sets, that you have. So I think it really becomes practical once you have that data fabric, that layer on top of all of the data in your enterprise, that will allow you to do that.
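To make the right-to-be-forgotten point concrete: finding every record you hold about one customer is a reachability question, which is exactly what graph traversal answers. Here is a small illustrative sketch in plain Python, with an invented adjacency-list graph standing in for a data fabric; the vertex names and the breadth-first helper are assumptions for the example, not any product's API.

```python
from collections import deque

# Toy data graph: an edge means "this record is connected to that one".
# All names here are invented for illustration.
graph = {
    "customer:jane": ["order:1001", "email:jane@example.com"],
    "order:1001": ["shipment:77", "invoice:555"],
    "email:jane@example.com": ["newsletter:opt-in"],
    "shipment:77": [],
    "invoice:555": [],
    "newsletter:opt-in": [],
    "customer:bob": ["order:2002"],
    "order:2002": [],
}

def records_about(graph, start):
    """Breadth-first traversal: every record reachable from one customer."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Everything that would have to be erased for this customer's GDPR request.
to_erase = records_about(graph, "customer:jane")
```

Starting from customer:jane, the traversal collects her orders, shipment, invoice, and email records while leaving customer:bob untouched; a real graph database would express the same reachability query in its own query language rather than hand-rolled BFS.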
Great points, William and Steve. I'll just add that data lineage in general is required for GDPR as well as the CCPA, the California Consumer Privacy Act; both of those are covered by a data lineage solution. We actually work with multiple customers to do this, and we have built out a data lineage solution on TigerGraph Cloud, which we will release in a couple of weeks.

All right, I'm afraid that is all the time we have for today; we are right at the top of the hour. Thank you, William, for another fantastic presentation and webinar in this series; we appreciate it as always. And a special thanks to TigerGraph and to Cambridge Semantics for helping us make all of it happen. Just a reminder to all attendees: I will send out a follow-up email by end of day Monday with links to the slides, links to the recording, and all additional information. Thanks, everybody, I hope you all have a great day. Thank you. Thanks, all.