Hello, and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager for DATAVERSITY. We want to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today, Donna will discuss Graph Databases: Practical Use Cases, sponsored today by Cambridge Semantics. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A section. If you'd like to tweet, you can also share highlights or questions via Twitter using hashtag #DAStrategies. We very much encourage you to chat with us and with each other throughout the webinar. To do so, just click the chat icon in the bottom right-hand corner of your screen to activate that feature. And as always, we will send a follow-up email within two business days, containing links to the slides and the recording of the session and additional information requested throughout the webinar. Now let me turn it over to Thomas from Cambridge Semantics for a word from our sponsor. Thomas, hello, and welcome.
Thank you, Shannon. My name is Thomas Cook, and I lead sales and pre-sales here at Cambridge Semantics for AnzoGraph DB, and I wanted to share with you some information about a couple of the products that we have and also a use case around modern data integration and analytics using graphs. One of the recent wins that we're very proud of, which we recently announced, is with the Food and Drug Administration. They came to us and said, it's very difficult for us to answer questions quickly because our data is in many different applications and databases. We have siloed data, and this is a common problem that we hear across every organization. And it's not the availability of the data, but the ability to ingest, integrate, and interpret the data that is the challenge.
And that's where scalable enterprise knowledge graphs come to the rescue, and our Anzo platform can help to solve this problem. By automatically linking together all of these different sources, you're able to answer questions across these different siloed data sources. And so here we are creating the single view of the drug product. You can see many different applications that have full databases behind them. For a lot of the questions that need to be answered, the data is in each one of these different applications. It's very difficult to combine those and link them together in the right way to be able to accurately and effectively answer the question. So here's an example, using two of these different databases, the Approved Drug Products and the FDA Adverse Event Reporting System. It's a simple example of how to link those. We can automatically create the ontology from those data sources. The ontology is basically the schema of the graph, if you're not familiar with the word ontology. And then we can link those together. So you can see product information from the Orange Book and the FAERS data with patients and adverse effects linked together with meaning, automatically combining them. And here is another view of that. If we take a look at the Orange Book, it has some information about the different products: the product, the application, the approval date, the applicant for that product. And then in the brief section, you can see product information, the drugs, the active ingredients, the strengths, et cetera. And then the last one here is DailyMed. This is an unstructured data source, so we can use NLP to extract the entities and relationships from that. And then the Anzo knowledge graph platform will link together all of that information. And so it is a scalable knowledge graph platform for modern data integration and analytics.
There is an MPP OLAP engine behind it called AnzoGraph DB. It can store hundreds of nodes and 40 of the facts, and we offer enterprise-grade cloud deployment with Kubernetes and security. There's also a catalog and UIs to blend, model, and access the data. This is a quick run-through of the different steps required to create a knowledge graph. So first, we onboard the data by ingesting and mapping those sources. We can connect to over 200 different sources. And we can also virtualize the data, or we can bring it into our knowledge graph in the cluster. We can then create different data models to answer the questions that we want to answer and then blend those together in what we call graph marts. And then again, you can access those with third-party analytics tools, and we can also do analytics inside the data platform. You can also connect through JDBC and OData to BI tools and other analytical tools. And last, the Anzo data platform is built on top of AnzoGraph DB. It is a standalone graph database. There is a free edition and there's also an enterprise edition. You can download that and try it today. This has more of a programmatic interface than the UI that the Anzo knowledge graph platform has, but very strong scalability and performance capability. You can see some analytical benchmarks at the bottom, compared to other leading vendors in this space. We can connect to over 200 different data sources directly. We also offer virtualization. We have the fastest data loading, at 250 gigabytes per hour per node. It can scale horizontally and has been tested at over 200 nodes. And then there is very rich analytical capability, and we're built entirely upon standards. So it is an RDF triple store, but we also support the Cypher query language. And we also support RDF-star, which allows it to handle labeled property graphs in the database. And so please download it today and try it out, or reach out to us.
We have several webinars on both products, the Anzo knowledge graph platform and AnzoGraph DB, the standalone graph database. And thank you for joining today's webinar.
Thomas, thank you so much for this great presentation. And if you have questions for Thomas or about Cambridge Semantics, you may submit them in the bottom right-hand corner of your screen in the Q&A portion. He'll be joining Donna in the Q&A at the end of the presentation today. Now let me introduce the speaker of our monthly series, Donna Burbank. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She currently is the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. And with that, let me give the floor to Donna to begin her presentation. Donna, hello and welcome.
Thank you, Shannon. And thank you, Tom, for that great introduction. Very helpful. As Shannon mentioned, this is the Data Architecture Strategies series, and it's great to see a lot of familiar names on the attendee list. So thanks to the many of you who attend these each month. For some of you, this may be your first webinar in the series, and you are particularly interested in graph. One of the most common questions is, will this be recorded? And yes, the webinar will be recorded and available on DATAVERSITY for, I think, perpetuity, and all of the other topics that were done this year in 2020 are available on demand as well. So please take advantage of that. It's another free resource offered by DATAVERSITY. So the sad part is this is the last one this year, although it's 2020, so for many of you, maybe it's good riddance, right? It's been a strange year for all of us. We will be doing another lineup next year in 2021.
So again, for those of you who've been joining regularly, I hope you can continue to join for some other hopefully interesting topics. I'm particularly excited about the one in March because we actually have a case study. Data modeling is always very popular on DATAVERSITY, and case studies are often very popular. So this is a large building and manufacturing company that is showing how they use enterprise data models. So that one might be of particular interest, but I hope you've enjoyed as many as you were able to. But today's topic is graph, and Tom gave a great introduction. Just to cover what we are going to talk about: there is this idea of graph databases, and there has been an amazing spike in popularity in recent years, for many of the reasons that Tom mentioned. It's a different way of looking at some of these massive data sets, especially across platforms, and it applies in a lot of different industries. Tom mentioned healthcare and pharma; there's also fraud detection for financial services, marketing, network optimization, and IT. There are a lot of different use cases, and we'll talk about a few today. But for those techy folks on the call, and we always get techy folks, which is great, this is an architecture webinar. It's been said that the data model is the metadata and the metadata is the database. That might seem complex, but hopefully by the end of this, you'll see the value of that and what it actually means in a practical application. And that'll be the focus of this. It's very high level. We're not going to get into how you write queries or any of the detailed semantics. There are plenty of resources on that.
So hopefully what these webinars can offer you is this: one of the most exciting things, but also one of the most challenging things, about being a data architect or a data management professional is just trying to keep your head around all of the different opportunities that are available, because it's not 1990, where it's relational or not, right, or COBOL. There are a lot of different tools, and at least keeping in your head which tools are right for which job is something we're trying to do in this webinar series. So if you go away with nothing else than really understanding where a graph might fit in your organization or your architecture, then we've done something good today. So what is a graph database? The idea of a graph database is using nodes and edges to store relationships. And as I mentioned with that quote, the relationships between the data points are as important as, if not more important than, the individual points themselves. And that really helps you discover new insights. So we have relationships in a relational database, customer buys product, but often, and we'll talk more about this, in a relational database we're thinking more about the nouns. What are all of the attributes set up on a customer? It is very prescriptive, and that is good. These technologies can live together. You definitely need to do that hard work with data quality, et cetera, to make graphs sing. If you don't, if every customer in your database is named John Smith, the power of a graph is not going to be very valuable, right? And in the examples Tom showed, those data sources were very well structured and were published. So you do need good data to start with, or your results, as with anything, are garbage in, garbage out. But this is a new, powerful data platform to be able to look at data. So let's go more into it. In my head, again, since there are so many different technologies, I like to have some sort of mnemonic about each one.
So when I think of graph, I sort of think thing relates to thing. With those nodes and edges, if anyone's familiar with Dr. Seuss, Thing One and Thing Two, that's how I think about it: how do we get those things related to each other and see those patterns? The formal way to say thing relates to thing is the idea of nodes (or vertices) and edges. Every tool or every modeling technique uses slightly different words, but again, you should be familiar with the idea. If you're familiar with databases, you've got the thing and you've got the relationship. But one of the differences in graphs, and we'll talk a lot about that in this presentation, is that those relationships are really first-order constructs, and you can look at data with different lenses. As Tom mentioned, you can have different ontologies, right? Because there's the data, and then you have different lenses, different views on that data, or different ways you want to look at it. So at the very core is this idea of the things and the relationships between the things, the nodes and the edges. One of the nice things about graph, in my mind, is that it is a different way of looking at data. I mean, again, one of the good things in relational is that you're very structured. A customer has these attributes, and there are constraints on how they can link to other things. But the human brain sort of works in a graph-ish way, right? You say something like, oh, I should go visit Mary. Ah, Mary's brother, John, I wonder how he's doing. Is he still dating Stephanie? I remember he had that boat. Oh, boats on the lake were great. Do they still have the house on the lake? They had a boat, and it looked like that boat I had as a kid, right? In your brain, these are structured connections. I mean, it's not completely random. Each one of these is sort of a node that relates to the others, right?
Mary has a brother who is John, who had a girlfriend or partner who is Stephanie, or John had an activity or is linked to his boat, right? So there's structure around that, but it's not as formal as if you're building a data warehouse and you've predefined what you're going to think about when you go to visit Mary. Mary will buy product, right? That's sort of the relational mindset. So yes, this can be a little more loose, but there is some structure around it. That said, if your mind works like mine, you probably have some "squirrel!" moments, right? There are always some random data points. My brain doesn't go in a straight line, and you've probably noticed that on these webinars. So that just underscores the idea that data quality is important no matter what data platform you're using, right? Again, if all of your customers are named John Smith, you're not going to get great insights. The data itself doesn't have value unless it is structured well and has good quality. That should go without saying on a DATAVERSITY webinar, but it is important to remember, and I've had people argue with me on that: you know, now that we have big data, data quality doesn't matter. Well, if your data is garbage, you're not going to get good insights. That should just be a core foundation. But this is a new way of looking at things, and to go back to that again, the traditional way of looking at the world isn't a bad way. We still use it all the time; it's super helpful. If you go way back, if you remember from school, in whatever grade you learned this, there is that idea that Linnaeus in 1735 had a hierarchical taxonomy for organizing biological systems. And I remember memorizing it, I think for me it was third grade, right? Kingdom, phylum, class, order, family, genus, species, right? A wolf is Canis lupus or whatever. I should've looked that up before I said something. Right, but we still use that all the time.
There's the periodic table of elements, right? And many of my customers have very defined taxonomies and hierarchies for their data. That is very helpful. We've made many, many scientific discoveries based on that, having some organizational structure to keep track of plant systems and animal systems and biological discoveries. And I'm sure in the examples that Tom gave in terms of the FDA, they use hierarchies and taxonomies too, right? It doesn't mean this goes away. It's just not the only tool in the toolbox, right? And, you know, not to get too philosophical, but I do think that back in the day in 1735, Linnaeus and probably a lot of folks thought, well, gosh, we are onto something. If we can just put everything in the world in our little buckets and we can name it, we'll have everything in the world figured out. And that's sort of quaint, and that would be lovely if our brains were big enough or the world were simple enough that everything could fit in a taxonomy, and it just doesn't. And what I find interesting, if you look at a newer way of looking at this, is this idea of emergence or chaos theory in science, right? Where there are complex systems and patterns, and there's probably physics and math that's beyond my small brain, but you do get these patterns and systems out of a multiplicity of relatively simple interactions, like thing relates to thing. So think of a snowflake. Without getting too religious or philosophical, there's no person that designed every single snowflake, right? There is sort of a random pattern where water comes out of the sky and forms into these crystalline structures. No two snowflakes look alike, but there is a snowflake-ness, and you will see that there are patterns that have a lot of similarities. When you go down to the core of those patterns, there is a structure. So that's sort of a scientific application of that. But that's used in practical things as well, like city planning.
If you go to a modern, say, American city that was planned on a prairie, it's all very straight grid lines, right? If you go to Rome or Boston, right? It was sort of built from cow paths and things, just sort of random. If you know where you're going, it's great. If you don't, good luck to you. But I think modern city designers realize that neither one of those is perfect, and those straight lines aren't always as efficient. I know at my university, they built the paths in those straight grid patterns, but everybody cut across the lawn because that was the fastest way to get there. So eventually they paved that path across the lawn and sort of just compromised, right? But that's this idea of city planning where they look at existing traffic structures and then design pedestrian pathways and traffic pathways based on usage patterns, right? So it's looking at otherwise chaotic systems and getting structure from them. And in my mind, that is sort of where I see graphs, right? So just like that human brain, it looks like it's making these random jumps, and depending on the person, it's more random than others. I mean, there are more squirrels in some people's stressed-out brains at the moment. But there is a pattern, right? There's always some sort of semantic link between those thoughts, and linking those together with some sort of structure has a lot of power to it. And that's where I can see some of the power of this graph system or the graph approach. So in many ways, the graph database is not the only solution, and it has its use cases, but it can be the best of both worlds because it has some structure and meaning. And as Tom mentioned, you have this idea of an ontology, or sort of a taxonomy; it's more flexible than a taxonomy in that you can apply different ontologies.
So if I go back to that thought process about Mary, there's sort of a relationship ontology: there is a brother, there is a girlfriend, right? There's sort of a place ontology, that we met this person at a certain place, right? Or we have a common interest. We both like boats, right? Those are all valid ontologies on top of that thought process. They're all valid, but they're all different. You didn't have to decide ahead of time, what is Mary gonna think about? She's going to think about visiting friends. And this is a family pattern, right? Or a friend pattern, a relationship pattern. So again, it does have a structure, but it also offers some flexibility. So for those of you who do come from the relational world, and I think that's a lot of us on a DATAVERSITY webinar, or a lot of us in the database world, that's sort of where a lot of us grew up, and relational databases are still great. They're not going away. They just have their use case. But as I mentioned before, in a graph, the relationships are really first-class constructs. And that's stronger than in other approaches; there are a lot of different ways to model data in today's world, which is what's exciting. But ironically, even though we talk about relational databases, they really lack relationships as a first-order construct, right? In fact, one of my colleagues and friends, Karen Lopez, who has also spoken a lot at DATAVERSITY, had a quote that I stole and liked. I haven't interviewed her. She said, you know, a database really isn't about relationships, it's about constraints, right? So you're creating those keys and you're creating those patterns, but it's predefined, and it really isn't about those flexible relationships. So that's the one on the right.
On the left is a sort of stylized graph pattern where you might have, you know, a customer is an owner of an account, and "owner of" is a first-order construct in the graph model. Whereas if you look at customer and account on the right, yes, with data modeling tools, and you all know I'm a fan of data modeling, in a logical data model you can add those sorts of verb phrases, but it isn't inherent in your database. There's nowhere in your Oracle database that says customer is an owner of an account; that's sort of implied. So that is one of the powers of graph. And as we showed before, a customer is an owner of an account. A customer is also an employee. A customer is related to other customers who have similar buying patterns, right? So the more you can add that flexibility, but with semantics and metadata meaning, that is sort of the power of graph. So again, this is not a graph 101 on how to build a graph. I mean, plenty of vendors have great videos. There are a lot of good resources out there on RDF and OWL and a lot of the good underlying technology of this. There are also some great industry patterns out there; you know, a lot of industries are taking up graph, and there are some industry models or ontologies out there. So there are plenty of resources, but hopefully, if you have not seen graphs before, you can at least think in your brain, I might have a use case for that, that thing relates to thing, those first-order relationships. If you're all about graphs already and you're doing a lot of it, maybe this gives you a new way to think about them or a new way to describe graphs to your colleagues, right? We all have that problem of how do you describe what you do for a living to your friends who are not technical, or to your client, or to your business sponsor in the organization. So hopefully that was a different way of looking at it. But the real reason we wanna use graph is for some particular use case, and there are some exciting ones out there.
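As a rough illustration of the point above, that a relationship such as "owner of" is stored as data in a graph rather than implied by keys and constraints, here is a minimal Python sketch. It is not how any particular graph product stores data; the node names, relationship labels, and helper functions are all made up for the example.

```python
from collections import defaultdict

# Each edge is a (subject, relationship, object) triple.
# All identifiers below are hypothetical, chosen only for illustration.
triples = [
    ("customer:jane", "owner_of", "account:1001"),
    ("customer:jane", "employee_of", "org:acme"),
    ("customer:jane", "similar_buyer_to", "customer:raj"),
    ("customer:raj", "owner_of", "account:2002"),
]


def build_index(edges):
    """Index outgoing edges by subject so relationships can be looked up directly."""
    index = defaultdict(list)
    for subject, relationship, obj in edges:
        index[subject].append((relationship, obj))
    return index


def related(index, node, relationship):
    """Return every node reachable from `node` over the named relationship."""
    return [obj for rel, obj in index[node] if rel == relationship]


index = build_index(triples)

# The relationship name is queryable data, not just a constraint.
owned = related(index, "customer:jane", "owner_of")
relationship_types = sorted({rel for rel, _ in index["customer:jane"]})
```

The same customer node carries several kinds of relationships at once, and adding a new kind only means adding another triple, with no schema change.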
So I thought I would just go through a few to maybe pique your interest or give you some thoughts if you're not using graph in your organization today. So a very popular one is this idea of social networks, right? Because again, I'm looking at data. I may not know that Donna is related to John. I don't have that first-order relationship. So I just wanna understand who is linked with Donna. Who are those cool people who really are Donna's friends? Because they like data. And then you'll see up here in the green, there are a few lonely people who don't like data, and they're really pathetic. So they are not the cool kids and they are not in Donna's network, right? So that is a fictitious example. But that is used all the time. Think of when, and I know many of you on this call, like me, celebrated this, the word metadata was suddenly in the headlines, right? Think of the NSA, when there was phone metadata, and why that was suddenly interesting, right? Because if I just look at phone call patterns, you can see who is in the network calling whom, right? Is there one phone call that's always linked with a certain person? So this can be used for that type of use case. It can be used for customer patterns: which customers may know each other, social recommendation engine type analysis, et cetera. But on metadata being in the headlines, I wish I'd saved it and I didn't. I'm always kicking myself. A friend of mine in Melbourne sent me a headline when they were having a similar issue with the metadata of cell phones and things. And it said, prime minister upset at not being invited to metadata talks. I just said, wow, now it is our time in the world, where that's actually a headline in the newspaper. But that was because people realized it wasn't necessarily what you talked about on that telephone call, which would be the data.
It was the fact of who was making the phone call to whom. Again, it's the relationships that were the first-order construct, and that's what makes it rather interesting. So this is used for a lot of different use cases. Here's kind of a fun application of that, if everyone's heard of the Bacon number or the X degrees of separation. They always say there's three degrees of separation: if you meet somebody, you're probably always three steps away from somebody else. And people have had fun with that with Kevin Bacon. For folks who are not familiar with him, he had some movies; Footloose, I think, was one, way back in the day. And there's actually a website where you can determine a Bacon number, how many degrees of separation a person is from Kevin Bacon. So I picked Audrey Hepburn, just because. And she has both a Bacon number of three and a Bacon number of two. So that links back to metadata or data quality or master data, right? Is there more than one famous Audrey Hepburn in the world, or is that a data quality issue, and someone put two Audrey Hepburn records in that field? And that underscores the idea that if you don't have good data and you don't have good master data and you don't have good data quality, the power of your graph is weakened, right? And this could either be a really interesting insight, oh, there are two Audrey Hepburns, or, I think our data is wrong and we need to understand it. So hopefully that was sort of a fun example, but it does give you some idea of how this can be used in social graphs, but also how it can have its limitations. I will talk about this a little bit later, but I'll touch on it now. I mean, this does have real-world implications.
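Going back to the Bacon number example above, degrees of separation amount to a shortest-path search over "appeared with" edges. The sketch below is only an illustration with invented data: the placeholder actor names, the `costars` mapping, and the `degrees_of_separation` helper are assumptions, not the logic or data of the website mentioned in the talk.

```python
from collections import deque

# Hypothetical co-appearance graph: two people are linked if they shared a film.
costars = {
    "Kevin Bacon": {"Actor A", "Actor B"},
    "Actor A": {"Kevin Bacon", "Actor C"},
    "Actor B": {"Kevin Bacon"},
    "Actor C": {"Actor A", "Actor D"},
    "Actor D": {"Actor C"},
    "Actor E": set(),  # no shared films with anyone in this data set
}


def degrees_of_separation(graph, start, target):
    """Breadth-first search; returns the fewest hops from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None  # not connected in this data set


bacon_number_d = degrees_of_separation(costars, "Actor D", "Kevin Bacon")
bacon_number_e = degrees_of_separation(costars, "Actor E", "Kevin Bacon")
```

If the same person were stored as two separate nodes, each node would get its own answer, which is one way a duplicate record could surface as two different Bacon numbers.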
We did a project, and I've talked about this a few times on these calls, with a massive financial institution that was global, and their customer base was high-net-worth individuals, people like us who have a yacht and several businesses and homes all over the world that need to be insured. They were trying to understand all those patterns, which are very powerful because there's market opportunity, et cetera. But they had the same issue as this Audrey Hepburn issue. They had Joe Smith, and is Joe Smith the multi-billionaire, or is Joe Smith just some guy who happens to have an account with $500 in his checking account, right? And so getting that core data right was their problem. There was a lot of powerful analysis they could do, but their own data sets weren't robust enough to really get the power of that. So I mentioned financial institutions. They're another group that uses this power of graphs, and fraud detection is one use, and this maybe is a helpful use case to describe that. So again, a lot of graph is about patterns. What are the interconnections that exist? So typically, if I'm thinking of an online transaction, I am buying my Christmas presents with my credit card and I go online and I purchase something on Amazon, for example. And so you could track that from my user ID, my IP address, my credit card number, and that's fine. And then you might notice that the same IP address is using two. So if you look over to the right, they're using two credit cards. Now, is that problematic? Might that be fraud? Maybe, but that's pretty common. I may have a personal card. I might have a business card. Maybe I'm using my husband's card, or a child is using their parents' card. It's probably not too strange that there's more than one credit card being used from the same IP address. But if you look at this area on the left, it might be a little odd that the same IP address has seven different credit cards making purchases from it.
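The pattern just described, one IP address node connected to an unusually large number of credit card nodes, can be sketched as a simple count of distinct neighbors. This is a toy Python example only: the addresses, card labels, and the `max_cards` cutoff are invented, and a real fraud system would weigh many more signals.

```python
from collections import defaultdict

# Hypothetical (ip_address, card_id) pairs taken from online purchases.
transactions = [
    ("203.0.113.5", "card-01"),
    ("203.0.113.5", "card-02"),
    ("203.0.113.5", "card-01"),  # repeat purchase on the same card
] + [("198.51.100.9", f"card-{n}") for n in range(10, 17)]  # seven distinct cards


def cards_per_ip(rows):
    """Group distinct card nodes under each IP node."""
    cards = defaultdict(set)
    for ip, card in rows:
        cards[ip].add(card)
    return cards


def ips_worth_a_look(rows, max_cards=3):
    """Flag IPs linked to more than max_cards distinct cards (cutoff is an assumption)."""
    return sorted(ip for ip, cards in cards_per_ip(rows).items() if len(cards) > max_cards)


counts = {ip: len(cards) for ip, cards in cards_per_ip(transactions).items()}
flagged = ips_worth_a_look(transactions)
```

As in the talk, a flag here only means "worth looking into": two cards on one address stays under the cutoff, while seven stands out.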
So that's the idea: maybe it doesn't mean it's fraud, but these sort of tightly knit graphs, where there's all of a sudden a lot of activity on this one IP node, might be something to look into. So again, that's an interesting use case for something like a graph that other technologies wouldn't handle as well. Another use case for a graph is something like a recommendation engine, which we're all familiar with. This isn't the only way to do this, but it is one: the typical "customers who bought this may also like this." So how is that done? You can understand the customer's browsing behavior, you can understand their demographics, and you can see that link, that customer one bought a product and customer two also bought the same product. You might say, well, customer two also bought product two, so customer one might like that as well. It's just traversing that graph to understand these patterns of the relationships between the different nodes. Again, kind of an interesting use case, but as I mentioned before, with all of these, data quality, data volumes, and data sets are important. None of this works well if you don't have good data, and that should be obvious, but it's often forgotten, sometimes in the quest to use new cool tools and maybe rush to the fast stuff. So yes, knowledge graphs, as Tom mentioned, can have a lot of power, and they can generate a lot of power fairly quickly, but not if you don't have your ducks in a row and good data beneath it, or the right volume of data. So again, maybe a fun example. I'm sure we all get followed by strange ads, and I know I try to have a lot of fun with the folks trying to connect my data: I put in really strange search queries and just try to keep whoever's watching me wondering, right? But I did try to find some examples of this, and here's one; this was a true use case.
So don't ask, but I did a search on an ax. There on the left is kind of a camping ax, which you might want in order to go split some logs or whatever if you're out on your camping site. And of course it came up with, customers who bought this also bought coffee filters. Now, that seems really odd. I mean, sometimes you might see this too: you're looking at something online, and customers who bought this pair of jeans also bought can openers. Well, yeah, I'm not sure why that is. Sometimes it does make sense, right? Someone who bought this coffee pot also bought these coffee filters to go with it. But again, maybe there wasn't a good data set beneath this, or the data set is small. So this may very well be real. Someone was going camping and they bought an ax to cut some wood for their campfire. And that type of disc filter actually is a common one used for some of those camping coffee pots that you can put over a fire. So that might be a really interesting insight for somebody, that campers tend to buy these two products, or it might just be really weird, right? Because there's not enough data to really say. It would probably make more sense to see that people who bought this ax bought this ax sharpening tool or something like that. So again, you need to have good data sets. You need to have the right volume and variety of data to really make some of these results make sense. Doing a graph on a spreadsheet of four people probably isn't valuable. That's an extreme example, but it is a good thing to think about: you wanna have the right data sets to make that make sense. And as with many of my webinars, sometimes I sound like a grumpy person, but sometimes it's helpful to describe what something is by what it is not. And I get frustrated. Not the sponsor today, but often the vendors sort of overpromise with it.
They have a really good technology, and therefore everything can be solved by that technology. And I think that does a disservice to the really cool technologies that are offered. Just be proud of what you offer, because it's cool enough. And graph is one of those. It's cool enough. It just isn't everything. And one I'm passionate about is master data management. I would say, in terms of the use cases in our practice at Global Data Strategy, things like governance and master data are the biggest growing ones, other than strategy, because we do a lot of that. And ironically, often it's because of things like graph and data science and a lot of newer technologies, because you do really have to get your data right. As to what I mean by master data management, I think the in-the-box definition of master data is that common single version of the truth of your data. Who is John Smith? Who is Audrey Hepburn? Who is Donna Burbank? What is the right version of the customer? And do you have their most recent and accurate attributes? Often that is done in a centralized way, commonly relational. I think MDM is a really great use case for relational, because it is all about constraints, relationships, quality, and rules. That is what relational is very good at and historically has been. And there's also this idea of a virtualized or registry style, which is often more of the data fabric or graph approach. Both can work. I tend to be a fan of the one on the left, and I am not at all discounting the value of the data fabric, but I think that has a different use case, which I'll get into. If you're really trying to get that single view of a person and get those rules in place, you can do it with the virtualization layer.
I think you just have to have your act together even more so than with the one on the left, right? You have to have a really good data model. You have to have a really great understanding of the different source systems that are being virtualized and connected through that layer. And one of the reasons I am not a fan of it is that when I see it in use, sometimes it's like plastering over the holes in the wall, right? People want to get to the answer fast and call that master data. It's not; master data is the hard work of getting that single view correct. As long as you're doing that hard work, the virtualization is fine. I just think that virtualization can be better used in other ways. The hard stuff around master data is the data governance, the parsing, the matching, the data quality, the semantic meaning of what you mean by a customer. And one of the benefits of graph is that semantic meaning can be fluid, right? A John Smith can be a customer. He could also be XYZ. But at some point with master data, you want to put that stake in the ground and say, yes, but his name is X and his address is Y. And that's what I wanted to stress with that particular use case. And again, my little rant is that often when you have a hammer, everything looks like a nail. We all do it. Wow, I've discovered graph. I'm going to use graph for everything. And you don't want to overdo it, because that just does a disservice to the technologies. I have also seen very flexible use of the term data warehousing. And we know as data people that semantics are important. I think a very common historical use of the term data warehouse is this idea of aggregating and summarizing data in a relational or dimensional model, which isn't a graph use case. And I actually was impressed that in Tom's presentation they were very clear. It's a graph mart, right? Very similar, very powerful, but let's just be really clear when we're naming things.
If you're talking about your more traditional data mart, that is a thing, and that has a name. So don't use existing names for a new thing. Be creative with another name: graph mart. That's much more descriptive, and it does justice to both. And they both absolutely have their place and are powerful. So, the data warehouse, where you might use relational or dimensional. If you're doing something like, show me total sales by region and customer each month, and I want to slice and dice, I want to make sure I understand what total sales means and what a region is. And I have very structured reports. I'm doing my annual report to the board: data warehouse. That just screams data warehouse to me. That use case doesn't go away. In fact, I would say the third most popular thing that we're getting asked for, other than strategy, data governance, and master data, is data warehousing. It's as popular as ever. Plenty of people need data warehouses for the use cases they are good at. I've heard disparaging comments along the lines of, that's old school, we don't need them anymore. It's the classic "yes, and": we do need them. I want my financial reports done consistently and correctly, and I want to be able to slice and dice by region. So whenever I hear those types of words, I think warehouse. But, and we still want to use that, there's also the newer way of looking at things that Tom mentioned as well, this enterprise knowledge graph, which has different use cases. The similarities are where perhaps a non-technical person could get confused. They're both enterprise wide. They're both looking at data across the organization, but with a very different lens and a very different way of looking at things. You might want to say, let's look across all of the organization: who are my most influential customers? Yeah, you can get that from the warehouse.
I could see who's spending the most with us each month or each year, that kind of thing. But maybe I want to look at it with a different lens. Who has the most connections, right? If you were on my master data management webinar, I think it was two months ago, I talked about that, where we had the two Stefan Krauses, I think it was. Both bought ski products. One was a banker in Zurich, and he spent the most, right? He spent thousands of euros every year. And then there was another Stefan Krause, who was a ski instructor, and he only spent about 500 euros because he got everything free. And at that point, with master data, we said, well, maybe he isn't your best customer, best meaning spent the most money. But were you to throw a knowledge graph on that same use case, he might absolutely be your best customer, because he's connected with all the skiers in Switzerland. He talks to all of them, he's winning all the races, and he's very high profile, right? So I think it's the combination of those two use cases. If I look at Stefan Krause the banker from a data warehouse viewpoint, he's great. Both are great, right? But in different ways. The banker spends the most with us every year, and he always spends in December, because that's when he goes on holiday and buys brand new skis every year. He uses them five times. Stefan Krause the ski instructor, who gets all his skis free, doesn't rank on how much he spent, but he's actually influenced another 10 people to spend. And maybe he's also very important. So again, just different tools, both very helpful and very valuable. Here's another way to look at graph, and hopefully my analogies are helpful and not completely strange. The connection between ballroom dancing and data management: I'm sure that's the first thing you thought of when you thought you were attending a graph database webinar. But I have tried my hand at ballroom dancing, and I'm terrible.
But that helped me understand this analogy. When you first try any sort of dancing, all you're trying to do is understand what direction you're going in. They're saying these words, and do I even know what these steps mean? You are completely self-absorbed, you probably don't see anybody else in the whole room, and you're just wondering: do I look strange? Am I going in the same direction as everybody else? What am I doing here? And then you get a little better, you know what the dance steps mean, and you actually look up and say, wow, I have a partner here. Maybe I should pay attention to what they're doing. And then you start getting equally self-conscious and say, I'd better not step on their feet. But at that level, you've gotten yourself under control, and you're now reacting to and dancing with your partner, and when that goes well, it's a really cool feeling. I guess I've gotten good enough at dancing that I can experience what this phrase is saying. You're really in tune with your partner and you're spinning. And then, when you're a really good dancer, you're dancing with the room, because you realize that everybody has their patterns, you're dancing with the other couples, and there's a whole level of nuance as you glide throughout the room. And I see that, okay, there is a connection here with data management, right? Dancing with yourself: if you don't know who John Smith is, or the data quality is terrible, or you don't have his purchase data linked with him and you don't have his address, you're not going to get very far with graph, because you don't have great data, right? And the more you can link with your partner and start to make those connections, whether with a warehouse or any sort of connection, that's fine. But when you can really make all of those connections across your enterprise, that's the dancing with the room.
And to me, that's the knowledge graph. The power of the knowledge graph is that you can look holistically across your data sets without a particular lens in mind. Maybe you don't know the answers yet, and you can see some of those patterns, and that's, I think, the beauty of graph with the right use case, right? So hopefully that was helpful and not too off the wall, as I am known to get. So again, that's my connection with this idea of a knowledge graph: it is that dancing with the room. Yes, I now know that the Audrey Hepburn we're looking at is the one who was born May 1st, 1929. She's not a current customer, because she's dead. Sorry. But, and I didn't know this until I did some research, she has a son, Luca Dotti, and he was born in 1970. He's a customer who might be interesting, because he owns the fancy yacht she purchased. Do we even know that this person named Luca Dotti is very wealthy and owns all the mansions that Audrey Hepburn had? I assume. This is a fake example, though those people are real, right? But it just shows you an example where you understand the data. I know which Audrey Hepburn I'm talking about. Maybe from my structured data, I know that there's a family relationship between her and her son. And then from that, you can get some of that graph. It's a little bit of a hand wave, and there are a lot of pieces to that, but it shows that you may not otherwise get interconnections as rich as you can by connecting some of those data sets with graph. But as with many things in technology and in life, it's a "yes, and"; it's not an either-or. So graph can augment a lot of different technologies and can often add a lot of power. It doesn't give you an out to not have good data. I haven't found one technology that lets you get away without good data. Because you need to know that the Audrey Hepburn in your data is the actual human being, the actual customer.
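As a rough sketch of the Audrey Hepburn example, the mastered records and the family relationship can be held as subject, predicate, object triples and then walked in either direction. The identifiers, predicate names, and helper functions below are hypothetical, and the facts are illustrative in the same spirit as the example in the talk.

```python
# Hypothetical triples (subject, predicate, object); illustrative facts only.
triples = [
    ("audrey_hepburn", "born", "1929"),
    ("audrey_hepburn", "status", "deceased"),
    ("audrey_hepburn", "purchased", "yacht_1"),
    ("luca_dotti", "born", "1970"),
    ("luca_dotti", "child_of", "audrey_hepburn"),
    ("luca_dotti", "owns", "yacht_1"),
]

def neighbors(node, triples):
    """Return every (predicate, other_node) edge touching node, in either direction."""
    outgoing = [(p, o) for s, p, o in triples if s == node]
    incoming = [(f"inverse_{p}", s) for s, p, o in triples if o == node]
    return outgoing + incoming

def related_people(person, triples):
    """Find nodes linked to person by a family edge, whichever way it points."""
    family = {"child_of", "inverse_child_of"}
    return sorted(other for pred, other in neighbors(person, triples) if pred in family)

print(related_people("audrey_hepburn", triples))
```

The traversal only works because both people were identified unambiguously first, which is the master data point again.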
So before we close up, and I know there have been a lot of questions popping up, and I'm sure Tom has some good insights too, so you can pick his brain: I think another benefit of these DATAVERSITY webinars is that we can give you some insights and some data on data. So we did a survey with the DATAVERSITY folks this year. We do it each year on trends in data management or trends in data architecture, depending on the year. And if you look, compared to the traditional workhorses like relational databases, and, don't get me started, also spreadsheets, that's a whole other webinar. Let's all agree that that's probably not a great idea for enterprise data management. But you'll see that graph database, in terms of current adoption, is fairly low. Again, the audience for this is your traditional DATAVERSITY attendee. So that's kind of us, right? So I'm wondering if anyone in the comments wants to share who is using graph. Not the highest adoption. That's not actually a bad thing. I think there is interest. It doesn't fit every use case. Whereas I think every organization has some sort of relational need for its operational systems, its warehouses, and things like that, not everyone has the need for a graph, right? So it doesn't necessarily mean that's a bad trend, although I would like to see that increasing, because I do think there's a lot of opportunity. I do find it interesting, though, that we asked this question in two different ways. Who is planning on using this in the next one to two years? And there you will see that there is a spike going up. In this last survey we showed two years, and 2019 actually saw a higher spike than 2020. But still, about 20% are looking to grow. And that's in addition to people who are already using it. So probably 30% or more will be in the graph realm in the next couple of years.
Just as an overall observation, I think many of us are ready to have 2020 in the rearview mirror. And we saw in general, if you look at a lot of those numbers, that people are doing a lot less experimentation than normal with, I wouldn't say graph is really an experiment, but with non-traditional approaches. And if you read that report, I'm not trying to plug it, but it is free on the DATAVERSITY site, and I will send out that link, you'll see that in general people were going back to comfort. I think with the world sort of up in arms, a lot of folks were going back to relational or cloud-based databases as some of the more stable options. That said, I work with a lot of customers who are actually going the other direction and upping their investment in things like digital transformation, because everybody's digital. So I personally wouldn't avoid looking at some of these new technologies because of current events. I would go the opposite way and say start looking at some of these new technologies, because I noticed in my customer base that the people who had their ducks in a row and had a great data strategy and plans really didn't skip too much of a beat with this new world. I coined a phrase with my dad this morning: the nouveau work-from-home. I think a lot of us have been digital for years, and I'm sort of upset that other people figured out about working in their pajamas. But I do think there are a lot more of the nouveau work-from-home crowd. And now there's a lot more rich data that can be used to really support those people. Anyway, hopefully that information is helpful to people. So just to summarize, for those of you who are new to graph or interested in graph: among the myriad technologies you need to keep in your poor data management brain, think of graph, and I hope I'm not misspeaking for Tom.
It's a relationship-driven thing, where you can really use the power of the relationships to make great discoveries in your data. There are a lot of really interesting use cases. I only touched on a handful, but this has been in use with things like social networks, and then you have the whole idea of the enterprise knowledge graph: how can we look at our data holistically across the organization, without the lens of pre-described rules that I already know the answers to? I have a loose set of rules that can be flexible with an ontology, and I get a lot of different answers from the data, which can be a really helpful use case. So before we open it up for questions, just a blatant plug for my organization: if you need help with graph or anything else that we discussed, we'd be happy to help you. And then a bigger plug for next year's lineup. A popular one that we always do in January is a little bit of a sneak peek: what are the emerging trends, and what is that next big thing? Is it graph? Maybe something else. We will learn that in January, but that's always a fun one, to think with the lens of the new year about which technologies to look at. So without further ado, because I know there are a lot of questions, I'm going to pass it over to Shannon and open up for Q and A. Shannon? Donna, thank you so much. I always appreciate it. I'm getting a little bit of echo there. If you have questions for Donna, feel free to submit them in the Q and A section in the bottom right-hand corner of your screen. And just to answer the most commonly asked question, a reminder: I will send a follow-up email to all registrants by end of day Thursday for this webinar, with links to the slides and links to the recording, as well as anything else requested throughout. So diving in, and Thomas, we invite you to join in as you feel like in the Q and A here. Is JSON a standard format for graph databases?
If not, is there one? I'll give a quick answer and then I'll pass it over to Tom, who I'm sure has input as well. I would think of things like RDF and OWL, which are two, and SPARQL. There's really a whole new set of ways. If you Google things like ontologies, there's some great material, but RDF is a common one that people look at. But Tom, do you want to give some thoughts yourself? Yeah, sure. There's been an echo. RDF is the data model that our database uses. There are also other data models. JSON can be a serialization format, basically how the data gets exported. And there's a type of JSON called JSON-LD, which is a linked data format. That is a common serialization format where you can basically move data around from tool to tool in a JSON-like format. So yes, internally, our database uses the RDF data model, which is basically just a data model construct. Internally, we have our own representation in the code itself. Other databases support labeled property graphs, and you can look those up. And then there are many different serialization formats that can serialize the data: a Turtle file, which is an RDF format, or N-Triples, which is another RDF format that you can export to. There are many different file types. XML is another one it can be serialized as. Would you use data modeling repository content and place that in a graph database? Yes, well, I think MDM, and that's probably a whole other webinar, would be a source that can be disseminated either to the operational systems or to the reporting systems. But yes, then that would be a source, and there'd be many sources. So I don't think it's limited to MDM. I think often MDM is the one that would feed those. If you think back to Tom's example, there was the FDA drug example. They have their repositories of data that MDM can augment.
But yes, you're going to have different data sources that the graph can sit upon. And I see MDM or data quality as a foundation that feeds those systems, which then feed the graph. I hope that makes sense. Tom, feel free to chime in with your thoughts as well. Yeah, we would ingest the data models from each one of the sources, and then inside of our Anzo knowledge graph platform, you could build your model however you wanted to represent the data. Do you use that as a tool versus a different type of data modeling tool? So if you think about it in the traditional MDM world, we have our golden record. You could create a golden record that says, I want customer information from databases one, two, and three, and build out this customer record, right? You could build that in a tool, but it will look much different in a graph. And so we give you the tools to build that in the graph, and we can automatically ingest those data models from databases one, two, and three, and then we let you link those together and pick out the different features that you need for your usage. All right, and going back to slide 20, and still on the topic of MDM: is data actually stored in the virtualization layer, or are they more like views? Yeah, good question. So I guess the main difference, without doing a whole webinar on this, which we did a few months back, is that with the one on the left, the centralized model, you're physically moving data into that MDM hub. So where Tom mentioned that golden record, that's why that central thing is gold: you really take those attributes, copies of that address, into that MDM hub, and then it can go bidirectionally. You're moving the data into those systems. With a virtualization layer, the data mostly stays in place, and you're doing a query layer on top of it. So that would be the foundational difference between those two.
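The difference between the two styles can be shown with a toy sketch: a hub that physically copies attributes versus a view that assembles them from the sources at query time. The two source systems, their contents, and the function names are all hypothetical, and a real MDM hub would add matching, survivorship rules, and governance on top.

```python
# Hypothetical source systems keyed by customer id; the values are illustrative.
crm = {"c1": {"name": "John Smith", "address": "1 Old Road"}}
billing = {"c1": {"name": "J. Smith", "address": "2 New Street"}}

def build_hub(crm, billing):
    """Centralized style: physically copy the chosen attributes into a hub."""
    return {
        cid: {"name": crm[cid]["name"], "address": billing[cid]["address"]}
        for cid in crm.keys() & billing.keys()
    }

def virtual_view(cid, crm, billing):
    """Virtualized style: leave data in place and assemble it at query time."""
    return {"name": crm[cid]["name"], "address": billing[cid]["address"]}

hub = build_hub(crm, billing)                # snapshot copied into the hub now
billing["c1"]["address"] = "3 Newer Avenue"  # the source system changes later

print(hub["c1"]["address"])                         # the hub still holds its copy
print(virtual_view("c1", crm, billing)["address"])  # the view reads the live source
```

The hub keeps its copy until it is refreshed, while the view always reflects the source, which is why the virtualized style leans so heavily on the sources being well understood.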
Are you moving data into one place and then querying it from there, or are you leaving data in its place, and then through that query you say, I want to take attribute A from system A and attribute B from system B? That is the main difference between those two. But again, as always, Tom, if you want to chime in on that as well. You're muted, Tom, if you are speaking. Thank you. The only thing I would add is that we can support both virtualized and in-memory: you can ingest part of the data into the graph and have part of it virtualized. So we support a combination of both of those. In some operational systems, the data is very active and is updated regularly. And then you may want to query to find the golden record and pull out the information you need. You may want to query that source in real time, so you virtualize that source, and for other data sources you simply copy the data on some periodic basis. Going back a little bit again, what did you mean by a time series database? So a time series database can be a database itself or a way of using a database. Often that is common with things like IoT, right? Where you want to get data sets over time. So every minute I'm getting a read from this particular data source and understanding the time patterns across it. I've also seen it used when we were working with an education department, and it was that time dimension they were interested in, with, say, a student's time from being a freshman to a senior to a postgraduate, and just laying out that data in more of a time series. So the time dimension is flattening it out that way, if that makes sense. But maybe the easier way to think about it is something like IoT, where I have a machine that every minute is sending out data, and then you're looking at the time patterns across it.
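As a small illustration of organizing readings by time slice, the sketch below sums per-minute readings into hourly buckets and picks the busiest hour. The readings, dates, and function names are made up for the example; a dedicated time series database would do this kind of bucketing at far larger scale.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, value) readings from one device; not real measurements.
readings = [
    (datetime(2020, 12, 1, 6, 59), 3),
    (datetime(2020, 12, 1, 7, 0), 40),
    (datetime(2020, 12, 1, 7, 30), 55),
    (datetime(2020, 12, 1, 8, 1), 5),
]

def total_by_hour(readings):
    """Sum readings into hourly time slices keyed by hour of day."""
    totals = Counter()
    for timestamp, value in readings:
        totals[timestamp.hour] += value
    return dict(totals)

def busiest_hour(readings):
    """Return the hour of day whose slice has the largest total."""
    totals = total_by_hour(readings)
    return max(totals, key=totals.get)

print(total_by_hour(readings))
print(busiest_hour(readings))
```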
Was there a spike in internet use at this particular time? Say, between seven and eight in the morning is when we're seeing the spike. The way it's serialized is by that time slice. The database could be anything, right? It could even be a spreadsheet you're putting it in, which probably isn't ideal. But that's the idea of it. I like it. I think we have time for at least one more question here. How scalable are graph databases? Can graph databases work with millions of records? I'm unable to digest that graph databases can be truly scalable. I'll answer, and then I'll definitely let Tom chime in, because I'm sure he has an opinion. But I think yes. A lot of the organizations that are using graph are very large scale organizations, and that's the power of graph. Yes, if I have a spreadsheet of 10 people, I can probably eyeball it and see who's related to whom, right? But once you do have massive scale, you can't use traditional patterns. And I think that is where some of the value is. So I would say definitely, and I think that is also behind the rise in popularity. Now it's really about looking across these large scale data sets, but I'll pass back to you, Tom, to give your two cents. Yeah, thanks, Donna. Here at Cambridge Semantics, enterprise scale is really critical, and our database is an in-memory, distributed MPP graph database. We tested it on up to a 200 node cluster and over a trillion facts. So this was a really huge cluster of systems operating as a single database, and we published those benchmark results. I think that one of the reasons graph has not seen adoption as quickly as we would like in the past is that many of the graph databases have not been able to scale. Graph databases up until recently were not distributed, were not MPP, et cetera.
Now we're able to scale both vertically and horizontally to handle any data volume. That's really critical for our business, because we're ingesting data from all over an organization and sometimes keeping all of that in memory for high performance. So we see sub-second response times across two years of data for analytics. Yeah, and just to add to that, I was looking at the chat while Tom was talking, and some people chimed in. It's true that some of the first companies that used graph were some of the big ones. Think of Google and Amazon as some of the big players. I think the exciting part now is that a lot of similar technologies, if not the technologies themselves, are now available to regular people. And that democratization of graph is, I think, something really powerful that people will want to consider. Well, thank you both so much for this great presentation, and thanks to Cambridge Semantics for sponsoring today and helping to make these webinars happen. Really appreciate it. We are at the top of the hour here. So I just want to send a reminder to everybody that you will get the follow-up email by end of day Thursday with links to the slides and links to the recording from today's presentation. And if you have any questions, feel free to email us. Thanks for being so engaged in everything we do. Donna and Thomas, thank you so much. Really appreciate it. Thank you. Always a pleasure. Thank you. Enjoyed it. Thanks, everybody. Stay safe out there and have a great day. Bye bye.