This will be an introduction to graph databases. So for all the people who are thinking about them and basically asking the question, what are they there for? Yep, this talk is for you. For the more advanced talks and topics, stay in this room and, from what we saw, you will learn some things.
Okay, some things about me. I'm the chief architect at ITMAGINATION. It's a big Polish IT company. There I'm helping to run one of the biggest .NET projects in Poland. After hours, I basically dig into data and making sense of it, so yeah, Neo4j is for this, and also into distributed systems. And I'm also running a pet project called Cookit.pl. It's a contextual search engine for cooking recipes, and we will look at it later.
Okay, so graphs. What are graphs? This is a graph. It's a bit of a lonely graph. It's a single node, but still it's a graph. So let's make it more interesting. Let's add a friend. And if we are adding a friend, we are adding a relationship. As simple as this graph is, it's very useful, because this is basically Facebook. I know you, you know me. Yeah, that's it. And if we'd like to spice things up a bit, we can add a direction to this relationship, and this actually is Twitter. If I am following you, you are not necessarily following me. So that's a directed relationship.
Next, we are adding a weight to the relationship. This is basically the most common thing people think of when they think of graphs: cities and roads, and roads also have a direction. Next, let's get a bit funky. If we allow two nodes to have multiple relationships of the same type between them, we get a multigraph. While they are funky, they basically don't have that much use, and they can, in most cases, be implemented with the previous one, the weighted graph, just by counting the edges. Let's go next. Next we have a labeled graph. So basically we are able to add labels to nodes and relationships. It's kind of useful.
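The graph variants from the slides can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names (`friends`, `follows`, `roads`, `collapse_multigraph`) and the numbers are made up for the example and do not come from the talk.

```python
from collections import Counter

# Undirected "friend" relationship (the Facebook case): each edge is an unordered pair.
friends = {frozenset({"me", "you"})}

# Directed "follows" relationship (the Twitter case): (source, target) tuples.
follows = {("me", "you")}

# Weighted, directed edges (cities and roads): (from, to) -> distance, made-up numbers.
roads = {("A", "B"): 120, ("B", "A"): 130}


def knows(a, b):
    """Undirected lookup: the order of the two people does not matter."""
    return frozenset({a, b}) in friends


def is_following(a, b):
    """Directed lookup: a following b says nothing about b following a."""
    return (a, b) in follows


def collapse_multigraph(edges):
    """Turn a multigraph (repeated edges) into a weighted graph by counting edges."""
    return dict(Counter(edges))


# Three parallel edges between two nodes become two weighted, directed edges.
calls = [("me", "you"), ("me", "you"), ("you", "me")]
weighted = collapse_multigraph(calls)
```

Here `weighted` maps each (source, target) pair to the number of parallel edges, which is the "just count the edges" trick for replacing a multigraph with a weighted graph.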
There are some implementations, but the best and most practical graph is the property graph. So we are able to add labels and basically any metadata connected to the nodes and the edges.
And one question. Does this remind anyone of something? And if I do this, if I add one-to-many, many-to-one and so on? No, there won't be any treats. Basically, when I was starting to read about it: this is how we draw relational databases. So there's a guy in funny pants standing here and telling you that we have graph databases, but we have relations in relational databases and we have relations in graphs. So what's up?
To understand why relational databases are not at all similar to graph databases, let's go a bit into history, because history is fun. Basically, relational databases started in 1970 with Mr. Edgar Codd, who wrote a paper. It had some pages. To best understand why it gained so much traction that relational databases are everywhere now, let's see what else happened in the 1970s. Well, RAM actually got quite cheap, because for $700 we could get one megabyte. And for the young people in here, a megabyte is the smaller one, below a gigabyte, yeah. And it got cheap because two years earlier, it was about two and a half thousand. Next, IBM released its first dishwasher with 100 megabytes of disk space. When I'm saying dishwasher, it's basically because it looks like a dishwasher. It's here. And it also has one thing in common with a dishwasher: the access time and the read time. In the specs, they actually state down to the byte how fast you can transfer data, and it's like one megabyte comma zero, zero something. So yeah, it was slow. Next, Intel releases its 8-bit processor, so we can play Mario, and we have almost 800 kilohertz. Again, kilohertz, not megahertz. Yeah, it was slow. And three years later, we have 16 bits and one megabyte of memory. So yeah, things just got awesome.
Next, Oracle, of course, implements the relational database paper, and we have our relational database, and Apple releases its first and only non-white computer. Next, we have the absolute grandfather of almost every processor you have in your notebooks or PCs or anything, the Intel 8086, and we have a whopping 10 megahertz of processing power. So yeah, this is the relational database history.
Now, in contrast, let's look at how the graph history looks. First, in 1736, there is the graph problem by Mr. Euler. This story has kings and queens and so on, but basically, we all agree that it's really the beginning of graphs in mathematics. Next, the map coloring problem, and we'll get back to it in some time because it correlates with computer history. Next, if we are talking about math and we want to be serious, we should have a book. So in 1936, we have a book, and that's the moment graph theory in mathematics officially starts. Next, the map coloring problem is partly solved. Well, why partly? Because basically the proof said, yeah, we are right, we can color every map using four colors, but there are about 20,000 maps that would have to be processed manually to actually prove that they can be colored with four colors. And because we didn't have enough crayons and we are lazy, we had to wait 30 years to actually solve it, and we solved it using a computer, of course.
Next, Intel releases its Pentium 4, and it has a whopping almost three gigahertz of processing power. And another question: does anyone remember, or is anyone old enough, like me, to know which part of the PC also evolved quite rapidly thanks to the Pentium 4? The radiator. It was a time when we thought that, yeah, water cooling would be in every PC. The Pentium 4 had such massive heating problems that basically Intel said, well, let's put it in a drawer. They took the Pentium 3, put two chips in one processor, and this is how we got the Core Duo.
And next, in 2003, we have the first commercial graph databases. So why am I showing this? Because if you look, this history is completely different. When we got relational databases, we had a massive gain in computing power ahead of us, and when we got graph databases, we already knew that we can't go faster on one core. And one more thing. If you think about relational databases and about the normal forms, and you think for a moment, why do we have normal forms? It's quite simple: to actually save the amount of disk space that we are using. If you are replacing a string with an int, so creating another table, you are actually reducing the space it will take. And now gigabytes are cheap. Okay, so we know the why.
Now let's see what the landscape actually looks like. First, we have FlockDB. FlockDB is basically a graph database for the poor. Why? Because FlockDB can only get the direct neighbors of a node. It can't traverse through the graph. You may think, why would someone implement it? And the answer is quite simple. FlockDB was implemented by Twitter, and Twitter has a use case that if you tweet something, it gets all the people following you and moves that tweet to their Twitter feeds, and that's all. So it actually shows that many graph databases are implemented with some specific problem in mind. If you use them for that specific problem, they are awesome. In most cases they are not general purpose databases, the way we made relational databases over 50 years.
Next, the Microsoft Trinity project. It was later called Graph Engine, Microsoft Graph Engine, and then it was, I think, killed. But the idea behind it is: Microsoft has Azure, so they thought, let's process massive graphs in Azure. So what they did is basically create a graph engine that only sends messages. They abstracted it and, in a way, created virtual actors for graphs. So yeah, it was created with Azure in mind.
Next, Titan, which has now been replaced by JanusGraph. It's a graph engine where you can basically swap storage engines, and it can combine a lot of things. Next, OrientDB. If you don't know what to use, OrientDB basically has relational capabilities, graph capabilities, key-value store capabilities and everything you can imagine. Yeah, people are using it. Next, InfiniteGraph and DEX. One is for IoT and low latency, the other is for mobile devices. If you have a case where you actually want to calculate graphs on a mobile device, why not? Next, Neo4j, and it's not without reason that it's in the top right corner. We'll be seeing a lot of Neo later in this talk, so I won't concentrate on it now. Next, AllegroGraph and HyperGraphDB. Those databases are being developed with one use case in mind: basically knowledge storage and going through knowledge. Okay, let's go further.
Usage. This is the Amazon page, and you can't see it, but basically when you go to Amazon, you can see that Amazon doesn't show you similar books. They show you books that people frequently bought together, and so on. So how does Amazon do it? This is my guess, and it's quite simple. Let's take one person. He goes to Amazon, browses a bit and buys something. Let's take another person, and another. So basically, when you think in a graph way, where those nodes are pages, so products, you can actually see the patterns in the data. Some nodes and some edges just get thicker and light up like a Christmas tree. So what do we see from here? Basically, if anyone enters this node, we should be showing them this node, because people aren't buying this node as often directly, but they are going from this node to this node. So they may be similar in some way that we can't know precisely, but yeah, this is our buying path. Let's go to a similar use case, also connected with money: money laundering.
If you have a million euros, you can't actually just buy a house, buy a Porsche and so on, because the IRS will come and ask you a simple question: where did you get the money? And then you will be sad, yeah. So basically what you do is give this money to a lot of people, and it goes through a chain of restaurants and so on. And this is exactly what was uncovered in the Panama Papers. How did they do it? They just input one invoice at a time, basically who paid whom and how much. And after some time they started seeing some relationships, and thanks to this they were able to see the really stupid face of the prime minister of Iceland when he was asked what his connections were with some foreign offshore company. Yeah, my computer is attacking me, so sorry. Some technical problems, sorry. Okay, let's go further.
NLP, and this is an oversimplification, I know, but stay with me. If we have a sentence saying, find me all sushi restaurants in New York that my friends like, we can easily understand each word of the sentence: sushi restaurants that my friends like. But answering this question is quite complicated. Well, it is not, because if you graph this: get me, get my friends (I don't have that many, but yeah, let's say), get the node of the location New York, get the type of cuisine. So as you see, we have this part, and we have this part. So the answer to this question will actually be the nodes that fill in the graph. It's basically that easy. And there's also one interesting thing, because if we are talking about, for example, Facebook, we are talking about billions, hundreds of billions of nodes. But actually no, in each part of this query we only touched a node and its children. So it's fast.
Yeah, next, knowledge graphs. So basically the New York Times had this case: they have a huge knowledge graph, because if you write any article, you want to have contextual, current news, current data.
So if you take Apple, Apple can be a fruit, can be Apple Inc., can be some films. There is even a Star Trek episode. There is also Apple Records, that's the Beatles, and a Russian party is nicknamed Apple. So there are a lot of them. And if you look at those types of data in a graph model, so basically you have the Apple, it's one Apple, then you label each type, and according to each label you look at different relationships, you can do really awesome stuff like the knowledge graph from Google. You see Euler, and Euler has some fields like born, died, education and influenced. And basically, influenced is a field you wouldn't normally put on every person, but someone may argue that, yeah, it's for famous people and so on. So you have the field influenced. But then again, you have Spock. And Spock has a field species with a value of Vulcan, and this value is a link. So with graph databases, when you are adding knowledge, you are not changing your structure. You are adding knowledge, and this knowledge shapes your database. You don't have to think ahead about your structure, your table relationships, because if you do that, you won't have Spock.
Next, performance, in short theory. Basically, what are you doing if you are implementing a hierarchical structure in a relational database? You have, let's say, a table persons, which is quite wide, it has many columns, and you have a table person child. And the problem with person child is that it's quite narrow, because it has ID, parent ID, child ID, but it is huge. And what you are doing when you are asking a graph question relationally is basically taking one ID and matching it against this huge parent child table. Then you have some IDs, like five and ten. And then, once again, you are matching them against this huge table. You get the point. It's not the best way to do it. In graph databases, you actually store the IDs at the node. So getting the node, you actually have the IDs.
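The difference between the two approaches can be sketched in plain Python. This is a simplified illustration, not how any particular database is implemented: the `person_child` rows, the ids and the function names are made up, and a real relational engine would use indexes rather than scanning the whole table at every level.

```python
# Relational style: a narrow person_child table of (parent_id, child_id) rows.
person_child = [(1, 5), (1, 10), (5, 7), (10, 8), (2, 9)]


def descendants_by_join(root_id, depth):
    """Each level is one more pass (a 'join') over the whole person_child table."""
    frontier, found = {root_id}, []
    for _ in range(depth):
        frontier = {child for parent, child in person_child if parent in frontier}
        found.extend(sorted(frontier))
    return found


# Graph style: every node keeps the ids of its own children next to it.
children_of = {}
for parent, child in person_child:
    children_of.setdefault(parent, []).append(child)


def descendants_by_adjacency(root_id, depth):
    """Each level only follows the ids already stored at the current nodes."""
    frontier, found = [root_id], []
    for _ in range(depth):
        frontier = sorted(c for node in frontier for c in children_of.get(node, []))
        found.extend(frontier)
    return found
```

Both functions return the same ids for the same question. The first one touches every row of the table at every level, while the second one only touches the nodes it actually walks through, which is the point of storing the ids at the node.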
So you don't have to process each ID individually and do this join all the time. But enough talk, let's see examples. Let's take the Twitter dataset from 2009. Basically, you will have the slides. This is the whole description: almost 2 billion edges, more than 40 million users. And what they did, using Titan, is basically buy the cheapest virtual machines on Amazon plus six really expensive ones. And what they were able to achieve is that those machines were processing all the time, in a loop basically. So these are the times that they got. And it's quite fast. And what is even more interesting, it cost them 11 bucks an hour. And that's cheap.
And yeah, let's go to Neo4j. Why Neo4j? Why did I get interested in Neo4j? Because there's a site, db-engines.com. And on this site, Neo4j is the 20th or 21st most popular database. And it's a really good position. And of course, it's the number one graph database there is. It has drivers for almost any language. And Cypher, we now have openCypher, so Cypher is basically the industry standard for asking graph questions. That's it. And of course, there is a free version.
And yeah, so my problem. I have Cookit. Cookit is a contextual search engine. So what it does is basically scrape the web. It looks for pages that have a cooking recipe. Then it extracts the text. Then from the text, it extracts the ingredients, amounts, units and a lot of other data. It does this automatically. So one of the most critical parts is basically connecting the ingredients found in the text with the actual ingredients. And this is actually the graph of my ingredients and units. It has about 3,000 nodes and more than 3,000 edges. And because this is a tree, so there are no cycles, the depth of this graph is about nine levels. And there are eight main ingredient groups. So basically, my problem is that I have errors. And sometimes I misinterpret some ingredients.
So I basically wanted to do a sanity check on my data. I wanted to ask whether there are any recipes that have both fish and sweet drinks, like Coca-Cola. You don't normally see those two things in one recipe if any sane, non-student person is cooking it. I used to do really crazy stuff when I was cooking as a student.
So basically, let's do it in Neo. First, I have a MATCH. MATCH is the keyword. And this will be Cypher. Cypher is a crazy love child of SQL and ASCII art. So yeah, but it's really good. It's better than SQL. You have a MATCH. And MATCH says that, yeah, I will be showing patterns. And this is a very important word. I will be showing patterns, not exact matches I will be looking for. So next, I'm saying that I want to find a node. If I put anything in parentheses, this is a node. So yeah, you can actually see the ASCII art. And this node has to be of type recipe. And I'm calling it R. Next, I also want ingredients. The same notation. Next, I want a relationship between those. And this is the name of the relationship. And yeah, this actually shows you the ASCII art part. Next, I would like to return them. So yeah, R and N.
Next, a bit more funky part. And the most awesome part is this: I get the ingredient, and I'm saying, well, traverse from this node through relationships of type from and specification, and do it from 1 to 9 levels deep, basically, until you get to a node of type ingredient which has a name of sweet drinks. And I'm doing exactly the same with fish. And as it turns out, you really should put a limit on it. Yeah.
This query actually takes about 20 seconds on SQL Server. On Neo, I was able, without any problems, to get to 100 milliseconds. Yeah, because I am asking a graph query, and it's freaking fast. And what's more, I could probably do it faster on a relational database. But I would have to denormalize my data. And in this case, this is a sanity check. It's not a part of my domain.
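The query walked through above can be approximated as follows. This is a hedged reconstruction, not the query from the slide: the relationship type `HAS_INGREDIENT`, the spelling of the `FROM` and `SPECIFICATION` types, the `name` property, the variable names and the limit of 25 are all assumptions, and the Cypher text is only stored in a string here, not executed. The Python part mimics the same "walk 1 to 9 levels up the ingredient tree" idea on a tiny made-up dataset.

```python
# Assumed shape of the Cypher query (kept as text only, not run against Neo4j).
SANITY_CHECK_QUERY = """
MATCH (r:Recipe)-[:HAS_INGREDIENT]->(a:Ingredient),
      (r)-[:HAS_INGREDIENT]->(b:Ingredient),
      (a)-[:`FROM`|`SPECIFICATION`*1..9]->(:Ingredient {name: 'sweet drinks'}),
      (b)-[:`FROM`|`SPECIFICATION`*1..9]->(:Ingredient {name: 'fish'})
RETURN r, a, b
LIMIT 25
"""

# Tiny in-memory stand-in for the ingredient tree: child -> parent (made-up data).
parent_of = {
    "coca-cola": "cola drinks",
    "cola drinks": "sweet drinks",
    "salmon": "fish",
    "flour": "grain products",
}

# Made-up recipes: name -> ingredients found in the text.
recipes = {
    "cola-glazed salmon": ["coca-cola", "salmon"],
    "pancakes": ["flour"],
    "rum and cola": ["coca-cola"],
}


def belongs_to(ingredient, group, max_depth=9):
    """Walk 1..max_depth steps up the tree, like the *1..9 part of the pattern."""
    node = ingredient
    for _ in range(max_depth):
        node = parent_of.get(node)
        if node is None:
            return False
        if node == group:
            return True
    return False


def suspicious_recipes(limit=25):
    """Recipes containing both a sweet drink and a fish, capped like LIMIT."""
    hits = [
        name
        for name, ingredients in recipes.items()
        if any(belongs_to(i, "sweet drinks") for i in ingredients)
        and any(belongs_to(i, "fish") for i in ingredients)
    ]
    return hits[:limit]
```

On this toy data only the cola-glazed salmon recipe contains both groups, which is the kind of hit the sanity check is meant to surface for manual review.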
So I don't want to change my domain, change my database, just to answer this query faster.
Yeah, so to wrap up, because my time is actually ending, I think. When to use graph databases? When you are talking about hierarchical data structures, basically. Next, there's a saying that you should use graph databases when your relationships are more interesting than your data. What this sentence is saying is that, basically, if you are more interested in the relations than in your data, like the actual text and the blob storage and so on, go to graph databases. Next, for searching for patterns in data. You can visualize it quite easily, and there are really good tools that integrate with Neo4j very easily. They were used in the Panama Papers, for example, to get a sense of the data.
And when should you not use graph databases? First, for data-manipulation-heavy systems. And if your nodes will have hundreds of fields or will hold huge blob objects and so on, think about it twice. Because graph databases are for relationships, not exactly for storing blob files with 100 megabytes in each node. Next, when you are thinking about big data, ACID transactionality probably won't be your best pick, because it will be heavy. And the last but the most important: I really wouldn't like someone to say, yeah, yeah, I will just scrap my relational database and replace it with a graph database. Because in some cases, it will be really awesome and fast. But graph databases are not the most general purpose databases. Basically, relational databases aren't either. But during those 50 years, we were able to make relational databases good at almost anything, but not super good at anything. Yeah, so questions, and I will wrap up. Thanks. Thank you.