Our next talk is about graph databases. Please welcome Francisco Fernández Castaño.

Hi, my name is Francisco Fernández. I am from Madrid, in Spain. I work as a software engineer at biicode, and I also run the C/C++ user group there in Madrid, and also the Neo4j user group. Today I'm going to talk about graph databases: a little connected tour.

So let's start at the beginning. There are a lot of people talking about NoSQL, big data, and why relational databases don't scale, but this kind of database, the graph database, is based on graph theory, and graph theory is quite an old topic. Let me introduce you to this guy. You probably know him: he's Euler. He was a mathematician from the 18th century, and he is the one to blame for graph theory. He developed a lot of mathematics, graph theory included, and he had a lot of time to think and to ask himself questions. He used to live in Prussia, in Königsberg (I think I've pronounced it well), and he asked himself: the old town of Königsberg has seven bridges. Can you take a walk through the town, visiting each part of the town and crossing each bridge only once? Does somebody know the answer?

Well, the answer is no, but that is not the interesting part of the question. With this problem he started to develop graph theory, and thanks to his work we have these kinds of algorithms, these graph databases and everything. He ended up defining a graph in this form. It's a very concise form: a graph is just an ordered pair of a set of vertices and a set of edges that connect those vertices. I have to read it. It sounds scary, but we are used to dealing with graphs every day. Even my mom is used to dealing with graphs.
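The seven bridges question above can be checked with a few lines of Python. This is a minimal sketch, not something shown in the talk: the land masses get the made-up labels A to D, the graph is written as a list of edges (the "pair of vertices and edges" form), and the check relies on the standard result that a connected multigraph has a walk crossing every edge exactly once only when it has zero or two vertices of odd degree.

```python
from collections import Counter

# The four land masses of Königsberg and the seven bridges between them.
# The letters are labels invented for this sketch: A is the central island,
# B and C are the two river banks, D is the second island.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges from the central island to one bank
    ("A", "C"), ("A", "C"),  # two bridges from the central island to the other bank
    ("A", "D"),              # one bridge between the two islands
    ("B", "D"),
    ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return count

def has_euler_walk(edges):
    """True when zero or two vertices have odd degree.

    Assumes the graph is connected, which holds for Königsberg.
    """
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(BRIDGES)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False
```

All four land masses have an odd number of bridges, so the walk is impossible, which matches the answer given in the talk.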
We have an example here: a map of the Manhattan underground. The stations are our nodes, and the connections between the stations are the relationships, or the edges, of our graph. So most of you have come here to Berlin, and you have probably run some graph algorithm to find out how to get here, to Alexanderplatz: "I am in this place, how can I go to Alexanderplatz?" Probably it was not the best, the shortest path, but you found a solution.

Okay, but what is a graph database? Does somebody know what a graph database is? Any idea? No? Okay. It's a very simple concept: it's just a database that uses a graph as its main data structure. Today I want to talk about Neo4j, and Neo4j implements a property graph. And what is a property graph? Here we have the definition of a property graph in the form of a graph. A property graph stores nodes and also relationships. These relationships connect our nodes, and both of them can have properties. And what are properties? Just key-value pairs.

As I told you, today I'm going to talk about Neo4j. Neo4j is a graph database written in Java. Sorry, it's not Python. It provides ACID transactions, a REST interface, and the Cypher language, which is a declarative language to query the database. It's open source and it's a NoSQL database.

But you are probably asking yourself: why should I care about graph databases? I usually work with MongoDB, or maybe Postgres or MySQL, and everything is okay. Why should I learn a new technology?
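The property graph model described above (nodes, relationships that connect them, and key-value properties on both) can be sketched in plain Python. This only illustrates the data model; it is not how Neo4j stores data, and the class names, the `FOLLOWS` type, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Properties are plain key-value pairs.
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: Node          # node the relationship leaves from
    end: Node            # node the relationship points to
    rel_type: str        # the meaning of the connection, e.g. "FOLLOWS"
    properties: dict = field(default_factory=dict)

def outgoing(node, relationships, rel_type):
    """Nodes reached from `node` through relationships of the given type."""
    return [r.end for r in relationships
            if r.start is node and r.rel_type == rel_type]

# Hypothetical sample data.
alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
follows = Relationship(alice, bob, "FOLLOWS", {"since": 2014})

print([n.properties["name"] for n in outgoing(alice, [follows], "FOLLOWS")])  # ['Bob']
```

Note that the relationship carries its own properties, which is what lets a property graph keep facts about a connection on the connection itself rather than in a separate join table.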
Well, I think there is one main reason to care about these technologies. With the traditional way (and by the traditional way I mean working with relational databases), if we are dealing with highly connected data, the approach is a bit artificial, because relational databases weren't designed to deal with connected data. So we have some problems, because we have to deal with meta-information. We have to deal with foreign keys. If we are working with a many-to-many relationship, we even have to create a new table to hold this meta-information, and we have to take care that this information is consistent. So I think that in the relational case we are mixing our data with our metadata.

If we are working with a document database, we have the same problem; if we want to work with connected data, the scenario is even worse. We have to run some Hadoop process or whatever to get some information, and we cannot get insight in real time. And we will probably face scalability problems in highly connected domains, so we will probably have performance problems.

Some guys, the authors of Neo4j in Action, ran an experiment. They wanted to compare the performance of MySQL and Neo4j in a highly connected environment. They modeled a domain, a social network with users that follow each other. I think they stored a million users and a lot of relationships between them, and they wanted to ask: give me the friends of my friends, the friends of the friends of my friends, and so on, down to a depth of five. Here is the table. We can see that at the first level the times are similar, but when we go deeper the times are far apart: MySQL takes a long time to finish. Why does this happen?
Probably we will design our relational database in this way: we will have our user table and then a many-to-many relationship, the relationship between users, in another table. So each time we are looking for the friends of one user, we have to look in this table. It's an index lookup, and it has a complexity of log n because we are searching an index. Graph databases, on the other hand, are designed to give us the neighbors for free. They are stored in a way that lets us get them in constant time. What happens when we go deeper? In a relational environment we get this complexity, because for each level of depth we have to look into our table; we have to do an index lookup. So it is multiplied by the depth of our lookup. When we're working with a graph database we end up with this complexity, because we only have to traverse our graph.

Another reason to think about using graph databases is that we can translate our domain model in a natural way. When I face a problem, I usually grab a pen and paper, and I end up with this kind of drawing. I have some entities. They are related to each other. The relationships have some semantics. So this is some kind of diagram, and if we are using a graph database we can translate it to our storage directly. We don't have to worry about normalizing the model and blah blah blah, the kind of thing we have to do when we are working with relational databases. Using a graph database for, I don't know, storing documents is probably not the best solution, but for other scenarios it could be reasonable.

Okay, what are the use cases for graph databases? For example, we have social networks, the well-known use case: someone follows someone. This is the model of Twitter, for example. Then we have other use cases, for example geospatial problems.
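Returning to the lookup-cost argument above, the difference between the two storage styles can be sketched in Python. This is a toy model under stated assumptions, not the benchmark from Neo4j in Action: the `FOLLOWS` pairs are made up, a sorted list searched with `bisect` stands in for a relational index (logarithmic per lookup), and a dictionary of neighbor lists stands in for a graph store where each node already holds its neighbors.

```python
from bisect import bisect_left

# Hypothetical "follows" data as (follower, followed) pairs of integer ids.
FOLLOWS = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]

# Relational style: a join table with a sorted index on the follower column.
INDEX = sorted(FOLLOWS)

def friends_via_index(user):
    """Binary search in the sorted index: O(log n) for every lookup."""
    lo = bisect_left(INDEX, (user,))
    hi = bisect_left(INDEX, (user + 1,))
    return [followed for _, followed in INDEX[lo:hi]]

# Graph style: each node keeps direct references to its neighbors.
ADJACENCY = {}
for follower, followed in FOLLOWS:
    ADJACENCY.setdefault(follower, []).append(followed)

def friends_via_adjacency(user):
    """Follow the stored neighbor list; no search through an index."""
    return ADJACENCY.get(user, [])

def reachable(user, depth, friends_of):
    """Everyone reachable from `user` within `depth` hops."""
    frontier, seen = {user}, set()
    for _ in range(depth):
        frontier = {f for u in frontier for f in friends_of(u)} - seen - {user}
        seen |= frontier
    return seen

print(reachable(1, 2, friends_via_index))      # {2, 3, 4}
print(reachable(1, 2, friends_via_adjacency))  # {2, 3, 4}
```

Both lookups return the same answer; the point is that the index version pays a search on every hop, so the cost grows with both the table size and the depth, while the adjacency version only pays for the nodes it actually visits.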
I want to go from point A to point B, so this is a classic algorithm. There are also people using graphs for fraud detection, authorization, network management, and building recommendation systems in real time, and there are a lot of other use cases.

Okay, now I will start talking about Neo4j. Let me introduce you to Cypher. Cypher is a declarative language. It is ASCII oriented, so in some way we translate what we are representing into ASCII-art drawings (you will see it better in later slides), and we look for patterns.

Neo4j gives us these layers to access its APIs. On top is Cypher. Then we can access other APIs, such as the traversal API. We have to write in some JVM language to access these APIs; we can use Jython if we want to.

Okay, what is the simplest thing that we can represent using Cypher? This: a node that is related to another one, a is related to b. On the top we see a drawing, and below we have the Cypher representation. The translation is very straightforward, as you can see. Then we can represent other things. For example, here I'm saying that Eric Clapton played in Cream. We have one node that is Eric Clapton, we have Cream, which is a band, and we have a relationship with some semantics. So we are relating the two entities using a graph.

Then we have our example of a social network. We have some users. In Neo4j we can label our nodes, because we probably want to categorize them. So here I'm saying: okay, I have some users, and they are related, they follow each other. I can also add properties to my nodes and to my relationships. Here I'm representing that Eric Clapton has some properties, in this case a name, which is Eric Clapton, and the relationship also has a property, which is the date when he started to play in that band. Here I'm trying to represent musicians that play in bands, and the styles that these bands are labeled with.

And what is the simplest thing that I can query with Cypher?
This: I am asking Neo4j, give me all the nodes that are connected by a relationship of type "played in". So it will give me all the nodes that are connected by this relationship, and it returns all of them. I can look for other things. Here I'm asking Neo4j: okay, give me all the nodes that are related through "played in" and, on the other side, are related through "labeled". So basically it will return the musicians that play in a band and the style of that band, and it returns some properties.

Okay, but we can look for particular nodes. Here I'm asking Neo4j: look up in your index a node that has a property name with the value Clapton. So we will have a starting point: the node with this value, which represents Eric Clapton. I want to know all the bands that Eric Clapton played in and the style of each band. This is the goal of this query, and I return some properties of these nodes. In this case I get the name of Eric Clapton, the name of the band, and the style of the band.

Then I can look for more patterns. Here I'm saying to Neo4j: okay, find the Eric Clapton node again, and give me all the bands that have the style blues. I'm looking for two nodes in this case. I'm asking Neo4j to look up the node with the property name Clapton and also the node with the property blues, and to look for the bands that have these relationships, and it returns them ordered by some field.

We can also have optionality in our relationships. Here we have evolved the model, and we have a new relationship between a musician and a band: a musician can also produce bands. So here we are looking for all the bands that Clapton played in or produced, and we are filtering by some date. As you can see, at some point it is similar to SQL.

We can also have an optional depth. Here I'm saying to Neo4j:
Find me all the nodes that are related through this relationship, up to a maximum depth of five. So it will look and give me a1, a2, a3, a4, a5, all the paths. If there are paths up to a depth of five, it will give me all the nodes.

Here we have a more developed example. It's a geospatial problem, and my goal is to go from one metro station in Madrid to another. I am in Sol and I want to go to Retiro. So I look for these two nodes: I ask Neo4j to find these two nodes for me. Then I find all the connections, all the paths that exist between these two stations. So I probably have one, two, three or four (I don't know) paths that connect Sol with Retiro. Then I reduce: I add up all the weights between the stations that make up each path, and I get the shortest path. Just notice that Neo4j has already implemented this kind of graph algorithm. It provides shortest path, Dijkstra, A*; all of these graph algorithms are implemented in Neo4j. This was just an example.

As I told you, Neo4j gives us a REST API to query, to create nodes and everything. There are some occasions where we need to extend this REST API, so we can extend Neo4j using managed or unmanaged extensions. We can write some algorithm using, for example, the traversal API, and we can expose it as an endpoint in our API. Well, this is an example written in Java. Sorry.

There are drivers for almost every language; as I told you, we access it via the REST API. If you want to use it from Python, I recommend py2neo. It has a module for Django, I think.
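The metro route search described above (find every path between the two stations, add up the weights along each one, keep the cheapest) can be mirrored in plain Python. This is a sketch of the logic only, not the Cypher query from the slides: the intermediate stops A, B, C and all the weights are invented, and the brute-force enumeration is chosen to follow the talk's description rather than for efficiency. As the talk notes, Neo4j ships its own shortest-path, Dijkstra and A* implementations.

```python
# Hypothetical fragment of a metro map. Only "Sol" and "Retiro" come from
# the talk; the stops A, B, C and every weight are made up for this sketch.
METRO = {
    "Sol": {"A": 2, "B": 1},
    "A": {"Sol": 2, "Retiro": 2},
    "B": {"Sol": 1, "C": 1},
    "C": {"B": 1, "Retiro": 5},
    "Retiro": {"A": 2, "C": 5},
}

def all_paths(graph, start, goal, path=None):
    """Yield every path from start to goal that visits no station twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph[start]:
        if nxt not in path:
            yield from all_paths(graph, nxt, goal, path)

def total_weight(graph, path):
    """Add up the weights of the consecutive connections along a path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def cheapest_path(graph, start, goal):
    """Pick the path with the smallest total weight.

    Raises ValueError if the two stations are not connected.
    """
    return min(all_paths(graph, start, goal),
               key=lambda p: total_weight(graph, p))

best = cheapest_path(METRO, "Sol", "Retiro")
print(best, total_weight(METRO, best))  # ['Sol', 'A', 'Retiro'] 4
```

Enumerating every path grows very quickly with the size of the network, which is one reason to prefer the built-in algorithms on real data.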
As my conclusion, I want to quote Martin Fowler. Instead of just picking a relational database, or maybe MongoDB because it is the trending thing on Hacker News, we have to think about our data and what we have to do with it. We probably have to tend toward polyglot persistence: having two, three or five databases in our systems to exploit this data.

If you want to know more about this topic, I recommend these three books: NoSQL Distilled by Martin Fowler, Neo4j in Action, and Graph Databases. Also, if you want to try it without installing it, I recommend GrapheneDB, which is Neo4j as a service. There are some free plans to try it. And, okay, questions?

So this is all very new to me and I have only a very vague idea about it, but from what I've seen my impression is that we basically store records in nodes, right? And we label the edges with the relations.

Okay.

So in SQL, when I want to create a new record, I have to put it into a table for which I define the type, right? I define all the attributes in advance, and I define what they should look like. Am I required to do that here as well? Do I actually have to define the type of data that I can store in a node, or can I just do anything with the Cypher statements?

You can do anything you want. There is no predefined schema.

Okay, so this reminds me of the difference between dynamically and statically typed languages. So what happens if I write a statement in Cypher that actually doesn't make sense? I could create two nodes and connect them with a relation, and then create two other nodes that carry a different type of data and connect them with the same relation. I could create many statements that probably wouldn't make any sense. What happens then?

Nothing.

No? Okay, so it allows you to store whatever you want. Okay.
So basically the issues or problems are resolved at runtime, when I run the statement?

It will probably return nothing if you ask for something that doesn't make sense, or something that you didn't store before. But there are no type checks.

Okay. Are there any advantages that this brings us? Dynamically typed languages definitely get some advantages out of this. Do we see something like that with the data?

Yeah, there are advantages. You can evolve your model as you evolve your program. You are not tied to a schema. So for example, if tomorrow, in my example of musicians, I want to add the engineers that engineered the albums of these bands, my old queries will still work, and the model can evolve without touching anything. It's more agile. It's like the NoSQL philosophy.

Yeah, but a real-world scenario where this advantage can actually play a role would be interesting to me. Thank you for your answer.

Thanks.

Hi. In the example that you had where you're searching for two kinds of relationships, it was artist and producer, or musician and producer, or something like that...

Yeah. That one.

In that query, can the result contain the type of connection? Or just...

So here you are storing it in the r variable. You have the information about this relationship, so you can get that. Yes.

Okay, thank you.

Sorry, this is a silly question. You're adding all your objects and their relations, and then you have a database full of stuff. Are there tools that can sort of introspect that and then, not UML exactly, but dump out the relationships that you actually have within your database?

I can't hear you. Can you repeat the question, please?

So once you have your database full of data, is there something that can output a sort of summary of the relationships that are stored within the database?
Yes, you have a web interface that represents graphically what you have stored in your database.

And that's part of Cypher, or part of...?

It's part of Neo4j.

Okay.

And there are other tools, like Linkurious, I think, that explore this kind of visualization of your data.

Thank you for your talk. You said that you get the relationships for free, that there are no indexes. And there is, just on the slide... I wanted to ask how it is implemented that r.date is greater than 1968. Are there actually some internal indexes for the comparison, or is it a linear search?

When you are looking for properties, in the background Neo4j uses Lucene. So when you are looking, in that case, for the name Clapton, you are using Lucene. So this could probably be a handicap of this kind of database, because you have to go to the index.

Yeah, okay. Thank you.

Are there any more questions? Okay, thanks a lot for your talk.