Am I audible? So hi guys, I am Sonal and I am going to talk about Neo4j and graph databases in general. So before we start, let us have a show of hands. How many of us here have experience working with graph databases, or Neo4j in particular? Quite a few. And how many of us plan to work with graph databases in the future? Great, I think that is the whole audience. So relax, because this talk will have something for each one of us here. We'll start with the basics of Neo4j and move on to a few more advanced concepts. So the plan of the talk for today is something like this: we will start with graphs and NoSQL databases. We will move on to Neo4j and Cypher, with a bit of explanation for those of us who are new to Neo4j. We will then take a look at py2neo and the REST API that Neo4j exposes, and finally we will look at use cases and demos of the cool stuff we can do with Neo4j at the back end. So how did this all start? A few decades back, relational databases truly ruled the database world, and the norm was to store data in the form of columns, rows and tables. When you encountered highly complex, highly connected data, the options that you had were primary keys. Hello, yeah, yes, so you had to use a lot of joins, and that was all that you could do with relational data. The added disadvantage was that if your data structure changed at some point, there was no option to include that in your database. You had to completely instantiate a new database, migrate your data to it, and only then could you change the structure. So that's really horrible for interconnected data, right? So the universe balances itself, as they say: the NoSQL world came into existence to counter these problems. And it was not just a single database; NoSQL addressed the problems of all existing data types. Wherever you see a different type of data springing up, a new database came into existence. Today, NoSQL databases are divided into four types.
We have key value stores; they store, as the name says, data in the form of keys and values. Then you have column family stores: when you can have multiple values for a single key, you can separate them into columns. Then you have document stores: when one entity has a lot of key values associated with it, that's called a document, and you can associate it with a key. And finally, you have graph databases. You must be wondering where in this graph NoSQL databases would lie. So this is where they lie. What this graph depicts is a comparison of data size and data complexity. Key value stores can handle a huge amount of data, but the relative complexity of data that you can process there is quite low, whereas graph databases handle a very high complexity of data, but the amount of data that you can handle there is comparatively lower. This is because there's a tradeoff between the traversals that you perform on a graph database and the amount of data that you have. So why do we actually use NoSQL databases? The first thing is elastic scaling. If you had a relational database with you and suddenly your traffic increased and you started collecting a lot of data, the only way you could scale initially was to scale up. That means you threw in a lot of RAM, you threw in more disk space, you shifted your whole database to a larger server, and that probably should have solved your problems. But elastic scaling deals with not just scaling up, but scaling out. And that's what NoSQL databases do. When you have to scale a NoSQL database, you can add more nodes and your data gets distributed over them. You don't have to worry about what happens to the existing data. You can just keep adding nodes to your cluster and they scale out very well. Plus there's the new buzzword called Big Data. I don't know exactly what people understand by it, but Big Data is not just storing and handling large amounts of data.
It's also about efficiently processing large amounts of data. So we have something called transactions, which are small units of computation that you perform on a data set. And that's what NoSQL databases handle really well on very large data sets: you not only can store large data sets, you can process them really well. Plus it's economical. You don't need specialized servers and specialized firmware to run such databases. You can take a cluster of personal computers, old machines, Pentium machines, and set up a very efficient data cluster. Plus you don't need database administrators, people specially trained in handling databases. Anybody can handle a NoSQL database. Then there are flexible data models. We'll come to data models later in this talk. So for people here who have less exposure to graphs, let's introduce and brush up a few concepts. Where do we see graphs? Graphs basically exist all around us. Whatever scenario you take, there's a graph that you can visualize there. Anything which you hear with the word social is a graph scenario. So websites like Twitter, Facebook, LinkedIn, even ones like SlideShare, Groupon, Foursquare — everything mentioned on this slide is actually a graph scenario. They may or may not be using a graph database at the back end, but they are graph scenarios. Maps: how many of us have done projects on maps? Great, so personally I have done a project on maps where I had to divert an ambulance through Bangalore traffic. Yeah, so in that case it's really difficult if you have your map data stored in relational databases. You have to have a graph which will efficiently process paths and routes and shortest distances. So maps is a very good example where graph databases can be used. I'll be talking about a map database example at the end and we'll see how queries can be performed on that. Logistics: that's another domain that uses graph databases on a large scale.
So when you have to route your packages from one place to another, you have to think about the economy of it and the time of it. Your package has to reach your customer in the minimum possible time and by the shortest possible route so that you bear fewer expenses, right? So companies like FedEx and UPS have integrated graph databases for calculating shortest distances, and they are really successful, as you know. Airlines work along the same lines: they need to find the shortest distances because expenses are huge for airline companies. There are some graphs that we come across every day but we tend to overlook them as graph scenarios. So when you open a LinkedIn page, there are sections which recommend to you news that you would probably like reading. As you can see, this is from my page. I don't know why I would like to read this news, but yes, it recommended to me that I just got fired. So these are places where graphs are used extensively at the back end. Plus, there are columns which you can see on a lot of sites like Facebook, LinkedIn, Twitter, where people are recommended to you: friends recommended for you to follow, to add as friends. These are also prime examples of graphs, where you are connected to a lot of other people and algorithms are used to find out who are the people you would like to interact with. We'll also see an example of a dating site at the end, where if you enter a name, you get possible matches of people who you might like to date. When you visit video sites like IMDb or Netflix, you get movie or series or video recommendations based on what you have already watched or what your friends or people like you have watched. So if you think about it at the graph scale, there are a lot of complex traversals going on, relating you to the content you've watched, relating you to other people and the content they have watched.
And that's a big problem when you try to solve these through joins on a relational database. Apart from that, there are a few other graph scenarios, like recommendation engines and fraud analysis. You can analyze impact on networks; plus financial firms like JP Morgan analyze debts and investments through graph databases. Also, there are file permissions on servers and file systems, which can be implemented using graphs. So why do we prefer graphs today? Graphs have been around for a long time, but graph databases have been coming into existence a lot in the past four to five years. This is because data is becoming highly complex. Today I heard about a group who are making an app that can find how much time you're spending on a social network. So if that's going to be a business, that means we are really spending a lot of time on social networks. And think about the data that we are generating: every click, every like, every comment that we are making. We are actually creating a part of the graph, and multiply that by 365. How much graph data are we generating in a year? And the companies have to process this. They have to find out information about you, for you, for others, from that data. So the connectivity of data is increasing and that is making it more and more complex; plus it's semi-structured. There's not a fixed set of operations that you do every day. It can be anything: earlier you used to have forms that had a fixed set of fields. Now you can have a text box which can be parsed at the back end. So it's not mandatory to have fixed columns or fixed fields in your data anymore, because it's semi-structured. So the concept of graphs actually dates back to 1735. Where did this all start? Why graphs? How many of us have heard about the Seven Bridges of Königsberg problem? OK, so everybody has heard. For those who haven't heard, just a brief.
So the anecdote goes that there's a place called Königsberg in Russia. And it is separated into two parts by a river. And there are two islands in the river. And these islands are connected to each other and the mainland through seven bridges. So they came up with a weird problem: you had to walk through the entire land by crossing these bridges, covering the two islands and the two mainland parts, but you have to cross every bridge exactly once. And this problem caught the attention of the genius Leonhard Euler. What he did was prove by contradiction that this was not possible: if you have seven bridges, you cannot cross every bridge exactly once and cover the entire land. And he did it with the help of graphs. The solution that he proposed was with the help of nodes and relationships, and that was how graphs started. So for around 280 years, we just had the concepts of graphs. We pondered over what we can do with graphs. And recent decades have seen people think: let's store data in the form of graphs. And that's how graph databases came into existence. So we'd just been wondering for 280 years, when we could have done the graph database part long ago. So OK. Most graph databases today store data in the form of property graphs. A property graph is very simple. It has nodes. These nodes have properties of their own. The properties are nothing but key value pairs. It has relationships, and the relationships have properties of their own. That essentially is what the entire graph database space is all about. So the major building blocks of graphs are nodes, relationships, and properties. And in recent years, there's something called labels. Labels were made to facilitate faster traversal of graphs. A label is a group of similar nodes or similar relationships. So you can group a lot of entities together under a single label.
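The property graph model just described can be sketched in plain Python. This is a toy illustration, not the Neo4j or py2neo API; the `Node` and `Relationship` classes here are just local stand-ins:

```python
# Toy property graph: nodes and relationships are both property
# containers (bags of key-value pairs), and labels group similar nodes.
class Node:
    def __init__(self, label, **properties):
        self.label = label            # e.g. "Person" or "Company"
        self.properties = properties  # key-value pairs

class Relationship:
    def __init__(self, start, rel_type, end, **properties):
        self.start, self.type, self.end = start, rel_type, end
        self.properties = properties  # relationships have properties too

alice = Node("Person", name="Alice")
acme = Node("Company", name="Acme")
job = Relationship(alice, "WORKS_AT", acme, since=2010)

# A label lets a traversal skip irrelevant nodes entirely:
people = [n for n in (alice, acme) if n.label == "Person"]
```

This is exactly the "distinguish node types by label" idea: the list comprehension only touches `Person` nodes instead of the whole graph.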
So your graph can have a lot of nodes which represent people, a lot of nodes which represent companies. Think about a LinkedIn graph. It can have nodes for people, nodes for companies, nodes for recruiters, nodes for job seekers. And how do you distinguish these nodes from each other? You label them. So when you query a particular type of node, you use that label to traverse only those nodes instead of traversing the entire graph. So data models for graph databases are classified into two types. First is a native graph. A native graph has data structures which inherently store data in the form of nodes and relationships. Whereas all the other graph databases that are not native graphs store data in the form of tables, the old SQL-type tables, and they use joins: a layer of joins and aggregates to simulate graph operations. So when you're thinking that you're doing a traversal on the graph, at the back end somebody has already written code to perform joins and aggregates. Neo4j is actually a native graph. So there are a lot of popular graph databases around. You have Titan, which is another major player in the graph space. But we'll talk about why we want to use Neo4j. First, Neo4j is a schema-less property graph. It's the same with Titan; it's also a property graph. It can handle connected data very efficiently. When you want to use graph databases in scenarios that have critical data, like financial operations, banks, and other sectors, you have to have fully ACID transactions. You cannot afford mistakes. So in the scenarios we described, where JP Morgan and other companies use graph databases to process financial data, we cannot have non-ACID transactions. It's fully scalable. And in recent versions of Neo4j, we have high availability clusters. We'll talk about high availability clusters at the end. It has a REST API for servers. So it's easy to query a Neo4j server at the back end.
You can embed your application within a JVM. Neo4j is basically written in Java, and its core API is in Java. So any JVM-based application can use it. But since Python wrappers are available for it, your Python applications can use it as well. The main advantage that Neo4j has over other graph databases is Cypher. Cypher is a query language that was designed specifically for Neo4j. Plus, Neo4j is a graph DB which has extensive Python bindings. We have to talk about Python — we are at PyCon. So other graph databases lack this functionality of integrating with languages like Python. So let's see Cypher in action. Cypher is a highly expressive query language. It's actually a pattern matching language which reflects what you're thinking about the graph. If you're thinking of matching a particular pattern, you can represent that pattern directly in the query. So it cares about what you do rather than how you do it. A basic Cypher query is mentioned here. If you have two nodes and a connecting relationship, you represent that in Cypher as (n1)-[:LABEL]->(n2), where LABEL is the label of the relationship that you're using. A little more complex: when you want to match a particular relationship from the graph, you can define your first node as node(1), and m as node(2). What 1 and 2 are here: when you store a node or a relationship in Neo4j, it internally assigns indices to it. So these are the IDs that Neo4j provides as you add a node or a relationship to the database; they're automatically generated. So you just have to say match n and m, node(1), node(2), and the relationship. It will find your node 1, it will find your node 2, get all the relationships existing between these two nodes, and return those relationships. Do you see something similar to SQL here? To make people's lives easier, this syntax is inspired by SQL. So you have commands like MATCH and RETURN.
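The pattern just described might look like this written out in full. This is a sketch in the older START-based Cypher style the talk is describing, with placeholder node IDs (1 and 2 are made up), held in a Python string:

```python
# Sketch of the basic Cypher pattern described above: look up two
# nodes by their internal IDs and return the relationships between
# them. The IDs 1 and 2 are placeholders, not real data.
query = (
    "START n=node(1), m=node(2) "
    "MATCH (n)-[r]->(m) "
    "RETURN r"
)
```

The `(n)-[r]->(m)` part is the ASCII-art pattern: two nodes joined by a relationship `r`, which is what gets returned.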
So I have briefly outlined the CRUD operations in Cypher: how to create, read, update, and delete your data. We're not going into much detail of it, but it's on the slide; you can have a look later. In the first query here, you define a node n of the type you want — Person, that is the label of that node. You define the properties that you want: name is Chuck Norris, title is Analyst. And you return that. What it basically does is create the entire node, create its properties, and return you a reference to that node. You can also perform operations like MATCH, where you get a Person node like we have seen in the last query, and you can compare its properties. So WHERE is a clause that you use to compare properties: a.name is equal to Chuck, and b.name is equal to Rajni, and you compare them. So a cannot find b. Yeah. For the update operations, you start with a MATCH, where you find the node or the relationship that you want to update, and use the SET command to update it. For delete, you just have to specify REMOVE. It's as simple as that. Neo4j also has a REST API. I've also outlined the REST CRUD operations. We use POST to create a request that can make nodes and relationships in your graph. The properties are provided as a JSON document. GET is used for read operations, PUT is used for update operations, and DELETE is used for remove operations. But we are here to talk about py2neo today. py2neo is actually a wrapper around the REST API of Neo4j. It's a small library that was created by Nigel Small. He works at Neo Technology. And over the years, it has grown into an amazing library that most people in the Python community use. So what it actually does is let you use Python — your Python-powered application, which is running with py2neo — to interface with the REST API that's running atop Neo4j.
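The REST CRUD mapping just outlined is plain HTTP verbs plus JSON bodies. A sketch of what the requests might look like, assuming the legacy REST endpoint layout under /db/data (check the docs for your server version; no request is actually sent here):

```python
import json

# Sketch of the Neo4j REST API CRUD mapping described above:
# POST creates, GET reads, PUT updates, DELETE removes.
base = "http://localhost:7474/db/data"

create_node = {
    "method": "POST",
    "url": base + "/node",
    # Properties travel as a JSON document in the request body.
    "body": json.dumps({"name": "Chuck Norris", "title": "Analyst"}),
}
read_node = {"method": "GET", "url": base + "/node/1"}

payload = json.loads(create_node["body"])
```

py2neo builds and parses exactly this kind of request and response for you, which is why you never see the JSON by hand.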
So you can initiate queries in Python, which are internally transformed into REST requests. And the response that you get back is again transformed by py2neo into a format that you can read in Python. So let's look at a bit of code. You define an instance of the graph database service, and we specify the location of our database server. If you do not give anything, it will, by default, connect to localhost:7474, which is the default host and port of Neo4j. But you can specify a remote address. You can use the node and relationship abstractions defined in py2neo to create a graph in one go. So here, we are actually creating five nodes and linking them together. And when you execute this, a single statement, it can create your entire graph. If you have a larger graph and a data set with you, you can write a simple Python script that can append these things together, form the query, and then execute it. But there are even better ways to create graphs, which we'll see. The graph database object that we created before can be used for a lot of other operations. It has inbuilt methods like clear, which you can use to clear your entire database. So try to run it on your own database and not on others' databases. You can use create and delete with different types of parameters for creating and deleting nodes. There are also methods to delete an index and to find a node or a relationship. You can get an index, you can get an indexed node, and you can perform more complex operations, like getting indexed relationships. You can use the inbuilt match method to retrieve nodes and relationships and sets of relationships. This basically returns a list: when you match something, it returns a list of all the matched nodes or relationships. But the most used methods are the get-or-create ones, like get_or_create_index and get_or_create_indexed_node. Why?
This is because of what Neo4j does when you're creating a node: if you're creating nodes for two people and a node already exists in the database for that person, and you're not using indices, what do you think will happen? Either it will create the node again, or it will raise an exception because it cannot create the same node with the same properties. So to avoid that, Nigel Small created these get-or-create methods. If your indexed entry is not present, it will be created; otherwise, it will be retrieved and you're given a pointer to it. But this is all about storing data in your graphs and reading from it. If graph databases were meant just to store and read from, they would just be persistent graphs, and there would be no point: you could have just used a SQL database at the back end. So what are the advanced things that we can do? You can create paths. When you have graphs in mind, paths are what come to you. So directly, you can create a graph path by defining the nodes, named Alice, Bob, and Carol here, and you can create a path where A knows B and B knows C. Essentially, you're creating three nodes and you're joining them with two relationships: a direct path. You can also join two paths using the inbuilt join method. You can also get or create a path in the graph database by committing it. There are also advanced features like indices. Internally, what Neo4j uses is Lucene, and that is a very effective index. Lucene has its own format for querying indices. In the native graph API that Neo4j provides, you cannot run Lucene queries directly, but py2neo has a wrapper around Lucene as well. If you've seen the last part of this slide, you can pass a direct Lucene query in the parameter and query your index. So this is the Lucene format of writing a query. So we come to something called property containers. Most of us who have been exposed to Python know about the containers that exist in the Python language.
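The get-or-create behaviour described here can be modelled with a plain dict standing in for the index. This is a toy model of the semantics, not py2neo itself:

```python
# Toy model of an index with get-or-create semantics: if the entry
# exists it is returned, otherwise it is created exactly once, so
# no duplicate nodes and no "already exists" exceptions.
index = {}

def get_or_create(index, key, make_node):
    if key not in index:
        index[key] = make_node()
    return index[key]

a = get_or_create(index, "alice", lambda: {"name": "Alice"})
b = get_or_create(index, "alice", lambda: {"name": "Alice"})
# a and b point to the same object: the second call retrieved,
# it did not create.
```

Running the same call twice is safe, which is exactly why these are the most used methods when loading data.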
So these containers, or you can call them collections, are used in py2neo to model nodes and relationships. So what's the benefit of using property containers? Firstly, you get advanced methods, like get_relationships, which encapsulate an entire traversal to give you a specific entity. You can get the related nodes of a given node. Plus, you can perform boolean checks. There are methods like has_relationship or is_related_to, which are quite helpful when you are looping through a particular set of nodes that is returned. The relationship container is defined in neo4j.Relationship. This is the class that holds the property container for relationships. And it has properties like start_node and end_node. So when it returns a relationship to you, you can directly refer to the start node as .start_node. Similarly, the type of the relationship can be referred to directly. But writing py2neo code against the direct Python API means a lot of code to write. We'll see an example where a small query takes about 15 to 20 lines. So it's a lot. Python is a language which is designed for reduction of code size, but to execute a single-line query, we are writing 15 lines of code. That's huge. So recent versions of py2neo have come with an interface to the Cypher engine. You can directly run Cypher queries on top of the graph database using py2neo. So you must be wondering about the performance of py2neo. Since it's a wrapper around the REST API, it is a bit slower than the native Java API. That is obvious. So we have to do something which compensates for this delay. That is why the py2neo creators developed the interface with Cypher. Cypher is extremely fast compared to making plain REST API calls. So when you integrate that with the py2neo API, you can perform very fast traversals. You can create transactions as well. So there are two methods by which you can execute Cypher queries.
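The kind of accessors and boolean helpers just mentioned can be sketched like this. A toy model, with illustrative method names rather than the exact py2neo API:

```python
# Toy property containers showing the helpers described:
# start_node / end_node / type on relationships, and an
# is_related_to-style boolean check on nodes.
class ToyNode:
    def __init__(self, name):
        self.name = name
        self.rels = []

    def is_related_to(self, other):
        # True if any outgoing relationship ends at `other`.
        return any(r.end_node is other for r in self.rels)

class ToyRel:
    def __init__(self, start, rel_type, end):
        self.start_node, self.type, self.end_node = start, rel_type, end
        start.rels.append(self)

alice, bob = ToyNode("Alice"), ToyNode("Bob")
knows = ToyRel(alice, "KNOWS", bob)
```

Checks like `alice.is_related_to(bob)` are what make looping over a returned set of nodes convenient.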
One is by using the normal graph database service that we used in the previous code, or you can directly import cypher. What the cypher module does is create a session and then start a transaction. You can append as many queries as you want here and then execute them whenever you want. But they're not committed to the graph; the queries are simply executed. In the end, when your transaction is complete, you have to call tx.commit. If you don't do that, your transaction is rolled back. So that's the beauty of it: it's ACID. There's a classical way of doing it also. You can use the graph database service and the cypher query function, where you provide the instance of your graph database and your query, and it directly executes it. But this does not guarantee an ACID transaction. So when it's critical data, it's advisable to use the transaction approach. When you install py2neo on your system, if it's a Windows system, you can directly use py2neo.tool through the command line, in any terminal; you do not need a specific shell or interface to write queries. You can directly use python -m py2neo.tool. If you're on Linux or a Mac, neotool is a very good option. It's essentially the same thing as py2neo.tool, but as a shell script, and you can directly use it to run your Cypher. You can export your data from it in the form of Cypher, CSV, or tab-separated values. You can also import and export using Geoff, which is a very good format for handling graph data. Plus, you can run the Cypher shell directly using neotool shell. Neo4j has evolved a lot over the years, and there are very advanced tools available with the Neo4j package. You have a batch inserter. So when you're porting data from another database to Neo4j, and you want to do that quite fast instead of manually typing it in, the batch inserter can be configured to load your entire data set into Neo4j in an automated way.
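The commit-or-rollback behaviour described above can be sketched as a toy transaction buffer. This is a model of the semantics, not the real Cypher session API:

```python
# Toy transaction: appended queries only take effect when commit()
# is called; if the transaction is abandoned, nothing is applied,
# which is the rollback behaviour described.
class Transaction:
    def __init__(self, store):
        self.store = store    # the "graph": committed queries land here
        self.pending = []

    def append(self, query):
        self.pending.append(query)

    def commit(self):
        self.store.extend(self.pending)
        self.pending = []

executed = []
tx = Transaction(executed)
tx.append("CREATE (n:Person {name: 'Alice'})")
tx.commit()

# A second transaction that is never committed leaves no trace:
tx2 = Transaction(executed)
tx2.append("CREATE (n:Person {name: 'Bob'})")
# no tx2.commit() -- effectively rolled back
```

This is why the transactional path is the one to use for critical data: either everything in the batch lands, or nothing does.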
There is high availability, which has been introduced in recent versions. Your Neo4j database is replicated across many machines in the cluster, and each machine has its own hot cache. So when you set up Neo4j in production, it is advised to initially test it on a data set so that the hot caches are formed. And depending on whether you hit a hot cache, transactions are fast or slow. What Neo4j restricts you from is distributing parts of your graph onto different machines: you cannot store half of your graph on one and half of your graph on the other. You could do that theoretically. But the whole point of a graph database is to perform traversals easier and faster. And if you store the database in parts across multiple servers, you need inter-server communication; you have to have something like IPC to communicate between the two graphs. So if you're traversing from a node on one machine to a node on the other machine, that takes a hell of a lot of time. So what the Neo4j high availability cluster does is replicate your data set across multiple machines. You have the same database, but different parts of the database are loaded into the memory, into the cache, of different machines, and a record is kept of this. So whenever you pass a query to the system, it finds out which machines have the required part of the database in their cache, and you can traverse that very fast. Plus, there are built-in online backup tools. Most Neo4j clusters today have a separate server on which real-time backup happens. So whenever a node or a relationship is updated, it is reflected in the backup node. The backup node is not used for queries. Plus, it has HTTPS support. So when you are using the REST API, you can use HTTPS. When you deploy Neo4j in production, a lot more is required than just basic traversals or storage of data.
So a company called GraphAware has developed a Neo4j framework with additional tools and modifications to make Neo4j even faster. You have a library called GraphUnit, which is used for unit testing Neo4j applications. You have libraries for performance testing and API testing — how well is your API connected with Neo4j? You can test that. You have batch transaction tools and transaction event tools, plus some other goodies. Let us look at a few examples of Neo4j. The first example that I'll show you is a dating site and a few queries that we can perform on it. And next, we'll go to a map. Is it visible to everybody? Okay, so this is what the Neo4j interface looks like. This is a relatively new interface; earlier, they used the legacy interface. This one is actually pretty good; I personally like it. There are tools here which can reflect your nodes, properties, and relationships, and it gives you a beautiful demonstration of nodes and relationships in real time as you're creating and deleting them. It also gives you a pretty good idea about the database space that you're using. Since we are short on time, I'll explain the parts that I've put in the notebook. So initially, just as I said, you create a Neo4j graph database object. Let us look at the Cypher first before we look at the py2neo code. What are we trying to do in this Cypher code? We have a node where you specify a name, and we try to match a given pattern to find the people who you are likely to date. Every person has some interests, and every person is looking for some qualities in the other person, the person they want to date. So we are looking for a common intersection between these two properties. We define a node n, which can be any person. We see what interests n has, and we also check that the someone we are looking for has the same interests.
So essentially this line is matching the common interests between two people, and this is extended over the entire graph. Apart from that, to make it a little more complex, we also state in the same pattern that the someone should live in the same city as n. So this is a single-line query, right? When you execute this on the database, you get a visualization. So the analysts who are using it must be really happy. You also get a tabular representation where you can check out the name of the city and the person and the information about the person that you are likely to date, right? The same query — let's see how py2neo handles it. You create the database instance initially. You get the graph node of the person you're looking for; this is the n. Then you get all the relationships. So you list and match the nodes that start with n and have the relationship type HAS. You can see that we are doing everything in parts; you cannot do it in a single query. So this will return — and I'm just printing the top five here — this will return the relationships, and this is a REST response, as you can see. So it is clearly evident that py2neo is a wrapper around the REST interface. Then you calculate the interests out of all these relationships. You create a loop that checks what the common interests between me and the person I'm looking for are. So you get the people with the same interests; these are nodes. You again check separately for the city where they live, and you find an intersection between the city that I live in and the city that the other person lives in. And finally you transform them into sets and get a common list, the intersection of these two. When you print them, you get the names of all the people that you're likely to date. So you can see the amount of effort that you put in just to simulate a single Cypher query. Now if you were to execute this in the form of Cypher, it just takes a single transaction.
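The py2neo steps just walked through boil down to set intersections. The core matching logic can be sketched in plain Python with toy data (all the names and interests here are made up):

```python
# Core of the dating-match traversal described above: a candidate
# matches when they share at least one interest and live in the
# same city as "Me".
people = {
    "Me":   {"city": "Bangalore", "interests": {"python", "graphs"}},
    "Asha": {"city": "Bangalore", "interests": {"python", "music"}},
    "Ravi": {"city": "Pune",      "interests": {"python", "graphs"}},
}

me = people["Me"]
matches = [
    name for name, p in people.items()
    if name != "Me"
    and p["city"] == me["city"]              # same city
    and p["interests"] & me["interests"]     # non-empty intersection
]
```

The single Cypher pattern expresses exactly these two conditions at once, which is why the hand-rolled version takes so many more lines.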
You take the transaction, append the Cypher query and execute it. And you get the records. This is written in the form of a for loop over records, where each record contains all the information about the person that you're looking for. You see the name, you see the city they belong to. Similarly, there's the other method that we mentioned: non-transactional execution of Cypher queries. You just form a query string — this can be dynamically formed for different types of queries — and you can execute it using query.execute. I've just printed the types here so that you can see what kinds of objects it returns. So when you form the query, it's a neo4j.CypherQuery. It just forms the query; it does not execute it. When you execute it, it returns Cypher results. And to iterate over those results, you have to use a method called stream, where for each record in the stream, you process it by relative position. So record[0] will give you the name; record[1] will give you the next property that exists in that node. So this is the list of all the properties that it has returned. So this is another query that I'm running here. This one is actually a little more complex, and it is along the lines of a recommendation. What we are doing here starts in the same way as the previous query. You take any node of a person. You check which city he lives in and find all the people who live in the same city. You check the orientation of the person, the kind of person he likes to date; they must not be of the same gender. And you also check that the person wants the properties that the other person has, and has the properties that the other person wants. So you're actually checking for compatibility. Then you return the distinct names of the cities and the names of the people. You collect them and you return the top 10.
So you count all the attributes, you count all the requirements and check the degree. If there are four properties matching between you and me, then we can date; that's what is being calculated in this return statement, and you limit it to 10. If you execute this query here, you can define the format you want the results returned in: the name of the city the person is from, the name of the person, the interests that I have and the person likes, the interests that the person has and I like, and the counts of matching characteristics, both the ones I want and the ones that are present. This is just random data, so don't judge it.

So these are the kinds of recommendations you can build. You can use this for product catalogs: if you're running an e-commerce website, you can traverse the person's purchase-history graph, check for products the person would like, and display them on your website live.

Let us also take an example of map data. We have a map database; we just need to connect to it. Let's see what types of operations we can perform when your map data is stored in the form of a graph. Suppose you wanted to travel to a given postal code (this is a random postal code): what are the places that you have the option of visiting in that area? The query is as simple as this: you need to match a location with the same postal code or a similar postal code (this part is just to handle the letter case and so on). You return the number, the house number, the street number, the local address and the postal code of that place. We just refresh this page so that the data clears off. So you see, if you wanted to go to a place with this postal code, these are the places that you have the option of visiting.
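As a plain-Python stand-in for what that postal-code match is doing (the data and property names below are made up for illustration; the real query runs against the location nodes in the map graph):

```python
# Toy stand-in for the map graph's location nodes: each location
# carries a house number, street and postal code.
locations = [
    {"number": "12", "street": "High Street", "postcode": "KT16 8AB"},
    {"number": "3",  "street": "Mill Lane",   "postcode": "kt16 8ab"},
    {"number": "77", "street": "Park Road",   "postcode": "GU21 4XY"},
]

def places_at(postcode, locations):
    """Case-insensitive match on the postal code, the same sort of
    'check the case and all' normalisation the query does."""
    wanted = postcode.replace(" ", "").lower()
    return [loc for loc in locations
            if loc["postcode"].replace(" ", "").lower() == wanted]

print(places_at("KT16 8AB", locations))  # two places share this postcode
```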
When you implement that in the form of Cypher, it's better to go for a transaction, because it lets you make use of the ACID properties. You can also ask queries like: from which junctions can I get to a place called Chertsey? The unknown here is the junction I'm coming from, and you mention the traversal depth here. You also check the property that the name of the place should be Chertsey. Then you can use the j that is returned from here, the junction, read its properties, and return them. If you execute this query in the interface, this is what we get: there's a place called Hatch Farm roundabout, and this is the place you will reach from it. And this is the way you implement that in Python.

Another problem that is very easily solved using graph databases is shortest path. There are built-in methods that Cypher as well as Py2neo provide which let you easily solve shortest-path problems. One point to note: Cypher supports a lot of existing algorithms like A* and Dijkstra's out of the box, so you do not need to define them, but there are no existing wrappers for them in Py2neo yet, which I think is in progress, as a recent issue on GitHub showed. They are developing wrappers for the existing algorithms as well, so you can take a look and contribute to the code if you want.

Let's see how shortest path works. You take a location, find all locations that are related to that location, get the postal code of one location, get the postal code of the second location, and find the lengths of the paths between them. So it actually finds all paths between two existing nodes in the graph, finds the lengths of those paths, and gives you a list of the shortest ones. So instead of just getting the shortest distance, you get to know the exact path, where you have to go. It's something like what Google Maps does, right? You could create your own Google Maps duplicate. There's one more query that's good.
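Before we get to that query: conceptually, what shortestPath computes over an unweighted relationship graph is close to a breadth-first search. A minimal sketch on a toy road network (the junction names are hypothetical, and this is the idea, not Neo4j's actual implementation):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search returning the list of nodes on one
    shortest path, or None if the goal is unreachable.
    graph: dict mapping node -> iterable of neighbouring nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:       # visit each junction once
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy road network: junction -> reachable junctions.
roads = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["Chertsey"],
    "E": ["Chertsey"],
}
print(shortest_path(roads, "A", "Chertsey"))  # ['A', 'B', 'D', 'Chertsey']
```

This is why you get the exact sequence of hops back, not just the distance: the path itself is what the traversal builds up.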
If your nodes are already storing the average speed at which you can go through a particular area, you can calculate the time that is required. So when you search on Google Maps for a route from one place to another, it gives you an estimate of the time according to the given situation, right? In peak hours, it'll give you one estimate; at a normal time, it'll give you a different estimate. So let's see what this query does. Here we're using an operation called reduce, where you accumulate the total kilometres travelled, weighted by the speeds. Like in the previous example that we ran, let's see what each node has: this is the number of the node, and these are its properties. In the properties are defined the speed at peak time, the speed at normal time, the type of the node, and the distance of that stretch of road. When you execute this query, it gives you the average kilometres per hour, or miles per hour, at which you can travel on the path you have selected between two nodes. You can also write slightly more complex queries where shortest path is used along with reduce: how long would it take to drive from a place called 48 Rallstone Moor on London Road to 123 Staines Road? This is exactly another graph database problem which you can relate to Google Maps, right?

So those were pretty simple use cases where you can use social network data, recommendation data, or map data to perform complex operations. In a post that Nigel published a few days ago, there's a lot of hope for Py2neo in the future. There will be support for fast HTTP. Python 3 support is already there; it's probably not released yet, but it's already done. There's multi-threading support available. There are more command-line tools coming up, and there'll be new methods to ease your process of communicating with the database.
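Coming back to the travel-time query for a moment, the reduce idea (fold each segment's distance divided by its speed into a running total) can be sketched like this; the segment properties are hypothetical, mirroring the peak and off-peak speeds stored on the map nodes:

```python
from functools import reduce

# Each road segment carries its length in km and the average speed
# (km/h) at peak and off-peak times, as described for the map nodes.
path = [
    {"km": 5.0,  "peak_kph": 20.0, "offpeak_kph": 50.0},
    {"km": 12.0, "peak_kph": 40.0, "offpeak_kph": 80.0},
    {"km": 3.0,  "peak_kph": 15.0, "offpeak_kph": 30.0},
]

def travel_hours(path, speed_key):
    # Like Cypher's reduce(): fold each segment's km / speed
    # into a running total of hours.
    return reduce(lambda total, seg: total + seg["km"] / seg[speed_key],
                  path, 0.0)

print(round(travel_hours(path, "peak_kph"), 2))     # 0.75 hours at peak
print(round(travel_hours(path, "offpeak_kph"), 2))  # 0.35 hours off-peak
```

Swap in the segments of a shortestPath result and you get the Google Maps-style "time at peak versus time off-peak" estimate the speaker describes.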
There'll be getPaths, where you just have to specify a given node and you can get all the paths that exist in the database to that node. There are also a lot of other relatives of Py2neo in the Python family. There's a framework called Bulbflow, which is a separate framework devised for communicating with all types of graph databases, including Neo4j. We have an object-graph mapping for Neo4j called neomodel. We also have neo4django, which makes it easy to integrate Neo4j as a backend for any Django application you're building. Since we're running short of time, that's it for today. We're open to questions if anyone has any. Well, don't be stopped by Neo; he's not trying to stop your questions.

Hi. So Neo4j has a notion of nodes and relationships. I want to ask whether the relationship between two nodes is unidirectional or bidirectional, or whether there is no such thing in Neo4j.

No, there is. You can specify what direction you want. You can have a bidirectional relationship, and you can have a unidirectional relationship as well. The concept behind it is to relate one thing to the other, so you can also specify two relationships, one in each direction. So there's an option of specifying the direction. I think I saw a question here. Okay, we'll come to you. Yeah.

So Neo4j is not a replacement for relational databases; it is going to augment or complement relational databases. So there is a need to transfer data from relational databases to Neo4j, and the models are quite different, as you mentioned. Do you have any suggestions on what kind of tools we can use?

Yeah, they have developed a batch inserter for it. When you have a very large MySQL database at the back end, you cannot transfer the entire data to a Neo4j instance, right? So there are special techniques where Neo4j is basically used for analytics.
So when you have a part of the data that you want to use for analytics (analytics is not something you do on the fly; it is done once in a while, probably on warehouse data), what you can do is take that part of the data, insert it into Neo4j, perform your operations, and then do away with the data. These kinds of operations are much simpler: instead of replacing your entire database with Neo4j, you transfer part of your database to Neo4j, process it, get your results, and then remove the data from Neo4j. So basically you're making a complex operation simpler with Neo4j, at a small overhead of data transfer. He's been waiting for a long while now. Okay, we'll come to you. You want to go first? Yeah, sure.

Yeah, so you mentioned Cypher following a syntax more like SQL, and then we have Py2neo, which is closer to a REST-based syntax, like what we use for Elasticsearch. Now you have Py2neo where we can import Cypher and use Cypher queries inside it. So don't you think that when we are using Cypher and Py2neo together, the code becomes more cluttered, with one syntax in one format and another in the other?

Cluttered code? I think if you're not using Cypher, your code is more cluttered. As you know, Py2neo is actually a REST wrapper, and a REST request is slower. If you think about it, including Cypher in Py2neo was a necessary step. Cypher queries are much faster than REST queries; they have less overhead from transferring data through JSON documents. The other style was just legacy: Py2neo began development as a REST wrapper and then moved on to this. Plus, REST provides you flexibility; there are a lot of options which you can use with REST over a server interface. But yeah, Cypher has been incorporated lately. Finally.
This neo4django, how is it? What exactly does it use behind it?

Same, it's a wrapper around Neo4j, like Py2neo. It is implemented with REST at the back end, but it has specific methods which are directly callable from your Django functions.

And since it's based on REST, it would be easy to integrate with MongoDB? Because if I have the whole website on MongoDB and I want to use this basically for finding relationships, that would be much faster with Neo4j, right?

Yeah, of course. If you have related data, then it's much faster in Neo4j. Plus, there's one point I would like to mention: if you have a very huge data set, I wouldn't say Neo4j is a good option for that. Neo4j does in-memory traversals; that is why it's fast. In-memory or in-cache traversals are fast when you have a relatively good amount of data, but not a very huge amount. Like you see, Facebook has implemented graph search, but you can only search specific stuff. Maybe at the back end (we don't know) it's only implemented on the names of the people: the nodes are the people themselves. You can search people; you cannot search which company they work for, right? That's the problem they're trying to solve with graph databases. You cannot transfer the entire data into a graph; Facebook cannot operate on a graph database at the scale they are working at. There are special cases of that data which you can traverse using graph databases.

Hey. Hi. I want to know, I was using Gremlin for querying Neo4j. Actually, Gremlin was supported till the last version, officially. I feel Gremlin is easier to learn. I have used it. Gremlin is a generic language for graph traversals for any graph database. So... Even I'm using Bulbs, you know, at the ORM layer. Okay. So Gremlin is the more generic one, definitely. So does it, like, do I lose a lot of, you know... Not much.
I feel Cypher has been quite optimized to work at the level of Gremlin. Probably, for Neo4j, Cypher works faster than Gremlin. This is from personal experience; I haven't tested it out, but that's my feeling. Okay. So, you see, this is the old Neo4j console, where you had a shell for Cypher queries and HTTP. This is the new version; in the previous version, we also had a tab for Gremlin, so you could perform Gremlin queries from there. Okay. Yeah, thanks. Yeah. Anybody else? Any questions? I guess that's it. Thank you. Thank you, guys.

Thank you, Sonal, for the lessons on Neo4j. Lightning talks has been