Hello everyone, and thanks to the organization, to Big Things, for giving me the opportunity to share my experience with graphs in the analytics industry. I hope it will be interesting for everyone; the objective is to learn a little more and to see how graphs fit into analytics. Let's see what we are going to review during this little more than half an hour. We will take a brief journey through the history of data analytics and how we represent the world, how we represent reality, and how and why we should use graphs for analytics. Then we will also talk a little about the technology behind graph data science, with a small conclusion to wrap up. So let's move ahead. Well, you have seen this schema a thousand times. I'm not going to talk about the history of data science, don't worry, and I'm not going to walk through algorithms one by one; that is not the objective of this presentation. But I do want, at the speed of light, to review the evolution that brings us to where we are now: what has happened from the first artificial intelligence efforts to the latest deep learning techniques. I like this schema drawn as a kind of Earth, because every day we are digging deeper and deeper into different ways of extracting value from data, and it really shows how things are in reality. So, what brings us here? The explosion of data science has been possible thanks to two main factors: algorithms and the evolution of technology. From Alan Turing in the 1950s, who began to ask whether machines could talk with people, or think like us, to the last few years in which we have been digging into data with machine learning and deep learning techniques, mathematics has been helping along this trip.
It has been helping us find the right way to answer questions. Not just to answer them, but to find the right way to answer, because every kind of question has its own way of being answered. This is where supervised and unsupervised systems come in, classification, prediction systems, models. In the end, we have gone from the darkness of long ago to the light we have today when we look at data. And of course, all of this has been possible thanks to technology, which has grown in capacity and has enabled us to work with huge data sets and to apply algorithms, performing well, to amounts of data that it was never possible to work with before. We started single-threaded, with a couple of thousand rows, and have moved to distributed systems that can load and manage billions of rows; from batch to real time; from hand coding to the latest automated machine learning platforms. It has been a long trip. But why am I talking about this? Because these algorithms and this technology, wonderful as they are, work properly only if we feed them with data. And not just any data: the right data. And data is subject to all kinds of problems, such as quality. I'm not going to go deep into data quality here, but this formula is important: there are two factors, and increasing either of them increases the accuracy of the model. If we have better data, we have better accuracy; and if we have a better representation of reality, we also have better accuracy. About the data side, I will just say it is an old problem with new techniques: in the end, garbage in, garbage out remains as true as it was years ago. So let's focus on the other factor, the reliability of our representation of the world.
What is important here is that even if we have lots of data, that data may not be representative of the entities we are trying to analyze. For instance, if we are calculating a risk score for a credit request and we have lots of data about the customer but lack economic data, I'm sure the accuracy of that model will be, well, not very good. So it's a question of the right data. This is a very simple point, but we will go deeper into it. So let's focus on improving the representation of reality; let's move on to representing the world. And the world, apparently, is very well represented with relations. It's not just me who says this; there is a lot of literature out there: papers, scientific studies, books, and so on. Let's focus on a couple of them. Maybe you know the book by James H. Fowler and Nicholas Christakis, who some years ago published this study of how social networks influence the individual. We cannot explain individuals without understanding their environment, their context, their connections. The last sentence, which is in the preface of the book, says it: to know who we are, we must understand how we are connected. And this is the point: understanding the connections means understanding, shaping, and representing the world better. It's not just Fowler; there is a lot of literature out there, as I said. All these studies began with social networks, because that is the most mature area today, but social networks were just the starting gun of this race. You can now apply the same approach, the same needs, to every single data analytics use case in your organization, because graphs, networks, are everywhere. We will see that. So relations are a good way of understanding the world, and a good way to model behavior and to predict it.
And yet we have been working for years without taking this into account. We have been working in the world of tables and data sets, where every row explains an entity, or tries to, but with discrete information. This approach lacks context; it lacks information about the relationships of the entity we are analyzing. Connections between data are like the dark matter in our systems. Dark matter is famous in astrophysics: it has been in front of our eyes forever, we just hadn't realized it was there, and it actually fills a lot of the universe. In the same way, the relationships in the data we already have are this dark matter we have to dig into and extract value from. We have barely begun to use these connections to find value, and it's time to start, because the data we already have contains the relationships. We don't need extra information or extra data; the data we already have is enough to find value. And why, if relationships are so valuable, haven't we used them before? First, because we didn't know they were so valuable. And second, because they were hidden. They were hidden because we have been living in a kind of dictatorship of the table, of the data set, of rows and columns. And as I said, that approach lacks context; it says nothing about the relationships of each of those rows. And here is where graphs come in. So we arrive at graphs. What are graphs? I'm sure you all know, but just in case one or two of you don't, don't worry; you don't need to know a lot about graphs to be here. This is not a graph; this is a chart. A graph is this.
A graph is connected data: nodes and relationships. It's a way of representing reality, of representing the world. You see this? If I give any one of you this schema, you will understand it: a person has a checking account in a bank. I can show this to anyone in the world and they will understand. Believe me, that is not usual in systems. It's a really natural way of ordering, of shaping, information. Because this is a graph, but this is a graph too: a water molecule. We have three nodes, the three atoms, two hydrogen and one oxygen, and two relationships that hold them together. Graphs are deep inside us; they are a natural way of representing information. I can show you this: a graph representing cocktails. We see the amaretto sour, which has some garnish, like orange or cherry, and contains lemon juice, simple syrup, and amaretto. We understand this; our brain understands it very easily. And why? Because, in fact, each of us has a huge graph in our head. We are a graph: 90 billion nodes, which are our neurons, and countless synapses between them, the relationships, which is where the knowledge resides. So it's a natural way of ordering information and representing the world. Why, then, should we use graphs in analytics, and how? First, graph theory. Graph theory is not new; it came up in the 18th century thanks to Leonhard Euler, who tried to model a problem: how to cross the seven bridges of Königsberg without crossing any bridge twice. That takes us from the image on the left to the graph on the right, the modeling of the problem, and that is the beginning of the mathematics of graph theory. So why do we use this for analytics?
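The idea that a graph is just nodes plus relationships can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular product's data model; the water-molecule labels match the example in the talk.

```python
# Nodes: the water molecule from the talk -- two hydrogens, one oxygen.
nodes = {"H1": "Hydrogen", "H2": "Hydrogen", "O": "Oxygen"}

# Relationships: the two bonds holding the three atoms together.
edges = [("H1", "O"), ("H2", "O")]

# An adjacency list makes "who is connected to whom" a direct lookup,
# which is exactly what a native graph store optimizes for.
adjacency = {n: set() for n in nodes}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

print(sorted(adjacency["O"]))  # the oxygen is bonded to both hydrogens
```

The same two structures, a node set and an edge list, are enough to express the person-and-checking-account schema or the cocktail graph mentioned above.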
Because, basically, graphs improve the accuracy of our models, whatever kind of models we're talking about: prediction, classification, clustering, whatever. And without needing extra data. The data is already there; the relationships simply haven't been uncovered. With graphs, we uncover the relationships and find value in them. So the why is clear: improving the accuracy of our models. And how are we going to use them? By asking questions and finding answers using algorithms, graph-native algorithms. There are many algorithms we are already using in our machine learning pipelines, but there is a family of them, the graph ones, and by using them to shape and explain reality better than anything else, we can find this value. We can classify graph algorithms in several ways. For example, there are the community detection algorithms, which create clusters among the entities we're analyzing, our customers, our users, the visitors to our web page or e-commerce; they partition the data into groups. Centrality gives us the importance of the elements, of the entities we have in the graph: the influencers. In fact, all these families contain different algorithms that translate directly to applications in the real world. It's not just mathematics: the importance of a customer that comes out of, for example, PageRank is telling us something; it means one customer is more important than the others. There is the link prediction family, which estimates, for example, the likelihood of a new relationship appearing between two users, two customers, two visitors, whatever. Similarity is a more standard one. And pathfinding and search are also something very natural in a graph.
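To make the centrality family concrete, here is a hedged sketch of PageRank by power iteration on a tiny invented graph; the three-node link structure and the standard 0.85 damping factor are illustrative assumptions, not data from the talk.

```python
# Who points to whom in a tiny directed graph (illustrative).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}
nodes = list(links)
d = 0.85                                    # standard damping factor
rank = {n: 1.0 / len(nodes) for n in nodes}  # start with a uniform score

for _ in range(50):  # iterate until the scores stabilize
    new = {n: (1 - d) / len(nodes) for n in nodes}
    for src, outs in links.items():
        share = d * rank[src] / len(outs)   # split this node's score
        for dst in outs:
            new[dst] += share               # among the nodes it points to
    rank = new

# "C" collects links from both "A" and "B", so it ranks highest:
# this is the "most important customer" signal described above.
print(max(rank, key=rank.get))
```

In a real deployment one would call a library implementation rather than hand-roll the iteration, but the loop shows what the score means: importance flows along relationships.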
Find the shortest path, or the longest path, or find whether two elements are connected in some way and how many hops lie between them: the fewer hops, the stronger the connection; the more hops, the weaker it is. And embeddings. Graph embeddings are powerful: they translate the topology of the graph, the topology of reality, into a machine-understandable representation with fewer dimensions. You saw the topology of that cocktail graph I showed you; embeddings convert the topology around every single cocktail into an array of numbers that a machine learning algorithm or pipeline can understand. So, okay, you might say: you told us how, but not exactly how. How do I apply these algorithms? Well, it's a journey. It's not a matter of executing something once and that's all. It's a journey that starts with simpler questions and moves toward more sophisticated ones, toward these embeddings, toward graph networks; from answering simple, but graph-shaped, questions all the way to graph-native learning, something that learns from the very structure of the graph. And what happens to what I already have? I already have an ecosystem of tools, methodologies, and machine learning practice. Do I throw it all away and use graphs instead? No. Graphs come to complement the data, to complete the data; you don't need to throw anything away. Everything you did is perfectly good. What I'm saying is that with graphs your results, your accuracy, will be much better. Why? Because, remember, current data science models ignore network structure. They know nothing about the topology; they only know about discrete data.
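The hop-counting idea behind pathfinding is a breadth-first search. The sketch below, over an invented friendship graph, returns the number of hops on the shortest path between two people; names and the `hops` helper are illustrative.

```python
from collections import deque

# An invented friendship graph (illustrative).
friends = {
    "Ana": ["Ben"],
    "Ben": ["Ana", "Carla"],
    "Carla": ["Ben", "David"],
    "David": ["Carla"],
}

def hops(graph, start, goal):
    """Number of hops on the shortest path, or None if unconnected."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:        # visit each node once
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(hops(friends, "Ana", "David"))   # Ana -> Ben -> Carla -> David: 3 hops
```

The returned hop count is exactly the "distance" the talk uses as a proxy for connection strength: fewer hops, stronger tie.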
So graphs add highly predictive features to our existing machine learning models. We work like this: with information that comes from the tables, the discrete data sets, plus information that comes from the graph. Now you may say: well, Josep, give me something we can understand better. Let's see an example. Imagine the same table I showed you a couple of slides ago, with this data: the discrete data of our customer. We have the customer ID, the address, the phone number, the gender, the age, the income, whether they are married or not, the level of education, whatever; we could have 50 more features here. It's a table with a lot of features. But notice that not one of these features says anything about the context of this person, this customer. Imagine this data set is used for calculating credit risk in a bank. Everything on the left comes from the table, and everything on the right comes from the graph: discrete data versus related data. Related data is things like PageRank. We see that the customer with the highest PageRank is customer two. The community: customers one and two are in the same community, so we could infer that the influencer in community number two is customer number two, because of the higher PageRank, and that can be used for many different things. It could be used, for example, in churn calculations: if customer two is at risk of leaving, we have to take care of them, because they may strongly influence their community. Or maybe even for ordering the queue in the call center according to this PageRank, because one customer is more important than another. Then betweenness, degree centrality, total neighbors: all this is pure algorithmic output, technical data from the graph that explains its structure. But I have added another one: this "risk connect".
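The coexistence of table features and graph features can be sketched as a simple merge: one feature vector per customer, combining both sources. All values below are made up for illustration; in practice the right-hand dictionary would be filled by graph algorithms such as PageRank and community detection.

```python
# Discrete data, as it would come from the customer table (invented values).
table = {
    1: {"age": 34, "income": 42_000},
    2: {"age": 51, "income": 38_000},
}

# Related data, as it would come out of graph algorithms (invented values).
graph_features = {
    1: {"pagerank": 0.11, "community": 2},
    2: {"pagerank": 0.27, "community": 2},  # the community's influencer
}

# One merged row per customer, ready to feed any existing ML model.
rows = {cid: {**table[cid], **graph_features[cid]} for cid in table}
print(rows[2])
```

Nothing in the existing pipeline changes: the model simply receives a few extra columns that carry the network context the table alone cannot express.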
This is another feature we can derive from the graph, but one that relates more to the business than to the technical structure. With it we can answer, very simply, business questions that used to be very difficult to answer. If we are analyzing credit risk, what this "risk connect" feature answers is whether this person is connected, within fewer than four hops, to someone tagged as a fraudster. I'm sure that is relevant for a model. But calculating this outside a graph is really, really difficult and very time consuming. So this is how both kinds of data coexist: the discrete data we already have, and the related data that we extract from, or calculate inside, the graph. So it's time to learn: to learn the algorithms, to understand them, and to know where they are applicable in real life, because again this is not mathematics, it's business. We have talked about the different families or classifications of algorithms: finding connections; centralities to rank the importance of an entity, a customer, a product, whatever it is. We have degree centrality, closeness centrality, PageRank; all of these give us a score for the importance of an element. Community detection, like Louvain or label propagation; link prediction, which we discussed a little earlier. The point here is that these are not something we have to develop. The algorithms are already there; you just need to use them. But you need to know when to use one and when to use another. In the more mature machine learning techniques this is already clear: you don't need to implement k-means, it's there, everyone has a function that calculates it, but you do need to know when to use k-means instead of another algorithm. It's the same here: know the algorithms and use them.
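The "risk connect" feature described above is a depth-limited reachability check. Here is a hedged sketch over an invented chain of customers: a breadth-first search that stops after three hops, answering "is there a tagged fraudster within fewer than four hops?". The graph, the `risk_connect` helper, and the fraudster set are illustrative assumptions.

```python
from collections import deque

# An invented customer network: a simple chain c1 - c2 - c3 - c4 - c5.
edges = {
    "c1": ["c2"],
    "c2": ["c1", "c3"],
    "c3": ["c2", "c4"],
    "c4": ["c3", "c5"],
    "c5": ["c4"],
}
fraudsters = {"c5"}   # nodes tagged as fraudsters

def risk_connect(graph, start, tagged, max_hops=3):
    """True if a tagged node is reachable within max_hops hops of start."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in tagged and node != start:
            return True
        if dist < max_hops:            # stop expanding past the hop limit
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return False

print(risk_connect(edges, "c2", fraudsters))  # c2 -> c3 -> c4 -> c5: 3 hops
```

Because the traversal is bounded, the cost depends on the local neighborhood, not on the whole data set, which is precisely why this feature is cheap inside a graph and expensive with table joins.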
Now that we have the algorithms and we know their importance for shaping and explaining reality and increasing the accuracy of our models, we say: okay, but we need the technical support that will execute them. Yes. If you remember, at the beginning I talked about algorithms and technology; now we are reviewing the technology. Graph mathematics, graph theory, gives us the algorithms, and technology brings the possibility of executing them at scale. That's the point. The graph technology behind graph data science is, in the end, a system for handling relationships between data very, very efficiently. That is the summary: a way of managing relationships efficiently. We have had many technical systems that manage data sets very efficiently, but not relationships. Now it's time for graph technology. And we have different graph technologies, from in-memory distributed graph systems, like GraphX and the like. But lately the need arose for something more: it's not enough to load something into memory, calculate, and tear it down, because data must stay connected all the time, and I need to reshape the data in the graph, change it, and find new connections. That is when graph databases like Neo4j came up to answer this need: native graph storage, with the ability to easily work with the graph, refactor it, traverse it, create inferred relationships, and take the system down and quickly have the database up and running again, something you cannot achieve with libraries alone. So graph databases are cool. I'm sure you know the DB-Engines ranking on the Internet, with its trend score. You can see that graph databases have really taken off in the market over the last five or six years. They are really cool.
They are really cool because there is something behind them: value. Why are they cool? Not just because they are very nice, but because they work and they bring value, of course. Let's see an example. Talking about social networks, since that was maybe the first use case, we can model a social network with a relational database. Imagine this: people and friendships. We have two tables, persons and friends. To ask this relational system "give me the friends of the friends of Josep", we have a query like this. It's not very nice, but apparently it should work. Now let's see how we would ask the same thing of a graph database, with Cypher, the language that Neo4j brought to the market and that is now openCypher, a standard on its way to being the graph query language. You see that it's a language for patterns: A connected to B connected to C. With these parentheses and arrows it is clear: A connected to B connected to C, where A is Josep; give me the Cs. The Cs are the friends of the friends of Josep: A is Josep, B are the friends of Josep, C are the friends of the friends. So it's very nice. But being very nice is not enough; it also has to be fast and performant. Let's see a quick example. Imagine a network of a million people with 50 friends per person: 1 million nodes and 50 million relationships. How does a relational database perform versus a native graph database? That "native" is important. At the second level, friends of friends, they are more or less the same. But look at how, for the relational database, the time to answer the query grows exponentially with the depth: 30 seconds, half an hour, more than half an hour. While here the answer time is almost flat.
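The friends-of-friends pattern the Cypher query expresses, (a)-(b)-(c) with a = Josep, can be sketched in plain Python over an adjacency list. The small friendship graph and the helper name are illustrative; the point is the shape of the question, two hops out, excluding Josep and his direct friends.

```python
# An invented friendship graph (illustrative names).
friends = {
    "Josep": {"Maria", "Pau"},
    "Maria": {"Josep", "Laia"},
    "Pau":   {"Josep", "Laia", "Nil"},
    "Laia":  {"Maria", "Pau"},
    "Nil":   {"Pau"},
}

def friends_of_friends(graph, person):
    """Everyone two hops away, excluding the person and direct friends."""
    direct = graph[person]
    result = set()
    for friend in direct:
        result |= graph[friend]        # one more hop out
    return result - direct - {person}  # keep only the second circle

print(sorted(friends_of_friends(friends, "Josep")))
```

In a relational schema this same question needs a self-join per hop; in a graph the relationships are stored, so each extra hop is just another step along already-connected data.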
Why? Here is another way of looking at it. The more hops we traverse, and the more volume we have, in relational or other NoSQL databases, the more exponentially the time to answer grows. With native graph databases, however, whatever the volume or the number of hops, the answer time stays the same for these contextual queries. And why does this happen? Because, ironically, relational databases, even though they are called relational, do not store relations. The relations are something that appears at query time, when an ID 1 here matches an ID 1 there: so the rows are related, but we have to ask the database for that relationship every single time. Native graph databases, on the other hand, are a democracy where data and relationships are equal citizens: both exist, both are persisted in the database. So while traversing the graph, we don't need to ask the engine again for the next hop; the data is already there, connected. That's why it's so fast, and it has to be fast, because calculating a PageRank or a centrality or whatever requires a lot of speed. And because these algorithms are there, we can just use them. Finding communities, for example with label propagation, is very simple. This is an example of how, in a graph of persons, we would calculate the communities using the label propagation algorithm: we just call this library procedure, and after a couple of seconds or minutes, depending of course on the volume, we get the number of communities, around 20,000 in this case, and whether it converged: it is stable and it ran in nine iterations. We can then look, for example, at the biggest community, which has 24 elements, and see that there are really a lot of relationships between those elements, persons in this case. So, to conclude, I would say that graphs and graph technology are here to stay, and we encourage you to use them. Why are they here to stay?
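To show what label propagation actually does, here is a hedged, deterministic sketch: each node repeatedly adopts the most frequent label among its neighbors (ties broken by the smallest label, to keep the sketch reproducible). The two separate friend groups are an illustrative assumption; a real run, like the one described above, would use the database's library procedure over millions of persons.

```python
# Two separate friend groups, each a triangle (illustrative).
edges = [("a", "b"), ("b", "c"), ("a", "c"),   # friend group 1
         ("x", "y"), ("y", "z"), ("x", "z")]   # friend group 2

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

labels = {n: n for n in adjacency}   # start: every node is its own community

for _ in range(5):                   # a few sweeps converge on this graph
    for n in sorted(adjacency):
        counts = {}
        for nb in adjacency[n]:      # count the labels around this node
            counts[labels[nb]] = counts.get(labels[nb], 0) + 1
        best = max(counts.values())
        # adopt the most frequent neighbor label (smallest wins a tie)
        labels[n] = min(l for l, c in counts.items() if c == best)

print(len(set(labels.values())))     # the two triangles become two communities
```

Labels spread along relationships and pool inside densely connected regions, which is why the algorithm finds communities without being told how many to look for.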
Because they provide great value: adding these features, using graph embeddings, and understanding the shape of the data, the topology of the relationships close to each element, improves the accuracy of our models. The technology is ready: it is highly scalable now and can manage huge volumes, so we can handle things that were unthinkable a couple of years ago. Another important point is using a graph database. A graph database lets us think more about the what and less about the how: you just ask, and the system does it for you. It's not only a question of algorithms; it's a question of being able to ask any question, the way we ask any database with SQL. And it's a great step to have native persistence, on disk as well as in memory as with the libraries: we have both worlds here, so it's a big step forward. And graphs are everywhere; that's the point. You see, that is our name, Graph Everywhere. So we really believe you just have to start working with this and dig into the dark matter you have in your data, these relationships, which will bring you a lot of value. And we encourage you, as I said, to start tomorrow, well, after the Big Things event. Okay. And that's all. Thank you for your time. If you have any questions, or if you want to contact me, I'll be delighted to help you with anything you need. Thank you very much.