by Julian Simpson. He seems pretty cool. He talks about graphs. Graphs are nice. He's got many surprises in his talk, so everyone, please welcome him. Thanks, Rachel, thanks everyone. So today's talk is all about graphs, and I might as well come out and say it: we think you are surrounded by graphs. Before we really get going there's the obligatory introduction slide, but I wanted to turn it around to you, because I've done talks before where I've totally missed the mark on the audience. (Oh, there's the sweet spot for the audio.) I've gone really high level on something where everyone wanted detail, and vice versa: I once bored an audience in Copenhagen for about an hour and a half and people fell asleep. So I'd like to get that right, and really the introduction is about you. If it were a much smaller audience I'd make you go around and say what your interests were, but that might be a little harsh, so I'm just going to do a couple of shows of hands. Who is a Java developer, or writes some Java, or at least doesn't hate it? Okay, all right, cool. Who used a NoSQL database last year? Okay, about a third. Who's used Neo4j? We've been around for donkey's years. Okay, right, that is very helpful, thank you. So I will introduce myself. My career in Unix-like operating systems started just up the road in the late 90s, where I worked at a company called Clare for a couple of years. Then I went to seek my fame and fortune in London and didn't find it. I did come back with a wife and kids, though, so that was all right. I was doing Solaris admin around the dot-com crash, then it all went horrible and Linux disrupted my cushy Solaris job anyway. But I'd moved on by then: I got a job at a company called ThoughtWorks, basically building people's Java code, and that was a lot of fun.
However, it gets really, really irritating to have a room full of people developing in an IDE on Windows who throw the code over the wall, and when it doesn't work on Solaris they say it works on Windows, as if that's all that matters. Never mind that in those days they were often on uniprocessor Windows machines, deploying to multi-core Solaris machines. Must be my problem. Anyway, as a reaction to all that I moved into the AWS community, speaking about how we should all just get along and how we should get developers involved in making code releasable. I'll take no credit for that; someone made that case a lot better, and I'm not sure it was me. My only claim to fame there is that I actually went to the very first DevOpsDays conference, unlike everyone else who claims they did and didn't. So if anyone wants to talk over a beer later about whether DevOps actually changed the world, that will be a fun talk. I think we did horrible things to the configuration management community along the way. But now my job is as a developer at Neo Technology. I work on internal stuff, not on the product. I used to, but moving from London to New Zealand made it really difficult to collaborate with people in Sweden; my biggest learning last year was that you can't. I'm going to talk about the product at the end of this talk, because I don't really want to ram that down your throats; I want to ram graphs down your throats. I'll be delighted if you want to start using Neo4j for projects, but that's not really the purpose of this talk. I am quite happy that we announced $20 million of funding for Neo today; that'll keep the wolf from the door for a little while. Anyway, what I want to do in this talk is set out the landscape for NoSQL and how graphs fit into it. We've tried really hard to position graphs as a separate category from NoSQL, but we always get lumped in, so we'll cover that.
What graphs are, with some examples, and then I'm going to read you a story. Now, this may be a fun talk, but you also run the huge risk that I'm going to be reading something that takes a variable length of time, so Rachel may have to hold up signs at me. If you think I'm going really too slowly, we'll try to speed up, and we can do some Q&A or a demo at the end. If I'm going too fast, please put your hand up. We'll go over some questions, and I have a few copies of the last print run of our Graph Databases book from O'Reilly. It isn't such a fabulous giveaway, because we're about to have a new print run, but there's some good stuff in the back, especially about the different types of NoSQL database. So, does anyone have a definition of NoSQL? "Everything that's not a relational database": that's a pretty good one. I stole Martin Fowler's definition, and bingo: everything that's not a relational model is NoSQL. So that's good. There's a bunch of other characteristics he's posited that I mostly agree with. Mostly open source: I believe some databases lumped into the NoSQL category, like Vertica, aren't open source, but for the most part they are, which is a good thing. Designed to run on large clusters, mostly: not all NoSQL databases do, because the category is so wide, but I think everyone's trying to move towards the horizontal scaling pattern. And it pains me to read this out, but: based on the needs of 21st-century web properties. I believe what he's saying is that not all websites need highly durable storage. Some websites can get away with an awful lot, and for some of them, if they blow up it doesn't really matter all that much. That would be different in, say, finance, but a lot of Web 2.0 sites manage to cut corners and deliver stuff quickly, and that is pretty much a win for them.
And the last key thing is schemaless, which is probably "mostly" as well; the key thing is that a schema is optional at most. I think that's implied, basically: if you took out the data you'd just call it a base. I must pick up the habit of repeating the questions in case they don't come through on the audio: the comment was that we didn't talk about storage, and that's implied. So there are four kinds of NoSQL database in the Neo Technology view of the world, and that view happens to be shared with some other people as well. Key-value stores: if you've used a hosts file, you've used a key-value store. Look at the documentation for a bunch of Linux systems and they'll talk about the hosts database, and I think everything from there on is fair game to call key-value. Obviously that's a wide spectrum. Document stores are a bit more specific: you're persisting a JSON document, as in MongoDB or CouchDB. Column family, like Cassandra, famously used at Twitter. And finally graph, the finest of all the categories, like our product and a few others. I stole most of the next eight or so slides from the book NoSQL Distilled by Martin Fowler and Pramod Sadalage, and they've got some recommendations for where you should use each kind, so I have faithfully reproduced those. So, back to key-value stores: a simple hash table, primarily used when all access to the database is by primary key. Again, that's super wide. I believe the example I've given you is stolen from Riak, and that is as simple as an API can get: pretty much get and set. What the value can be varies from database to database: it could be a scalar value like that, a hash, or a more complex data structure such as a list. It's hard to generalize. So the recommendations: do use it for storing things like session information, user profiles and preferences, and shopping cart data.
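The get/set shape described above can be sketched as a plain hash table where every access goes through a primary key. This is a hypothetical in-memory class, not the client API of Riak or any other real store, and the `session:` key prefix is only an illustrative naming convention for namespacing entries.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store: every access is by key."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store treats the value as opaque: a scalar, a dict, a list...
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookups by primary key only; there is no query-by-value here.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
# "session:" is just a key-naming convention used to namespace entries.
store.put("session:abc123", {"user": "frodo", "cart": ["rope", "lembas"]})
print(store.get("session:abc123")["cart"])  # ['rope', 'lembas']
store.delete("session:abc123")
print(store.get("session:abc123"))  # None
```

Anything beyond "fetch the value for this key", such as finding every session that contains a given item, would need a scan or a separate index, which is the limitation the "not recommended" list gets at.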
They justify that in their book by saying that you're using the key space as one great big namespace. Some stores, like Riak, let you have buckets of keys so you can effectively namespace them off, but this is a very natural fit if you have something like a list of usernames or session IDs and you just want to persist those in a nice simple database. The good thing is there's this pattern called polyglot persistence, which Martin Fowler also popularized: don't throw everything into one NoSQL database; choose a good fit for each kind of storage you have. You can leave everything else in your relational database if it suits you. So this may simplify your life, and it may not even matter if you lose some of that session information; people just might have to log in again and curse you a couple of times. Not recommended: any case where you have relationships amongst the data, because that's kind of hard to do. You often can't do a transaction over multi-key operations, so a complicated transaction may not work. If you want to do anything with a set of values, that's also not good, and not all key-value stores will let you query by data. Obviously you can grep your hosts file, but that's not the case with some other systems, or you end up having to use something else to index it. Document databases: who's used a document database recently? I've got about five or six. So yeah, these are web scale. If anyone hasn't seen the "MongoDB is web scale" YouTube video, it's a hoot. That's not a comment on MongoDB; it's a comment on people's assumptions about web scale. So: self-describing, hierarchical tree data structures, which can consist of maps, collections and scalar values. I've chosen the dumbest kind of example here, but that could be quite a big document, and you could update individual bits of it using MongoDB's APIs, which seem pretty sweet. I like it because it's a really simple way of doing things.
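Updating one field inside a document, rather than rewriting the whole thing, is the operation being praised here. The sketch below uses plain Python dicts as documents; the helper `set_path` is hypothetical and only mimics the idea behind a partial update such as MongoDB's `$set` with dot notation. It is not the MongoDB driver API.

```python
import copy
from typing import Any, Dict


def set_path(doc: Dict[str, Any], dotted: str, value: Any) -> Dict[str, Any]:
    """Return a copy of `doc` with the field at `dotted` (e.g. "home.place") set.

    Hypothetical helper: it only mimics the idea of a partial update such as
    MongoDB's $set with dot notation. Missing intermediate maps are created.
    """
    updated = copy.deepcopy(doc)
    *parents, leaf = dotted.split(".")
    node = updated
    for part in parents:
        node = node.setdefault(part, {})  # walk (or create) nested maps
    node[leaf] = value
    return updated


# A self-describing, hierarchical document: maps, collections, scalars.
hobbit = {"name": "Bilbo", "home": {"place": "Bag End", "shire": True}, "rings": 1}
moved = set_path(hobbit, "home.place", "Rivendell")
print(moved["home"])            # {'place': 'Rivendell', 'shire': True}
print(hobbit["home"]["place"])  # Bag End (the original is untouched)
```

Returning a copy keeps the example easy to reason about; a real database would apply the change in place on the server, so the client never has to send the whole document back.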
The recommendations for that one are event logging, because you can write to parts of a document individually; blogging and CMS platforms, because lots of web systems will quite happily transform JSON into front-end pages; and web or real-time analytics, and e-commerce. Not recommended: complex transactions, because I don't believe you can do a transaction across multiple documents. You can actually query several documents at once, but if your data is changing all the time, so that you don't have a fixed schema in place, you won't necessarily see all of the data in the responses. So "varying aggregate structure" basically means be very careful about what changes in your app. Of course, we now have about three times the number of people, so my careful poll of who knew what about NoSQL is gone; let's just roll with it. I'll repeat for those who just arrived: if you think this is really too slow, let me know, and if you think I'm going too fast, please raise your hand or shout at me. So, column family. I have to confess this is the one I understand least. Obviously Twitter have made excellent use of Cassandra, and I believe schema flexibility and read performance are the main things that make Cassandra really popular. "Storing data with keys mapped to values, and the values grouped into multiple column families, each column family being a map of data." I just can't read that and actually make sense of it. You're basically getting columns full of rows rather than rows of columns in a column family database. You can see here, in the really tired Hobbit example, that Bilbo and Frodo don't have the same keys in their maps. That is by design: it's the schema-free aspect of a column family database, and you can nest different columns in there as well. Martin and Pramod suggest event logging as an interesting use again; I believe write performance to a column family database can be super fast.
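The "Bilbo and Frodo don't have the same keys" point can be sketched as a map from row key to a map of columns, where each row carries only the columns that were written to it. This is a conceptual model under that assumption, not Cassandra's storage engine or query API; `new_family`, `insert` and `column_names` are hypothetical helpers.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

# Conceptual model only: row key -> {column name -> value}.
ColumnFamily = DefaultDict[str, Dict[str, str]]


def new_family() -> ColumnFamily:
    return defaultdict(dict)


def insert(cf: ColumnFamily, row_key: str, **columns: str) -> None:
    """Upsert just the given columns; rows need not share a schema."""
    cf[row_key].update(columns)


def column_names(cf: ColumnFamily, row_key: str) -> List[str]:
    return sorted(cf[row_key])


hobbits = new_family()
insert(hobbits, "bilbo", surname="Baggins", home="Bag End", age="111")
insert(hobbits, "frodo", surname="Baggins", quest="Mount Doom")
insert(hobbits, "frodo", companion="Sam")  # a later write adds a new column

print(column_names(hobbits, "bilbo"))  # ['age', 'home', 'surname']
print(column_names(hobbits, "frodo"))  # ['companion', 'quest', 'surname']
```

Each write here is a blind upsert of a few columns with no read or schema check first, which is the shape of workload (event logs, counters) the recommendations above lean on.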
They're also good at persisting JSON, so the same recommendation applies to content management systems and blogging. But the last two are interesting: counters, and expiring usage. They have data types for counters, and you can actually tell data to self-destruct, which is cool. If you have something like records you need to keep for compliance for a certain amount of time, you can say persist this but kill it in 30 days, which is kind of sweet. Now the fun bit is graph databases: storing entities and relationships between those entities. I think that's the first gigantic clue that graph databases are awesome, because it's really simple. What I've done is take the same cringe-worthy Hobbit thing and use our simplest Java API to show you the very basics of creating a node and giving it some properties; we'll go into more detail later. The thing I love is that connected data is the recommendation, because 30 years ago I don't believe data was very connected at all, and now it's super, super connected: everything is connecting, and it's a great use case for us. Routing, dispatch and location-based services are another. We have a customer in London who built their entire business around a graph database and later got acquired by eBay; they're basically choosing the fastest path to get goods to someone, and they demonstrated this by ordering a bottle of whisky online and having it show up by the end of their talk. Recommendation engines are another classic for us, as is fraud detection, where you look at the patterns and pull out the really strange ones; that's also used by a bunch of people. The one downside is that updating all of the entities in your database may mean updating the entire graph, and you may hit performance issues.
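The slide used Neo4j's Java API, which isn't reproduced here. As a hypothetical, stdlib-only Python model of the same idea, the sketch below has nodes with labels and properties, typed relationships that can carry their own properties, and traversal by following each node's own list of relationships. None of these class or method names belong to Neo4j.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(eq=False)
class Node:
    id: int
    labels: Set[str]
    properties: Dict[str, object]
    # Each node holds its own outgoing relationships, so a traversal
    # follows references rather than looking anything up in a global table.
    outgoing: List["Relationship"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    type: str
    start: Node
    end: Node
    properties: Dict[str, object] = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory property graph (not Neo4j's API)."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []

    def create_node(self, *labels: str, **properties: object) -> Node:
        node = Node(len(self.nodes), set(labels), dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, rel_type: str, end: Node,
               **properties: object) -> Relationship:
        rel = Relationship(rel_type, start, end, dict(properties))
        start.outgoing.append(rel)
        return rel


g = Graph()
bilbo = g.create_node("Hobbit", name="Bilbo", age=111)
frodo = g.create_node("Hobbit", name="Frodo")
ring = g.create_node("Artifact", name="The One Ring")
g.relate(bilbo, "UNCLE_OF", frodo)
g.relate(frodo, "CARRIES", ring, reluctantly=True)

# Two hops from Bilbo: what do his relatives carry?
carried = [r2.end.properties["name"]
           for r1 in bilbo.outgoing
           for r2 in r1.end.outgoing
           if r2.type == "CARRIES"]
print(carried)  # ['The One Ring']
```

Because relationships hang off the nodes themselves, a two-hop question is two pointer hops rather than two joins; a real graph database adds persistence, transactions and indexes on top of that basic shape.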
So before we really get into the nitty-gritty of graphs, I just want to clarify what they actually are, because I met a bloke at a conference last year who asked what I do. I said I work for Neo. "Oh, what do they do?" They make a graph database. And he says, "Oh, a graphical database, cool." And I said, "No, not cool: (a) that's Microsoft Access, (b) that's not cool." So, hands up if you think this is a graph. Thank you, you're all very clever people, and I mean that sincerely, because I am super impressed with both the speakers and the attendees at this conference. How about this one? Hands everywhere, excellent: we've already identified the right thing. I'm going to take a quick jaunt into history as to why it's called a graph. "Graph" means to mark something, as far as my ancient Greek goes, so the word has been used lots and lots of times: photography, lithography, steganography. We have to go back to 1878 to find the first proper use of the word "graph" in this sense. Once I found that quote from J. J. Sylvester, thanks to the person who bought the article from Nature so I could reproduce it, I then had to work out what a Kekuléan diagram is.
I stopped studying chemistry in sixth form, so... it's that: the benzene molecule, often surrounded by a snake eating its own tail just for style. Kekulé is the guy who gave us that depiction of molecules, and if you go back to the previous graph I just showed you, I think it's dead on. So we really try to make sure people think about charts versus graphs, and we always beat them up when they get it wrong. The history of graphs goes back to a guy called Leonhard Euler and the city of Königsberg in Prussia, which became Kaliningrad in the Soviet Union, or Russia as it's called today. What you did in the 18th century was pose intellectual challenges to each other, and someone asked: is it possible to take a walk around Königsberg, cross all the bridges, and never retrace my steps? Because Euler was the super-smart guy who gave us half of our maths, including function notation, he didn't actually try to walk it, or draw it out on a piece of paper, screw it up and throw it away: he used maths. His insight was that it may be a map, but you can reduce it to a far simpler structure. The River Pregel has two islands in the middle and a north and a south bank, so that's just four things, four things with a bunch of connections between them, and it actually looks like this. As it turns out, you cannot solve the Seven Bridges of Königsberg problem without sticking another bridge in, which people have suggested. Euler of course came up with all the maths to prove beyond all doubt that it was impossible; we're not going to go into that today, otherwise we'll run out of time. So I thought I'd do some simple examples of representing things as graphs, and what more appropriate thing to start with than a makefile? I believe everyone in this room knows what a makefile is, or can easily find out. We've got three targets in the makefile
and there are some dependencies; four targets if you count the useless one at the top, which I should have marked with .PHONY. So I made my makefile, ran a dry run on it, and piped that output to a tool called make2graph, and it gives me a directed graph in Graphviz format. Hands up if you're familiar with using Graphviz for visualizing stuff? Okay, about half the room. It's an excellent tool that's been around for donkey's years and lets you draw a graph programmatically. You can see here we've got some labels and some nodes, and it's declaring the relationships at the bottom, so we can turn that into a nice diagram which looks just like the other stuff I've shown you. If we do something a little more complex, let's talk about Apache Maven. This is one of the reasons I stopped doing build stuff in Java: I could cope with Apache Ant, but this is a whole new ballgame. It doesn't show up well on the display, but what you're looking at is the collection of libraries a project depends on. At the top you might just be able to see that it's a Maven plugin, so it depends on some Maven stuff and some really common Java libraries, and then it's turtles all the way down the dependency tree, or graph. I find this an incredibly useful technique when you're trying to work out what the hell something does, because you can visualize it all there. This tool kindly shows runtime dependencies in green and test-time dependencies in blue, so you can see what the different phases of Maven are going to do. This morning I did manage to corner Linus and ask him how he ended up choosing a graph to build Git on; in some ways Git is more like a key-value store, and in some ways it's a directed graph. It turns out that back in the day, when BitKeeper was still a thing, Larry McVoy was all keen for people to use that, Tridge didn't like that at all, and Linus was kind of
stuck between these two guys: "You should use this." "No, I don't want to." So Linus said he was trying to explain to Tridge how it all worked, and he started using a graph metaphor to explain it. Before the organizers pulled him away to get to his keynote this morning, he explained that that turned into the Git we all use today. You can see in the diagram that graphs lend themselves very well to dealing with the branches we create. I particularly like this one because someone's doing a rebase, and you can see we're actually disconnecting parts of the graph and moving them around to make a simpler structure. How are we doing on time, Rachel? Elapsed... left... I'd better get a move on. Also, just for fun, you can visualize parts of a codebase with a graph, and Rusty did this completely awesome graph of every function in the 2.4 kernel codebase, which apparently took nine hours to build on his Athlon 1000 back in the day. I thought that was sweet, and I endorse it. So this is the really high-risk bit of my presentation, because I'm honestly going to read you a story. Who hasn't read a Choose Your Own Adventure book, or any of those things? Okay, good, everyone's read one. You make decisions as you go through the book: turn to page such-and-such if you want to slay the dragon, all the kind of stuff 12-year-olds read. Today's book is Inside UFO 54-40, written by Edward Packard, from the series whose co-creator, R. A. Montgomery, sadly died last year. It was published in the early 70s, I believe... oh, 1982. It starts with a warning: do not read this book from beginning to end. For a bit of audience participation, I'm going to read the first chapter, probably fudge over a few things, and then I want some decisions. It's your first trip on the Concorde, the supersonic jet airliner that crosses the Atlantic in three hours and 45 minutes. Who flew here on Jetstar this week? Because that's the
future of air travel; we used to think it was Concorde. Sorry about that. Right now you're at 57,000 feet, mid-flight from New York to Paris. You look up from the magazine you've been reading as a voice comes over the loudspeaker: "This is Captain Ravelle speaking. We're about halfway across the Atlantic now, at latitude 54, longitude 40. We've just come onto a new course that will bring us over the coast of France in about 90 minutes. Those of you on the left-hand side of the plane may see the southern tip of Greenland." You glance out of the window, hoping to see Greenland. Instead you see a gleaming white cylinder, several times larger than the Concorde, but without wings, engines or ports. The object, glistening in the early morning sunlight, is coming straight at you. "Look!" The white-haired man sitting next to you leans towards the window. "At what?" "Don't you see it? It's coming right at us!" He opens his mouth to answer but says nothing, because you're no longer there. Turn to page 6. You're sitting in a thick, rubbery mass in a circular room. The room is bathed in a pale white light, yet you see no windows, doors or lamps. You remember now: sitting in the Concorde, the huge white object coming at you, the plane shattering. And where are you? The light turns violet, mixing with oranges and reds, and brightens as if the sun were about to rise. A voice is speaking, except it's not speaking: you're hearing thoughts entering directly into your brain. We are the Utai masters. You are on the galactic ship Rakama, orbiting the planet Earth. You have been chosen to be a specimen in the galactic zoo on the imperial planet of Ra. If you refuse to cooperate, you will be sent to Somo. You may make one statement. If you demand to be returned to Earth, turn to page 3. If you want to know more about the Utai, turn to page 4. Robert, what do you think? All right. "Tell me more about yourselves," you say. "Why did you choose to visit Earth?" We study Earth people as your scientists study bacteria under a microscope.
We came to Earth in search of Ultima, the planet of paradise. If you offer to help the Utai masters find Ultima, turn to page 22. If you ask the Utai how they think they could reach Ultima by visiting Earth, turn to page 25. What do you think, Paul? 25? Cool; I think we'll finish about five o'clock. Basically they say you will cooperate, or you will cooperate. So you say, "Look, I'm a human, please," and they say you are unsuitable: you'll be erased, and you won't even know you've been in space. Turn to page 24. You're sitting in your room at home. Well, that's a turn-up; what happened? You remember boarding the Concorde, and then something weird. Reaching into your pocket, you pull out a pebble the size and shape of a watermelon seed. Why is it so heavy? You turn it over and you're about to toss it into the wastebasket. Does it go in or not? Turn to page 41 or 50. The strange object lands in the wastebasket with a thud, and you think no more about it. A few days later you're talking with a friend of yours, Todd Hawkins. "Did you see the airline pilot on TV last night?" Todd asks. "He and his crew swear that a couple of days ago they saw an alien spaceship over the Atlantic." "Really?" "Yeah. It was at latitude 54, longitude 40, so they called it UFO 54-40. At first they thought a passenger was missing, but none of the doors had opened; there's no way a passenger could have disappeared. They finally decided the passenger list must have been wrong." You get a bit freaked out by that. You look at Todd, but you can never quite put your finger on why the thing he said seemed so discordant. The End. So that was fun. I had my 12-year-old daughter read it and she enjoyed it, and I've had it in my bag for the past few weeks, reading it on the bus; it's been quite fun. I seem to have destroyed the left-hand display; I don't know what we're going to do about that. Yeah, okay, never mind. You can depict this as a graph as well: you're on Concorde, you meet aliens. I didn't actually write down that path.
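Treated as a graph, a gamebook is just pages joined by "turns to" links, some of them marked as decisions. The sketch below uses a made-up nine-page book (not the real page structure of Inside UFO 54-40) to answer the same kinds of questions the talk goes on to ask with Cypher: how many endings and decisions there are, every path from the first page to an ending with the shortest and longest, and which pages nothing turns to, like the hidden Ultima page. The dict-of-lists representation and every function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Made-up miniature gamebook: page -> [(next_page, is_decision), ...].
# A page with no links is an ending.
Book = Dict[int, List[Tuple[int, bool]]]

BOOK: Book = {
    1: [(2, False)],             # "turn to page 2"
    2: [(3, True), (4, True)],   # a choice
    3: [(5, False)],
    4: [(6, True), (7, True)],
    5: [],
    6: [],
    7: [(5, False)],
    8: [(9, False)],             # hidden pair: nothing turns to page 8
    9: [],
}


def endings(book: Book) -> List[int]:
    return sorted(p for p, links in book.items() if not links)


def decisions(book: Book) -> int:
    return sum(1 for links in book.values() for _, is_choice in links if is_choice)


def all_paths(book: Book, start: int) -> List[List[int]]:
    """Every simple path from `start` to an ending, by depth-first search."""
    paths: List[List[int]] = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        links = book[path[-1]]
        if not links:
            paths.append(path)
        for nxt, _ in links:
            if nxt not in path:  # never revisit a page, in case of loops
                stack.append(path + [nxt])
    return paths


def orphan_pages(book: Book, start: int) -> List[int]:
    """Pages (other than the start) that no page turns to."""
    targets = {nxt for links in book.values() for nxt, _ in links}
    return sorted(set(book) - targets - {start})


paths = all_paths(BOOK, 1)
print(endings(BOOK))     # [5, 6, 9]
print(decisions(BOOK))   # 4
print(len(paths), min(map(len, paths)), max(map(len, paths)))  # 3 4 5
print(orphan_pages(BOOK, 1))  # [8]
```

Enumerating every simple path is exponential in the worst case, which is harmless for a paperback but is exactly the kind of work a graph query engine is meant to do for you. Finding the orphan page's neighbour "in any direction" would just mean checking incoming links as well as outgoing ones.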
But you can see, going off to the right-hand side, you ask them about themselves, blah blah blah. What I was hoping was that you'd end up falling asleep forever because they didn't like you, but there you go. So that's another graph: this document. I hope you can all see that. Okay. So it's another graph, just like the documents I showed before. This is a screenshot from Neo4j, where I actually spent days with a bulldog clip on this book trying to link all the pages together. I should have drawn it out on a whiteboard first, I think, but I did it. You can see here I've got page number one, which turns to a number even I can't read, 20, and there is the entire structure of this book as a graph. That's kind of cool, but it's not useful to you at all. I showed you we had a basic API for manipulating or querying a graph, but even that didn't really work out, so for the past couple of years we've been working on a query language called Cypher. Programmatic interfaces are convenient for the programmer, but they don't work for everyone, and we find that a SQL-like language is easy for everyone to pick up, especially one like ours. There's a cartoon about someone wanting some business data out of the database and being told to go and write a MapReduce job, but I think the punchline would break our policies here at LCA; I'll send it to anyone who wants to see it. So you can query or update a graph, and we built the language on ASCII art, because Andrés's big insight was that every time someone models something, they draw circles for the entities, draw arrows between them and label the arrows, so he decided to make the query language look just like that. His other great contribution: because our codebase started its life around the turn of the century, when The Matrix was still cool, there have always been these
jokes about The Matrix in Neo, even though Neo stands for Network Engine for Objects, again showing its age a bit. So he decided to name it after the bad guy in The Matrix, Cypher, who does a deal with the agents and tries to kill everyone, including the blonde lady and the other blokes whose names I really should have looked up. So this is Cypher, and sorry about ruining the whole screen thing for you guys on the far side; it's really dead. You can see it's not a million miles away from another query language. You can see CREATE: we've got our circle for the entity, and we've got a bunch of attributes. Create a beginning page: it's got a Beginning label on it, a Page label on it, a number and a synopsis. That's how I created the book: just a series of Cypher statements that built up the graph. I ran a bunch of queries in our Neo4j shell tool on my laptop, and we can answer some questions about the data we've created. How many pages? Well, there are 80 pages that matter; I did actually exclude a few. How many endings are there? 27. You can see we're using our MATCH keyword: we're looking for anything that has an Ending label on it, returning the count, and making it pretty by giving it a header of "endings". How many decisions could you make? What I did was stick a property on the relationships that connect all the nodes together, so if there's a decision, it tells you which path you took through the book. So we need to look for a node that turns to another node via a particular type of relationship that has a decision on it, and I want the count of those decisions. There are 59 decisions you can make during the book, and that adds up to 125 paths through the book, which is reasonably good value for an avid reader; that will keep you going for a little while. How's everybody going with that? That'd be great. And you can see we think it's
awesome, because you can look at the structure and actually query based on the structure. The shortest path is four pages. If you look at the top line of the query, we match a beginning that turns to an ending, we can look at the length, we can apply functions to what gets returned, and we can pull back the shortest one. We can reverse the order (thank you) to get the longest: 19 pages you flip through. And if you limit that query correctly, you can see a depiction of the path with all of the different nodes and their synopses in there. So there's a lot you can do. Now, the cool thing is that this book had an Easter egg in it: Ultima. (Oh, I found the sweet spot for the audio.) Ultima: the mad aliens are looking for this mythical paradise planet, and here it is. It probably looks better in real life; the 80s printing doesn't do it justice. So really we need to find it, but no page actually links to Ultima. They told you not to flip through the book, but you really had to, so you could discover it, like an Easter egg; just kind of nice. So this is how we cheat: match a page which has a synopsis of Ultima, and there it is. You can see here that no page actually links to Ultima. If you look at the query, I've got an arrow in there, so I'm doing a directed query from any page to the page I now know is page 101, and nothing's there. But if I just match on any relationship in any direction, you can see there's a second page where, after it's shown you the awesome line drawings, it says you and your new immortal buddies on Ultima can hang out any time you like, and you can disappear off there from your normal Earth life. That must have been pretty amazing for a 1982 reader. One of the nice things I find about our ability to display a graph is that you can actually look at your
data and see what's wrong. Every time I broke the book while trying to import it, and failed to connect parts of it together, I'd visualize it and could see, okay, there's a bit missing. But this is as it's meant to be: there are just these two pages off by themselves, because that's the hidden bit of the book. I believe I'm pretty much out of time, so I'm just going to do a quick spiel on Neo4j. As I said, my purpose is not to convince you to use it, but just in case you're interested: we have an awesome community that's made lots of drivers. Either our own staff have written them, or people have just shown up and made them; people come up with ColdFusion drivers and things we would never expect anyone to want, and they show up and say, here they are. We love the fact that people share those back with the wider community. The very core of Neo4j is GPL, and we have an enterprise version that's AGPL. It's all written in Java and Scala, with a bit of Bash and Ruby glue code holding it together, along with Maven. Like I said, it was around 2000 that they decided a network database would be cool; they developed this database, and it became open source in 2007. We're on GitHub, and yes, amazingly, we have a website. So thanks for listening. Who has questions on any of that? Okay, go. I don't think there's a difference between... I mean, the main difference I see between a graph database and a relational database is that the connections are persisted in a graph database: you create the relationship, whereas in a relational database it just happens to be something you work out at runtime, that this foreign key matches that row. So, yeah. Actually, as a relational database geek, I can also field the answer to that question, which is that with relational databases the connections are generally visualized as being hierarchical, whereas in a graph database the
connections are more like peers. Yep, great, would you like a book? Sure. Okay, that's all the time we have. Okay, well, I think I just broke my phone. Yeah, sorry. The Cypher language is clearly a nicer way to work with graph data than, say, representing it in a relational database using SQL, but what's the performance story generally like? Traversing Neo4j, or any graph database, can be super fast, because what we're doing is just reading those connections, and a native graph database actually stores your graph as a graph, so there's very little impedance mismatch between the thing you're symbolically manipulating and the actual data itself. So that can be super fast. We're working on Cypher performance all the time; it started as a very experimental thing, and now we're actually hiring people who do DBMS research, you know, database science, to make it perform. Thanks, have a book. Can we do one more question? Okay, make it a good one. All right, I was just wondering what the ecosystem is like for GIS software that works with Neo4j or similar products, so geographic information and map APIs, that kind of thing. I didn't hear the first four words of what you said. You want to know what the ecosystem is like for GIS stuff? Geographic information systems, okay. Yes, there's a project called Neo4j Spatial which does spatial stuff, and we're looking to add more in the future, so have a look. All right, thank you everyone, that's all we've got time for. We have a small gift for Julian. Oh, thank you. That was really good; another big round of applause.