It looks like it's time to go ahead and get started. So my name's Will. I work at Neo4j, which is a graph database. Has anyone used Neo4j or a graph database before? No? OK, great. A bunch of new stuff we'll cover then. So I'm going to be talking about graphs and sensors, the internet of connected things. I work on the developer relations team at Neo4j. I should say, really, I work for Neo Technology, which is the commercial entity that supports the open source Neo4j project. And on the developer relations team, I work with a lot of users, hands on, basically trying to help them be successful using Neo4j for their projects. And there have been several IoT projects that we've worked on with some shared use cases and takeaways. And so the goal of this talk is to introduce you to the concept of graphs and graph databases and share some of these IoT use cases that we're seeing some users adopt. So this is the rough agenda. We'll talk about what graph databases are. We'll do a demo in Neo4j so we can familiarize ourselves with the data model and the query language, and then we'll jump into looking at some use cases where it makes sense to use graphs along with IoT. So when I was brainstorming this talk with some of my colleagues, we were trying to approach it from the first-person perspective of a sensor, sort of a lonely sensor that's looking for connection in the context of graphs and graph databases. And we ended up calling this the mushroom talk because it actually maybe requires some psychedelics to put yourself in the first-person state of mind of the sensor. And then doing a little bit more research into that, well, there are actually a lot of sensors involved in mushroom cultivation. You have things like temperature, humidity, these kinds of things. So anyway, side note, internally I think of this as the mushroom talk. OK, so on to graphs.
So when we're talking about graphs in terms of graph databases, we're not necessarily talking about data visualization, and we're definitely not talking about charts. So these are charts, and this is a graph. What's the difference? Well, when we say graph, we're talking about a data structure, specifically nodes and relationships. Nodes are the entities in the graph; relationships connect nodes. What kind of data can we model as a graph? What can we store in a graph database? Well, this could be information about a person that has a checking account at some bank. We can see the relationships between these entities. We could be talking about a hotel that has rooms. Each room has some availability calendar associated with it. We can model this as a graph. We can also model devices that are listening on ports on certain IPs, what organizations own the IPs, and how geolocation ties into that. Where's that IP located? So really, when you start to think in terms of graphs, I really think that graphs are everywhere, and you start to see graph problems everywhere that you look. So let's talk a little bit more about this graph data model. Neo4j, and most graph databases, use a data model called the property graph data model. So we said that nodes are the entities. Nodes can also have one or more labels that get at the type or the class of the thing. So here, we have two nodes that are labeled person and one that's labeled car to represent those types of things. We store key-value pair properties on nodes as well. So here, we have a node with the name Dan, the date that Dan was born, his Twitter handle. Similarly for the car and for Ann, we have key-value pair properties that are specific to those data points. Now relationships, we said, connect the nodes. Relationships have a type, a single type, and also a direction. So in this case, Dan loves Ann. And it's good that Ann also returns that. That's modeled as two different relationships.
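To make the property graph model concrete, here is a minimal Python sketch of the pieces just described: nodes with labels and key-value properties, and relationships with a single type and a direction. This is only an in-memory illustration, not how Neo4j stores data. The class names, the `DRIVES` relationship, and all property values are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                      # e.g. {"Person"} or {"Car"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                      # direction runs from start to end
    type: str                        # exactly one type per relationship
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical property values, for illustration only.
dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})
car = Node({"Car"}, {"brand": "ExampleBrand"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),   # mutual love is two directed relationships
    Relationship(dan, "DRIVES", car, {"since": 2011}),  # hypothetical relationship
]


def outgoing(node, rel_type):
    """Return the nodes reached from `node` over outgoing relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```

Because direction is part of each relationship, asking who Dan loves and asking who loves Dan are two different lookups over the same list.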
And on relationships, too, we can store key-value pair properties. So that's the basic data model. Now we need a query language to work with graphs. And that query language in Neo4j is Cypher. Cypher is all about pattern matching. So the basic idea with defining patterns in Cypher is that nodes are defined within parentheses and relationships within brackets. So you end up drawing these ASCII-art representations of graphs to represent the pattern that you're looking for. And in this case, this is a create statement. So if we ran this against our database, we'd create two nodes, one for Dan, one for Ann, with this loves relationship connecting them. We'll see some more Cypher examples, but this is the basic idea. Just think of pattern matching and drawing ASCII art to represent the pattern you're looking for. Cypher, I should point out, is really openCypher. It's an open source language that is becoming a standard for working with graphs, not just on Neo4j, but also on other graph databases and in some graph processing frameworks as well. OK, so that's a whirlwind introduction to graph databases and Neo4j. We talked about the data model, this idea of nodes and relationships. We didn't talk about this, but there's an important property of graph databases, this idea of native graph processing. This means essentially that I can traverse the graph very, very efficiently, which in turn means that we can work with very large data sets at scale in the graph. We talked about openCypher, the query language. We will look at this in a minute, the Neo4j browser, which is a query workbench for interacting with Neo4j. And of course, Neo4j is open source, so you can download it, build it from source, and get started with no problem. Cool, so let's look at a little demo with some actual data in Neo4j. So I pulled down some data from Shodan. Anyone used Shodan before? A few people. So Shodan is, well, as they say, the search engine for the internet of things.
So Shodan scans lots of different ports on IPs and sees what response they get. And based on the known ports and the response, they're able to identify databases, cameras that are listening on these ports, industrial control systems, and so on. So I grabbed some data from Shodan. You can export this in CSV format. So this is an example doing a search for cameras that are available. You can see we get information about the IP, the hostname, country, operating system, these kinds of things. And Cypher has some really great functionality for importing directly from CSV. So this is just a quick LOAD CSV script in Cypher to load this data from Shodan into Neo4j. Essentially, what we do is iterate over each row of the CSV file, extract the fields we're interested in, things like the IP address, city, and organization, and define the graph pattern that we want to create with that data. I'll try to quit moving. And so this is the graph model that we end up with. We saw this earlier. But let's jump into Neo4j browser and actually look at some of this. So this is Neo4j browser. It's, as I said, a query workbench for writing Cypher queries and visualizing the results from Neo4j. And we can jump into some saved queries here. So the first thing we want to do is look for any cameras that we've imported. So we want to define the graph pattern that we're looking for. Well, we're using the camera label to indicate that this device is specifically a camera that's listening on a certain port. And that port is connected to a certain IP. And then we're just looking for the first match of that path that we have, and just returning that. And here we can see, OK, so here's some device that happens to be a camera listening on port 554 on some IP address. We can traverse out from this IP and see that, well, this IP is in Korea. It's registered to Korea Telecom.
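The import step described a moment ago (iterate over each CSV row, pull out the fields, create the graph pattern) can be sketched in plain Python. This is not the LOAD CSV script from the slides, which isn't reproduced here. The column names, node labels, and relationship types are assumptions, and the sample rows are made up using reserved documentation IP ranges.

```python
import csv
import io

# Made-up rows in a Shodan-like export; column names are hypothetical.
SAMPLE_CSV = """ip,port,city,org
192.0.2.10,554,Example City,Example Telecom
192.0.2.11,554,Example City,Example Telecom
198.51.100.7,80,Other City,Other ISP
"""


def load_graph(csv_text):
    """Build node and edge sets from CSV rows.

    Nodes are (label, key) tuples; edges are (start, type, end) tuples.
    Using sets means a city or organization seen on many rows is only
    created once, similar in spirit to a merge-style import.
    """
    nodes, edges = set(), set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        device = ("Device", f'{row["ip"]}:{row["port"]}')
        ip = ("IP", row["ip"])
        city = ("City", row["city"])
        org = ("Org", row["org"])
        nodes.update([device, ip, city, org])
        edges.add((device, "LISTENS_ON", ip))
        edges.add((ip, "LOCATED_IN", city))
        edges.add((ip, "REGISTERED_TO", org))
    return nodes, edges


def ips_registered_to(edges, org_name):
    """Traverse REGISTERED_TO edges backwards from an organization to its IPs."""
    return sorted(
        start[1]
        for start, rel_type, end in edges
        if rel_type == "REGISTERED_TO" and end == ("Org", org_name)
    )


nodes, edges = load_graph(SAMPLE_CSV)
print(ips_registered_to(edges, "Example Telecom"))
```

The second function mirrors the kind of traversal done in the browser: start at an organization and walk out to the IPs registered to it.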
We can traverse out from Korea Telecom and see what other IPs have devices exposed as well. So these are some basic traversals we can do in the graph. Let's look for things in Portland. So let's match on, in this case, the city node where the city name is Portland and return that, and see whether we find any devices that are exposed in Portland. And we do. We find 32, it looks like. And so we can traverse out from these and see the organizations that own them. Or we can just write a more complex Cypher query that's going to say, OK, starting from Portland, let's traverse out to find all the IPs connected to Portland and the ports that have industrial control systems listening on them. And also, let's find out which organizations have registered those IP addresses. So if we run this, we get a nice graph here. So here's Portland in the center. These purple nodes are the IP addresses that have some industrial control system. This is specifically one called Niagara, which is used for things like building heating, elevator control, that sort of thing. And we can see these green nodes. These are the organizations that have registered the IPs. And if we look at these, a lot of them are internet service providers. Here's one that's potentially interesting. This is Star Park, LLC, which owns a bunch of parking lots in Portland. So anyway, that hopefully gives you an idea of how we can work with data in Neo4j, what the query language looks like, and some of the visualization components. And we already saw these. These are in the slides just to reference later. OK, so let's talk about some actual use cases where it makes sense to use graphs along with devices. So I'm going to say sensors a lot, as a general term for devices that generate some data and connect to the network. There are lots of sensors that are generating data and uploading or streaming this data to the cloud.
And I think we're at the point now where we're able to handle the streaming data. We're able to do anomaly detection, things like this, with the streaming data. But what I think is really important, and what we haven't necessarily focused on, is looking at the connections between these sensors. So we can do things like look at power consumption, networking, again, streaming data. But really, when we start working with this data in graphs, we realize that there's lots of value in the connections in our sensor and device networks. So let's look at a first example here, just a simple sensor or device network. So in blue, these are 18 sensors that are connected to four different wireless access points, two switches, and one router, so a standard network configuration. And we notice that there's a problem here. We're not getting readings from four of our sensors. Well, if we look at this network topology, we can see, OK, they share the same wireless access point. That's a potential point of failure. And of course, this topology is fairly simple, so we can just look at it and figure this out. But how can we express this programmatically in Cypher using a graph model? Well, here's what that might look like. We have depends-on relationships connecting our sensors to the wireless access point. And maybe each sensor has an ID. So we can match on each specific sensor and find any component that it depends on. This is how we would express that in Cypher. And we may want to find all components that have connections to disabled sensors. And so we can do this. We can also do an aggregation to tell us the number of disabled sensors connected to each component. And we can go one step further and say, OK, show me components that have this depends-on relationship with sensors that are disabled and that do not have any connections to sensors that are not disabled. So essentially, show me components where every connection to that component is down.
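The three questions just described (which components disabled sensors depend on, how many disabled sensors each component has, and which components have only disabled sensors) can be sketched in Python over a small in-memory map. This is not the Cypher from the slides. The sensor and access point names are hypothetical, and the sketch only covers the direct sensor-to-component hop.

```python
from collections import defaultdict

# Hypothetical topology: sensor id -> the component it directly depends on.
depends_on = {"s1": "ap1", "s2": "ap1", "s3": "ap2", "s4": "ap2", "s5": "ap3"}
# Sensors we are currently not getting readings from.
disabled = {"s1", "s2", "s3"}


def disabled_counts(depends_on, disabled):
    """Count disabled sensors per component (the aggregation step)."""
    counts = defaultdict(int)
    for sensor, component in depends_on.items():
        if sensor in disabled:
            counts[component] += 1
    return dict(counts)


def fully_down_components(depends_on, disabled):
    """Components where every dependent sensor is disabled."""
    dependents = defaultdict(set)
    for sensor, component in depends_on.items():
        dependents[component].add(sensor)
    # `sensors <= disabled` is a subset test: no enabled sensor remains.
    return sorted(c for c, sensors in dependents.items() if sensors <= disabled)


print(disabled_counts(depends_on, disabled))
print(fully_down_components(depends_on, disabled))
```

In this made-up data, `ap2` has one disabled sensor but still has a working one, so only `ap1` is flagged as a likely point of failure.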
And this is a problem if we find that. That means that something's not right. We need to do some maintenance. So let's say we're going to take down this switch. Well, we can traverse the graph to find all of the sensors that are going to be impacted by taking down this switch, by traversing out downstream to find any device that will be impacted. This is the idea of a blast radius: if one device goes down, what's going to be impacted? In Cypher, we have this handy variable-length path operator, which is the asterisk here in that relationship. So we can specify that we want to traverse paths of any number of relationships. So any number of hops down this depends-on path, what's going to be impacted? So if this red node goes out, we can see everything that depends on it through many, many different relationships. So dependency analysis: this is a great use case where graphs can add some value, especially by looking at the topology of your sensor network. OK, so let's look at another one. What about relationships between sensors? So imagine an oil and gas pipeline where we have lots of sensors at different points in our pipeline and they're constantly streaming out readings, things like temperature, pressure, flow, these kinds of things. And a common approach is to use something like Kafka, creating Kafka topics that carry the data generated by these sensors. So that allows us to handle the streaming nature of the data. And we have tools, things like Apache Storm, that allow us to do things like anomaly detection on streaming data. So for example, let's say we have a sensor that starts to report back very high temperatures. Storm will tell us very, very quickly, hey, there's an anomaly here, something's going on with this sensor. And OK, that's useful.
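Going back to the blast radius idea for a moment, the variable-length traversal can be pictured as a breadth-first walk over the depends-on relationships in reverse. Here is a minimal Python sketch of that idea. It is not the Cypher query from the slides, and the device names and topology are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical topology: (dependent, thing it depends on).
depends_on_edges = [
    ("sensor1", "ap1"), ("sensor2", "ap1"), ("sensor3", "ap2"),
    ("ap1", "switch1"), ("ap2", "switch2"),
    ("switch1", "router"), ("switch2", "router"),
]


def blast_radius(edges, failed):
    """Return every device that depends on `failed`, over any number of hops."""
    dependents = defaultdict(set)
    for dependent, dependency in edges:
        dependents[dependency].add(dependent)

    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for device in dependents[current]:
            # Skip devices already seen so shared paths or cycles don't loop.
            if device not in impacted and device != failed:
                impacted.add(device)
                queue.append(device)
    return impacted


print(sorted(blast_radius(depends_on_edges, "switch1")))
```

Taking down `switch1` in this made-up network impacts one access point and the two sensors behind it, while taking down the router impacts everything else.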
But with graphs, and by looking at the structure of our network of sensors, we can first of all verify that this actually is an anomaly rather than just maybe a sensor going out. How do we do that? Well, we can look at the readings from our adjacent sensors. So this is an oil and gas pipeline. We know approximately what each temperature and pressure observation should be based on the flow in the network. So we can say, OK, for a sensor that is apparently reporting an anomaly, what are the adjacent sensors within some radius, and what are the observations that we're getting from those sensors? Are they consistent with an anomaly occurring? So this helps us verify that the anomaly is occurring. If there is some anomaly, we can also use the structure of the network to reroute flow in the oil pipeline to avoid this anomaly, to alleviate some additional temperature or pressure while still maintaining flow in the network. Cool, so let's talk about another benefit when working with graphs, and that's the ability to join data sets. The graph model is very flexible; it allows you to easily combine data sets and query across them. This is a common use case for graphs, and when we do this in the sensor world, it allows us to connect other real-life sources to our sensor data. So an example of this is ADS-B. This is Automatic Dependent Surveillance-Broadcast, which is commonly used in aircraft. So most aircraft in the US carry these, and they're constantly emitting information that represents their altitude, their velocity, their location, their tail number, this kind of information. Now there's also lots of public information in aircraft registries: who owns the aircraft, what organizations own the organizations that own the aircraft, maintenance records, safety records, things like this. So what could you do if you combined the readings from ADS-B with some of this aircraft registry information? Well, the graph model might look something like this.
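As a rough stand-in for that graph model, here is a minimal Python sketch that joins ADS-B style readings with registry ownership data through the tail number, then follows "owned by" links across organizations. Every tail number, organization name, field name, and value below is made up for illustration.

```python
# Made-up ADS-B style readings; field names are hypothetical.
readings = [
    {"tail": "N-EXAMPLE1", "altitude_ft": 32000, "lat": 45.5, "lon": -122.6},
    {"tail": "N-EXAMPLE2", "altitude_ft": 8000, "lat": 45.6, "lon": -122.5},
]

# Made-up registry data: tail number -> registered owner.
registered_owner = {"N-EXAMPLE1": "Shell Co A", "N-EXAMPLE2": "Airline B"}
# Made-up ownership between organizations: owner -> parent organization.
owned_by = {"Shell Co A": "Holding Co X"}


def ownership_chain(tail):
    """Follow ownership from an aircraft up through parent organizations."""
    chain = []
    owner = registered_owner.get(tail)
    # Stop at the top of the chain, or if ownership loops back on itself.
    while owner is not None and owner not in chain:
        chain.append(owner)
        owner = owned_by.get(owner)
    return chain


def readings_linked_to(org):
    """All readings from aircraft owned, directly or indirectly, by `org`."""
    return [r for r in readings if org in ownership_chain(r["tail"])]


print(ownership_chain("N-EXAMPLE1"))
print([r["tail"] for r in readings_linked_to("Holding Co X")])
```

The point of the join is the second function: a position report says nothing about ownership on its own, but once it is connected to the registry you can ask where the aircraft of a particular organization are flying.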
So we have not just information about the plane and its location, but we can also tie in the model of the plane, its manufacturer, maintenance records, and so on. And if we have data like this, what could we build? Well, we could build a Twitter bot that tweets out anytime a plane owned by a dictator flies into or out of the Geneva airport. We could monitor flights within the US and tie those to planes known to be owned by the government specifically for government surveillance, to track what areas the government is surveilling based on ownership records. So these are the types of things that people are doing with this concept of combining these sorts of registries with IoT data. And the final use case that I want to talk about is this idea of recommendations. So everyone has been exposed to some sort of product recommendations, some personalized recommendation. So think of Netflix: here are movies you might be interested in based on movies that you've watched. On Amazon: products you may be interested in purchasing based on your browsing history, these kinds of things. And this concept translates over into the physical world as well, in retail specifically. So think of the layout of a grocery store and this concept of the loyalty card. Even though we're not purchasing online, of course the grocery store is tracking our purchase history through our credit card and, again, through our loyalty card purchases. So we have some record of preferences that we've expressed: think of product ratings, purchases, that kind of thing. And then in retail we see many of these location-aware apps that allow retailers to track customers throughout the store. And we've even seen these in-store display ads that change based on the composition of the customers in that area.
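One way to picture that kind of display decision is a small Python sketch that combines two things: where each customer currently is in the store, and what they have bought before. The customer names, departments, products, and the simple "most common past purchase" rule are all assumptions for illustration, not a description of any retailer's system.

```python
from collections import Counter

# Hypothetical data: current department per customer, and purchase history.
location = {"alice": "produce", "bob": "produce", "carol": "bakery"}
purchases = {"alice": ["coffee", "bread"], "bob": ["coffee"], "carol": ["bread"]}


def promotion_for(department):
    """Pick the product bought by the most customers currently in `department`."""
    counts = Counter()
    for customer, current_department in location.items():
        if current_department == department:
            # Count each product once per customer, however often they bought it.
            counts.update(set(purchases.get(customer, [])))
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically so the result is stable.
    return min(counts, key=lambda product: (-counts[product], product))


print(promotion_for("produce"))
```

With two coffee buyers standing in produce, the display there would switch to a coffee promotion, while an empty department gets no promotion at all.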
So if we've found that we have customers who may be interested in a specific product based on their purchasing history, and we have a concentration of those customers, we can switch the promotions on the in-store display ad. So what kind of data goes into that? What might the graph model for something like that look like? Well, again, we're just combining the physical location of the user, what department they're in in the store, with their purchase history from our retailer's transaction catalog. And again, this is just pointing out that in the typical architecture for something like this, the benefit of using the graph is this ability to combine data silos. Where we have lots of different information about users in different pieces of the organization, a graph is a very flexible model to be able to work with that. Cool, so if you're interested in learning more about graph databases, there's a download for this O'Reilly book at graphdatabases.com. There are a lot of use cases and code samples, so it's a good way to get started. And the takeaways that I hope I got across from talking about some of these use cases are that there's more value to be derived from the structure of your sensor network with graphs, and that graphs are a good way to combine data sets from both sensors and other sources. So that's all I have. Any questions? OK, well, thanks everyone.