So my name is Jans Aasman. I'm the CEO of Franz Inc., and our main product is a graph database, an RDF graph database. But I put this thing up here because when I was 22, I made my first money writing software. I was working at a traffic science institute in the Netherlands, and we wrote a simulator so that people on bikes could see a screen with an intersection, with cars driving on the intersection, and then the students could control their little bike with a cursor, because the Netherlands has more bikes than cars, and try to get across the intersection alive. And when I would give a talk, I always had my cars there driving on the screen, and then bicycles, and once every 20 seconds one of the bicycles would be killed. So I would just start my talk and everyone would look to see whether someone would die or not. Anyway, that's why I show this thing here; it's a fond memory of a long time ago. I'll talk about this thing later. So what I'm going to talk about is geospatial processing and tracking moving objects in an RDF graph database. First, before I talk about geospatial, I will talk a little bit about RDF graph databases and linked open data. So who here has worked with RDF? One, two. And who has worked with a graph database? Ah, a lot more. That's good. All right. So I'll first talk a little bit about an RDF graph database, and then I'll switch to how we do geospatial in our RDF graph database. It's called AllegroGraph, by the way. Then I'll do a few demos and talk about a few real-life use cases. I'll talk about how we enrich data that we scrape from Google and Bing, so we can do very advanced queries. I'll talk about something we did for a command and control system for the military; a little part of that can't be taped because it's under NDA, so I can show it in public but not put it on the Internet.
I'll talk about a customer of ours, Siemens, that wrote a prize-winning application that saves about $10 million a year in that company, based on our triple store but also very, very geospatial. And I'll talk about an online banking example. So who saw my online banking example yesterday? One? Okay, good. Maybe two. I'll do that again a little bit. And then I'll talk about the main part, which is moving objects. We have some patents in our database for 2D and 3D geospatial indexing, so I'll talk about the principles of that, and then I'll go into what kind of applications you can do with moving objects. And I'll do my demo again with the ships and show you how you can do queries over their data. All right? So first, I guess you guys all know about graph databases: instead of rows and columns, you have nodes and links between nodes. And actually, the only difference between a graph database and an RDF graph database is that the nodes in your database and the links between the nodes are unique URIs. And that gives you an enormous advantage, because what you can do is have multiple databases all over the place, and as long as people take care that they use the same names, then suddenly I can combine information. I can do queries that touch many, many different databases at the same time. And this is all based on W3C standards. Well, you guys know that knowledge graphs and graph search are in. All the big companies build huge proprietary knowledge graphs. We have Google building its Knowledge Graph, so that if you search in Google for, say, Leonardo da Vinci, you get this page, and on the right-hand side you see a lot of information. So basically, what Google is doing is building an encyclopedia of everything they can find in the world: every important person, every place, every organization, every product, with all the relationships between them.
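For the people who haven't seen RDF, here is a minimal sketch of the core idea in plain Python (not AllegroGraph code; the datasets and facts below are made up). Because both sources use the same URI for the same thing, merging them is just set union, and a query then sees facts from both:

```python
# Toy illustration: RDF data is just (subject, predicate, object)
# triples whose identifiers are URIs. Because both datasets use the
# same URI for Berkeley, merging them needs no schema alignment.

dbpedia = {
    ("http://dbpedia.org/resource/Berkeley", "population", "112580"),
}
geonames = {
    ("http://dbpedia.org/resource/Berkeley", "latitude", "37.87"),
    ("http://dbpedia.org/resource/Berkeley", "longitude", "-122.27"),
}

graph = dbpedia | geonames  # the "merge" is a plain set union

def query(graph, subject):
    """All facts known about a subject, across every source."""
    return {(p, o) for s, p, o in graph if s == subject}

facts = query(graph, "http://dbpedia.org/resource/Berkeley")
```

With real stores the same effect comes from querying several SPARQL endpoints that share URIs; the union is the whole trick.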
And their architecture looks very much like an RDF graph database. Then, of course, you know about Facebook and Graph Search, and you know about LinkedIn. But did you guys know that there's a beautiful, much bigger knowledge graph out there called the Linked Open Data Cloud? Who knows about Linked Open Data? One. That's not many. All right, so what is it? In 2002 or 2003, people started on the standard for RDF, the Resource Description Framework, which basically is a way of describing objects in the form of triples. And already in 2007 there were a whole bunch of databases out there that were publicly available and that you could use. For example, you see here DBpedia, which currently is about 1.8 billion triples that describe all the contents of Wikipedia. So this is a fantastic encyclopedia that's now machine-readable. Then you have GeoNames, a database with 7 million places on Earth with latitude, longitude, alternative names, et cetera. You have the US Census database of the year 2000. And what you see is that all these databases are linked together. Well, actually I say databases, but most of them are just files with millions and billions of triples, or SPARQL endpoints, where SPARQL is the query language for RDF graph databases. So just to give you the principle, one demo that I sometimes do is where I ask: what was the average income of the place where Barack Obama was born? And most of the time I get this thing about, yeah, but that's in Kenya and they don't have the US Census database there, but once we get past that stupid joke, I can tell people: I take Barack Obama, I look him up in DBpedia, I find where he was born, I go to GeoNames, I find the latitude and longitude, I find, say, all the cities within 10 miles of that point, then I go to the US Census database, I find the incomes, and I take the average. I can do that query in about 100 milliseconds.
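The shape of that three-database hop can be sketched in a few lines of Python. Everything here is hypothetical: tiny stand-ins for DBpedia, GeoNames, and the Census data, invented income figures, and a crude planar distance instead of real geodesy.

```python
import math

# Hypothetical mini-copies of the three datasets in the demo.
dbpedia  = {"Barack_Obama": {"birthPlace": "Honolulu"}}
geonames = {"Honolulu":   (21.31, -157.86),
            "Kailua":     (21.37, -157.79),
            "Pearl_City": (21.40, -157.97)}
census   = {"Honolulu": 45112, "Kailua": 72784, "Pearl_City": 60571}

def miles(a, b):
    # Rough planar distance; one degree is about 69 miles (toy only).
    return 69 * math.hypot(a[0] - b[0], a[1] - b[1])

def avg_income_near_birthplace(person, radius_miles=10):
    place  = dbpedia[person]["birthPlace"]          # hop 1: DBpedia
    center = geonames[place]                        # hop 2: GeoNames
    nearby = [c for c, pos in geonames.items()
              if miles(center, pos) <= radius_miles]
    incomes = [census[c] for c in nearby if c in census]  # hop 3: Census
    return sum(incomes) / len(incomes)
```

In the real system each hop is a graph lookup over shared URIs rather than a dictionary access, which is why the whole chain can run in milliseconds.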
So I can talk to three databases, answer a fairly complex question, and use these publicly available data sources. So this was 2007; this was the same picture in 2011, and after that they gave up. They gave up on the picture, sorry, yes. So this alone is about 20 billion triples. In the domain of life sciences, for example, you have a database with about 100,000 clinical trials, or 1,800 drugs, or 4,000 diseases, and here you have a whole bunch of databases that are all geospatially oriented. You have linked sensor data, you have the Fishes of Texas, you have GeoSpecies, the World Factbook, GeoNames, LinkedGeoData, the ocean drilling codices, anyway, you get a massive amount of stuff all centered around geospatial, and of course all linked to other data sets all over the place. And guess who made this particular picture? The people from DBpedia, because they're in the middle. But they're actually also an amazing, fantastic data set, so they deserve to be in the middle. All right. So that's a little bit about RDF graph databases and about linked open data, but that's just background, because I promised to talk about geospatial and moving objects. So in our graph database we've implemented geospatial reasoning, and what we've done, and I'll talk about it later, is that we created special 2D and 3D indexing, so that if I have a particular event, something that happened somewhere, I can find all the events that happened within 5 miles extremely fast. We also implemented polygons, so if you have city boundaries or any kind of polygon, we can tell you all the objects in that polygon in record time. So those are our geospatial indexing details, and by the way, you can use either a normal Cartesian coordinate system or the sphere of the earth.
So we implement both forms of geospatial coding. And then, and I'll talk about this later, we also do temporal reasoning, because durations can relate to each other in many ways, and we wrote utility functions so you don't have to write a whole bunch of code to say something about time; we just made it very easy to say something about when something happened. Okay, so that's the background, and now let me show you some applications where people use this geospatial capability. The first demo is actually a demo that I created about two, three years ago, and I've shown it all over the place, but I'm fairly new at the NoSQL conference, so let me do it here. So what we did is we spider Google and Bing, very carefully, because otherwise they kick you out and you can't access them for days. So you spider and you get HTML pages or other pages, and we extract all the people, places, organizations, and important events from the HTML. Then what we do is link the people to this fantastic encyclopedia called DBpedia, we take the places and link them to GeoNames or the US Census database, and we take the organizations that we find and link them to Freebase and DBpedia. So now I know much more about these people. It's very complicated. One of the hardest problems in business is that you have a customer database in your company, and your customer will be in 20 different places just because the names are different. So what you have to do, say you find Obama, well, there are all kinds of tricks you need just to find a name, and then if you have people with the same name, what you do is take all the text in DBpedia and all the text around the name in the newspaper article, and you do some kind of intersection search to do disambiguation. If you get the most matches among the surrounding words, then that's probably the person. You can never be sure. Especially if you're called John Smith. So this is the architecture we use.
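That intersection trick can be sketched like this. The candidate entries and their abstracts are invented, and a real system would add stemming, stop-word removal, and much more context:

```python
def disambiguate(article_text, candidates):
    """Pick the candidate entity whose abstract shares the most
    words with the text surrounding the mention. You can never be
    sure; this only returns the *most likely* match."""
    context = set(article_text.lower().split())

    def overlap(abstract):
        return len(context & set(abstract.lower().split()))

    return max(candidates, key=lambda name: overlap(candidates[name]))

# Two hypothetical DBpedia entries for the name "James Shields":
candidates = {
    "James_Shields_(general)": "American Civil War general and senator",
    "James_Shields_(pitcher)": "baseball pitcher American League Tampa Bay",
}
best = disambiguate("last night's baseball game the pitcher threw", candidates)
```

With this made-up data, a baseball context picks the pitcher over the Civil War general, which is exactly the "probably right, never sure" behavior described above.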
We have spiders that we can control with URLs or word lists or taxonomies. We get information from the web. We apply text extractors, or scrapers, and get all the places, organizations, and people out. We store them in our database. We link it to the linked open data cloud. We also put it in Solr for free-text indexing, and then I can show you the data and query the data with our visualization tool. And this is the kind of query we can do. We can say: which Republican talked to an oil company less than 100 miles from Tampa two weeks ago? Try to type that query into Google. It won't help you yet. It will come soon; we see the Knowledge Graph now, so they're working on this, but you can do this and implement it yourself in the triple store. Or: find a newspaper article with a Democrat and a Republican that are on the same committee and have the same religion. Or: which scientists were mentioned in the news yesterday in an article that also mentioned a city less than 100 miles from Tampa? Again, all that stuff you can't ask Google or Ask.com. You have to enrich your data before you can ask those queries. And as I said, within two, three years Google will be able to do this. So, demo. Oh, I've got a lot of time. Okay, so here is a tool called Gruff. It's a graphical interface to our database. And what I've done already is open the database with the Google data. Is there anything in the news you're really interested in, other than Obama? Maybe healthcare, maybe the CIA, the NSA, all kinds of cool topics? Baseball. Baseball? Oh, sorry, no, this is an older demo. Anyway, let's see. We have some newspaper articles. Oh, okay. So this is one day of Google data. And I can look that up. And here are three texts that have the word baseball in them. So I do free-text indexing. I double-click, and here you see the triples that make up the graph.
Here I see that this text has some important concepts, some organizations like the American League, the Baltimore whatever, some people, and some places that are mentioned. And then we linked things to GeoNames. Let me find one that is actually in GeoNames. Okay, so in that newspaper article there was also a place. I linked it to GeoNames, and now I can show you the triples that come from GeoNames. This is about Kansas City, Kansas, which has about 140,000 people. This is the latitude and the longitude. It's a populated area. How high is it actually? About 266 meters, et cetera, et cetera. You get the point. And I did the same thing for people, so I could look at James Shields, and it would find someone, probably incorrect in this case, from the American Civil War. But just see what I did: I took a name that the entity extractor found, just a dumb string, and I looked it up in DBpedia, and I find more information about that person. So now, given that I have that... oh, by the way, I can explore this graph on the screen. So if I say, well, I want to explore this graph but only look at organizations, people, and places, I could say, let's look at this, and let's look at this text, and this text, and reformat it a little bit. And what you see is that all these clouds are actually little independent clouds, which I don't see that often. So let me see if I can find any link to this thing. Oh, yeah, so there's one way to link this text to the other text. Can I maybe just pull this little lonely thingy here and link it here? Yes, I can do that too. So I can use the graph database to do my own linking, and see how I can get from one graph to the other graph, and then just explore. Here you see the names of the links: it has a linked data name, or it has a place. So if I do this, you see that if I click on "has people," you see all the people here. If I look at the types of objects, I can see places here. Does it make sense?
So the colors of these things are the node types, and the links are here, and here you see the meaning. What I did is I said: let's explode this node using these three link names. And somehow they didn't hook up; that's the first time ever that happened when I do a demo. But then I said: ask the database, can you find the shortest path from this thingy here to this thingy here, using the three predicates that I selected? Any kind of link. I mean, I could take another predicate, I could take these guys away. Yeah, well, let's not, because otherwise I could get in trouble later. And I can say, I don't know if there's anything about the NSA in here for this day... no, not the NSA, maybe the CIA. So here's some text about the CIA, and I probably can link the text to the CIA, to the Associated Press, okay? So we do this too for life science databases, where I have a gene here and a drug here, and I can say: show me all the links that I can find through clinical trials or drug interactions or whatever. So this works for any kind of thing you can imagine. Does this all make sense? Okay, so now you want to do those queries I talked about. Let me first do it not as a query. I could just select all the predicates, but usually there's a lot of nonsense in there, because they have, say, the number one as a property of something, which is not really very interesting. So the first query, and this is in a language called SPARQL, says: can I find a text that has a linked data name X, where X has rdf:type scientist? This comes straight out of DBpedia. And what I find is that for that particular day there were five, six scientists in the news. So if I look at them, you see the guys that were in the news and the texts they were linked to.
If I want to know, again, whether there are any possible relationships... well, that's not interesting. There's no faster way to do this. Oops, and here are too many relationships. Anyway, I can go back. So this was the question about scientists, but now say I wanted to find all the scientists in a text where the text also mentions a place that is within 100 miles of Tampa. So now I actually have to look in the GeoNames database to find all the places within 100 miles, and then look for the scientists. So I do this query, and... yeah? You can probably read this query. You say: text with city in radius: text, Tampa, 100 miles. Give me cities that are within 100 miles of Tampa, mentioned in a newspaper article, where the text has a person, where that person is a scientist, and where the text has a particular title. So now when I do the query, I get text 540, the name is William Henry Phelps from Wichita, and the title is "Report sees a disparity in Medicare costs." Does it make sense? Do you see the power of this? Just by looking at Google, you couldn't see it. Now you can see it, and the geospatial part is mixed in. So the next thing... let me go to the next demo. So the question is: this is all fun, but do they use this in the enterprise? Well, Siemens, last year, built a mini Watson to answer natural language queries for salespeople about turbines. Siemens maintains more than 100,000 turbines in the world. And that's a lot. So the salespeople, but also the maintenance people, have a lot of queries, but all this data is in lots of different databases. So they wanted to get all these databases together and then ask very advanced queries. So they built, like Watson, a very interesting natural language system based on UIMA. And this is the kind of question they can ask: What is a mega cluster? Give me all the active units in China. What are the service regions of New York?
Yeah, which is a geospatial factor, et cetera, et cetera. Now, a lot of this information is all in AllegroGraph. But what you also can see here is that, like Watson, they use a lot of linked open data. Please note that Watson uses many, many of these data sources. Watson is one of the smartest systems on the planet; people will easily give you that. It's heavily using the whole linked open data cloud. And so the mini Watson from Siemens uses DBpedia, Freebase, GeoNames, and, to our delight, AllegroGraph to store all this information. And this system won the Siemens Innovation Prize in 2013. It saves more than 10 million euros per year, and then they stopped counting, they said. It's a candidate for an AAAI 2013 prize. And if you want to read the paper, go to our booth; we have copies of that paper there. Okay, so that is how they use it in the enterprise. Yesterday I talked about how we also use geospatial in a system that we're building with a big online bank. I think there was one person who's seen it, so I apologize, but I'm going to explain it again. So imagine you have accounts; they were opened at a particular time, at a particular IP address, with a particular email address, a linked bank account, et cetera. And then you have events: a payment where the sender is an account, the receiver is an account, at a particular IP address, a time, et cetera. A very simple graph. And what you want to find is interesting patterns that might indicate that there is fraud going on. So the first use case that we worked on is what they call collusion: people working together to get money out of the system. So, for example, what you are looking for is: give me a bunch of accounts that were opened in one hour.
And then within 30 minutes, other people send a huge amount of money and spread it out over these accounts, and then in the next half hour the money flows out of these accounts and into external accounts. Now, that could still all be reasonable; there might be a group of people that want to go on vacation together. So the question is also: do the senders and the payees know each other? So what we do is look at these accounts and say, well, these guys are both connected to a known fraudster, and these two people opened accounts from the same IP address, and these two people did it from the same city, or within 10 miles of each other. So there are all kinds of geospatial and graph connections that you can make between these people, and you can do the same thing for the senders. And the more these people are connected, the fishier the whole thing gets. Does it make sense? All right. So for this case, the transactions were stored in Hadoop. We imported less than 10 percent of the data into AllegroGraph, only the graph itself, not all the other stuff they need for their bookkeeping. And then we started finding these patterns in the data. So let me do a quick, quick demo; I still have the time. This is a tiny demo running on my laptop: the server is running in a VM, Gruff is running in Windows, and they're talking to each other. And the typical thing I would look for... well, is there a person here? I'm just showing you the basics of this little thing. You guys now know how the system works, so I'm choosing sender and receiver. So what you see here, let me just do a few here, and reformat, and make it a little bit smaller. What you see here is: the big ones are the accounts, the small ones are the transactions.
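A bare-bones version of the first signal, accounts opened within one hour of each other, plus the shared-IP connection check, might look like this in Python. The account records are invented, and a real system would also score shared cities, links to known fraudsters, and the actual money flows:

```python
from datetime import datetime, timedelta

# Hypothetical account records: (id, opened_at, opening_ip)
accounts = [
    ("a1", datetime(2013, 8, 1, 9, 5),  "10.0.0.7"),
    ("a2", datetime(2013, 8, 1, 9, 40), "10.0.0.7"),  # same IP as a1
    ("a3", datetime(2013, 8, 1, 9, 55), "10.0.0.9"),
    ("a4", datetime(2013, 8, 3, 14, 0), "10.0.0.9"),  # opened days later
]

def opened_together(accounts, window=timedelta(hours=1)):
    """Groups of two or more accounts opened within one hour."""
    ordered = sorted(accounts, key=lambda a: a[1])
    groups, current = [], [ordered[0]]
    for acc in ordered[1:]:
        if acc[1] - current[0][1] <= window:
            current.append(acc)
        else:
            groups.append(current)
            current = [acc]
    groups.append(current)
    return [g for g in groups if len(g) > 1]

def shared_ip_pairs(group):
    """The 'fishiness' signal: pairs in a group opened from one IP."""
    return [(a[0], b[0]) for i, a in enumerate(group)
            for b in group[i + 1:] if a[2] == b[2]]

suspicious = opened_together(accounts)
```

Each extra connection between the members (same IP, same city, shared fraudster) raises the score, which is the "more connected, more fishy" idea from the talk.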
So if I look at an account, you see there was an account open date and time, an opening IP address, the account number, the account place. This place, again, is in GeoNames, so if I look at GeoNames here, you see the same kind of triples you saw before in the Google demo. And then you see the payments they made. Ooh, he made a lot of payments. Okay, now, everything above this gray line is an outgoing link; everything under the thick line is a relationship pointing to you. So here there are a lot of people that paid you money. You get the point of how this all works. So this is the graph that we have our data in, but then when we do our geospatial queries, I go to the graph database, and I don't want to go too deep, but what I wanted to show you is the power. So have you guys ever seen SPARQL? Who has seen SPARQL before? One, two, yeah. Okay, so here you'll find something new, and I'm now talking only to the people that know SPARQL. Basically, SPARQL is a graph query language. One thing that we added to the query language, and it's still valid SPARQL, is that the predicate in the middle can now also be a function. So instead of looking in the database to find a relationship, when the database sees this magic predicate (we call it a magic predicate, or a computed predicate), it will actually do something in the programming environment and then go back to the graph. So here is a query that asks: did someone among the most important friends of Sonya make a payment within 100 miles of Rotterdam, New York, in the last 10 years? Now, you probably have no clue how to read this if you don't know the language, but let me quickly try. You say: give me a place with the label Rotterdam, New York, that has a particular location; this is a variable.
And then look at the account with the email address Sonya, take the group that is two levels deep using the "paid" relationship (what you see here is what we call a magic predicate), and then compute the degree centrality in this group. So for each person in the group, you compute how important he is in the graph. And then you look for events where the sender is a member of this group, has an email address, and made the payment within 100 miles of Rotterdam. And when you do the query, you get the results here: it's actually only two people that made all these payments. Does it make sense? Any questions about this? It's a mix of graph search and geospatial search. So for example, here we say, if there's a member having an email, then you actually look in the database: given that you know the email, you find the member, or if you know the member, you can find the email. You do that directly as a lookup in the graph. But the next line, the Franz geo in-circle-miles predicate, is actually a function that gets computed and then returns results. So the SPARQL now looks a lot more like Prolog, if people are familiar with Prolog. We can now build functions into SPARQL. It's still completely valid, 100% W3C-compliant SPARQL code, but it's gotten a lot more interesting. All right? Yes, the question is whether we provide these or whether they can be extended. Right now, all these functions are built by us. In the next version, we'll make it so you can write them in JavaScript or in Lisp. So with a JavaScript-to-machine-code compiler, you can write your functions in JavaScript and then hang them into the SPARQL, so you can do much more. All right, okay. So this was a quick little demo; I hope you get the gist of what I was trying to show. Then let me go back to my presentation. So now about moving objects. Where am I here? Yeah.
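To show the mechanism rather than the product, here is a toy evaluator in Python where most predicates are looked up in the graph but some dispatch to code. The `fn:inCircleMiles` name, the data, and the planar distance are all my own stand-ins, not AllegroGraph's actual magic predicates:

```python
import math

# Toy graph; names and coordinates are made up.
graph = {
    ("sonya", "paid", "marco"),
    ("sonya", "paid", "ella"),
    ("marco", "location", (42.81, -73.95)),  # near Rotterdam, NY
    ("ella",  "location", (40.71, -74.01)),  # New York City
}

ROTTERDAM_NY = (42.81, -73.97)

def within_miles(point, center, radius):
    # Crude planar distance: one degree is roughly 69 miles.
    return 69 * math.hypot(point[0] - center[0],
                           point[1] - center[1]) <= radius

# Magic predicates live in a dispatch table, not in the graph.
MAGIC = {"fn:inCircleMiles": within_miles}

def match(pred, args):
    """Ordinary predicates scan the graph; magic ones run code."""
    if pred in MAGIC:
        return MAGIC[pred](*args)
    return tuple(args) in {(s, o) for s, p, o in graph if p == pred}

# "Did someone Sonya paid make that payment near Rotterdam, NY?"
locations = {s: o for s, p, o in graph if p == "location"}
payees = sorted(o for s, p, o in graph if s == "sonya" and p == "paid")
near = [m for m in payees
        if match("fn:inCircleMiles", (locations[m], ROTTERDAM_NY, 100))]
```

The Prolog flavor mentioned above comes from exactly this: the same predicate position either unifies against stored facts or calls a built-in.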
So we don't have many patents in our database, but our 2D and 3D geospatial indexing is patented. And basically what we do is: say you want to find things on the surface of the Earth. You tell the system that you want to divide the world into strips. So basically, you say, I want to use strips that are five miles wide, and then within the strips we have the data sorted. This is described on our website. And once we applied for this patent and got it, we found out that Google and Microsoft are actually doing the same thing. But that's what happens with patents. So anyway, if you now want to say, give me everything within two miles of Berkeley, then you actually need only one disk I/O. Because you say: well, I know what Berkeley is, I look in GeoNames to find the latitude and the longitude. Given the latitude and longitude, I can compute the strip number, and then within the strip I can go straight into our index; with one disk I/O I can find the things close to Berkeley. So that was the technique we designed. Now, if you make your strip size too small, then you actually have to look at multiple strips, so now you have multiple disk I/Os. But most of the time you can come up with reasonable strip sizes. And what we actually do is maybe compute one index with five-mile strips if you want to look within a city, and one with hundred-mile strips if you want to look between cities. Does it make sense? So you can actually have multiple geospatial indices. I'm going to get to that now. Thank you very much for this question. So this was 2D, where we divided the plane, or the world, into strips. And then we did 3D, where, again in 80 bits, we now have a time strip, a latitude strip, a longitude strip, and then the time modulus and the latitude modulus. So now we have divided the world into blocks.
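A minimal sketch of the strip idea (my own toy version, not the patented implementation): hash each point into a latitude strip, keep each strip sorted by longitude, and a lookup becomes one seek plus a sequential read. A real radius query would also check the neighboring strips whenever the radius exceeds the strip width:

```python
import bisect

STRIP_MILES = 5
MILES_PER_DEGREE = 69  # rough conversion, for the sketch only

def strip_number(lat):
    """Which 5-mile-wide latitude strip a point falls in."""
    return int((lat * MILES_PER_DEGREE) // STRIP_MILES)

# The index: per strip, objects kept sorted by longitude, so a
# range lookup inside a strip is one seek plus a sequential read.
index = {}

def add(obj, lat, lon):
    index.setdefault(strip_number(lat), []).append((lon, obj))

def nearby_in_strip(lat, lon_min, lon_max):
    strip = sorted(index.get(strip_number(lat), []))
    lo = bisect.bisect_left(strip, (lon_min, ""))
    hi = bisect.bisect_right(strip, (lon_max, "\uffff"))
    return [obj for _, obj in strip[lo:hi]]

add("berkeley_event", 37.87, -122.27)
add("la_event", 34.05, -118.24)
found = nearby_in_strip(37.87, -122.30, -122.20)
```

On disk the sorted strip is contiguous, which is where the single disk I/O in the talk comes from.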
So now, if you ask for a particular point in the world at a particular time point or duration, again with one disk I/O I can jump straight to the place on disk where your data will probably be. Again, if the time window or the block is bigger, you have to look in multiple places. But if you choose the right strip sizes for time and space, it's one disk I/O and you get your data. And so now we can look at both time and space in a much more efficient way. In the bank demo you saw I was looking 100 miles around Rotterdam, but you saw I also looked in a particular time interval, and we had to do a join for the two. Now I can do a joinless search, both for time and for space. And with that, we are looking at multiple projects right now. We can track animals in a biodiversity project. We can track ships near the coast of Africa; you want to know whether two boats are close to each other, whether there's a pirate thing going on. People are looking at this for trucks and airplanes and ships, and for telephone services. And then of course there are government agencies that do the same thing. Okay, well, let's not go there. So, as I said, we track ships in the Bay Area, using AIS data. Anyone ever heard of AIS? Every big boat is required by law, by sea law, international law, to broadcast every so many seconds where it is, so that they can track you automatically and warn you if things go wrong, if you get too close to each other. There's a whole site, marinetraffic.com, where you can get these maps with boats, and you can download the AIS data from there. It's a very interesting data set to work with. And that's what I showed you at the beginning of this talk. So one of the questions that we want to answer for this project close to Africa is: here are two boats very close together for a while.
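The 3D extension sketched above can be thought of as one composite key per point, so that observations near each other in both space and time land in the same block. Again this is a toy version: the strip widths in degrees and seconds are arbitrary, and the real index packs the key into a fixed-width encoding:

```python
def block_key(lat, lon, t,
              lat_strip=5.0, lon_strip=5.0, time_strip=3600):
    """One key = (time strip, latitude strip, longitude strip).
    Points close in space *and* time share a key, so a space+time
    lookup needs no join: fetch one block (plus its neighbors if
    the query window spans strip borders)."""
    return (int(t // time_strip),
            int(lat // lat_strip),
            int(lon // lon_strip))

# Two ship positions ten minutes apart, a few miles apart,
# land in the same block:
a = block_key(37.80, -122.40, 1000)
b = block_key(37.82, -122.38, 1600)
# The same position a day later lands in a different block:
c = block_key(37.80, -122.40, 90000)
```

Sorting triples by this key is what turns "within X miles, within this hour" into a single contiguous read instead of a join between a spatial index and a temporal one.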
Or there are a few boats in the same spot for more than an hour, because then they might be transferring drugs or people, or just having a party, whatever, but you want to know what's going on. So we have this data and we do our experiments with it. And how does that look? Let me see if I can actually do this here. The pirates won't have AIS, of course. There's also satellite data for the same ships: for the project we're doing, you have satellite data and AIS data; for the ship demo here, we only use AIS data. If I can find the right one... yeah, maybe this one. Okay, let me do it this way. So this is the kind of query you can do. In this case it's in Prolog, but I can do this in SPARQL too in the next version. You guys know what a haversine distance is: it's the distance between two points, but taking into account the curvature of the earth. And so we can say: triples inside haversine-miles, given a particular triple that is at, and people that live in the Bay Area will recognize that this is the latitude and longitude of the Bay Area, in this particular hour. So give me those; I can execute that and I get the subjects at this particular point. We already have this implemented, and we're trying to make it more user-friendly right now, but like the SPARQL query I showed you earlier, this will be in the product. Okay, so then I'm almost done. See here. We wrote a paper, which I think is also at our booth, about the performance of this particular new capability. And we also wrote a paper about the complexity of this, and why it's important that you have this particular 3D indexing. So, for example, what is simple to do is to say: what is within a given bound from a given lat-long-time? So you have a location and a time, just a 3D block, and then it's just one disk I/O.
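The haversine formula itself is standard and easy to write down. Here is a plain Python version, with rough San Francisco and Oakland coordinates as the example (the radius constant and coordinates are approximate):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance: like straight-line distance between
    two points, but taking the curvature of the earth into account."""
    r = 3958.8  # mean earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# San Francisco to Oakland, roughly:
d = haversine_miles(37.77, -122.42, 37.80, -122.27)
```

In the ship queries this is the distance test applied inside each candidate block, after the 3D index has already narrowed the search down.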
If you say: detect when two given moving objects are within a given distance, it gets harder, because now you have to traverse both of their tracks, and the computation has to be fast. If you say: given a moving object, detect all moving objects ever within a given distance, it gets even harder, and I'm not even going to explain it, but it's a beautiful paper about the space-time complexity of doing this particular kind of search. And that was all I wanted to say today. Thank you very much. Any questions? Yeah. Well, the reason was we wanted something that fit into our existing indexing scheme, and that's why we chose it. I can do this offline; this is far too technical. Other questions? Yeah? It would be interesting to say something about 4D. Well, we're looking into 4D, where you have lat, long, start time, and end time, because that's another big problem. Even with 3D, it's hard if you have very long events and very short events. If you want to say, were two people together at a particular time, for a duration, and these are very long durations, then you have to pick a strip size, and it gets really, really unhappy. So we need to work on 4D, where we can actually do indexing on durations. It's the same with geospatial, by the way; you can do the same thing for time, with a strip size for durations. So we know how to do it, but that's the next challenge. Yeah? Then you have to do joins. Yeah, okay, well, I understand what you mean. I was wondering if there's an interesting problem there. Well, we have an easy solution short of 4D, because in our triple store we can traverse 2 million triples per second when we read straight through. So if, say, we take latitude and longitude, and we have a particular area, and the time variation is much smaller than the geospatial variation, then... well, if the time variation is smaller, then we do TXY. If it's vice versa, then we do XYT.
It depends on what has the highest variability; you want to do the smallest number of lookups. But again, say you first look at latitude and longitude; then you maybe have to traverse a million triples. You can just go straight through them, without doing any other B-tree search, just a straight read of the triples, and then check which ones are in a particular duration. That way you can do an easy 4D thing. But it would be better if we had real 4D indexing, because then, again, we jump to a much closer point, and you can get it down to milliseconds. But we haven't done that yet. Other questions? Or shall we stop? Okay, one more. Oh, you can use the third dimension also for height, but then you can't do time. All right, thank you.