My name is not Bob, as you might have seen already; on a couple of the printouts, I think we switched the speakers and something went wrong. My name is Etienne. If you were expecting Bob, I'm sorry. Welcome to the first talk of the day. Plenty of firsts today: my first time at FOSDEM, therefore also my first time speaking at FOSDEM, but more importantly, the first time we're introducing Weaviate to the public. And it's a very cool chance for us to speak in front of an audience that is either interested in graphs or consists of experts on graphs, so that's really, really cool, and we're looking forward to your feedback. Weaviate is the decentralized knowledge graph, and we have the vision that any AI-related task can be reduced to semantic question answering. That's what drives us in building Weaviate. You can also see the word SeMI on these slides. SeMI is the company that sponsors Weaviate; Weaviate is an open source product in the Creative Software Foundation, and SeMI offers enterprise solutions on top of it and supports the open source project. Cool. So for the next 30-ish minutes, I'll give you a bit of an introduction to what Weaviate is, what the contextionary is that powers Weaviate (I'll get to that in a second), what you can do with Weaviate, how we build it, and a quick live demo. So what do you get from it? I would love to spark the desire in you to just try it out. We have a Docker Compose file; you can just do docker-compose up and play around with it. And as I said, I would really love to get feedback from this targeted audience on both our API decisions and our implementation decisions. Cool. You see a GitHub link here, so if you wanna check it out right now, before I've even told you what it is, feel free: it's creativesoftwarefdn/weaviate. On the title slide, we had the official name: Weaviate, the decentralized knowledge graph. Here I've added another word, which is the word contextual.
And I'll explain what we mean by contextual decentralized knowledge graph. To understand the part contextual, I wanna introduce the contextionary. That is an AI-based natural language tool that helps add context to words. I think it's best explained with an example, so think of the word king. These are the associations I had while building these slides: a king could be a ruler, a monarch; man and male are also somewhat related to the word king. And all kinds of other associations: fancy, royal, could be an oppressor, could be well-spoken, and depending on how hungry you are this morning, could also be hamburgers. But if we take a look at this word king in a vector space, seeing king as the sum of all of these words that make up the context, then we can perform basic mathematical operations on it. One that I've done here is taking man and male, subtracting that, and adding the words woman and female. Then it's immediately intuitive to us as humans (so I assume you're all human; we've already established that we have one robot in the audience, and I hope the rest of you are human) that the word king becomes the word queen if all the rest of the context stays the same. This is something that has traditionally been very hard for computers, and the contextionary uses that context to make these kinds of things very easy. You could, of course, when you query something, just match on the string, if you already know that you're interested in both genders: you could just say king or queen. But these kinds of queries won't scale so well. Say you have a relation between one entity and another, say a king lives in a castle, and for castle you maybe also have the word fort. You have all these different ORs that you have to add, and it won't scale so well.
Whereas if you have the context, we just match one context against another context, which makes it much easier, and it will be extremely helpful if you don't know the ontology, if you're operating on foreign data, or if the data quality just isn't as good as you'd hope for. Other examples for how we can use the contextionary: think of the words location, place, city. For example, let's say we have three lists of restaurants, like the top restaurants in your region or in your country. These are made by different people, and everyone has a different way of saying where a restaurant is located. The first one says location, the second one says place, and the third one says city. Again, for us humans it's very easy to see that this is really the same property they're talking about, but if we're just matching strings, this becomes very, very hard. By using context, we can very easily identify that this is the same property. Then there's also the opposite. I have two homonyms here, seal and seal, so we can use the context for disambiguation. Seal could be an animal, so we have the mammal, the animal seal, and you also have the seal as in to seal an envelope. Depending on these different contexts, if you have mammal, ocean, et cetera, you can build the centroid of these words in the vector space, and this is what defines the context. This is how you can disambiguate between seal the animal and the seal as on an envelope, where you would have stamp and envelope in the context, maybe security and protection, these kinds of things. Cool, the next word we had in the definition is decentralized. In order for something to be decentralized, you need to have many different things that you can then either combine into something centralized or spread out into something decentralized.
You can operate Weaviate completely standalone, but you can also establish a peer-to-peer network of different Weaviates. Why would you wanna do that? One reason is that it gives you the ability to completely own your own data. You don't have to send your data away somewhere, which matters if it's private data or customer data; think of GDPR protections, et cetera. But because it's a network, you can still enrich it with other data. Think of Wikipedia, for example: a lot of human knowledge is very easily accessible, again to humans, on Wikipedia. But for machines it's not necessarily that easily accessible. Yes, you can parse the text, et cetera, but establishing relationships is a bit more difficult. You can of course say, okay, if I'm talking about a car in my database, I can add a link to the Wikipedia article on cars. But it would, of course, be much nicer if you had that Wikipedia knowledge in some sort of structured graph form, so you get to keep all of those relationships. This is something that's possible with Weaviate: you just connect your own private Weaviate, where you have your private data, with a public Weaviate that has, for example, Wikipedia data. You can also connect several Weaviates to combine their different advantages. I'll get into modularity a bit later, on how we use modularity in Weaviate. But you could, for example, say, okay, I have one Weaviate that's built to be very fast; maybe it has an in-memory database, where we don't store anything on disk, everything is in memory, so it's super fast. But we have another Weaviate with a data set so large that we can't do that. By having it decentralized, we can combine those two Weaviates to have, ideally, the best of both worlds. The next term is one I already received a question about this morning: what does knowledge graph mean?
And in fact, that's not entirely easy to answer, because I don't think there's one agreed-upon definition. One very common way to refer to a knowledge graph is the way that Google uses it. Google calls their knowledge graph the Knowledge Graph, and this is what Google uses to answer structured questions. If you ask Google who is the president of... let's not go there; let's say, who is the chancellor of Germany? That one's a bit safer, I think. Then Google will say, okay, the chancellor of Germany is Angela Merkel, and maybe add her age, her education, how she got there, her party affiliation, et cetera. This related info is of course very easy to retrieve because there's a graph behind it. For us, with Weaviate, we also focus very much on that second part. We are in the graph track here, and for us that means we're building on existing graph technology; we're building on top of existing graph databases. We're using GraphQL for our API because we think it makes graphs much easier to use, especially for people who don't have any familiarity with graphs. Additionally, we keep adding more and more semantic tools to Weaviate over time, turning a sort of regular graph into a knowledge graph. It's very easy with existing graph technologies to query for this kind of related information, but it's not so easy yet to understand these kinds of questions. If you just put a natural language question in, there are basically two parts: the natural language part, and then, once we have that in a structured form, the graph querying part. The combination of the two is basically Weaviate. So what can you use Weaviate for? As I said already, you can combine data across industries, but because you have the context sharing, you don't necessarily have to harmonize your ontologies. You don't have to make sure that location is renamed to place so it's consistent, because you have the context. This gives you different abilities in all kinds of business cases.
Say, for example, you are a bank. You have bank transactions, and you see that people are buying Netflix subscriptions, and based on the price you can tell that a subscription includes the 4K feature. If the bank wants to work together with a retailer, they could now ask: is there any correlation between buying the 4K Netflix subscription and actually buying a TV that is capable of displaying 4K content? Those are the sorts of synergies that can be created there. Another example: mobile providers have movement data related to postal codes. Let's say there's a specific street, and you wanna know how many people from a specific postal area, or from a specific area in your city, go to that street. Of course you don't have the individuals there, because that would be horrible for privacy; just groups overall, this many people from this area go there. This kind of data is very useful for any business in that street. They could say, okay, maybe we have to drop some pamphlets at some other place, because we don't get enough customers from this area, even though it's not that far away. Or you could use this for restaurants. Let's say you're a restaurant chain and you're thinking about where to open your new restaurant. If you have postal codes of people that visit that area, you also have demographic data, so you can ask: does my restaurant fit that demographic? Also the transport sector: let's say you have two postal codes that are equally far away from the street you're looking at, but people are only showing up from one postal code, not from the other one. Maybe there's something wrong with the transportation. So, all kinds of insights where it makes sense to combine data from different industries in a decentralized way. You can of course also very easily gain more insights into the data you already have.
You don't have to use Weaviate as part of a network. You can also just use it standalone and have an easy interface onto your data, because you don't have to know the ontology so well. You can enrich your data; take the example I had before with Wikipedia. You can use that to, say, find more customers. If you have data about your own customers, you can identify who your most profitable customers are, what aspects make these customers your best customers. You don't wanna share that data, of course; you wanna keep that. But additionally, it would be very nice if you just had a database of potential customers: if you're in the B2B sector, just a list of companies in a structured form, so you can apply your own criteria of what makes a customer a good customer to that list. Another example would be fraud detection. A typical machine learning example is spam detection; if you look into machine learning, it's one of the first examples. You have a list of bad words, and if these words appear in your email, your email is probably spam. With structured knowledge in a graph, you can take this much, much further: not just matching on words, but maybe including behavior, and you can use this for fraud detection or any other kind of behavioral analysis. Cool, so let me just drink something and then dive into our first example query. I said that we're using GraphQL, and the reason we do that is because we wanna bring graph technology to people who don't necessarily even know what a graph is, but who we believe could benefit from it. In order to understand the query, let me define a couple of things. The root query that you see here is Network, and we distinguish between Local and Network.
If you're in that standalone mode and you're just interested in the data that you have locally, it would be a Local query, whereas a Network query is anything where you tap into your network. We then have Things. Things are anything that are things in real life: could be cars, could be airplanes, could be persons. And Actions are any kind of interaction; in the context of an airline, an action would be a flight, for example. The next word we have here is Fetch. Fetch indicates that we're doing something fuzzy. We also have the word Get, where you do something very explicit: you know what network peer you wanna get the data from, and you know exactly their ontology and what you wanna get. Fetch is the opposite: Fetch is for the kind of fuzzy search you wanna do when you don't know those things. Let me give you a bit of a scenario around this. Let's say we have airports all over your country, all over your region, and these airports each have some kind of local airport management system. So you don't really know the exact data; it could be a different system anywhere. But you know that each of these airports has data on what planes are currently on the ground. Let's say we're a fictional aircraft service company, a very mobile one that can just immediately drive to an airport if there's an airplane that needs servicing. I don't think it's that flexible in real life, but that doesn't matter for our example. Now we wanna say: okay, I'm good at servicing airplanes of the type 777, where can I find them? And this is where this query comes in. We're saying in this where clause here: give me anything where the class is Plane, with a certainty of 80%. Certainty is one of the constructs that we introduced to narrow down this fuzziness, and by that we mean anything that is related to plane in context, not anything where the string is close.
The word planet, for example, with a t at the end, would be very close if you compare the strings, but of course not in context. The context we wanna match is airplane, aircraft, these kinds of things. The second property that we're interested in is the model, which in this case should equal 777. And again, here we don't know if this local airport management system calls the property model, or maybe modelName, or just name, et cetera. So again we try to match this with a fuzzy search. Then, of course, the word plane is itself one of those very ambiguous words. A plane could be an airplane; a plane could be a two-dimensional level or surface in three-dimensional space; and in woodworking you have the thickness planer, to plane. So we have the ability to add keywords here to make it clear to the machine what context we're talking about. In this case we're adding the words airline and airport. If you hear those words, and the next word you hear is plane, you immediately think of an airplane and not any of the other kinds of planes. In this case we're entering this context manually in the GraphQL query. But if you think one step further: maybe this question that got sent to Weaviate didn't come from a GraphQL query, maybe you asked your Alexa. Then the context could just be the questions you asked before. If you were talking about airports and airlines for the past minute, the next question that you ask about planes is most likely gonna be about an airplane and not any other kind of plane. Cool, so now we're transitioning a bit more into the technical part. The first question here is: how is Weaviate different from existing graph technology? Weaviate is not a graph database, and Weaviate does not try to replace existing graph technology in any way. Weaviate tries to enhance it, to build on top of it.
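To make the scenario concrete, a Network Fetch query along the lines described might look roughly like the sketch below. The exact field names (Fetch, keywords, valueString, et cetera) are my reconstruction of the API under development and may differ from what the actual Weaviate schema exposes.

```graphql
{
  Network {
    Fetch {
      Things(where: {
        class: [{
          name: "Plane",
          certainty: 0.8,
          keywords: [
            { value: "airline", weight: 0.9 },
            { value: "airport", weight: 0.9 }
          ]
        }],
        properties: [{
          name: "model",
          certainty: 0.8,
          operator: Equal,
          valueString: "777"
        }]
      }) {
        beacon
        certainty
      }
    }
  }
}
```

The certainty of 0.8 asks the contextionary to accept class and property names whose context is at least 80% similar to the given words, and the keywords bias that context toward aviation rather than geometry or woodworking.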
So if you're faced with the question, should I use Weaviate or should I just use any of the existing graph databases, how would they be different? The first of the USPs is ease of use. Keep in mind that we're targeting an audience that isn't so familiar with graphs. I assume people in this room probably know how to run a Gremlin query, a Cypher query, a SQL query, these kinds of things. But even so, you might say that these definitely have a steep learning curve, and we're using GraphQL to make that easier. With GraphQL, you have the ability to discover the API: if you don't know the API and you don't know the ontology, you can use GraphQL, which gives you this sort of auto-suggestion of what there is to use. We think we can reach a larger audience than with the existing query languages, at a cost, of course, as with any abstraction. Then we have natural language processing and the contextionary. The contextionary I've explained already; natural language processing is what we're looking into right now. Ideally, GraphQL would just be the middle layer, and in the end you could just ask a question, ask it to Alexa or just type it in, and Weaviate would give you an answer. The third part is the modularity of the data store. Weaviate does not depend on any one specific graph database; we try to make this very modular. The first connector that we've built so far is for Janus Graph, but there's nothing Janus Graph specific. You could also write a connector for Neo4j, RedisGraph, what have you. One of the reasons for this could be the CAP theorem, for example. Let's say you need high availability; then, under partition tolerance, you have to trade that off against strong consistency.
And in this vein, we've picked Janus Graph, which is in turn backed by Cassandra, which is known for its eventual consistency. So we said: okay, we need high availability, we need scaling, but we don't need strong consistency for the first use cases. A bit more on the architecture. Of the user-facing APIs that we have, I've already talked about GraphQL, which we think makes it very easy for users to discover the API. But we also, of course, have a REST API. For importing, for example, if you have a flat list of your data and references are basically just IDs, a REST API comes in very handy. So that's the other API that we have. Weaviate is built as a microservice; by that we mean that Weaviate has a relatively small concern. We're currently looking into, as I said, these natural language tools where you can just ask a question, and there we've discovered this is a completely different language stack: Weaviate is written in Go, and most of the libraries there are written in Java. So that will probably be its own microservice. We're benefiting from all the technologies that are out there: we're adhering to twelve-factor principles, we're using Docker in our everyday development, and we're betting on Kubernetes, making sure that everything runs on the cloud. That is also one of the reasons why we chose Go. We're using Go 1.11 here, so yay, Go modules, for anyone who's using Go. We think Go has just proven to be a very good language in the cloud environment. It's a good trade-off between stability, ease of use, and performance. Yes, there might be languages that are a bit easier to use; yes, there might be languages that perform a bit better; but overall, we think Go is a very, very good trade-off. Then we're putting a lot of emphasis on the design of our APIs, so we always design them first.
In the case of GraphQL, we have a small prototype which is completely decoupled from Weaviate, which we really just use for prototyping and trying to get the API as good as possible. For the REST API, we're using go-swagger. That has come in very handy: you just write the Swagger document first, then build the API to match it. And we're focusing very, very much on modularity and pluggability, for which I have two more examples. The first one is the database connector. (Don't worry, by the way, if you can't read this code example here; it's not meant to be read.) I've mentioned this before already: we wanna be able to exchange databases. One of the reasons I've mentioned already, the CAP theorem: you might need a consistent database, or you might require a database that can tolerate network partitions. But there are other reasons, of course. Scaling is probably the biggest motivation for us to start with Janus Graph. It could also be speed, or any kind of specific features. We don't know every use case yet, and this is really something where we're saying, okay, we need to be a bit generic here and we need to be a bit modular. So the database connector is really just an interface in our code, and anything that implements that interface is a database connector, which gives us the ability to keep the remainder of the application completely database agnostic. I've already mentioned that the first connector we have is for Janus Graph: Janus Graph, in turn backed by Cassandra for the data store, using Elasticsearch for the indexing, and we're currently looking into potentially also adding Spark for analysis queries. The reason for this code snippet over here is that we have one empty connector that you can get started with. It's called the foobar connector; very creative naming here.
And this code snippet just shows the Godoc comment on one particular method that you can implement if you wanna build a connector. What we're trying to indicate with this code example is that we try to keep it very, very well documented. Here you have an example GraphQL query that you would have to resolve for this connector, an example return value, et cetera. The second part where we wanna be very modular is our authentication and authorization scheme. In case anyone is familiar with how authentication works on Kubernetes: it's very much modeled after that. If you're not familiar, don't worry. Authentication is very, very modular there, and we try to keep that modularity. Anything can be an authentication plugin. For example, you can just use basic auth while developing; in production, you could use OpenID Connect, which in turn gives you the ability to include many different authentication schemes. But we're also saying, okay, there might be some obscure enterprise authentication scheme that we've never even heard of, so again, we try to be very flexible. Really, anything can be an authentication plugin in Weaviate that resolves whether the user is authenticated, yes or no, and also extracts the username or groups, so we can then use those for authorization. The authorization, again, is very much modeled after Kubernetes: we wanna have a role-based access system. In this example here, this particular user or group would have permission to read and write on things/*, so on anything in the things API, but they wouldn't have any permission to read anything or do anything on the actions API, for example. If you're interested in that, there's a GitHub link with the full proposal. Cool. I mentioned SeMI on the very first slide, and I think in the context of open source it's also always very important to say who's behind a project.
We all know that software development is very expensive, so who are the people behind it, and what do they get out of it? In the case of Weaviate, SeMI is the company; it's SeMI, by the way, not Semi. SeMI offers enterprise support and consulting. For enterprise companies, it's very important to get something like Weaviate up and running very, very quickly and to get help with all the enterprise-specific things. Advanced network abilities are also there. I've mentioned these examples before: wouldn't it be great if there were a Weaviate that had been filled with Wikipedia knowledge, for example, or a Weaviate that has a directory of small and medium businesses? These examples were not too far-fetched; this is something that SeMI can provide. Another thing is a custom contextionary. So far I've always talked about the contextionary, and in the open source version we have a general-purpose contextionary that is trained on content from Wikipedia. But you might need an industry-specific contextionary, for example, and this is something that SeMI can also provide. There's also a Playground user interface. It is just a user interface to visualize your graph and your data, which is really cool. Unfortunately, it's not open source at the moment, so that's why it's not covered here. Next up, I would like to give you a very, very quick demo, which will be interesting with the handheld microphone here. But before we get to the quick demo part, let me quickly give you the roadmap, so that you know what I can demo, what is vision, and what is already there. We do have our REST API completed, for importing and these kinds of things. On the GraphQL API, we currently have everything that is related to local queries. On network queries, we have the Get queries and GetMeta queries, so you can query information that is very specific to one network node. We do have the contextionary completed.
What we don't have yet, though, are queries that make a lot of use of it. We have a Janus Graph connector; that was the first connector we decided to build, and it is completed. We are currently building the network part, these fuzzy requests; the example from before is actually something that I started picking up yesterday, so it's very much in development right now. The authentication and authorization proposal is an accepted proposal, but nothing has been implemented yet. And we're currently doing a lot of research into this natural language thing, to get closer to our vision of answering any AI-related task with a natural language question. Our team member Laura is currently looking into NLP technologies to put this on top of GraphQL, basically. Cool, so let me demo something. I quickly have to switch, yeah. Thank you very much. Okay, first I have to tell the screen that it should mirror. Cool. And now we just have to find... how cool, that worked. Awesome. Here we have a GraphQL query; let's start with the first one. This is a local Get query. We're querying for Things in this case. Okay, I think I'm missing an example, but that doesn't matter. In this case we're saying we want cities, and of each city we want the name and the population. Then we have a cross-reference, which in the context of graphs is an edge, onto a property called country, and from this country we again wanna get the name and the population. Additionally, we have a filter here. This filter says we wanna get any city where, along the path through the country property to the country's name property, the value does not equal Germany. So in our huge data set of, I think, four cities that we have here for demo purposes, we're now only getting the two cities that are in the Netherlands: Rotterdam, with a population of 1.8 million, and Amsterdam, also with 1.8 million, both in the Netherlands, with 17 million.
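A sketch of what this local Get query with its where filter could look like. The property names, such as inCountry, and the exact filter shape are assumptions about the demo schema, not necessarily the exact ontology used on stage.

```graphql
{
  Local {
    Get(where: {
      path: ["Things", "City", "inCountry", "Country", "name"],
      operator: NotEqual,
      valueString: "Germany"
    }) {
      Things {
        City {
          name
          population
          InCountry {
            ... on Country {
              name
              population
            }
          }
        }
      }
    }
  }
}
```

The path walks from the City class through its country cross-reference to the country's name, and the NotEqual operator excludes the German cities from the result set.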
If I then remove this where filter and run the query again (I also need to remove this), then we get all the cities that we have in our database, which adds Berlin, in Germany, and Düsseldorf. Again, the cool thing here, because it's GraphQL: you really don't have to know the ontology. You can just rely on GraphQL, on its typing system, to basically autocomplete that. This is something that we think makes GraphQL a very cool tool to put on top of graphs. Then here we also have an aggregate query. In this query we're aggregating Things again, and we wanna group our things; whenever we do an aggregation, we have to group by something. In this case, we're grouping on a primitive property that we call isCapital. That's a Boolean property, meaning we get two results: one where it is false, one where it is true. And then we have a very simple aggregation here: among the non-capitals, the largest in our data set is 1.8 million, and the smallest would be 600,000; that's Rotterdam and Düsseldorf. And for the capitals, we have Amsterdam and Berlin: 3.5 million for the maximum and 1.8 million for the minimum. So that's the quick GraphQL demo part. Now I need to get back to my slides, which were... here, cool. Thank you very much for holding the microphone. I'm not seeing my speaker view anymore, so please do tell me when we're out of time. I would now be very, very interested in a bit of feedback from this room, both now and, of course, later. A bit of contact information: you can reach SeMI at semi.network, that's the homepage. If you have any kind of sales-related questions, for enterprise services et cetera, please reach out to either David or Misha. You can, of course, also contact me. Why would you want that? I'm currently one of the core developers of Weaviate, so if you have any development-related questions, there's my contact info: my homepage, GitHub, and Twitter that you can follow.
Also my YouTube channel, where I have a couple of software engineering and DevOps-related tutorials; if you're interested in that, please do check it out. But more importantly now: feedback. We're very interested in any kind of general questions or general feedback, but also in some specific points. On the API design: we're saying, okay, we're using GraphQL to try and make things easier to use. What do you think? Is that a good idea, or do we strip away abilities? Is there anything that you can only do with Gremlin, or only with Cypher, that we definitely miss here? And very similar on the connectors: do you think it's a good idea to have this kind of abstraction, to be modular? Or do you think there are, let's say, specific features that only Neo4j has that we would benefit from? So those would be the other questions. Feel free. We have four minutes left for questions. Perfect, first one; yeah, I'll repeat it. Yes, so the question was: is there a way to not just fetch anything specific, but basically the entire graph? Like, if you don't know the ontology, but you know you wanna query a specific domain, is there a way to just get anything? Yes, on the GraphQL API we have an introspection function where you can do exactly that. Say you are in a network: you know your own ontology, but you don't necessarily know the ontologies of your peers. You do know that, for example, your peers are in one industry, and then you can use these kinds of queries to discover what is in that ontology. Thank you. Other questions? Yes. What does the SDL look like, the GraphQL schema for your API? What does it look like? How do you mean that? So usually GraphQL APIs are developed schema-first: you write a schema, and then you have the API on top of that. I just wanted to know, do you have a static SDL, or is it generated, or how does it look?
Yeah, yeah, I understand the question. Yes, so we have a fixed ontology that every Weaviate has to have. Fixed in this case doesn't mean it can't change; it just means you pre-define it, but you can change it all the time. And then, in the peer-to-peer network, we basically cache the ontology of every peer, so peers can have different ontologies, and we generate the GraphQL schema from that. It is cached at the moment you do the query, of course, because that's just how GraphQL works, but it can change at any time; we just have to invalidate the cache and update it. Another question. So the question was: when we built the contextionary, how automated is that process, and, sorry, what was the second part of the question... and what kind of data sets can we consume? The process to train the contextionary is extremely expensive in terms of machine power, so we run that once per data set. The current contextionary that we have is trained on Wikipedia, so basically just text, but anything that machine learning can consume can basically be used to train the contextionary. So for industry-specific contextionaries: if, let's say, an enterprise company had documentation on specific machines, for example, we could use that documentation to train an industry-specific one. And yes, it's automated, but it's not something we can do on the fly; it takes spinning up a big cluster somewhere in a cloud and running it for a long time. Another question. I can't say that with entire certainty; I think not at the moment, but please do reach out to Bob if you're interested, bob@semi.network. He can tell you more precisely, but I think it's not open source. Cool. Cool, and with that? Perfect, out of time. Thank you very much. Thank you for attending, and for your feedback.