Thank you, Michael, for the introduction. The room is not so crowded anymore; all the blockchain people left, apparently. I'm going to present a graph query language proposal called G-CORE. To introduce myself a little: I'm a graph database systems researcher from TU Dresden, and I have done research on graph data management, schema evolution, schema flexibility, adaptive indexing, and other topics. Next to that, my more industry-related activities involve SAP HANA Graph, the openCypher project that Stefan told us about earlier, and the LDBC Graph Query Language Task Force, which is the entity that came up with G-CORE, which is what this talk is about.
What is LDBC? As some people may know, LDBC is the Linked Data Benchmark Council, which was founded out of an EU research project. It is a non-profit organization devoted to specifying benchmarks for graph database systems, to make systems comparable in terms of their performance. When you want to define benchmarks, you have to define the queries that the benchmarks run, and then you run into the problem that this is hard to do for graph database systems, because there is no standardization of graph query languages. That was the motivation for a working group on graph query languages. This task force consists of a group of people, one half from academia, like myself, and the other half from industry, from various vendors of graph database systems: Neo4j, of course, but also SAP, Oracle, Sparsity, etc. The nice feature of that forum is that it combines practical experience of how people use graph query languages and how we can implement systems for them with the more theoretical knowledge from academia, for example what the complexities of certain query language features are.
The task force worked for roughly two years, and the result of that work is this graph query language proposal called G-CORE. It has been accepted as a SIGMOD paper and will appear at this year's SIGMOD. If you are interested in the details already, you can go to the preprint linked here on the slide.
To explain G-CORE a little more, let's start with what G-CORE is and what it is not. G-CORE is not a graph database system; it is a graph query language, so it is a language specification. It specifies a syntax and semantics. There is no system you can download, install, and then run these queries on; that was not the intention. It is not commercial or proprietary. It is a query language design by this mixed group of people from industry and academia, which tries to blend these different sets of experiences together in one language proposal that we think is a good direction for graph query languages.
It is also not a standard. Doing standards is really hard and involves a lot of politics. The nice thing about this LDBC group was that we were mostly able to keep politics out of it and really concentrate on the matter of designing a good query language. We also didn't want it to have all possible features, but to concentrate on a core of features which we think are crucial for a good graph query language. So the intention is more to provide a vision of where the graph query languages that are actually implemented in systems should evolve to, of what we think the future of graph query languages should look like.
To get a little more into the details, G-CORE is based on the property graph data model. We already had a nice introduction to that model by Stefan, so I will more or less skip over it; essentially it is a graph of rich objects. When we started the work on G-CORE, we tried to
formulate certain challenges which graph query languages face, based on what we see in existing query languages.
The first challenge applies to good query languages in general, not only to graph query languages: the query language should be composable. That means you can take the result of one query and use it as input to another query, as you have always been able to do in SQL. With most existing graph query languages, you cannot really do that, or you can do it only in very limited ways. What we wanted to achieve is what we call a closed query language: the query language is, in the mathematical sense, closed over its data model. Input and output follow the same data model, and by that your queries become composable. Then, of course, you need certain extension points, subqueries, etc. to actually make use of this composability.
The second challenge we try to address is paths. Paths are a very fundamental abstraction on graphs, and you want the query language to be able to reason about and process paths. What we see in existing query languages is that you can query for paths, but it is really hard to process them further. The best you currently get is that you can extract the path from the query language and then process it further down the line in your application. What we want is that paths are first-class citizens in the query language and in the data model. And that is what G-CORE addresses.
We made a proposal for how that could look.
The last challenge is that we wanted to concentrate on a core for graph query languages. So don't take this as a full-fledged proposal; there are things which we deliberately did not consider because we concentrated on the core of querying. For instance, one thing we deliberately did not consider is manipulating the graph data, so this is really a query language in the core sense of querying. What we wanted to achieve for that core is that all the mechanisms defined in G-CORE can be evaluated efficiently, so that you can build a system where query evaluation stays tractable no matter what query you formulate in the language. The language does not allow you to formulate a query which cannot be evaluated with a tractable algorithm, which is a strong theoretical guarantee that this language gives.
Okay. The main time I want to spend is on the closedness of the language and how you can use that for query composability; the second main part is on paths.
So why is query composition very important for modern query languages? Because what we see today is that a lot of data is captured either automatically or, typically, somewhere else. Back in the 80s, when we designed database systems, the people who consumed the data were usually the same people who fed the data into the system, but nowadays that is not the case anymore.
The data comes from somewhere else, and the effect is that your base data, the data stored in the database system, typically has a very fine granularity and a low level of abstraction. To give you an example, your database would store individual Twitter messages, retweet relationships, and things like that. On the other hand, when you as a user go to this data, in particular for analytical tasks, you usually talk not about these low-level concepts but about high-level abstractions of them. On such Twitter data you would be interested in communities, discussion topics, discussion threads, etc. These are all concepts which are not in the base data, so you need to abstract from the base data to actually do things with these higher-level concepts. Often you have multiple of these abstraction steps: for instance, you might first define what a discussion group is, and then define on the notion of a discussion group what a community is, and so on.
To be able to cross this concept chasm, as I call it, this gap between the concepts in the base data and the concepts you want to use as a user, you need mechanisms that allow you to do this abstraction, and of course that should be your query language. To do so, the query language needs to be composable, so that you can put multiple of these abstraction steps on top of each other. And if these are graphs down here and you want to talk about higher-level concepts, but also with graphs, you must be able to create new graphs along the way, so basically define higher-abstraction graphs based on lower-abstraction graphs. And that should all be doable in your query language.
Okay, how can it be done in G-CORE? I will use this small graph here as an example.
It is basically a social graph inspired by the LDBC benchmark data, so we have people with names and employers, messages they exchanged in a forum, etc. The details are not so important.
As we have seen, to be able to abstract a higher-level graph from lower-level graphs, we must be able to create graphs in a query, and for that every G-CORE query has a CONSTRUCT clause. The other parts of a G-CORE query, MATCH and WHERE, are pretty much what you know from existing graph query languages like Cypher or PGQL. It uses this ASCII-art syntax to define a pattern, you can say on which graph you want to match, and you have the WHERE clause to add additional predicates, as we have already learned from Stefan's talk earlier. What is basically new in G-CORE is this CONSTRUCT clause, which defines what the output looks like, and the output of a G-CORE query is always a graph. What we do is recycle the pattern syntax to also specify a pattern here, which is then instantiated for every result of the MATCH clause. In this case, for every match we would just output one node, and since this variable also appears in the MATCH clause, it would be exactly the node we have matched here, the person node. So the output graph would consist just of nodes, no edges, and it would contain all the person nodes from this social graph where the employer is a ...
Of course, you would not only want to create nodes, you also want to create edges. You can do that by specifying an edge in your construction pattern. What this query shows is a kind of data integration scenario. The MATCH clause matches on two graphs, the social graph we saw in the example and some imaginary company graph. In the social graph we match persons, and in the company graph we match companies, and then we do a sort of join where the company name is mentioned in the list of employer names for the person. And for
those pairs of companies and persons we would create an edge between them, so that we get a graph of companies, persons, and edges that says who works where. Then we could union this with the original social graph, so that the output is an augmented version of the social graph where we also have the company nodes and the works-at edges. But note that this query would not modify the social graph. It is just that the output also contains everything that the social graph contains.
In the graph construction we can do more: we can really create new things, new nodes and new edges. Assume we only consider the social graph and match the persons again, and I want to turn the employers mentioned in this property into nodes, which you can think of as a normalization step for graph databases. We could introduce a new node by giving it a variable which is not mentioned in the match pattern, and say that we want to create a new node with the label Company whose name is the name we find here in the employer property, and then we link this new node to the person who works for that company. The problem with that query formulation is that it would give us a company node for every mention of a company here. So if two people work for IBM, we would get two IBM nodes, but of course we want to have just one IBM node, right?
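The duplicate-node problem can be illustrated with a minimal Python sketch. This is not G-CORE syntax and not taken from the slides; the toy data, the `worksAt` label, and the function names are assumptions made for illustration. The sketch instantiates the construction pattern once per match result, so every mention of IBM yields its own company node.

```python
# Hypothetical toy data: each person lists employers in a multi-valued property.
people = {"Alice": ["IBM"], "Bob": ["IBM", "SAP"]}


def match_person_employer(people):
    """Emulate the match: one binding (n, e) per person and employer mention."""
    return [(n, e) for n, employers in people.items() for e in employers]


def construct_ungrouped(bindings):
    """Instantiate the construction pattern once per binding.

    The company variable is not bound by the match, so each binding
    gets a fresh company node plus an edge from the person to it.
    """
    nodes, edges = [], []
    for i, (n, e) in enumerate(bindings):
        company_id = f"company{i}"
        nodes.append((company_id, {"label": "Company", "name": e}))
        edges.append((n, "worksAt", company_id))
    return nodes, edges


bindings = match_person_employer(people)
nodes, edges = construct_ungrouped(bindings)
ibm_nodes = [node_id for node_id, props in nodes if props["name"] == "IBM"]
print(len(nodes), len(ibm_nodes))  # 3 2
```

Two of the three created company nodes carry the name IBM, which is exactly the redundancy the speaker wants to avoid.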
This you can also do in G-CORE, because you can group the creation of a graph object like a node. You can say: I want to create new nodes x, grouped by e, which is the employer name. Then you would create a new node x only for each distinct employer name. So if you have two persons working for IBM, you would create just one node for IBM, but both persons would still be linked to the IBM node with this new edge. In that sense it is pretty intuitive what happens here.
When you group, you can of course also aggregate. It is very simple, for instance, to augment this company node with the number of employees by adding a property here with an aggregation function that counts (sorry, this is a mistake on the slide) not all the x but all the n. It would then count all the persons who have this e, like the company IBM, mentioned as an employer. When you create multiple new nodes, they can have different groupings, so in one query you can also compute different aggregations over different groupings.
With a union, as we saw, you can compute augmented versions of base graphs. You can also do that in the CONSTRUCT clause by just mentioning the graph here, as the simplest pattern you can have. It would be the same thing: the result of the query would be the social graph unioned with everything that is produced by this pattern. Again, this is no base data manipulation, so the graph to which we refer here stays the same.
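Grouped construction with an aggregate, followed by a union with the base graph, can be sketched in Python as follows. This emulates the behavior described in the talk and is not G-CORE syntax; the toy data and the names `construct_grouped`, `union`, `worksAt`, and `employees` are hypothetical.

```python
from collections import defaultdict

# Hypothetical binding table from the match: (person n, employer name e).
bindings = [("Alice", "IBM"), ("Bob", "IBM"), ("Bob", "SAP")]

# Hypothetical base graph; it is only read, never modified.
social_nodes = {"Alice": {"label": "Person"}, "Bob": {"label": "Person"}}
social_edges = [("Alice", "knows", "Bob")]


def construct_grouped(bindings):
    """Create one company node per distinct employer name e.

    Each company node gets an aggregated property counting the persons n
    in its group; every person is still linked to the shared node.
    """
    persons_by_employer = defaultdict(set)
    for n, e in bindings:
        persons_by_employer[e].add(n)
    nodes, edges = {}, []
    for e, persons in persons_by_employer.items():
        company_id = f"company:{e}"
        nodes[company_id] = {"label": "Company", "name": e, "employees": len(persons)}
        edges.extend((n, "worksAt", company_id) for n in sorted(persons))
    return nodes, edges


def union(base_nodes, base_edges, new_nodes, new_edges):
    """Return a new graph; the inputs are left untouched."""
    return {**base_nodes, **new_nodes}, base_edges + new_edges


company_nodes, works_at = construct_grouped(bindings)
out_nodes, out_edges = union(social_nodes, social_edges, company_nodes, works_at)
print(company_nodes["company:IBM"]["employees"])  # 2
print(len(out_nodes), len(social_nodes))  # 4 2
```

The output graph has four nodes while the base graph still has two, which mirrors the point that construction produces a new graph and does not manipulate the base data.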
It is just that the output of the query contains everything the base data contained plus the additionally created nodes and edges.
Then, of course, you can also query for reachability, as we know from other query languages. We use a syntax with double slashes to clearly distinguish querying for paths from querying for edges, and then you have the syntax you would expect for regular path queries: you can have a regular expression over labels here to specify which paths you are looking for, similar to what Stefan mentioned in his talk earlier.
One additional thing I want to mention is that here in the WHERE clause we see a predicate which consists just of a pattern. The semantics is that of an existential predicate: the predicate is true if, for the bound variables n and m, this graph pattern exists in the graph. Since our query language is composable, we can actually define that as a syntactic shortcut for an existential subquery, which is shown on this slide. This is the same query again, and the predicate is just syntactic shorthand for an EXISTS with a full-fledged G-CORE query as a subquery, where the match pattern is exactly what we have here. Here you can see that when you design a query language in this composable and closed way, you already get a benefit out of that: you can have special syntax without defining additional semantics, just by rewriting to what you already have.
Another benefit of having a compositional query language is that it is easy to define views. You can have a statement that creates a graph view.
You give it a name, then AS, and then the G-CORE query, and that defines a view which exposes a graph, and the graph is exactly the graph created by this query here. Another feature we see here is optional matching, so outer join semantics. What this query creates, when we look at the graph we see through this view, is the original social graph we have seen, augmented by additional edges, where each edge between two persons carries the number of messages these persons exchanged. These two persons here, for example, have exchanged two messages, so the number is two. That is something you can compute with G-CORE queries.
Now I come to paths as first-class citizens. The idea is that paths can also be objects: you can query them and then objectify them by creating a new kind of object which represents the path you have found. When we turn the path into an object, you are of course also able to attach labels and properties, all the nice things you know from edges and nodes. How would you do that? You would have a match query with a regular expression again, and if you assign that path pattern a variable p, then that variable is bound to the path, and when you construct a new graph you can include this path. What you get first is, of course, all the nodes and edges the path consists of. Then you can also say: I also want to have this path object. You do this with the at sign here, and then you are able to give this path object a label. You can say this path describes local people, and it has a certain distance.
Where do you get the distance from? In the MATCH clause you can specify a variable for the cost, so when the path is matched, this variable c here is bound to the cost of the path. What is the cost of the path?
Well, that depends. If nothing special is specified, it is just the hop distance, so the number of edges. As we will see later, we could also have a weighted cost.
An additional thing about G-CORE is that when you query for paths, it is always shortest-path semantics. That is a crucial property to keep the query language tractable; with other path semantics you easily get a query language which is not tractable anymore. What is possible is to query not only for the single shortest path but for a number of shortest paths. That is also possible in a tractable way. So in G-CORE you can say: give me the three shortest paths for this regular path pattern. What this query would give us is the three shortest paths starting from a user named John Doe to other users who are reachable from John Doe over knows edges and who live in the same city, which we express here with the existential subquery again.
Now, how can we do weighted shortest paths, so not hop distance but ... A question?
The question is: if I bind three shortest paths, would I create three edges up here? The first answer is that this does not create an edge. It is written with slashes, so it creates paths. Then there are two ways this can happen. The first is that you just get all the nodes and edges along these paths in your result graph. Then you have what makes up the path, but not a new path object. To also get the path object, to be able to assign labels and properties, you have to add this at sign here. And when you ask for three shortest paths, then for one binding of n and m you get three bindings of p, so in your binding table you would basically have three tuples.
Okay, so from hop distance to weighted shortest paths. For that, G-CORE includes the possibility to describe path macros.
That is a way to specify what the path should look like. It is done with the keyword PATH; then we give the macro a name and say equals, and then comes the specification of the path macro. You specify the path macro by specifying a path step, in the simplest case a fixed-length path pattern from some node x to another node y. In this case it is just one knows edge, but it could be multiple of those, in forward or backward direction. Then you can have further constraints on these variables, and you can also specify a cost function with the keyword COST here. That is not to be confused with the COST keyword in the match pattern: in the PATH clause, the COST keyword tells us how to compute the cost. In this case we say the cost of one of these path steps is more or less the reciprocal of the number of edges, so the more edges, the lower the cost and the shorter the path, so to say.
When you have such a path pattern, you can of course query again for shortest paths over that pattern, and then you get weighted shortest path semantics. In this case it would find the socially closest people that John Doe knows and who also share an interest in the composer Wagner. So with that you are really able to express analytical queries, so to say. Here in the CONSTRUCT clause we have the at sign again, so the paths we find are actually added as objects to the output graph. What that looks like is that we have the original graph from the view and, additionally, objects which represent the paths we have found with the query.
Okay, with that I'm more or less done.
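The two cost notions for shortest paths described in the talk, hop distance and a weighted cost per path step, can be sketched in Python. This is an illustration under stated assumptions, not G-CORE's evaluation algorithm: the toy graph, the per-step counts, and the function `k_shortest_paths` are hypothetical, paths are treated as walks that may revisit nodes, and the search uses a standard best-first strategy that expands each node at most k times.

```python
import heapq


def k_shortest_paths(steps, source, target, k, step_cost=lambda count: 1.0):
    """Return up to k cheapest walks from source to target as (cost, path).

    steps maps a node to a list of (neighbor, count) pairs. The default
    step_cost ignores count and yields hop distance. Each node is expanded
    at most k times, which bounds the number of heap operations.
    """
    found = []
    expansions = {}
    heap = [(0.0, [source])]
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if expansions.get(node, 0) >= k:
            continue
        expansions[node] = expansions.get(node, 0) + 1
        if node == target:
            found.append((cost, path))
        for neighbor, count in steps.get(node, []):
            heapq.heappush(heap, (cost + step_cost(count), path + [neighbor]))
    return found


# Hypothetical "knows" steps; count is an assumed per-step number used by the
# weighted cost (a higher count means a closer relationship).
steps = {
    "John": [("Ann", 1), ("Bob", 4)],
    "Ann": [("Cat", 1)],
    "Bob": [("Ann", 4), ("Cat", 2)],
}

# Hop distance, three shortest paths: one binding of (n, m) yields three
# bindings of the path variable p, each with its cost c.
hop_paths = k_shortest_paths(steps, "John", "Cat", k=3)
binding_table = [("John", "Cat", path, cost) for cost, path in hop_paths]

# Weighted cost: each step costs the reciprocal of its count.
weighted = k_shortest_paths(steps, "John", "Cat", k=1,
                            step_cost=lambda count: 1.0 / count)

print([cost for cost, _ in hop_paths])  # [2.0, 2.0, 3.0]
print(weighted)  # [(0.75, ['John', 'Bob', 'Cat'])]
```

With hop distance, the routes via Ann and via Bob tie at two hops; with the reciprocal cost, the route via Bob wins because its steps carry higher counts.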
So what are the things that we think are missing for now? To be really useful, a query language must also be able to reach out to existing data stores, which usually hold relational data. So you want a way to also have tabular data as a result, and to be able to feed tabular data into the query mechanisms. These are things we have discussed but not considered in detail yet.
Okay, what are the takeaways? First of all, G-CORE is a query language proposal which tries to propose where graph query languages should develop to and what desirable features are. It is designed in a way that is closed, composable, and tractable, so in that sense you can argue that it is well designed. It is the outcome of two to two and a half years of work by a mixed group of people from academia and industry, and in that sense it is the first proposal that combines these two domains of experience into one design, so to say.
If you are interested in LDBC work, there will be a Technical User Community meeting in Austin when SIGMOD takes place there in June. These two links guide you to the G-CORE paper and to a parser implementation for the query language. With that I'm done. Thank you for listening, and I'm open to your questions.
So the question was why no Apache TinkerPop people were involved. The language task force was an open community, so people were free to join, and apparently no TinkerPop people were involved. Apparently they were not interested; I don't know. I think the question is really whether this query language is interoperable with Gremlin from Apache TinkerPop. I think this language, like Cypher and PGQL, is on a different abstraction level, or has a different target, than Gremlin. Gremlin really comes from a more imperative approach of specifying traversals
over graphs, while these languages come strictly from being declarative languages which build on graph patterns. So they come from different angles. For a more general answer, I would refer to a project Stefan mentioned: there is work that tries to reconcile these things and implement Cypher on Gremlin. If that is doable, then it would also be doable for G-CORE.
Any more questions? Thank you very much.