All right, let's get started. This is actually the first time I've given the first talk of the day when it hasn't been the first day of the conference, so those of you who managed to get out of bed and make it here, thank you. I'm going to be talking about analytics and graph traversal with Solr. So mostly graph-type stuff, but I'm also going to go over a little bit of background in case there are people who don't know about Solr.

First of all, for those who don't know me, I'm Yonik Seeley. I originally created Solr back when I was working at CNET, around 2004, and it was later contributed to the Apache Software Foundation. I'm also a co-founder of Lucidworks, and I currently work at Cloudera integrating Solr into their big data stack.

Just in case anyone doesn't know what Solr is: the way I describe it to people who don't know it at all is that it fits in the same part of your infrastructure as a database would. It's a search server. It's just based on a different underlying indexing technology, Apache Lucene, than other types of databases are, and it's really optimized for interactive results. Interactive meaning web scale: people doing a search or clicking a button on a web page, and Solr needs to return those responses in, ideally, less than a second, because you don't want to be sitting around waiting. So it has column-oriented fields, what we call doc values, for fast scans and fast analytics. We have highlighting, spatial search, fast faceting, which I'll go over a little bit, streaming expressions, which really need some explanation, so I'll go over what those mean as well, some graph search, and also SQL, which is layered over the streaming expressions.

Just a little refresher on faceted search. Faceted search is really about taking a set of documents that are the result of a user query, breaking those up and categorizing them, and then giving feedback, like counts, based on those categorizations. This is one of the oldest screenshots that I have, from the first tutorial I did on faceted search. It's digital cameras, which of course are dying these days because of smartphones. Here we have a manufacturer facet and a resolution facet, and you can see the results have been broken up. This lets the user know that we have five Canon digital cameras and two Sony digital cameras, and that gives the information to the user up front: they know that if they refine their search results to Canon, they're going to end up with five entries. Below that, off the bottom of the screen, is the list of top results, what you'd normally expect from a ranked search engine. So this allows the user to click on one of these categories, like manufacturer or resolution, and refine their search results, drill down into them, to find what they're looking for.

We have different ways of calculating things in Solr that have been developed over time. The first is basically faceted search version one, which uses flat query parameters. You say facet.field=color, facet.limit=5, and what that means is: categorize this set of results by color and give me the top five. Lately I've been working on the JSON Facet API, which I sort of refer to as version two, and that expresses things in JSON. One reason is that it allows much more nesting of facets under facets, so it made sense to use something like JSON, which is naturally nested. The JSON version does the same thing: a terms facet on the color field, give me the top five. Just a different syntax. Streaming expressions are a whole other ball of wax, and I'll get into them in a couple of slides, but there's also a little sample of doing the same thing that way. It's not quite an apples-to-apples comparison, because that one also includes the search: we're doing a search and then a rollup over the color field, collecting the count. And then of course there's parallel SQL, which I'm not going to cover more today, but that essentially translates into streaming expressions for execution. A sketch of these side by side follows.
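Roughly, the three syntaxes for the same "top five colors" facet look like this (the products collection name and the color field are hypothetical; the streaming version includes the search itself):

  # Faceting, version one: flat query parameters
  /select?q=*:*&facet=true&facet.field=color&facet.limit=5

  # JSON Facet API, version two: naturally nested JSON
  /select?q=*:*&json.facet={colors:{type:terms, field:color, limit:5}}

  # Streaming expression: a rollup wrapped around the search itself
  /stream?expr=rollup(search(products, q="*:*", fl="color", sort="color asc"),
                      over="color", count(*))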
All right, so getting into a little bit about graph. Graph databases are all about nodes and edges, or relationships between the nodes. In this example, Ann is a node and Stanford is a node, and Ann attended Stanford; attended is an edge, a relationship. Ann also attended RPI, she recommended Mike, she works at Cloudera, and she lives in New Jersey. What's missing here is properties, because with just the fact that Ann attended Stanford we're not going to get too far: we don't know when she attended Stanford, and we don't know what kind of degree she got. We really need some more information, so graph databases normally have things called properties that we can add on both nodes and edges. Most of the graph databases I know about, like Neo4j, allow properties on both nodes and edges. Here's an example of the properties we added: we zoomed in on the Stanford relationship and said that Ann attended Stanford, with the start and end dates, what degree she got, and what the subject was. We also added some properties on Ann and Stanford themselves.

From a search perspective, though, the nodes and the relationships start looking very similar; they both start looking like documents to me. You can put whatever properties you want on nodes or edges, and that really maps onto documents in the search world, because documents can have arbitrary properties. It's schema-free, so it feels like there's more in common between a graph database and a search engine than there are differences.

So this is how we map from the graph world to document space, the search-engine world. If we have a relationship without any properties, then we can model it implicitly, based on matches in field values. At the top is an example of that: node one points to node two. All we really have to do is have something on node one that contains the ID of node two, or something unique that somehow points to node two. So this is not an explicit edge that's indexed or anything like that; it's just implicit, and we can use that relationship, the fact that those values match, at query time. When we have relationships with properties, that's when it makes sense to treat the relationship itself as a document too. So both nodes and edges, everything's a document, and the actual edges between them, how the relationship is encoded, are again just implicit, based on matching field values. You can do that one of two ways, or a combination thereof: on the left-hand side we have the nodes pointing to the relationship document, and on the right-hand side we have the relationship pointing to both node documents.

So going back to Ann attending Stanford, this is how we'd model that in Solr, or in any search engine. We have a document that represents Stanford, a document that represents Ann, and then an attendance-type document. We added this type field; it's not required, it just helps disambiguate between different kinds of records, and it'll probably make querying easier later on, especially since we reuse some of the fields. We have a name field for Stanford and a name field for Ann; we could have used different field names if we'd wanted, like person_name versus edu_name, it's really a matter of preference. And in our attendance document we have a who that points to Ann and a where that points to Stanford. A sketch of those documents follows.
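As a minimal sketch, the three documents might look like this (the IDs and property values are made up; only the who/where pointers matter for the graph structure):

  { "id": "stanford", "type": "school",     "name": "Stanford" }
  { "id": "ann",      "type": "person",     "name": "Ann" }
  { "id": "attend1",  "type": "attendance", "who": "ann", "where": "stanford",
    "start": "1990-09", "end": "1994-06", "degree": "BS", "subject": "CS" }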
So now that we've learned how to index some of this graph stuff, let's look at some different ways to query it. First, I'm going to go over the graph filter, also called the graph query. It does a breadth-first traversal, and it's modeled and implemented as a normal Lucene query. What that means is that it can be used anywhere a Lucene query can be used within Solr, and that's actually a lot of places: the normal places, like a filter, but also things like a facet query. Another nice property of it being just a query is that it's automatically cached in the filter cache. So you could use graphs, say, for setting up a permissions hierarchy, like groups within groups within groups: you could use a graph query to map from a user to the full set of documents they're allowed to see. And because it's automatically cached in the filter cache, the graph traversal is only redone when the index changes; the results are automatically reused as long as the index doesn't change.

Oh, and I called it a graph filter because the output is just a set of documents. There's no ranking or scoring that says this document is better than that one; all you get out is a set. And how we follow the edges: the edges are defined by the from and to fields, which are the only two required parameters. We look at the from field of the documents in the current set, see what values it has, and then match those up against the to field, and that defines the edges, on the fly, at query time.

There are a couple of other optional arguments to the graph filter. One is maxDepth; by default it just keeps going until the set doesn't change anymore. There's a traversalFilter, which is really useful, and that says: as you're following these edges, for nodes to be included in the set, they must match this filter. The filter is just expressed as another arbitrary query; it could even be another graph query. And returnRoot and returnOnlyLeaf are booleans that control what ends up in the final set of documents, for example whether the root set is included or not. A sketch of the syntax is below.
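As a sketch of the local-params syntax, using the permissions idea from above (the group documents, with an id field, a parent field pointing to the containing group, and a members field, are all hypothetical):

  fq={!graph from=parent to=id maxDepth=10 returnRoot=true}members:alice

This starts with every group that directly contains alice, then repeatedly follows each group's parent value to the id of the enclosing group, giving the full set of groups she transitively belongs to.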
The big caveat to this graph filter, though, is that it is not distributed, and what I mean by that is that edges are not followed across shards. If you've broken your collection up into multiple shards, the graph query will only follow edges within each individual shard; it won't follow edges that cross shards. It's still compatible with distributed search, in the sense that, as long as you can live with that, however you're using the graph query, as a filter, in faceting, whatever, distributed search will still combine results and everything else will still work. And it's still a really useful query, because sometimes you can partition your data such that the traversals you're doing don't have any cross-shard edges. Or, if they do exist, you might just not care; it depends on the type of computation you're doing.

So here's a Futurama example, where we use a graph filter to find the full set of Fry's ancestors. We've modeled people here with an ID field that's essentially their name, and a parents field that just lists the names of their biological parents. The way you read our graph filter is: from parents to id, starting with Philip J. Fry. So we start with Philip J. Fry, we look at the from field, and the parents values are Yancy Fry Sr. and Mrs. Fry. We match those values up with the ID fields of all documents, and that basically visits his parents. It just continues from there until the set doesn't change anymore. Now, when we get to Fry's father, Yancy Fry Sr., we look at his parents field, and we see something odd pretty quickly: Philip J. Fry is his father. If you're a Futurama fan, you might remember that Fry is actually his own grandfather, due to some time-travel weirdness. So we hit a cycle, and if we were naively following this iteratively we'd get into a loop. But that's okay, because we have cycle detection, so it does not get into an infinite loop; it all just works. The traversal continues until the set doesn't change anymore, and we have the full set of Fry's ancestors, which we can then use with whatever we want. We can calculate the average IQ of Fry's ancestors, I guess, which I'm not sure how high that would be.
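To make that concrete, a sketch of the documents and the filter described above (the grandmother's name is a guess to round out the example):

  { "id": "Philip J. Fry", "parents": ["Yancy Fry Sr.", "Mrs. Fry"] }
  { "id": "Yancy Fry Sr.", "parents": ["Philip J. Fry", "Mildred Fry"] }

  fq={!graph from=parents to=id}id:"Philip J. Fry"

The root set is Fry himself; each pass matches the frontier's parents values against everyone's id field, with cycle detection keeping the Fry-is-his-own-grandfather loop from running forever.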
Okay, on to streaming expressions. Streaming expressions are relatively new to the Solr world; I think they really started coming on strong in Solr 6, even though they were introduced a little before that. It's really a new way for Solr to compute things, a generic platform for distributed computation. It forms the basis for our SQL support: we use Apache Calcite and translate SQL into streaming expressions for the actual execution. Streaming expressions are really about operations on streams, and they're optimized to work across entire sets of data. This is a little different from how Solr has worked in the past, which was optimized for finding the top N of things; streaming expressions normally look at all the data, or all the data matching something. We have a MapReduce-type shuffle, so if you remember the MapReduce word-count examples: instead of querying a bunch of shards and having a single internal aggregator, for increased parallelism we can do partitioning and have multiple internal aggregators doing whatever computations the streams are doing, and then an additional level after that does the final merge. And this can also incorporate data from non-Solr systems, so it's a little less document-oriented than a lot of the previous work in Solr. You can have a JDBC stream that points to an external database and then do joins, or whatever type of computation, with Solr results as well.

So here's an example of a streaming expression. It's one of the most basic, fundamental streaming expressions in that it searches Solr; we call it a stream source, one way to kick things off, a source of data. Instead of hitting the select or query URL, we hit the stream URL; it's the same base URL, the end part is just /stream. We pass expr= and then our search expression, which has a functional notation: the first parameter is the collection name, and then we have our normal search parameters, if you're familiar with Solr: the query, the field list we want back, and how to sort. And we get a tuple list as the response, which looks very much like a normal Solr search response, even though it works very differently under the covers.

Here's a more logical diagram of what the search expression is doing. It's fully SolrCloud-aware: when you say "search this collection", Solr knows what the cluster layout is, how many shards there are in the collection, how many replicas there are for each shard, and their physical locations. So it can just query one replica of each shard and stream the results back to a worker node, which in this case is just the endpoint you hit with /stream, and that produces the final result. We call this streaming because we try to compute things as we're receiving data. The worker node in the middle is receiving documents from all the replicas it queried at once, but it does not fully read all the data before it starts producing its output. Because we take care to sort the sub-tuple streams correctly, we can just do a merge sort: we only have to have one item on each incoming queue before we can select the next item for the output queue. And when you build up streams of streams of streams and wrap them, as long as you take care to have things sorted correctly, everything runs in this streaming manner, and that really helps with scalability. You can go from a million tuples to a billion tuples to a trillion tuples, and what's going to increase is the execution time; the amount of memory on that worker is not going to increase at all. So you get scalability in one dimension, at least.

So, graph streaming expressions. A lot of streaming expressions are join-type streaming expressions, but I'm really going to cover just the more graph-oriented ones today. The graph streaming expression does a breadth-first search, and since it's just another streaming expression, it is fully distributed: it follows edges across nodes, across shards, even across collections, and I think even across different SolrCloud clusters if you really want. And it can compute aggregations as it goes. Here's the most basic graph streaming expression; it's called gatherNodes. We hit the stream URL again, say gatherNodes; the first parameter, emails, is just the collection name, and then we say walk from the literal johndoe@apache.org to the from field. That's just our root set, and then we gather the to field. So really, what this outputs is everybody that John Doe has emailed. It looks something like the sketch below.
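A sketch of that expression, following the shape of the gatherNodes examples in the Solr docs (the emails collection and its from/to fields are hypothetical):

  /stream?expr=gatherNodes(emails,
                           walk="johndoe@apache.org->from",
                           gather="to")

The walk parameter reads as "match this value against the from field"; every to value on the matching documents comes back as a node tuple.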
Now for an actual example with a real, small data set. Sometimes I like small data sets, because when you try this stuff out on a very large data set, you often don't know if you're getting the correct results: you get a result and you're like, is it doing what I think it's doing? I can't tell. Having a small data set where you can do the computation yourself is really helpful. This one is about books and book reviews. We're going to index a bunch of books into one collection, in CSV format; it's random books that I've read over time. And we're also going to index book reviews into another collection. If you look at the schema we're using, each book review has a book field, book_s, that matches the ID of the actual book. That's the implicit pointer saying which book the review is for.

So now that we have book reviews, we come up with this brilliant idea, that's actually not so brilliant, of how to do a recommender. You should not actually build a recommender this way; this is really just to illustrate how to use some graph queries. The steps we want to take are: step one, find books that I like, and we'll just define that as me having given a rating of five. Step two, find other users who also liked those books, and we'll define "liked" as a rating of four or above. And step three, find other books that those users have rated well. Pretty simple. If you actually look at the data, Haruka and Maria rated the same books that I liked, so they should be the output of step two. And for step three, only Maria rated another book highly that I haven't read, and that's book10, Gridlinked. So that should be the output of our algorithm.

So let's actually do it. Step one is a search expression to find my high ratings. This isn't graph-related at all; it's just a normal search expression, but we need a root set to start with. We're searching reviews, and the query is user yonik and rating five. Very simple, and we get back the books that I've rated highly. Now, for step two, we wrap that search expression in a gatherNodes expression; the search expression is just copied and pasted from the previous slide. The next parameter says to walk from book to book. What that does is start with my reviews and expand to all reviews of those same books; it's like a self-join on the book field. Then we gather the user field, and that makes the node value in the response be the user field value. It's sort of virtual: we never actually indexed any user documents, so these are virtual nodes, user nodes that we've just invented by gathering the user field. But we're not going to gather all the user values, only those on reviews that pass the filter we specified, and that filter is: it has to have a high enough rating for step two, four or above, and it has to not be by me. We've also added an optional parameter, trackTraversal=true, which adds an ancestors field to the response that says where we came from to get to the current tuple. Put together, it looks something like the sketch below.
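A sketch of step two wrapped around step one (the user_s and rating_i field names and the exact filter syntax are my guesses at the schema; walk, gather, fq, and trackTraversal are the real gatherNodes parameters):

  /stream?expr=gatherNodes(reviews,
                 search(reviews, q="user_s:yonik AND rating_i:5",
                        fl="book_s", sort="book_s asc"),
                 walk="book_s->book_s",
                 gather="user_s",
                 fq="rating_i:[4 TO *] -user_s:yonik",
                 trackTraversal="true")

Step three has the same shape: wrap this whole expression in yet another gatherNodes and keep going.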
Now, on to step three. We take that whole streaming expression from the previous slide and wrap it in another gatherNodes, on reviews yet again, and this time we walk from node to user. If you recall, on the previous slide the node field held our target user names, so this matches the nodes up against the user field, which selects all reviews by those users. Then we gather the book field, because that's what we actually want out of this whole thing: what book should I read? But we only gather it if the rating is four or above. We also ask for the average rating of all the incoming edges, and trackTraversal=true again. At the end of all this, we actually get out what we wanted, book10, and it says where it came from: the ancestor was Maria. It worked. Now, if we want the complete traversal, we can add the optional scatter parameter, with branches and leaves, and that gives us the complete graph traversal. Then we can see that we started off at level zero with books, level one is our target users, and finally level two is back to the book that was recommended.

Here's how to do the same thing with the graph filter, if we didn't want to use streaming expressions. Remember, to use this, all of the data would have to be in the same shard, because we're not going to follow edges across shards. This is a single query that uses parameter dereferencing to break it up and make it a bit more readable. Our main query is a graph filter, and its input, its root set, what it's operating on, is v=$G1, a reference to something else. G1 is yet another graph filter, whose input is $Q1, and in Q1 we finally get to our real root set: user yonik and rating five, the reviews of books I liked. Then, back in G1, we walk from book to book, which selects all the reviews of those same books; we specify a traversal filter, though, so the rating has to be high enough. The output of G1 is all qualifying reviews of the same books, and then we go up to the original graph query, where we walk from user to user, and that finds all reviews by those same users from the output of the previous graph query, again with the same traversal filter requiring a high enough rating. If you execute this whole thing, you get the same output. The reason you might want the graph filter is that it does all its computation internally, so it's going to be faster. A sketch of what that looks like follows.
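Roughly, the dereferenced version might look like this (field names are the same guesses as before; passing the traversal filter as a dereferenced $TF is how I'd write it, the actual slide may differ):

  q={!graph from=user_s to=user_s traversalFilter=$TF v=$G1}
  G1={!graph from=book_s to=book_s traversalFilter=$TF v=$Q1}
  Q1=user_s:yonik AND rating_i:5
  TF=rating_i:[4 TO *]

Reading it inside out: Q1 is the root set of my five-star reviews, G1 expands to all well-rated reviews of the same books, and the outer graph query expands to all well-rated reviews by those same users.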
If you put a streaming expression into the Solr admin UI, you can execute it from there and see the output. It will also give you a little logical diagram; that's not a diagram of the results, it's a diagram of the query structure itself.

We have a few more graph expressions. The first is shortestPath, where you just give two nodes and it does a breadth-first search to find the shortest path between them. scoreNodes is an interesting one, and it might form the basis for an actual, better recommendation system. Say you put an item in a shopping basket and you want to know what other items to suggest to the user. scoreNodes has a TF-IDF-inspired scoring system. If you remember TF-IDF: TF is simply how many times a search term appears in the document, but IDF adds a sense of importance, of which term matters more when you have multiple terms. If you're searching for something like "blue whale" in a large corpus, IDF, inverse document frequency, gives more weight to the rarer term: if you see "blue whale", probably "whale" is the more important term, because "blue" is just going to be more prevalent in a big corpus overall. scoreNodes works by wrapping a gatherNodes, which calculates co-occurrence counts, how many times every item appears with this other item, but it also adds in that element of rarity. So if one thing appears in a lot of shopping baskets and another appears in only a few, it will weight the rarer one more, all else being equal.

Another thing we can do, once we have a whole streaming expression built up, is, instead of passing it to /stream for execution, pass it to /graph. That does the whole execution, but instead of outputting the tuple list, it outputs GraphML, which is an XML standard for representing graphs. There are a number of tools that can read that; you can import it into Gephi for visualization or more analysis.

All right, now let's look at some of the same types of computations we can do with the JSON Facet API over the same data. The JSON Facet API's graph-type and join-type operators are currently also not going to cross any shards. So for this, what we want to do is index the books and the reviews into the same collection. But that doesn't mean we have to throw away scalability entirely: we can still have multiple shards, we just need to make sure that a book and all of its reviews are indexed onto the same shard. If we're only joining and traversing back and forth between a book and its reviews, that ensures there are no cross-shard edges. The composite ID router is the easiest way to do this, and it's actually the default in SolrCloud, so if you just create a new collection, you get the composite ID router by default. The way it co-locates documents is that you use a common prefix in the ID field. If the ID is just "book1", for example, we compute a hash and it falls somewhere on the hash ring. But if the ID has a bang in it, then by default it takes the first part of the ID and uses it for the top 16 bits of the hash, and takes the second part and uses it for the lower 16 bits. What that does is, if the book ends up at some point on the hash ring, then all of its reviews will end up in the same 16-bit block surrounding it, the same 1/65536th of the hash ring. And what that means is that if you have fewer than 65536 shards, which pretty much everybody does, I think, you're guaranteed that all those documents will be co-located. A sketch is below.
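As a sketch of composite IDs for our data (the exact IDs and field values are made up; only the shared "book1!" prefix matters):

  { "id": "book1",         "type_s": "book",   "title_t": "Gridlinked" }
  { "id": "book1!review1", "type_s": "review", "book_s": "book1", "user_s": "maria",  "rating_i": 5 }
  { "id": "book1!review2", "type_s": "review", "book_s": "book1", "user_s": "haruka", "rating_i": 4 }

The book's plain ID hashes over the full 32 bits, while the bang IDs take their top 16 bits from hashing "book1", so all three land in the same 1/65536th slice of the ring, and therefore on the same shard.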
This is just a little refresher on the JSON Facet API and some terminology, how we think about it. We start off with a root domain; the root domain is normally the set of documents that match our base query and filters. A facet command takes a domain as input and produces more domains as output. A field facet creates new domains, which we also call facet buckets, based on unique values in a field, but we can also create them based on ranges or queries. We also have the concept of sub-facets. Say we've taken our root domain and broken it up by facet A and facet B, and facet C is a sub-facet of facet B: what that means is that every domain produced by facet command B gets run through facet command C to produce a new set of domains that normally splits things up further. If we're doing cars, we could first categorize by make or model, and then maybe by color, however you want to split things up.

Here's an example of the JSON Facet API on our books data. Our root query is cat:*. I'm using cat as the category field; it's essentially the genre field, so its values would be things like sci-fi, fantasy, romance, mystery. But I've really only read sci-fi and fantasy books, which is why there are only two buckets in all of these examples. Number one, we want to know the number of unique authors, so right in our root facet bucket we ask for hll of the author field. HLL stands for HyperLogLog, and if you haven't heard of it before, it's really just a distributed cardinality algorithm; once you go across shards, it's an estimate. Then we ask for a genres facet: the type is a terms facet and the field is cat, so that makes facet buckets, subdomains, based on the unique values in the cat field. In the response we have a count of 13, which simply means 13 books came into the root domain; we have our num_authors, calculated to be five; and then we have our genres facet with a bucket list, where we can see that we have seven fantasy books and six sci-fi books. A sketch of the request is below.
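Roughly, the request looks like this (using the JSON request API; the author field name is my guess at the schema):

  curl http://localhost:8983/solr/books/query -d '
  {
    "query": "cat:*",
    "facet": {
      "num_authors": "hll(author)",
      "genres": { "type": "terms", "field": "cat" }
    }
  }'

The num_authors stat lands directly in the root bucket alongside the count of 13, and genres comes back as a bucket list with the fantasy and sci-fi counts.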
Jumping ahead a little, I'm going to go over domain changes and join; this is something that was very recently committed. Say we start off with our genres facet again, a terms facet on the cat field, and now we stick in a new part, a sub-facet, a reviews facet of type query. A query facet normally takes the input domain, applies a filter, and that's the output domain: single domain in, single domain out, essentially just applying a filter. In this case we're not even specifying a filter, so the output domain would be the same as the input domain, but we're doing something different, something you can now do with any facet: we're specifying a domain change, which happens before we facet that domain. The domain change says join from id to book. If you're familiar with it, this is more like Solr's pseudo-join, a single-step graph traversal. We're going from the ID field, because remember, we're starting with books, so we have the ID field of the book, and we're moving to documents that have that value in their book field, which means the reviews, if you remember our schema. Essentially, the top-level facet produces a bucket for fantasy books, and the domain change switches that to all reviews of those fantasy books. And so we get additional data in the response under reviews: for our seven fantasy books we have seven reviews, and for our six sci-fi books we have five reviews.

Now we can ask for additional data once we've done that domain switch. We add a sub-facet to the reviews facet and say: give me the average rating for this bucket, and the average rating gets stuck right in there too. That's one of the nice things about the JSON Facet API: the structure is more canonical than the old faceting, in that if you ask for additional information, the basic structure of the response doesn't change, you just get some additional things added.

Say we wanted to find who gives the highest ratings per genre. To answer that, we change the reviews facet from a query facet to a terms facet on the user field, and we specify that we want to sort by the average rating, descending, and take the top three. We still have the domain change in there, and it happens first: for the fantasy bucket, say, we switch to the set of fantasy reviews and then break those up by user. On the right-hand side, the reviews facet now has a bucket list with the average ratings we asked for, sorted, with the top three taken. And just another random thing I thought of that you might want: if you wanted average rating trends over time, then instead of breaking things down by user, you change the reviews facet to a range facet on a review date field, give it the start, the end, and the gap, and it creates buckets based on time ranges with the average rating for each. So you can see whether fantasy or sci-fi or whatever genre is trending over time. This data is actually the one part I made up, because I didn't originally index any review dates; the response is a real response, but the dates were all zeros, so I typed in some random answers. Maybe for a future version of the talk I'll go back and fill in real review dates. A sketch of the join facet follows.
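A sketch of the genres facet with the join domain change and the per-bucket average (the rating_i and user_s field names are guesses; the domain/join syntax is the JSON Facet API form described above):

  "facet": {
    "genres": {
      "type": "terms",
      "field": "cat",
      "facet": {
        "reviews": {
          "type": "query",
          "domain": { "join": { "from": "id", "to": "book_s" } },
          "facet": { "avg_rating": "avg(rating_i)" }
        }
      }
    }
  }

For the who-rates-highest variant, swap the reviews facet to type terms on user_s with limit 3 and sort "avg_rating desc"; for the trend variant, use type range on the review date field with start, end, and gap.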
One final slide wrapping it up: streaming expressions versus JSON facets, since these are two different ways you can calculate things. The JSON Facet API is more focused on web-scale, interactive responses. Remember, with streaming expressions you can automatically parallelize things and use the whole cluster's resources to compute your answer faster; but if you have thousands of users hitting your cluster at once, that's going to be a very expensive solution. If it's not just a couple of analysts banging away at keyboards, you really want to use your resources efficiently. The JSON Facet API also has tighter integration with the search components in Solr, since it's just another search component; so if you also want highlighting back at the same time, or anything like that, right now it's the better solution. And like I said, it's more oriented towards very quickly finding the top N of things: top facet buckets, top documents. It also currently has a couple of capabilities that streaming expressions don't have yet, like block join and nested document support.

Streaming expressions, though, I think hold a lot of promise. They're more general purpose, larger in scope, because you can wrap streams within streams to do pretty much anything. That doesn't say how long it's going to take, but it's very powerful; in the past we really only focused on things we could do very quickly, and streaming expressions are much more general than that. One way to think about it is OLAP versus OLTP: they obviously overlap a lot, but in general that's how they differ. Also, as I mentioned before, streaming expressions aren't really tied to documents: you can get data from an external data source and treat it like any other stream, do computations and joins on it. We also have things like update streams, where you can take your results and stream them into another Solr collection, and use that as a data source in the future. There are machine learning streams, and we just started adding a whole bunch of math operations, so you can do things like convolution and cross-correlation. You can also calculate exact cardinality: HLL is an estimate once you cross shards, but if you're streaming all the data past a single node, you can do exact cardinality counts if you care. We have distributed joins, distributed graph, and increasingly it's working together with the JSON Facet API. And one reason you'd want to use streaming expressions over, say, some Spark-based approach is that we can push more computation down into the nodes and really use the power of the Lucene indexes we have; essentially smarter endpoints. Do as many calculations as possible first, in memory with the fast indexes, and delay as long as possible shipping results out for more generic computation in the streaming expressions. A sketch of combining an external source with an update stream follows.
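For instance, a sketch of pulling rows from an external database and indexing them into a Solr collection (the connection string, SQL, and collection name are all made up; jdbc and update are the real stream source and decorator names):

  /stream?expr=update(archive, batchSize=100,
                 jdbc(connection="jdbc:postgresql://dbhost/books?user=u&password=p",
                      sql="SELECT id, title FROM books ORDER BY id",
                      sort="id asc",
                      driver="org.postgresql.Driver"))

The jdbc source streams tuples out of the external database, and the update decorator batches them into the archive collection, which can then be searched or streamed like anything else.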
All right, so that's the end of my presentation. I think we have a couple of minutes for questions.

Q: Is there anything stopping you from doing a streaming, let me make sure I get it right, shortest-path query that also mixes Solr data and external data?

That's a good question, and I had it myself; it's something I haven't investigated yet. We're visiting nodes, and the gather step, I think, is currently based on search within Solr itself. So I'm not sure it will work for many parts of it today, but there's no reason it can't; it should. So I think the places where we don't currently support it, we will in the future. Any other questions? Yes.

Q: Streaming expressions can access faceting; in the future, will faceting be able to access streaming?

Maybe. We can't do it currently; the JSON Facet API can't, at some level, suddenly kick off a streaming expression. The JSON API is really, like I said, oriented towards being really fast, and it's currently two-phase: phase one is gather the top buckets, phase two is do refinements so you get accurate counts, and that second phase is even optional. So going out and kicking off a full streaming expression in any of those places seems really heavyweight. But I would say yes, in the future; I don't see any reason why we won't be able to, it just hasn't been done yet. I'm not sure exactly what the interface will look like; maybe we'll wrap streaming expressions into a type of query. No, actually, that wouldn't work. The problem is that JSON facets today are really oriented towards documents: every domain and subdomain you get out is represented as a bit set, and those map to documents in the index. To really utilize streaming within that, we'd have to expand the internals of JSON faceting to work on values, because at some point, once you go to a streaming expression that's pulling in external data, you're just getting tuples; they don't have any correlation to what's in the index anymore. That's more like generic faceting. So maybe it doesn't make so much sense; JSON faceting would have to expand a lot to actually make that work.

Great, any other questions? All right, cool, thanks.