Great, thanks everyone for coming. Today we're going to talk about decoupled OpenSearch. So really quickly, who am I? I'm Adam, one of the lead developers at PreviousNext. I've been doing Drupal since about 2010, mostly on Australian government and higher education projects. So today we're going to cover the architecture of the project I'm talking about and why we chose it, some of the Search API OpenSearch, sort of Drupal-y, setup for it, and then a deep dive into the React technology, or front-end technology, that we used. And those are my two sons.

Cool, so first up, what actually is OpenSearch? Well, it's basically Elasticsearch, and I'll probably at least one time accidentally call it Elasticsearch. It was forked from Elasticsearch in 2021 and it's developed and managed by AWS. The good news is all of the documentation, queries, Stack Overflow questions, all of that that you'll find for Elasticsearch pretty much applies to OpenSearch. It was forked from Elasticsearch 7, so you're all good there. And it's an AWS managed service, so you're able to run and scale a highly available OpenSearch cluster without really having to worry too much about how you manage it.

So why did we go with this stack? Well, we got a set of designs and they were completely full to the brim with search interfaces. All of them had different levels of content, filtering, sorting, rendering, multiple search apps on the same page, all that sort of thing. We even had stuff like landing pages where the filters weren't actually attached to the results. So a lot of different moving parts going on, and these are some examples of the designs. We have landing pages, as I said, with the filters detached, we had global search results, and all sorts of different things going on. We also needed to be able to have multiple apps on the same page and be able to apply filters to just one of them, or maybe all of them. So how do we go about doing that?
Well, we went with React apps that were embedded in Drupal pages. We like to call that partially decoupled. We wanted to keep using Drupal's mighty CMS with all the fun tools that we know and love, the site building, the workflows, the out-of-the-box editor experience, and just keep building what we know and love. But we also wanted to be able to iterate fast on these search apps and build a design system that we could then use directly in our apps. We wanted really, really fast search, and we wanted to avoid round trips to Drupal for querying, filtering and sorting. It also allowed us to avoid using Views entirely. If anyone's ever used Views for search interfaces, you'll know they're a bit of a nightmare, and theming can be really, really rough.

So we went with the Search API OpenSearch module, which Kim Pepper wrote, so thanks very much, Kim. This implements the back-end integration for OpenSearch: mapping your entity fields to the JSON documents that then get indexed in OpenSearch, and managing all your index settings and field mappings and all that sort of thing. Essentially it allows you to turn a piece of content into a JSON document, which then gets sent to OpenSearch to be indexed.

Anyone that's used Search API before would be pretty familiar with this kind of screen. This is where you're configuring the fields for your index, and you're adding all your content entity fields down there. You're choosing which field type to use for each field and how they're boosted, and those field types then determine how that content is indexed. You have things like full text, n-gram and edge n-gram for full text searching, which we'll cover a little bit later, or string fields for term filtering or alphabetical sorting.
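As a quick preview of those n-gram types, the idea can be sketched in a few lines of JavaScript. This is an illustration only: OpenSearch's analyzers do all of this for you at index time, and the functions here are just to show the behaviour.

```javascript
// Illustration only: how an n-gram tokenizer breaks words into
// fixed-length character sequences. OpenSearch does this at index
// time; these toy functions just demonstrate the idea.
function ngrams(text, n) {
  const grams = [];
  for (const word of text.toLowerCase().split(/\s+/)) {
    for (let i = 0; i + n <= word.length; i++) {
      grams.push(word.slice(i, i + n));
    }
  }
  return grams;
}

// Edge n-grams anchor every gram to the start of the word instead,
// which is what makes prefix ("search as you type") matching work.
function edgeNgrams(text, min, max) {
  const grams = [];
  for (const word of text.toLowerCase().split(/\s+/)) {
    for (let n = min; n <= Math.min(max, word.length); n++) {
      grams.push(word.slice(0, n));
    }
  }
  return grams;
}

console.log(ngrams('Drupal South', 2));
// → ['dr', 'ru', 'up', 'pa', 'al', 'so', 'ou', 'ut', 'th']
console.log(edgeNgrams('drupal', 2, 4));
// → ['dr', 'dru', 'drup']
```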
So I quickly want to cover some indexing tips and tricks, and how to get really relevant results, because often when you're implementing these kinds of search interfaces, relevancy is one of the key criteria to get right. We want to provide relevant results with minimal developer overhead, and we also want to be able to provide contextual filters to users so they can filter those results and drill down to what they're actually looking for.

So we went with an aggregated field. With Search API you can configure a single field that aggregates multiple fields into one. In this case, on the right there, you can see we're concatenating the values of the title, the body and the summary into one field, and that's what we're actually going to use for full text search. We use the edge n-gram data type, and we'll cover a little bit more on what that actually is in a second, but basically it helps us tokenize values and makes content more searchable. And then at query time we boost other fields in the index for better results. Again, we'll cover more on that in a little bit.

So what actually is an n-gram? The n-gram tokenizer will break up your content into words, and then it creates n-grams out of those words at a given length. So if we take "Drupal South" with a length of two, it's going to create all those little n-grams for us to search over. You can kind of think of n-grams as a sliding window that moves across the words and creates little sequences of characters.

On to term names for filtering. It may seem a bit counterintuitive; usually you'd use an ID to filter your content, which makes it more robust. But using term names means we can use OpenSearch aggregation queries. Basically, that allows you to get a list of possible filter values directly from a single query, and then you can use that to render out your filters. Again, we're going to show more on that in a second.
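A rough sketch of what that looks like on the wire. The field name and bucket values here are made up for illustration; the request and response shapes follow the OpenSearch terms-aggregation format.

```javascript
// An aggregation request you can POST to the index's _search
// endpoint. "size: 0" skips the document hits when all you want
// is the buckets. Field name is illustrative.
const aggRequest = {
  size: 0,
  aggs: {
    organization_size: {
      terms: { field: 'organization_size' },
    },
  },
};

// OpenSearch responds with one bucket per distinct term name plus
// a document count, which maps straight onto a rendered filter.
// Values here are made up.
const exampleResponse = {
  aggregations: {
    organization_size: {
      buckets: [
        { key: 'Small', doc_count: 12 },
        { key: 'Medium', doc_count: 7 },
      ],
    },
  },
};

// Turning buckets into filter options for the UI:
const filterOptions = exampleResponse.aggregations.organization_size.buckets
  .map((bucket) => ({ label: bucket.key, count: bucket.doc_count }));
```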
The caveat there, you might already be thinking, is: if I change a term name, that term name then has to be updated on everything that's tagged with that term. Thankfully, Search API has a little setting that gives you that for free, called track changes in referenced entities, and it works flawlessly; we haven't really had any issues there whatsoever.

So this is an example of an aggregation query in OpenSearch. We're saying: give me an aggregation called organization size, and aggregate on the field organization size. That's going to respond with an organization size aggregation. You can have multiple aggregations in one query, and it's going to give you back a list of what OpenSearch calls buckets, with the term name and also a count of how many documents are tagged with that term. So really handy for building filters. And you can use these aggregation queries alongside filters as well, so you can build up contextual filters using this pattern.

And now, really praying to the demo gods that this works, this is a little demo of our filters in action, or the whole app in action, really. It's going to work. Watch this. It's a live demo. One really good thing to note here is that as I go back to filters that I've already applied, the response is almost instant. And that's using a library called TanStack Query, which we're going to cover more in a second. Basically, as you return to filters that you've already applied, the response is almost instant because you're just getting cached results based on the query.

So how do we get relevant results? We said, okay, we're aggregating all of our content into a single field. But what if we want to boost content higher if it's got the search term in the title field, or some other field? We can index those fields that we want to boost separately.
We can index them as full text, and then in Search API's field interface that we saw before, we can select a boost value for that field. Then we can use an OpenSearch query called a should, and we can combine should and must queries in our search query, and that will basically give us more relevant results.

So, having a look at a full query. If you didn't know, when you query OpenSearch you just POST a bunch of JSON at it and it gives you JSON back. We'll go through this in a little more detail. Up the top, we've got our must query: everything that comes back from the search query has to have "covid" in the aggregated field, which in this case we've called text; that's what that little key there is. The next chunk of this is filtering on a term name. In this case it's the resource type field, and we're filtering on "Policy and legislation". And lastly, we have the should chunk, which is what I was mentioning before. This is going to boost any documents with the word "covid" in the title full text field. What that does is increase the relevancy of results that match, but it's not going to filter out anything that doesn't match. So really, really handy for getting more relevant results.

Cool. So that kind of finishes up... no, there's one more. You want to use a query builder, because if you were thinking "what the hell is that big giant JSON blob?", it might be pretty hard to handcraft that yourself. So we use a query builder called elastic-builder. Again, this works fine with OpenSearch. It basically lets you build up an object that represents your query, and then directly cast that to a JSON object that you can POST straight to OpenSearch. All the links for that are on the slides, by the way, at the end, and these slides will be uploaded so you can get them from there.
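Putting those three parts together, the query body looks roughly like this. The field names (text, resource_type, title) follow the talk; the boost value is illustrative.

```javascript
// A sketch of the full query described above: must-match "covid" in
// the aggregated "text" field, filter on a term name, and use
// "should" to boost title matches without excluding anything.
const query = {
  query: {
    bool: {
      // Everything returned has to match this.
      must: [
        { match: { text: 'covid' } },
      ],
      // Term filter on the term name, not an ID.
      filter: [
        { term: { resource_type: 'Policy and legislation' } },
      ],
      // Matching here raises relevancy but doesn't exclude misses.
      should: [
        { match: { title: { query: 'covid', boost: 2 } } },
      ],
    },
  },
};

// This object is what gets POSTed, as JSON, to the index's _search
// endpoint.
```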
So looking at what a query builder function might look like, this is basically what we do. It might be a little hard to read, but in essence we build up a list of filters, we add on our full text query, we add our should fields, and then we use the requestBodySearch function from that library, pass all that stuff to it, and just call toJSON, and that gives us the query object that we POST directly to OpenSearch. That ends the little tips and tricks part.

So how do we actually get these apps into Drupal? Well, we just use Layout Builder. This is an example of a landing page on our site. We've got two blocks embedded in the landing page. The header block at the top actually has a search app embedded into it, which is showing all the filters using those aggregation queries. Then we've got another search app down the bottom, which is showing our results. In fact, that's also using an aggregation query to show those little button filters, or pill filters, as we like to call them.

So these are just two block plugins in our project, and they are very, very simple. We just have a build function and a little bit of config. Essentially, all we do is give a unique ID to our application. We use drupalSettings, and we marry up the ID with the ID that we pass into drupalSettings. We pass in some properties that are going to be injected into our React app: the heading, description and index URL. And then we attach a Drupal library, which is just a very, very thin wrapper around Drupal behaviors, drupalSettings, and React itself. This is an example of one of our library JavaScript files: literally just three lines, or one line, really. We use this React component renderer, which is a little custom library we've built. I won't show you the code in that, but it's pretty simple.
All it does is take that ID, marry it up with the drupalSettings that we saw before, which were injected by the block plugin, and then pass those settings as properties to a React app. In this case, our React app is called the all resources app.

So on that note, we're going to dive into the React tech. This is getting into the more complex part of the talk. You definitely don't have to understand everything that's going on here; we're going to dive into each part individually. This is just showing an example of that full React app that we were passing through into our library and marrying up with those drupalSettings that were embedded through the block plugin, which was embedded in Layout Builder. So yeah, we're going to cover all these technologies. Just a side note: I'm not actually a front-end developer, but I have learned a hell of a lot on this project. And because of how we've set up all these applications, I'm able to work on these apps without being a total expert. But a big shout out to Jack Taranto for setting up this whole architecture.

So first up, TanStack Query. This used to be called React Query, but now it works with Vue and other frameworks. This is how we get those really nice cached results when we're filtering. It's just a really powerful asynchronous state management tool for React, Vue and other frameworks. Out of the box, it provides really good caching via query keys. It'll do things like dedupe requests, so if you have multiple apps on the same page that hit the same API, it's not going to do two requests, it's just going to do one. It has auto retry for errors, and you can do really nifty things like updating stale results in the background. So if you do a POST and you've got an updated list of results, you don't have to then query those results again to update your app on the front end; you can just do that in one callback and it's all handled for you. It's also got really good error handling.
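A rough sketch of the pattern, not the project's actual code: the query key bundles a per-app key with the index URL and request body, so every unique combination of filters and search text gets its own cache entry, and the fetch callback reads those back off the key and POSTs to OpenSearch. All names here are illustrative.

```javascript
// Illustrative only. The query key is just an array; TanStack Query
// hashes it, so any change to the request body produces a new cache
// entry.
function buildQueryKey(appKey, indexUrl, requestBody) {
  return [appKey, indexUrl, requestBody];
}

// The fetch callback reads its arguments back off the query key and
// POSTs the query body to OpenSearch, returning the parsed JSON.
async function elasticFetch({ queryKey }) {
  const [, indexUrl, requestBody] = queryKey;
  const response = await fetch(indexUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(requestBody),
  });
  return response.json();
}

// Wired into a component it would look roughly like this (TanStack
// Query's object signature):
//
//   const { data, isLoading } = useQuery({
//     queryKey: buildQueryKey('allResources', indexUrl, requestBody),
//     queryFn: elasticFetch,
//   });
```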
So this is an example of using TanStack Query and actually hitting OpenSearch. Up the top, we're getting our index URL and our request body from context providers, which we're going to cover in a little bit. And then we're using the useQuery hook, which comes from the TanStack Query library. That takes two arguments. The first is a query key, and you might be thinking, hang on, there's an element called queryKey in that array. That's actually something we pass in as a property, and then we add the index URL and request body. That gives us a unique query for each combination of filters and full text search, which means our results are really nicely cached. The second argument to the useQuery hook is our callback, which is going to actually return the data, and we'll have a look at what that looks like now. This is our elasticFetch callback that we're using in the useQuery hook. We literally just get the index URL and the request body from the query key directly, POST that straight to OpenSearch, and return the results.

All right, so context providers. If you saw that big application I showed you, almost everything in there was a context provider. I mentioned before that the index URL and the request body were coming from context providers. Essentially, they allow you to access data, any type of data, from any component in your tree that's underneath the given context provider, without having to pass properties all the way through. You can use them for really simple data, like a string, which is our index URL, or you can use them for much more complex stuff, like a whole set of search results. And Eric is doing a talk on understanding React hooks later today, at 20 past one I think, and he'll cover this kind of thing in more detail. But we can have a look at one of our context providers here, which is the index URL provider. This wraps a huge number of our components. Basically, we just give it an index URL.
When we call useIndexUrl, we just get the index URL back; we don't have to pass it all around our application. And the previous example I showed you of the query that was hitting OpenSearch (I knew I'd do it at least once) was actually in our search results context provider, which I won't show you because it's too big for the screen. In fact, everything in red here is a context provider. That middle all resources layout component is the only thing that's actually rendering anything in our application. Everything else is providing data that is then used inside that all resources layout component. So context providers: very, very, very useful. And using a context provider is literally just this: we call useSearchResults and we get our search results back, in any of our components underneath that context provider.

Cool. So next up we're getting to the higher limits of my knowledge on this topic, but I'm going to try and do it some justice. Redux is our global store for our data. In essence, Redux has one store to rule them all, hence the Death Star. But our initial prototypes on this project actually went with one store for each app. That was going to allow us to have independent search apps that wouldn't step on each other's toes. Turns out, yeah, don't do that, it's a bad idea. So instead we use something called slices, which comes from Redux Toolkit. Slices basically allow you to split that store up, and you can think of it as slices of a pie: you split the state up into little features or applications, and that gives you actions and reducers that fire just for those apps. So it allows us to manage state independently for each app, like we needed to. And again, Jack, thank you so much for setting all this up. This is how you can think of a slice: you've got your global store, one store to rule them all, the big Death Star.
It's cut up into separate slices for each application, and those slices can be used in conjunction with a context provider to pass those actions, reducers and state down into the lower-level components, where they can all work nicely compartmentalized. So this might be an example of one of your apps: you have this slice provider, which we'll cover in more depth soon, and you reuse the same components in there, the results per page, a pager, filters, whatever. And that's going to work on just the slice that's passed into the slice provider; it's not going to affect anything else.

So creating a slice looks something like this. You give it a name and a bunch of initial state. In our search apps this might be things like the current page, how many results per page, and the filters that are applied (you could have filters applied by default), plus reducers, which are essentially the things you apply to that state. And of course those reducers can be reused across multiple slices, but they're only applied to one slice at a time. Setting up the store, we just combine all the reducers from our slices and then create our store from that.

And how this is all wired up, the real magic behind all of this, is the slice provider component. This is a little custom context provider. It basically wraps all of our app's components; you pass it one of those slices, and then it gives you back actions and state for the app to use. That link there is to a much more detailed blog post by Jack, which I highly, highly recommend you give a read. It's really the bread and butter of what makes this whole project hum, and yeah, it's pretty amazing.

So next up, how do we test all this stuff? Testing is super important. It's pretty hard to test all this stuff end to end, but we can do some pretty amazing unit testing combining these libraries.
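The slice idea above can be sketched without any dependencies. The project uses Redux Toolkit's createSlice and configureStore, which do considerably more; this is just a toy version to show how one global store splits into per-app state with reducers scoped to each slice. All names are illustrative.

```javascript
// Toy sketch of a slice: a name, some initial state, and reducers
// keyed by action type. (Redux Toolkit's createSlice also generates
// action creators and supports mutation-style reducers via Immer.)
function createSlice({ name, initialState, reducers }) {
  return {
    name,
    initialState,
    reducer(state = initialState, action) {
      const handler = reducers[action.type];
      return handler ? handler(state, action) : state;
    },
  };
}

const searchSlice = createSlice({
  name: 'allResources',
  initialState: { page: 0, perPage: 10, filters: {} },
  reducers: {
    setPage: (state, action) => ({ ...state, page: action.payload }),
  },
});

// One store to rule them all: each slice owns one key of the state
// tree, and dispatching to a slice only touches that key.
function createStore(slices) {
  let state = {};
  for (const slice of slices) state[slice.name] = slice.initialState;
  return {
    getState: () => state,
    dispatch(sliceName, action) {
      const slice = slices.find((s) => s.name === sliceName);
      state = { ...state, [sliceName]: slice.reducer(state[sliceName], action) };
    },
  };
}

const store = createStore([searchSlice]);
store.dispatch('allResources', { type: 'setPage', payload: 2 });
// store.getState().allResources is now { page: 2, perPage: 10, filters: {} }
```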
So we use Jest, which is a unit testing framework; we use React Testing Library with Jest for assertions, rendering and so forth; and we use something called MSW, or Mock Service Worker, which allows us to set up mock responses for OpenSearch.

This is an example of a test for our search app. We render the app at the top, and notice we pass it a mock index URL; that's going to help us in a minute with Mock Service Worker. We wait for our filters and our results to render, and then we just match it against a snapshot, and that snapshot is stored in the code base. So things like CI, or other developers, will be running these tests against the same snapshot, so it's very easy to know if anything broke. And they're very fast. Cool.

And then the Mock Service Worker. This is where we basically set up MSW. In this example we're intercepting POST requests to /mock/all-resources (it's a regex, so it's a little bit hard to read), and that's why we set up that mock index URL previously. Then we're saying: if there's an aggs key in the body (if you remember, way back at the start of the talk, we talked about OpenSearch aggregations; that's the key that comes through in a query to build those filters), we return our mock aggs, which is just a JSON file. Otherwise we just return our results. And that's going to allow that test to render the app with mock results without hitting a real OpenSearch instance.

Cool. So tying it all together under the hood. We have content indexed by Drupal with Search API, and we have an index that's queried by React on the front end. So how does that all tie together, and how do we keep it really fast and, probably most importantly, secure? We run OpenSearch on Skpr. Skpr provisions credentials for Drupal so that it can write to the index, and then our React apps go via a proxy and execute searches against OpenSearch.
But the front end can only read via that proxy; it has no write access at all. We have the proxy for a number of reasons. It makes things really, really fast, because the instance almost looks like it's local: we have something like an /api proxy, that's what React is hitting, and that gets proxied on to OpenSearch. It's secure: the front end can't write anything to the index, only Drupal can, and we don't have to expose any OpenSearch endpoints or anything on Skpr. And if you want to try that stack out yourself, we've got an OpenSearch image that you can spin up in Docker or anything else, and then all you need is a little bit of nginx config to proxy your API URL onto your OpenSearch instance.

Cool. So recapping all of that: we use Search API OpenSearch, as I said; we inject config into our React apps via block plugins that are embedded via Layout Builder; we query via React apps using a read-only proxy that hits OpenSearch; and we smooth all that developer experience out with some pretty nifty React tooling. Here's a giant list of links. That's it. Any questions, ladies and gentlemen? You there. I think there's a microphone coming.

We are using the managed AWS service. It's hosted on Skpr. So if you're using Skpr, you can provision one of these OpenSearch clusters really easily and you get all that tooling, like I said, just out of the box. So it is using AWS's service under the hood, but it does things like provision the writer credentials, all that kind of stuff, for you.

Audience: As you said in the presentation, you use edge n-gram. What if you have a requirement that a full phrase needs to go above all other results? How do you handle that? Because with edge n-gram, the term might have more instances in another entry where it's not a full phrase.

Yeah, so we don't tokenize the query string. I think that's the most important part.
So when we're actually doing the query, you have to specifically set an analyzer on your query string so it doesn't break those tokens up. But your question was, how do you ensure that the full phrase goes above everything else? I guess you'd have to index that separately. As I said, you can set up as many fields as you want, right? So you might have your main text that's fully tokenized, and then you can boost separate fields with different analyzers.

Audience: The second question is, one thing I've noticed, in Solr at least, is that boost values aren't the only thing that boosts results. There are behind-the-scenes things that boost, like normalization and other things, which you can't directly control in Search API. How do you handle that in OpenSearch?

Yeah, I did actually forget to mention that there are things like the number of instances of the search term. So if a piece of content has the word "covid" in the title and the body and all over the place, that's going to get boosted higher. And, as you said, internally in the black box, so to speak. It's a good question, and it is something that you'd have to iterate on a lot. There's always a lot of fine tuning and fiddling with little knobs to get results really precise. But as soon as we added that should boosting, our clients were much happier with the sort of results they were getting, and we've only just started diving into that stuff. So yeah, there's definitely a lot more that we could do there for sure.

Audience: And the final question is, do facets work with OpenSearch as well?

Yeah, I think aggregations are basically a one-to-one with facets.

Audience: Have you tried direct field boosting instead of should boosting? What's the difference between should boosting and direct field boosting in Elasticsearch or OpenSearch?

Nope, I haven't tried that.
But yeah, I think the should helps you boost stuff at query time as well, so you can have different boosts for different queries depending on your requirements.

Audience: Sorry, how do you handle synonyms and stuff like that? You spoke about search synonyms and understanding what a person was after.

Yeah, so Lee Rowlands actually recently built synonym support into the module. So you can configure it at an index level if you want, but there's also more dynamic stuff you can do, which we're implementing. The problem with OpenSearch, as far as I can see, is that you need to index the synonyms as settings on the index. As far as I can tell (I haven't dived into it much more than this), you index the synonyms at the index level, and then when you query, it'll just automatically pick those synonyms up and use them. There's also stuff like auto search suggestions and things like that that you can take advantage of.

Audience: I just wanted to ask about the architecture and security of the cluster endpoint. So you're using a query builder to create a payload from the React app, right? And then you post it to a proxy. Where does the proxy live? Is it in the middle with the React app, or...?

No, it's on our cluster.

Audience: Oh, gotcha, gotcha. And then it forwards it on behind the scenes?

That's right, yeah.

Audience: In regards to Search API Attachments, is the response the same in terms of, you know, having the file's data extracted and being indexed? Is the response the same when it comes to files being attached?

I assume so, yeah. I haven't tested that module out. But as long as the module extracts the content (you're talking about Search API Attachments, when it extracts the PDF content and indexes that), yeah, I can't see any reason why that wouldn't work. Right.
Thanks. Cool. If you have any other questions later on, I'm pretty easy to spot. So just yeah, hit me up. Thank you.