All right, good afternoon, everyone. I think we'll kick things off post-lunch. This is Next Level Search API: tips for custom search in our beloved Drupal. Let's get started.

I'm Saul. I'm a Drupal developer at PreviousNext, or PNX, as I'll probably refer to it from here on in. I've been doing Drupal since 2010-ish, and helped found Drupal Gold Coast, and Drupal Chile in South America. I honestly remain inspired by what the community is creating, and it's probably why I'm still here today. I live in the Northern Rivers of New South Wales, back across the ditch in Australia, with my wife and young kids. We're building our own house. We live off grid with a spring-fed dam for water, solar and batteries for power, a composting toilet, and Starlink for internet, which you might be able to make out in the picture, and which has changed our lives. This is all possible thanks to Drupal and PNX's distributed team. As a side note, we are not currently hiring, but probably will be in the future, so keep an eye out if you'd like to come and work with inspiring teammates.

What are we going to talk about today? Software as a service, or SaaS, has many great out-of-the-box offerings for search. There's a slew of examples: things like Sajari, Algolia, Google Programmable Search Engine, and Swiftype, to name a few. They're awesome tools. However, they can sometimes be one of two things: costly, or you might hit feature limits. Implementing your own search can overcome both.

Today we're going to cover a brief intro to search in Drupal in general, then dive into the Search API module in particular. I'm going to assume some familiarity with Search API, or SAPI, as it's abbreviated. Then we're going to go into four specific areas of SAPI in depth, and I'll give you some tools to master your own custom search.

Search in Drupal has been around for a very long time. I found the first reference to it in Drupal 3, which is 21 years ago, so it was probably there even before that.
Core search is fine for small sites, but it honestly hasn't evolved very much, and it has relatively limited functionality. That's one of the motivations for the Search API module: it offers a framework for extending search and allows searching essentially all Drupal entities. It offers things like filtering and faceting, and full integration with Views. It also supports multiple backends: Apache Solr, for example, Elasticsearch, OpenSearch, Lunr, and a long list of others. We're going to dive into features of a few different backends, which brings us to Apache Solr.

Solr is an open source project managed by the Apache Software Foundation. It's an interesting project; it's been around for a long time, and it's very mature, which is good. There are also SaaS hosting platforms for Solr, so you don't have to host it yourself, but we'll primarily be focusing on hosting your own Solr here. It has excellent integration with SAPI; it's probably the most supported SAPI back end. We're going to dive into two lesser-known features of Solr.

First, Solr can index data to be searched from multiple sources. It's an interesting concept. For example, multiple subsites could be sending data into a single index. These sites can be on different platforms: a legacy Drupal 7 site, WordPress, or any other platform. Results can then be searched seamlessly from one search interface, such as from the main or parent site.

This is a potential SAPI configuration for that. You'll see that there are two Solr servers. The first is for the main site content, with an index of normal Drupal 10 content. The second is a Drupal 7 specific Solr server, and external Drupal 7 sites will be sending their indexed content, in Solr document format, into that server.

So, pushing content into the index. This is a Drupal 7 implementation of hook_search_api_solr_documents_alter(), the first Drupal mouthful of the presentation.
It's simply looping over each document to be indexed, a Drupal 7 node in this case. It builds a rendered item using the node title and summary, with an absolute URL linking to the full content, so the output is themed. It's important to note here that it's just HTML, which Solr can consume and add into its document index. Finally, we call addField() on the document to add the rendered item. At the same point, additional fields could be added, like title or taxonomy, and indexed separately as needed. This is obviously Drupal 7 code, not all that sexy, but there are corresponding event subscribers for Drupal 8 and above, and other platforms like WordPress could build the Solr document and send it into the index as needed.

Now we're ready to display the search results. All the indexed fields that we just added are available for processing as normal in SAPI, so you can apply a boost, for example, to a title field. There's an important option on the server to make this all work: enabling Solr to send the full search results back. This allows our indexed Solr documents to be returned.

Fields are then available for full text search in Views. For example, here we've configured Views to search the title and body from the Drupal 10 content we saw before in the index, as well as the rendered item and the taxonomy title ngram fields coming from Drupal 7. The upshot is a single search interface seamlessly showing results from across the multiple sites.

Next up, number two, we have facets. Briefly, facets allow users to narrow results by applying filters; you can think of them as checkboxes. A classic example is the amazon.com left-hand interface, where you've got all those filters that let you drill down into the products. These filters often map to taxonomy terms in Drupal, and facets are very well supported in SAPI via the Facets module.
We had an interesting feature request from a client: when facets were active, they shouldn't limit the list of other facets. All facets should display at all times. We found a lesser-known Solr feature to do this, and it was pretty nifty in that it did it all in a single query. The concept is a bit tricky, so let's go through an example to help thrash it out.

Here are the simplified facet requirements. We've got a list of content, displayed as images in this case, with two filters or facets: one on schools, one on programs. Sample values are school 1, school 2, program 1, program 2, et cetera. Each facet then displays the count of items; for example, there are six pieces of content tagged with school 1. Up to this point, it's all standard facet features.

The difference comes when a facet is activated. Here, program 2 has been clicked, and the facets with zero results are still showing. In normal faceting, those results would not be shown at all, as they're not part of the query results; in this case, school 1 and school 3 wouldn't be in the list at all. Instead, here, inactive facets, or ones with zero results, are grayed out. Keeping them in the result set alerts the user to the full set of available facets. This was the client requirement: they wanted users to know about schools 1, 2 and 3, even after they'd drilled down into the facets. The client had three facets and around a dozen-ish terms in each, so this approach might not be ideal for a big data set. Again, the goal here is to surface all the options, even when facets are applied.

Let's look at the implementation. When adding our fields to the SAPI index, we add two taxonomy term fields, for program and school. We'll alter these later to exclude them, so they'll return results as if no facets were applied. We then add a second, fake, filtered field for each term. For these, we'll tell Solr to run the regular faceting on them.
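As a preview of where this is headed, the mechanism is Solr's standard tag/exclude local params: the filter query is tagged, and the facet field references that tag as excluded, so its counts come back as if the filter were not applied. A rough sketch of the request parameters, with hypothetical field and tag names:

```
q=*:*
fq={!tag=programTag}program:123
facet=true
facet.field={!ex=programTag}program
facet.field=school
```

Because the facet on program excludes programTag, school and program buckets with zero matches under the active filter still come back with their unfiltered counts, which is what lets the UI gray them out rather than drop them.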
Let's look at what we're trying to achieve with the Solr query itself. If you've never seen a Solr query, this will probably be complete gibberish, which is understandable. But basically, we want to tag the filtered program field; here, that's being filtered on a taxonomy term with an ID of 123. Then we mark that tag as excluded when the unfiltered version runs. That's the theory.

How do we do this in Drupal? In our custom code, we implement hook_search_api_solr_query_alter(), the second Drupal mouthful. We need to do three things to wire this all up. First, we mark the original facets as excluded; that is to say, we don't want to filter on these, we want them to always be present as if no filtering were applied. Next, we create a second set of fake facets without the excludes. And finally, we set the fake facets' options to match those of the real facets. A massive hat tip to my colleague and generally awesome human being, Lee Rowlands, who found that this was actually possible and helped a lot in implementing it. This is a bit of an edge case of a client requirement, but it's likely something that couldn't be implemented on a SaaS search platform.

Next up, we have Lunr. The original slogan, which I still enjoy, is "like Solr, but not as bright." They've toned that down a bit, and the new slogan is "search made simple." Both are a good summary of what Lunr can offer. It's written entirely in JavaScript, so the search runs on the client side, or it can also run on the server side in Node.js. The Drupal implementation extends Search API. It was written by former PreviousNext colleague Sam Becker, and I have a soft spot for it, as I maintain it these days. The module itself takes a lightweight approach and is basically feature complete, but obviously patches are most welcome if that's your thing. The Drupal implementation runs the search on the client side.
When you begin to type, suggestions for matching pages appear immediately. This can be implemented as an autocomplete, showing results as you type, or as a normal search page where a query is submitted and teaser results are displayed. The back end configuration is basically like any other SAPI search provider: it allows configuration of the searched entities and which bundles you want to index, configuration of the fields you're indexing, boosting, and processing pipelines that can be applied to each of the fields.

The interesting difference here is that the SAPI index processing, which is what's running in this batch, is creating a collection of documents. These documents are sent as bundles of JSON to the client, where the browser builds the index. As a result, there's no JavaScript dependency on the server, which means no build process. Things like regular or even scheduled content changes are indexed with no latency. If you've ever deployed something like a statically generated site, you'll know that build time and the associated latency on content updates are quite a big deal. That's essentially a problem that doesn't exist with this implementation of Lunr.

It also has fuzzy search and partial matching. It effectively gives you an ngram-like matching system, supporting matches on misspellings and word fragments; for example, you can see the misspelling here still matches results. In practice, this matching is not going to be as configurable as what you can do in Solr, but it gets you 80% of the way there with basically no setup, so it's a nice little win.

There are certainly some important considerations when using Lunr. There's no cost incurred for any session that doesn't actually make use of search, because the index is only sent to the client when they first interact with the search.
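The fuzzy, ngram-like matching described above can be illustrated with a toy version in plain JavaScript. To be clear, this is not Lunr's API or algorithm (Lunr builds a proper inverted index and supports fuzzy terms like "drupl~1" and wildcards); it's just a minimal sketch of how overlap-based matching lets misspellings and word fragments still find results in a client-side document set.

```javascript
// Illustration only: a tiny client-side index with n-gram style
// partial matching, showing the *idea* behind fuzzy client-side search.

// Split a string into lowercase trigrams ("drupal" -> dru, rup, upa, pal).
function trigrams(text) {
  const s = text.toLowerCase();
  const grams = [];
  for (let i = 0; i <= s.length - 3; i++) grams.push(s.slice(i, i + 3));
  return grams;
}

// Score a query against a document title by trigram overlap,
// so misspellings and word fragments can still match.
function score(query, title) {
  const q = new Set(trigrams(query));
  if (q.size === 0) return 0;
  let hits = 0;
  for (const g of trigrams(title)) if (q.has(g)) hits++;
  return hits / q.size;
}

// Search a document set (like the JSON bundle the server would send).
function search(query, docs, threshold = 0.4) {
  return docs
    .map((d) => ({ doc: d, score: score(query, d.title) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.doc.title);
}

const docs = [
  { id: 1, title: 'Drupal Search API' },
  { id: 2, title: 'OpenSearch basics' },
  { id: 3, title: 'Theming in Drupal' },
];

// A misspelled query still finds the Drupal pages.
console.log(search('drupel', docs));
// logs: [ 'Drupal Search API', 'Theming in Drupal' ]
```

A real Lunr deployment gets this, plus proper relevance scoring, from the library with no hand-rolled code; the sketch only shows why partial overlap is enough to survive typos.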
Still, it can be worth optimizing by creating a lightweight index of document titles for the autocomplete, with a separate, larger index used only once a search form is actually submitted; that one would index things like the body field or larger chunks of content. There are definitely practical limits to requiring the client browser to build the index. It performs well for hundreds, or even up to a few thousand, documents in our testing, so it's probably not going to be a good fit for larger search indexes. There's also an accompanying JavaScript API which allows the front end to query the index with no reliance on the Drupal backend, which is obviously quite useful in any decoupled situation. Nice.

Finally, we have OpenSearch. This is an open source software stack, a suite for search. It's a fork of Elasticsearch and Kibana, created by Amazon in 2021. There are two sides to every fork, I guess; essentially, AWS opted to go its own way and create OpenSearch. If you're familiar with Elasticsearch, then it largely maps to OpenSearch. The project is under the Apache 2.0 license, which bodes well for the maintainership of the project into the future.

The Search API OpenSearch module is maintained by my colleague and boss, Kim Pepper. You guessed it: it's a fork of the Elasticsearch Connector module. Again, it has all the regular integrations for indexing, field mapping, Views, et cetera. It supports facets, More Like This, and synonyms. Boosting is supported at index time and at query time, which is actually quite a powerful way to tweak your results. Recently, we added a search-as-you-type field, which provides a really simple way to get set up with autocomplete functionality, with no need to configure ngrams and the like. That sort of configuration can be a little daunting, because it can be quite fiddly to set up, though there is also full support for ngrams and edge ngrams if you need them.
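To make the ngram talk concrete, here's a small sketch, in plain JavaScript rather than OpenSearch configuration, of what an edge-ngram analyzer conceptually does to a field at index time: every term is expanded into its prefixes, so partial input typed into an autocomplete can be looked up with a cheap exact match. OpenSearch does this inside the engine when you configure an edge_ngram tokenizer, and the search-as-you-type field hides even that from you.

```javascript
// Concept sketch: edge n-gram expansion as done at index time.
// Expand one term into its leading fragments:
// "drupal" -> ["dr", "dru", "drup", "drupa", "drupal"] (min length 2 here).
function edgeNgrams(term, min = 2) {
  const grams = [];
  for (let len = min; len <= term.length; len++) {
    grams.push(term.slice(0, len));
  }
  return grams;
}

// Index a title by expanding every term; a prefix typed by the user
// then matches exactly, which is what makes autocomplete fast.
function indexTitle(title) {
  const grams = new Set();
  for (const term of title.toLowerCase().split(/\s+/)) {
    for (const g of edgeNgrams(term)) grams.add(g);
  }
  return grams;
}

const indexed = indexTitle('Drupal Search API');
console.log(indexed.has('dru')); // true: "dru" matches as you type
console.log(indexed.has('rup')); // false: edge ngrams only match prefixes
```

The "fiddly" part in a real engine is choosing min/max gram sizes and making sure the analyzer only runs at index time, not on the query; the field type added to the module spares you those decisions.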
It also supports a did-you-mean feature, which is quite nifty, suggesting corrections for misspelled search terms. Out of the box, it supports basic auth to connect to the back end, and it provides a connector plugin type for customizing back end authentication. This means you can roll your own back end or connect to something like a hosted service.

OpenSearch itself is JSON driven. Here's a sample query: it's simply searching on DrupalSouth via something called a must match query. The response returned from OpenSearch is also JSON; it's found two results, with a title and venue field for each. That sample query was pretty straightforward, but it can get complicated quite quickly. If those giant JSON blobs are a bit scary, it's OK: there's a tool that can help. Rather than hand crafting your queries, you can use a query builder. Again, it's written for Elasticsearch, but it applies largely, almost entirely, to OpenSearch. It allows you to build up an object that represents your query and then serialize it to JSON, which you can post directly to OpenSearch.

Now, why all this JSON? Can't Views just do this for me? Well, yes, Views can; all the regular integration of search from within Views is totally possible with OpenSearch. But Views for search can sometimes be a little limiting, not to mention hard to theme. Because of that, a number of our more modern approaches to search are implementing OpenSearch in a different way: React apps embedded in Drupal pages, or, as we're tending to call it, partially decoupled search. This allows us to keep all the goodness we love from Drupal, but allows faster iteration and developer control over search. Essentially, React apps are embedded in blocks, which can then be placed via Layout Builder. In a particular example that we implemented, we required support for multiple search apps on the same page, again, an interesting requirement.
And they had independent filters, something that would be quite interesting to implement with Views. By independent, we mean filters that can be applied to a single search app on the page, or even to results across the entire page; again, something that would be quite complicated with Views. Our implementation also has clever out-of-the-box caching thanks to TanStack Query, which essentially avoids round trips to Drupal for querying, filtering, and sorting.

Let's take a look at a skeleton React search app. Here, the results page is displaying two separate result sets: one for global search, one for images. We're using something called a slice provider, which comes from Redux Toolkit, and that allows us to reuse the same React components, like the results-per-page control or the pager. When these are rendered, the effects are targeted to just that slice and nothing else. As I mentioned before, this approach also uses TanStack Query, which provides cached results when filtering. When you filter the results, anything that has already run previously is cached, so the results show instantly. If you've got a search interface with a lot of interaction, where filters are turned on and off, it's essentially instant whenever you hit a filter that has already run.

The full implementation is a session or two by itself. For more info, you can see my colleague Adam's session from DrupalSouth Brisbane, and also Jack's amazing blog post outlining the approach to slices. The flexibility of this approach is great, but keep your requirements in mind: if you have simpler requirements, it might not be a good fit, and again, OpenSearch supports all the regular point-and-click Views integration.

Finally, when it comes time to deploy OpenSearch, you can roll your own, or a hosted option is also available: a hosted search like Amazon OpenSearch Service.
This offers an alternative back end and provides hands-off managed search by AWS. AWS maintains the OpenSearch stack and manages things like version updates for you. It means you're able to run a highly available OpenSearch cluster essentially without having to worry about how you manage it. High availability is actually quite a rare thing, for example, in the Solr world, so if you have high uptime requirements or the like, hosted OpenSearch could tick a lot of boxes. It also offers services like replication, role-based access control, and data visualization. You'll notice that we've come full circle here to a hosted search solution; the reality is, sometimes SaaS is a good fit.

These slides are available on GitHub, if you'd like to refer to them. A quick recap. Search is an important part of a lot of Drupal sites. While SaaS solutions can be great, custom-building your search stack can solve your exact needs. Solr still offers a powerful back end to power your search, with many points to customize to your exact requirements. Lunr.js can be a super fast solution for smaller data sets. And finally, OpenSearch can fulfill a lot of enterprise requirements, all rolled into an optional hosted service. Hopefully this session has shown you a few new tricks to add to your search tool belt. Feel free to reach out to me as fendstrat on drupal.org or Drupal Slack; happy to answer questions here or offline whenever.

Audience question: In the case of having a self-hosted engine like Solr or OpenSearch, is there any way to implement a language model to make things a little bit more clever? For example, we know how difficult it is to define synonyms in both. Well, for me it's easy in Solr, but a bit more difficult in OpenSearch, especially because you cannot define them from the Drupal back end; you need to do it in a text file, put it on S3, and then regenerate, so it's quite complicated.
So I'm wondering, would it be possible to implement in Search API something where you can host a language model yourself, so that the index knows a word like PC is equivalent to personal computer, without defining all these synonyms? Or, even better, categorize pictures? As we do with Solr and the Tika module, you can also parse attachments, like PDFs, and index the content of the attachments. Could we pass pictures through to do OCR, or even better, in media, search "find me a cat" and have all the pictures automatically categorized? Are we getting there?

Yeah, there are a few aspects to that question. The first, synonym support, is something we might actually be tackling as a client requirement quite soon. In OpenSearch, your synonyms are deployed as config, so you do have an interface in the SAPI back end to list your synonyms, and that's how they're getting out right now, but it requires a code deploy, essentially. You could use Config Ignore and give editors an interface for editing them in the front end; that's probably the lowest barrier to allowing them to edit it. In general, I think a lot of our synonyms wouldn't be shared across projects; they're quite site-specific. Clients really know their topic area, and their list of synonyms is very specific to that site, so I don't know about that concept of sharing it across sites. It would be interesting, though. And to your second point, the concept of searching within documents, as an external service I think is the idea: we haven't run across that yet, but it would be an interesting feature to build.

Audience question: When using OpenSearch, is Jeff Bezos personally benefiting from that?

That's a very, very fair question. Only when it's hosted. Which is why I specifically made mention that you don't need to use the hosted version of it, right?
I think the fact that it is licensed as it is bodes well for the project itself. It's hard to decouple the main funder from a project, though; if you were to take AWS out of OpenSearch, what would it look like? I don't know. But it is a very fair point, and definitely the reason I stress that you don't have to use the hosted version. You could always get Raspberry Pis and go off grid. Yeah, Starlink. He'd be a fan. That's all the time we have for questions. If you have any more, please talk with Saul.