...and we're going. My name is Brian Miller. I work for the University of Ottawa, and we use Solr. Who else here is interested? Well, I assume you're interested in Solr. Does anyone else actually use Solr in live production yet? And are you a university?

Audience: We're an agency. Different clients use it, so our client projects are where we actually implement it.

Okay. So this is a discussion about how the University of Ottawa decided to go on its Solr journey, what that meant to us, and what opportunities that opens up for you in adopting Solr or whatever other search strategy you want. This is sort of a lessons-learned talk, and there is a little bit of code sharing that we've been able to wrangle up for this version of the presentation. Hopefully that will help you incorporate it into your Drupal usage scenarios as well as figure out how Solr works for you. With me today is my colleague Karim from the university. He's the one who has done all of these wonderful graphics on the visuals. So, as I was saying, the presentation is about our journey through the customized search solution and our choice of Solr.

Origins. A number of years ago, in 2016, Google Search used to be a very, very prominent factor in a lot of university and public-sector areas. We used to buy these things called the big yellow boxes. Basically, Google sent us a server that contained all of the software and all of the server configuration to let it crawl all of your sites independently and produce the same results that Google would show in its own search results. A few years ago they decided: we're going to get rid of that. We're going to go to a cloud-only solution that we provide to all of our clients, and you will pay for that cloud solution. At that point in time we were relying on a number of very useful aspects of that Google Search Appliance.
One of the things the cloud version doesn't let you do as easily is focused keyword matching and focused collections. For instance, in a university environment we would say that we want to search only for, and present only, the information on our Department of French. To do that, we would create special collections within the Google Search Appliance. With the Google search tool online you can typically do this too, but it requires you to set specific parameters, and it's not the same ease of configuration. This is one of the things that really drove us to look at what options were available outside the Google search infrastructure that would let us create a tool that was useful for us as a university and served our business needs.

So, lessons learned. We started our journey with Solr with two projects that we had previously. We have an internal search tool for our event calendar that uses Solr, and we also developed an internal employee directory. At universities, typically all of this information is public knowledge, so we have to publicly share all of the employees of the university in a searchable format. The old tool for this was managed through what was basically a 32-bit DOS interface at the point in time we were looking at it, and we had to take that information and migrate it into a searchable format that we could present online. This is what we call ECIS, our employee search directory. We developed it with a very rudimentary understanding of Solr at that point. It was sort of our first project with Solr, and it allowed us to take all that information, put it into a Solr index, and then pull it back out when you make a request.
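The "put it into a Solr index and pull it back out on request" step can be illustrated with a minimal sketch of how a client builds a request to Solr's standard `/select` handler. This is not the university's code: the host, the core name `directory`, and the helper name are assumptions made up for the example. Only the `q`, `rows`, `start` and `wt` parameters are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only; not the university's real host or core.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "directory"


def build_select_url(terms, rows=10, start=0):
    """Return a Solr /select URL that searches the index for `terms`."""
    params = {
        "q": terms,      # the user's search terms
        "rows": rows,    # page size
        "start": start,  # offset, used for pagination
        "wt": "json",    # ask Solr for a JSON response
    }
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("human resources"))
```

Fetching that URL with any HTTP client returns the matching documents. Building the URL separately from fetching it keeps the sketch testable without a running Solr server.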
So we looked at this and said: how do we take this to the next level? How do we look at Solr as a solution for tomorrow, for next year, for the future? Internally, we said: let's send some of our guys for training with the Solr experts in San Francisco and find out what they have to tell us about what is available with Solr. We did that, and two of our guys went to Lucidworks and got some great hands-on training. Lucidworks also provides direct vendor solutions for Solr. If you're looking for a vendor to manage this for you, they will provide that service. We chose to do it internally ourselves, but depending on your requirements you may want a vendor-supported solution that relieves you of that burden.

When we were looking at this, we were looking at the way we actually use our website. I've mentioned to a few other people that we have 300 websites running on a Drupal CMS multi-site platform. From the perspective of searching individual sites, it wasn't really feasible to have each of the Drupal API searches running and then try to correlate them. So we said: okay, we need something that will broadly crawl all of those sites and return the results we need, results that we can manipulate and provide to our individual clients. We also needed to be able to crawl external partner sites. As with any organization, you have your 300 sites that are core to your business, but you also have a few outliers that like to run their own special solutions for their marketing needs, and we wanted to include those in our overall search requirements. And it needed to be scalable and future proof.
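The requirement above, one crawl covering both the multi-site platform and the partner sites, comes down to handing the crawler a single list of starting URLs. Here is a small sketch of merging the two sources. The function name and the example URLs are hypothetical, and this is not the command-line tool described later in the talk.

```python
def build_seed_list(drupal_sites, external_sites):
    """Merge both URL sources into one de-duplicated list, keeping first-seen order."""
    seen = set()
    seeds = []
    for url in list(drupal_sites) + list(external_sites):
        # Trim whitespace and a trailing slash so near-duplicates collapse.
        normalized = url.strip().rstrip("/")
        if normalized and normalized not in seen:
            seen.add(normalized)
            seeds.append(normalized)
    return seeds


if __name__ == "__main__":
    sites = build_seed_list(
        ["https://a.example/", "https://b.example"],
        ["https://a.example", " https://c.example "],
    )
    # One URL per line is the shape a crawler seed file usually takes.
    print("\n".join(sites))
```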
So we wanted something that was very relevant in the market at that point in time and that had a good roadmap for the future: extending with future versions of Drupal, extending with future versions of whatever solution came up. So we were looking at Solr.

So why did we pick Solr? First, market share. Solr has a very high market share among open-source search options. Its main competitor is Elasticsearch. AWS, Amazon Web Services, now also provides a Solr option for search, Amazon CloudSearch, and the reason they did this was that when they were working with the Amazon Web Services Elasticsearch tools, they were having some problems with the Elasticsearch team. The Elasticsearch team is going more towards an Oracle approach: you get the core Elasticsearch as open source, but if you want any of the extras, any of the bonuses, they now provide a private paid solution that does not always come free with Elasticsearch. Solr, on the other hand, is open source, as with all of the Apache projects, and it will remain so. So, from our research into Solr and Elasticsearch, and looking at the Google Search Appliance we had previously used and what it morphed into with the Google cloud search engine, we determined that our best solution from a market-share perspective was to go with Solr.

Number two, regional usage.
This was kind of a toss-up. We had looked at a lot of the Canadian universities, and again this is data from 2017, so two years old, but a lot of the universities we talked to had not migrated beyond the Google search online tool, and many of them were looking at solutions. I know from speaking with a few people here yesterday that Waterloo has now gone with an Elasticsearch model that they purchased and that the vendor supports for them. So where we said that there were no Elasticsearch instances, now there is one that I'm aware of. But it was not really a determining factor for us, other than that a lot of the universities were at the tail end of adoption at that point, so whatever choice we made would sort of lead the pack.

The third one was meeting the business requirements. In terms of what our business requirements identified for the University of Ottawa, a number of things come into it: usability, the ability to manipulate the results, and the ability to direct our audience to what we feel is the most relevant content for their requests. When you're working with your Google Search Appliance, or with the Google search cloud, you're under the auspices of Google's algorithms, which define what they think the best results are for a specific individual. In some cases that works wonderfully. In some cases Google will return 40,000 or 100,000 results that are not clearly defined. What our communications and marketing group wanted was to be able to say: if a student is coming to the university and looking for specific content, how can we drive them to that content most easily, through the simplest user interface we can provide for them?
Another major factor for us, and I'm not sure how much of a factor it is for other people, is that we are fully bilingual as a university. All of our results have to be in English and French at all times. The University of Ottawa has a French-first mandate, and essentially anything that we publish to the web has to be available in both official languages of the university. We needed a search tool that let us manipulate the results in both official languages and present them in a way that our end users would interpret well. This was also an option that this focused, customizable tool gave us.

And then, as I said, tailoring results, targeting or excluding specific content, was also important to us. So these are a lot of the business needs. Again, Elasticsearch, Solr, and a number of other tools would allow you to do this. It's just a matter of how much effort you want to put into customizing the tool.

So we were ready to go. We had our guys trained, we had chosen our tool, and we were able to look at releasing our Solr system. So what does that look like?
For those of you who are familiar with Solr, you have basically two options. You can have Solr as a single instance connected to your Drupal website, or to whatever infrastructure you connect it with, or you can go with a SolrCloud instance. A SolrCloud instance has multiple indexes and multiple servers, including ones called ZooKeeper servers, which cover off redundancy and cover off your crawling information.

Another aspect of this is the Apache Nutch project. Nutch in itself is a crawler, and all it is is a crawler, but what Nutch does is create the result set that you feed into Solr, which lets you use the Solr indexes to target specific information. Without Nutch, Solr is basically just a data repository.

So in our highly available SolrCloud configuration we have our one Nutch server, which is constantly crawling our websites and the content that we create, and creating the data indexes on our two Solr servers. Then we have our three ZooKeeper servers, which essentially act as the load-balancing and coordination layer: they manage where the data goes and which Solr server we are actually returning the results from.

So what does this look like for us? We have our main uOttawa domain and our web servers. These are our Apache web servers in a redundant two-server configuration, and they contain 300 Drupal sites in a multi-site environment. This then queries our Solr server, which takes all the data from the Nutch crawl. As well, similar to what I guess you would call an Aegir-like type of tool, we have a command-line database configuration tool that identifies all of those 300 multi-sites that we have in Drupal and feeds that to Nutch, saying: we want you to crawl this information. We can also tailor that with extra XML that indicates which other sites we want that are not included within our Drupal sites. And then Nutch will provide all of that
information from the crawl into Solr, and then we have our public search that's available on our uOttawa.ca site. So, as I was saying, Nutch begins its crawl from the site list, it feeds that information into Solr, and then we have a custom Drupal module that queries Solr for those results.

From a Drupal perspective, a lot of you may be wondering why we didn't go with the Drupal Solr search module. What we found was that it was very poorly built for multi-site support, and in terms of searching multiple indexes it wasn't where it should have been at the time we were working on this project. Since then, I do believe the multi-search is better, and you can reference other Solr indexes from that module. But at the point in time we were working on this, two years ago, that was not the case.

We're going to go into a quick demo of what our search does. This is the University of Ottawa search, and all of this is driven and controlled completely through a custom Drupal module that we built on our side, which enables us to access our Solr results. I will provide some code snippets showing how that Drupal module does those queries, as well as how it provides filters and so on.

What you see here is a quick search for "education". Let's make it bigger. Here we have our English and our French results. This was something we needed specifically from the business requirements I mentioned, because we wanted to make sure that if an individual was searching for something, did not find it in the English results, and realized that they needed to see the French results, it was an easy switch between the two languages to determine whether the results were relevant to them. The other thing you may notice here is this special bold area. This is essentially our keyword matching. Yes?

Audience: Is this a Drupal site?
This is a Drupal site, yes.

Audience: And what do you do from there, just to present the results?

The theme is a Zen variant theme, with views and templating that receive that information, plus some custom CSS. It's actually very straightforward and simple. Part of this is an XML file on the Solr side where we say: if an individual puts in something like "education" as a search term, the very first option we give you is the Faculty of Education. So we can target specific words and specific phrases that a user will search for and give them promoted results for those terms.

Audience: So this is all customized? You don't use search pages or any other modules? It's a custom theme?

It's a custom theme, and yes, it returns multiple results and we paginate those results. It's a special view specifically for the results that come from the Solr server.

So we have our multilingual stuff, and we have our specific coding. We are also working on something else; unfortunately I don't have the demo for it today. Essentially, we've duplicated Google's advanced search. A lot of the natural-language querying that you have in Google's search, we can do: we can have a search with "plus apply" and it will return those results, so where you have "apply" together with "education" as it comes in. (I didn't turn off the lights. Yeah, I did.)

Audience: Did you apply [unclear]?

Yes, so there are synonyms. The other thing you can do with Solr is create a number of synonyms, a number of antonyms, and a number of equivalent aliases. For instance, there are examples like "HR": if they're searching for HR, they are typically searching for human resources.

Audience: How long does that last?
It's XML controlled, and it's on the fly. If you want to change this or add something to it, we update the XML and provide that XML to the Solr server, and the Solr server will immediately respond and provide those results. There's nothing required on the Drupal side to refresh or update those results.

Audience: Okay. It's part of the [unclear] features? And what is this yellow background?

So this, again: HR is one of our keyword matches. If they put in HR, it will automatically promote Human Resources over individual things like senior HR managers or whatever other information is available. This has been an ongoing process for us in terms of improvements and additional elements that we wanted to focus on.

Another thing, and we'll go back to my presentation now because we're almost at that point: Drupal meta tags and SEO. (Do you remember where this was? There we go.) A lot of you may want to know how we actually get Drupal to provide all of the proper information for the data set for Solr. We do this through meta tag information on the Drupal side. Your content is all created with basic information, and we target specific elements within the content model to be shown in the header of the HTML pages, which Solr and Nutch are then able to use. When Nutch is doing a crawl, it will look at the title of your page, it will look at the description that's given, and it will look at any extra metadata that is provided.

One of the things we've targeted specifically, and one of the things I want to share with you, is published dates. One of the most important issues with search information is how relevant it is to today and how relevant it will be tomorrow. A good example of this is currently one of our problems, and
what caused this was a search for something like "dates and deadlines", if I spelled it right. Other things come up, and you get 2011, 2012, 2016, 2017. You don't want those. This is another thing that we're deploying very, very shortly. We have gone in and figured out which Drupal hooks are necessary for basic Drupal content elements as well as for Panels pages, and we looked at how to expose those as meta tags in the HTML. Your publish date field in the head is driven by the content update from your Drupal pages. If you create a page in Drupal with today's date, it will show up in the publish field in the HTML, and we then ingest that with our crawler and put it into Solr. So hopefully in the next month we will be able to start showing publish dates for specific content. In the case of "dates and deadlines", it will be able to boost the one for this year over all of the other ones, without us having to do that targeted keyword match.

The code for how you do that, for both regular content pages and Panels pages, is available through this GitHub link for our uOttawa team, which we put together to share with you. It's fully commented in terms of which hooks were used to expose that information. If you have any problems, concerns, or questions about this, my team is more than happy to help and give advice on this specific code. Drop us a line in the comments or email us directly. I'm not going to go through the entire code, but essentially it's all of the hooks that are used on publish.

Audience: Sorry, this is Drupal 7?

Yes, this is still Drupal 7. I'll discuss Drupal 8 and what the future is, since that is also part of my presentation, but the current situation is Drupal 7, so this is Drupal 7 code.

Audience: But are you going to let people decide if they want to see things published
in the last week or something like that, or will you decide?

For our sites, in our multi-site environment we have an authoring environment and we have our public environment. The understanding in our institution is that anything that's put on the public prod server is published. It would be very easy to double check, but that publish node content code also checks whether the content piece itself is actually in a published state. The date only gets pushed when the content you have provided is in a published state, so it's publicly available on the website.

Our key takeaways from this entire journey were: use what we knew, keep it simple, and don't try to reinvent so much as reuse the common patterns that were available. Our search display and the tools that we integrated into it are all liberally copied from the way Google handles common searches. We didn't try to reinvent the wheel. A lot of those elements are available from the Solr community, which says: if you're looking to do this, this is the way you go about configuring Solr to address it. The community is very engaged, with information available on everything from creating those synonyms and keyword matches to now even integrating machine learning into your search results, so making suggestions and basing suggested terms on that.

And all of this is because, when we were looking at this tool, we were looking at it from a data-first approach. The keynote this morning was really interesting on this point: it's not necessarily our websites that are going to drive our content, it's going to be the data. Where do you host that data? Where do you provide that data? For us, a decision was made that we wanted to control it ourselves and that we wanted that data to be available in a format that we could extrapolate from. So Solr, for us, is that data
source.

So, from the perspective of the future and where we can go: we've just implemented facets. Another snippet that is provided is what we're calling our Solr search PHP. This is Drupal 7 PHP code that lets you query Solr for filters and for faceted search information. When you're talking about facets, you're talking about going to Amazon.com and saying: I want to look at TVs, and I want to look at 47-inch TVs that are LCD or LED. That is a filtered-down, faceted view of your information.

My team just finished the last sprint last week, and we have to wait until our web update in June to push the header updates for the Drupal sites. (When is our June web update? The third week.) So after that, all of those publish dates will start appearing on the pages, and then we'll be able to start filtering on them. Right now the module that we have for that advanced search is essentially deactivated. In all of our custom Drupal modules, the way we handle it is that we feature-lock it with a config checkbox, which means we can push the code, but unless we click that checkbox it doesn't show up on the public website. That's where we're at right now. Unfortunately it's not publicly available. I was hoping it would be for this presentation, but unfortunately not. The next few weeks. Does that help answer that question?

So the code that we provided includes the Solr config XML, and what this does is define a number of the variants and the fields that are shared in order to generate those facets. From the university perspective, a facet might be that you want to target a specific faculty, a specific department, or a specific program, and then we can filter through all of those levels. That's what this will
provide for us, and it will show up in our search results on the right-hand side once it's enabled. The code that does those filter and search requests, and the hooks that you would use in a Drupal module to access them, are in these code snippets.

One of the reasons we're using code snippets rather than publishing a module is that the University of Ottawa uses an API gateway to securely encrypt all of our API data requests. There are other things in our modules that tie directly to that API gateway and can't be shared. So these are the code snippets for what the module should have in it, but it's not the complete module.

As I was saying, the future of this is more sub-site searching: being able to create a specific search block on a Drupal site and say, for the Faculty of Education, I only want to search the Faculty of Education content, and that will pass the default parameters back to the search engine to target only that specific site. With the tools that we now have, with faceting and the advanced search elements, we are in a position to do that.

Plus there's the AI and machine learning stuff that's coming in. I'm not sure when it will be available, but there have been a number of major advancements within the AWS marketplace, as well as the Google marketplace, to allow for, as they were saying, plug-and-play usage of machine learning. There are tools now that will give you a live transcription of a meeting while you're having it, with the notes transcribed into a notepad directly from speech. There are a number of other things. I had the good fortune of going to an AWS summit where they highlighted that Alexa now speaks proper Quebec French. If you switch your Alexa settings to French and query it, it will respond in the French that you expect rather than in France French.

Audience: So the transcript shows
the search results or the actual video?

Well, as he was showing this morning with the AI stuff from Google: at the keynote he was showing that they're now doing AI investigation, deep diving into podcast text and other things like that. With the Dynamics and Power BI stuff that Microsoft is showing with Azure, and with the AWS stuff, these are basically plugins. You can buy a third-party plugin from AWS or from Microsoft and expose it through a scripting language. You input your data and it outputs the results. If your data input is an audio file, it will export a text file of what was said. And that's all data.

So the advantage of Solr, of Elasticsearch, and of these data repositories is that they hold the data that we use to feed these machine learning algorithms. With the results that we have now, we find out who's using what, and we say: okay, these are our most common terms, feed those into the AI. Eventually we'll be able to replace that keyword matching with "these are your top suggestions". Rather than going through the whole process that Google goes through with its algorithm, these machine learning tools will be able to respond from our own bias about what we want them to respond with. The advantage of this for the Drupal community is that, as we move towards the API services in Drupal 8, it will become easier and easier for us to integrate that data into Drupal 8, and to integrate machine learning and all of these other advancements.

So thank you very much for listening to me. Thank you.

[In response to an audience question:] We're going to go towards using the API, so the data-first approach will serve that. We're currently in a web transformation project towards Drupal 8, API service based from start to finish.

Audience: And have you had a chance to look at Search API Federated Solr?

We have not looked at that. I saw it
but I have not had a chance to dive into it yet.

Audience: One of the upsides of Search API is that you can index the deltas of the changes that happen, on the fly. How do you handle that now that you use Nutch? Can you do individual URLs, or are you doing the whole site?

Nutch can be threaded, so our Nutch server can run multiple threaded processes. We can say: add these web pages to the Nutch crawler, and spin off a separate thread that just does those while the daily one is churning through. In the Nutch server configuration you can control how frequently you want to crawl that information, how stale you want the information to get, and how much of that information you want to retain. You could say: I want Nutch to make sure that every 15 days it crawls the entire site. For us, we have a lot of pages, so we're on a 30-day cycle. But it also factors in all of those deltas, knowing when the content was changed. And now that we have the publish date, we'll also be able to feed that publish date into the configuration for Nutch to say: if this publish date is X, then we need you to crawl this information more frequently.

Audience: So it's not really real time, then, it sounds like. Are you telling Nutch to go re-index that page, or, on save, can it take as long as 30 days before it's updated?

Well, no. Our Nutch server is constantly crawling. It probably takes about a day for it to find new pages and new information, but it's constantly trying to update itself, running through the list of URLs that it has. It's not real time, but we also have a separate process where we can jump-start that by saying: please spin off a separate Nutch instance to crawl those specific pages. We have not built the connection with a Drupal module that says, as soon as it's published, send it into a
queue to spawn off this Nutch crawl, but that's definitely one of our overarching nice-to-have things that we would like to do.

It is one of the problems. With the current module, the issue was that it only goes according to the fields that we define at save, and there is some stuff that we need more of from the pages, right? And also, third-party external pages are not contributing to the index; we can't push that information. So we need to put everything in one search index, so we don't have to manage that. Mainly because not all the sites have those fields: only that specific site installation would have them. Some other site that uses WordPress would have to expose those fields to be pushed.

Those require a separate Nutch configuration for those specific things. The advantage we have from the university perspective is that all of our sites are on a common look and feel, so the fields that we use are very consistent from site to site. For our 300 Drupal sites we have one configuration in Nutch that says: on a Drupal page, this is what you're looking for, and these are the fields that are going to Solr.

Audience: Do those fields need to be public? Or can you have some private fields and say, I want these to be weighted higher?

Well, no. The way it works is that after you collect those fields, you have boosting within the Solr configuration that says: I want to weight this field higher than this other field. For instance, when we have the publish date, we're going to boost that in terms of the relevance of that search result versus the other elements. It's going to be more important to us than, say, the fact that you have "education" seven times in the content body of that element. Does that help? Sort of?

Audience: So how do you manage the synonyms? Like you said, someone searching for HR is actually searching for human resources. Let's say tomorrow there is a
requirement that if you search for "research" it should give you the results for "human research". Do you go on the Solr side and update synonyms.txt to add the new synonym and make it work, or do you have an automated way to make it work from the web?

That's another one that my lead developer wanted to add, where we could go from the Drupal side and have it populate that synonym. However, currently our solution is that we have a Git repository. In the XML for the synonyms, we update that, and then we have a runner that updates the Solr configuration every single time we update our Git repository. That's the way we handle it. We're not handling it directly through Drupal.

Audience: So you have the runner and then you have to update the file. [unclear]

It's independent. The Drupal side of things is independent and doesn't require anything to be updated on the Drupal side to say "I have new information". The Solr side controls that, so Drupal is completely agnostic of that information.

Audience: And I've heard of a site-specific search, which we don't have. What does that look like? Is it just a boost, if somebody is searching from a certain sub-site and you want results really targeted to just that site?

What we've done on our Drupal side is identify every single one of our multi-sites with a unique identifier, so we can target that with a mapping tool that we have, and it appears in our Solr search results as a field. As I was saying before with communication: you have a specific Department of Communication identifier associated with all of the links that were crawled by Nutch. So we'll be able to leverage that with our facets and our filters.

Right, thank you very much. Enjoy.
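As a footnote to that closing answer, here is a minimal sketch of what a site-scoped, faceted request could look like once each crawled page carries a site identifier field. The field names `site_id`, `faculty` and `department` and the helper name are assumptions for illustration, and this is not the module code mentioned in the talk. Only the `fq`, `facet` and `facet.field` parameters are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_site_query(terms, site_id=None, facet_fields=("faculty", "department")):
    """Build the query string for a Solr search, optionally limited to one sub-site."""
    params = [("q", terms), ("wt", "json")]
    if site_id is not None:
        # fq narrows the result set without changing relevance scores.
        # `site_id` is a hypothetical field name; the value is assumed to
        # contain no double quotes.
        params.append(("fq", f'site_id:"{site_id}"'))
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field that should return counts.
        params.extend(("facet.field", name) for name in facet_fields)
    return urlencode(params)


if __name__ == "__main__":
    print(build_site_query("admission", site_id="education"))
```

A search block on a faculty site would pass its own identifier as `site_id`, while the main search would leave it out and search everything. Using a filter rather than a boost means results from other sites are excluded, not just ranked lower.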