My name is Greg. I've been working with Elasticsearch for over four years now, all at Automattic, where I lead our Elasticsearch development efforts, and we've been deploying Elasticsearch for a while. I want to talk about the different use cases I've seen for Elasticsearch across WordPress. I skew a little toward Automattic just because I know it really well, but I've seen it used in many different settings.
So first, I want to talk a little bit about what Elasticsearch is. Quickly: how many people in the room had heard of Elasticsearch before this talk? Wow, a lot. All right, cool. How many of you have used Elasticsearch? That's a really good number. Awesome.
So I'm going to give a brief introduction without diving too much into the details, show about six use cases that I've seen people use commonly, and then finally talk about some opportunities in 2016 where you can contribute to WordPress.org and help us use Elasticsearch more.
So what is Elasticsearch? At its simplest, it's a search engine in a box. It's an open-source project, it's been around for over five years, and it's Apache licensed. It works very well, surprisingly well. You can deploy it on a single server, or you can deploy it across many servers: it's scalable and distributed, and it kind of magically scales up to a pretty good point, tens of millions of documents. It's also an analytics engine: you can index things like the number of comments on a post and then look at the distribution of comments across posts, or at the most common terms across those posts. And finally, it's multilingual. It's built on a library called Lucene.
This library has been around for 15 to 20 years, and it supports many different languages, which is important for us as people working in a project that wants to help 25% of the web, and everybody, regardless of their language.
The way I think of deploying Elasticsearch is as a mirror of your data. In WordPress, all of your data is stored in MySQL. For Elasticsearch, we want to take all of that data and instrument WordPress so that every time a post changes, we put that post into Elasticsearch and re-index it, and every time a tag or category changes, we re-index all the posts that tag or category is on. So there's a lot of back and forth between the two as we try to keep the data up to date.
But I don't like to think of this as a one-to-one mirroring of data; I like to think of it as a funhouse mirror. It makes sense to me to think of Elasticsearch as a different kind of data store, a different way of looking at your data. That funhouse aspect lets you take your WordPress data and get a different view of it, one that lets you search across it but also run analytics and other kinds of queries on it.
A number of folks across WordPress have found Elasticsearch useful. Several are offering hosted WordPress plus Elasticsearch, and that seems to be growing. I've worked on WordPress.com VIP, where we've been doing this for over two years now, and Pagely and WP Engine are both working on it. There are other cases too: I think I saw a tweet yesterday from Cloudways, which I don't know much about, promoting Elasticsearch with WordPress. There are also a number of agencies out there.
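To make the mirroring concrete, here is a minimal Python sketch of the bookkeeping it involves, assuming a simplified in-memory map from term IDs to post IDs as a hypothetical stand-in for WordPress's term relationships. It works out which posts are stale after an edit and builds the action lines for an Elasticsearch bulk request. The `<blog_id>-<post_id>` document ID format and the `posts` index name are illustrative assumptions, not how any particular plugin does it.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for WordPress's term relationships:
# term ID (tag or category) -> IDs of the posts that carry it.
TermIndex = Dict[int, Set[int]]


def posts_to_reindex(changed_posts: Set[int], changed_terms: Set[int],
                     term_posts: TermIndex) -> List[int]:
    """Return, sorted, the post IDs whose Elasticsearch documents are now stale.

    An edited post is stale itself. An edited term (a renamed tag, say) makes
    every post carrying it stale, because in this sketch each post document
    stores a copy of its terms' names.
    """
    stale = set(changed_posts)
    for term_id in changed_terms:
        stale |= term_posts.get(term_id, set())
    return sorted(stale)


def bulk_actions(blog_id: int, post_ids: List[int], index: str = "posts") -> List[dict]:
    """Build the action line for each stale post in a _bulk request.

    In a real request each action line is followed by the post's document.
    The "<blog_id>-<post_id>" ID is an assumed convention that keeps IDs
    unique when many blogs share one index.
    """
    return [{"index": {"_index": index, "_id": f"{blog_id}-{pid}"}} for pid in post_ids]


# Renaming term 7 (on posts 101 and 102) while also editing post 105:
stale = posts_to_reindex({105}, {7}, {7: {101, 102}, 9: {103}})
print(stale)                      # [101, 102, 105]
print(bulk_actions(1, stale)[0])  # {'index': {'_index': 'posts', '_id': '1-101'}}
```

The term-to-posts fan-out is what makes category and tag edits expensive: one rename can invalidate thousands of documents, which is why the sync is worth batching into bulk requests.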
Alley Interactive and 10up have written, and open sourced, a lot of code for integrating Elasticsearch with WordPress, and I think a number of other agencies are also working in this area. So many of us are out there looking at data and at how we can use Elasticsearch to improve WordPress, and there's a bunch of code available: four plugins that I know of, probably a couple more actually, and a couple of libraries. Different people are exploring how to do this integration.
All right, on to our use cases. I've broken this up into six main use cases that I've seen out there; there are certainly more.
The first one, of course, is site search. I think everybody in the room probably has complaints about the built-in WordPress site search. People complain about it pretty regularly; someone complained to me just a few minutes ago. And yet lots of smart people have tried to work on it. The real problem is that MySQL is not built for it: it's hard to scale, and it's hard to get really relevant results. So we need to move to some other technology, and I think Elasticsearch really enables us to do that.
I've seen a number of people customize site search this way. This is the first site launched on WordPress.com VIP with Elasticsearch support: the Kaiser Family Foundation site, built by Alley Interactive. In addition to really relevant search results, which they've been really happy with, they've also done things like faceted search.
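Faceted search is usually built from two parts of a single request: filters that narrow the results, and aggregations that count how many results fall into each facet. The sketch below builds such a request body as a plain Python dict. The field names are hypothetical and depend on your mapping, and it assumes the current `bool`/`filter` syntax and the `calendar_interval` parameter of Elasticsearch 7.x and later (releases from around the time of this talk used `interval` instead).

```python
import json
from typing import List, Optional


def faceted_search(text: str, content_type: Optional[str] = None,
                   published_after: Optional[str] = None) -> dict:
    """Build a request body combining full-text search, filters, and facets.

    Field names (post_title, post_content, content_type, post_date) are
    hypothetical; they depend on how your documents were mapped.
    """
    filters: List[dict] = []
    if content_type:
        # Filter clauses are not scored and are eligible for caching.
        filters.append({"term": {"content_type": content_type}})
    if published_after:
        filters.append({"range": {"post_date": {"gte": published_after}}})

    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": text,
                                          "fields": ["post_title^2", "post_content"]}}],
                "filter": filters,
            }
        },
        # Aggregations return the counts shown next to each facet.
        "aggs": {
            "by_type": {"terms": {"field": "content_type"}},
            "by_year": {"date_histogram": {"field": "post_date",
                                           "calendar_interval": "year"}},
        },
    }


# For example: only slide decks matching "health reform", published since 2014.
print(json.dumps(faceted_search("health reform", "slides", "2014-01-01"), indent=2))
```

Keeping the narrowing conditions in `filter` rather than `must` is the design choice that matters here: they decide which documents match without affecting relevance scores.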
The idea is like Amazon: when you search for something, you can then filter down by, say, what company made it. In this case, you can take the 120-ish results and filter down to just the ones that have slides. This kind of filtering is something Elasticsearch is great at. You can also see them doing date filtering, and using these filters to speed up your search and enable new features for users. Filters in Elasticsearch are extremely performant and well cached, and they work really well for building a lot of different features.
I also want to highlight the multilingual aspect. This is, I'm told, the top recipe site in Turkey. They've recently been building out a lot of Elasticsearch infrastructure; the developers there are all native Turkish speakers, and they're finding that Elasticsearch works really well for them. As an English-speaking audience, we often don't think about how hard these things can be. I think Elasticsearch, out of the box, gives us and the community ways to build on these sorts of things.
So what's missing? Site search is something I could talk about for days. There are a lot of things that aren't great about it, and it's hard to fit all of that into this talk. But the main thing I want you to take away is that search is not just about posts; search is about providing answers. There are a lot of different features that can get you there, and some of them are listed here. If you're only indexing posts, you're limiting the information you can give someone. Comments matter, widgets matter; everything on the site matters. Even things that aren't on the site matter.
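One way to act on "search is about providing answers" is to index more than posts and search all of it at once. This sketch assumes a hypothetical single index in which posts, comments, widget text, and hand-written answers (such as the site's contact details) share `title`, `body`, and `doc_type` fields. Setting `fuzziness` to `"AUTO"` on the `multi_match` query also makes it tolerate small misspellings, which is one common, though partial, way to handle the spelling errors the talk turns to next.

```python
def site_search(text: str, size: int = 10) -> dict:
    """Search every kind of site content held in one (hypothetical) index.

    Assumes posts, comments, widget text, and hand-written answers such as the
    site's contact details are all indexed with title, body, and doc_type fields.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "body"],  # title matches weighted three times as heavily
                # "AUTO": exact match for 1-2 character terms, one edit for
                # 3-5 characters, two edits for longer terms.
                "fuzziness": "AUTO",
            }
        },
        # Lets the results page say where each answer came from.
        "aggs": {"by_doc_type": {"terms": {"field": "doc_type"}}},
    }


# A misspelled query that is meant to still reach the contact-details document.
query = site_search("contcat information")
```

Fuzziness only covers typos; it does nothing for the vocabulary mismatch between how a site describes itself and how visitors ask, which is why indexing answers that aren't literally on the site matters.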
If someone types "contact information" into a search box, it should return information about how to get in touch with the people behind the website, even if that information isn't displayed anywhere on the site. Just because those individual words don't appear on the site doesn't mean you shouldn't be able to answer the question.
To get to the point where we're doing this sort of thing, I think the key piece is looking at data. These are spelling errors on support.wordpress.com: 50% of all queries there contain a spelling error. That's shocking and not what you'd expect, and it means 50% of users have a bad experience if you don't correct for their spelling. You can look at these and know exactly what word they meant, but if your search isn't handling it, you're failing those users.
All right, the second use case is related posts. I spent a good chunk of a year working on Jetpack and WordPress.com related posts with some other folks at Automattic. It's a major feature for pretty much any modern website: when you get someone onto your site, you want to keep them there, and related posts are a great way to do that. We see tons of queries. You probably can't see this very well, but the top line is the number of related-posts queries for WordPress.com and the bottom line is for Jetpack, and we're in the 70, 80, 90 million range per day. So it's a very common use case that lots of people use.
From a technical point of view, I'd say providing related posts is relatively straightforward, but providing good relevancy takes more than just finding what is related. So here's an Elasticsearch query; this is the only real code I have in this talk. The thing to point out is that this query is for a post that has only three words in it: "Trinity College Library." It's an image post, so there's not much context to use for finding other posts. So we run this query, take the top 50 results, and re-rank them. The re-ranking uses a feature called rescore, which is at the bottom and might be hard to see. The idea is that we take the liker IDs and commenter IDs, the people who have interacted with this post, and look for overlap with the people who have interacted with other posts. That gives us more information about what might be related, and more information about what might be relevant to a user: if people like one thing, they may be more likely to go somewhere else and like additional things. When we deployed this change on WordPress.com a few years ago, it increased our click-through rate by about half a percent, which is very significant in a realm where click-through rates are 1 to 3%.
The third use case is, I think, the thing I'm most excited about recently, though I've actually been excited about it for a while. WP_Query is the class that wraps WordPress's queries to the database for getting posts. For instance, any time you're getting all the posts for your home page, that goes through WP_Query to query the database. The idea is that you could take that query and, instead of running it on MySQL, run it against Elasticsearch. There are a couple of places where this happens: the ElasticPress plugin supports replacing some WP_Query calls, and there's a library called ES_WP_Query by Alley Interactive that actually passes the WordPress core unit tests, so you can replace pretty much any WP_Query with ES_WP_Query. And this is really easy to do. So here's an example.
This is a changeset for VIP on WordPress.com, and the commit message is "offload very slow taxonomy NOT IN query to Elasticsearch." Switching that query from MySQL to Elasticsearch takes just one line of code: adding 'es' => true to the get_posts() arguments. This is a very popular way that WordPress.com VIP solves scaling problems. These are all the times it happened in October, plus one from August. I've highlighted the painful words people write into these commit messages, like "expensive," "poor," and "performance." We see this sort of thing all the time; it's a good way to solve problems.
It's also become a very popular query for us. The top line here is ES_WP_Query queries and the bottom line is VIP site search. You can see that from the moment we launched ES_WP_Query, it's been five times more popular than site search. It's even more common than Jetpack related posts; again, the bottom line here is Jetpack related posts. So this is a use case where we can use Elasticsearch to solve real, everyday problems and make WP_Query scale better.
All right, the fourth use case is Logstash. Logstash is an additional open-source project that works with Elasticsearch.
It takes text logs, imports them into Elasticsearch, and then lets you search those logs. I've seen it deployed for things like email logs, PHP logs, and access logs, and all of this helps you run systems and debug them in real time. We recently launched a function called log2logstash on WordPress.com that lets us take any arbitrary set of fields and send it through Logstash into Elasticsearch. Our head of systems recently accused me of using this as a hammer and hitting too many things with it, but I've seen it deployed three or four times in the three weeks since it launched. So it's solving a lot of problems and enabling people to solve problems. Part of the reason it's so popular is that on the back end you can read these logs, create histograms of the data, see what's most frequent, and dig into your data interactively to find problems and fix them. I mentioned WordPress.com here, but I know of many others doing this, including a number of other hosts.
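The logging workflow described above boils down to emitting structured events and then aggregating over them. Here is a sketch under stated assumptions: `log_event` is a hypothetical helper (not the WordPress.com function mentioned in the talk) that writes one JSON object per line, a format a Logstash pipeline can ingest, for example with its `json_lines` codec, and `events_per_hour` builds the `date_histogram` aggregation behind the kind of histogram described. It assumes `source` is mapped as a keyword field and uses the 7.x-and-later `fixed_interval` parameter.

```python
import json
from datetime import datetime, timezone


def log_event(source: str, **fields) -> str:
    """Serialize one event as a single line of JSON (hypothetical helper).

    One JSON object per line is a format a Logstash pipeline can ingest,
    e.g. with the json_lines codec.
    """
    event = {"@timestamp": datetime.now(timezone.utc).isoformat(), "source": source}
    event.update(fields)  # arbitrary extra fields, e.g. blog_id or status
    return json.dumps(event, sort_keys=True)


def events_per_hour(source: str) -> dict:
    """Request body for an hourly histogram of one source's events.

    Assumes 'source' is mapped as a keyword field.
    """
    return {
        "size": 0,  # only the histogram buckets are needed, not the log lines
        "query": {"term": {"source": source}},
        "aggs": {
            "per_hour": {"date_histogram": {"field": "@timestamp",
                                            "fixed_interval": "1h"}}
        },
    }


print(log_event("email", status="bounced", blog_id=42))
```

Because every event carries arbitrary named fields rather than free text, the same histogram can be sliced by any of them (by `status`, for instance) without writing a new log parser.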
I know a number of agencies are working with this sort of thing too, so it's pretty common across WordPress in general and, frankly, across the tech industry.
The fifth use case is content re-ranking. The idea is that a user comes to your site and you know something about them. At the very least they have an IP address, probably some other headers and information about them, or maybe they're logged in and you really know a lot about them. In that case, maybe we shouldn't just show them the default home page with whatever date-based listing it has. If we think we know something about them, let's show them things they're actually going to engage with. Someone who is amazing at this, of course, is Facebook. Their News Feed is, in my opinion, really awesome and impressive: I look at it and I want to click Like on things and comment on things. It's optimized, and they've put a whole lot of effort into it.
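Content re-ranking of this kind maps naturally onto Elasticsearch's `function_score` query, which adjusts the scores of a base query using extra signals. In this sketch, the visitor's interests (tags they have engaged with) and approximate location (for example, estimated from their IP address, which the talk comes back to with geo search) are hypothetical inputs, and `tags`, `location`, and `post_date` are assumed to be a keyword field, a `geo_point` field, and a date field. With no signals, it falls back to the plain date-ordered listing.

```python
from typing import List, Optional, Tuple


def personalized_home(interests: List[str],
                      location: Optional[Tuple[float, float]] = None) -> dict:
    """Home-page query re-ranked for one visitor with function_score.

    interests: tags the visitor has engaged with (hypothetical input).
    location: (lat, lon), e.g. estimated from the visitor's IP address.
    Assumes 'tags' is a keyword field, 'location' a geo_point, 'post_date' a date.
    """
    functions = [
        # Each interest tag a post carries adds 2 to its score.
        {"filter": {"term": {"tags": tag}}, "weight": 2}
        for tag in interests
    ]
    if location is not None:
        lat, lon = location
        # Gaussian decay by distance: 1.0 at the visitor's location,
        # 0.5 (the default decay) for a post 50 km away.
        functions.append({"gauss": {"location": {"origin": {"lat": lat, "lon": lon},
                                                 "scale": "50km"}}})

    if not functions:
        # Nothing known about the visitor: plain date-ordered listing.
        return {"query": {"match_all": {}}, "sort": [{"post_date": "desc"}]}

    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",      # add the signals together
                "boost_mode": "replace",  # ignore match_all's constant score
            }
        },
        # Posts with equal scores fall back to newest first.
        "sort": ["_score", {"post_date": "desc"}],
    }


body = personalized_home(["recipes", "baking"], location=(41.01, 28.98))
```

Summing the signals keeps each one independent and easy to tune; a real deployment would need to test weights like the `2` and `50km` used here against engagement data.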
I think this is something the web in general is probably going to see more of: optimizing for people actually interacting with your site can help your site a lot.
One case where I've seen this deployed is geo search. Elasticsearch supports latitude and longitude in documents, so you can search or rank things based on distance. You can take that idea, look up a user's location based on their IP address, and then show them only things that are local to them. A good example is radio stations: someone goes to a website covering many radio stations across the country, and you want to show them the stations and articles that are local to them and therefore more relevant.
All right, our final use case: breaking the blog boundary. Great for alliteration. The content of a single WordPress blog basically lives in nine MySQL tables. There are a couple of global tables I'm ignoring, but those nine are the key piece. If we have 20 sites, suddenly you have 180 tables, and if you want to find content across 180 tables, that's obviously going to be much less efficient than looking within a single blog. And this goes beyond searching: what if you just want to list those posts by date?
Look at everything you've written across all of those blogs, by date. So instead of putting that into MySQL, let's use our funhouse mirror and throw it all into one big Elasticsearch index. If you want to look at a single blog, you can still do that; you just filter by blog ID. But you can also look at subsets of blogs or the entire data set, filter by user, and so on.
A great example of this that was recently open sourced is Calypso. For those of you who haven't heard of it, Calypso is the WordPress.com logged-in administration interface. It was recently open sourced, and it has two pages, the Posts page and the Pages page, whose endpoints are both powered by Elasticsearch. For instance, I have about a hundred sites on WordPress.com and Jetpack, and I can search across all of them and find all the pages I wrote about Elasticsearch. That's the kind of thing that's really hard to scale on standard WordPress.
So having this funhouse mirror, which gives us features like search and breaking down the boundaries between blogs, really enables us to look at our data differently. It extends to other things too: think about related posts across all of those sites, where you show people related posts from your other sites and get a kind of cross-posting effect.
All right, those are the six main use cases. There are a couple of opportunities in 2016 if you're interested in becoming more involved, learning about Elasticsearch, and contributing to WordPress.org.
The first is that WordPress.tv is getting a major rewrite. This has already been announced on make.wordpress.org/tv, and there are some ideas there about improving related posts. We could also do the geo lookup type of thing: see where a user is coming from and show them videos that are local to where they live. We could look at their browser settings to understand what language they're likely to want content in, and make WordPress.tv that much more engaging and useful. There's a lot of opportunity here: queries need to be written, the theme needs to be integrated, and someone needs to think about how to make this a great user experience with Elasticsearch. Frankly, we need people to start working on it.
Another opportunity in 2016 is the translation of WordPress itself. WordPress is translated with open-source software called GlotPress, and there's been a ticket open for a couple of years with this idea: if you have a string like "Are you sure you want to delete this page?" and it hasn't been translated into a particular language, look at similar strings that have been, like "Are you sure you want to delete this comment?" or "Are you sure you want to delete this user?" Show those to the translator so they can translate the remaining string more quickly. I feel like every State of the Word that Matt gives, translation comes up: how do we get more translations across all languages so that the software really works well for everybody on the planet? I think this is one way we can help enable that. We need people to look at how to build these fuzzy queries and fuzzy matches, and at how we would actually scale and run an Elasticsearch cluster for .org.
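For the GlotPress idea, one simple starting point (an assumption, not a design taken from the existing ticket) is a `match` query with `minimum_should_match`, so that a candidate must share most of its words with the untranslated string rather than all of them. The sketch assumes a hypothetical index with one document per original string, an analyzed `original` field, and a keyword field `translated_locales` listing the locales that already have a translation.

```python
def similar_translated_strings(original: str, locale: str, size: int = 5) -> dict:
    """Find already-translated strings that resemble an untranslated one.

    Hypothetical index: one document per original (English) string, with an
    analyzed 'original' text field and a keyword field 'translated_locales'
    listing the locales that already have a translation.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{
                    "match": {
                        "original": {
                            "query": original,
                            # Most words must overlap, not all of them, so
                            # "...delete this page?" can match "...delete this comment?"
                            "minimum_should_match": "75%",
                        }
                    }
                }],
                # Only suggest strings the translator can reuse in this locale.
                "filter": [{"term": {"translated_locales": locale}}],
            }
        },
    }


suggestions = similar_translated_strings("Are you sure you want to delete this page?", "tr")
```

The untranslated string itself never comes back, because it fails the locale filter. Character-level `fuzziness` could be layered on top; which combination works better would need testing against real GlotPress data.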
And of course there's an elephant in the room, and that elephant's name is WordPress.org search. There's been lots of work on it over the years; about a year and a half ago there was some really good work on plugin and theme search that improved a lot. But I see complaints about it come up a lot in comment threads and elsewhere on the internet. The truth is, I don't think this is going to happen in 2016. To get there, we need more people who know how to use Elasticsearch and deploy these services, or who can learn to. These other two projects, GlotPress and WordPress.tv, are how we get there. That's partly why I'm giving this talk: to encourage people in the WordPress community who are interested in Elasticsearch to contribute to these areas and learn how to do it. Then we can do things like improving forum search for plugins, where right now there isn't even a search.
So that's my talk. I'm happy to answer questions about Elasticsearch now, and I'm also on the .org Slack as gibrown and happy to answer Elasticsearch questions pretty much any time. Thank you.
Hi. Earlier you showed a slide saying 50% of queries have typos. The third and fourth examples are actually correct spellings in French. That leads to my question: Elasticsearch is great for multilingual content in separate silos, but how do you deal with multilingual searches, where a user can type in any language and you have to detect which language it is and provide the proper content?
That's a very interesting use case. On WordPress.com we actually do run language detection, using a plugin called langdetect. It's a plugin for Elasticsearch, which has its own plugin ecosystem, and it detects upwards of 60 different languages. It doesn't work as well on short content, so for a single search query it's a little trickier. But I think your point highlights something important about spelling errors: it's not always clear that something is a spelling error. It could just be a non-native speaker writing it differently, which is another way search ends up failing users. There's a huge area within site search where there's a disconnect between the words you use on your website and the words a user of your website uses. That's a hard problem, because the differences in vocabulary and spelling are interrelated. It's a long-term challenge we need to try to solve, but I think we're quite a way from being able to solve it. Thanks for the question.
As I understand it, Elasticsearch creates a centralized database of information that's then searchable. Is it able to go into the different media library items, like PDF files or text documents, index their content, and return that in the results?
I have seen people do that; it depends on how you index it. Natively, Elasticsearch doesn't necessarily do that. Usually you'd take those media documents, pull out whatever information you want, and index it. One example is indexing media in WordPress itself: even with images, you may want to pull out the metadata and index it, for filtering or to help people find things. There are even cool examples like this: Google recently released some APIs that take an image and describe it in natural language. So you could imagine running your images through that sort of API and indexing the result, so that you have more information about what's in the images on your site. That's far off, but theoretically possible.
Hi, my name is Taylor Lovett. Just to add to that: there's an Elasticsearch plugin right now, I think it's called the mapper attachments plugin, that lets you index PDFs and some other content types.
Cool. You're at 10up, right?
Yeah, we're the ElasticPress people.
Thank you very much for your code; it's really awesome.
Thank you.
And is it only PDFs?
It's a bunch of content types. And we actually have something for indexing that I think we're about to open source. It's not quite ready yet, but check back soon.
Awesome. Any more questions or commentary?
How do you handle mapping the IDs, and how do you handle updates in Elasticsearch?
Mapping of IDs? You mean post IDs specifically?
Yes.
We index the same post IDs that we have in the database. Because posts from many blogs live in the same index, we use blog ID and post ID together to uniquely identify any individual post. When we run our queries, we take the blog ID and post ID from the results and then look up the final content in MySQL, because we don't necessarily trust that the Elasticsearch mirror has the final content. We've done that pretty much from the beginning, to ensure that what we deliver to the user is the real, final content. Did that cover it? You also asked about updates?
Or keeping things up to date. In 1.5, Elasticsearch took out the ID mapping, so you actually can't control the IDs.
I see what you're asking. Hmm. Let's take it offline and talk about it.
I'm not a sysadmin, and spinning up an Elasticsearch instance would be very challenging for me. While I wait for my hosting provider to get Elasticsearch instances running, is there a third-party Elasticsearch service that I can subscribe to or pay for so I can use Elasticsearch?
Yes, there are a number of services that run Elasticsearch at this point. The main company behind Elasticsearch, where most of its contributors work, is called Elastic (elastic.co), and they run a hosted SaaS service. A number of others do as well, and a lot of them end up running it on AWS. You pay them to manage the Elasticsearch indices for you, so that is one way to go.
Could you describe how you control access to the data in the Elasticsearch service?
Yes. Elasticsearch by default does not have any security built in.
There are things you can purchase from elastic.co for security. But at WordPress.com, which is the real case I know about, everything is internal and hidden behind our APIs, so all of our access checks go through some other system, many of them through WordPress.com itself, before the query goes on to Elasticsearch. It's something you have to pay attention to. One really non-obvious thing: if you deploy it on your own VPS, you need to allow only localhost access, or only traffic from specific places, because by default Elasticsearch allows traffic to come from anywhere, and if you leave it that way you will absolutely get hacked.
Anything else? Awesome. Thank you very much.