than three of you. I have given this talk in a room of three people. So hopefully, no, they had a lot of trouble. But we got to work back and forth and figure it out. Anyway, so you're here to learn Elasticsearch. I've got 50 minutes, I believe. So we'll see if I can fit an hour-long talk within 50 minutes.

Hi. My name is John Berryman. Just to do a quick introduction to myself, find me on Twitter. A lot of my self-worth is derived from my Twitter followers. So growing up, I was a pretty nerdy kid. I started reading programming manuals when I was in the first grade. I ended up getting into aerospace engineering. That was my first career. I decided that satellites and all that stuff were pretty cool, but I liked the programming and I liked the math. So after about four years in the field, I moved on and got into search technology. I was a consultant. I wrote a book. That guy right there wouldn't necessarily recommend doing that with your life, but it's a good calling card. And now I work at Eventbrite. I am a discovery engineer, so search and recommendations and stuff like that.

So to give you a little preview of what we're going to talk about: this is not really an advertisement for Elasticsearch, but a lot of what we're doing involves my mental model for thinking through the Eventbrite problem. So just to give you a little shared background. Historically, the company I work at, Eventbrite, has been a very organizer-focused startup. We allow organizers who want to put on their own events to come to our website. You can build a nice little web page with little effort expended. You can sell tickets. We take care of all the credit card mess. You have a platform for messaging attendees. And we gather metrics, so after the event is done, you get to look back and make sure that your next events are as good as or better than this event. But after years of actually nailing down this side of the market pretty well, my company realized that, look, we've got all this inventory.
We're basically white-labeled, but everyone is plastering their events on our website. If we can turn around and sell our inventory to everyone else, then organizers are happy. The customers are happy because they can find something to do over the weekend. And we're hoping to generate the so-called flywheel effect. And this is exciting for me because this is where I belong. Creating the marketplace is all about building search and browsing and recommendation features for Eventbrite. And of course, this technology is based on Elasticsearch. But we're talking about it today. Can you guys keep a secret?

So I know we're supposed to talk about Elasticsearch today. But I've got to tell you, I'm actually more interested in talking about my new startup, EventDark. Yep. So don't tell anybody, but I am going to start directly competing against Eventbrite. And our guiding principles, and I'm sorry to do this to you, I know we're supposed to talk about Elasticsearch, but Elasticsearch is hard. So I'm going to focus this new startup, and you guys can join me if you'd like, on MySQL, because everyone knows how databases work. Databases are easy. Let's just build on a tried and true platform, and let's not overthink it. And our specialty, because I found a free data thing online, is cat-related events. We'll start with cat-related events. Good. So we have some attendees already. And then we'll expand to other fields. All right, we have someone who will at least buy a ticket. So we have a marketplace. Excellent.

All right, so building this new website is going to be pretty easy. There's not really too much to an event. So here's our schema with MySQL. We're going to have ID, an integer, name, description, city, start date. You can look at all that. That makes pretty good, simple sense, right? And my hypothesis, which I know will play out well, is that we can build a website based on this. And I'll demonstrate it. So here is our event search.
SELECT * FROM events. That gives us back all the details we'll need for the website. We have date range search; obviously, you'll need that to find something this weekend. We have geo search. Not hard. Why invest in all that fancy stuff? We can just do string matching. And finally, it's easy to search for events that you like. So I want to find an event WHERE name = 'cat'. The results: nothing. Oh, so this is the interactive part. Why do you think there might not be any results for that particular MySQL query? Oh yeah, okay, so that's a little problem. They could spell cat with misspellings. You guys are overloading my brain, but I think we can still make this work out. Now, you guys can't spoil all my slides before I get to them, all right? So the particular problem here, to the first answer, is that probably no one's going to name their event "cat." Would you like to come to see "cat"?

So, okay, MySQL solves this for us. We can use a LIKE query: LIKE '%cat%'. And the results come back: Teach Your Cat to Knit, An Evening of Cat Bowling, and BYOC Cat Dance Party. We're on board. So, okay, that was just a silly thing, just to show you that we can probably accomplish this. Let's get more serious with a more serious query. Someone's likely looking for a cat farming seminar. So we're gonna help them, what? Not in a bad way. That might have a particular meaning to you that it doesn't to most of my audiences. No, not that event. So anyways, how do we search for this? If someone comes to our website and they look for cat farming seminar: SELECT * FROM events WHERE name LIKE '%cat farming seminar%'. The result, well, it's in red; that's the thing I would like to match, but it doesn't match. Interactive time: what have I done wrong now? Case, that's right. MySQL is all uppity about case, so this is also not hard.
All we have to do is lowercase whatever the people type to us, and it'll still work. cat farming seminar, so okay, great, that matches. But Seminar for Farming of Cats? Not such a match. Anyone have any ideas how I can deal with this one? Cats OR farming? Well, let's try it. Yeah, okay, so let's do something like this. Good idea, good idea. Well, it's starting to itch me a little bit, because I heard that LIKE is not as efficient a query as a pure match, but surely that's fine, right? And we're doing it three times, so it's kind of like scanning every document in the database three times, right? But we'll probably shard it and the scale will be fine, I'm sure. So anyways, we do indeed match Seminar for Farming of Cats, but we don't yet match Making a Cat Farm: The Seminar. And now you're totally in my head because I didn't realize that this was a potentially derogatory thing. Making a Cat Farm: The Seminar. So why does that one not match? Farming and farm, well, they're the same thing, right? Yeah, so search technologies do a pretty good job of understanding language, and I guess we'll have to cut off the ends of the words. So farming becomes farm, at least, and that'll match farmer, farms, that'll match other stuff. And we indeed do get back the results we want.

I'm starting to poke some holes in my little theory here, though. This is an old presentation. Are you telling me I should retire my presentation after this time? Oh, yes, you're right. Okay, so I should have updated the dates on my examples on my slide. For Mr. Michael Handler in the front row. So, next one: Cat Farm Class. Doesn't match either. It's a class. It's kind of like a little mini seminar. In order to make that work, what am I gonna have to do for that one? Oh, okay, okay. It doesn't match all the terms, but at least if it matches a couple of them, that should be good enough, right?
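As a sketch, this OR-of-LIKEs approach can be emulated in plain Python. The event names and the hand-chopped query terms below are made up for illustration; in MySQL this would be `lower(name) LIKE '%cat%' OR lower(name) LIKE '%farm%' OR ...`:

```python
# Toy event names standing in for rows in the events table.
events = [
    "Teach Your Cat to Knit",
    "An Evening of Cat Bowling",
    "Seminar for Farming of Cats",
    "Making a Cat Farm: The Seminar",
    "Dog Grooming Basics",
]

def like_any(name, terms):
    """Emulates: lower(name) LIKE '%t1%' OR lower(name) LIKE '%t2%' OR ..."""
    lowered = name.lower()
    return any(term in lowered for term in terms)

# Chopping the ends off the words by hand ("farm" for farming/farms)
# stands in for what a real stemmer does automatically.
query_terms = ["cat", "farm", "seminar"]
matches = [e for e in events if like_any(e, query_terms)]
```

This matches everything we want, but it matches anything with any one term, and there is no score saying which match is better, which is exactly the problem the next slide runs into.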
So I replaced my ANDs with ORs, per someone's suggestion earlier. And what happens? I do indeed match everything I want. And I match all these things I don't want. And since there's no notion of which match is better than another match, stuff that has nothing to do with cat events goes right up to the top alongside the cat events. So guys, I think we're sunk.

I apologize for taking you through this startup with me, but databases are very good at some things, and search engines and search technology are very good at a different set of things. In particular, search engines are quite good at finding documents that not only match exactly what you have, but contain specific tokens, phrases of tokens, and different mutations of those tokens. They understand English in a way that I think you'll understand when you leave here. Scoring and sorting of documents: MySQL finds the set that matches, whereas with Elasticsearch, as we'll see in a little bit, you can put into it an understanding of how good or bad a match is for particular search terms. And finally, this is something that both MySQL and Elasticsearch are good at, but it's become an interesting, more recent use case with search technologies: search engines are actually really good for filtering, grouping, and aggregating data. So search engines came out of the information retrieval field, but they're being used more and more for log analytics and stuff like that. And we'll touch on that right at the end.

All right, so now that we've failed, let's go ahead and get back to the main talk that you guys came here for. We're gonna teach you about Elasticsearch, and in the next 30 minutes, we'll do a really quick and dirty application. I'll show you how to pull down Elasticsearch, create an index, index stuff, and retrieve it. We'll take a peek under the hood so that you can see the data structures and algorithms in place.
Fortunately, the data model for Elasticsearch is simple enough that you can leave with a basic understanding of it. And as I promised, we'll get into some of the data aggregation stuff that Elasticsearch has been used for more recently, and then we'll hopefully have a little time for questions. What in particular I want you guys to get out of this is a couple of meta goals. One, I want you to see me using a very basic implementation of Elasticsearch, and I want it to be approachable for you guys, so it's a tool on your shelf that you can grab and learn more about when you need it. The second thing, and I encourage you to do this with any data store technology that you want to use, is I want to impart an intuition about how these data structures work, what they're good at, and a little bit about what they're not good at. This means that when you reach for the shelf to get your tool, you actually get the right tool.

So building a basic search app is not that hard. There's a lot of tuning that comes with Elasticsearch, getting the behavior and the notion of relevance just right. But getting the thing out of the box and turning it on will actually get you about 50% of the way there. So it's a real quick technology to get up and running with good results. Installing and running Elasticsearch is pretty easy; you all probably know what wget is. So find your favorite mirror and pull down Elasticsearch. In this case, I do need to update my notes here; it's a little bit older version of Elasticsearch. But pull it down, unzip it to wherever you want it to live, cd into that directory, and then start the binary, bin/elasticsearch. Once you do that, you can just curl localhost at the Elasticsearch port, 9200, and it tells you, hey, you know, for search. In case you forgot that it was for search. But Elasticsearch is now up and running.
And just like with MySQL, with Elasticsearch you will want to think in advance about the type of data that you're going to be interacting with and build a schema for it, or as they say in Elasticsearch, a mapping. Now, Elasticsearch is interesting here because early on they advertised that they were a schema-less data store. In the age when MongoDB was rocketing off, everyone was kind of tacking on to this. And it was true to an extent that you could just start dumping information into Elasticsearch, and that gained Elasticsearch a lot of popularity, but it's still kind of an anti-pattern. So in my opinion, after years of using this technology, it's still very important to think through what you're getting ready to do with this thing.

Setting up the mapping is simple. Everything in Elasticsearch is a JSON interface. And this being a Python conference, every example that you'll see here uses the Python client, which is really nice: it's a fairly thin layer over the JSON interface of Elasticsearch. So when you're setting up the schema, all you have to do is specify the fields that you're going to have, in this case ID, name, description, city, start date, price, and you get all of the types that you would typically expect in a data store. So you have numbers, integers, floats, strings, dates. And you can get more complex things, like locations that are a little bit more aware than just two numbers; it knows what a location is. But one thing I'll be focusing on is that not only can you have strings, but you can say that your strings are special in some way. For example, an ID is a type of string, but it is a string that is not analyzed. That means that we're not going to do any special massaging or try to understand it as natural language. However, both the name and the description here, I've marked as having an analyzer that is English.
So this is me giving Elasticsearch a hint that not only is this blob of bytes actually text, but it's English text. And I'll show you what that means to Elasticsearch in a little bit, but it's interesting because you don't have to put English here; you can put Chinese or Japanese or most any language that you'd want, and you can make up your own stuff. There are extra rules you can put in, like if you have camelCase strings because you're indexing programming languages, you can break them up and make your own analysis chain. And then of course, here's me using the client: you create the events index with that mapping structure.

Okay, so we have an index set up, ready to receive events. Actually adding the events at that point is pretty simple. You have an array of events, and it's just JSON blobs again. The client is nice because you can use datetimes and it does the right thing. And the simplest version is just an iterator: for every doc that you have, dump it into Elasticsearch. This does make an HTTP request for every doc, so there are bulk methods for when you actually want to put this into production. But that's an easy way to get up and running.

Okay, so now we've got a bunch of documents in the index; the next bit is to pull stuff out of it. And the easiest way to explain this, oh yeah, sorry for the microscopic text. How horrible is that for people in the back? I'll just speak louder. So the simplest building block for pulling stuff back is this match_all query. And it does exactly what you think. It's effectively the SELECT * FROM the events table. It gets everything back in the order that you indexed it in. You don't have to understand everything on the screen here, I'll provide these notes on my Twitter account later, but it gives you back what you'd expect. It tells you how much time the query took.
It tells you if there are any errors, and importantly, it gives you all the hits back: all the documents that match the query, sorted by how well they match the query. In the case of match_all, there's no notion of relevance, so you just get them back in the order that you indexed them.

All right, so that was the hello world of making a query, but there are a lot of different things you can do to craft the notion of relevance. What is an important document? What should match? What should not? And the smallest building block for these is the so-called term query. So if we have an indexed document, an event in Nashville, and I want to make a filter over all the documents and only hit documents corresponding to the city Nashville, then that's a term query. I say this is a term, the field is city, the token is Nashville. The special thing about a term query, just like earlier where I said not analyzed, is that term means this is just a token. It has to be exactly Nashville, capital N and all. It doesn't do anything special. And so that's a match.

But where it gets interesting, and where you really get a benefit from a search engine, is when you start incorporating this notion of, hey, this is not just a string, this is actually English text. And so if we have a sort of stupid document here, name equals Filbert Sorting for Fun and Profit, then a query that is not of type term but of type match actually applies that special knowledge about this being English. And so rather than looking for the exact tokens there, it knows that things can be lowercased, we can split on spaces, and sorting and sort should carry basically the same information. And so that's a match. Compare that to how you'd have to do it in MySQL: you would have to make a horrendous query to make that one simple match right there. And it would also perform very poorly, for reasons that I'll get into in a little bit.
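The two request bodies just described look roughly like this in the standard Elasticsearch query DSL. The index and field names follow the toy schema from earlier, and the commented-out client call assumes the older `elasticsearch-py` style with a `body=` argument:

```python
# Term query: exact token lookup; the query side is NOT analyzed,
# so it must match the stored token exactly, capital N and all.
term_query = {
    "query": {
        "term": {"city": "Nashville"}
    }
}

# Match query: the query text is run through the field's analyzer first,
# so case stops mattering and "sorting" can match "sort".
match_query = {
    "query": {
        "match": {"name": "filbert sorting"}
    }
}

# With the Python client these would be sent something like:
#   es.search(index="events", body=term_query)
```

The only structural difference is the query type; all the behavioral difference comes from whether the analyzer gets applied to the query text.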
It gets more and more complicated, because your application has to mix a lot of different ideas together. You can do phrase matching. So not only do we have the notion of matching documents that have these terms, but we want a document that has the terms sorting and Filbert in it, in that order. This is not a match, because the original document had Filbert sorting. However, if we search for filberts, space, sort, that is a match, despite the fact that it's different from the original document. The original document has uppercase and different parts of speech. But think about it as a user looking for something: you don't quite remember the name of the movie, but you're probably gonna type something like this. Getting these types of fuzzy matches is a specialty of search technology. Filbert fun won't match, because Filbert and fun aren't right next to each other in the document. Just more examples of how match_phrase works. But you can add this notion of slop, and everyone chuckles when I say that one. That's what it's called. You can add slop, and it'll find any document that has these two words within a distance of two.

And you can go nuts with this. I once had a gig with the US patent office, and moving off the search technology they were getting rid of to Elasticsearch/Solr, they really wanted to know: I want to find this word within the same sentence as some other word, and I want to find it before it, or within some number of words. And so you can take this same behavior, overload it, and get some really complex search behavior. But everything I've shown you to this point is atomic. It's like, I want this thing or that thing. You have to have a way of gluing these things together. In Elasticsearch, that is a Boolean query. In normal notions of Boolean queries, you think ANDs and ORs and NOTs. Elasticsearch has that, but using different terminology.
Rather than AND, we say must; rather than OR, we say should; and NOT is must_not. So that one makes pretty good sense. And if you play around with a few queries, you see why they moved to this terminology. Usually you have an array of things that must match. So in your Elasticsearch query, you have a must key, and you stick all these subclauses that must match there. And additionally, you have several things that don't have to match but should match: if you can find documents that also happen to have these other things, they should boost a little bit higher. So that's yet another array of things that, if they match, give you a better score. For each one of these pieces, you also have the ability to adjust weights. So we're starting to get into a notion of how search understands what's important to your customers and to your business. You can not only match documents that match users' queries, but you can also boost documents that we need to sell quickly because they're expiring inventory or something like that.

And that leads us to our next big topic: search relevance. I'm curious, how many people here have heard of the notion of TF-IDF? Okay, only this half of the room, that's interesting. You guys should have mixed in a little bit more. It's not a hard concept. I think it's intimidating at first, but I can break it down pretty easily. This will be a little bit of a mathy slide, but not too bad. First off, TF just means term frequency, and I'll get into that. And IDF means inverse document frequency. And rather than giving you the Webster's definition, the best way of explaining this is through an example. Let's say a user comes to your website and makes a search for "the diddle." Now that seems odd until you realize that one of the matching documents in your index is "hey diddle diddle, the cat and the fiddle." That's actually a pretty good match for it.
So let's do a little practice round and see how this document would be scored from the search engine's perspective. Term frequency is simply the number of times a term occurs in a document. So the TF of the in this case is two; the occurs twice. Similarly, just by coincidence, diddle also occurs twice. So TF for both of those guys is two. So far so good? Inverse document frequency. Sometimes I just wish they'd called it document frequency and put a one over it. The document frequency is how many times the term occurs, not in this document, but across the entire set of documents. So document frequency for the: pretty high. So the inverse document frequency for the is just about zero. Makes sense? And the document frequency for diddle, not a very common word, is low; it only occurs in seven documents. So it's actually very important, and it gets an inverse document frequency score of one over seven, which is a lot, lot, lot higher than zero.

So when you finally figure out the total score of this document against this query, you put all those pieces together. The score is the TF-IDF score for the plus the TF-IDF score for diddle. And that probably makes sense, but just to be a little bit redundant: TF of the is two, IDF of the is zero, so that term goes away. TF of diddle is two, IDF of diddle is one seventh, so that term is two sevenths. And so you get the final result of 0.2857. The idea is that every matching document is going to go through this same process and be sorted. And so the way that you craft your query informs the way this math works out; you might have about 10,000 matches, but you wanna make sure the math does the right thing so that the top 10 search results are what the user wants.

Okay, yep, so that was a pretty overloading slide. I always like to take a break after heavy slides like that, and I think a play break is really therapeutic. And in particular, I think that this, this is my favorite one. That's great.
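The slide's arithmetic can be checked in a few lines of Python. This is a toy sketch: the IDF values are simply the ones stated on the slide, and real Lucene scoring adds normalizations on top of this (modern versions use BM25):

```python
# The document being scored, tokenized naively on whitespace.
doc = "hey diddle diddle the cat and the fiddle".lower().split()

def tf(term, doc_tokens):
    """Term frequency: how many times the term occurs in this one document."""
    return doc_tokens.count(term)

# Inverse document frequency as stated on the slide: "the" occurs in
# essentially every document (IDF ~ 0), "diddle" in only seven (IDF = 1/7).
idf = {"the": 0.0, "diddle": 1 / 7}

query = ["the", "diddle"]
score = sum(tf(t, doc) * idf[t] for t in query)
# score = 2*0 + 2*(1/7) = 2/7 ≈ 0.2857
```

Note how the noise word contributes nothing: a huge TF for "the" still gets multiplied by an IDF of zero.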
We're gonna watch that one more time. I love this part of this talk. Okay, so that was a good break. So to this point, how much time have I got left, by the way? At this point, we've done a lot to get you in the mind space of how search works from a mechanical perspective: how to dump stuff in, how to pull stuff out, what it can do as compared to other data stores like MySQL that I was picking on. The next thing we wanna do is dive inside the data store and give you a little intuition about how the pieces inside work. And what you'll find is that it's not that complicated. So after this section, you'll have a little better understanding about when it's right to use Elasticsearch and when it's not.

With any data store, there are two main chunks that you have to understand: how you get data in and how you get data out. So that's the outline for the next bit. The first step of getting data into Elasticsearch is a step called analysis. Basically, we're gonna take a document, and in this case I've got just one field out of a document, and I will show you how it effectively gets shredded and rearranged and shoved into the data structures that make search technology so fast. Our example sentence in this case is "the conspirators conspire conspicuously." I chose it so that I could almost not pronounce it at a conference.

Tokenization, that's the first step. In this case, we have told Elasticsearch, hey, this is English, and that gives us some interesting things that we can play off of. We know that English is split on whitespace and also punctuation; we can basically throw out punctuation. An interesting side note that I always like to make here is that this is not true of a lot of languages on the other half of the earth, right?
So, like, my wife is Japanese, and in Japanese there are places where you can have symbols right next to each other that are different words, and to do the same splitting, which you still have to do, you have to have a really complex algorithm to know where the best place is to split these things to make a logical sentence. So tokenization itself is a fairly deep topic. The next step is actually a fairly shallow topic: lowercasing. Pretty easy, but if someone types in lowercase, you'd better make sure it matches a document that has uppercase letters. Stop words: a lot of the words in English are just noise words. They help us understand where things are placed relative to each other, but they don't really change the content. So we can throw away words like the and is and was and stuff like that. And perhaps my favorite step of analysis is stemming. This is another place where, because we've given Elasticsearch the hint that this is English, it knows some interesting tricks to do. If you want a document with farming to match a query for farms, which is often the case, then that's effectively what stemming accomplishes. You can take a word and, using a statistical technique, effectively chop off, and sometimes modify, the end of the word to make tokens that are easier to match, no matter what the intent of the person searching was.

All right, the next step after analysis is indexing. So our example sentence has turned into these three tokens: conspire, conspire, conspicue. Sounds like Latin. Let's say that this is document one. The secret sauce of Elasticsearch for being so fast is that during the indexing process, it takes these sentences, turns them into a bunch of tokens, and then it effectively transposes that. So instead of "document one has these tokens," at the end of the analysis, when you've gone through all of your documents, you say "these tokens have these documents."
So document one had these tokens, but in the end, conspire appeared in document one as well as these two other documents. Conspicue appeared in document one as well as these three other documents. And so effectively, from a Python point of view, you could implement this with a dictionary where the keys are tokens and the values are arrays of IDs. Now under the hood, this is actually implemented in Java, and they do a lot of sneaky stuff. They shim extra information into the keys. So the notion of document frequency, which we use for scoring, gets shoved over into the keys when you look stuff up. And the notion of term frequency, the other half of TF-IDF, is basically hidden in the values on the right, as well as other information like the positions of the words in the documents, so you can do phrase matches and stuff like that. But effectively, a simple search engine is just a Python dictionary like that.

All right, so we have now gotten all the information into the index. The other half of the equation is getting information out of the index. So our inverted index looks like this, and let's make this interactive: given that data structure, what's the easiest way to find all documents containing conspicuous and aardvarks? Anyone? Yep, that's all you have to do. These are lists, but they might as well be sets or iterators, and you find whichever IDs occur in both. An OR is just a set union, and you can build arbitrarily complex things on the same idea: a more complicated search might be a set union followed by a set intersection. Pretty easy. But that's only half the puzzle, because MySQL is also really good at finding documents that match. I just showed you how Elasticsearch finds documents that match efficiently, but Elasticsearch then has to turn around and do a sorting algorithm, and that is an important part of search.
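The whole pipeline just described, analysis, the token-to-documents dictionary, and set intersection at query time, fits in a screenful of Python. This is a toy sketch: the stop-word list and the suffix-chopping "stemmer" are crude stand-ins for Elasticsearch's real English analyzer.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "of", "and"}

def stem(token):
    """Crude suffix-chopper standing in for a real stemmer like Porter's."""
    for suffix in ("ously", "ators", "ing", "ers", "s", "e"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Tokenize, lowercase, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

docs = {
    1: "The conspirators conspire conspicuously.",
    2: "Conspire against the aardvarks!",
    3: "Aardvarks farm conspicuously.",
}

# Indexing: transpose "doc has tokens" into "token has docs".
inverted = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        inverted.setdefault(token, set()).add(doc_id)

def search_and(*terms):
    """AND query: intersect the posting sets of each analyzed term."""
    postings = [inverted.get(t, set()) for term in terms for t in analyze(term)]
    return set.intersection(*postings) if postings else set()
```

An OR query is the same sketch with `set.union`, and the "sneaky extra information" mentioned above would hang off the keys (document frequency) and values (term frequency, positions) of that one dictionary.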
When Google gives you back the 60,000 results it supposedly says you have for your query, you only see the top 10, and they're usually pretty good. If you scrolled down 50,000 pages, they would probably be less good. So it's important to know how that works. Effectively, what happens is that when your user gives you a query, you have an iterator over all the documents that match. And what you do to find the top 10 is you initialize a priority queue. Do you all know roughly what a priority queue is? We can talk about that. But effectively, for every document that comes through, you take it off of that iterator, you look at all the other secret stuff we've hidden in there and find the score for that document, and then you put the document and its score on your priority queue, and something just iterates, doing that for every single match that exists. The interesting aspect of this priority queue, though, is that it doesn't keep up with every document it ever sees. It's only of length 10, or whatever you tell it to be. So once you're past the top 10 documents and you've got one that scores lower than the documents in the queue, it compares itself to, not even 10, it's order log n or whatever, just a few of the documents, and says: I'm lower than all of these. Never think of me again. And so the whole thing is actually pretty efficient.

Now there's a little side note. This is another intuition that might be important for Elasticsearch. If you're doing some sort of relevance but you also want to return 100% of the documents, think about how you'd implement that. Deep paging is what this is called. If you've got a robot scanning your website for the 10,000th to 10,010th most fun event, then this means that you have to have a priority queue that is 10,010 long, and you sort all the documents in, throw away the first 10,000 of them, and give that chunk back. And guess what happens when the robot carelessly goes to the next page?
It just gets worse and worse and worse. So that's one important intuition to have about search technology. Elasticsearch allows you to turn relevance off if you don't care about it, but if you do, I would recommend not letting anyone get past about 500 results. All right, and then, as Ari said, it returns the highest-priority contents of that queue. That's effectively what we do. Past the top 10, they go away. The data structure is only 10 items long, so it can't hold any more than that. Oh yeah, yeah, that's not a bad idea. I don't know how I would implement that in Elasticsearch. I don't think they make that easy for you, but yeah, that totally checks out.

All right. Okay, so I need a little transition slide here, but effectively that gets us through everything that a search engine was until about three years ago. Search came out of information retrieval, library technology type stuff: finding whatever I wanted to find. But Elasticsearch has started to prove the point really strongly that the same data structures that serve search results are actually really good for online analysis, log parsing, stuff like that, and a big chunk of that is its ability to do aggregations. And I think I can convince you that it's basically what we were doing before, with just one extra step, and you get this nice ability to do aggregations almost for free. So just like before, say we're aggregating because we wanna find a histogram of the ticket prices or something like that. We have all the results that we had from before. We do the sorting like we did before, but while we've still got each document in hand, we push it through an aggregator. It's basically just a little in-memory thing that says, okay, how many documents have I seen from $10 to $20? And it just increments those counters. It does this for every document, and at the end of it, you pass back this aggregator thing and you have these really nice results.
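Stepping back to the sorting side for a second, the bounded priority queue described a moment ago can be sketched with Python's `heapq`. This is a toy sketch; real Lucene collectors are fancier, but the idea is the same:

```python
import heapq

def top_k(scored_docs, k=10):
    """Stream (doc_id, score) pairs, keep only the best k in memory."""
    heap = []  # min-heap of (score, doc_id); the worst survivor sits on top
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Better than the current worst: evict it, never think of it again.
            heapq.heapreplace(heap, (score, doc_id))
    return [doc_id for _, doc_id in sorted(heap, reverse=True)]
```

The deep-paging pain falls out of this directly: results 10,000 through 10,010 force the queue to be 10,010 long, sorting everything just to throw the first 10,000 away.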
And it was just something that you did almost as a byproduct of the actual search itself. So with the building blocks that I've given you right now, you can see how we have the ability to easily filter, which is just what a search is. You can group stuff, because as the documents are coming through you can already figure out which group each one belongs to. And within each group you can do calculations, running averages or anything like that. So to give you a little more intuition about how you might use aggregations, here is how I encountered them for the first time. Let's say you go to Amazon. You're chuckling. Have you seen my... that top book, by the way, is a really excellent book. So anyway, if you go to e-commerce sites you see a lot of the original use for aggregations. They were called facets: faceted search. You have a list of subcategories on the side. You have the counts for how many things are in each category, you can click on one, and it serves as a filter. It gives you a little bit of what I call relevance feedback, so you can understand what's actually happening. But people have taken the same data structure, turned it on its side, and you've got really nice histograms, which at Eventbrite we're making prettier now, but you can use them to feed back good information about how many tickets are sold from a particular class. You can take exactly the same information but a different data set and give spark charts for how many tickets were sold on a particular day. And you can take, again, counts over buckets and plot them on a map, and you've got a really nice geo information console to give you intuition about where things are happening in a geospatial relationship. And finally, I don't know exactly how to make a picture for it, but analytics, log analytics in particular, are great with Elasticsearch. Building aggregations in Elasticsearch is easy.
I'm gonna kind of fly through this so I have time for a couple of questions, but effectively all you have to do is take your normal query, keep asking it like normal, but add a new section to your Elasticsearch query called aggs. And in this particular case it's gonna be hard to read, so I'll blur over it, but you can say things like: for my aggregations, I want you to do counts grouped by city. So that's a terms aggregation where the field is city. And I also want you to do a histogram aggregation for the prices with an interval of 10. So that's the second thing. The results come back and you have the normal search results at the very top, but you have a new section that has these aggregations in it. In this case I've got the city buckets right there with my Nashville and Dallas and BFE events. And I've got my price buckets for the distribution of events that occurred. But there's a neat thing that you can do; I really needed a graphic for this. Right now I've got two separate aggregations. A neat thing for us to provide back to our users is not only the histogram of all the events, but a histogram per city. And you can do this with Elasticsearch. Aggregations can be arbitrarily nested. There are performance issues after some point, but I can say: at the top level, do a terms aggregation so we bucket everything by city and I get the counts back. And then within that aggregation, do a histogram, so that we can show our users, here's the price distribution within the city that you're interested in. The results come back in a very similar structure, except it's appropriately nested, so that for each city bucket you have the count, and within that you have sub-buckets for the histogram, so that you can draw it on the screen. That's effectively it. I've been doing this a while, so I still have a lot of things to learn, but also a lot of other things that I would enjoy talking about.
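The request shape I'm describing looks roughly like this. The field names (`city`, `price`, `description`) and aggregation names (`by_city`, `price_histogram`) are illustrative placeholders, not Eventbrite's real schema; the structure (a normal `query` plus an `aggs` section, with one aggregation nested inside another) is the part to take away.

```python
# Sketch of an Elasticsearch request body: a terms aggregation on city,
# with a price histogram nested inside each city bucket.
request_body = {
    "query": {"match": {"description": "concert"}},
    "aggs": {
        "by_city": {
            "terms": {"field": "city"},
            "aggs": {  # nested: a histogram computed per city bucket
                "price_histogram": {
                    "histogram": {"field": "price", "interval": 10}
                }
            }
        }
    }
}
```

The response mirrors this nesting: each `by_city` bucket carries its document count plus its own `price_histogram` sub-buckets.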
Also, if you're interested in learning more on your own, I know of some reading material. And find me on Twitter. Tell me what I did right and what I did wrong. Anyway, that's it. What have you guys got? Any questions? So, repeating questions, I guess, right. The question was: we can specify English or not, but how do we deal with unknown terms, different languages, jargon terms, stuff like that? The easy answer is you still just say it's English if it's basically English, and you still get the ability to split on whitespace and all that stuff, because that's presumably where you might come from. I'll go to the extreme in a second. And you still do stemming, which means even if it's a verb the engine hasn't heard before, stemming actually does pretty well for English-like things. But if you're willing to put the work in, you have an arbitrary amount of control over what you can do. So at the other extreme end of things, I mean, I guess you could write your own Java. It's all pluggable. It's just Lucene, Java Lucene. You could write your own classes to do whatever custom logic you want. If you don't want to go quite that far, there are other kind of middle-ground things, like synonyms. As a preprocessing step, before you do the stemming and chop the text up into words, you can say: here's a file of every jargon word you might see, and you can either say, don't touch this for the downstream stuff, or you can say, this maps to three other words, or these three words map to one word. So there's a lot of flexibility in what you can do to tune that relevance notion, but it might be a lot of work. He had a question first. You... yes and no. Part of that is that not only do we hide the term frequency, the counts for each one of those terms as they occurred in the documents, but we also hide a few other small things that we stick next to the tokens.
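The synonym idea is simple enough to sketch as a token filter. This is purely illustrative; in Elasticsearch a synonym filter runs inside the configured analysis chain, driven by a synonym file with entries like `k8s => kubernetes`, and the mapping below is my invented example.

```python
def expand_synonyms(tokens, synonyms):
    """Token-filter sketch: replace jargon tokens with their mapped terms.

    `synonyms` maps one token to a list of replacement tokens, mimicking
    a one-to-many or many-to-one synonym file entry.
    """
    out = []
    for tok in tokens:
        # Unknown tokens pass through untouched for the downstream stuff
        out.extend(synonyms.get(tok, [tok]))
    return out

synonyms = {"k8s": ["kubernetes"], "js": ["javascript"]}
print(expand_synonyms(["deploy", "k8s", "app"], synonyms))
# ['deploy', 'kubernetes', 'app']
```

The key property is that the filter sits before stemming in the pipeline, so the replacement terms get the same downstream treatment as anything the user typed.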
We hide each token's position in the document, which gets to your question about phrases. And there are a couple of other things that aren't used as often, but you can hide part of speech there if you have that set up, and you can hide a payload, which you can do whatever you want with; for example, you can boost documents that have certain words in them a little bit higher. One thing you can't do, though, is take this data structure and reassemble the original document from it. That's why whenever you store a document in Elasticsearch it gets shredded, turned into that, and at the same time you have a different file on disk that holds the original document so you can read it back out. So you're effectively storing it twice every time. A document at Eventbrite is an event, and it has what I call the boring fields that are expected: the name, description, the date, geolocation, which actually gets interesting. But we also have, and this is in progress, interesting fields, machine learning things like an event cluster that we can later match up with a user cluster that comes in, or event quality, which is another thing that we're inferring from the metadata around it. So those are all things that Elasticsearch is happy dealing with, and beyond that there's not too much that's a mind-blowing departure from what I showed here. It's Elasticsearch. It's a JSON record; we do exactly that thing with it: hand it to Elasticsearch and make it searchable. Elasticsearch stores both. As for connector-type things, we rolled our own. When I was in Solr land, the predecessor, sort of, to Elasticsearch, there were a lot of plugins where you could connect them; I forget what they're called, pipeline things, ETL. There are some of those for Elasticsearch, but we ended up just rolling our own because we wanted control over it. So, not a very specific answer. One more.
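To make the position-hiding concrete: here's a toy inverted index that records where each term occurs, which is what makes phrase queries possible (check that two terms sit at adjacent positions). Field and variable names are mine; this is the intuition, not Lucene's on-disk format.

```python
from collections import defaultdict

def build_index(docs):
    """Toy positional inverted index: term -> doc_id -> [positions].

    Note you cannot reassemble the original text from this structure
    alone, which is exactly why the engine also stores the raw
    document separately, paying for it twice.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

docs = {1: "live jazz concert", 2: "jazz brunch"}
idx = build_index(docs)
print(dict(idx["jazz"]))  # {1: [1], 2: [0]}
```

A phrase query for "jazz concert" would then check doc 1: "jazz" at position 1, "concert" at position 2, adjacent, so it matches.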
I think he had his hand up first, but please talk to me later. So, James, really cool question. A really interesting property of Elasticsearch is that it's a write-only index: segments on disk effectively are never touched again. The caveat is that when you actually change a field, what you do is go back, find that record where it used to be written, read out the entire document, change that one field, and write it to a new segment file. The only change you can make to the old file is to mark one bit as dead, a tombstone. So, not great, but it's a trade-off; you get benefits for treating it that way. And it's definitely not a table scan; it's still pretty quick. Cool, so I have exactly zero minutes left. Please come back, talk to me later, and thank you very much for coming.
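The update-as-rewrite dance can be sketched like this. Structures and names are illustrative (real Lucene segments are immutable files with a separate deleted-docs bitmap, not Python dicts), but the sequence of steps is the one described above.

```python
def update_document(segments, tombstones, doc_id, field, value):
    """Sketch of an update in an append-only segment store: find the
    old copy, mark it dead, write the changed document to a new segment.
    """
    for seg_no, seg in enumerate(segments):
        if doc_id in seg and (seg_no, doc_id) not in tombstones:
            new_doc = dict(seg[doc_id])          # read out the entire old document
            new_doc[field] = value               # change the one field
            tombstones.add((seg_no, doc_id))     # old copy: one bit flipped to dead
            segments.append({doc_id: new_doc})   # rewritten into a fresh segment
            return

segments = [{42: {"name": "Jazz Night", "price": 15}}]
tombstones = set()
update_document(segments, tombstones, 42, "price", 20)
print(segments[-1][42]["price"])  # 20
```

Readers simply skip tombstoned entries, so the old segment never needs rewriting, which is the trade-off that keeps writes fast.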