Good afternoon everybody! Everybody still awake? Good, let's see where we can take the revolution/evolution thing. I'll start with the revolution and we'll get to the evolution afterwards.

So, the current state of data stores: maybe some of you know db-engines.com. They have, like the TIOBE index for programming languages, the same thing for data stores, and the current February edition looks something like this: Oracle is still sitting on top, then we have MySQL, we have MongoDB as the number one NoSQL store, and then, following, Redis, Elasticsearch, Cassandra. I'll take you through the story a bit of how Elasticsearch got up to place nine here. That is the revolution part, and for the evolution part we'll do some hands-on demos of what we've been trying to change recently. And you can always discuss this list. I think it's pretty reasonable, except for this one here; I'm never sure why this is on here, especially if you see stuff like "the statement is too complex". I don't think it qualifies as a data store.

Anyway: who uses Elasticsearch? That's a good number, I like to see that. Who is already on version 6? Very nice. Who is still on version 2 or 1? Okay, I'll try to give you some motivation today why you might want to upgrade as well.

Why am I talking about that? I work for Elastic, the company behind Elasticsearch and the other open source products we have. I'm part of our infrastructure team; we do stuff like the Docker containers, internal testing, automation, clouds, and I always say there is a Unix pipe: I pipe that into developer advocacy, so I try to talk about the good stuff that we do.

So, how did it all get started?
Shay, who started the project and is now CEO, started blogging at The Dude Abides, that was his blog, and it's still up, and he has a blog post on how it all got started. In the beginning the product wasn't called Elasticsearch; it had two predecessors which were called Compass. How he got started with Compass was: his wife wanted to become a chef, a cook, and she moved to London, and as a good husband he tried to help her. She had lots of recipes and he wanted to write some system to search her recipes, and he kind of over-engineered that; that's how he got into the full-text search stuff. And she's still waiting for that recipe search, by the way. It's kind of the running joke internally, whether he will ever finish that, but he says at the moment he's a bit too busy. Maybe at some later point somebody might do a recipe search for him; so we can move past that.

He started off with Compass 1 and then he totally rewrote that, calling it Compass 2, and instead of doing another rewrite called Compass 3, he called it Elasticsearch. And everybody knows three is kind of the lucky number, so that one stuck, and that is what is today the most widely used search engine.

That was the initial logo. Our Photoshop skills have slightly improved since then, but this was the initial Elasticsearch logo, and back then Shay did everything: he wrote the code, the documentation, the website, he answered all the questions. At my previous company we always called him the search beast, because he was always doing something about search and he was super productive. Stuff developed pretty well, and you can see this is an early version, and he gave the product the tagline "You know, for search". It's also in the title.
That's kind of the core of what Elasticsearch is about, or at least where it got started. If you're searching anywhere, there's a good chance that Elasticsearch is doing the search for you. If you're searching on any of these sites, behind the search box there's Elasticsearch doing the actual work. We're not responsible for the quality of the actual results; that's an implementation detail then.

Anyway, this worked well, and then people figured out that there is more stuff that is actually a search problem. People figured out: we want to have some visualizations of the data we have in our system. And then people came and said: oh, we want to put logs in there, because handling logs is kind of a search problem as well. We want to collect our logs somewhere, put them in a system, and then be able to search for all the errors or whatever happened in the system. And so Elasticsearch joined forces with Kibana and Logstash, Kibana being the visualization part and Logstash being the part to get data in, and together they formed the famous or infamous ELK Stack. You can see: Elasticsearch, Logstash, Kibana. And yeah, ELK, you get it.

So that was the ELK Stack. That is also working very well and is widely used. Here are just three very common examples of who is using the ELK Stack for log aggregation or security analytics: Mozilla has an open source product around security analytics, or SIEM; Slack is using it internally for all the log aggregation; and Blizzard, if you're playing any Blizzard games, they're also aggregating all the events into the stack somewhere. And that was working well until we added another component.
This is Beats. It's kind of a lightweight agent or shipper or forwarder written in Go, because Logstash was Ruby and is now JRuby and is always kind of heavy, and telling people who are not Java developers to put the JVM on their nodes just to collect logs, that wasn't really what made people happy. And then the ELK had to evolve. It thought about it, and then it said: maybe I'm an ELK B personality. Because, you see: Elasticsearch, Logstash, Kibana, Beats. That's the ELK B, or sometimes we also call it the BELK, and you can see it's the bee with the elk horns.

However, since we're always about scalability, even marketing figured out that this is not very scalable. Because what happens if in the future we add another open source product? Then we probably need to add another letter, and then we need to make up another animal, and it will get harder to make up animals for more and more letters, and then you need to do the rebranding. And I always have the feeling rebranding takes like ten years; a lot of people are still saying ELK Stack, and that's totally okay, we get that. We are just trying to push the name a bit forward, and what we've gotten to now is: we just call it the Elastic Stack. Because that's super scalable: whatever open source product we have, we can just put it into the Elastic Stack and it will still be the Elastic Stack. So that's why we call it the Elastic Stack.

And that was kind of the revolution, so now it's time to get to the evolution. What is going on in the evolution?
Before we dive into the demo, just to make sure that everybody is familiar with a few terms. A cluster is all the different nodes working over the same data; the cluster spreads the data out internally and you can query any node in that cluster. A node is basically one JVM process of Elasticsearch, running as part of that cluster and doing the actual work. Then you have an index: an index is basically a collection of stuff that is similar and belongs together. In the past we have said it's similar to, maybe, a table in the relational world, but that is a very bad comparison that we don't want to make anymore; it's just some things that belong together. Every index is then made up of shards. Shards are just the split-up parts of an index, and a shard is actually Apache Lucene writing the data to disk in the background. Because often people ask what the data store behind it is: there is no other data store involved. Lucene is actually writing the data to disk and doing the heavy writing, reading, and querying, and Elasticsearch around it provides the REST API, the query DSL, and all the sharding and replication of the data. The smallest thing you're normally writing is a document, and every document has an ID, and that ID will be hashed to know to which shard we will write that document; and then you can search over it. These are the base terms; if you know these, you're pretty much good for the demo.

We kind of like to compare that to a person growing up; software is also like a person.
It's also growing up. In the beginning it starts a bit like a toddler: it has a lot of potential and can develop in a lot of directions, and along the way you learn stuff and improve; it's a continuous learning process. I'm not exactly sure at which stage Elasticsearch is right now, whether it's a teenager or somewhere in its 20s or 30s, but it's developing, and in this evolution part we want to see a bit of what is going on in Elasticsearch.

The first thing we have learned is strictness. In the beginning, when you start a new open source product, you want to make it easy to use. And that's what we did: whatever data you gave us, we tried to store it. Even if there was some syntax error, we tried to work around that, just to make it easy to get started. But as your systems mature and you get more production users, at some point you start valuing strictness more, so you catch errors early on. There might be a bit more pain up front, and it might be slightly harder to get started, but it will take out a lot of the pain afterwards. So if you're doing bad stuff: don't do that. We have learned; we're trying to avoid the bad stuff and will tell you upfront: that's bad, don't do that.

For the first thing I can quickly show you: I'm querying everything through Kibana, because it has a nicer way to query stuff, but you can totally do the same thing with curl as well. Is that large enough for the last row to read? Good, perfect. What I have here is the latest 5.x version of Elasticsearch running, it's 5.6. I have three nodes, which is a bit hard to read here, but you can see I have three nodes: elasticsearch1, 2, and 3. One is the master node, that is this little star here. And my cluster is generally happily running, it's in the green state. Everything is good and working well.
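If you want to follow along, checking the cluster state looks roughly like this in the Kibana Dev Tools console (you can run the same requests with curl against port 9200):

```
GET _cluster/health
# overall cluster status: green, yellow, or red

GET _cat/nodes?v
# one line per node; the elected master is marked with a *
```

Both are standard cat/cluster APIs; the node names you see in the output depend on your own setup.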
So the first thing I wanted to show off is this strictness. Can you see the typo in here? Yeah, it's hard to see, and even for us it was often hard to see typos like that. In earlier versions, if there was a typo and Elasticsearch didn't know a specific parameter, it would silently ignore it, and we would do the same with configurations. And if you have a 150-line configuration file and you have some typo in there, that is very... yeah, have fun trying to find it. So what we've added now is: if something is misspelled, Elasticsearch will actually tell you what is going on, and thanks to the Levenshtein distance we can even tell you: this one here we don't know, but there is something similar that we do know, maybe you meant "tokenizer". And with tokenizer, if I can spell it: it is deprecated to use it like that, but now at least it works, and it will tell you the right way to do it. So, strictness: we have learned that it is very helpful to actually show the errors upfront.

The other thing we added in 5 were the so-called bootstrap checks. If you have a node that is badly configured and we know that you will have problems in production, we will not start up that node as soon as it can form a cluster. We always assume that if you are just running on localhost, and you're not clustering it and there's no way to cluster it, it is a development instance and we don't really care. But as soon as it can form a cluster, we assume this might be a production system, and we will actually tell you there is something very wrong here. For example, you don't have enough file handles, or you're using a specific Java version where we know that the garbage collector might corrupt your data. We would rather fail early and tell you upfront, hey, this is not working, do something, than fail later on and lose your data, and then everybody is super pissed. And also our support is happier if we can avoid stupid mistakes early on. So these
bootstrap checks are actually making sure you avoid common error scenarios. And there is no way to circumvent them, because at first we thought, well, we could add a flag like "override that, I don't care". But then what everybody would do is just run in production with the I-don't-care flag, and we wouldn't have gained anything. So there's no way to avoid them: bootstrap checks are here to stay and they will be enforced.

Okay, what if you want to upgrade? Let's upgrade to version 6. We'll try to do a live major version upgrade, and I'll actually upgrade to an internal build which is not yet released, so let's see how that goes. I guess if it works today, we can actually release it to production.

The first thing you want to do: for migrating from version 5 to 6, we have a new tool which is called the Upgrade Assistant. The Upgrade Assistant first tells you: please back up your data. Always do that. I will skip the backup since I don't have any proper data in my cluster, but if you do this in production, please, please do it, or at least don't complain to us.

Then we have these cluster checks, and a cluster check is basically telling you: here, something is wrong. Here I have the .kibana index, which is kind of an internal index storing what is configured in Kibana, and that needs to be changed. It can either forward you to the documentation to tell you which commands to run, or we have this reindex helper here, which can actually do it for you. For example, you can see that .kibana index: we can reindex it, and it will automatically fix all the stuff you need to upgrade in the background for you. And it tells you: okay, I have done that. If you refresh here, you see we don't have any indices which need an upgrade anymore, and if I go to the cluster check you can see everything has been done as we need it.

Okay, so before we do the actual
upgrade, I will quickly insert these three documents; we'll come back to them later, so you might want to remember them. What I have here are three documents. They're all in the index "types", they are of the types type1, type2, and type3, and they all have the ID 1. We'll just store them and get back to them later; we'll see another thing that we have changed, and we'll need those documents for it. So let's insert our three documents. Done. Now we can actually start upgrading stuff.

I have this running as Docker containers, and we'll simply change, in the .env file, the version from 5.6.7 to 6.2.0, which is not yet released. The current stable version is 6.1.3, but 6.2 will be released relatively soon, and you know, you always need to test your upcoming releases; today is a good opportunity. So we are changing that to 6.2.0, and what we'll do then is basically shut down the node 3 I have here, and it will be replaced with the new version. Let's run this: it will kill the node and recreate it in the background. And you can see here: elasticsearch3, let me scroll up, exited with code 143, so this one was just killed, and it got a new color, it's starting up again. In the meantime we can keep querying the cluster, and you can see right now we only have two nodes in the cluster, 1 and 2, which are on 5.6.7. And you can see elasticsearch3 already came back up and joined the cluster, and this one is now on version 6. And while this looked super simple, this was one of the main pain points we had in the past, because a major version upgrade always meant taking down the entire cluster, then upgrading all the nodes, and then restarting them. Now, from 5 to 6, we can have this mixed-version cluster.
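While the nodes are being rotated, you can watch the versions per node; a sketch in console syntax (the column selection is just one convenient way to do it):

```
GET _cat/nodes?v&h=name,version,master
# during a rolling 5.6 -> 6.x upgrade you will see some nodes
# still on 5.6.7 and some already on 6.2.0 until the rotation
# is finished; the master column marks the elected master
```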
So you can do an upgrade with a mix of versions 5 and 6, and we can just rotate one node after the next until everything is upgraded to version 6. In the meantime, let me quickly kill node 2, and I can show you that we can keep querying the data in the cluster. You can run this and see that right now I have 1 and 3; 1 was, or is, the master node, so it might take a little longer when we kill that one. And you can still read your data and write your data, even though one of the nodes was just kicked out of the cluster and is being restarted. And if I entertain you long enough, that node should join the cluster again soon, hopefully. Yeah, you can see 2 and 3 are already on 6.2.0, so we can now upgrade the final node.

The one thing I need is to copy out the curl command. Kibana is always connected to one single Elasticsearch node, so when I upgrade the elasticsearch1 node, which is the one Kibana is talking to, Kibana will not be usable, and we'll need to fall back to curl in the meantime. So let me kill, or upgrade, the elasticsearch1 node. You can see stuff is happening in the background. When I try to reload that page it will fail, because, well, Kibana cannot reach its Elasticsearch node anymore. But we can still check on the command line: if I run the curl command, you can see the cluster is still working as expected. elasticsearch2 has become the new master node now. The ports were 9201 for elasticsearch1, 9202 for elasticsearch2, and 9203 for elasticsearch3. So this is working, and we're basically waiting for the node to join back. It's back; we can switch back to Kibana. Let's refresh that one.
We just need to scroll down to the right place where we were. Yeah, our data is still there; the cluster has been fully upgraded now. The remaining thing we need to do is upgrade Kibana. So let's run that: I'll just kill the Kibana container. And everybody who's complaining about Java taking a long time to start up: let's wait until that Node.js process comes back, because I always have the feeling that the Node.js part is taking up the most time. So we have killed Kibana; Kibana will be gone. If we go here: page is not available, since Kibana is starting up again. The cluster in the meantime is happily working in the background, so you can still run your queries, just Kibana is not available. And you can see it exited and is starting up again, and it will take some time until, I don't know, a million npm dependencies are loaded. I still love the Kibana team, but this might take a while.

Anyway, we have fully upgraded the major version now, and we are at version 6. You can see Kibana also slightly changed its color. I always say 3 was black, 4 was white, 5 was colorful, and 6 is blue. I have been told that this is much more readable if you're visually impaired or color-blind, and the contrast is better. So this is the new thing; this is what you want. And you can see: major version upgrade done, everything up and running, no problems there. That was surprisingly easy; let's continue with other features. So this was basically the upgrade: the train is running and you're just laying the tracks while you're running on them. That's pretty much the rolling upgrade we've added.
So that's a nice feature. Other safety things, in the context of not destroying your cluster, are the flood stages. If you have used versions before 6, there were two watermarks. Flood stages are pretty much there so that you don't run out of disk space and everything falls over. We always had what we call the low and the high watermark. The low watermark basically means we will not allocate a new shard on a node if more than 85% of its disk space is used up, and once you have reached 90% of the disk space of a node, we try to actively migrate shards away from that node. Problem is: if there is no more space in the cluster, they cannot go anywhere else. And what we never did is stop writes. So if you kept writing to the same shard on such a node, at some point you might run out of disk, and I guess everybody knows: if you run out of disk, it will be a pretty shitty day. We're trying to avoid that now, so we have added the flood stage. At the flood stage we will basically reject your writes once you hit, and this is the default setting, more than 95% of disk space used. So if you only have 5% of the disk left, we will reject your writes rather than potentially corrupting your data.

I can quickly show that as well. Let me scroll here: I'm adding a new document into the index my_flood, and we can check how much disk we have left. You can see here: we have total bytes, free bytes, we have all the statistics. And what I'm setting now: this laptop has 250 GB of disk, so if I set the watermarks to 400, 350, and 300 GB, I will hit all the stages immediately. That check refreshes after 10 seconds. So let's apply that; we have set the flood stages.
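The settings being changed here look roughly like this (whether you use transient or persistent settings is up to you; the absurd absolute values are the demo trick, since an absolute watermark value means "require at least this much free space", and 400/350/300 GB can never be free on a 250 GB disk):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "400gb",
    "cluster.routing.allocation.disk.watermark.high": "350gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "300gb"
  }
}

# later: revert to the defaults by setting the values to null,
# and unlock any index that hit the flood stage:
PUT my_flood/_settings
{
  "index.blocks.read_only_allow_delete": null
}
```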
I can still read my documents; we need to wait 10 seconds until the setting is applied, but my writes should now be rejected. So if I write here, it will actually tell me it was forbidden. You can still read data, or you can delete data, but you cannot update or index new data. Those are not allowed, because otherwise you might run out of disk space and corrupt your data. So we kind of turned that off.

Let's revert those settings. In version 5, I think, we added that: to set something back to the default value, you set it to null, and the setting will be reverted. So, should this command work now if I try to write another document in there? Any guesses? Who thinks this will work? Who thinks this will fail? The rest are undecided. Okay, so the thing is: once a shard of an index has been on a node that has hit that flood stage watermark, we lock the entire index, and you will need to unlock it to re-enable writes. So if I run that, it will fail, because the index is still in the locked state. I need to unlock it by resetting the index.blocks.read_only_allow_delete setting; once I reset that, I can write documents into my index again. So this is a little trap you need to be aware of: once you hit the flood stage watermark, you need to re-enable the index for writing again, otherwise we will not allow writes. This is one of the things that will protect the data in your cluster, even though it will reject new writes, but we think this is the right trade-off.

The next very big feature we have added are sequence numbers. Sequence numbers basically keep track of every change you do to your data and attach a sequence number to it, and this is actually surprisingly hard. Let's see what this gives us, just to give you an impression of what is going on behind the scenes. We have the primary shard here, and you're writing data to the primary shard, and then that data is replicated; we have two copies here.
We are replicating that data. The yellow line is the local checkpoint, and with every write we piggyback the local checkpoint on the acknowledgement back, and then we can advance the global checkpoint. So you can see we have written two and three now; writing them out and acknowledging back basically tells the primary shard that those have been acknowledged, and we can advance the global checkpoint. Now we write four, five, and six, and then the primary shard dies, or the node with the primary shard dies. Replica shard one got five and six, and replica two got four and six. This one is promoted to be the new primary; it has never seen the four update, so it tells the other node: get rid of that four update and only apply the other ones. This makes sure all the data is synced up and we don't have any phantom writes or stale writes. For the integrity of the data this is a very important thing, and it makes keeping everything in sync much easier.

And you can actually demo sequence numbers, which sounds kind of hard, but it's actually not. So I'm creating a new index. This new index has one primary shard and one replica. How many shards do we have in total? Who is for one? Nobody. Who is for two? Who is for more than two? No, it's two: it's one primary shard, and if we say one replica, we mean one other copy, so it's one primary shard and one replica shard. And since I need to kill nodes again, and I still want to keep using Kibana, I say this data cannot be allocated to the elasticsearch1 node, because Kibana is talking to that specific node. So of this index, one shard will be on the elasticsearch2 node and one shard will be on the elasticsearch3 node; I can show that in a moment. Let's create that; it has been acknowledged, and then we can actually check.
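A sketch of this experiment in console syntax (the index name seqno-demo is made up for illustration; the node name in the allocation filter comes from my docker-compose setup):

```
PUT seqno-demo
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index.routing.allocation.exclude._name": "elasticsearch1"
  }
}

GET _cat/shards/seqno-demo?v
# shows where the primary (p) and the replica (r) shard ended up

PUT seqno-demo/_doc/1
{ "any": "value" }
# the response now carries "_seq_no" and "_primary_term";
# every write operation on this shard, including re-indexing or
# deleting the same ID, bumps _seq_no

GET _cat/recovery/seqno-demo?v
# after a node left and rejoined, this shows how the replica
# caught up, e.g. how many operations were replayed
```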
We have two shards, as promised. You can see the primary shard, that is the "p" here; the primary shard is on elasticsearch3 and the replica shard is on elasticsearch2. Remember that elasticsearch3 has the primary shard; we will need that knowledge in a minute and I will probably forget it, so I will ask you.

Then you can start inserting documents, and if you have seen inserts in previous versions, this block here looks very familiar; this one here is new. We have a sequence number and the primary term. The sequence number is basically the number of write operations, so if I do another write operation, it will be incremented here. The primary term will change every time the primary shard changes, so when I kill the elasticsearch3 node later on, that primary term should change. Let's do some write operations; you can see the sequence number keeps changing. That's all easy. If I take a specific document, for example the document 1, and I insert it, it increments. Do you think the number also increments when I do the same operation again? Yes, it will, because it's just an in-place update; it will simply replace the previous version. If I delete that document 1: okay, it increments the counter. If I delete it again, nothing changes in the data; will it increment the number? Yeah, it does. So we are keeping track of all the changes that you send to the cluster, even if we have already applied them. We don't really care what effect they have afterwards; we're just counting these operations.

Okay, which one was the primary shard? Three, right. So we will kill the elasticsearch3 node. Since this is randomly allocated, let's kill the three node, and then you can check, and you can see: now the new primary shard is on elasticsearch2, and the replica shard is unassigned. Why did we not allocate it on the elasticsearch1 node?
Because we had the shard allocation filter, where we told it it could not go there. And why don't we allocate the replica copy on the elasticsearch2 node as well? Because we wouldn't win anything. We never allocate the replica on the same node as the primary, because if that node goes down, both the primary and the replica shard go down, and you have just wasted half of your disk space; you would not win anything.

So, that was easy. We can keep indexing new documents; let's keep track: one, two, three, four, five. I have inserted five new documents, which were only going to the primary shard. Now let's restart that node; it should join again and become the replica again. It will take a few moments until it comes back up. Right now the replica shard is unassigned, but in a moment, once that node comes back up, the elasticsearch3 node should hold the new replica shard and elasticsearch2 will keep my primary shard. Let's hope the demo gods are with me. Let's check: okay, it has started, so this should be good. Let's check again: yes, you can see elasticsearch3 now has the replica shard; everything went as expected. And now comes one of the big improvements of this approach, when I run this command here, which says:
tell me how we did recovery-wise. It will tell me, let me scroll this over a little: you can see elasticsearch2 to elasticsearch3 recovered five documents. Those were exactly the five documents I inserted while the primary shard was there but the replica shard was not. What we were doing in Elasticsearch 5 and earlier versions was a file-based comparison: we would compare the Lucene files, and since we were writing them independently on each copy, these Lucene files were often totally different. We would basically, I think, take a hash of them, and if the hash was not the same, we would just copy that file over. So if one node was just leaving the cluster for a minute, you might need to transfer gigabytes of data for no good reason, other than that we didn't have a good way to compare, or a way to replay the missing operations. With the transaction log we're actually keeping track of all those write operations, and we can recover and replay just the five operations that were missed. That makes flaky nodes, or nodes that have just been gone for a short amount of time, much simpler and more performant to handle. The downside is we need some more disk space, because we need to keep the transaction log around; keep that in mind when you plan for the disk space you have.

Okay, let's delete that index and recreate it, this time with 10 shards and one replica. How many shards do I have in total? 20, hopefully. Let's see: if I check the number of shards, and make this a little smaller, you can see 20 shards, 10 primary and 10 replica ones. Now we can keep adding documents, and here's the confusing thing; keep track of the sequence number for a bit. Now we are at zero; if I run that again, zero again; 0, 0, 1, 0, 0, 0, 1, 2, 1. Any ideas why this might happen?
Yes, the sequence numbers are per shard. Every shard keeps track of its own operations. If I say: this is the ID I want to insert, the ID will be hashed and it will always land on the same shard. So if I keep using that one ID, it will always go to the same shard, and this will be a nice increment: every operation will go to the same shard and will keep incrementing. So that's easy. So both the recovery is much more performant, and this feature will also enable totally new features in the near future.

Ah yes, the other thing: especially Postgres people often ask what happens if you reach the end of that number, because Postgres has this concept of, is it the transaction ID, I think, which is an integer, and if you hit that limit your cluster will go down in a very bad way. So people are often concerned: how do we roll over that number? And we don't, for two reasons. First, this is not a global number but a per-shard number, and normally you will have quite a few shards on one node, so there should be some room. And second, this is not an integer, this is a long, and, as Bill Gates would say, 63 bits is probably enough for your increments for a long time. We don't expect anybody to run into that; I don't know how many write operations you would need to do for a few years to run into it, but it's definitely a lot. We don't see this as a problem.

And the other thing that this feature will allow in the near future, and this is under heavy development, so I'm not promising any version numbers, it's 6.x or it might even be 7.x, we will see, is cross-data-center replication. Because we have the transaction log, we can just replay those transactions to a different data center without adding any big latency to the communication.
So this will enable great new features in the future.

Now, this is the thing that a lot of you might have seen, and this is one of the main hurdles for upgrading, or perceived hurdles: types are going away. Who has heard that types are going away in Elasticsearch? That's not that many, so let's try to clear that up and see how we'll do it. As you can see from the version numbers, this is a very long-term thing; we try to make it as painless as possible, and it will take multiple years until we finally get there.

But first off: why are we getting rid of those types? The thing is, we kind of lied, because this type thing never really existed; we made it up. This was what I said in the beginning: early on we tried to say, well, it's very similar to relational databases, because an index is like a table. And that was just not a very good comparison. And then we had the types, which you could have several of in one index, and it just didn't match. Lucene doesn't have that concept at all, and you would run into three specific problems if you were using multiple types per index. The first thing is: since Lucene only sees the field name, the data type for one specific field name in one index needs to be the same. For example, if you have a user which has a field "disabled", and in one type it's a boolean flag and in another type it's a date for when you disabled the user, this will not work, because it needs to be of the same type everywhere. Secondly, sparsity is not handled that well in Lucene; Lucene 7, which is in Elasticsearch 6, improved that, but it's still kind of a thing. And finally, scoring, kind of the quality measure of searching, always works across all the types in an index, and that was also confusing. So we're trying to get rid of that artificial concept of types. And this is actually the plan of what we're doing: in 5.6
You can opt into single types, so there can only be a single type per index; you can enable that in the configuration. In 6.x we enforce that: by default, every newly created index can only have a single type. If you import an index from a previous version it can still have multiple types, but any index you create can only have a single type. In 7 the type will become optional in the API; since we assume that you only have a single type, it can be optional. And in version 8 the types are finally gone. So this should allow for no breaking changes in the upgrade path, but it is kind of a multi-year process, and we will try to get there.

So how do you upgrade stuff? You remember the three documents I inserted in the beginning: in the index "types" I had the types type1, type2, and type3. Since they were created in Elasticsearch 5, you could just insert them and everything works as expected. Now I create a new document in the index "no_types" with the type "_doc". This is the one we are proposing; you can pick any type, it just has to be a single one, but the one we are proposing is "_doc", since this is the default we will expect in 7. So I am inserting a document with that type "_doc" here. What will happen if I try to insert a document with another type now? Then I will run into the error that says: you would have multiple types in that index; we already have "_doc" and you want to add another one now. This will not work; I'm rejecting that. And the other big question is: how can I actually migrate my existing data to that new
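The 6.x behavior in the demo, where the first type on an index sticks and any second type is rejected, can be modeled in a few lines. This is a sketch of the rule, not the real server code, and the error message only approximates the real one:

```python
# Toy model of the 6.x single-type rule: the first type used on an
# index wins, and indexing with any other type is rejected.
class SingleTypeIndex:
    def __init__(self, name):
        self.name = name
        self.doc_type = None
        self.docs = {}

    def index_doc(self, doc_type, doc_id, body):
        if self.doc_type is None:
            self.doc_type = doc_type  # first type wins
        elif self.doc_type != doc_type:
            raise ValueError(
                f"index [{self.name}] would have more than one type: "
                f"[{self.doc_type}, {doc_type}]")
        self.docs[doc_id] = body

no_types = SingleTypeIndex("no_types")
no_types.index_doc("_doc", "1", {"title": "first"})
try:
    no_types.index_doc("other", "2", {"title": "second"})
    rejected = False
except ValueError:
    rejected = True
assert rejected  # the second type is refused, as in the demo
```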
single-type approach per index? What you do is use that nice reindex API, where you can basically take documents from one index and replay them into another index, and we can also change them during that replay. So I'm taking my documents from my "types" index and I replay them into the "no_types" index, and I also run a script against that where I make the following changes. I change the ID to type plus ID: since all my three documents, one in each of the types, had the ID 1, I need to concatenate type and ID to get a new unique ID. I take the field "_type", which was this kind of magic field, and transfer it to a custom field "type"; this is then just like any other field, but you can still efficiently filter on it, for example. And I say the new type I set for all the documents is "_doc", the one I had when I created the index.

So when I run that, it will insert the three documents that I had in the "types" index, and if you search in that "no_types" index you can see it has all the documents we had before. It has the "no_types" doc, the one I inserted directly, but here you also see the one that was type2 with the ID 1; it's been replayed, and now everything is down to a single type. And that is how you can migrate your data. Either you have some temporal data and you just throw away the old data and switch to a single-type pattern, or, if you have more of a search use case where you keep your data for a prolonged time, you can use the reindex API to just replay stuff.

Okay, and we are down to nearly the last part already. These are just two things without a demo; these are nice performance improvements. The first one is automatic queue resizing.
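The per-document transformation that the reindex script performs can be sketched in plain Python. The real call is a `POST _reindex` with a script in the request body, which isn't reproduced verbatim here; this sketch only mirrors what that script does to each document:

```python
# Toy version of the reindex-with-script migration described above:
# documents from the multi-type "types" index are rewritten so that
# _id becomes "<old type>_<old id>", the old _type is kept as a regular
# field "type", and every document gets the single type "_doc".
old_docs = [
    {"_type": "type1", "_id": "1", "_source": {"title": "a"}},
    {"_type": "type2", "_id": "1", "_source": {"title": "b"}},
    {"_type": "type3", "_id": "1", "_source": {"title": "c"}},
]

def migrate(doc):
    source = dict(doc["_source"])
    source["type"] = doc["_type"]  # keep the old type as a normal field
    return {
        "_type": "_doc",                         # the single remaining type
        "_id": doc["_type"] + "_" + doc["_id"],  # make IDs unique again
        "_source": source,
    }

new_docs = [migrate(d) for d in old_docs]
assert {d["_id"] for d in new_docs} == {"type1_1", "type2_1", "type3_1"}
assert all(d["_type"] == "_doc" for d in new_docs)
```

Because the old `_type` survives as the regular field `type`, queries that used to filter by type can filter on that field instead.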
So what automatic queue resizing does: when you do a lot of operations, they will queue up, and we wanted to have a way to guarantee that your queries will respond within a certain amount of time. Either they can be done within that amount of time, or they will be rejected, and this queue resizing makes sure that happens. So for example here I say the target response time for my searches is two seconds, and if you are currently processing 50 requests per second, your queue depth can be a hundred elements. If you try to add a 101st element, that 101st element will be rejected. So we will rather reject the request than queue it up for a very long amount of time, and then your client can decide what to do: retry the operation or do something else. That is the adaptive queue sizing, where you can basically guarantee a response time; rather than queuing up, we will reject an operation and you can then handle that in your application.

And the second nice thing is adaptive replica selection. Right now, if you search for data, you can either go to a primary shard or any replica shard, and we just select randomly and pretty much round-robin between the shards. However, what happens if one node is busier than the other nodes? The adaptive replica selection is exactly about that.
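The arithmetic behind the two-second example is essentially Little's law: the queue bound is the target response time multiplied by the observed throughput. Here is a minimal sketch of that bounding rule; the actual Elasticsearch implementation measures throughput over a window and adjusts the queue gradually, which this toy version skips:

```python
# Little's law sketch of the adaptive queue bound described above:
# with a target response time and an observed throughput, the queue
# may hold at most target_seconds * requests_per_second elements.
def max_queue_depth(target_response_s, observed_rps):
    return int(target_response_s * observed_rps)

def try_enqueue(queue, item, target_response_s, observed_rps):
    if len(queue) >= max_queue_depth(target_response_s, observed_rps):
        return False  # reject instead of blowing the response-time target
    queue.append(item)
    return True

queue = []
accepted = [try_enqueue(queue, i, 2.0, 50) for i in range(101)]
assert accepted.count(True) == 100  # 2 s * 50 req/s = depth 100
assert accepted[100] is False       # the 101st request is rejected
```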
It's based on a very nice paper where they actually figured out how you can weigh how busy a node is, and you then always send requests to the least busy node, so your queries run faster. So you have this exponentially weighted moving average, and basically, when handling your requests, the node tells you in the response how busy it is, and then your queries can select the least busy shard copy. And we've done some benchmarks: in most of the cases, even the 50th percentile is improving, but at least in the 99th percentile, all the scenarios we tested improved if you enable that setting. In 6.1 at least this is disabled by default, but you can enable this feature to try to pick shards more cleverly.

The other thing is shrink and split. We haven't had those for a long time, and we basically said we would never do split. We changed our mind; we just added split in 6.1. So let's take a quick look at what they are doing. Shrink is basically: you have too many copies and you want to combine them, and hopefully stuff works out and it doesn't end like this. So yeah, how do we do the shrinking? That's not what I wanted. Shrinking basically always combines a number of shards by a factor. Is 5 a good default number of shards, then? Not really, because 5 is a prime number, so the only factor by which you can shrink is 5, down to 1.

So let's quickly demo that. We have an index called "shrink"; since I have five primary shards by default, it will create that for me. If I show the shards, you can see I have five primary and five replica shards.
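The least-busy selection can be sketched with an exponentially weighted moving average of observed response times per node. The real Elasticsearch formula also weighs the node's reported queue size and service time; this toy version, with a made-up smoothing factor and node names, only keeps the moving-average idea:

```python
# Sketch of adaptive replica selection: keep an EWMA of response times
# per node and route each query to the node with the lowest average.
ALPHA = 0.3  # how strongly the latest observation moves the average

class NodeStats:
    def __init__(self, name):
        self.name = name
        self.ewma_ms = None

    def observe(self, response_ms):
        if self.ewma_ms is None:
            self.ewma_ms = response_ms
        else:
            self.ewma_ms = ALPHA * response_ms + (1 - ALPHA) * self.ewma_ms

def pick_least_busy(nodes):
    # unobserved nodes rank first so every copy gets sampled eventually
    return min(nodes, key=lambda n: (n.ewma_ms is not None, n.ewma_ms or 0))

nodes = [NodeStats("es1"), NodeStats("es2"), NodeStats("es3")]
for ms in (5, 6, 5):
    nodes[0].observe(ms)     # consistently fast
for ms in (50, 80, 70):
    nodes[1].observe(ms)     # busy node
nodes[2].observe(8)
assert pick_least_busy(nodes).name == "es1"
```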
So ten shards in total. And then, before you can actually do the shrink, you need to tell that index that all the shards you want to combine need to be on a single node. So for example here I say all of the shards need to be on the elasticsearch-3 node, and you cannot write to the index anymore, because what happens in the background, to make that operation very quick, is that we basically hard-link all the different shards on disk. So once you have everything on a single node, that operation is super quick. So let's run that. We have done that; we can show it now. Here I'm sorting by shard, and you can see that for every shard either the primary or the replica, at least one of them, is on the -3 node now. Now I can run the operation where I say: take the "shrink" index, run "_shrink", and write the result to the "shrunk" index. So the data will be written to the "shrunk" index and go down to one shard. It runs, and I can then query the "shrunk" index. Here is the number of shards: we have one primary and one replica shard, and they can go to any node again now; elasticsearch-1 or elasticsearch-2 are fine. If you query it, you can see we still have our data.

And we can also go the other way, so we can also split data. Splitting pretty much looks like this, and that's why we always said this is a very violent operation and it's very heavy, and that's why we didn't really want to do it. But we have now added some preconditions. The precondition is that you need to define the factor into which you can split up front. We kind of synthetically have these different shards up front, but they are packed together, so you don't have any overhead for them in the beginning, and you can split them up very efficiently and easily afterwards. So, just to show you what this looks like: here I create my index "split" with one shard, but in the background this has 20 of these to-be-split-up shards. So we have created that one.
I'm inserting a document, and then I check: this has one primary and one replica shard, which is what you expect. I block writes again; you always need to block writes when you do an operation like that. And then I say: take the "split" index, call "_split" on it, and store the result in the "split-in-five" index, and here I say the "split-in-five" index has five primary shards. So if you run that and then check the shards, you can see we now have five primary and five replica shards, and the document we had in there is still available. This is how you can break an index up. So for that setting, "index.number_of_routing_shards", pick a good factor that can be split into the different numbers of shards you might be interested in; don't pick a prime number here, that's not a great choice.

Okay, finally, benchmarks. Benchmarks are kind of an ongoing theme. This is my favorite comic whenever I see benchmarks. Whenever you see a benchmark of somebody benchmarking themselves against some of their competitors, this is pretty much what they are doing: they pick some specific conditions, and their own product is much better under those specific conditions than the others. It always reminds me of, I think two years ago, when MongoDB, Cassandra, and Couchbase each made a benchmark against their two competitors. Each one of them did that, and each one managed to find some scenario where they were at least twice as fast as the other two, and everybody did that within the span of three months or so. And that's pretty much the value of all those benchmarks. What you should do instead is benchmark very heavily internally.
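The advice about not picking a prime routing-shard count can be made concrete: a resize has to move between shard counts where one divides the other, and the target must still divide the routing-shard count. A small sketch of that divisibility rule (simplified from the real resize validation):

```python
# Sketch of why number_of_routing_shards should have many factors:
# a split can only go from the current shard count to a multiple of it
# that still divides the routing-shard count.
def valid_split_targets(current_shards, routing_shards):
    return [n for n in range(current_shards + 1, routing_shards + 1)
            if n % current_shards == 0 and routing_shards % n == 0]

# routing_shards = 20 (as in the demo): 1 shard can split to 2, 4, 5, 10, 20
assert valid_split_targets(1, 20) == [2, 4, 5, 10, 20]
# a prime routing count leaves only one all-or-nothing option
assert valid_split_targets(1, 5) == [5]
```

With 20 routing shards you get several intermediate steps to grow into; with a prime like 5, your only move is straight to 5.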
We're doing that to avoid the slow-boiling-frog problem. Everybody aware of what the slow-boiling-frog problem is? If you throw a frog into boiling water, it will jump out, because it knows something is wrong. If you put the frog into cold water and then slowly turn up the temperature, it will not jump out, because it doesn't recognize it. And you have the same thing with benchmarks. You need benchmarks which tell you: okay, this change made everything 0.5 percent slower, and this other change made everything, I don't know, 1 percent slower, and over time these would accumulate. So you need to benchmark very aggressively so that you don't have this slow-boiling-frog problem where you don't even recognize what is going wrong. We have written our own benchmarking tool, and we even publish the results all the time, so you can see what every single change is doing to the performance of the system over time.

So, to wrap up, we have seen quite a lot of stuff in action. It's kind of the growing up of Elasticsearch: we got more strict, we tried to make upgrades less painful, we tried to make your data safer, and we tried to add new features like the sequence numbers with the cross-data-center replication. Sometimes we have to make hard changes, like the removal of types, but in the long run it will be a good change, because we need to clean up that lie, basically. And then we have seen some performance improvements, and shrink and split, two things that people requested for a long time. Especially for split we slightly changed the preconditions, and that now enables a totally new feature for us.

And that's pretty much it. So I always take a picture with you, so I can prove to my colleagues that I've been working today, because normally nobody knows where I am; I'm just doing conferences. So everybody say "Elasticsearch". Very nice. Do we have time for questions? Okay