Yeah, there it is. Okay. Hi everybody. Michael talked to me a couple of weeks ago: hey, we are starting the Java user group in Singapore, so maybe it would be nice to have you here to talk about Elasticsearch. So, I work for Elastic, and we recently released the latest version, version 5, and we figured that what we can do is talk about the new features that were introduced in version 5. Before I move on: is somebody here using Elasticsearch? One guy, a couple of guys over there. What kind of use cases? I know you are using search, logging. Okay, a logging use case. So you collect logs, I guess from different systems and applications, and ship them to Elasticsearch. Okay. Logging, and also different use cases. Yeah. Any other feedback? Okay, good to know. And which version of Elasticsearch are you using: version 2? 1.5? Two? Okay, that's good, that's pretty okay. So let's see if we have... yeah, we have some of you. Yes, that's good. So this is version 5, and this is the corporate video, you know, the company. It's big, right, everything about it is better: "Five, though, means... well, it's incredible what we have accomplished. The entire Elastic Stack has evolved so much and become so much more consistent and easy to get started with. I think the initial experience is just going to be one of pleasure. Five, I know, is going to be the first really big step towards unifying our stack. Previously you had Kibana 4 and Logstash 2 and Elasticsearch 1, and it was difficult to figure out what would work with what. 5.0 represents a way of helping show that we are a unified suite of projects: we have Logstash, Kibana, Beats, Elasticsearch, and this is the first time that they all kind of feel like they're one product. We have the same version and we do the releases at the same time, so that we know they work together, get tested together, and it's a more integrated experience. This all looks and feels like one product."
"So to me that's really exciting. 2.0 was a very big release that fixed a lot of stuff but didn't add that much, and 5.0 is different because we're building on the foundations that were laid in 2.0. But it is so full of features, so full of cool new stuff, that I think users are going to love it. The amount of effort that I've seen go into 5.0 is amazing, and I've gotten so used to it that whenever I go back to 2.0, which really wasn't that long ago, it was a year ago, I'm like, ugh, what am I doing here? And now that I'm on 5.0 I can't wait for other people to experience it, because it'll be a much better product for them." So this is a quick summary of what we have been doing lately. So let's talk about this. First of all, who am I? As I mentioned, I work for Elastic. My name is Matias, and I am originally from Argentina, from Buenos Aires. I have been living in different places for the last eight years or so: I spent a couple of years in Europe, in different places, and now I'm based in Singapore. My background is software development. I used to write applications, I wrote code for several years until I decided to switch to a different kind of role, but I still write code. In my toolbox I have Java, Python, and Node.js; those are probably my main programming languages when I need to do something. I have been working with open source, either using it or working for open source companies, for the last eight years. Before Elastic I worked for MongoDB for a couple of years, so my background is strongly open source. With Elasticsearch specifically, I started using it approximately two years ago, and I joined Elastic as a company a bit more than a year ago. My role here in Singapore is what we call Solutions Architect, and the idea of a Solutions Architect is a mix between, let's say, a customer-facing role and a technical role. A mix between both worlds.
What I try to do is help users and customers be successful using our technology, in different places across Asia-Pacific. Michael tried to arrange a meetup a couple of times and I was always traveling somewhere; sorry about that. I would like to spend more time in Singapore, so I'm happy to be here tonight. I like memes, everybody. So this is what we call the Elastic Stack. I asked before who is using Elasticsearch in this room, but we also have other technologies like Kibana, Logstash, and Beats, and we also have a cloud service running on AWS, by the way. So this is what we call the Elastic Stack. We also have what we call X-Pack; these are commercial extensions, let's say paid software on top of this open source stack. Some people may ask: hey, if I pay Elastic, do I get something like an Elasticsearch enterprise version? The answer is no. There is only one version of Elasticsearch and it's open source. We have plugins or add-ons on top of that, but the core is open source. There is nothing like "oh, I need to migrate from the open source version to the enterprise version". No, just one version. So why do we have this big jump in the version numbers? We used to have version 1.7.1 or something, then we had version 2, 2.1, 2.2, 2.3, 2.4, and then we jumped to version 5. Maybe you are going to think, oh, these guys introduced a lot of new stuff, a lot of features. Actually we did, but not enough to justify skipping three numbers. The reason we have version 5 is that, as I mentioned, we have different components in our open source stack. We have Kibana.
We have Logstash, we have Beats, and some people suffered with this compatibility matrix of hell: if I need to use Elasticsearch 2.point-something, is it compatible with Kibana 4.point-something and Beats 1.point-something? That was a matrix of hell, and it was not easy for everybody to use. So one of the things we introduced approximately six or seven months ago was this idea that we call internally the unified release process. We have all these open source projects, totally different, totally isolated projects. For example, Elasticsearch is written in Java, Kibana is written in Node.js, Beats is written in Golang, and Logstash is written in Ruby but runs on JRuby on a JVM. Totally different, isolated projects. But one of the things we wanted was a unified release process, meaning everybody releases at the same time. And once we achieved that, the next step was: now we want consistent version numbering. So when we release a new version, you're going to get a new version of all the components, and you don't need to know this compatibility matrix of hell anymore. It's much easier. The idea is that we want to make the life of Vincent, you know, from Pulp Fiction, easier, so he's not confused anymore. That's why we talk about version 5: it's the new, let's say, consistent versioning across all our components. He's happier now. So let's talk about Elasticsearch. Who has never heard about Elasticsearch? Come on, don't be shy. Nobody? That's good. Okay. If you search Wikipedia, you're probably going to find something like "a search and analytics engine in real time". Wow, nice. You can imagine it's like a data store. Actually, some people may use Elasticsearch as a database. I am not saying that's a good idea, please don't get me wrong.
But some people may do it. So you can imagine something where you can store information and retrieve information, and there are a couple of interesting things to talk about. First of all, it's very easy to use, because the interface that Elasticsearch exposes to the rest of the world is HTTP REST. So it's very easy: it's the HTTP protocol, and basically when you store information (actually, the way we call that is when you index information in Elasticsearch) or when you retrieve information, what is going over the wire is a JSON object, over the HTTP protocol. So it's very easy. We also have client libraries, and Florian is going to talk about that later; these libraries wrap this HTTP protocol, but in the end it's HTTP. And it's based on Apache Lucene. Apache Lucene is an information retrieval software library that has been around for more than 10 years now, so a very long run. It was created by Doug Cutting, the creator of Hadoop: he first created Apache Lucene, then he stopped working on it and created Hadoop. The idea is a low-level library to build information retrieval on top of. Another interesting feature of Elasticsearch is that it runs as a server, because it's an HTTP server in the end, and when you need to scale, you scale horizontally. If you need to deal with more information, a higher throughput, a higher number of requests per second, the way to scale is to add more servers in parallel, and Elasticsearch balances the information across these different servers. We call them nodes, different Elasticsearch nodes.
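As a concrete sketch of that HTTP plus JSON interface (the index name, type, and document fields here are invented for illustration, not taken from the talk), indexing and then retrieving a document against a local cluster looks roughly like this:

```
PUT /tweets/tweet/1
{
  "user": "some_user",
  "message": "Hello Singapore JUG!",
  "posted": "2016-11-10T19:30:00Z"
}

GET /tweets/tweet/1
```

The same calls can be made with curl or any HTTP client; the client libraries mentioned above just wrap these requests.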
Using this same idea of multiple servers, we can also guarantee high availability, which means the information has multiple copies on different nodes, on different servers. So if one server goes down, your application can continue working, because there are copies of this information somewhere else. That's it in a nutshell. You can index anything you want: something structured, where you have well-defined fields, or something unstructured, like full text. For example, I have a bunch of text, I want to index it in Elasticsearch, and I want very fast text search; you can do that too. When you go to Wikipedia and use the search, that search is coming from Elasticsearch. When you use Tinder (if somebody here is using Tinder; it's not my case, I am a married person), Tinder is using Elasticsearch for those searches. So when you swipe left, swipe right, all those searches are powered by Elasticsearch. Some people may ask: oh, but is this a logging solution, a security solution? Not really. It's a data platform that can be used for different use cases, and what you get at the end of the day is very fast searches and very fast aggregations. When I say aggregations, I mean when you want to group information, generate some facets, execute some arithmetic, whatever: very fast, in real time. That's what you get at the end of the day. Now, if you ask, and I mentioned before that you can imagine it's like a database, what is the difference (and now let's forget about Elasticsearch) between a database and a search engine? Pick any database and pick any search engine. Any clue? Any thoughts? Think about when you query information: what's the main difference?
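While you think about that, here is a flavour of the aggregations I just mentioned (index and field names are invented for illustration): a terms aggregation that groups documents by one field and averages another, all in a single request.

```
GET /weblogs/_search
{
  "size": 0,
  "aggs": {
    "by_status": {
      "terms": { "field": "response_code" },
      "aggs": {
        "avg_size": { "avg": { "field": "response_size" } }
      }
    }
  }
}
```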
Yeah, that's right. Still, in Elasticsearch you can have structure, you can have well-defined fields if you want. Actually, if you think about it, the main difference is that in a search engine everything is indexed: all the fields, all the content. In a database you explicitly need to create the indices; otherwise your queries are going to be slow. This means that with a database you need to know your access patterns in advance; you need to know in advance how you are going to query the information. Otherwise, if you don't have an index, it's going to run a full table scan. And again, pick any database (MySQL, Oracle, Postgres, MongoDB, pick any), they are going to run a full table scan, because there is no other way to find that information. You need to know the access patterns in advance, while in a search engine everything is indexed. If you want to use very complex search criteria, it's up to you; everything is indexed. That's the main difference. So let's talk specifically about what we introduced in version 5. Those of you already using Elasticsearch may know that when you index text, there are basically two approaches. One approach is: I want to treat this text as a value, and I don't want to run any kind of processing when I index it. Let me give an example: if I'm going to index Twitter user names, I don't want any processing like lowercasing; I just want to index the string as it is. In the Lucene world this is called a string that is "not analyzed", meaning we treat the string as an atomic value, and that's it. On the other side, imagine you want to index a longer text, say an article from Wikipedia. You probably want to do some pre-processing: you want to lowercase all the text, remove the stop words, apply some logic like synonyms, etc.,
stemming, etc. So what we did in version 5 is make those two different use cases very easy to understand, and we introduced two new types. The first case, where we use the text as-is, without any kind of analysis or processing, we call keyword. And the text where you want to run some kind of analysis (lowercasing, removing stop words, etc.) we call text. It's not only about the names; it's also about how we store these keyword and text fields on disk, so it's going to be much more efficient, especially when you store keywords on disk. The next interesting thing is about relevance. Imagine you go to Google and you search for something, say PayPal. The first link Google shows is probably the PayPal website. Why? Because of relevance. Of all the results you get from Google, probably hundreds of thousands, maybe hundreds of millions of pages containing the text "PayPal" or something similar, the most relevant result is of course the PayPal website. In information retrieval, how do you measure that? Usually we talk about scoring, and the idea of scoring is a number we assign to each of the results that measures how relevant a specific document is for the query you just executed. Does that make sense? So one of the things we changed in Elasticsearch version 5 is the algorithm we use to generate these scores. In the old days we used an algorithm called term frequency / inverse document frequency, TF-IDF if you search the docs, and now we are using something that, according to the information retrieval specialists, is a much better solution.
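Going back to the keyword and text types for a moment, a mapping sketch (index and field names invented for illustration) could look like this: the username is stored as an atomic, not-analyzed value, while the message goes through the full analysis chain.

```
PUT /tweets
{
  "mappings": {
    "tweet": {
      "properties": {
        "username": { "type": "keyword" },
        "message":  { "type": "text" }
      }
    }
  }
}
```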
Let's say state of the art: BM25. I don't know if you have any experience with BM25. In this information retrieval world there are a lot of people who know a lot, and it's very low level, how you measure relevance, but according to these folks it seems this is a much better algorithm than TF-IDF. Another interesting thing is that now in Elasticsearch you can store (that's the next slide) very long numbers. This can be applied to different use cases: for example, if you want to store timestamps with nanosecond precision, or if you want to store IPv6 addresses. All these cases are basically a very long number, something that doesn't fit in an integer or a normal number, let's put it that way. So now we have support for very big numbers, and one of those use cases, of course, is IPv6. One of the questions outside was: Elasticsearch is a search engine, but there are a lot of people using it for logging or for metrics, for different use cases. Which is true. And in all those use cases, a lot of the content you index in Elasticsearch is numbers. For example, imagine the typical access logs from your Apache or nginx web server: you're going to have a response code, a response size, a timestamp. There is no text, actually; the only text is probably the URL and the HTTP request type. This means a lot of people are using Elasticsearch and putting a lot of numbers inside it. So we improved a lot how we store numbers inside Elasticsearch, and to do this we are using a new data structure called block k-d trees. You can find very fancy videos on YouTube about how these data structures work, but the idea is that dealing with numbers in Elasticsearch is going to be much more efficient. Actually, it already is right now.
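The switch from TF-IDF to BM25 mentioned earlier can be illustrated with a toy Python sketch. This is not Lucene's exact implementation; the corpus statistics are made up and the formulas are the textbook versions. The point is that BM25's term-frequency component saturates, so a term repeated a hundred times no longer scores a hundred times higher.

```python
import math

N = 1_000_000   # documents in the corpus (made-up statistic)
df = 1_000      # documents containing the query term (made up)

def tf_idf(tf):
    # Classic weighting: the score keeps growing linearly with tf.
    return tf * math.log(N / df)

def bm25(tf, k1=1.2, b=0.75, dl=100, avgdl=100):
    # BM25: the tf component saturates towards (k1 + 1), and
    # longer-than-average documents (dl > avgdl) are penalized.
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avgdl))

print(tf_idf(100) / tf_idf(1))  # 100x the score for 100x the occurrences
print(bm25(100) / bm25(1))      # only about 2x: repetition stops "shouting"
```

The `b` parameter controls how strongly document length normalization kicks in; with `b = 0`, document length stops mattering entirely.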
It's more efficient because it's faster to read and faster to write, and it's also smaller on disk. So everything is good. We also introduced something called half-precision numbers. Again, as I mentioned, some people collect information about their servers or applications in Elasticsearch. Imagine you collect metrics like CPU or memory consumption: you're going to have numbers like a percentage, say the CPU usage is 37.5 percent. In the old days you paid the price of a float, a number that is at least four bytes depending on the implementation, just to store this two-digit-precision number. So we introduced half-precision numbers, and also what we call scaled floats: it looks like a float, but internally we store it as an integer. Which means it is, again, faster to read, faster to write, and smaller on disk. And we can also use this for geo, because Elasticsearch is used a lot for geo purposes, when you want to index information that has lat/long coordinates. For example Uber: you can find videos made by the people at Uber talking about how they use Elasticsearch. They aggregate in real time, by location, how many cars they have in each part of the city, and based on the demand they can push the drivers to go to other parts of the city. For all these real-time analytics they use Elasticsearch, and as you may imagine, the information has lat/long coordinates for all those car positions. This idea of using block k-d trees can also be applied to geo information, and there is a very nice example, a video also available on YouTube. In this case we have the city of London and we have different points of interest, and the density of points varies. If I am outside, close to Heathrow, there are probably not so many interesting things, but close to the city center of London it's different.
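Returning to the scaled floats mentioned a moment ago, the idea can be sketched in a few lines of Python. This mirrors the concept, not Elasticsearch's internal code, and the scaling factor of 10 is an assumption that suits one-decimal metrics like CPU percentages.

```python
def encode_scaled(value: float, scaling_factor: int = 10) -> int:
    """Store a decimal like 37.5 as the integer 375."""
    return round(value * scaling_factor)

def decode_scaled(stored: int, scaling_factor: int = 10) -> float:
    """Recover the original one-decimal value."""
    return stored / scaling_factor

cpu = 37.5
stored = encode_scaled(cpu)        # 375: a plain integer, so the fast
                                   # numeric (BKD tree) machinery applies
assert decode_scaled(stored) == cpu
```

The trade-off is precision: anything beyond one decimal place is rounded away at index time, which is usually fine for metrics.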
There is a higher number of points of interest there, and the size of the cells that Lucene uses when storing this information is smaller where the data is denser. So this gives you an idea of how it works: imagine I start indexing all these points at a low level. Where there is a low density of points, it uses a very big grid size, and where the density of points is higher, it uses a smaller cell size. So you only pay the overhead of a very small grid size where it's needed; you don't pay for a very small grid size when you are at Heathrow. Does that make sense? It's an adaptive approach based on these grids, and it depends on how many points are indexed in each grid, in each cell actually. This is something that was also introduced in version 5. Tinder, again: it also uses geolocation, so same idea. These are all public numbers. If you go to benchmarks.elastic.co, our website, we have these nightly benchmarks that run every night against all the versions, including master, so that comes directly from GitHub, the latest source code. We run all these benchmarks every night to find any performance regression, in case we screw something up. So when I tell you that something is two times faster, three times faster, you can go and check for yourself. Also, the previous chart was throughput, where of course the higher the better: version 5, version 2, version 1. Yeah, this one is because we recently introduced kind-of-semi-official Docker images, basically a Docker image that internally has Elasticsearch version 5. So what we are measuring there is whether there is any additional overhead when you run Elasticsearch under Docker, because of the cgroups, let's say because of the cgroups nature that Docker relies on. Again, if you want to check more details on this, go to benchmarks.elastic.co.
You can check it for yourself. Painless. In Elasticsearch, when you execute queries, aggregations, or different kinds of operations, you can execute scripts if you want: if you need complex processing, or to aggregate numbers based on complex criteria, you can run your own scripts. In the old days we relied on different scripting languages; the last one we relied on was Groovy. But one of the issues with any scripting language you rely on is that there is no way to run it in a totally safe, sandboxed mode. If I'm running a Groovy scripting engine, potentially somebody may open a socket, or execute something that kills my server, or do something that is not allowed. And there is no way around it. We talked with the people behind the Groovy programming language and we asked: we need to sandbox Groovy, we need to make Groovy completely secure. Basically, they told us there is no way; there is always going to be a way to work around it and execute malicious code if somebody wants to. So what we did, the only way we found to make it totally sandboxed, is we created our own scripting language. Of course, this is not a general-purpose scripting language; we don't want people using it for other stuff, we are not going that way. It is a very small subset of a scripting language: you get methods to access objects, retrieve fields, retrieve values, etc., and of course some flow control structures like for loops, switch, etc. But that's it. What you get at the end of the day is two main things. The first one is that it's much faster, because it is basically a sandboxed scripting environment that just translates these operations into pure Java. And the most important thing is that it is safe, it is secure: if somebody wants to open a socket, there is nothing like a socket in this scripting language.
So it cannot be done. So the main motivation is not performance; it is about being 100% sandboxed. Yeah? "So you got to define your own language there?" Yeah, yeah. Actually, in version 5 the default is going to be Painless: if you don't specify a language, it's going to be Painless. If you want to use Groovy, you can still use it, but you probably need extra configuration. Since version 2, Elasticsearch runs in Java using the Security Manager (the JVM provides something called the Security Manager), and I imagine that if you want to use Groovy you need to grant additional permissions in the Security Manager. The default is Painless, and if you check the syntax, it is quite familiar: it's a mix between Groovy and JavaScript, in some way. So in this case what I'm going to do is execute a function_score query. Remember before, when we talked about relevance: this score measures how well a document matches a query, and you can define your own functions. In this case we are defining a function that is generated by this script. Now, for those kinds of use cases like the one you mentioned, logging, or anything where you index time-based information (time-based information is something with a timestamp, basically), the typical approach is to create time-based indices: maybe one index per day, one per week, one per month, depending on your needs. That's usually the recommended approach. So imagine this scenario: if I want to execute a specific query covering today back to four days ago, I need to hit approximately four different indices. But if you look at this kind of query, the nice thing about the indices in the middle is that if I execute the same query again, the response coming from those indices is exactly the same, because the information in those indices didn't change. Is that clear?
So I'm indexing my logs: logs from today go to the index with today's name, and so on. And if you look at the indices in the middle, the information never changes; only the query conditions at the two ends can change, never in the middle. So one of the things we introduced in version 5 is something called the shard request cache, which means Elasticsearch keeps this shard request cache in memory. So when you hit it with the same query... actually, not exactly the same query, because if you execute the query now over "now to four days ago", and you execute it again 10 seconds later, it's not exactly the same query: the timestamp condition moved 10 seconds. But still, the indices in the middle are going to return exactly the same results. So we can be very aggressive with the caching, because it is exactly the same result. What we introduced in version 5 is this shard request cache, enabled by default, and to implement it we had to refactor a lot of how the query engine works. This was a very big effort from the engineering team, changing how Elasticsearch internally processes all the queries. What does this mean for you? If you are using Kibana, or any kind of application that aggregates log information, it's going to be much faster, because all the information retrieved from the middle indices is cached. Rollover API. Let's talk again about this use case of daily indices. Imagine I have a website. Imagine I have PayPal; I'm like Elon Musk in the beginning. So imagine I have PayPal, I have my logs, and I put my logs in daily indices. The traffic that PayPal gets on weekdays is probably higher than the traffic it gets on weekends, or maybe the opposite, I don't know, but you see the point.
The amount of logs you generate is not exactly the same every day; that depends on your business, on your use case, on the nature of your business. So if you take this approach of having one index per day, you may end up with something like this: some indices much bigger than others, depending on whether the day is a weekend or a weekday. Is that clear? And this is usually not nice to have, because a lot of things, like when Elasticsearch balances the information or partitions the information, happen at the index level. So in these kinds of scenarios it's nice to have all the indices of approximately the same size. How do we do that? In version 5 we introduced something called the Rollover API, and the idea is that instead of having, for example, a daily index (logs-2016-11-10, logs-2016-11-11, and so on), you have something like logs-0001, logs-0002, and Elasticsearch automatically creates a new index, logs-0003, based on any condition you define, for example the number of documents. In that case you can say: create a new index every time the previous index has 10,000 documents, or 1 million documents, or once the previous index is at least seven days old. So Elasticsearch creates the new index, and it uses an index alias. In Elasticsearch we have something called an index alias; imagine it's like a symbolic link in Linux, an index name that can point to one or more indices, same idea. So you have an index alias that points to this whole set of indices. Does that make sense? And for you it's transparent. What you get at the end of the day is that you can easily move from the uneven layout to the even one, with all the indices around the same size. And you can define complex conditions based on time, on document counts, and so on.
Shrink API. Who is familiar with the idea of shards in Elasticsearch? There is a magic word called shard. The idea of a shard is that when you create an index, Elasticsearch partitions the index into smaller pieces that we call shards. The default setting is five, which means that when you create an index, Elasticsearch internally creates five Lucene indices. Does that make sense? One of the things that sometimes happens is that people have too many shards for the amount of information they have. Maybe they say: I'm going to have 20 shards for this index, and at the end of the day they only index, I don't know, 10 megabytes, and they are paying the price of 20 shards for only 10 megabytes of data, which is not recommended at all. Usually you can accommodate several gigabytes per shard. So when people wanted to fix that, to reduce the number of shards, they had to re-index the information; there was no easy way. Actually, there was no easy way until version 5. In version 5 we introduced the Shrink API, and the idea is that we reduce the number of shards to a factor of the original number. So if I have 20 shards, I can shrink to 5, to 4, or to 1. Does that make sense? Why a factor? Because at a low level this is implemented with all kinds of symlink logic; it duplicates symlinks, and that's why it must be a factor. So, bottom line: don't use prime numbers for your shard counts. For example, if I have this index and I want to reduce the number of shards, I just specify in the Shrink API how many shards I want to have, and it shrinks all the shards.
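A shrink request sketch (index names invented for illustration): this asks for the source index to be shrunk down to a single shard.

```
POST /my_big_index/_shrink/my_small_index
{
  "settings": {
    "index.number_of_shards": 1
  }
}
```

Before this can run, the source index has to be made read-only and all of its shard copies relocated onto a single node, and the target shard count must divide the source shard count evenly.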
This is going to be very fast because, as I mentioned, the implementation is based on symbolic links. If you ask what happens if I run on Windows: it is going to re-index, because on Windows there are no symbolic links. But if you are running Elasticsearch on a filesystem that supports symlinks, this is going to be very fast, very cheap. Java REST client: I'm going to skip this slide, because this gentleman is going to describe it in detail, so I'm not going to talk about it. Ingest node. Say you are using Elasticsearch for logging: how do you send information to Elasticsearch, how do you collect these logs? Okay, so you have Logstash; Logstash reads the logs and sends them to Elasticsearch. That is one approach. The other guy, you were also using logging with Elasticsearch: what do you use to collect the logs? Okay, Kafka, and then what do you use to read from Kafka, Logstash or something else? Logstash, okay. So this is a common use case: I want to send logs from different places. Maybe I have a thousand servers generating logs. Nowadays, with this idea of microservices architectures, everything is distributed; we collect logs everywhere, and we need to ship these logs to Elasticsearch. One approach is, for example, to put Logstash on your servers to collect the logs. But imagine you have, again, a thousand servers: you are paying the price of Logstash on your thousand servers. What do I mean by the price of Logstash? Logstash is not cheap to run from a CPU and memory perspective. Why? Because it's on the JVM. I know this is a Java user group, sorry about that, but let's be honest: the CPU and memory footprint you get for a Java application is going to be higher than with some other technologies.
I'm not saying it's good or bad, but it is what it is. That's why we introduced something called Beats. Beats are these lightweight, low-memory-footprint agents. For example, if you want to collect logs you can use something called Filebeat, which is a very small program; the memory footprint is something like 14 megabytes of RAM, with very low CPU usage. Of course, it's much more limited in functionality compared with Logstash: there is no processing, there is nothing, it just collects logs and ships them to Elasticsearch. Collect logs and ship, that's it. And some people use Beats for this because, again, you pay only 14 megabytes of RAM on all your servers. Why is this so much more lightweight? By the way, Beats is implemented in Golang; Go is a statically compiled language with no virtual machine, so it has a very small footprint. But what happens is that some people, if they want some enrichment of the logs, or maybe they want to be consistent on the field names, like if you want to use something like Common Event Format or any standard log format, if they need to process these logs, they need a Logstash somewhere. So they have these Filebeats sending logs to Logstash; in Logstash you run all this pre-processing, and then Logstash ships the logs to Elasticsearch. People may say, oh, but now I need to introduce another component; I need to introduce Logstash just to run this minimal enrichment, this minimal processing. So that's why we introduced something called the ingest node. The idea of the ingest node is that one or more nodes in your Elasticsearch cluster are going to run these, let's say, processors before the documents are indexed into that same cluster. So imagine that you send the logs and you want to change the field names, or maybe you want to remove fields because there is some private information; anything like that you want to run, it's a subset of features.
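A minimal ingest pipeline along these lines might look like the following (the pipeline and field names are made up for illustration; `rename` and `remove` are real processors in 5.x):

```
PUT _ingest/pipeline/clean-logs
{
  "description": "Rename a field and drop a private one before indexing",
  "processors": [
    { "rename": { "field": "msg", "target_field": "message" } },
    { "remove": { "field": "user_email" } }
  ]
}

PUT logs/event/1?pipeline=clean-logs
{ "msg": "login ok", "user_email": "someone@example.com" }
```

Any indexing request that passes `?pipeline=clean-logs` gets run through those processors on an ingest node before the document is stored.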
You don't need to deploy Logstash for that; this pre-processing can happen in the Elasticsearch cluster itself using this new feature. Another interesting example: if you want to index PDF documents into Elasticsearch, that's another common use case, or Word documents, or Excel files, whatever. Usually when you need to extract content from these formats it's very CPU and memory intensive; if you rely on libraries like Apache Tika, all these libraries are very CPU and memory intensive, so some people rely on external tools for that. You can use an ingest node to run that: you can have a dedicated node that is only going to extract the content from these documents and then index it into Elasticsearch. What do you get from this? Simplified architectures: you need to deploy fewer components to achieve what you want to achieve. Bootstrap checks. I love Pulp Fiction, by the way. This is what we want to avoid. Why? Because we have seen people running production clusters like this, you know. We have a support team worldwide, in different places of the world, and sometimes these guys need to deal with very bad things, like for example file corruption. And that happens because they didn't take care of the proper settings of Elasticsearch, or sometimes even of the operating system, not Elasticsearch but the OS level. Let me give you an example. Elasticsearch is quite aggressive on the number of file descriptors that it is going to use. This is because of the nature of Lucene.
Lucene is going to create new files, because Lucene wants to keep its files immutable; if something is immutable, we can be very aggressive on the caching. So Elasticsearch, via Lucene, is going to create a lot of files, and these files are going to be merged in the background, and so on. The point is that if the limit on open file descriptors in your operating system is not big enough, then when you have all these file descriptors open, reading and writing information, and Elasticsearch tries to open a new one, you hit the limit and you cannot save what you have in memory; everything is left dancing up in the air. So they call our support team: hey, I have corruption. We want to avoid reaching that point, because sometimes there is no way to fix it; it has already hit the fan, so we cannot fix it, sorry about that. So what do we have in version 5? We have something called bootstrap checks, and we introduced a new, let's say, mode of running Elasticsearch: development versus production. And how are we going to detect which one you are in? Basically, if you bind your network interface for the transport protocol to a different interface than localhost (the transport protocol is the protocol that Elasticsearch uses internally), it is going to assume that you are running in production mode: it's not your laptop, it's not a development environment, it's production. And if you are in production mode, all these bootstrap checks must be in a green state. If any of these bootstrap checks fails, Elasticsearch is not going to start; you cannot start the server. Sorry to be so harsh, but believe me: help me to help you.
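For context, the file-descriptor and memory limits that cause this kind of trouble are raised at the OS level; a typical Linux setup looks something like this (representative values from the Elasticsearch documentation, not from the talk):

```
# /etc/security/limits.conf
elasticsearch  -  nofile  65536      # max open file descriptors

# /etc/sysctl.conf
vm.max_map_count = 262144            # virtual memory areas for mmap
vm.swappiness = 1                    # discourage swapping
```

With limits like these in place, the corresponding bootstrap checks pass and Elasticsearch will start in production mode.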
So we want to spare you these bad issues. Some of the things we're going to check, as I mentioned: file descriptors, virtual memory size, being sure that your machine is not swapping (if you swap, performance goes down), that you have enough threads available from the operating system, that the JVM is running in server mode, etc. If any of these checks fails, Elasticsearch doesn't start. All of them should be green, and it doesn't matter if your machine is virtual or physical; it is going to be quite blunt, you know: okay, you are running in production mode, and if any of these checks fails, I'm not going to start, sorry. Because, again, we have seen bad things. Sorry about that. Dots in field names. You probably know about this; it's a very, very long story. In the old days, Elasticsearch allowed you to have field names with dots: I could have a field named, for example, first.name or last.name, and that was valid. In version 2 we said no, that's not valid anymore, because sometimes, given a JSON object, I don't know if the dot is just part of the field name or if you have a nested object, an object inside an object. There is a confusion there. So in version 2 we went quite radical and said: okay, no more dots in field names. You cannot have them, and if you have dots in your field names, you must re-index your information. But for some people that was a bit painful, so in version 5, dots are back. If you want to use dots, though, they are not enabled by default; you must enable them with a setting. Actually, the same applies in version 2.4: if you want to use dots you must enable a special setting, but by default they are disabled. But if you want, okay, you can do it; you just have to change the setting. A couple of extra improvements to mention. Faster storage; okay, and refresh. This is a very classic use case. Imagine I go to a website.
I have a form in a web page; I input, imagine, my personal information, click on submit, and then on the next screen I'm going to show the information I just filled in on the previous form. A typical scenario, you know: POST, redirect after POST, and show another page with the information. What happens if, between the redirect and the GET for the next screen, the information is not yet available? The information is going to be missing. Internally, Elasticsearch refreshes all these indices at a low level by default every one second; that means Elasticsearch rebuilds all these search structures every one second. So imagine I save, then I go to the next screen, and that request arrives before the next refresh: the information is not going to be there. In version 5 we introduced a way to work around that. Basically, when you save the information, you can specify on the index request (this is the refresh=wait_for option): do not acknowledge the save, do not acknowledge the index operation, until the index has been refreshed. So you have the guarantee that when you go to the next screen, the information is going to be there. This is quite a typical use case about consistency. I want to finish with version compatibility. So, say you are using Elasticsearch version 1.something, 2.something, or 5.something. If you are using Elasticsearch version 5, that means you can read indices created with version 5, of course, and indices created with Elasticsearch version 2. If you are using Elasticsearch version 1 and you want to migrate from 1 to 5, you cannot just change the binaries.
You must re-index the information, because it's a different format. And there is a nice diagram that explains this: if I want to upgrade from 1.7 to 2.4, I don't need to re-index; if I want to go from 2.4 to 5, I don't need to re-index; but if I want to go from 1 to 5, I must re-index. And like in any major version upgrade in Elasticsearch (major is when we change the first number: one, two, five), you must shut down your cluster. So whether you have three nodes, five nodes, ten nodes, or a hundred nodes of Elasticsearch, you must shut it down for major version upgrades. For minor version upgrades you can do it in a rolling fashion: you shut down one node at a time, replace the binary, restart. But for major version upgrades, you must shut the cluster down. Is that clear? I want to finish with a quick demo. This is, for example, the new version of Kibana, so let me press refresh. This is the new version of Kibana, with a new design. In this case, as I mentioned, Elasticsearch can be used for different things; here I am sampling information from the Singapore LTA, the Singapore Land Transport Authority. They have these public APIs that provide information about the traffic in real time; it's very nice. For example, I can see all the traffic advisories, where these traffic advisories occur, which devices are being used; you can find all of this. You know, Singapore is a very green country.
So if you search for "plant", you're going to see all these plant pruning messages, you know, from when they take care of the plants on the highway; they put out all these messages: plant pruning, plant watering. A couple more use cases with Singapore LTA. This is car park availability, also provided by Singapore LTA. So if you go to VivoCity on a Sunday, probably it's not a good idea, but if you still want to go and you want to park at VivoCity, you can check these APIs and see whether you can park your car. These are traffic incidents: we have road works, vehicle breakdowns, accidents, and where all these accidents occur. And of course Elasticsearch, as I mentioned, is very powerful on geo. If I want, for example, let me zoom in, if I want to focus on accidents around the CBD area, I can filter there, and everything is updated in real time. Oh, nice, we have a wrong mapping here, but you see the idea: everything is updated in real time and everything is geo. I think the last example I have here: we have earthquake information. We track different earthquakes around the world, and also nuclear blasts, by the way, because from the sensors it looks like an earthquake. It's not the same, but it looks the same; same same but different, you know. This is a very nice one: this is ADS-B, the information that is broadcast by airplanes. One of my colleagues in Japan has a Raspberry Pi at his place and he collects all this information. This is in real time; look, this is the last 15 minutes, all the flights around Tokyo. So this is my colleague in Japan, and I think that's all I have.
Yeah, this is the new monitoring UI, something nice, take a look. That's all I have for tonight. Any questions? I have more slides and I'm happy to present them, but it's going to be very long; I'm going to be around afterwards, so if you want, I can show you what I have, no worries. As I mentioned at the beginning, we have this Elastic Stack: we have Elasticsearch, which is where the information lives; Kibana, which is a UI for getting analytics on the information that you have in Elasticsearch; and we have Logstash and Beats to send information into Elasticsearch. And Kibana is also open source: Node.js, JavaScript on the server side, and it uses the Elasticsearch HTTP REST API. Happy to take questions. Thank you, guys.