turns out there's a lot more to this. Let's find out from Philipp Krenn, developer advocate at Elastic. Philipp, are you there? Yes, hi, good evening. Hi Philipp, how are you? Good, good, how are you? Fine, thank you. We just heard our previous speaker was a friend of yours, I believe — Nicolas Frankel. Yeah, Nicolas is a good old friend, though I haven't seen him in a while because of all the travel restrictions. It would have been fun in person in Spain, but well, not this year. Maybe next year. Okay Philipp, we're all ears, we're waiting and ready. Great, let's take it away — and I hope we don't have any deaths or anything like that; it shouldn't be as dangerous as in the shipping industry. So let's kick that off with the slides and see where we can take this. I'm just waiting a second for the slides to come up, and then we will start once I see my slides. I'll confirm when we see them, Philipp. Okay, we're okay. Go ahead. We're okay? Yeah, perfect. Then I'll go ahead. So, thanks for the introduction. Yeah, I'm a developer advocate at Elastic, so mostly I try to show off the good stuff that we have. And while my examples will be about Elasticsearch, they apply to most other data stores, and really to containers in general as well. Elasticsearch is probably something you have heard of — the full-text search engine behind Wikipedia, widely used at GitHub or Stack Overflow — it stores the data and keeps it. And then we have other components, but those are mostly stateless: Kibana if you want to visualize, Beats to collect data, Logstash to parse and enrich it. But I want to focus mostly on Elasticsearch, because that keeps the data, and that tends to be the most tricky part in any operation. So I quickly want to cover our Docker images and what we consider good and bad practices around stateful images — or anything that contains state. Then we'll look at Helm charts, and then at Kubernetes operators, which are very fashionable nowadays.
So, Docker — and by the way, the right logo or mascot for Docker should probably be this one, because that is also the size of your average Docker container nowadays. So let's dive into what we have with Docker. Internally at Elastic we once had the saying that it's "the world's most heavily funded college project", because it did break some stuff over time, and some things were not always as straightforward as they could be — but it should be better nowadays. So you can totally use Docker to run our software, but for us it doesn't really matter how you get our software; our thinking is pretty much that it's the new zip format. Previously, in the past 10 or 20 years, you used to download your software as a zip and then you just ran whatever binary was in there. Nowadays you probably just get a container and then you just run the container. So we don't really care about the specific format — we still have many other formats; do whatever is right for you, what you know, and what works for you. But containers and operators are very fashionable, so of course we need to support them as well, because like I said, we don't have a strong preference; it's pretty much about what users want to use to get our software up and running. Obviously a new format brings new issues, so let's see some of those issues. One of the fallacies that we see with Docker is that sometimes people stop caring about which user is running something, and about file permissions. In the past we accepted that running a server process as root was a bad idea. With Docker that idea suddenly came back, I would say — in this little comic here you could replace the snake with a whale, and that is pretty much what we're getting: people are being tempted to not care about permissions, which does not work for our maintenance, because we have pretty strict requirements about the user ID and the group ID of the running process.
So we have expectations there, and if you don't meet them, and your data directory doesn't allow read and write from that user, you will have problems. We used to get a good amount of GitHub issues because of that. And what we especially like is when somebody says "I prefer simple fire-and-forget Docker containers" — I always imagine people running their data something like this, where it's like, whatever happens to my data happens to my data. Which is fine for testing, but not fine for production. The way we approach this is that we would rather suffer a bit more in development — make it harder to set up — and then have a good production setup, than have people out there lose data in production, because then people will come to us and complain a lot, and we would rather avoid that. So in our opinion, we always focus on production. Of course we try to keep it simple for development as well, but we try to embrace good techniques in development already. Which plays into the next quote that I have here: "Those who do not understand Unix are condemned to reinvent it, poorly." The problem here is oftentimes that people were told, well, Docker is just a thing where you don't need to know anything about file permissions or users anymore — and unfortunately that's just not true, because you still need to know about them; they play into all of that.
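As an aside, the permissions point above can be sketched as a small Docker Compose file. Assumptions here: Elasticsearch 7.10.0, a single-node demo setup, and the documented fact that the official image runs the process as uid 1000 with gid 0 — so a bind-mounted host directory must be writable by that user.

```yaml
# Minimal sketch, not a production config. The host directory ./esdata
# must be readable and writable by uid 1000 / gid 0, the user the
# official Elasticsearch image runs as — e.g. run on the host first:
#   mkdir -p esdata && sudo chown -R 1000:0 esdata
version: "2.2"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
    volumes:
      - ./esdata:/usr/share/elasticsearch/data
```

If the ownership is wrong, the node fails at startup with an access-denied error on the data path — the class of GitHub issue described above.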
Another fallacy that we see every now and then is "latest", which is kind of unfortunate. On Docker Hub, where most of us get their images — or, with the new rate limits, maybe not, we will see — with "latest" you would just get whatever the current image is today. If you go to our official Docker images on Docker Hub, for example, you will see "docker pull elasticsearch", and this is implicitly "latest": if you don't specify a version at the end, it is "latest" by default. And unfortunately, if you copy this command, it will just fail, because we don't have a "latest" tag — we consider "latest" a bad practice. Why? Let's assume you stand up a three-node cluster today and you get version 7.10, and in half a year, or whenever, you want to add two more nodes to your cluster. By then we might have a new major version out, and then you might suddenly have a cluster with 7.10 from today and whatever the next version is when you add the two nodes, because you're using "latest". That might just break your cluster in an unexpected way, because you have a version mismatch and maybe something is not compatible anymore, and you will be very angry with us. So we try to avoid that: you always need to specify the version of Elasticsearch that you want to run, and this command without a version will simply not run. Another thing that we run into every now and then is that people want some runtime mutation, which I think comes from the world where you just installed stuff and then mutated it as you went along: either you provide a shell script when you start the container, or you provide an environment variable, and then something should happen. We support none of those, because the Docker way would be: you do any customization, you build the image, you push the image to whatever registry you're using, and then you pull that final image — but that image is immutable, and you never
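As an aside, the version-pinning point can be sketched in one line of shell (assumption: 7.10.0 stands in for whichever release you actually want):

```shell
# Always pin an explicit version; the official Elasticsearch images have
# no :latest tag, so an unpinned pull of this image would just fail.
IMAGE="docker.elastic.co/elasticsearch/elasticsearch:7.10.0"

# On a machine with Docker installed:
# docker pull "$IMAGE"

echo "$IMAGE"
```

Every node added later with the same pinned tag is guaranteed to match the rest of the cluster, which is the whole point.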
change it. So installing a plugin, or whatever else you want to add to your image — you do that once and have a baked-in, static image, but you don't do any mutation when you start it up. For example, if you want to add any plugins (this is a bit Elasticsearch-specific, but the same goes for libraries or anything else): you start from the official Docker image that we provide, in the version that you want to use, you add your plugins, and then you create a new image that you push — and then you always pull that final image. Then it cannot fail because, I don't know, the network connection just failed to download the plugin on one node, because that might screw up your entire data. That's why we don't support the mutation approach: you always push the final image, and then you just pull that static, final image. The only things that you should not bake into your images are things that change more or less frequently, independently of your data store — like TLS certificates or any credentials. If your TLS certificate expires, you don't necessarily want to replace the entire image; you might just want to replace the TLS certificate. So this here is a quick example — the details are not so important — but it shows how you would generate the keystore, with some credentials, for the Docker images, and this is how you could then mount those from Docker Compose. Something like secrets or certificates you wouldn't bake into the image; those you would mount, so they can have a different lifecycle. But packages, or anything that is static in the image, you should bake in up front. Another thing that is always a big discussion is the base image. Our images have for quite a while been based on CentOS, which makes them not exactly small, but it's pretty common — especially our US user base is very used to Red Hat-based images — so it's kind of the natural choice
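As an aside, the bake-it-in approach just described might look roughly like this Dockerfile — a sketch, with the analysis-icu plugin and a hypothetical registry name standing in for whatever you actually need:

```dockerfile
# Customize once, at build time, on top of the official image;
# never mutate at container start.
FROM docker.elastic.co/elasticsearch/elasticsearch:7.10.0
RUN bin/elasticsearch-plugin install --batch analysis-icu

# Build and push once; every node then pulls the same immutable image:
#   docker build -t myregistry.example.com/elasticsearch-icu:7.10.0 .
#   docker push myregistry.example.com/elasticsearch-icu:7.10.0
```

TLS certificates and credentials stay out of this image and get mounted at runtime instead, as described above.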
for us. The good thing is, since all our images use the same base layer, it is shared: if you run all the components, you will have that base layer only once and the others will reuse it. Also, the setup is the same across all our images — you just use yum to install a package, for example. And once you have the JVM in there, like we do, because Elasticsearch is Java, your images will not be very small anyway. While we tried Alpine initially, we had some bugs there with musl — and glibc might be old, but it's pretty battle-tested; we've had very few critical bugs with it, just because it has seen so much action already. The main downside of that approach is that our images are of course larger than Alpine-based images would be, but again: once you have the base image, and then the JVM, and then our software on top, it will not be a super-slim image in the first place. Also, the line of thinking that we have is: if you have a stateless image — your application, for example, that you potentially deploy 20 times a day — then you care about the image size a lot. With our stateful images, a production Elasticsearch setup will probably have 100 gigabytes or a terabyte of data on it, so whether the base image is 100 megabytes larger or smaller will not really make a big difference. Also, you won't deploy Elasticsearch 20 times a day, because we don't release that often — even if you take every single patch-level version, we only release every couple of weeks — and you won't dynamically redeploy all of it all the time, because you just have too much data on it anyway. That's why, for stateless images, I get the image-size concern, but for stateful images we consider it less of a concern. We do, by the way, now support ARM64 images, so once the new Apple laptops support Docker you can run native ARM images on those as well. We will, by the way, also add new images soon for UBI; this is pretty common if you
want to be certified to run on any Red Hat-certified Kubernetes environment — you need those images, so that's why we are going to support them too. Also, in the next major version of Elasticsearch, since this is a bit of a breaking change, we will actually switch away from CentOS and slim the image down — not because of size, that's a secondary concern, but mostly because of security scanners. The current images have too many dependencies and produce too many false positives, and that takes up a lot of time answering questions like: why do you have this package, my security scanner says this is a security issue — and then we're like, no, actually this is not reachable in our code, or actually, no, we don't even use this, you cannot access that. It's just a lot of unnecessary explaining, so we will try to trim down the dependencies as much as possible in the future — not for size, but for the security scanners, which can be pretty annoying. Okay, now Kubernetes — the thing everybody wants to have or use. And when I say that: yes, oftentimes people don't really consider anymore what problem they are trying to solve, or what the question is; it's just "I want to have Kubernetes". Fine, we're happy to support Kubernetes. I assume most of you are kind of familiar with Kubernetes by now. You have a configuration that you deploy to Kubernetes, which looks something like this: you have kubectl, which talks to an API server where you tell it what to do; it keeps some state and has some other components in it; and then there are the little kubelets down here that do the actual work for you. What you get a lot of in this example is YAML — maybe some of you are not really software engineers but YAML engineers by now, because what you write most of the day is some form of YAML. And with all that YAML you get some interesting problems. I don't know, maybe somebody knows what goes wrong here if
you try to use that port mapping in Docker and you run it through a linter — something unexpected will happen. This is actually from the Docker docs, because it's kind of a common thing. We're trying to map port 80 to port 80 and port 20 to port 20. When you run that through a linter, what you get is: the 80-to-80 mapping is fine, but the 20-to-20 mapping is weird. Why is that? Because YAML (1.1) has this sexagesimal feature: numbers separated by a colon, with each part lower than 60, are treated as base 60. So 20:20 here is parsed as a base-60 number — that is, 20 × 60 + 20 = 1220 — rather than the port mapping we would expect, while 80:80 is unaffected, because 80 is not a valid base-60 digit. The solution, of course, is to quote the mapping, but this is just one of many tricky things you might run into with YAML, so beware. For us: we use environment variables a lot, and have dots in those; Kubernetes didn't support that for a long time, for no obvious reason, though 1.8 has been out for quite a while — so since then you can run our stack, or Elasticsearch, on Kubernetes without problems. But given all that YAML, you probably don't want to start writing all of it from scratch; you want a nicer way to interact with it, and that's where Helm charts come in. Helm charts are like a package manager for Kubernetes — it's called an "advanced package manager"; advanced or not, I don't know, that's up for discussion — but it does provide support for templates, and then you can specify more complex resources. The nice thing about Helm charts is that they build on existing primitives: you have a StatefulSet that keeps your data, you have Services, you have Deployments, and all of those are still there with Helm charts. It's just a nice way to wrap or package them: you fill in a template and it does the thing for you. For example, we have official Helm charts for most of the products that we provide; they are GA, so you can use them in production, they are covered by support and everything, and you can just roll out your
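As an aside, the port-mapping pitfall from the Docker docs looks like this in YAML — a sketch to show where the quoting matters:

```yaml
# YAML 1.1 sexagesimal pitfall in a Compose-style port mapping.
ports:
  - 80:80    # stays a string: 80 is not a valid base-60 digit
  - 20:20    # parsed as the integer 1220 (20 * 60 + 20) - not what you meant
  - "20:20"  # quoting forces a string, which is what you actually want
```

The rule of thumb that falls out of this: always quote HOST:CONTAINER port mappings when either side is below 60.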
stack with Helm charts. Before I show you some samples for that: the StatefulSet is generally the thing that keeps your data, and even if you replace the pod that you have running, the StatefulSet's storage can be reattached to a new pod. This is especially helpful — let's assume we have a three-node cluster, each node has 500 gigs of data, and you want to upgrade to the next version of Elasticsearch. You don't want to replace the full image and then have to reload 500 gigs from another node; you want to basically detach the storage and reattach the same storage. So what we do is replace the pod — the Elasticsearch image we have running there — and just reattach the existing data to it, which makes any upgrade process a lot faster, and, if you have to pay for data transfer, also a lot cheaper. If you do a rolling upgrade, you can replace the nodes one by one, waiting until the replaced node has come back up before you go to the next one — but you don't have to transfer a lot of data around, you just reattach the data to the new node. That makes the upgrade process a lot simpler. We currently mostly test this on Google Kubernetes Engine and officially support it there, but you can also run it in various other environments; we have samples for those. The nice thing about the Helm charts is that they are very unopinionated: basically they just expose environment variables, you can mount your own TLS certificates, and you can configure your secrets just like you did with the Docker images before. There are multiple upgrade strategies that you could follow, so if you want to do things in a specific way, you totally can. Also, if you have lots of services running — Elasticsearch and 20 other services — and you want to run them all in one specific way, the Helm charts give you a way to run everything the way you want to approach it, whereas the operator, which we will get to next, has a
different approach to that. Just remember that the Helm charts are pretty unopinionated and let you pick between multiple approaches to doing things. So, how to run that: you add the Helm repository, which we provide at helm.elastic.co, you update it, and then you just install the Elasticsearch Helm chart — and you can set the image version that you want to run; I would set that to the latest released version right now. Then you can change into the examples directory in the Helm charts repository — I have pulled that here — and use, for example, the default setting that we have there; with make (we always provide Makefiles for those) you can just apply that setting. What you would get with this example — this one would work for Minikube, for example — is: a heap size of 128 megabytes for an Elasticsearch node, a pretty strict CPU limit, a pretty small memory limit for the entire pod, and a 100-megabyte volume claim — so I have a 100-megabyte volume in the StatefulSet in the background, which is very small but would be fine for a demo. I'm also setting the affinity to soft: if I run it on Minikube on my laptop, it's only a single physical machine; normally Kubernetes would try to spread your instances across multiple physical machines, so that if one crashes it doesn't take down the entire Elasticsearch cluster. If you run on your local laptop, you want to soften that affinity, so that it actually puts multiple nodes on one machine. And that's it. The downside is that Helm charts kind of have a limit, because they really are like a package manager: I want to install three nodes in this version — it does that; I want to grow my three nodes to a five-node cluster — it does that; I want to upgrade the version — you run that and it does it. But for a longer-lasting monitoring and lifecycle
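As an aside, the demo setup described here can be sketched as a values file plus the install commands. Assumptions: chart version 7.10.0, and values keys (esJavaOpts, resources, volumeClaimTemplate, antiAffinity) as exposed by the official elastic/elasticsearch chart — the exact numbers are the ones mentioned in the talk.

```shell
# Write a small values file for a laptop / Minikube demo.
cat > values.yaml <<'EOF'
esJavaOpts: "-Xms128m -Xmx128m"
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "1000m"
    memory: "512M"
volumeClaimTemplate:
  resources:
    requests:
      storage: 100M
antiAffinity: "soft"
EOF

# On a machine with Helm and a running cluster:
# helm repo add elastic https://helm.elastic.co
# helm repo update
# helm install elasticsearch elastic/elasticsearch --version 7.10.0 -f values.yaml

grep antiAffinity values.yaml
```

The soft anti-affinity is what lets several Elasticsearch pods schedule onto the single Minikube node.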
management approach, you need the operator framework, and that is what the operator we provide is doing. The idea behind operators is that you extend the Kubernetes interfaces, so you can actually work in terms of a custom application and not just Kubernetes primitives anymore. With the operator you can think in terms of Elasticsearch, Kibana, and APM, rather than thinking in terms of pods and services and secrets and StatefulSets — they are still there, just somewhere in the background. Basically, how you do that is: you have a custom resource definition, where you define what your service should look like, and then you fill that out and run your service. One thing that is sometimes confusing: there are two similar concepts here. There is the custom resource definition — that's like the blueprint, or the class, if you're into object-oriented programming — and then there is the custom resource, which is the actual instance, like an object in the object-oriented world. So don't be confused: the CRD is the cookie cutter, and the custom resource is the cookie that comes out of it. The concrete instance that you're interested in is the custom resource; the custom resource definition is what is behind all of that. And what brings the custom resource definition to life is the so-called reconciliation loop: basically an infinite loop that continuously runs in the background and always checks — what have you configured in your custom resource, what is the current running state, and how do I move the application to the state that you want? It will do upgrades automatically for you, it will create secrets for you, it will generate TLS certificates — a lot of that is fully automated and happens in the background. There used to be a Kubernetes operator
from the community, but that was discontinued — by now it's already two years ago — and that's why we started our own operator at around the same time. So about two years ago we started our own operator, and it has five-plus full-time developers, which is quite a lot, but you should see that as an advantage: we put a lot of effort into it so you don't have to. The operator is also GA; it supports Elasticsearch, Kibana, all the Beats, and the APM Server. Don't be confused: it's not called "operator", it's called Elastic Cloud on Kubernetes — ECK — because it should give you a cloud-like feeling, or result. But it is an operator; just don't be confused if you search for it and don't find something called "operator" — there is one, it's just named differently, and sometimes that causes confusion, unfortunately. It is put together with Go: we use Kubebuilder, which talks to the Kubernetes API server to set things up, and we also use Kustomize to patch things together for very old Kubernetes versions that we still support but that need a bit of different handling. So how does that look? The first thing you do is install the operator. This is its own pod that runs continuously in your cluster and just checks the state of the cluster resources that I'm configuring here: it compares what is configured with what you actually have running in your cluster, and if those two don't match, it tries to bring your cluster into the state you have defined in the cluster resources. So it will continuously check: okay, you wanted three nodes, you wanted them with these resources, you wanted them in this version — is the cluster in that state? If not, it will bring it there. So if you apply any change here — for example, rather than three nodes you
add the configuration that you want five nodes — then the controller will pretty quickly pick it up: oh, there's a change, you want five nodes instead of three, and it will just add two more nodes to your cluster. Same for the version number: you bump the version number, and it will roll forward one node after the other in your cluster, replacing the version, to upgrade you to the next version — or whatever new version you have configured. The big difference to the Helm chart is that the operator is opinionated: we have best practices and operational knowledge encoded in it. For example, we enforce security — we will always set up users and passwords for you, we will always generate TLS certificates. With the Helm charts you have the option: do you want TLS certificates and HTTPS connections, or not? With the operator we don't leave you much choice, because we think this is the right approach. In most cases this is a big advantage; if you want to deviate from our path, though, it will not be that easy. Okay, things the operator can do: for example, it can scale down. If you say, instead of five nodes I only want three, it drains the nodes: it picks two nodes, moves the data off them, and only once all the data has been drained does it shut those nodes down. Or, for example, it knows some tricks for making an upgrade procedure quicker or safer, and it applies those automatically in the background for you. You can still shoot yourself in the foot: for example, if you have a single copy of your data and you do a rolling upgrade, where you replace one node after the other, and you replace the node holding the single copy of your data, that data won't be available until the upgraded node joins the cluster again. Unfortunately there's no way around that, and the operator won't stop
you from doing things like that. So there are still ways to hurt your availability, but only if you set things up in a specific way without thinking about how the operator will actually work with that. Okay, so how to run it. There are actually two ways to get our operator. The first: you just kubectl apply — or "cube cuttle" apply, or however you pronounce it; I'll skip that discussion today — this all-in-one operator manifest, which will then run one pod in your cluster. You can check the logs of the operator itself, and then apply a setting; we will look at that setup for a cluster over the next couple of slides. The other way to install the operator is Helm: you can use Helm charts to bootstrap the operator. That was only added in the latest version of the operator, 1.3 — before that, Helm had some technical limitations that we wanted to avoid, but a lot of people really wanted this feature, so they don't have to run some random script from the internet but can install it through a proper repository. This is the same repository that you have seen before, but rather than installing an Elasticsearch chart, you install the operator chart. I installed it into the elastic-system namespace — and I'm actually creating that namespace — so the operator now runs in the elastic-system namespace, and then your resources can run in an elastic namespace. You can check the logs of the operator again, get all the pods that are running, and you will see one pod from the operator running. So, what do we have here? This is the first of three slides where I actually show you the configuration — the configuration we provide to the operator to set up our cluster. This and the next two slides are one configuration file, so
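As an aside, the two install paths just described can be sketched as follows. Assumptions: ECK 1.3.0, with the manifest URL and chart name following Elastic's published install instructions for that release.

```shell
# Option 1: apply the all-in-one manifest directly.
# kubectl apply -f https://download.elastic.co/downloads/eck/1.3.0/all-in-one.yaml

# Option 2 (new in ECK 1.3): install via the Helm chart from the same
# repository used for the Elasticsearch charts earlier.
# helm repo add elastic https://helm.elastic.co
# helm install elastic-operator elastic/eck-operator \
#   --namespace elastic-system --create-namespace

# Either way, one operator pod ends up running in elastic-system:
# kubectl -n elastic-system get pods
# kubectl -n elastic-system logs statefulset.apps/elastic-operator

MANIFEST="https://download.elastic.co/downloads/eck/1.3.0/all-in-one.yaml"
echo "$MANIFEST"
```

The kubectl/helm commands are commented out because they need a running cluster; the manifest URL is the piece you would actually pin in automation.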
what you see here is not a Kubernetes primitive anymore. This is what I meant initially when I said: now you can work with Elastic concepts. We have an Elasticsearch apiVersion, v1, and a kind of Elasticsearch — so now you're thinking in Elasticsearch terms, not in Kubernetes primitives anymore. Then you say: this thing is called elasticsearch-sample (this is how we will reference it later on as well), this is the version, I want a single node, I want two gigs of memory, and I want two gigs of data volume as well — and it will just set that up for you. Then I want to run an APM Server with it, and I reference the elasticsearch-sample from the previous slide — the elasticsearchRef references the configuration on the previous slide — and again you see that the APM Server has its own apiVersion and its own kind: these Elastic concepts again. Finally, Kibana is pretty much the same thing. Both APM Server and Kibana are stateless — all their state is in Elasticsearch — so you just reference the Elasticsearch cluster, and it sets up one node in the version you told it to. This is all you need to do, and in the background the operator will start the Elasticsearch cluster, set up the TLS certificates, generate a user and password, start up Kibana and the APM Server, and connect them to it with the right credentials over HTTPS using those certificates. You don't need to do any of that setup; the operator does all of it for you. That's also why it has a lot more resources behind it — all of that magic had to be implemented, in Go in our case; you could use another language, but Go is the most common one for that. To actually run that — on Minikube, for example, though this could run anywhere — once you have it running, you can just run kubectl and have it give
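As an aside, the three-part configuration walked through on these slides might look roughly like this — a sketch, with names like elasticsearch-sample following the talk and the sizes being the ones mentioned (one node, two gigs of memory, two gigs of data volume):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 7.10.0
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 2Gi
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 2Gi
---
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server-sample
spec:
  version: 7.10.0
  count: 1
  elasticsearchRef:
    name: elasticsearch-sample   # ties the APM Server to the cluster above
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample
spec:
  version: 7.10.0
  count: 1
  elasticsearchRef:
    name: elasticsearch-sample   # same reference for Kibana
```

Applying this one file is everything the operator needs; certificates, users, and the wiring between the three components happen in the reconciliation loop.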
me these three resources, and it will show you that you have them running. You could expose Kibana to the outside world, for example, to access it. And as I said, Elasticsearch generates users and passwords for you automatically — the last command here would fetch the user you can use to actually log in to Kibana. We always generate random passwords, so you need to fetch that from the JSON; this is complicated, so I always copy this command because I can never type it correctly, but it fetches the right password from the cluster state, which is just a Kubernetes secret in the background, and then you can log in to your cluster. Okay, if you want to make any changes: just change the YAML file with whatever you want and apply the change — the operator will pick that up automatically and apply it to your cluster. It is pretty widely supported and tested on various environments. And again, in the background you have a StatefulSet, so rolling upgrades are pretty easy. I have some more slides about that, which might go a bit too deep, so I'll just skip over them. One more word here: there are various operators out there — you can see them on OperatorHub.io, and ours is one of them — but if you want to run any of these data stores, or various other systems, you can go to OperatorHub to see what the operators can do, fetch them, and set up your software with that. That should get you started pretty quickly running data stores and many other things on Kubernetes. So, to wrap up, and also leave some time for questions: oftentimes people say containers are disrupting the industry, and I'm never sure if they mean it in a good way or a bad way — maybe because they had some downtime with Docker or Kubernetes — but that is for you to decide. The question we get pretty frequently is: can I run Elasticsearch on Docker or Kubernetes? And yes, you can — but that's not really the right question. The question you
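As an aside, the hard-to-type password command is sketched below. Assumptions: the cluster resource is named elasticsearch-sample, as in the sample configuration, and ECK's convention of storing the generated password for the elastic user in a secret named after the cluster.

```shell
# On a machine with kubectl pointed at the cluster:
# PASSWORD=$(kubectl get secret elasticsearch-sample-es-elastic-user \
#   -o jsonpath='{.data.elastic}' | base64 --decode)

# The secret value is just base64-encoded; decoding locally looks like:
echo "c3VwZXJzZWNyZXQ=" | base64 --decode   # prints "supersecret"
```

That decoded value is what you paste into the Kibana login as the password for the elastic user.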
should be asking is: should I do that? And that's really up to you. If you run everything on Docker or Kubernetes and know how to operate it, then probably yes. If it's the first thing you want to run there, maybe don't start with your data stores, because any stateful service is trickier to run than a stateless application — but it is totally an option. One interesting thing that we have seen is what I call the Kubernetes paradox, from before we had an operator. When you talked to customers, they would ask: do you have an operator to run Elasticsearch? And we said, no, we don't. And they said: well, then we cannot use your software, sorry. And then you turn the question around: oh, so the majority of your workload is actually running on Kubernetes already? And they're like: oh no, no, we're just testing — like two percent of our workload is running on Kubernetes — but it's still a hard requirement to have an operator. So while we must have this operator, it doesn't mean that everybody is actually using it. We do have some large, paying customers running on it, but that doesn't mean everybody does, or that it should be a requirement — so don't feel bad if you don't use an operator; it can just make your life easier if you want it to. Helm charts versus operators — I hope I made it clear: the simpler, less opinionated option versus the full-service environment with one big path to follow. What the operator is especially good at are the so-called day-two operations: scaling up, upgrading, making sure backups happen — there's a lot of logic baked into the operator to help you with those, so not just starting the cluster, but also keeping it alive and in a healthy state. Final point before I wrap up: we have a booth, and we have a quiz. If you want any swag from us, do well in the quiz and we will ship you some — because swag is normally very popular at physical events, and our way is, well,
we have a quiz, so the top people can get some swag at this event as well. If you don't get the link, it's also on our booth, so you can find it there — if you want swag, that's where to go. That's all from me. Do we have any questions? I think we have like two or three minutes left. Thank you so much, Philipp. We are actually a little bit tight on time, but thank you so much — a very well presented talk, and I say that as someone who professionally advises people on how to create appealing presentations. I thought visually this was very nice and clear, and I liked the videos and the metaphors, so that was great. Could you perhaps tell us briefly, because we don't have much time, how you see the future of the container? How will it evolve — will it standardize, will it change? What are your views on that? I think containers have kind of standardized already. In terms of artifacts, we support so many — you have an MSI and a DEB and an RPM and the tar.gz — and then you have so many ways to install it, so in some way it has already standardized. That you have these different Kubernetes versions or flavors does make it trickier, but you often try to abstract that away — we test on all of them and try to abstract it away. So I think it is already standardizing, and I would only see more investment into that environment going forward, more of it rather than less. So I think it is helping in that area. Okay, that's great, Philipp, and thank you so much indeed. I'm definitely going to take a look at that quiz — I want the swag. What is the swag, come on, don't leave us hanging like this: is it an Elastic t-shirt, or what do I get? I'm not sure if we have t-shirts right now; I thought it was a backpack and a water bottle and maybe some stickers, but I'm not the right person to ship you the swag, so I'm never sure. And I wouldn't know the answers to the quiz questions — we don't want to cheat. Okay, well,
you can send that to me privately, that's fine. So, Philipp, once again, thank you so much. You've got several compliments here on the talk, and some more questions which we don't have time to go into right now, but I encourage attendees: please get your questions directly to Philipp. So once again, thank you so much from behind the platform. Thank you, bye. Bye, Philipp, thanks.