Hello everyone. My name is Sayum and I'm here with my colleague Ravi. We are both from Walmart Labs, and today we'll be talking about getting dirty with monitoring and auto-scaling for self-managed Kubernetes clusters. Whoa, big name, right? I don't know who gave it that title, but it's a good one.

A bit about myself. My name is Sayum Bhattak and I'm quite active on Twitter — tweet at me and I might reply. I'm a blogger; I write about new tech on Medium. I'm a Docker Bangalore community leader, so I handle the Docker meetups there, and I also organize the Rancher and Influx Bangalore meetups. I'm an InfluxAce, a Rancher Ranch Hands member, and a Kubernetes member as well. Apart from our day-to-day work, we try to contribute to the open-source community across different projects.

Before the big questions, I'd like to set the context. At Walmart we run a machine learning platform, and that platform runs on a multi-tenant Kubernetes cluster. The Kubernetes clusters are on-prem, and we also have a big presence in public clouds like Google Cloud and Microsoft Azure — more than 500 nodes in total, training machine learning models. As developers, our users just submit their models and run their trainings, but underneath, those trainings run as pods on the cluster. Keeping that in mind, we'll discuss two major Kubernetes concepts: monitoring and auto-scaling — the what, why, and how of monitoring in Kubernetes and of auto-scaling in Kubernetes. We have a lot to cover, so I'll try to be a bit fast while explaining everything.

First, why Kubernetes monitoring? Kubernetes itself exposes a set of metrics that it uses to keep the cluster in the desired state — the current state converging to the desired state is something Kubernetes already exposes. But beyond that, it does not keep any information about downtime. If a pod goes down and you don't have any monitoring mechanism, you won't know what happened and when. For that, you need monitoring.

Next, a reliable platform. Once you have historical data about your platform, you know how reliably it is behaving and what its performance looks like — how many pods went down in the last week. You won't know that if you haven't implemented any monitoring on top of your Kubernetes cluster. Next is events: if you want to monitor specific events, you definitely need Kubernetes monitoring. And then learn, observe, predict. Think of a scenario where a machine learning engineer has submitted a model and the model takes four days to train, so the pod has to run for four days. If the pod goes down after three days, what about the training that has already been done? It goes to waste. So we need to actively monitor which pods are running — there are other mechanisms to handle this too, but from the Kubernetes perspective we need to make sure the pod does not go down, and for that we need extensive monitoring. Millions of metrics are being sent, and we have to monitor them so we can predict failures and observe how they happen.

So that was why we need Kubernetes monitoring. Now, on the left-hand side of this slide you can see the metrics that Kubernetes exposes by default. You have the Kubernetes API server metrics that are exposed.
You have the Kubernetes node metrics, the metrics endpoints, kube-state-metrics, the endpoint metrics, and the metrics of the Kubernetes components. All of these are exposed by Kubernetes out of the box in a Prometheus-readable format.

Now let's move to the right-hand side, where you can see Telegraf. In this session we'll be talking about Telegraf and InfluxDB, the monitoring solution we have implemented for our large-scale platform. Telegraf is a tool that can capture metrics from many different sources: there are hundreds of sources you can scrape metrics from, and not only in the Prometheus format — other formats as well. Let me show you. These are the Telegraf input plugins, and as you can see there are not one, two, three, ten, or twenty of them — there are more than 200 plugins readily available. 200 plugins means 200 different sources from which you can scrape metrics with Telegraf and send them to a database.

Why would you need so many? Because Kubernetes is just one part of a machine learning platform at large scale; you don't need only the Kubernetes metrics. We have to provide detailed billing, so we need billing metrics; we need detailed pod metrics, system-level metrics, and application-level metrics. All of those can be scraped with the plugins that are already available. I'll quickly show you a few: there are ready-made kubernetes and kube_inventory plugins, there is a prometheus plugin, there are kernel-level plugins, CPU plugins, and Docker plugins as well. So beyond the Kubernetes ones, there are hundreds of plugins you can readily use from a single place — Telegraf is the only tool required.

And this is what the configuration looks like. Suppose I go to the prometheus plugin: all you have to write is two lines, the inputs.prometheus section and the URL it has to scrape — that's it. You don't have to put anything extra in the configuration file. Yes, below that there can be a username, a password, and other options, but at a bare minimum that's all you need for an input plugin in the configuration file.
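As a rough sketch, that two-line prometheus input looks like this in telegraf.conf — the URL here is just an illustrative example, not our actual endpoint:

    [[inputs.prometheus]]
      ## Endpoint exposing metrics in Prometheus text format (example URL)
      urls = ["http://localhost:9090/metrics"]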
So that's why we use Telegraf: to gather metrics from multiple sources in one place. Now, we push this data into InfluxDB — this is its logo — which is a time series database. For a machine learning platform it is very important to have timestamps attached to the data, because we need to analyze it over time; that's why we use a time series database. InfluxDB has extremely tight integration with Telegraf — both are InfluxData products — so you can push the metrics straight into InfluxDB. It also integrates with Grafana, so all the dashboards and visualizations can easily be built from InfluxDB and viewed in Grafana, and on top of that you can configure whatever alerting you want — Slack, mail, PagerDuty — in Grafana itself.

That brings us to a quick intro to the TICK stack architecture. We have already discussed two of its components, so I'll quickly skim through it. Telegraf takes the metrics from Kubernetes stats, Prometheus endpoints, networking metrics, messaging queues, apps and databases, system stats, logs, and traces; that's the Telegraf agent, and it sends the data to InfluxDB, the time series database. Those two we won't re-explain in this session. Chronograf is essentially a query builder, a query and visualization tool where you can write queries and get the results immediately, and Kapacitor is for real-time data processing — the heavy, complex queries you want to run continuously. They are all open-source products.

The most important part of this talk is how you can go back and build a highly available setup for monitoring your Kubernetes cluster. Again, when you talk about a machine learning platform, it is critical to have a highly available, disaster-recovery-ready solution for the monitoring piece, and this is the approach we have followed. This is our current setup running in production, handling the machine learning workloads. The green boxes are the VMs, the virtual machines running InfluxDB and the relay. On top of them there are two load balancers connected to a main load balancer, and Telegraf sends its data to that main load balancer. So in the telegraf.conf file we point the output at the main LB; from there a write goes to one of the VMs, and the relay is responsible for syncing the writes between both InfluxDB instances.

Now, why have we done this? Open-source InfluxDB does not provide high availability out of the box. If you want HA from InfluxDB itself, you have to buy the enterprise license, and that involves cost — for a production cluster you can imagine how much would go into those licenses. To avoid that, we used the relay (influxdb-relay), which is responsible for keeping both running InfluxDB instances in sync.

On the right-hand side, you can see that some of this can also be handled at the Telegraf level. The two small cylinders are batch and buffer. Batch defines how many metrics should be sent to your InfluxDB instance at a time. Buffer means that if the InfluxDB instance goes down, Telegraf does not stop sending and lose the data; it holds the metrics in a buffer of the size you have configured — say 2 GB, based on your workload — and as soon as the InfluxDB instance comes back up, it sends those metrics on to InfluxDB. This approach is also valid, but for millions of metrics it becomes a problem: you would have to put a Kafka queue in between, with Telegraf writing to Kafka and Kafka feeding InfluxDB, and that increases the complexity. That's why we went with the relay. You don't have to do any real configuration in the relay — you can use it as provided and just give it the URLs of the InfluxDB instances it should write to and keep in sync.
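As a rough, illustrative sketch of the Telegraf side of that setup (the load balancer hostname and the sizes here are assumptions, not our production values), it is just the agent batch and buffer settings plus an InfluxDB output pointed at the main LB:

    [agent]
      ## How many metrics to send to the output in a single write
      metric_batch_size = 1000
      ## How many metrics to keep in memory if the output is unreachable
      metric_buffer_limit = 100000

    [[outputs.influxdb]]
      ## Main load balancer in front of the relay + InfluxDB VMs (example URL)
      urls = ["http://influx-main-lb.example.com:8086"]
      database = "telegraf"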
Also, you can monitor Telegraf itself using a plugin called internal. It's very important, and some people skip this particular plugin. With the internal plugin in place you can keep an eye on the buffer size and make sure you alert on dropped metrics. So even if metrics are getting dropped, if internal is already configured you can catch that, set up an alert on it, and based on the alert take further action — run a script or something to self-heal.

So that is the InfluxDB high availability side; now the DR setup. For DR, what we do is take snapshots of the live InfluxDB directories — the WAL and data directories, all those folders. If everything goes down, we can recover with a simple script that recreates the instance pointing at the latest snapshot, and it comes back with the same data. There's no need to change the load balancers or anything else.

This is a sample Telegraf config file. I showed you the inputs earlier, but this is what the whole file looks like. First you have the agent configuration, where you specify the metric batch size and the metric buffer limit I was talking about a few seconds ago. After that, in the outputs, you say where the metrics should go; here we have just given a plain URL of InfluxDB. Wherever your InfluxDB instance is running — on a VM, in a Docker container, or behind a Kubernetes service — you give that URL and Telegraf will send all the metrics there. After that you have the inputs section, which is the main part that says which plugins you are using. In this instance you can see we are using kubernetes, docker, a prometheus URL, cpu, system, disk, diskio, mem, processes, net, and swap. That's all you have to write in the inputs for them to start producing metrics. If you want any additional configuration for a plugin, you can take that from the documentation, but you can see how easy it is to use a plugin: you just list in the inputs what you want. Here alone we have used more than ten plugins, and if you tried to do this without Telegraf you can imagine how much harder it would be to scrape metrics from all these different sources.
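As a rough reconstruction of what such an inputs section might look like — plugin defaults assumed, the Prometheus URL is only an example, and it includes the internal plugin mentioned a moment ago:

    [[inputs.kubernetes]]
      url = "https://$HOSTIP:10250"               # kubelet on each node
    [[inputs.docker]]
      endpoint = "unix:///var/run/docker.sock"
    [[inputs.prometheus]]
      urls = ["http://localhost:9090/metrics"]     # example Prometheus-format endpoint
    [[inputs.cpu]]
    [[inputs.system]]
    [[inputs.disk]]
    [[inputs.diskio]]
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.net]]
    [[inputs.swap]]
    [[inputs.internal]]                            # Telegraf's own stats: dropped metrics, buffer size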
Now I have a quick demo. What I want you to take away from it is not a highly available setup — this is a very small setup on a two-node cluster — but when you go back you can run this exact same scenario on your dev clusters and see how the monitoring behaves. Then, if you want a highly available architecture for your Kubernetes clusters, your setup should roughly match one of the high-availability diagrams I showed; that's when you get the HA and the DR for your InfluxDB as well.

So let's move on to the demo. This is a plain Katacoda cluster, a free two-node cluster, and I'm cloning a Git repo here. Let me quickly show you what's in the repo. This is the repo — the InfluxDB examples — and it's public, so you can just clone it. It has different pieces: control plane, InfluxDB, nodes, and Prometheus. The Kubernetes setup creates the namespace and the RBAC roles that are required. Then you have the Kubernetes nodes piece, with a Telegraf config file and a YAML file. When you run this inside Kubernetes you create a small ConfigMap, and in the ConfigMap we give the same things again: the output is InfluxDB, the input is kubernetes. So it's the same Telegraf configuration, just packaged as a ConfigMap. And if you look at the Telegraf deployment for the nodes, it is done as a DaemonSet.

There are different ways you can deploy Telegraf. One is a DaemonSet; another is a simple Deployment, which you can see here for the control plane — that ConfigMap uses the kube_inventory plugin, the output is still the same, and instead of a DaemonSet we have a Deployment. It depends on what you are going to monitor: you can run it as a DaemonSet for all your Kubernetes node-level monitoring, and as a simple Deployment at the application level, where the applications send their metrics to it. There is also one more pattern, which is the sidecar: you can deploy Telegraf as a sidecar, and that will definitely give you very detailed metrics from that particular pod, but it increases the maintenance complexity. In machine learning workloads you already have multi-container pods, and if you add one more container to such a pod it becomes a much more tedious job to debug when anything goes wrong.

So let's watch the demo now. Those were the files; let's start again. Katacoda gives you a two-node cluster, one master and one node. We clone the Git repo, which has everything in it, go to the examples folder, and in the Kubernetes section we have the control plane, InfluxDB, and a Makefile. All you have to do is type make and it will install everything for you: it creates a namespace, a cluster role, and the RBAC role, and then it starts deploying everything we just saw — for the nodes, for Kubernetes, for the control plane, and the Prometheus exporter — via Helm and Tiller. Once those are installed we can actually see what's happening. Okay, it's already done — all the deployments have been created. Now we check whether the pods are up. The monitoring namespace was created, and you can see all the pods are in the ContainerCreating state. Then we check the service, whether our InfluxDB service is up: it has created a service of type NodePort on port 31266. And now all the pods are in the Running state.

So what we do next is open that NodePort where InfluxDB is running. It has a built-in UI where we can log in and check whether the data is actually coming in. It asks for a username and a password; the username is the one I set, and the password is the super secret password you can see in my Git repo. You can see it's there. Now we quickly create a dashboard. It comes with pre-built dashboards, so we go to the dashboards and add the Kubernetes dashboard. Once you add it, you don't have to do anything — it is already pulling the metrics from the default sources.
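For reference, the ConfigMap-plus-DaemonSet pattern we just deployed looks roughly like this — a minimal sketch rather than the exact manifests from the repo, so the namespace, image tag, and InfluxDB service URL are illustrative assumptions:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: telegraf
      namespace: monitoring
    data:
      telegraf.conf: |
        [[outputs.influxdb]]
          urls = ["http://influxdb.monitoring.svc:8086"]   # example service URL
        [[inputs.kubernetes]]
          url = "https://$HOSTIP:10250"                    # kubelet on the node the pod runs on
          bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
          insecure_skip_verify = true
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: telegraf
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: telegraf
      template:
        metadata:
          labels:
            app: telegraf
        spec:
          serviceAccountName: telegraf        # ServiceAccount/RBAC created separately, as in the repo
          containers:
          - name: telegraf
            image: telegraf:1.14              # illustrative tag
            env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            volumeMounts:
            - name: config
              mountPath: /etc/telegraf
          volumes:
          - name: config
            configMap:
              name: telegraf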
So in the dashboard you can see your pods, available pods, available memory. You just took a cluster, cloned a repo, and you can see how easy it was to set up metrics. And believe me, it is not that difficult for an HA environment either: you just have a few more configurations to add and a few more plugins to scrape metrics with. You might have a ZooKeeper, you might want a Jupyter notebook sending you data, so you might need additional plugins in the telegraf.conf, but apart from that there is very little you have to do beyond this demo. I hope this gives you some idea of how you can monitor your Kubernetes cluster at a small scale and at a large scale, and hopefully you can go home and try it out — that's my giveaway for you.

Next is another important topic, which is autoscaling, and without wasting time I'd like to call Ravi, my colleague at Walmart, who will be talking about Kubernetes autoscaling. Thank you.

Sorry for the delay — infrastructure issues. We'll start right away. My name is Ravi, and we are part of the machine learning platform. The entire platform is managed and handled from Bangalore, and it caters to all the data scientists across the globe in Walmart. In terms of scale, within the Walmart platform group we are probably one of the top three users in terms of capacity. One of Walmart's slogans is "save money, live better," and that applies internally as well, so it is imperative for us to save cost in whatever we do.

Before we get into autoscaling and delve into exactly how we do it, two quick words about me: I'm an architect at Walmart Labs, I happen to be a contributor to Kubernetes autoscaling, I've spoken at various forums, and you can catch me on LinkedIn — I'm not on Twitter, so I lost a few goodies here because of that.

Anyway, coming to the main topic of Kubernetes autoscaling. The machine learning platform at Walmart is hybrid: we are running about 500 or 600 nodes cumulatively across OpenStack on-premise, Azure, and GCP. We are hybrid because we can't exactly force the platform users, the data scientists, to stick to a particular cloud. They could be using various tech stacks, various programming languages, various IDEs, and what we provide as a platform has to be a one-size-fits-all kind of solution.

So how do we save cost while running a Kubernetes cluster? Think of a use case where one data scientist is training a model on a two-core, 5 GB pod, and another data scientist from a different team is working on, say, a 96-core, 400 GB pod. How do we cater to both of those use cases and the whole range in between? We can't have a static cluster; it's really not optimal for us to work that way. That is why we have on-demand cluster autoscaling in all the hybrid clouds. The direct benefit is cost optimization, and apart from that, the fewer the pods, the fewer the nodes, and the lesser the overhead on etcd, ZooKeeper, and all the other control plane components, so that helps us as well.
Why did we choose the Kubernetes cluster autoscaler? Because it already provides native support for the various clouds we use. There is recent support for OpenStack Magnum, and there is extensive documentation and support for Azure and GCP as well. Before we delve into how the autoscaler works and how we have implemented it — we'll show a demo down the line too — a quick bit on the pod lifecycle.

This is how a pod generally flows: someone has a workload, and through our machine learning UI or through other APIs, batch APIs, they create a workload, which becomes a pod. If there are no resources available for that pod, it goes into the Pending state, and when a pod is Pending, the cluster autoscaler piece kicks in. The cluster autoscaler basically has two parts: the first is a Kubernetes watcher that looks out for pending pods; the other is its integration with the public or private cloud, through which it can create and delete nodes and so on. So the cluster autoscaler identifies that a pod is pending, spins up a new node, and the pod then moves to the Running state. That is the primary premise of how a cluster autoscaler works.

In practice, what do we do to set one up? The first thing is that there has to be some resource created in the public or private cloud. Then we have to run a cluster autoscaler daemon that can integrate with both Kubernetes and that cloud. For example, GCP has the concept of a managed instance group; it is the same concept that Azure calls virtual machine scale sets and that OpenStack calls node groups. The basic approach remains the same everywhere; it is just named differently in each cloud. An instance group is a set of homogeneous nodes of similar capacity, so we can create multiple node groups — four-core, eight-core, 16, 64, 96 and so on — and that is what we use in our machine learning platform for scheduling.

Once we have created a node group, the next step is to create the autoscaler daemon. It runs as a pod in the control plane of our Kubernetes cluster itself. If you look at the YAML, there are some interesting parameters passed to the cluster autoscaler. The first is the cloud provider, which could be GCE, AWS, OpenStack Magnum, AKS, and so on. For the cluster autoscaler to integrate with the public cloud we also have to pass a key file — a JSON or API key file, which we can store as a Kubernetes secret. And one of the other important parameters is the node group: we have to pass the location or the name of the node group.
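To make that concrete, here is a rough sketch of what such a cluster autoscaler Deployment could look like on GCE — the image version, node group name, and secret name are illustrative assumptions, not our actual manifests:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: cluster-autoscaler
      template:
        metadata:
          labels:
            app: cluster-autoscaler
        spec:
          containers:
          - name: cluster-autoscaler
            image: k8s.gcr.io/cluster-autoscaler:v1.14.0   # illustrative version
            command:
            - ./cluster-autoscaler
            - --cloud-provider=gce
            - --nodes=0:10:my-ml-node-group      # min:max:node group, example name
            - --scale-down-enabled=true
            - --scale-down-unneeded-time=10m
            env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/gcp/key.json            # service account key mounted from a Secret
            volumeMounts:
            - name: gcp-key
              mountPath: /etc/gcp
              readOnly: true
          volumes:
          - name: gcp-key
            secret:
              secretName: cluster-autoscaler-gcp-key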
Once this is in place and the autoscaler is running, the question arises: how does the cluster autoscaler know which node groups it should monitor, and how does it know whether it should scale a four-core node group or an eight-core one? The actual magic happens in the labeling. Let's take a scenario where we create a pod and the pod has a node selector of, say, microservice equal to true. When the pod gets created there is no node available with that matching label, so the pod gets stuck in the Pending state. The autoscaler kicks in — it monitors not just one node group but many node groups and caches their data — and it knows exactly that if the pod's label matches the label of one of the node groups, it should scale that particular node group, so it creates a new VM. The VM then joins the Kubernetes cluster, and that is how we get the pod back into the Running state.

Okay, so let's jump into a quick demo and then we can discuss in more detail how it works. We start with a clean slate: there are no pods in the cluster, and again this is just a simple two-node cluster. We start with the simple premise that we have one cluster autoscaler running, monitoring one node group. Now let's take an example: a simple nginx or busybox kind of pod, some service we are running. If you look at it, there are two important parameters: one is the replicas, which we keep at one, and the other is the node selector, which is microservice equal to true. When we apply this YAML, the pod gets created, but there is no available node matching that label, so the pod gets stuck in Pending and just stays there. Then the autoscaler comes in, identifies that there is a pending pod, and goes and hits the GCP APIs — you can see here, this is the node group we created — and it creates a node. Generally, the spawning time we have seen in the public clouds is around two to three minutes for nodes to come up, depending on their size. We can also extend this not just to CPUs but to GPUs, which is very important for us because on our machine learning platform we have a lot of deep learning use cases as well, where GPUs are expensive, and we want autoscaling there too.

Now the node has just joined the cluster, so in a minute or two — yes, it has joined now — the pod will come into the Running state, because its label matches the label of the node. We can extend the demo and see what happens when we scale up. If we increase the replicas to some bigger number, the same process repeats: the pods get created, but since there are no resources available they go Pending again. The cluster autoscaler is a daemon that keeps checking every minute (we can tune that interval); it watches for pending pods, and if there are any, it knows exactly which node group to scale. So we can go back to the browser, and after about a minute or so we can see that it has actually scaled the nodes.
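For reference, the workload we applied in the demo is essentially something like this — a minimal sketch with an assumed image and label key, not the exact YAML from the demo:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: demo-microservice
    spec:
      replicas: 1                       # bump this up later to force more scaling
      selector:
        matchLabels:
          app: demo-microservice
      template:
        metadata:
          labels:
            app: demo-microservice
        spec:
          nodeSelector:
            microservice: "true"        # only nodes carrying this label can host the pod
          containers:
          - name: app
            image: nginx:1.17           # illustrative image
            resources:
              requests:
                cpu: "500m"
                memory: "256Mi"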
Now, how does it decide how to scale? We'll come to the intricacies of this as well: the autoscaler has logic and intelligence built into it — it doesn't just scale nodes blindly, it scales based on certain criteria such as cost or size, and so on. So now the nodes have actually been created, and it will take a few minutes for them to join the cluster — yes, there they are — and once these nodes are ready, the pods will actually get scheduled on them. Can we have the slides back?

The other important aspect is monitoring the cluster autoscaler itself. The cluster autoscaler natively provides a Prometheus endpoint from which we can scrape metrics. This is really critical for us because we are a multi-tenant machine learning platform, and we want to know exactly at what time of day the nodes are scaling up and when they are scaling down, so that we can find a pattern in it — which of our tenants, our customers, are actually using this feature more, creating more nodes or more pods, and so on. These are the things we would like to understand, and the metrics endpoint gives us this entire picture. The way we scrape those metrics is, again, the Telegraf Prometheus plugin we saw before: we just give this URL in the configuration and all the metrics get pushed into InfluxDB. It is that easy.

Coming to the cluster autoscaler itself, there is extensive documentation on it; maybe we can just show a few things. If you look here, there is a ton of material, but the critical part is the parameters passed to the CA. There are some interesting ones: one is the cloud-config, which is the JSON or key file, and another is scale-down-enabled. The beauty of the autoscaler is that it can also scale down, and we can set the scale-down unneeded time. What that parameter does is: if a particular node has been added to the cluster and has had no pods scheduled on it for, by default, say ten minutes, the node is automatically removed and surrendered back to the public cloud or to OpenStack. With this we save a lot of cost. There are further parameters we can go through here: one of them is the cloud provider, which could be GCE or AWS, and there is a new one added, I think, a few months back, which is OpenStack Magnum — that can also be explored.

Then there are two more parameters we can look at: the estimator and the expander. What does an expander do? A cluster autoscaler can monitor multiple node groups, and we have a use case where we run both critical and non-critical workloads on the platform. In GCP and Azure there is the concept of spot instances — instances that the public cloud can take back at any time, since they are part of its buffer or excess capacity — and they come at maybe 10 or 15 percent of the actual cost of the VM. Imagine using them for GPUs; it would save a ton of money.
So what we can do is have an autoscaler with two node groups: one node group configured to use spot instances and the other configured to use the regular VMs. We can then set an expander property that says, essentially, always use the low-cost option; in that case the cluster autoscaler will first try to find compute in the spot-instance node group, and we can run our non-critical workloads on that. So this is pretty much what we had on the cluster autoscaler and on monitoring. If you have any questions or comments...

Question: You said you want to save some money on the InfluxDB enterprise license, but I didn't see how you synchronize the InfluxDB instances. If one of them goes down, how do you re-synchronize the data? The relay was just writing to the instances one by one.

Answer: That is the HA piece. Once we have created a managed instance group, like Ravi showed, when an instance goes down it automatically comes back up — that's the responsibility of the MIG — and the IP doesn't matter because we have a load balancer, and the load balancer will keep targeting the new VM.

Question: Okay, but how do you sync the data back onto that node?

Answer: The data is already there, because we run InfluxDB as a Docker container with a persistent disk, and that disk itself gets attached to the new instance. Also, the relay has a buffering mechanism of its own. Say the relay has to write to two InfluxDBs and one of them is down; we expect the InfluxDB that is down to come back within the next minute — that's how we have configured it — and the relay will hold the data for that minute and then ensure that both databases are always in sync.

Question: Is this relay component part of the stack as well?

Answer: It is open source, yes.

Question: Why not take the data from Kubernetes directly from Prometheus instead of through Telegraf?

Answer: The Prometheus endpoints are already available on the Kubernetes cluster, and all we are doing is scraping those metrics and storing them in InfluxDB — Telegraf has a Prometheus plugin and we just give it the URL. The point is that we have to cater to a lot of other use cases as well. When we talk about a machine learning platform, it is not just Kubernetes but a whole ecosystem of applications around it: there could be an NFS, a ZooKeeper running, an etcd running, and we basically have to monitor those other applications too. Kubernetes is one main part of it, but we were looking for a monitoring framework that caters to all of this, not just Kubernetes alone. Kubernetes provides kube metrics, for example, which also draws from the Prometheus endpoints of the API server, the controller manager, the scheduler, and the kubelet. We essentially do the same thing, but we use Telegraf because we use it across the entire landscape of our platform.

Thank you all for coming.