Okay, hello, everybody. My name is Ori, and I'm excited to be here to share a success story from my team. It all started when we decided to expose our service to new audiences, which presented new challenges that resulted in a lower adoption rate. We looked back to the past, found a method from the aviation industry that is almost 100 years old, and adapted it to our service. Does your service or product depend on the user's cloud? If so, maybe you can take some key takeaways from this session.

In 1935, a prototype Boeing B-17 crashed in Ohio, killing two pilots. After that, a group of engineers came up with a new method called the preflight checklist to ensure flight safety. It is still performed today before each flight. A preflight checklist can include checking the fuel quantity, the baggage weight, the air crew's documents, and more. Like airplanes, our service is also in the cloud, so we decided to adapt this method.

I'm a developer on the OpenShift Cluster Manager team. OpenShift is a Kubernetes-based platform that enables developers to deploy their applications. You can use our managed service, at the link above, to create clusters with a single API call, to deploy more machines, to schedule an upgrade to the latest version, to get metrics about your cluster such as CPU usage and memory usage, and more.

How does it work at a high level? The user makes a single API call to create a cluster.
Our service starts by validating the cluster spec and running some business logic. If everything goes fine, we continue to the installation phase and trigger the OpenShift installer. If anything is wrong, we return an error to the user, who can fix it and retry. After 40 minutes, if the installation is successful, you can deploy your application on the cluster. If it went wrong, you get an error, and you have to investigate it, retry, and start the whole process again.

Up until here everything worked fine. Then we decided to expose our product to new audiences by introducing the customer cloud subscription. On the left-hand side you can see our previous offering: if you wanted a cluster, we had a pool of AWS accounts, pre-configured and ready. We would allocate one of them, create a cluster, and give you access to the AWS cloud so you could manage your cluster. Now, customer cloud subscription has many advantages. First, it costs less. You have your own customized account, you can bring your own VPC, and more.

But then, all of a sudden, we started to see a lot of installation failures. Sometimes the account is not configured, or it is missing quota. I want you to imagine a user trying to create a cluster. It moves to the installation phase, and they sit in front of the monitor waiting 10 minutes, 20 minutes, 30 minutes, just to find out that the cluster is in an error state. How frustrating is that? Now they have to delete the cluster, delete resources in the cloud, call support, and start all over again. And there is no guarantee that the next attempt is going to be successful; maybe something else is wrong with the cloud account. So this is not a good user experience, and probably not a way to attract new customers to our service.

So what do we do?
We go back to the preflight checklist and add a new layer to our service. You can see the purple rectangle, the preflight layer, where we make multiple API calls to the user's AWS cloud and verify that the account is ready to create a cluster. Like an airplane needs fuel for a flight, a cluster needs resources: Elastic IPs, EC2 instances, load balancers. Like an air crew needs documents and permission, our service needs permissions to provision a cluster and manage it in the user's account.

From here we have two options. Option number one: if anything is wrong with the account, we return a bad request, the user can adjust the AWS account and start over, and we have prevented an installation failure. Option number two: if everything is fine, we move to the installation phase with a much higher success rate.

Just before that, I have to be honest: there is a trade-off here. First, previously we could return a response to the user in one second, but now we make multiple API calls and it takes much longer; the user can wait up to 10 seconds. Second, this is not 100 percent bulletproof, because we check the user's account at a very specific point in time, like a snapshot. The account can be valid and ready to provision a cluster, and then something can change after we move to the installation phase.
We don't have any protection against that.

Let's see what it looks like from the user's perspective. For that we are going to use the UI, creating a cluster and choosing the AWS cloud. Here in the wizard we have the option to configure the cluster spec; one option is to choose Next and get the default configuration for the cluster. So we specify the AWS account, then a set of roles to grant permissions in the AWS account. And here you have the option to click Next and get the default spec. So we go quickly through this, choosing the cloud region and the version, setting the machine pools, the network configuration, and so on, and then at the last step we are going to make an API call to create the cluster. In the last stage we see a summary of the cluster spec. Then we click Create cluster, which makes the API call, and now the preflight checks are running in the background.

In this case we can see the message: you need at least one available Elastic IP address to create your cluster. We have just prevented an installation failure. The user can jump to the AWS console. One option, here in Service Quotas, is to make a request to increase the quota. The other option is to release redundant resources that they don't need. In this case we release one Elastic IP, and after that there is enough available quota to provision a cluster. Going back to the Red Hat console and creating the cluster again, the preflights run in the background, and once all of them pass we can move to the installation phase.

A closer look at that: how do we get the quota? We have the applied quota in a specific region, so in this case we call GetServiceQuota to get the limit in that region. Then comes the utilization.
We call DescribeAddresses with the EC2 service. Once we have the utilization and the limit, we can calculate the available quota and ensure it is enough to provision a cluster. We have multiple preflights, for quota and for authorization in the user's account. We also support bring your own VPC, and we validate the configuration for that.

To summarize: we decided to expose our service to new customers, and it introduced a new challenge. We combined an old aviation method with the cloud SDK, and it increased the product adoption and the success rate. You can find our open source repo, ROSA. This is the CLI tool that you use to provision clusters, and you are welcome to contribute.

And this is it. Thank you, and it's a great time for questions.

Okay. Yeah. So the question was whether it works from the console and whether it will work from marketplaces in the cloud. When you saw the UI, the preflights were running in our back end. Whether the request comes from the UI or from our CLI tools, all of them make API calls to the same back end, and when you trigger the cluster creation, we run the preflights. It's in the back end, so it works for the CLI and for the UI. Yeah, okay, thanks a lot.
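
The quota preflight described in the talk (limit from GetServiceQuota, utilization from DescribeAddresses, available quota as the difference) could be sketched roughly as follows. This is a minimal illustration in Python, not the team's actual implementation: the names `PreflightResult`, `available_quota`, and `check_elastic_ip_quota` are hypothetical, the clients are assumed to behave like boto3's `service-quotas` and `ec2` clients, and the default quota code is an assumption that should be checked against your own account.

```python
from dataclasses import dataclass


@dataclass
class PreflightResult:
    """Outcome of one preflight check (hypothetical type for this sketch)."""
    ok: bool
    message: str


def available_quota(limit: float, in_use: int) -> int:
    """Available quota = applied limit minus current utilization, floored at 0."""
    return max(int(limit) - in_use, 0)


def check_elastic_ip_quota(quotas_client, ec2_client, required: int,
                           quota_code: str = "L-0263D0A3") -> PreflightResult:
    """Check that enough Elastic IPs are free in the clients' region.

    quotas_client and ec2_client are assumed to expose the same methods as
    boto3's "service-quotas" and "ec2" clients. The default quota_code is
    assumed to be the Elastic IP quota; verify it before relying on it.
    """
    # Limit: the applied quota in the region the client is bound to.
    quota = quotas_client.get_service_quota(ServiceCode="ec2",
                                            QuotaCode=quota_code)
    limit = quota["Quota"]["Value"]

    # Utilization: Elastic IPs currently allocated in that region.
    in_use = len(ec2_client.describe_addresses()["Addresses"])

    free = available_quota(limit, in_use)
    if free < required:
        # The caller would turn this into a bad-request response, so the
        # user can fix the account before any installation starts.
        return PreflightResult(
            False,
            f"need {required - free} more available Elastic IP address(es): "
            f"{free} available, {required} required",
        )
    return PreflightResult(True, "Elastic IP quota is sufficient")
```

With boto3, the two clients would be created with `boto3.client("service-quotas", region_name=...)` and `boto3.client("ec2", region_name=...)`. Passing the clients in as arguments keeps the check easy to exercise with stand-ins. As noted in the talk, the result is only a snapshot, since the account can still change between the check and the installation.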