Hello everyone, a very good morning to all of you, and thanks for joining this session. We are here to present a session on how to manage known state in an OpenStack deployment. In this session we will look at the definition of known state, the different components of known state, and how we can bring a site from an inconsistent state to a consistent state, including the different phases involved in achieving that consistency in the network.

Before we start, let me introduce myself and my team member. My name is Vinod Yadav, and I work as a Cloud Deployment Engineer for Ericsson under Cloud Platform. With me is Shashank Gupta, who also works as a Cloud Deployment Engineer for Ericsson under Cloud Platform.

Why have we chosen this topic? As Cloud Deployment Engineers we have worked on different infrastructures as deployment and support engineers, and we have seen environments with, say, 10 to 20 sites deployed where most of the sites are at different hardware levels, different software levels, different configuration settings, different kernel and BIOS versions, and different security-level configurations. These are inconsistent environments, and they face many challenges: frequent application crashes and frequent system outages that can significantly impact the tenants, and with that the overall stability, functionality, and profitability of the business. We have also seen environments with 10 to 20 sites deployed where all the sites are at the same hardware, software, and configuration level. These are consistent deployments, and they have many advantages over inconsistent ones: they are more stable, that stability has a positive impact on the tenants, and that adds to the overall profitability of the business. So in this session we will go through the challenges we face in an inconsistent deployment and the way forward to bring it from an inconsistent state to a consistent state.

What we hope to cover today: first the definition of known state, then the different components of known state, such as hardware, software, configuration file settings, kernel and BIOS versions, and other security-level configuration. After that we will go through the benefits of known state, and then the most important part of this session, how to achieve known state, where we will walk through the different phases. We will then conclude the presentation, followed by a question and answer session.

So, what is known state? Known state is basically knowing the stable state of the deployed system. The deployed system consists of many OpenStack and non-OpenStack nodes, and these nodes consist of many components: hardware, deployed software, configuration files, kernel and BIOS versions, firmware, and other security-level configuration. Known state means creating a baseline for a stable infrastructure and making sure that baseline is implemented across the different sites in the infrastructure.
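To make the idea of a baseline concrete, here is a minimal, hypothetical sketch of what a per-node baseline record could look like if captured as structured data. Every field name and version string below is an illustrative assumption, not something taken from an actual deployment.

```python
import json

# Hypothetical baseline record for one node role; all field names and
# version strings are illustrative assumptions.
baseline = {
    "role": "compute",
    "bios_version": "2.4.1",           # expected firmware level
    "kernel": "3.10.0-1160.el7",       # expected kernel release
    "packages": {                       # expected core package versions
        "openstack-nova-compute": "17.0.13",
        "openstack-neutron": "12.1.1",
    },
    "config": {                         # expected configuration parameters
        "nova.conf:DEFAULT/cpu_allocation_ratio": "4.0",
    },
}

# Persist the baseline so every site can be audited against the same file.
with open("baseline-compute.json", "w") as f:
    json.dump(baseline, f, indent=2)
```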
Creating a baseline is an important aspect of achieving known state. A baseline can be created by taking a snapshot of a deployed site (by snapshot I mean gathering the data from the deployed site), then analyzing the captured data, and finally, based on that analysis, finalizing the baseline for the deployment.

Known state basically aims at maintaining consistency in the network. A high degree of consistency means that if we have, for example, 200 compute nodes deployed in a site, all of those computes should be at the same hardware level, should have the same software deployed, and should have the same configuration file settings, and the same configuration should hold across all the deployed sites. High consistency in the network avoids mismatched features across the infrastructure and makes it easier to maintain security compliance as per industry standards. With a high degree of consistency we also get high stability in the network, which reduces the overall lifecycle management cost of the deployed system.

Now let us look at the difference between consistent and inconsistent deployments. On this slide we have two infrastructures, Infra 1 and Infra 2, each with, for example, 200 computes deployed. If you look closely at Infra 1, most of the computes are at different levels: compute 1 is on version X and compute 2 is on version X, but compute 3 is on version Y and the last compute is on version Z. This is an inconsistent deployment.

This inconsistency in the network can have different causes. For example, if an issue is observed on a particular compute and the fix is implemented only on that compute, that compute will be at a different version than the rest of the deployed computes. There are also scenarios where, as part of continuous integration and continuous deployment, new racks are added to an existing deployment; if a new rack with, say, 40 or 50 compute nodes is added, important updates may be missed on the newly added computes, which can also cause inconsistency in the network. These kinds of deployments are unstable and more vulnerable to security issues.

On the other hand, in Infra 2 we have the same number of computes deployed, but compute 1 is on version X, compute 2 is on version X, and the last compute is also on version X. All the computes are on the same version, and this is a perfect example of a consistent deployment. These deployments are more stable, more secure, and they put you in control of your infrastructure. (A small sketch of scripting such a version-consistency check follows below.)

In the next couple of slides we will see how we can create a baseline from an infrastructure and how our environment looks with known state deployed.
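As an illustration of the consistency check just described, here is a small, hypothetical Python sketch that flags computes whose version differs from the majority. The host names and version labels are invented for the example; a real check would run over data collected from the sites.

```python
from collections import Counter

# Hypothetical snapshot of per-compute software versions; the host names
# and version labels are invented for illustration.
versions = {
    "compute-001": "X",
    "compute-002": "X",
    "compute-003": "Y",
    "compute-200": "Z",
}

# Treat the most common version as the expected one and report outliers.
expected, _ = Counter(versions.values()).most_common(1)[0]
outliers = {host: v for host, v in versions.items() if v != expected}

print(f"expected version: {expected}")
for host, version in sorted(outliers.items()):
    print(f"drift: {host} is on version {version}")
```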
As a reference infra, let's say we have a site consisting of Horizon, Keystone, compute nodes, controllers, a database, and a fault manager; in a production environment we may have other kinds of nodes deployed as well, but this is just an example.

To create a baseline, we first have to take a snapshot of this infrastructure, meaning we capture its data. Capturing the data can be done in different ways: we can use configuration management tools such as Ansible or Puppet, or a data collection tool such as Foreman. Once the data is collected, we analyze it. This analysis basically involves recommendations from architects, design and testing recommendations, and referring to any relevant release notes for the infrastructure. Based on the analysis we finalize the baseline, and once the baseline is finalized, it is deployed into the network. We will see this in detail in the coming section, where we go through the different phases of achieving known state.

Here you can see we have six different sites, and all of these sites are pointing to a stable known state infrastructure, which means they are all at the same level. We created this stable known state infrastructure from the baseline, so all six sites pointing to it means the sites are at the same level. This is a perfect example of a stable environment and of consistency in the network. With this kind of deployment it is easy to locate and troubleshoot any problem in the network, and it gives higher end-user satisfaction.

Let's move to the next section, known state components, where we will look at the different components: hardware, software, kernel, BIOS, and other security-level configuration. I invite my colleague to take us through this section.

Thank you, Vinod. Good morning, everyone. The first question that arises is: what are components, and why do we need to know them? Components are simply parts of the whole environment, and we need to know them so that we can be clear and specific about the requirements. Once we identify the right components, we can identify the changes required, structure the delivery, identify the risks and plan mitigation actions, make the right decisions at the right time, and estimate the overall cost of deploying known state.

Let's go through the components one by one. The first component is the reference architecture. By reference architecture we refer to a standard nomenclature for the servers, so that we can identify why each server was created and what its purpose is. If there are any issues related to the reference architecture in the future, a troubleshooter can identify the servers just by looking at their names, saving a lot of time that would otherwise be spent looking this information up in documents, and can troubleshoot the issues more easily. With reference architecture we also aim to clean up artifacts. These artifacts might have been created during the pre-deployment phase, the deployment phase, testing, or troubleshooting. By cleaning them up we can fully utilize our cloud environment and hence improve the performance of our system.
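Since the reference-architecture component centers on standard server nomenclature, here is a small, hypothetical sketch of enforcing a naming convention with a regular expression. The pattern itself (site code, role, three-digit index) is an invented example, not a standard from this talk.

```python
import re

# Hypothetical naming convention: <site><digit>-<role>-<3-digit index>,
# e.g. "se1-compute-042". The pattern is an invented example.
PATTERN = re.compile(r"^[a-z]{2}\d-(compute|controller|database)-\d{3}$")

servers = ["se1-compute-042", "se1-controller-001", "tempbox", "se1-db-7"]

for name in servers:
    verdict = "ok" if PATTERN.match(name) else "non-standard name"
    print(f"{name}: {verdict}")
```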
The next component is the BIOS of the OpenStack nodes. An inconsistent BIOS version can expose our environment to numerous threats, so it is important that the BIOS on all nodes is at a consistent version, so that these risks can be mitigated easily.

The third component is the kernel of the OpenStack nodes. Inconsistent kernel versions can also put our environment at risk, and they affect the availability of new features in our environment. Since OpenStack is used globally by system admins, it is important that the kernel version is consistent across nodes and across sites.

The fourth component is the software of the OpenStack nodes: Nova, Neutron, Cinder, Glance, any of the core packages. This is the base of our environment, so it must also be included in the known state project. These packages contain bug fixes identified by system admins globally, and they also include new features, so it is important that this software is updated on all nodes across all sites.

The fifth component is configuration: the configuration parameters of the OpenStack nodes, whether in nova.conf, neutron.conf, or the Heat, Cinder, and Keystone configuration files. This is a very important component for achieving known state. During our analysis we found that system admins or support teams update parameters in certain files when they face tenant issues, but for known state it is important that these parameters are then updated on all nodes, so that other servers or VMs do not face similar issues. (A small sketch of checking such parameters against a baseline follows at the end of this section.)

The last component we have included in known state is security. Staying away from danger is not only a necessity but a requirement, so why should a cloud infrastructure be any different? Whether it is a compute node, a controller, or even a Fuel node, all of them must be updated regularly with the latest security packages as per company compliance.

Summarizing what we address with the known state components: we address standard nomenclature; we clean up the artifacts created during deployment or testing; we address configuration dependencies; we fix file system space, whether it is /var/log or root space; we update the kernel version; we update the BIOS version; we remove package inconsistencies across all nodes; and at the end, we secure each node through node hardening.

Let's look at the benefits known state gives to our environment. It gives us a stable environment, and hence profitable growth and an improved ability to mitigate volatility in demand. It also reduces the number of issues in our environment, which cuts down the time spent on troubleshooting and thus saves a lot of cost. Achieving known state also gives us better software quality, with better speed and efficiency along with all the quality attributes. Known state also enables global scaling and superior performance, using automation via standard, centralized processes.

Moving on, we will see how we can achieve known state while keeping some important factors in mind. I'll hand over the slides to Vinod, and he will take us through them in detail.

Thank you, Shashank. So, how do we achieve known state? Known state can broadly be achieved in three phases: the first phase is pre-analysis, then comes planning and deployment, and the last phase is post-analysis and audit.
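Picking up the configuration component from above, here is a minimal, hypothetical sketch of checking a node's nova.conf parameters against baseline values using Python's standard configparser module. The parameter names and values are illustrative assumptions, not tuning recommendations.

```python
import configparser

# Baseline parameters to enforce, keyed by (section, option); the values
# are illustrative assumptions, not tuning recommendations.
expected = {
    ("DEFAULT", "cpu_allocation_ratio"): "4.0",
    ("DEFAULT", "reserved_host_memory_mb"): "4096",
}

# Disable interpolation: OpenStack config values may contain '%' signs.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read("/etc/nova/nova.conf")

# Report any parameter that is missing or differs from the baseline.
for (section, option), want in expected.items():
    have = cfg.get(section, option, fallback=None)
    if have != want:
        print(f"drift: [{section}] {option} = {have!r}, baseline is {want!r}")
```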
Let's go through the first phase, pre-analysis. In the pre-analysis phase, we identify a stable infrastructure for creating the baseline for the deployment. Once the stable infrastructure is identified, we capture its data, either through configuration management tools such as Ansible or Puppet, or through another data collection tool such as Foreman. Once we have collected the data, we analyze it, and based on the analysis we finalize the baseline.

If you look at the diagram here, we have a baseline created and ready to deploy into an inconsistent site X. In the pre-analysis phase we also have to capture the data from this inconsistent site and analyze it. We then compare the data from the inconsistent site against the baseline, which gives us the mismatched data between the baseline and the inconsistent site. Our aim is to implement this mismatch into inconsistent site X. To summarize, in the pre-analysis phase we create a baseline and we have the data mismatch ready to deploy into the inconsistent site. (A small sketch of computing such a delta follows at the end of this section.)

We then move to the next phase, planning and deployment, in which we basically do the development and the deployment. In this phase, automation of the baseline rollout is developed. Which automation tool to select depends on the organization's needs and requirements; any automation tool can be used based on those requirements. Once the development is done, it is very important to capture the deployment procedure in a procedure document. During development and documentation it is also very important to consider any customization that already exists in the network: we have to make sure this customization is captured in the procedure document and addressed in the development phase as well. Once development and documentation are complete, we go to deployment, where we deploy the baseline into the inconsistent site. After the baseline is deployed, we do a complete system validation: we have to make sure that all our services are up and running, and that all configuration files and customizations are preserved. So to summarize, in the planning and deployment phase we do the development and the deployment, and we execute the system validation tests.

Finally, we enter the third phase, post-analysis and audit. In this phase we again capture the data from the deployed site, analyze it, and make sure the captured data matches the baseline. In that way, we have moved the site from an inconsistent state to a consistent state. We will look at all three phases in detail in the coming slides.

So let's go to the first phase, baseline creation. This is basically a two-step process. The first step is fact collection, where we collect the data from the deployed site. The data may consist of standard data as well as any customized data present on the site, and we have to consider and collect both. Two small sketches follow: one of collecting facts from a node, and one of computing the delta against the baseline.
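First, the fact collection step: here is a minimal, hypothetical sketch of gathering a few of the facts discussed (kernel, BIOS version, installed packages) from a single node with standard Linux tools. It assumes an RPM-based system and root privileges for dmidecode; in practice a tool like Ansible, Puppet, or Foreman would collect this at scale.

```python
import json
import subprocess

def run(cmd):
    """Run a command and return its stripped stdout."""
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout.strip()

# Collect a few of the facts discussed in the talk from the local node.
# Assumes an RPM-based distro; dmidecode requires root privileges.
snapshot = {
    "kernel": run(["uname", "-r"]),
    "bios_version": run(["dmidecode", "-s", "bios-version"]),
    "packages": sorted(run(["rpm", "-qa"]).splitlines()),
}

with open("node-snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```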
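Second, the comparison step: here is a minimal, hypothetical sketch of computing the mismatch (the delta) between the baseline and a site snapshot, assuming both were saved as JSON fact dictionaries like the earlier examples. The delta is exactly what the planning and deployment phase then has to roll out.

```python
import json

def load(path):
    with open(path) as f:
        return json.load(f)

# File names follow the earlier sketches and are assumptions.
baseline = load("baseline-compute.json")
site = load("node-snapshot.json")

# The delta is every fact whose site value is missing or differs from
# the baseline value.
delta = {
    key: {"baseline": value, "site": site.get(key)}
    for key, value in baseline.items()
    if site.get(key) != value
}

print(json.dumps(delta, indent=2))
```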
So the analysis involves architect recommendation. It involves design and testing recommendation. And we also refer to any relevant release notes from the deployed site. And based on that, we finalize our baseline. So this slide basically mentioned what I just talked. So for creating a baseline, it is required to identify a stable configuration, as per most appropriate end user requirement. And it involves the collection of configuration settings present in all nodes. It involves collection of the standard data, as well as any customization in the network. And it involves recommendation from architects. It involves design recommendation, testing recommendation, and also it involves referring to the release nodes. With that, let's move to the second phase, which is planning and deployment. And I invite Shashank to go through it. So now we have done the pre-analysis and identified the baseline. The next step is to deploy those changes, deploy the node state delta. Since our environment is huge, we have hundreds of compute nodes and multiple sites. So we have to do rollouts in that manner that it should be very speedy. But keeping in mind that we should deliver a quality service with full efficiency. So by automating solution, we cover all these three factors, which is speed, quality, and efficiency. As automation helps us limiting the response time between task and activity, it also speeds up the delivery by making task and activity, consuming less times with lefts or efforts. By automation, also it enables us to easy access to our dashboard with real time data and for fast governance and decision making. Automation directly supports the delivery with high quality services, with higher precision and accuracy rate. With automation, we can limit our amount of rework as compared to the manual deployments. Automation also can adapt to new features easily. And thus, we can shift our focus to value driven activities. Thus, automation is key to accelerate the process execution. So let's see what value does automation gives to known state. It can marginally reduce the cost by saving a lot of time in deployments. Plus, it gives a better software quality with high precision and accuracy. We can easily speed up the delivery in multiple sites using automation. And thus, it empowers us to utilize our full capabilities. So it gives a good margin, a rapid growth, and market protection. But to achieve this known state, we have to choose the right path, that is correct configuration management tools. Here, we are specifying some industry-spended configuration management tools. The first one is Puppet. It's an open-stack source tool. It's a very reliable and scalable tool. Another tool which we can use for automation are Ansible Playbooks. These are quite famous in the open-stack industry. It is also very scalable, easy to operate, grow, and upgrade. With Ansible, it also empowers us to control the deployments in a specific manner. Apart from doing the deployments, doing the changes, we need some other tools for data collection so that we can do a pre-analysis and the post-analysis. We can use tools like Foreman, easily create Rubyfacts, install customized plugins. And this gives us a real-time data. And we can compare the data for multiple sites at the same time. Apart from this, we need some tools to validate our environment. We can use any industry-standard tools like Tempest or anything of your choice. 
Once we have identified our tools, the next step is to deliver the known state delta for the various components, whether it is the reference architecture, BIOS and kernel, security, software packages, or customized configuration. At this stage we address the software discrepancies across all sites and update the packages; we also update the kernel and BIOS versions, fix the server nomenclature to match the standard specification, clean up the artifacts, and update the configuration, all while making sure our customizations still persist. Once all these deployments are done, the next step is to validate our changes, to make sure they are not breaking the system or introducing any new issues. Once the validation is complete, we move to the final audit.

So let's see what the final audit involves. For the final audit we can use a tool like Foreman, because running the audit manually would take a lot of time and be error-prone. With Foreman we can easily create custom Ruby facts for various things, whether for packages, for security, or for kernels. We compare the deployed system against the pre-analysis state and check whether consistency has been achieved. We validate that all the customizations are in place, all the configurations are updated properly, and the system is secure.

So once we have done the final analysis and achieved known state, I want to conclude with some recommendations on how we can always stay in known state. The first recommendation is that we should have a baseline: a golden standard configuration should be set and well documented for future reuse. Whenever a new site is built, a rack is expanded, or a site is rebuilt in the future, these standards and documents should always be referred to, and they should be updated regularly as requirements change. The second recommendation is that there should be proper lifecycle management of system updates; whether it is a small update or a large deployment, a security package update or a software update, it should follow a proper lifecycle management process, and the rollout should reach all the nodes. The third recommendation is that rollouts should be automated using configuration management tools; the tool may be of your choice, but automation helps achieve known state properly. The next recommendation is that deployments should be done in a controlled manner: we should first establish whether an update is required in our system and adds value for us, and only then plan its delivery, in a proper manner, to all nodes across all sites. The last recommendation is that we should properly optimize our admin resources, so that we can save both time and cost.

With this, we conclude our session, and we are open for questions and answers. Thank you, everyone. If you have any questions later on, you can reach us at the email ID shown here. Thanks again for joining this session today. Thank you.