So hello, good morning, everyone. Welcome to the OpenInfra Summit. I'm Nisha, and these are my colleagues Bharadwaj and Vishwajit. We have come from Tokyo. Bharadwaj and I work at LINE Corporation, and Vishwajit works at KPMG. Our talk today is aimed at beginners. Apart from coming from Tokyo, one thing the three of us have in common is that we have just started our careers: we are new graduates, one year into our cloud careers. So this is literally a beginner's one-year journey of operating a production-level private cloud using OpenStack and managed Kubernetes. So let's start.

This is the topic overview. I will start with a beginner's view, followed by LINE's cloud and our first task, our work and challenges, the current project we are working on, and a little bit about Kubernetes.

Starting from the very basics: what is cloud computing? I think all of you are aware of it, so this is just one slide about cloud computing and the perks that have us all sitting here today. These are the layers of cloud computing, and I probably do not need to mention that OpenStack and Kubernetes sit at the very bottom, the IaaS level. There is a simple analogy with lease, car, taxi, and bus; it's fun to think about the layers of cloud computing that way.

So, what is OpenStack? OpenStack is a tool to create and manage small or large clouds, public or private. This slide shows the scale of OpenStack, and it explains why all of you are here at the OpenInfra Summit with such a major focus on OpenStack today. These are a few of the major OpenStack components that we also work on.

Now let's get to the good part: what is Verda? Verda is the name of LINE's private cloud. LINE is where Bharadwaj and I work, in the cloud team, which is the Verda team. Verda provides a number of managed services. To give you a rough overview of the scale of Verda, these are the numbers of physical servers, bare-metal servers, hypervisors, and virtual machines in Verda as of January 2023, which is quite a lot. This slide explains resource provisioning at Verda: our end users are basically LINE developers, and we are in the infra team.

So let me dive deeper into our first task, which is personal Verda. As beginners, when we started, we wanted a playground to test and experiment with OpenStack. There were multiple options, for example DevStack, but each comes with its pros and cons. At LINE, we have personal Verda: we create our own personal OpenStack cluster using Ansible. If you look closely at the diagram, there are VMs on top of the dev cluster hypervisors, and we configure those VMs as the hypervisors of the personal cluster. We then use this environment as our personal Verda to create and play around with VMs. It's fun to think of it as a cluster on top of a cluster, and that is how we started. Beyond being our first playground, we use personal Verda as a personal test environment before moving on to the actual staging and production environments. That's pretty much about personal Verda.

Next, we dive deeper into the work we have done over the past few months. I will talk a little bit about Nova, one of the OpenStack components. These are two features that we contributed. The first is the user-script feature for PM, where PM (physical machine) is LINE's original Nova bare-metal driver. OpenStack already provides this feature for VMs: as you can see in command number one, you can provide a user script that is run at VM boot-up. We implemented exactly this feature for physical machines. The second is scheduling a VM on a specified aggregate: as you can see in command number B, we can schedule a VM onto the specified aggregate, which is achieved by adding a new scheduler filter that matches the key-value pair. Since the exact commands are on the slides, let me show rough sketches of both.
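For the first feature, this is only a hedged sketch of the standard VM-side user-data flow that we mirrored for PM; the file name, image, flavor, and network names are made up, not our actual setup:

```
# Write a small script to run once at first boot (file name is made up).
cat > bootstrap.sh <<'EOF'
#!/bin/bash
echo "hello from user data" > /tmp/hello.txt
EOF

# --user-data is the standard OpenStack flag for VMs (executed by cloud-init
# inside the guest); our contribution makes the equivalent behavior available
# for PM (bare-metal) servers as well.
openstack server create \
  --image ubuntu-20.04 \
  --flavor m1.small \
  --network private \
  --user-data bootstrap.sh \
  demo-server
```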
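The second feature is implemented with a LINE-internal filter, so I won't reproduce it here; but the stock upstream flow it resembles looks roughly like this, matching a key-value pair between a host aggregate and a flavor (all names are illustrative):

```
# Tag an aggregate with a key-value pair and add a host to it.
openstack aggregate create --property pool=ssd agg-ssd
openstack aggregate add host agg-ssd compute-01

# With the stock AggregateInstanceExtraSpecsFilter enabled, a flavor whose
# scoped extra spec matches the aggregate property only lands on hosts
# inside that aggregate.
openstack flavor create --vcpus 2 --ram 4096 --disk 20 m1.ssd
openstack flavor set --property aggregate_instance_extra_specs:pool=ssd m1.ssd

openstack server create --flavor m1.ssd --image ubuntu-20.04 \
  --network private vm-on-ssd-pool
```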
So that is pretty much about Nova. Next, I would like to hand the mic over to my colleague Bharadwaj to explain more details.

Thank you, Nisha. I would like to continue on the same topic of the work that we have done at LINE. A major part of our time has been dedicated to containerizing various OpenStack services. As we all know, many of these services run in parallel on the same hypervisor, be it a controller node or a compute node, and with many services on the same hypervisor you can run into problems such as conflicting package dependencies. At LINE, we have containerized our Neutron agent, which we call the L2Isolate agent, using Podman. That has enabled us to use package versions independently. So that's one part of it.

I would like to move on to the next section, which is the challenges. This is a bit more interesting. As beginners, of course, we all face a lot of challenges when we start working with any tool, and the same was true for us with OpenStack. First and foremost, there are a lot of openrc files, as you must all be aware, and it was the same for us: switching between many of them can be difficult and time-consuming. There are a few tricks, basic bashrc functions and similar shell helpers, that make your life a lot easier.

Moving on, the next set of challenges we face in our team is when we try to deploy our changes to the large number of hypervisors in our cloud. Usually it takes a few hours for even a small change to roll out to all our compute nodes. Of course, we cannot completely eliminate the time factor, but we can always reduce it to a certain extent. We have taken steps such as using smaller Ansible playbooks or handlers instead of running the entire task file, and dividing our compute hosts into smaller subgroups in the Ansible inventory file.

Moving on to the more technical challenges that we face in our day-to-day work. One such example is an error like "failed to allocate network", which you might see when you do an openstack server show. For such errors, we first try to get the tap device ID of the VM's interface: you can use the Neutron port list command to get the port ID, derive the tap device name from it, and then go to the compute node and grep for the tap ID to check for more details. Let me show rough sketches of the things I just described.
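First, the containerized agent. The L2Isolate agent itself is LINE-internal, so this is purely an illustration of the general pattern of running a Neutron-style agent under Podman; the image name, mounts, and flags are hypothetical, not our actual deployment:

```
# Illustrative only: run an agent under Podman with host networking and
# privileges so it can manage host interfaces, while its Python and package
# dependencies stay isolated inside the image.
podman run --detach \
  --name l2isolate-agent \
  --network host \
  --privileged \
  --volume /etc/neutron:/etc/neutron:ro \
  --volume /var/log/neutron:/var/log/neutron \
  registry.example.com/l2isolate-agent:latest
```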
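Next, the openrc juggling. The helpers we mean are nothing fancy; a minimal sketch of the kind of thing you can put in your bashrc (the directory layout is made up):

```
# In ~/.bashrc: source one of many openrc files by short name.
# Usage: osenv dev1   (sources ~/openrc/dev1-openrc.sh)
osenv() {
  local f="$HOME/openrc/$1-openrc.sh"
  if [ ! -f "$f" ]; then
    echo "no such openrc: $f" >&2
    return 1
  fi
  source "$f"
  echo "switched to $1 (OS_AUTH_URL=$OS_AUTH_URL)"
}
```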
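For the slow rollouts, the Ansible side is simple in spirit: smaller, focused playbooks plus a compute fleet split into subgroups, so each run targets one slice at a time. Roughly like this, with group, host, and playbook names being illustrative:

```
# Inventory split into subgroups (in the hosts file), e.g.:
#   [compute_a]
#   compute-[001:050]
#   [compute_b]
#   compute-[051:100]
#
# Then roll a small, focused playbook out one subgroup at a time instead of
# replaying the whole site playbook against every compute node.
ansible-playbook update-nova-compute.yml --limit compute_a --forks 10
ansible-playbook update-nova-compute.yml --limit compute_b --forks 10
```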
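And the tap-device hunt for "failed to allocate network" errors. The exact commands were on the slide, but the flow is roughly this (server and host names are made up):

```
# 1. Get the port ID of the server's interface.
openstack port list --server demo-server

# 2. The tap device name is "tap" + the first 11 characters of the port ID,
#    e.g. port 9fe13ad3-64... -> tap9fe13ad3-64.
# 3. On the compute node, check whether that tap device actually exists.
ssh compute-01 'ip link show | grep tap9fe13ad3'
```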
Once you have the tap device ID, you can move on to the different scenarios that could be the possible reason behind the error you are seeing. One is that the Nova API fails to receive the interface-attachment info from the Neutron server. Usually in such cases the messaging driver, be it RabbitMQ or any other driver you are using, has stopped working or is not working as intended, so always check that. The next scenario is that the tap interface has not actually been created on the compute node. In that case you can do a few checks on the compute node, such as checking the Neutron agent status and searching the agent logs.

Another similar, more common issue is that an IP is assigned to the VM but the VM is not pingable. In such cases the error usually lies at the DHCP level: the dnsmasq process backing the DHCP server may have stopped on the node running the DHCP agent. You can also use tcpdump to check the communication between the VM and the DHCP server. Another possible problem is the security group rules; we always forget to check these, so that could be one of the reasons. And here are some commonly known tools that can be used. Here is roughly what that tcpdump check looks like.
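A hedged sketch of the DHCP check; the tap device name and namespace details depend on the deployment:

```
# On the compute node: watch DHCP traffic on the VM's tap device.
# DHCP uses UDP ports 67 (server) and 68 (client); with a healthy dnsmasq
# you should see a DISCOVER/OFFER/REQUEST/ACK exchange when the VM boots
# or renews its lease.
tcpdump -ni tap9fe13ad3-64 port 67 or port 68

# On the node running the DHCP agent, dnsmasq usually lives in a qdhcp
# network namespace; check that it is actually listening.
ip netns list | grep qdhcp
ip netns exec qdhcp-<network-id> ss -ulnp | grep dnsmasq
```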
Moving on to the current work that we are doing. We have decided to upgrade our cluster from the older OpenStack version we were using to the newer Zed version, for many understandable reasons. To achieve this, we set up a test cluster on the Zed version before actually touching the production cluster. There are some standard steps that such upgrade tasks follow. We take the upstream Zed code and apply our own patches and code on top of it, and we make sure it passes all the unit tests and functional tests in our local environment. Using the modified code, we install the various OpenStack components on the respective hypervisors, the compute nodes, controller nodes, and network nodes, and we modify the configuration files, since Zed has its own set of configuration options. Then we make sure the end-to-end functionality, such as VM provisioning, works.

There were a lot of challenges we faced during this project; these are some of the prominent ones. Many configuration options were deprecated between the older version and newer ones like Zed. A number of newer packages, like the ones listed, are required on the compute nodes and have to be installed either manually or by a deployment tool. The Linux version also had to be upgraded to a more recent one to enable many of the network features that Zed offers. So those are a few of the many challenges we faced. Moving on, I would now ask my friend Vishwajit Kumar to take over.

Hi, my name is Vishwajit and I work at KPMG Ignition Tokyo. Today I will be sharing my learnings about Azure Kubernetes Service (AKS), because I use it in my daily work, and how my team uses its features. Some prominent features of AKS: it has a fully managed control plane, so we do not need to manage and orchestrate the control plane ourselves. We can schedule auto-repair and auto-upgrade for the nodes, so we do not need to take care of the underlying node infrastructure. It has built-in support for the monitoring and SIEM solutions offered by Azure. AKS is deeply integrated with Azure's managed databases and storage services. AKS has an easy network setup with Azure Application Gateway, access control lists, and firewalls. AKS is also deeply integrated with Azure Active Directory, which provides identity and access management controls.

So now, how do we use it in our infrastructure team? Our platform team is ISO 27001 and ISO 27017 certified, because we handle audits and some highly confidential data, and multiple teams build their containerized applications on top of our platform. We use a hub-and-spoke model for networking. This allows us to enforce network policies platform-wide and gives us more visibility into the network flows of each application. Data security was kept in mind while building the platform, and there is a deep level of data segregation for each application that runs on it. We maintain this platform as infrastructure as code, which lets us deploy and provision a new environment within an hour and helps our developers iterate and prototype quickly. Using Azure, we were also able to solve data residency issues, keeping the data in the region where the application runs.

So this is from my side, and that's all from our presentation. If anyone has any questions, we have one minute left, I guess. So that's all, thank you.